Spark Azure Microsoft

Spark Summit kicks off in San Francisco this week, and Microsoft will be at the event to discuss its growing commitment to Apache Spark, the open source, in-memory big data platform. The company has increasingly embraced the platform, and some key announcements are expected at the summit.

Microsoft announced today that Apache Spark for Azure HDInsight is now generally available, a move the company says will make big data more accessible. Cortana Intelligence, Redmond’s big data intelligence tool, is included within Spark for Azure. The release comes a year after Microsoft first announced its cloud version of the Hadoop big data framework.

Interestingly, Microsoft is a Hadoop-based distributor, but the company is expanding to accommodate Spark, and the integration of Spark into the Hadoop-based HDInsight is another example of this. To achieve this, Redmond moved away from its Windows-based distro and rebuilt it on the open source Linux platform.

“Today, we are pleased to announce that Apache Spark v1.6.1 for Azure HDInsight is generally available. Since we announced the public preview, Spark for HDInsight has gained rapid adoption and is now 50% of all new HDInsight clusters deployed,” the company said.

R Server for HDInsight will leave public preview and be released later in the summer, providing Spark integration both on-premises and in the cloud. Customers can also expect the R distribution to be integrated into SQL Server 2016 later in the summer. SQL Server 2016 became generally available last week.

At Spark Summit this week, Microsoft will also discuss its Power BI support for Spark Streaming, which has already been released. The company discussed the use of BI tools with Apache Spark back in March, and the service will now include Spark Streaming.

The company notes in an official post that its Spark integration is meant to make big data easier to manage:

“Our goal with big data is to make it accessible for everybody. With Spark for HDInsight, we have designed new productivity experiences for the different audiences that use Spark including the data engineer working on ETL jobs, the data scientists who are performing experimentation and the business analysts who are creating dashboards.”