Here we discuss the Azure Synapse Analytics service and its features. We also create a Synapse workspace to help us get started with the service.
In today’s business landscape, enterprises need to leverage the power of data for analytics and insights. Data can exhibit market trends and patterns, aiding a new era of efficiency and innovation in business. Data-savvy businesses gain a competitive edge over other players in the market.
Big Data and its challenges, such as volume, variety, velocity, and veracity, mean it is no longer possible to tackle this data unaided. Software providers have proposed solutions to ingest, prepare, manage, and analyze this enormous amount of data, but these independently designed systems lack coordination with one another.
For these systems to be effective, analytics must work irrespective of scale and data type (unstructured, semi-structured, or structured). Data pipelines must stitch together data warehousing technologies with data across relational databases and data lakes.
It is also challenging to query large-scale data after the fact. The situation becomes even more complicated when we want to immediately update our set of analyzed data as new data comes in.
Streaming analytics help solve this challenge. Microsoft introduced Azure Synapse Analytics to help organizations make the most of their Big Data. Azure Synapse Analytics is an integrated analytics service that combines the power of data lakes with enterprise data warehousing and Big Data analytics. It provides a unified experience to ingest, prepare, manage, and serve data for immediate and enhanced machine learning and business intelligence capabilities.
This series of articles will discuss Azure Synapse Analytics and demonstrate how to implement a streaming analytics pipeline. We will start with a comprehensive and straightforward introduction. Then, we will review step-by-step instructions for building an end-to-end streaming analytics solution with the help of Spark streaming in Azure Synapse Analytics. To demonstrate what Azure Synapse Analytics can do, we will analyze New York taxi data, including trip duration and cost.
First, let’s explore Azure Synapse Analytics and its components and learn how to set it up.
What is Azure Synapse Analytics?
As mentioned earlier, Azure Synapse Analytics is an integrated analytics service used to process, manage, monitor, serve, and secure data in a single place. It gives us the freedom to query data on our terms. It also enables enterprise-wide descriptive, diagnostic, predictive, and prescriptive analytics.
Azure Synapse Analytics is the rebranded Azure SQL Data Warehouse (SQL DW) with improved performance and capabilities. Microsoft designed this analytics service to support the continuously growing DevOps ecosystem.
To better understand this service, let’s briefly discuss some of its main features.
Azure Synapse Studio
Azure Synapse Studio is a web-based set of tools that lets developers work with all aspects of Azure Synapse Analytics from a single hub. This software as a service (SaaS) solution provides features for debugging, optimization, and continuous integration and continuous deployment (CI/CD). It also supports lifecycle management of the analytics solution, including workspace creation, data ingestion, analysis, and more. We will use Synapse Studio throughout the later part of this series.
Data Exploration
Synapse Studio enables us to work with all aspects of Azure Synapse Analytics, including data exploration. We can easily browse and explore data in SQL and Spark tables and data lakes without knowing the underlying schema.
Data Integration
Azure Synapse Analytics’ data integration service comes with an integrated orchestration engine for creating data pipelines that load and transform data within our Azure Synapse workspace. We can use built-in templates from Synapse Studio to integrate data from various sources, for example Azure-based, cross-cloud, file-based, open-source, or NoSQL data providers, or almost any other application or service.
Synapse SQL Pool
An Azure Synapse Analytics SQL pool (formerly SQL DW) provides provisioned and serverless data warehousing features. We can import data into Azure Synapse Analytics using a service of our choice, such as PolyBase, Data Factory, and more.
Azure Synapse Analytics stores this data in a columnar format and leverages its distributed querying capabilities to enable fast querying and analysis. Moreover, it comes with built-in support for data streaming, artificial intelligence (AI), and machine learning (ML).
We will discuss how to set up and leverage a dedicated SQL pool later in this article series.
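To see why the columnar format mentioned above helps analytical queries, here is a minimal Python sketch. It is only an illustration of the idea, not how the SQL pool is implemented: the trip records, field names, and helper functions are all hypothetical. The point is that an aggregate over one field only needs to read that field's column.

```python
# Hypothetical taxi trip records in a row-oriented layout:
# each record keeps all of its fields together.
rows = [
    {"trip_id": 1, "duration_min": 12.0, "fare": 9.5},
    {"trip_id": 2, "duration_min": 30.0, "fare": 24.0},
    {"trip_id": 3, "duration_min": 7.5, "fare": 6.0},
]


def to_columnar(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def average(values):
    """Average of a single column's values."""
    return sum(values) / len(values)


# Column-oriented layout: all values of one field are stored together,
# so averaging the fares never touches trip_id or duration_min.
columns = to_columnar(rows)
avg_fare = average(columns["fare"])
```

In a row layout, the same query would have to walk every full record just to pick out one field, which is the cost a columnar store avoids at scale.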
Apache Spark for Azure Synapse Analytics
Azure Synapse Analytics also offers a Spark runtime for data warehousing tasks like data loading, processing, and extract, transform, and load (ETL). We do not have to provision any additional or separate clusters, since Spark is an integral part of the Azure Synapse Analytics environment.
Azure Synapse Analytics supports multiple languages, including C#, Python, SQL, and Scala. Its Spark-based processing supports features like .NET for Apache Spark, Spark ML (MLlib), and Spark Streaming.
Other Azure Synapse Analytics Features
In addition to the features mentioned above, Azure Synapse Analytics supports many other analytics and security features. It covers the entire spectrum of services, processes, and tasks in an end-to-end analytical solution’s lifecycle.
The following image summarizes Microsoft’s various tools and services under the Azure Synapse Analytics umbrella to help us understand all it can do.
Since we now understand some of Azure Synapse Analytics’ components and capabilities, let’s set up the service.
Getting Started with Azure Synapse Analytics
To get started using Azure Synapse Analytics, we first create an Azure Synapse workspace. We can easily do this from the Azure portal.
To begin, we need an active Azure account. If you don’t have one, sign up for an Azure account now to get popular services free for 12 months and a $200 credit to explore Azure for 30 days.
Creating a Synapse Workspace
From the Azure portal, we click Create a resource and search for "Azure Synapse Analytics."
On the Azure Synapse Analytics page, we click Create, then enter our project details on the Basics tab. We select the subscription we want to use for the workspace. We can either create a new resource group (as in the image above) or use an existing one. We enter a name for our workspace, select From subscription for the Data Lake Storage Gen2 account, choose or create a storage account and file system, then click Next: Security.
On the next page, Security, we enter SQL administrator credentials.
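The values entered on the Basics and Security pages correspond roughly to the resource definition that Azure deploys for the workspace. The sketch below collects them into an ARM-style dictionary so we can see how the pieces fit together. It is an illustration under assumptions, not the portal's actual output: the function name and all example values are hypothetical, and the property names follow the Microsoft.Synapse/workspaces resource schema as we understand it.

```python
import json


def build_workspace_definition(location, storage_account, file_system,
                               sql_admin_login, sql_admin_password):
    """Return an ARM-style body for a Synapse workspace (illustrative sketch)."""
    return {
        "location": location,
        # The workspace gets a system-assigned managed identity.
        "identity": {"type": "SystemAssigned"},
        "properties": {
            # Basics page: the primary ADLS Gen2 account and file system.
            "defaultDataLakeStorage": {
                "accountUrl": f"https://{storage_account}.dfs.core.windows.net",
                "filesystem": file_system,
            },
            # Security page: SQL administrator credentials.
            "sqlAdministratorLogin": sql_admin_login,
            "sqlAdministratorLoginPassword": sql_admin_password,
        },
    }


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    definition = build_workspace_definition(
        location="eastus",
        storage_account="examplestorageacct",
        file_system="synapsefs",
        sql_admin_login="sqladminuser",
        sql_admin_password="<set-at-deploy-time>",
    )
    print(json.dumps(definition, indent=2))
```

In practice the password should come from a secret store or a deployment parameter rather than being written into a script.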
Next, we review our Networking settings.
Make sure to check Allow connections from all IP addresses. This setting is required to connect Azure Synapse Studio or any other client tools to the workspace endpoint. We can restrict and allow or disallow specific IP addresses later once we provision the workspace successfully.
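To make the effect of this setting concrete, here is a small Python sketch of how an IP firewall rule range can be evaluated. It assumes that allowing all IP addresses is equivalent to a single rule spanning 0.0.0.0 to 255.255.255.255. The rule dictionaries, the rule names, and the ip_allowed helper are hypothetical and only model the idea; they are not part of any Azure SDK.

```python
import ipaddress

# Assumed shape of the "allow all" rule: one range covering every IPv4 address.
ALLOW_ALL_RULE = {"name": "allowAll", "start": "0.0.0.0", "end": "255.255.255.255"}


def ip_allowed(client_ip, rules):
    """Return True if client_ip falls inside at least one rule's range."""
    ip = ipaddress.IPv4Address(client_ip)
    return any(
        ipaddress.IPv4Address(rule["start"]) <= ip <= ipaddress.IPv4Address(rule["end"])
        for rule in rules
    )


# A narrower, hypothetical rule we might switch to after provisioning.
office_rule = {"name": "office", "start": "203.0.113.0", "end": "203.0.113.255"}

open_access = ip_allowed("198.51.100.25", [ALLOW_ALL_RULE])   # any client passes
restricted_access = ip_allowed("198.51.100.25", [office_rule])  # outside the range
```

With the allow-all rule every client address matches, which is convenient during setup. Replacing it with narrower ranges later limits access to known networks.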
Next, we can optionally create Tags. We then click Review + create to create the workspace.
The workspace deployment may take a few minutes. We can monitor the deployment status in the progress bar at the top — or just get ourselves a nice cup of coffee.
Once the deployment is complete, we open the resource group and click on the workspace we just created. Here, we can see the workspace web URL, primary ADLS Gen2 storage account URL and file system, dedicated and serverless SQL endpoints, and a development endpoint.
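The endpoints shown on the workspace page follow a predictable naming pattern based on the workspace and storage account names. The sketch below reconstructs them in Python. It assumes the public Azure cloud domain suffixes, and the function name and example names are hypothetical. The values shown in the portal remain the authoritative source.

```python
def synapse_endpoints(workspace_name, storage_account):
    """Build the typical endpoint addresses for a workspace (public cloud assumed)."""
    return {
        # Dedicated SQL pools are reached through the workspace SQL endpoint.
        "dedicated_sql": f"{workspace_name}.sql.azuresynapse.net",
        # The serverless SQL pool uses the "-ondemand" variant of the name.
        "serverless_sql": f"{workspace_name}-ondemand.sql.azuresynapse.net",
        # The development endpoint is used by Synapse Studio and client tools.
        "development": f"https://{workspace_name}.dev.azuresynapse.net",
        # Primary ADLS Gen2 storage account URL.
        "adls_gen2": f"https://{storage_account}.dfs.core.windows.net",
    }


# Placeholder names, not real resources.
endpoints = synapse_endpoints("exampleworkspace", "examplestorageacct")
```

Client tools such as SQL Server Management Studio or Azure Data Studio connect with the SQL endpoint values, while the development endpoint is what Synapse Studio talks to.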
We can also change firewall settings from here and allow or disallow any specific IP address that we want. Also, note that we can choose to create a dedicated SQL or Apache Spark pool from here.
We can also create pools from Synapse Studio. There’s no need to be overwhelmed wondering what an SQL or Spark pool is, why we need one, or how to make one. We will cover that in detail in the following article!
Next Steps
We have discussed the Azure Synapse Analytics service and its features. We also created a Synapse workspace to help us get started with the service.
We have not yet discussed any use cases for Azure Synapse Analytics. We will discuss how we can use Azure Synapse Analytics for a full-fledged streaming analytics solution in the following article. Next, we will explore how to create dedicated SQL pools, and then we’ll create data streams and explore some New York taxi data.
Continue to the next article in this series or register to view the Hands-on Training Series for Azure Synapse Analytics.