Azure Data Explorer is a high-performance analytics service from Microsoft, designed for the fast exploration and analysis of large volumes of data. Used primarily by applications that need to manage and process data in real time, it allows companies to obtain deep insights through advanced queries and visualizations. In this article we will look at what Azure Data Explorer is, see how it works through a practical example, and examine the factors that influence its cost.
Companies generate and store huge amounts of data every day.
This data can be unstructured (such as audio and video), semi-structured (such as XML and JSON) or structured (such as numbers, dates and strings), and the professionals who work with it are constantly looking for effective techniques to manage massive volumes of all these types of information.
We could certainly do this with traditional data warehouses and analysis tools such as Hadoop or Spark, but that would mean taking the conventional approach of running ETL over terabytes or petabytes of data before being able to explore and analyze it.
In many cases, what is needed is a platform that lets users quickly ingest and analyze diverse raw data with fast ingestion and strong performance, and Microsoft has an interesting proposition for data professionals who find themselves in this situation.
Azure Data Explorer, also known as ADX, is a fast, highly scalable, and fully managed data analysis service for log, telemetry, and streaming data.
This data exploration service lets you collect, store, and query terabytes of data in a few seconds, making it possible to run quick ad hoc queries on heterogeneous data. It can combine huge amounts of data from various sources and analyze them, even in real time with streaming data, and, like many Azure services, it is fast and can scale automatically as needed.
Let's take a closer look at it in the next sections.
Azure Data Explorer is a distributed database that runs on two clusters of computing nodes in Microsoft Azure (an Engine cluster and a Data Management cluster) that operate together. It follows the relational model, supporting entities such as databases, tables, functions, and columns, and it supports complex analytical query operators such as calculated columns, row search and filtering, aggregations with group by, and joins.
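To give a feel for these operators, here is a minimal KQL sketch; the StormEvents and PopulationData tables and their columns are illustrative sample names, not objects used in the walkthrough that follows.

```kusto
// Illustrative tables and columns (StormEvents, PopulationData are sample names)
StormEvents
| where StartTime > ago(7d)                          // row filtering
| extend DurationHours = (EndTime - StartTime) / 1h  // calculated column
| summarize EventCount = count() by State            // aggregation with group by
| join kind=inner (PopulationData) on State          // join with another table
| top 10 by EventCount
```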
At its core, ADX works like a micro-batch engine: the ingested data (i.e. the data loaded for the first time), which can come from various sources such as Azure Blob Storage, Azure Event Hubs, Azure IoT Hub, and SQL databases, in formats such as JSON, CSV, Parquet, and Avro, is grouped in batches, stored in files known as “extents” (horizontal fragments of data that represent the fundamental unit of storage in ADX), and indexed.
ADX offers great horizontal scalability, managing petabytes of data and distributing the load across multiple nodes, while the use of indexes, caches and automatic optimization techniques guarantees fast response times even for complex queries.
As with almost all Azure services, you can work directly in the Azure portal, from the command line, through REST APIs, or via SDKs for platforms such as .NET, Java, Node.js, Go, PowerShell, and others.
The platform is made up of multiple components that work in harmony to offer an efficient data exploration service: the Engine cluster, which receives and processes queries, and the Data Management cluster, which orchestrates ingestion and manages the data lifecycle.
We have created the Infrastructure & Security team, focused on the Azure cloud, to better respond to the needs of our customers who involve us in technical and strategic decisions. In addition to configuring and managing the tenant, we also take care of:
With Dev4Side, you have a reliable partner that supports you across the entire Microsoft application ecosystem.
In general, when working with Azure Data Explorer, you follow these steps: create a cluster, create a database in it, ingest data into a table, and query that data.
To better understand how Azure Data Explorer works, let's look at these basic steps one by one with a practical example.
In the Azure portal, after logging in with our credentials, we click on 'Create a resource' and search for Azure Data Explorer.
We click on the 'Create' button, and on the page that appears we provide the basic details: the subscription, the resource group, a name for the cluster, the region, and the compute specifications from the resources available for the cluster.
There are two compute categories to choose from: compute-optimized SKUs, designed for high query rates, and storage-optimized SKUs, designed for larger data volumes.
After providing the basic information, we click on the Next button to provide information on scaling the cluster.
You can choose between manual scaling and optimized autoscaling. Optimized autoscaling is the recommended method, because resources are scaled automatically based on the workload and predefined rules.
Let's click on the Next button to configure the settings.
Here we can enable the Azure Data Explorer cluster's streaming ingestion and data purge capabilities. By default, they are off.
If you want to use these ADX capabilities, you can enable them from this tab, but they must be configured correctly after the cluster has been deployed.
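As an example of that post-deployment configuration, streaming ingestion can be enabled on a specific table with a control command; the table name below is a hypothetical placeholder.

```kusto
// Enable the streaming ingestion policy on a table (Telemetry is a hypothetical name)
.alter table Telemetry policy streamingingestion enable
```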
The next step is to configure the security settings for the cluster. Here you can choose to assign a system-assigned managed identity: the cluster will be registered with Azure Active Directory, allowing you to control its access to other Azure resources.
Disk encryption can be configured once the cluster deployment is complete.
The next step is to configure the network requirements. If you want the cluster to be connected to your virtual network, you can configure the details on the Networking tab.
Once the settings have been completed, you can provide the details for tagging (a good practice that should always be adopted) and finally click on Create.
After the deployment is complete, you will see a message stating that it has finished. From here we can go to the page of the newly created ADX cluster.
Now that the cluster is ready, it's time to create a database for the cluster. From the ADX cluster overview page, click on the 'Create Database' button.
Here we'll need to provide the name of the database, the retention period (in days), and the cache period. The retention period determines the number of days for which you want the data to be kept in the cluster, while the cache period determines how many days you want to keep the data in the hot cache for high performance queries.
These options can later be changed by going to the database settings screen.
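These same settings can also be adjusted with control commands once the database exists; in this sketch, TelemetryDb is a hypothetical database name.

```kusto
// Keep data for one year, and the most recent 31 days in the hot cache
.alter-merge database TelemetryDb policy retention softdelete = 365d
.alter database TelemetryDb policy caching hot = 31d
```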
Now that the steps for creating the ADX cluster and database are complete, the next step is to ingest the data.
At this point, it is worth describing the basics of data ingestion a little more thoroughly. There are two types of ingestion, based on the nature of the data: batch ingestion, which groups records into batches for high throughput, and streaming ingestion, which makes each record available for querying almost as soon as it arrives.
All data ingested in tables is partitioned into horizontal sections or shards. Each shard contains a few million records. The records in each shard are encoded and indexed. The shards are distributed across the different nodes of the cluster for efficient processing.
In addition, the data is cached on local SSDs and in memory, based on the caching policy selected when the cluster was created. The query engine performs highly distributed and parallel queries on the dataset for best performance.
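If you are curious about how a table has been sharded, the extents backing it can be listed with a control command (SampleTable is a hypothetical name):

```kusto
// List the extents (data shards) that make up a table
.show table SampleTable extents
```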
With that digression out of the way, there are two options for ingesting the data. In this case, we choose the quick option and click on 'Ingest new data'.
On the next screen, we need to provide a name for the table in the selected database into which the data will be loaded. Here we create a new table; alternatively, you can select an existing table for loading the data.
Next, we select the type of data source from a drop-down list. Here we select a file as the source and then pick the file after browsing the folder structure; up to 10 files can be attached at once. We attach a file called FILE_ESEMPIO.csv from the local disk.
Finally, press the 'Edit Schema' button.
On the next screen, the table schema is automatically created from the provided data file. This mapping is officially referred to as Schema Mapping. Column names and data types are created based on the columns in the file and their values. You can create, modify, or delete any column on this screen.
The data type of a column can also be changed to suit your needs. This mapping between the file data and the table structure is given a name. When we are done, we press the 'Start Ingestion' button.
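The wizard generates these objects for us, but the equivalent control commands look roughly like this; the table name, columns, and mapping name below are hypothetical stand-ins for whatever the wizard infers from the file.

```kusto
// Create a table with an explicit schema (hypothetical columns)
.create table SampleData (Id: int, Name: string, Timestamp: datetime)

// Create a named CSV schema mapping that maps file columns by ordinal position
.create table SampleData ingestion csv mapping "SampleData_mapping"
    '[{"Column":"Id","Properties":{"Ordinal":"0"}},{"Column":"Name","Properties":{"Ordinal":"1"}},{"Column":"Timestamp","Properties":{"Ordinal":"2"}}]'
```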
Once the data ingestion is complete, the data is ready for previewing and querying.
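For reference, the same ingestion can also be triggered from a query window rather than the wizard; the storage URL here is a hypothetical example and assumes the table and mapping sketched above.

```kusto
// Ingest a CSV blob into the table using the named mapping (URL is hypothetical)
.ingest into table SampleData ('https://mystorage.blob.core.windows.net/container/FILE_ESEMPIO.csv')
    with (format = 'csv', ingestionMappingReference = 'SampleData_mapping', ignoreFirstRecord = true)
```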
The steps and details of ingestion are available on the process conclusion page for reference. There are also some quick options for queries, and in this case we're going to select the 'Number of rows' query option.
In the Query tab, the query and the result related to the number of rows in the lsdata table created and populated in the previous steps are shown.
A Kusto query is a read-only request to process data and return results. A query consists of a sequence of query statements separated by semicolons, and at least one of them must be a tabular expression statement that returns data in table-column format. KQL lets you both query data and use control commands.
A query is a read-only request: it returns the results of the processing but does not modify the data or metadata. Control commands, by contrast, are requests to Kusto to process and potentially modify data or metadata.
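The difference is easy to see side by side. The first snippet is a query against the lsdata table from our walkthrough (the IngestionTime column is a hypothetical name); the second is a control command, recognizable by the leading dot.

```kusto
// Query: a semicolon-separated sequence of statements ending in a tabular expression
let cutoff = ago(7d);
lsdata
| where IngestionTime > cutoff
| count
```

```kusto
// Control command: starts with a dot and can inspect or modify metadata
.show tables
```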
After writing the query, we press the 'Run' button, and the query output is generated in the lower window. In addition to the tabular output, a graphical output can also be generated. The query performance details are available along with the query result, and you can also save and download the query output to files.
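KQL's render operator is what produces those graphical outputs; here is a minimal sketch, again assuming a Timestamp column exists in lsdata:

```kusto
// Bucket rows per hour and render the counts as a time chart
lsdata
| summarize Rows = count() by bin(Timestamp, 1h)
| render timechart
```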
The Azure Data Explorer pricing model is based on a pay-as-you-go approach, where customers are billed based on their use of the service. The price is determined by factors such as the amount of data ingested, the amount of data stored, and the number of queries executed.
Customers can choose from different price ranges that offer varying levels of performance and functionality, and the service also offers reserved capacity options (Reserved Instances), allowing customers to reserve resources for a fixed period of time at a discounted price.
Azure Data Explorer also offers a free tier (Free Cluster) that allows you to explore the basic functionality of the platform without initial costs. This level is limited both in terms of computing capacity (for example, a small number of nodes or storage capacity) and in terms of data volume (for example, a limit on the number of GB you can ingest and keep). This is useful for tests and pilot projects. It does not require an Azure subscription or credit card and allows you to try the service for a year (possibly renewable).
The official page of the service on the Azure website includes a cost calculator that makes it possible to estimate the price of the service based on factors such as the type of workload, the region, the currency used, and the period of use (hours or months).
To better understand the main factors that influence the pricing of Azure Data Explorer, let's take a closer look at them one by one.
The first factor we are going to examine is how the ADX clusters are used and configured.
The factors that influence the cost are the type of virtual machine instances selected, the number of nodes in the cluster, and the number of hours the cluster remains active.
The second factor that influences the overall cost of the service is data ingestion, in particular the volume of data ingested and whether it arrives through batch or streaming ingestion.
Finally, we have data retention: here the factors that affect the overall cost are the amount of data stored, the retention period configured, and how much of that data is kept in the hot cache rather than in colder storage.
For all companies, having the opportunity to make the most of the amount of information generated every day is not a marginal advantage, but a necessity to reckon with if you want your business to remain healthy and competitive.
Solutions such as Azure Data Explorer serve precisely this purpose: with its ability to analyze impressive amounts of data in real time, it can give organizations of all types and sizes the opportunity to make quick decisions based on key insights, and thus to outline strategies suited to their real needs.
The Kusto Query Language, in turn, allows a more elegant approach to querying, giving even less experienced users the opportunity to work with the platform's functionality without complicating their lives.
So all we have to do is invite you to try it out using the free tier made available by Microsoft and let the platform itself convince you if it's the right solution for your needs.
Azure Data Explorer, abbreviated as ADX, is a service managed by Microsoft that allows you to quickly analyze large amounts of data. It is particularly suitable for scenarios where it is necessary to explore and obtain insights from heterogeneous data such as logs, telemetry or real-time flows, using interactive queries and advanced visualization tools.
This service is used in contexts where speed of analysis is crucial, for example in the monitoring of IT infrastructures, in the management of IoT devices, in the collection of continuous telemetry or in research on unstructured data. Its ability to work in real time also makes it suitable for troubleshooting or user behavioral analysis.
Azure Data Explorer can process data of various kinds. It supports structured data such as tables with well-defined columns and types, semi-structured data such as JSON or XML files, and it can also accept originally unstructured data, such as video or audio, as long as it is converted into metadata that can be used for analytical purposes.
The system allows you to upload data in batch or streaming mode. Batch data is loaded in groups of records, with higher ingestion throughput, while streaming data is processed almost in real time. All data is divided into fragments, indexed, and distributed efficiently across cluster nodes to ensure high performance in the subsequent query phases.
The queries are written using the KQL language, or Kusto Query Language. This language, designed specifically for data exploration, allows powerful and sophisticated queries to be performed in a clear and readable way, even by users with non-advanced technical skills.
The results of the queries can be viewed directly in the Azure portal through interactive dashboards. It is also possible to connect to external tools such as Power BI and Grafana to create customized visualizations, or export the data in tabular format for further processing.
The Azure Data Explorer pricing model is based on actual use of the service. The cost varies depending on the number and type of nodes used in the cluster, the amount of data loaded, the time for which it is kept and the type of memory used. Microsoft also provides an online calculator to estimate costs based on workload and region of use.
Yes, there is a free tier called Free Cluster that allows you to experience basic functionality without upfront costs. This level is designed for testing, learning or small pilot projects and can also be used without a credit card.
You can access this service directly from the Azure portal, but you also have the option of interacting with it through a REST API or using one of the many SDKs available for different languages and platforms, such as .NET, Java, Node.js, PowerShell and others.
The answer is yes. The portal interface, the clarity of the KQL syntax, and the ability to quickly create sample queries make the platform accessible even to those who are not experts in data analysis. The learning curve is gradual and well supported by the official documentation.
The Infra & Security team focuses on the management and evolution of our customers' Microsoft Azure tenants. Besides configuring and managing these tenants, the team is responsible for creating application deployments through DevOps pipelines. It also monitors and manages all security aspects of the tenants and supports Security Operations Centers (SOC).