Microsoft Fabric just got an upgrade. Real-time intelligence in Fabric makes it easy for anyone, anywhere to unlock actionable data insights. #MSBuild
— Microsoft Developer (@msdev) May 21, 2024
Microsoft is working hard to remove friction from all aspects of data analytics with its end-to-end Fabric platform. In that pursuit, at its Build developer conference in Seattle, the company announced the public preview of significant enhancements to Fabric’s real-time data processing and analytics capabilities. These include the merger of Synapse Real-Time Analytics and Data Activator into a unified Real-Time Intelligence component; connectivity to a range of new streaming data sources, including those on Amazon Web Services and Google Cloud; new real-time dashboards and visual data exploration; and the introduction of the Real-Time hub, which makes streaming data sources more discoverable and integrates them with Fabric Lakehouses, addressing the disconnect between data-in-motion and data-at-rest.
A Step Further
Fabric’s Synapse Real-Time Analytics workload (module) has already made real-time analytics easier and more integrated with batch analytics, BI, and machine learning. The module’s powerful event stream abstraction, along with Fabric’s Data Activator component for data monitoring and alerting, and its KQL database (which has gradually been rebranded as “Event House”) had already created a compelling solution. However, each of these Fabric components has been somewhat disjointed, requiring some savvy and effort from customers to use them together. Ironically, it’s just this kind of burden on the customer that Fabric aims to eliminate, so the situation has been a bit of an anomaly.
In response, the Fabric team has worked to improve each of these components and integrate them more tightly, requiring less customer effort and expertise to use them together. As a result, the Fabric platform as a whole is now more event-driven, and its real-time capabilities are more accessible to business users and analysts. Microsoft has made improvements in the areas of streaming data ingestion and processing, discoverability, analysis, visualization and no-code exploration, and event-driven triggers. Highlights of each area of improvement follow, and I conclude with some thoughts on what all this means for Microsoft’s competitive position in the data analytics arena.
Streaming Data Ingestion and Processing
Already a powerful abstraction, Fabric event streams can now ingest data from Amazon Kinesis Data Streams, Google Pub/Sub, and even from Kafka topics on the Confluent Cloud platform, in addition to the Azure Event Hubs and IoT Hub connectivity they already had. Event streams can now also ingest from a range of Microsoft’s change data capture (CDC) sources, including Azure SQL Database (the cloud implementation of SQL Server), Azure Cosmos DB, and the Azure implementations of the open-source MySQL and PostgreSQL databases. Finally, event streams can receive event data from Azure Blob Storage (as well as Azure Data Lake Storage) and even from Fabric itself. While this last capability may seem somewhat niche, it’s actually quite significant. Responding to Fabric events means that the entire Fabric platform can become event-driven, enabling scenarios like ingesting data into the Lakehouse when a new file arrives in cloud storage or retraining a machine learning model when a Lakehouse is updated.
Event Stream Functionality
The functionality within event streams has also been enhanced. Transforming the data as it arrives is now easier; routing data based on such transformations or filters is now also possible. Creating “derived streams” based on the output of these transformations and filters for further downstream consumption has become trivial to implement. Event streams now also have distinct Edit and Preview modes. The Edit mode allows development to occur in a quasi-offline fashion that ensures event streams in production won’t be disrupted. Once everything has been sufficiently tested, the new or updated event stream can be explicitly published.
Discoverability
The addition of the new Fabric Real-Time hub, alongside the previously implemented OneLake data hub, makes streaming data sources far easier to discover, consume, and analyze. Separate lists for data streams, Microsoft sources, and Fabric events are provided. The data streams list includes both default and derived streams from event streams, as well as tables in Event House databases. By default, these lists include everything to which the user has access, but filtering them is possible. Data streams can be filtered by workspace, owner, type (stream or table), or parent item (event stream or Event House database), and the Microsoft sources list can be filtered by source type or by Azure subscription, resource group, or region.
Analysis
While most of what’s being announced today is in public preview, Fabric’s Event House technology is now generally available (GA). Event Houses are both a rebranding of KQL databases, which are based on Microsoft’s powerful “Kusto” time series database technology, and an enhancement to their functionality and management tooling. Event Houses allow multiple KQL databases to be used and managed together, enabling them to be federated and treated as a kind of partitioning mechanism. This is especially true since a single pool of compute can serve all of the individual KQL databases in an Event House. Also GA is the ability for Event Houses/KQL databases to replicate their data into OneLake, allowing all of the other engines in Fabric, including Fabric Data Warehouses, Apache Spark, and Power BI, to query and analyze the accumulated streaming data.
Visualization and No-Code Exploration
But if you are going to work with the Event House technology, you need a way to visualize the data, explore it, and query it in an ad hoc fashion. And since KQL is a separate query language from SQL, there can be a learning curve. Microsoft aims to flatten that curve through three new capabilities: real-time dashboards, interactive visual data exploration, and a special Copilot that can generate KQL from natural language questions.
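To make the learning-curve point concrete, here is a simple aggregation expressed in KQL’s pipeline style alongside its rough SQL equivalent. The `Events` table and its columns are hypothetical, used purely for illustration:

```kql
// Rough SQL equivalent:
//   SELECT DeviceId, COUNT(*) AS EventCount
//   FROM Events GROUP BY DeviceId ORDER BY EventCount DESC
Events
| summarize EventCount = count() by DeviceId
| order by EventCount desc
```

SQL users must adjust to the left-to-right operator pipeline, but once internalized, it reads much like a sequence of data transformation steps.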
The dashboards are extremely interesting, as they resemble Power BI reports but are based on different technology. There are a few reasons this makes sense. First, KQL databases (and their Azure Data Explorer and Synapse Analytics Data Explorer pool precursors) have long been able to produce their own visualizations; doing so is a built-in primitive of KQL. Extending that capability to combine multiple tiled, query-based visualizations into dashboards is a natural next step. Second, the analysis of time series data has its own semantics and demands specialized visualizations. Basic types like bar, column, pie, area, and line charts are part of the mix, but so are specialized viz types like time charts, anomaly charts, and stat/multi-stat charts. Meanwhile, like their Power BI counterparts, real-time dashboards support cross-filtering and drill-through.
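The built-in primitive in question is KQL’s `render` operator, which instructs the engine to draw the query result directly as a chart. A minimal sketch, assuming a hypothetical `DeviceTelemetry` table of timestamped sensor readings:

```kql
// Average temperature per device, bucketed into 5-minute bins,
// drawn as a time chart by the query tooling itself
DeviceTelemetry
| where Timestamp > ago(1h)
| summarize AvgTemp = avg(Temperature) by bin(Timestamp, 5m), DeviceId
| render timechart
```

A real-time dashboard tile is essentially a query like this one, pinned and refreshed; the dashboard assembles many such tiles into a single surface.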
Furthermore, since each visualization in a real-time dashboard is based on a distinct KQL query, it becomes easier to take any one of them in isolation and open it for iterative tweaking and modification, including the addition of filters, the creation of aggregations, and the switching of visualization types, without hand-editing the underlying queries. This forms the basis of Fabric Real-Time Intelligence’s visual data exploration. Users make these tweaks through a user interface, and each modification manifests as a corresponding change to the underlying KQL query. This time series-specific approach simply wouldn’t be possible in today’s Power BI, which is focused more on dimensional aggregation and drill-down of tabular data.
If that’s still not good enough, Microsoft is launching a Copilot for Real-Time Intelligence, which is smart enough to take natural language (“plain English”) questions and produce KQL queries from them that can be pasted into a KQL Queryset editor and executed. This query-generation approach has the side effect of teaching KQL by example to less-technical users, enabling the power users among them to learn the language and eventually write those queries from scratch, should they feel interested and able.
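The exchange might look something like the following. Both the question and the generated query are illustrative, not actual Copilot output, and `DeviceTelemetry` is a hypothetical table:

```kql
// Question: "Which five devices reported the highest average
// temperature over the last 24 hours?"
// A KQL query of the kind Copilot might generate in response:
DeviceTelemetry
| where Timestamp > ago(24h)
| summarize AvgTemp = avg(Temperature) by DeviceId
| top 5 by AvgTemp
```

Seeing a question and its generated query side by side is exactly the teach-by-example dynamic described above.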
Triggers
The last piece of the Fabric Real-Time Intelligence puzzle is the ability to create data-driven triggers and alerts much more easily than before. Rather than having to go to the Data Activator user interface, Fabric users can create triggers and alerts in context, right as they’re editing streams and dashboard tiles or while they’re in the Real-Time hub. Each such action will create a new Fabric Reflex object in the workspace, and these objects can do more than before. In addition to the pre-existing ability to send alerts as emails or messages in Teams, triggers can now kick off true units of work, including data pipelines, notebooks, and Spark job definitions. That means all of these executable packages graduate from just running on-demand or on a scheduled basis to being able to run on an event-driven basis too.
Conclusion
For years I’ve been saying that working with real-time, streaming event data has been a segregated specialty within analytics. Real-time analytics has demanded its own platforms and skill sets, therefore often requiring distinct personnel to work with it. This has made the notion of “360-degree analytics,” be it for the customer experience, predictive maintenance, or financial market analysis, challenging and often prone to failure. That’s always been frustrating, but in the age of AI, it’s become unacceptable.
Microsoft is working earnestly to close this gap. Whether this release really does that is up for debate. I happen to think there’s a lot more work to do, and that integration of the Eventstream, Data Activator/Reflex, and Eventhouse technologies has more distance to travel. I’m also concerned that the divergence of the real-time dashboards from Power BI could get dicey.
But I believe Microsoft is thinking more seriously than most of its competitors about how to bring streaming event data and data-at-rest together in a way that’s usable and intuitive to engineers, analysts, and business users. And this isn’t just abstract strategy — the company is building and shipping things, and this set of enhancements to Fabric proves it.
For more information, visit Microsoft’s official website and The New Stack.