Operational intelligence and the new frontier of data
Always-on businesses such as global retailers, social media apps, transportation platforms, and financial marketplaces have mission-critical use cases that require real-time decisions on operational event streams.
Target’s supply chains must adapt to changes in store inventory, Snap’s new app launches must be debugged, Lyft’s drivers must be predictively routed to riders, and PayPal’s fraudulent payments must be flagged and blocked.
These use cases for logistics, app monitoring, and fraud detection aren’t science fiction; they’re real-world examples powered by an emerging technology stack that combines event stream processing, fast OLAP databases, interactive dashboards, and machine-learning applications. Fueled by real-time data coming from instrumented products and services, this technology stack is driving a distinct category of analytics called operational intelligence, which is complementary to traditional business intelligence.
I believe the need for operational intelligence will dramatically expand in the coming years. In this post we lay out why operational intelligence matters now, its salient differences from traditional business intelligence, and why it demands new technology architectures.
Operational intelligence: why now?
A global digital nervous system is emerging
One consequence of ubiquitous computing and digital transformation is the promise of visibility into the previously invisible. There are now an estimated 20 billion connected devices on the planet, encompassing everything from smartphones and cars to lamp posts and laundry machines. Previously unobservable actions in the analog world — a package delivery, a store purchase, a taxi ride — now throw off digital signals in the form of a barcode scan, a payment gateway request, or a stream of GPS heartbeats.
Collectively, these connected devices are the sensory scaffold for an emerging global digital nervous system, whose potential we are only just beginning to explore. However, some of the near-term opportunities are analogous to the advantages of our own nervous system: faster reaction times and better decision-making with richer sensory data.
Event streams are the new frontier of data
While smart sensors have been widely deployed and digital services are more finely instrumented than ever, businesses are just now building the software infrastructure to capture and process the event streams from their products and services.
With this first wave of data plumbing complete, businesses now have an opportunity to build new kinds of workflows. For example, businesses traditionally compile and update a set of KPIs on a weekly or daily basis. What happens when it’s possible to track these metrics up to the minute and down to an individual line item or customer? It means that small issues can be addressed before they become big challenges. It used to take months to detect and correct issues in shipped software; today, Netflix continuously monitors the performance of new versions of its software and can revert a release for specific users when issues are detected. Similarly, Lyft tracks rider conversion rates at a city-block level and redirects drivers to a particular block when those rates drop.
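To make the monitoring loop concrete, here is a minimal sketch, assuming a Kafka topic of ride-request events: it maintains a sliding window of outcomes per city block and flags blocks whose conversion rate drops well below a historical baseline. The topic name, message fields, baselines, and thresholds are all hypothetical, not details of Lyft’s actual systems.

```python
# Minimal sketch of a continuous, per-block conversion-rate monitor.
# Assumes a Kafka topic of JSON ride-request events; all names are hypothetical.
import json
from collections import defaultdict, deque

from kafka import KafkaConsumer  # pip install kafka-python

WINDOW = 500          # number of recent requests to track per city block
ALERT_FRACTION = 0.6  # alert when conversion falls below 60% of baseline

recent = defaultdict(lambda: deque(maxlen=WINDOW))  # block_id -> recent outcomes
baseline = defaultdict(lambda: 0.8)                 # block_id -> historical rate

consumer = KafkaConsumer(
    "ride-requests",
    bootstrap_servers="localhost:9092",
    value_deserializer=json.loads,
)

for message in consumer:
    event = message.value
    block = event["block_id"]
    window = recent[block]
    window.append(1 if event["converted"] else 0)
    if len(window) == WINDOW:
        rate = sum(window) / WINDOW
        if rate < ALERT_FRACTION * baseline[block]:
            # A mature system would trigger an automated correction here
            # rather than a human-facing alert (see the next section).
            print(f"Conversion drop in block {block}: {rate:.0%}")
```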
Intelligence pushed to the edge: accelerating decision loops
Sensor data is generated at the edge by users on their devices, fleets of GPS-instrumented vehicles, and servers in data centers. This data is then streamed to the cloud, where a central processing system synthesizes KPIs. However, if the insights stopped here, with a downsloping graph on the dashboard of a business analyst, their value would be limited.
One of the hallmarks of operational intelligence is the speed of feedback to the edge. In the real-life examples above, Netflix and Lyft aren’t waiting hours or days to make adjustments; they are acting within seconds to minutes. As operational intelligence systems mature, human analysts are excised from these decision loops, and the course-corrections are automated, powered by machine-learning algorithms.
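As a hedged sketch of what removing the analyst from the loop can look like, the alert in the monitor above becomes a call to an internal action service instead of a print statement. The endpoint, payload, and service are purely illustrative; the point is that detection and correction share one event-driven loop, so the fix lands within seconds of the signal.

```python
# Illustrative only: closing the decision loop automatically by calling a
# hypothetical internal dispatch API instead of alerting a human analyst.
import json
import urllib.request

def redirect_drivers(block_id: str, extra_drivers: int) -> None:
    """POST a supply-rebalancing action for one city block."""
    payload = json.dumps({"block_id": block_id, "add": extra_drivers}).encode()
    request = urllib.request.Request(
        "http://dispatch.internal/v1/rebalance",  # hypothetical endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)

# In the monitor above, the alert branch becomes:
#     redirect_drivers(block, extra_drivers=5)
```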
To support these accelerated decision cycles, businesses need to transform their technology stacks.
How operational and business intelligence are different
Thinking, fast and slow
The distinction between operational and business intelligence is analogous to the distinction between fast and slow thinking, as characterized by psychologist Daniel Kahneman in his book Thinking, Fast and Slow: one system operates quickly and automatically for simple decisions, while the other leverages slow and effortful deliberation for complex decisions. In the spirit of Kahneman’s own style, we share a few examples below that illustrate different use cases for operational and business intelligence.
| Operational intelligence | Business intelligence |
|---|---|
| As a Lyft marketplace optimization algorithm, should I direct more drivers to Logan airport to increase conversions? | As Lyft’s head of HR, should we hire more drivers in Boston this month, to match rising demand there? |
| As a Netflix DevOps engineer, should I roll back this morning’s software release for tablets running Android version 10.0.0_r52, so we can reduce crash rates? | As a Netflix product manager, should we delay the Android feature launch a few days for further testing? |
| As a Pinterest campaign manager, should I increase our advertising budget by 5% in the next hour on YouTube? | As a director of marketing, how should we allocate advertising budget across YouTube and Instagram next quarter? |
Ultimately, the outputs of both operational and business intelligence are decisions. Operational intelligence fuels fast, frequent decisions by hands-on operators on real-time and near-real-time data. Business intelligence drives complex decisions by managers, made daily or weekly on fairly complete data sets. We further unpack some of the distinguishing features below.
| | Operational intelligence | Business intelligence |
|---|---|---|
| **Decision features** | | |
| Frequency | Intraday, continuous | Daily, weekly, quarterly |
| Scope | Narrow, localized | Broad, macro |
| Effort | Low | High |
| **Decision makers** | | |
| Organization roles | Operators | Managers |
| Process | Decentralized | Centralized |
| Algorithmic decisions | Frequent | Rare |
| **Data inputs** | | |
| Sources | Raw event streams | Data lakes |
| Freshness | Up to real-time | Updated hourly or daily |
| History retained | Weeks to months | Years to forever |
| **Tools** | | |
| Database engine | OLAP databases | Cloud warehouses |
| Speed of queries | Sub-second to seconds | Minutes to hours |
| Dashboard type | Ad hoc, interactive | Structured, limited interactivity |
Managers can be operators too, albeit at a much higher level. Some of the biggest consumers of operational intelligence tools are CEOs, who track and react to intraday global KPIs for their businesses. Operational KPIs often follow an organizational hierarchy, and at every level — from a district manager to a field service provider — there are component KPIs that require continuous monitoring and decision-making.
Why operational intelligence requires new technologies
Operational intelligence provides a set of decision-making capabilities that are complementary to business intelligence. But its unique performance requirements also demand a distinct technology stack, one that complements and sits adjacent to existing business intelligence stacks.
Analytics technology stacks can be thought of as data flowing into a three-layered cake consisting of ETL, databases, and applications. The requirements for an operational intelligence stack are that it supports:
- high speed of data movement from ETL to the application layer
- high-frequency, low-latency queries between the database and application layers

In the diagram below we illustrate two common examples of technologies used in operational and business intelligence stacks.
On the left-hand side we see an operational intelligence stack. The high-speed requirements of operational intelligence are met by ensuring the data stays in flight and never hits disk. This is made possible with a combination of Kafka, a stream processing engine like Flink, and Apache Druid. This is not achievable with the data lake architecture shown on the right-hand side of our diagram, where data lands in the cloud data lake (S3), is transformed by Spark batch processing jobs, and is finally loaded into a data warehouse.
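As one hedged illustration of how the left-hand path stays in flight, the sketch below registers a Kafka ingestion supervisor with Druid’s Overlord API, so events stream from the topic directly into queryable segments with no batch load. The topic, datasource, and column names carry over from the hypothetical monitor above, and the spec is trimmed to the essentials; a production spec would also configure task counts, retention, and rollup.

```python
# Sketch: submitting a Kafka ingestion supervisor spec to Druid's Overlord,
# wiring the hypothetical ride-requests topic into a Druid datasource.
import json
import urllib.request

spec = {
    "type": "kafka",
    "spec": {
        "ioConfig": {
            "type": "kafka",
            "consumerProperties": {"bootstrap.servers": "localhost:9092"},
            "topic": "ride-requests",
            "inputFormat": {"type": "json"},
        },
        "dataSchema": {
            "dataSource": "ride_requests",
            "timestampSpec": {"column": "ts", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["city", "block_id", "converted"]},
        },
    },
}

request = urllib.request.Request(
    "http://localhost:8081/druid/indexer/v1/supervisor",  # Overlord's default port
    data=json.dumps(spec).encode(),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(request).read())
```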
Regarding the requirement of high-frequency, low-latency queries at the application layer, data warehouses are simply not optimized for these access patterns. In the best-case scenario, a data warehouse is able to achieve low-latency queries, but at a significantly higher cost per query. Warehouses like Snowflake optimize for low-cost storage, keeping data on disk, while OLAP data stores like Apache Druid optimize for low-cost queries, keeping data mostly in memory.
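For a feel of the access pattern at issue, here is the kind of narrow, time-bounded aggregation an operational dashboard might issue many times a minute against Druid’s SQL endpoint, using the hypothetical datasource from the sketches above. On data laid out for this pattern, such queries routinely return in well under a second.

```python
# Sketch: a high-frequency, low-latency dashboard query against Druid SQL.
import json
import urllib.request

sql = """
SELECT
  block_id,
  AVG(CASE WHEN converted = 'true' THEN 1.0 ELSE 0.0 END) AS conversion_rate
FROM ride_requests
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '5' MINUTE
GROUP BY block_id
ORDER BY conversion_rate ASC
LIMIT 20
"""

request = urllib.request.Request(
    "http://localhost:8082/druid/v2/sql",  # Broker's SQL endpoint
    data=json.dumps({"query": sql}).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.loads(urllib.request.urlopen(request).read()))
```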
Retrofitting an existing business intelligence stack for operational intelligence capabilities will deliver lower performance at a higher cost; while these costs may not be apparent at small scale, at large scale they can become unsustainable.
Synergies between operational and business intelligence
While the illustration above shows operational and business intelligence as parallel, independent technology stacks, the reality is that the most successful data architectures intertwine the two. Traditional business intelligence applications such as Looker and Tableau can benefit from the performance gains offered by the data stores favored by operational intelligence stacks. Similarly, operational intelligence stacks can benefit from the cost efficiencies of batch processing, particularly when doing historical restatements.
As data infrastructure technologies evolve, it’s uncertain whether a single stack will emerge to serve both operational and business intelligence use cases, or whether these stacks will diverge further. In the present, however, the experiences of the world’s leading digital companies indicate that the era of operational intelligence has dawned and will shine for many years to come.