Due to the volume, velocity and variety of data, many organisations at present have little choice but to ignore or rapidly dispose of large quantities of potentially valuable information. Indeed, if we think of an organisation as a creature that processes data, then most are rather primitive forms of life: their sensors and IT systems are not up to the job of scanning and interpreting the vast ocean of data in which they swim. As a consequence, most of the data that surrounds an organisation today is ignored. A large proportion of the data these organisations gather is never processed, and a significant quantity of useful information passes through them as data exhaust. This inability to capture, interpret and store the relevant data flowing both inside and outside the organisation is the problem that big data technologies promise to address.
In this post, we briefly discuss fast data: a technology architecture derived from the big data open source landscape, what it is, and what it promises.
Evolution of Data Operations to Big Data
It can be difficult to determine when you have crossed the unclear boundary between normal data operations and the realm of big data, particularly since the definition of big data is often in the eye of the beholder.
Before big data technologies, businesses could only process data volumes in the gigabytes and terabytes familiar to traditional data warehouses, and analytics was often kept separate from operations because of hardware limitations. This led to high costs from a hardware perspective, as well as limits on the types of applications that could be built. These applications were either analytical or operational in nature, and there was rarely a combined workflow between analytics and the business processes it should drive.
Today, open source technologies that have emerged from the Apache
software projects are now being used to efficiently process petabytes
upon petabytes of data using commodity and virtualized hardware. This
has given rise to the field of big data.
When operations and analytics are managed within a single data platform, you gain cost benefits from a data governance and security perspective, as well as a reduction in the amount of commodity hardware and the number of software applications required.
Fast Data, a Derivative of Big Data
Fast data, a derivative of big data, provides new opportunities to better manage the increasing speed and volume of real-time event data (i.e. events occurring thousands to tens of thousands of times per second) generated by mobile applications, field sensors and video images. Applications developed with this new technology are referred to as fast data applications. They take many forms, from streaming ETL (extract, transform, and load) workloads, to crunching data for online dashboards, to estimating the likelihood of an equipment failure in machine learning-driven predictive maintenance. In short, fast data applications can take the never-ending streams of data from people's interactions with devices and process them in real time to make better decisions and enhance the user experience.
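To make the streaming ETL idea concrete, here is a minimal sketch in plain Python (the post does not prescribe a language, and the field names such as `sensor_id` and the 100.0 threshold are purely illustrative): events are extracted from a source, transformed one at a time, and loaded into a sink as they arrive, rather than accumulated into batches.

```python
# A toy streaming-ETL pipeline: extract, transform and load one event at a time.

def extract(source):
    """Yield raw events one at a time, as they arrive from the source."""
    for raw in source:
        yield raw

def transform(events):
    """Clean and enrich each event; drop malformed readings on the spot."""
    for event in events:
        if "sensor_id" in event and "value" in event:
            value = float(event["value"])
            yield {
                "sensor_id": event["sensor_id"],
                "value": value,
                "alert": value > 100.0,  # flag anomalous readings immediately
            }

def load(events, sink):
    """Append each processed event to the sink as soon as it is ready."""
    for event in events:
        sink.append(event)

source = [
    {"sensor_id": "pump-1", "value": "42.5"},
    {"sensor_id": "pump-2", "value": "130.0"},
    {"value": "7.0"},  # malformed: no sensor_id, silently dropped
]
sink = []
load(transform(extract(source)), sink)
print(sink)
```

Because the stages are generators, each event flows through the whole pipeline before the next one is read, which is the essential difference from a batch ETL job.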
The data warehouse, by contrast, is a way of looking through historical data to understand the past and predict the future. Acting on data as it arrives has long been thought of as costly and impractical, if not impossible, especially on commodity hardware. Just as with big data, the value in fast data is being unlocked by reimagined data streaming, processing and storage systems, such as the open source Kafka and Spark projects, and by reimagined NoSQL databases such as Cassandra.
Capturing Value in Fast Data
The best way to capture the value of incoming data is to react to it as soon as it arrives, rather than processing it in batches. Processing in batches means lost time, and with it, lost value.
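The difference can be sketched in a few lines of Python (a toy illustration; the threshold and readings are made up): a batch job can only decide once the whole window has arrived, while a streaming consumer acts on the very reading that crosses the threshold.

```python
readings = [40.0, 55.0, 120.0, 60.0, 75.0]

def first_alert_batch(batch, threshold=100.0):
    """Batch style: wait for the whole window, then scan it.
    The decision is only available after all len(batch) readings arrived."""
    hits = [v for v in batch if v > threshold]
    return (len(batch), hits[0] if hits else None)

def first_alert_streaming(stream, threshold=100.0):
    """Streaming style: decide on each reading as it arrives."""
    seen = 0
    for value in stream:
        seen += 1
        if value > threshold:
            return (seen, value)  # acted after `seen` readings, not the full window
    return (seen, None)

print(first_alert_batch(readings))      # (5, 120.0)
print(first_alert_streaming(readings))  # (3, 120.0)
```

Same data, same answer, but the streaming consumer reacts two readings (and, in a real system, potentially seconds or minutes) earlier.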
According to an eBook “Designing Fast Data Application Architecture” from our fast data technology partner Mesosphere, data-intensive applications that aim to continuously process and extract insights from data as it flows into the system need five main technologies:
• First, a streaming engine capable of delivering events as fast as they come in (i.e. tens of thousands to millions of events per second). Apache Kafka is the leading project in this area: it delivers a publish/subscribe model that guarantees durability, resilience, fault tolerance, and the ability to replay event messages by different data consumers.
• Second, a rules engine: the place where business logic is applied to the data. Choosing the right engine is driven by the application's requirements, with throughput and latency as the key discriminators.
• Third, a data store capable of processing each item as fast as it arrives. The choice of storage system is usually bound to each application and driven by the write and read patterns that application presents.
• Fourth, a data services engine that makes sense of the data and categorises it into specific topics. For example, data collected from a plant operation could be classified as pressure-, temperature- or vibration-related, enabling seamless consumption by fast data applications via HTTP/REST endpoints.
• Fifth, a resource management engine that provides and manages the compute and storage resources required by the data-intensive applications running as “containers”, while ensuring that the data is secure and that the applications perform resiliently. These containers can be orchestrated by cluster managers to ensure that the applications they contain get the resources they require, are restarted in case of failure, or are relocated when the underlying hardware fails.
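How the first four pieces fit together can be sketched with a toy, in-process stand-in for each component (plain Python; `MiniBus`, `pressure_rule` and the 8.0 threshold are invented for illustration and stand in for real systems such as Kafka, a rules engine and Cassandra):

```python
from collections import defaultdict

class MiniBus:
    """Toy publish/subscribe bus standing in for a streaming engine like Kafka."""
    def __init__(self):
        self.topics = defaultdict(list)       # retained log per topic (replayable)
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        self.topics[topic].append(event)      # durability: every event is retained
        for handler in self.subscribers[topic]:
            handler(event)                    # delivered as soon as it arrives

def pressure_rule(event, store):
    """Rules engine: apply business logic, then write to the data store."""
    store.append(dict(event, alert=event["value"] > 8.0))

bus = MiniBus()
store = []                                    # stand-in for the data store

def route(event):
    """Data services step: route raw plant readings to per-measurement topics."""
    bus.publish(event["kind"], event)         # e.g. "pressure", "temperature"

bus.subscribe("pressure", lambda e: pressure_rule(e, store))

for reading in [
    {"kind": "pressure", "value": 9.2},
    {"kind": "temperature", "value": 70.1},   # retained on the bus, no rule yet
    {"kind": "pressure", "value": 3.4},
]:
    route(reading)

print(store)
```

In a real deployment each component would be a separate, independently scalable system, and the resource management engine (the fifth technology) would schedule them as containers across a cluster.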
Navigating the big data landscape can be challenging, as some options are more obvious than others. The challenge for the software architects within your IT department is to match their application and business requirements with the range of options available, so as to make the right decision at every layer of the architecture.
Fast data architectures define the set of integrated components that provide the building blocks to create, deploy, and operate scalable, performant and resilient applications around the clock. A successful implementation of a fast data architecture gives the business the ability to develop, deploy, and operate applications that deliver real-time insights and immediate actions, increasing its competitive advantage and its agility in reacting to specific market challenges.
Introducing Data Engineering
Are you looking for a partner to introduce fast data applications into your business? Data Engineering is a digital services partner that can rapidly prototype and develop data-intensive solutions for you, delivering a significant return on investment (ROI).
Sign up for a free consultation at www.dataengineering.com.au to learn more.
Principal Consultant – Intelligent Automation