Streaming Data and Stream Processing

Challenges Associated With Streaming Data

Lesson #3

In this lesson we will:

  • Discuss the challenges in building streaming data solutions.

Why Is Working With Streaming Data Difficult?

As we discussed in previous lessons, moving from traditional batch approaches towards real-time data streaming solutions is a challenging undertaking.

In this lesson, we will explain these challenges in more detail.

Scalability

Streaming platforms need to process and analyse high volumes of event data. A single stream can carry a high volume of events, and there are likely to be multiple streams all generating data in parallel. An enterprise stream processing platform is therefore likely to need a very high degree of scalability to handle the volumes of data in flight and at rest.

Variance

The volume of events in a stream can rise and fall over time, and may spike during peak hours. Streaming platforms therefore need the capability to scale up and down dynamically to accommodate these changing workloads.

Latency

In streaming scenarios, businesses often benefit from responding to their event streams in real time. We therefore need to ingest, process and respond to the streams of events with low latency in order to extract maximum value from the data.

Exactly Once Processing

When working with event streams it is important never to lose a message, and never to send or process a message twice. We therefore need to build solutions with a high degree of reliability in how messages are processed, even if a component in the stack were to fail.
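
As a concrete, simplified illustration, the sketch below uses Kafka's transactional API via the confluent-kafka Python client so that the output record and the consumer offsets are committed atomically. The broker address, topic names, group id and transactional id are placeholder assumptions for illustration; other streaming platforms offer equivalent mechanisms.

```python
# A consume-transform-produce loop using Kafka transactions via the
# confluent-kafka Python client. Broker address, topics, group id and
# transactional id are placeholders for illustration.
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "fraud-checker",           # hypothetical consumer group
    "enable.auto.commit": False,           # offsets are committed inside the transaction
    "isolation.level": "read_committed",   # only read messages from committed transactions
})
producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "fraud-checker-1", # enables idempotent, transactional writes
})

consumer.subscribe(["payments"])           # hypothetical input topic
producer.init_transactions()

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    producer.begin_transaction()
    try:
        # The output record and the consumer's offsets commit atomically, so the
        # message is neither lost nor processed twice if this process crashes.
        producer.produce("flagged-payments", msg.value())
        producer.send_offsets_to_transaction(
            consumer.position(consumer.assignment()),
            consumer.consumer_group_metadata(),
        )
        producer.commit_transaction()
    except Exception:
        # A fuller implementation would also rewind the consumer to the last
        # committed offsets before retrying.
        producer.abort_transaction()
```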

Stateful Processing

It is relatively simple to develop stateless processors which do things such as filter, route, or enrich events. However, the complexity grows when we want to look for historical patterns such as “3 failed credit card transactions in the last hour.” To do this, we need to process each event in the context of the events that came before it, which means maintaining state and adds significant complexity to the stack.
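
As a minimal, single-process sketch of the state involved, the example below implements the "3 failed transactions in the last hour" rule in plain Python; the event fields and the exact threshold are assumptions for illustration.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600   # "the last hour"
FAILURE_THRESHOLD = 3   # "3 failed credit card transactions"

# State that must be kept between events (and, in a real system, across restarts).
failures_by_card = defaultdict(deque)

def process(event):
    """Return True if this event takes its card over the failure threshold."""
    if event["status"] != "FAILED":
        return False
    window = failures_by_card[event["card_id"]]
    window.append(event["timestamp"])
    # Evict failures that fall outside the rolling one-hour window.
    while window and event["timestamp"] - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window) >= FAILURE_THRESHOLD
```

A production stream processor has to keep equivalent state in a partitioned, fault-tolerant store so that it survives failures and can be redistributed across workers, which is where much of the added complexity lies.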

Time Semantics

The notion of time becomes complex in event processing. Do we care about the time the event happened, the time it was received by the processor, or the time it was stored in the database? In most scenarios, event time is the natural choice, but we then need correct time semantics to ensure that, when we come to process the event, we are working with the state of the world as it was at that time.
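
The distinction is easier to see in a small sketch. The example below counts events into one-minute windows twice, once by processing time and once by event time with a simple watermark-style lateness bound; the field names and the lateness value are assumptions for illustration.

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60
ALLOWED_LATENESS = 30  # seconds of event-time lateness we still accept

event_time_counts = defaultdict(int)       # window start (event time) -> count
processing_time_counts = defaultdict(int)  # window start (processing time) -> count
max_event_time_seen = 0.0                  # a very simple watermark

def window_start(ts):
    # Align a timestamp to the start of its one-minute tumbling window.
    return ts - (ts % WINDOW_SECONDS)

def handle(event):
    global max_event_time_seen
    max_event_time_seen = max(max_event_time_seen, event["event_time"])
    # Processing time: whichever window the event happens to arrive in.
    processing_time_counts[window_start(time.time())] += 1
    # Event time: the window in which the event actually occurred. The
    # watermark decides when a window is too old for late events to update it.
    if event["event_time"] >= max_event_time_seen - ALLOWED_LATENESS:
        event_time_counts[window_start(event["event_time"])] += 1
```

Under event time, a late-arriving event is still counted in the window in which it actually occurred; under processing time, it lands in whichever window it happened to arrive.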

Security

It is important to maintain complete security around personally identifiable and commercially sensitive data. We need to encrypt data both in flight and at rest as it moves through the various message queues and processors. This repeated encryption and decryption has an impact on latency and on the operational management of the system.

Next Lesson:

Key Technologies In Streaming Data

In the next lesson, we will learn about some of the key technologies, tools and platforms that can be used to process and extract value from streaming data.



