Imagine we have a database table of customer details which sits behind a typical web application:
| Name      | Address       | Gender | Lifetime Spend |
| --------- | ------------- | ------ | -------------- |
| Ben Davis | 1 Low Street  | Male   | 2312323        |
| Sue Jones | 2 High Street | Female | 34333          |
Most screens in most business applications make direct changes to this table, for instance by overwriting the data with updates or deleting rows.
When you click Save on a web form, perhaps some SQL is executed on the database such as `UPDATE customers SET name = 'Sue Wilson' WHERE id = 2`. The customer's old name, "Sue Jones", is essentially lost to the depths of the database transaction logs and your application log files.
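To make the loss concrete, here is a minimal sketch using Python with an in-memory SQLite database; the `customers` table and its columns are illustrative assumptions rather than a real schema.

```python
import sqlite3

# In-memory SQLite database standing in for the application's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.execute("INSERT INTO customers (id, name, address) VALUES (2, 'Sue Jones', '2 High Street')")

# The typical "Save" behaviour: overwrite the row in place.
conn.execute("UPDATE customers SET name = 'Sue Wilson' WHERE id = 2")
conn.commit()

# Only the current state remains; nothing records that the customer was ever called Sue Jones.
print(conn.execute("SELECT name FROM customers WHERE id = 2").fetchone())  # ('Sue Wilson',)
```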
Statements like this run trillions of times per day across the world, and each time a great deal of useful information is lost.
This information is valuable for many operational reasons, but especially for audit and compliance, where we need to know exactly what data changed, when, why, and by whom.
Where this type of information clearly needs to be maintained, it is often added explicitly using techniques such as a Type 4 Slowly Changing Dimension, which maintains a separate history table with timestamps. Information such as the username of the operator who entered the new information can also be logged there.
| Name       | Address       | Gender | Lifetime Spend | Updated Timestamp    | Updated By     |
| ---------- | ------------- | ------ | -------------- | -------------------- | -------------- |
| Sue Jones  | 2 High Street | Female | 34333          | 23 Jun 2022 23:21:23 | Emma Smith     |
| Sue Wilson | 2 High Street | Female | 34333          | 29 Jun 2022 22:11:23 | Tom Richardson |
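One way to implement this is to write a row into the history table alongside every update, as in the following sketch. It again uses Python with SQLite, and the table and column names are hypothetical; a real system would also record a history row when the customer is first created.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY, name TEXT, address TEXT,
        gender TEXT, lifetime_spend INTEGER
    );
    -- Type 4 SCD: a separate table holding every version of the row.
    CREATE TABLE customers_history (
        customer_id INTEGER, name TEXT, address TEXT, gender TEXT,
        lifetime_spend INTEGER, updated_at TEXT, updated_by TEXT
    );
    INSERT INTO customers VALUES (2, 'Sue Jones', '2 High Street', 'Female', 34333);
""")

def update_customer_name(customer_id: int, new_name: str, updated_by: str) -> None:
    """Apply the change and record the new version in the history table."""
    with conn:  # single transaction: both statements succeed or neither does
        conn.execute(
            "UPDATE customers SET name = ? WHERE id = ?", (new_name, customer_id)
        )
        conn.execute(
            """INSERT INTO customers_history
               SELECT id, name, address, gender, lifetime_spend, ?, ?
               FROM customers WHERE id = ?""",
            (datetime.now(timezone.utc).isoformat(), updated_by, customer_id),
        )

update_customer_name(2, "Sue Wilson", "Tom Richardson")
print(conn.execute("SELECT name, updated_by FROM customers_history").fetchall())
```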
The problem is that this requires recognising up front that we need to maintain history and then explicitly coding for it in the application. Slowly Changing Dimensions are not an inherent property of our architecture or database, so by default we lose data on every transaction, which is a compliance nightmare.
Often, we will attempt to fix this within a centralised Data Warehouse by implementing Slowly Changing Dimensions there. This avoids the need to modify legacy applications to maintain the necessary data and logs. It is not an infallible solution though, especially when we are using batch extract, transform and load (ETL). For instance, we could miss updates which happened between batches, and may not have metadata such as which application user updated the value and when.
Event Driven Architectures consist of loosely coupled services that respond to real world events. In the above example, an event could be Name Changed, and all of the services which need to respond to that event will subscribe to it and respond accordingly, updating state and triggering necessary actions. If this were a bank for instance, we might wish to save the new name and re-issue the debit card in response to the event.
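The following is a minimal in-process sketch of this pattern; in practice a broker such as Kafka would carry the events between services, and the event name, fields and handlers here are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, DefaultDict

@dataclass(frozen=True)
class NameChanged:
    customer_id: int
    old_name: str
    new_name: str
    changed_by: str
    changed_at: str

# A tiny in-process event bus; messaging infrastructure plays this role in real systems.
subscribers: DefaultDict[type, list] = defaultdict(list)

def subscribe(event_type: type, handler: Callable) -> None:
    subscribers[event_type].append(handler)

def publish(event) -> None:
    for handler in subscribers[type(event)]:
        handler(event)

# Each service reacts to the event in its own way.
def update_customer_record(event: NameChanged) -> None:
    print(f"Saving new name {event.new_name!r} for customer {event.customer_id}")

def reissue_debit_card(event: NameChanged) -> None:
    print(f"Re-issuing debit card in the name {event.new_name!r}")

subscribe(NameChanged, update_customer_record)
subscribe(NameChanged, reissue_debit_card)

publish(NameChanged(
    customer_id=2, old_name="Sue Jones", new_name="Sue Wilson",
    changed_by="Tom Richardson",
    changed_at=datetime.now(timezone.utc).isoformat(),
))
```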
From an audit and compliance perspective, we can simply log these events to build a very rich history of how our datasets are changing and how we are responding to them. This logging can happen at multiple points: at the source applications, at the destination applications, and even as the events pass through messaging infrastructure such as Kafka.
We can also use these events for Data and Analytics purposes. For instance, when an event is generated, we could record it and all of its metadata in a centralised Data Lake or Data Warehouse for subsequent processing. We can then analyse how our datasets are changing over time using the exact event streams which our applications are responding to. We can also use Data and Analytics initiatives to analyse this event stream for compliance purposes and to audit breaches.
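As a sketch of what that recording step might look like, the function below appends each event, with some metadata, to an append-only JSON-lines file; the file is a stand-in assumption for real lake or warehouse storage, and the event shape is the same hypothetical one used above.

```python
import json
from datetime import datetime, timezone

def record_event(event_type: str, payload: dict, path: str = "event_log.jsonl") -> None:
    """Append one event as a JSON line to an append-only log that a lake or warehouse can ingest."""
    record = {
        "event_type": event_type,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        **payload,
    }
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")

record_event("NameChanged", {
    "customer_id": 2,
    "old_name": "Sue Jones",
    "new_name": "Sue Wilson",
    "changed_by": "Tom Richardson",
})
```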
To conclude, moving from batch data exchange to exchanging events gives us a rich event stream that describes how the state of our enterprise data is changing over time.