Things to consider before adopting Event Sourcing

Event Sourcing is a pattern in which the application stores every change to the system as an event in an event store. This series of events is then replayed in order to arrive at the current application state.
As an optimization, the state can be stored periodically by running through the events; this stored state is called a snapshot. The application can then derive its state faster by applying only the events that occurred after the snapshot to the snapshot state, instead of deriving it from the start.
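To make the idea concrete, here is a minimal sketch in plain Java (all names are illustrative, not from any particular framework) of rebuilding the current state from a snapshot plus the events recorded after it:

```java
import java.util.List;

// Minimal sketch: rebuild current state from a snapshot plus the events
// that occurred after it. Names and types are illustrative only.
public class SnapshotReplayExample {

    record Deposited(long amount) {}

    // A snapshot is simply the state captured at some point in the event stream.
    record AccountSnapshot(long balance, long lastEventNumber) {}

    static long replay(AccountSnapshot snapshot, List<Deposited> eventsAfterSnapshot) {
        long balance = snapshot.balance();          // start from the snapshot, not from zero
        for (Deposited e : eventsAfterSnapshot) {
            balance += e.amount();                  // apply only the newer events
        }
        return balance;
    }

    public static void main(String[] args) {
        AccountSnapshot snapshot = new AccountSnapshot(500, 42);
        List<Deposited> newer = List.of(new Deposited(100), new Deposited(-30));
        System.out.println(replay(snapshot, newer)); // 570
    }
}
```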
An example:
RDBMS: All relational databases use transaction logs to store changes. The current state (what event sourcing calls a snapshot) is present in the tables. It is also common to use “log shipping”, where the database ships transaction logs from one database to another, and the data is restored at the other end using these transaction logs.
There is a lot to “Event Sourcing”, and I do not plan to cover all of it in this post. This post assumes some basic understanding of event sourcing and focuses on challenges and some techniques to address those challenges.
Event sourcing is a useful technique, especially when applied together with CQRS and DDD.
[CQRS – Command Query Responsibility Segregation]
[DDD – Domain Driven Design]
However, it is hard to get it right and to make everyone in your team understand it well enough to do it correctly.
Below are some of the challenges I have come across while understanding and designing an application using “Event Sourcing”. Please note there may be other challenges and several other ways to handle them, but I only try to highlight the challenges that I faced and the measures that I took to tackle them.

Things to ponder before adopting Event Sourcing

GDPR/Security

In theory, in event sourcing, “events” are never deleted, only appended. Hence, if the application handles customer data, customers can ask to have this data deleted, often called “Forget Me” (the right to erasure) in GDPR.

This can become a bit tricky with event sourcing. It is also said that in some cases the system needs to delete all traces of customer data (including backed-up data on drives & tapes). The complexity of deletion may differ from application to application, but in general, it is easier if the data is in a database.

I am not a legal expert, and the level of complexity may vary with the domain. With a database, even if the application does not offer a direct way to delete the data, it is possible to run a SQL query to delete it. With event sourcing, deleting the data goes directly against the definition of the pattern; remember, event sourcing is “append only” and past events cannot be changed.

Ways to handle it

  1. If we take regular snapshots and rely on the event store only for the last few days of events, it may help handle certain cases. Even if the delete is not immediate, it is still possible to delete customer data on request; the constraint is that the data can only be removed after a certain number of days have passed. The snapshot state then becomes the starting state of the system, and the system state can be recovered by replaying events only from this snapshot “starting state”.
  2. Redaction is one way to handle this if we use Apache Kafka. Kafka supports a concept called “log compaction”: when compaction runs, older records are replaced by the latest record with a matching key. This way we can replace the customer details with a dummy value in the event store, as the sketch below shows.

You may read more on log compaction here, https://kafka.apache.org/documentation/#compaction
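
As a rough illustration of the redaction idea, the sketch below creates a compacted topic keyed by customer id and then publishes a record with the same key but dummy values; once compaction runs, only this latest record survives for that key. The topic name, key and broker address are my own assumptions for illustration:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.config.TopicConfig;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch of redacting customer data on a compacted Kafka topic.
public class RedactCustomerData {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        // Create (or imagine an existing) compacted topic keyed by customer id.
        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("customer-events", 1, (short) 1)
                .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            // Publish a record with the same key (customer id) but redacted values.
            // Once compaction runs, this becomes the only record retained for that key.
            producer.send(new ProducerRecord<>("customer-events", "customer-123",
                    "{\"name\":\"REDACTED\",\"email\":\"REDACTED\"}")).get();
        }
    }
}
```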

Schema Change

The data model changes as the business changes and as our understanding of the domain matures. A traditional application, where we store the last state of the application in the DB, has a mature set of tools and well-known practices to follow; processes like configuration changes, database schema changes and many others have a good set of tools, guidelines and practices behind them. In the same way, the event schema can change from version to version. Most event sourcing implementations work on top of a pub/sub topic or a log-based streaming platform like Kafka. One main advantage of this model is that producers do not have to know about the consumers of the events.

This helps a lot, as new consumers can come and go at any time without affecting the producer. The flip side is that the producer also cannot know which event schema versions its consumers expect, so schema changes need care.

Ways to handle it

  1. It depends on the published message type (JSON, XML, protocol buffers, or serialized objects). If it is JSON or XML, some of the techniques used in API/service versioning can be applied here; see the sketch after this list.
  2. Avro is the tool most recommended by many, but I am yet to use it, hence I will not comment on it.
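
For point 1, one common versioning technique is “upcasting”: a consumer normalises older event versions into the latest shape before applying them, much like versioned API payloads. Below is a rough sketch using Jackson; the field names, version numbers and conversion rule are hypothetical:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

// Illustrative sketch of "upcasting" JSON events to the latest schema version.
public class EventUpcaster {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    static ObjectNode upcast(ObjectNode event) {
        int version = event.path("schemaVersion").asInt(1);
        if (version == 1) {
            // Hypothetical rule: v1 stored a single "name" field; v2 splits it.
            String[] parts = event.path("name").asText("").split(" ", 2);
            event.put("firstName", parts[0]);
            event.put("lastName", parts.length > 1 ? parts[1] : "");
            event.remove("name");
            event.put("schemaVersion", 2);
        }
        return event; // already at the latest version, or upcasted to it
    }

    public static void main(String[] args) throws Exception {
        ObjectNode v1 = (ObjectNode) MAPPER.readTree(
            "{\"schemaVersion\":1,\"name\":\"Jane Doe\",\"amount\":100}");
        System.out.println(upcast(v1));
    }
}
```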

Development understanding & maturity

As with any design, event sourcing must have commitment from the development team. It is difficult for developers and new joiners to understand and code correctly. Developers used to imperative coding find it even more difficult, as event sourcing is mostly functional in style. Getting everyone to understand it may take more time than you think.

Ways to handle it

  1. If event sourcing is for an existing application, then it is better to try it out on one part of a subdomain before applying it everywhere. This provides an opportunity to learn & to do it correctly for the next one.
  2. If the plan is to use event sourcing for an existing application, then have a person who has already done it lead from the front. More likely than not, you may end up hiring a consultant rather than a full-time employee.

Operations

There are many tools and techniques for RDBMSs. In some cases, even business people find it easier to query the database directly and pull a report to find out what is happening in the system. In addition, there is a rich set of tools available for activities such as monitoring, management, backups, reporting & transformation. Although the event sourcing concept has been around for a while, it has only been gaining traction in recent years. Also, event sourcing is a concept; an RDBMS, an event store, or a log/event streaming platform like “Apache Kafka” still has to be used to achieve event sourcing.

Ways to handle it

In the case of Kafka, there are monitoring and management tools that can watch a topic and report details about it. This makes it possible to find slower consumers (projections and reactors). This information may also be used to dynamically scale projections and reactors up or down for processing the events.
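
For example, with the plain Kafka clients it is possible to compare a consumer group’s committed offsets against the latest offsets on the topic to see how far behind a projection or reactor is. The group id and broker address below are assumptions for illustration:

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

// Sketch: report how far behind a projection/reactor consumer group is (its lag).
public class ConsumerLagCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Offsets the consumer group has committed so far.
            Map<TopicPartition, OffsetAndMetadata> committed =
                admin.listConsumerGroupOffsets("order-projection")
                     .partitionsToOffsetAndMetadata().get();

            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "localhost:9092");
            consumerProps.put("group.id", "lag-check");
            consumerProps.put("key.deserializer", ByteArrayDeserializer.class.getName());
            consumerProps.put("value.deserializer", ByteArrayDeserializer.class.getName());

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps)) {
                // Latest offsets on the broker for the same partitions.
                Map<TopicPartition, Long> latest = consumer.endOffsets(committed.keySet());
                committed.forEach((tp, offset) -> {
                    long lag = latest.get(tp) - offset.offset();
                    System.out.println(tp + " lag=" + lag);
                });
            }
        }
    }
}
```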

Beware of side effects of reactors & projections

The main test of event sourcing is that events can be replayed from the point the application last processed to arrive at the current state. Hence, different consumers (often referred to as projections or reactors) can process the data at different times. It is also possible that these projections or reactors raise further events. One of the advantages of an event-driven system (not just event sourcing) is that consumers and producers do not have to know about each other.

When replaying events, an application may raise further events, and the consumers of those events may need to handle them, or the application may need a flag to stop raising events until it reaches a certain state.

For example, assume “Service A” raises events for which “Service B” is a consumer. Similarly, “Service B” processes the events and may raise further events for which “Service C” is a consumer.

Say, for some reason, “Service B” has lost its state completely and replays all the events raised by “Service A” to arrive at its state. This replay may raise more events, which can cause issues in “Service C” if there is no id column or if “Service C” is not idempotent.

Ways to handle it

Projections are typically stateless, and hence replaying the messages may not have any side effects.

However, reactors maintain state, and hence reactors will have to keep track of the last processed event by primary key. This way, reactors can ignore replayed events as these have already been processed.
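
A minimal sketch of such an idempotent reactor, with illustrative names and an in-memory store standing in for durable storage, could look like this:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an idempotent reactor: it remembers the last event number it processed
// per aggregate, so replayed or duplicate events do not trigger the side effect twice.
public class IdempotentReactor {

    record OrderEvent(String aggregateId, long eventNumber, String payload) {}

    // In a real system this map would live in a durable store, not in memory.
    private final Map<String, Long> lastProcessed = new HashMap<>();

    void handle(OrderEvent event) {
        long last = lastProcessed.getOrDefault(event.aggregateId(), -1L);
        if (event.eventNumber() <= last) {
            return; // already processed during an earlier run or a replay, skip it
        }
        sendEmail(event);                                        // the side effect
        lastProcessed.put(event.aggregateId(), event.eventNumber());
    }

    private void sendEmail(OrderEvent event) {
        System.out.println("side effect for " + event.aggregateId()
            + " event #" + event.eventNumber());
    }

    public static void main(String[] args) {
        IdempotentReactor reactor = new IdempotentReactor();
        OrderEvent e = new OrderEvent("order-1", 1, "created");
        reactor.handle(e);
        reactor.handle(e); // replaying the same event produces no second side effect
    }
}
```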

Is eventual consistency ok for the domain?

A common example used to describe event sourcing in the real world is the “Bank Account Ledger”. We all know that banks keep a record of all debits and credits.

We also know that if we apply the debits and credits, in order, to the available balance before them, we will arrive at the current available balance. However, in my experience banks maintain the current state, i.e. the available balance, in a DB. If the customer reports an issue, then the bank will run through the credits and debits to arrive at the available balance.

When two requests come in at the same time to do a fund transfer on the same account, the data certainly has to be consistent, and the requests have to be processed in transactions, one after another.

At the back end it may be “eventually consistent”, but there are use cases where it has to be consistent. This can be tackled by making a “write command” synchronous and single-threaded, i.e. the response is not sent back until the application persists the event and derives the state. Using reactive programming, a process manager, and keeping the write command single-threaded will help address this. But in a distributed, clustered environment where multiple application servers are running behind a load balancer, this becomes difficult to handle. Single-threaded also means that a particular write command that handles an aggregate will have to run on a single server. In a clustered environment it is possible to make a write command single-threaded only within one instance of the application server; multiple requests may still reach different servers at the same time, so the write command cannot be made single-threaded across the cluster.

Ways to handle it

  1. There are tools that specialize in handling such scenarios in a clustered environment.
  2. Sharding is another approach, i.e. although there are multiple application servers handling requests, requests of the same action type are routed to the same application server; see the sketch after this list.
  3. Most applications do have databases, and databases are good at handling this kind of locking across transactions.
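
For the sharding approach (point 2 above), one way is to lean on Kafka’s key-based partitioning: every command carries the aggregate id as its record key, so all commands for the same aggregate land on the same partition and are handled by a single consumer instance in order. A rough sketch, with assumed topic name, key and broker address:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch: route all commands for one aggregate to one partition via the record key.
public class CommandRouter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String accountId = "account-42"; // the aggregate identity acts as the shard key
            producer.send(new ProducerRecord<>("fund-transfer-commands",
                    accountId, "{\"type\":\"TransferFunds\",\"amount\":100}"));
            producer.send(new ProducerRecord<>("fund-transfer-commands",
                    accountId, "{\"type\":\"TransferFunds\",\"amount\":250}"));
            // Both commands hash to the same partition, so they cannot be processed
            // concurrently by two different application servers.
        }
    }
}
```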

[Note: Alternatively, we raise the events and derive the state together before sending the response.]

Below are the steps that we follow:

  1. On a write command, after successful validation, create an event immediately.
  2. Compute the state using that event in memory for the source application.
  3. Persist the events to the event store and the state to the snapshot data store (DB).
  4. This computed state is immediately used for the next “Write Command” while the other reactors and projections use the persisted events.

In step 1, the state stored in the DB (the snapshot) is used for the validation.
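
A rough sketch of these steps, with storage simulated in memory and illustrative names (a real implementation would persist to the event store and the snapshot DB), could look like this:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the write-command flow described above.
public class WriteCommandFlow {

    record FundsDeposited(String accountId, long amount) {}
    record AccountState(String accountId, long balance) {}

    private final List<FundsDeposited> eventStore = new ArrayList<>(); // stand-in for the event store
    private AccountState snapshot = new AccountState("account-42", 0); // stand-in for the snapshot DB

    AccountState handleDeposit(long amount) {
        // 1. Validate the command against the current (snapshot) state,
        //    then create the event immediately.
        if (amount <= 0) throw new IllegalArgumentException("amount must be positive");
        FundsDeposited event = new FundsDeposited(snapshot.accountId(), amount);

        // 2. Compute the new state in memory by applying the event.
        AccountState newState = new AccountState(snapshot.accountId(),
                snapshot.balance() + event.amount());

        // 3. Persist the event and the derived state (snapshot).
        eventStore.add(event);
        snapshot = newState;

        // 4. The computed state is what the next write command validates against,
        //    while projections and reactors consume the persisted events asynchronously.
        return newState;
    }

    public static void main(String[] args) {
        WriteCommandFlow flow = new WriteCommandFlow();
        System.out.println(flow.handleDeposit(100).balance()); // 100
        System.out.println(flow.handleDeposit(250).balance()); // 350
    }
}
```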

I will keep this post updated as I learn more. Happy reading.

If you want to learn more about event sourcing, there are very good videos on YouTube.


Also published on Medium.