SuperPumpup (dot com)

General awesomeness may be found here.

12 January 2017

Protecting Long-running streams

Objective

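We have long-running entities, such as accounts whose streams accumulate a very large number of events, and we want to process commands against them with confidence that no piece of work is applied twice.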

Constraints

Canonical question

We have an account with 100k events (Hold, Capture, Release, Deposit) to process. Because messages in a distributed system can be delivered more than once, we will receive "duplicate" commands.

Solutions examined

Re-projecting all history to build state

Keeping 'rich' snapshots

Expiring old (vector) transactions

Expiring old (wall clock) transactions

Keeping transaction IDs in small, "external" streams

The problem here is subtle.

If the system crashes here:

When handler 1 then picks up event A again, it needs to check whether the account stream already has the event. However, the account stream has a million events, and scanning all of them for that one event is prohibitively expensive.
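To make that cost concrete, here is a minimal Python sketch of the naive check. It assumes the stream is an in-memory list of dicts and that events carry a hypothetical `hold_id` field. Because it must walk the stream from the beginning, the work grows with the total length of the stream, not with the amount of recent activity.

```python
def find_event_naive(stream, event_type, hold_id):
    """Scan the whole stream for an event of `event_type` with this `hold_id`.

    Returns (found, examined), where `examined` counts how many events were read.
    `stream` is assumed to be a list of dicts; `hold_id` is a hypothetical field.
    """
    examined = 0
    for event in stream:
        examined += 1
        if event.get("type") == event_type and event.get("hold_id") == hold_id:
            return True, examined
    return False, examined


# Usage: 100k deposits (one shared dict, to keep memory small), then the target event.
deposit = {"type": "Deposited", "amount": 10}
stream = [deposit] * 100_000 + [{"type": "Held", "hold_id": "h1"}]
print(find_event_naive(stream, "Held", "h1"))  # (True, 100001)
```

Every lookup touches every earlier event, which is exactly what becomes unaffordable at a million events.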

ES (Event Store) has a provision for an "Event ID": "A unique identifier representing this event. This is used internally for idempotency if you write the same event twice you should use the same identifier both times." So if you can generate the Event ID (a GUID) deterministically, you would actually be fine. However, that couples us more tightly to an ES implementation detail than we would prefer.
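One way to generate such an ID deterministically is a name-based (version 5) UUID. The sketch below is an assumption about how one might do it, not something the post prescribes: it hashes a hypothetical command ID together with the event type under a hypothetical application namespace, so a retried command yields the same Event ID both times.

```python
import uuid

# Hypothetical namespace for this application; any fixed UUID would work.
ACCOUNT_EVENTS_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/account-events")


def deterministic_event_id(command_id: str, event_type: str) -> uuid.UUID:
    """Derive a repeatable Event ID from the command that caused the event.

    The same (command_id, event_type) pair always produces the same UUID,
    so writing the event twice presents the same identifier both times.
    """
    return uuid.uuid5(ACCOUNT_EVENTS_NAMESPACE, f"{command_id}:{event_type}")


first = deterministic_event_id("cmd-123", "Held")
retry = deterministic_event_id("cmd-123", "Held")
print(first == retry)  # True
```

This only helps if the event store actually deduplicates writes on Event ID, which is precisely the implementation-detail coupling the post would rather avoid.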

But if we write the version number of the account projection into the "HoldRequested" event, then we know that the corresponding "Held" event, if it was written at all, must appear after that position. We can search only that tail of the stream to ensure that our work is not repeated.
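A minimal Python sketch of the bounded check follows. It assumes an in-memory list where an event's index is its position in the stream, and that the "HoldRequested" event carries hypothetical `hold_id` and `account_version` fields. Only positions after the recorded version are examined, so the cost depends on how much has happened since the request rather than on the full history.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    type: str
    data: dict = field(default_factory=dict)


def hold_already_applied(stream, hold_requested):
    """Return True if a Held event for this hold exists after the recorded version.

    `stream` is a list whose index is the event's position (version).
    `account_version` is a hypothetical field holding the projection's version
    at the time the hold was requested.
    """
    hold_id = hold_requested.data["hold_id"]
    start = hold_requested.data["account_version"] + 1
    # Index-based loop: visit only positions after the recorded version.
    for position in range(start, len(stream)):
        event = stream[position]
        if event.type == "Held" and event.data.get("hold_id") == hold_id:
            return True
    return False


# Usage: a long history, then the request, then (maybe) the result.
stream = [Event("Deposited", {"amount": 10})] * 100_000
request = Event("HoldRequested", {"hold_id": "h1", "account_version": len(stream) - 1})
stream.append(request)
print(hold_already_applied(stream, request))  # False
stream.append(Event("Held", {"hold_id": "h1"}))
print(hold_already_applied(stream, request))  # True
```

A real store would read forward from a stream position instead of indexing a list, but the idea is the same: the recorded version bounds the search.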

So in the end, we wind up with a messaging pattern that looks like this:

Account Messaging

You'll see that to process a Hold command, we write three events within the Account Service, and then communicate that back out by writing another.

To Capture the held funds, we also write three messages.

One thing that is striking about this pattern is that the handler for Hold events projects entities from the Account stream, performs the business logic, and writes back to the Account stream. These steps are, for all intents and purposes, coupled, and that is reasonable: they are coupled as a business process, so their implementation is coupled as well. Decoupling is not always necessary (or desirable).
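That coupling can be sketched as a single Python function that projects the account from its stream, applies a business rule, and appends the outcome to the same stream. Everything here is an illustrative assumption: the past-tense event names, the `amount` and `hold_id` fields, the hypothetical `HoldRejected` event, and the balance rules. For brevity it also re-projects the whole stream on every call, which a handler working against long streams would avoid.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    type: str
    data: dict = field(default_factory=dict)


@dataclass
class Account:
    """Hypothetical projection of an account stream."""
    balance: int = 0   # deposited minus captured
    held: int = 0      # currently on hold
    version: int = -1  # position of the last event applied

    @property
    def available(self) -> int:
        return self.balance - self.held


def project(stream) -> Account:
    """Fold every event in the stream into an Account, starting from scratch."""
    account = Account()
    for position, event in enumerate(stream):
        amount = event.data.get("amount", 0)
        if event.type == "Deposited":
            account.balance += amount
        elif event.type == "Held":
            account.held += amount
        elif event.type == "Released":
            account.held -= amount
        elif event.type == "Captured":
            # Capturing removes the funds from the hold and from the balance.
            account.held -= amount
            account.balance -= amount
        account.version = position
    return account


def handle_hold(stream, hold_id: str, amount: int) -> Event:
    """Project, decide, and write back to the same stream in one handler."""
    account = project(stream)                  # 1. project from the account stream
    if amount <= account.available:            # 2. business logic
        outcome = Event("Held", {"hold_id": hold_id, "amount": amount})
    else:
        outcome = Event("HoldRejected", {"hold_id": hold_id, "amount": amount})
    stream.append(outcome)                     # 3. write to the account stream
    return outcome


stream = [Event("Deposited", {"amount": 100})]
print(handle_hold(stream, "h1", 60).type)  # Held
print(handle_hold(stream, "h2", 60).type)  # HoldRejected
```

Splitting the projection, the decision, and the write into separate components would buy little here, since all three change together whenever the business rule changes.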

Categories: Software