The journeylism of @yreynhout

On CQRS, DDD(D), ES, …

Category Archives: Uncategorized

Change is good

After more than 13 years my journey at UltraGenda has come to an end. I now know more than enough about scheduling in healthcare, building products not just projects, the importance of being part of an ecosystem as a product and not some little island, how to analyse problems, how to explore various designs, document those using volatile means and create working software off of those, making choices and trade-offs along the way. But more importantly, I’ve learned to communicate the fruit of my brain, how to distinguish the many faces of change, how not everything is a software problem but sometimes a mentality, people or operations problem, how software often reflects the team that created it, why skills matter, both soft and hard, how I admire ambitious human beings. That and many other things … buy me a beer and I’ll tell you more.
Grateful is what I am … so Yoda-esque yet so true. Companies need to be enablers to bring out the best in their employees. Employees need to spot and seize opportunities yet be loyal. Symbiosis. I’m pretty sure that “being in the right environment” is the reason why I am who I am today, professionally speaking. Technology had little to do with it, really. An interesting domain that needed to grow on me, a great ambiance among colleagues, and the liberty to evolve are what kept me hooked for so long. So here’s saying thank you for all that.
As company takeovers took place and the dust clouds surrounding such events settled, I slowly but surely started to lose some of my connection & identity. I’m sure many of you know what that feels like, when the corporate landscape changes. Still, there’s an awesome busy bunch hard at work within those walls. They keep on producing kickass products and deliver top-notch support and service to an ever-growing customer base.

However, it’s time for me to spread my wings. From January 2014 on I’ll be working as an independent consultant/software moulder for as long as I can make a living off of it. BitTacklr is the name I’ll be trading under. Obviously there’s a website that goes by that name, as well as a twitter account. If you want to contact me for work, just drop me an email. My schedule is pretty full at the moment though (Q3/Q4 2014 earliest availability) 😉

Your UI is a statechart

Let that sink in for a minute … Raise your hand if you start out by designing a behavioral – not to be confused with a navigational – model of your UI before transforming those nice mock-ups into code. Yeah, I figured as much ;-).

Why on earth would you want to turn user interface development into a software design technique? For the same reason all those XP engineering practices are so appealing: so that it can be changed repeatedly throughout the lifetime of the system. I think most of us know how fast UI code can become ‘spaghetti’-shaped, despite our best efforts to adhere to patterns like MVC, MVP, MVVM et alii. We all aspire to have a UI that can be quickly and easily written, is easy to test using white box techniques, can be modified without introducing side-effects, and can be regression tested without manual labor. Yet, if we’re honest, that’s rarely what we end up with. I for one am a strong believer that using statechart notation for all but trivial user interfaces is a must to get us there. It offers insight you can hardly get from looking at either a mockup or controller code.

Example

Let’s look at a simple master-detail screen from which I’ve deliberately omitted things such as navigation and paging. It’s a fictional “crud” screen used to edit Eurostar train stations.

Master Detail UI - Example

It’s pretty easy to spot the behavior you have on this simple screen:

  • Filter: Filter the list of stations
  • Sort: Sort either one of the textual list columns
  • New: Start editing a station to add
  • Save: Save the station being modified or added
  • Cancel: Cancel the modification or addition of a station
  • Delete: Delete an existing station (the ‘X’ column above)
  • Select: Select a station to modify from the list

Coding each of these behaviors according to your pattern of choice is probably easy enough that I don’t have to explain it. Yet, what you’ll end up with is:

  1. no abstract view of the software: you’ll have to look at the code each time a question comes up about the behavior of the application.
  2. implicit contexts: you’ll probably have conditional logic in your save event handler/controller action to determine whether a new train station is being added or an existing train station is modified. Granted, some patterns (and associated frameworks) will have pointed you in the right direction on this one, e.g. the webby ones might have turned these two contexts into two distinct urls. But take a better look at the above screen … that’s one screen, not two (which is the easy way out most people tend to take).
  3. software that is not working correctly: it depends on your skill as a developer to identify all possible ways a user can supply an event (behavior) to your application.

There are other points to consider, like the resulting code being object oriented or easy to maintain. To me the first point is about design as a communication tool. The last two points are the most insidious, and here’s why:

When I’m adding a new station or editing an existing one, what happens when I press the “delete” button (X) of a station item in the list, when I press the “new” button, or when I select another station?

Again, you could disable every other trigger (button/link) on that screen while in “edit” mode just to prove me wrong. We could have a lengthy discussion on whether one is better than the other from a usability point of view. Suffice it to say that I consider disabling triggers to be the easy way out, so just indulge me here :). I’m not going to beat about the bush and just show you one possible resulting statechart for the behavior of this screen:

Statechart - Train Station Administration

Taking the Filter behavior as an example, you’ll get something along these lines (pseudo-code) in a traditional approach:

//Called directly (when the Filter event is triggered)
public void Filter() {
  if(WeAreInEditMode) {
    ShowConfirmSaveDialog();
    return;
  }
  var stations = QueryStationsUsingFilter();
  DisplayStationsInList(stations);
}

//Called after save confirmation
public void Reload() { 
  var stations = QueryStationsUsingFilter();
  DisplayStationsInList(stations);
  SetEditMode(None);
}

while in a statechart driven approach you’ll get:

//Called both directly and after save confirmation
public FilterGuard Filter() {
  var stations = QueryStationsUsingFilter();
  if(stations.Any()) {
    DisplayStationsInList(stations);
    return FilterGuard.Filled;
  }
  ClearStationsInList();
  return FilterGuard.Empty;
}

The main difference is that all the conditional logic in the first approach is inside your controller, while in the latter approach the statechart “runtime” takes care of tracking what is essentially context for you.
The statechart “runtime” protects you from making illegal transitions and is in charge of the flow, alleviating your controllers from using conditional logic to determine what context they are in. Since the controller methods either return void or a guard value, it’s obvious this is not your ASP.NET MVC variety of controller. The statechart “runtime” works best in conjunction with a front controller, POCO controllers and viewmodels that enable changetracking.
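To make that runtime idea tangible, here is a minimal sketch. The `ScreenState`, `Trigger` and `StatechartRuntime` names are mine, not from any particular framework; a real implementation would also support guard values, entry/exit actions and nested states:

```csharp
using System;
using System.Collections.Generic;

public enum ScreenState { Viewing, Adding, Modifying, ConfirmingSave }
public enum Trigger { New, Select, Save, Cancel, Delete, Filter, Sort }

// A minimal statechart "runtime": a transition table that only admits the
// transitions drawn in the chart, and tracks the current state for you.
public class StatechartRuntime {
    private readonly Dictionary<Tuple<ScreenState, Trigger>, ScreenState> _transitions =
        new Dictionary<Tuple<ScreenState, Trigger>, ScreenState>();

    public ScreenState State { get; private set; }

    public StatechartRuntime() { State = ScreenState.Viewing; }

    public StatechartRuntime Allow(ScreenState from, Trigger trigger, ScreenState to) {
        _transitions[Tuple.Create(from, trigger)] = to;
        return this;
    }

    public bool CanFire(Trigger trigger) {
        return _transitions.ContainsKey(Tuple.Create(State, trigger));
    }

    public void Fire(Trigger trigger) {
        ScreenState next;
        if (!_transitions.TryGetValue(Tuple.Create(State, trigger), out next))
            throw new InvalidOperationException(
                trigger + " is illegal in state " + State + ".");
        State = next; // the runtime, not your controller, tracks the context
    }
}
```

Wired up for the train station screen you would register Viewing + New → Adding, Adding + Save → Viewing, and so on. The Filter controller above never needs to ask whether it is in edit mode, because an illegal Filter never reaches it.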

Conclusion

This was just an introductory post. There are a lot more details to discuss and describe. Maybe someday I’ll muster up the enthusiasm to write about them. If you can’t wait to learn more about this technique, then do read Ian Horrocks’ inspirational book.

Your EventStream is a linked list

A week ago I had an interesting twonversation with Jérémie Chassaing and Rinat Abdullin about event streams. I mentioned how I had been toying with event streams as linked lists of changesets (to the aficionados of Jonathan Oliver’s EventStore, this is very much akin to what he calls Commits) in the past. As a way of documenting some of my thoughts on the subject I’m putting up some schematics here.

Model of an event stream

Model

From the above picture you can deduce that an event stream is really a simple thing: a collection of changesets. A changeset itself is a collection of events that occurred (how you got there is not important at this point). Event streams and changesets both have a form of unique identity. Head marks the identity of the latest known changeset. A changeset is immutable. For the sake of simplicity I’ve omitted any form of headers/properties you can attach to an event stream, a changeset and/or an event.

Changesets as a LinkedList

EventStream - Model

Each changeset knows its “parent”, i.e. the changeset that it should be appended to. Except for the very first changeset, which obviously does not have a “parent” (strictly speaking you could have an explicit terminator “parent”). Chronologically, changeset 1 came before changeset 2, changeset 2 came before changeset 3, and so on.
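Sketched in code (type and member names are mine, with everything except the linking elided), the chain could look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A changeset is an immutable bag of events plus its own identity and
// the identity of its "parent", i.e. the changeset it was appended after.
public class Changeset {
    public Guid Id { get; private set; }
    public Guid? ParentId { get; private set; } // null for the very first changeset
    public IList<object> Events { get; private set; }

    public Changeset(Guid id, Guid? parentId, IEnumerable<object> events) {
        Id = id;
        ParentId = parentId;
        Events = events.ToList().AsReadOnly();
    }
}

// An event stream is little more than an identity plus a head pointing
// at the latest changeset; the changesets themselves form the chain.
public class EventStream {
    public string StreamId { get; private set; }
    public Guid? Head { get; private set; } // identity of the latest changeset

    public EventStream(string streamId) { StreamId = streamId; }

    public Changeset Append(Guid changesetId, IEnumerable<object> events) {
        var changeset = new Changeset(changesetId, Head, events);
        Head = changeset.Id;
        return changeset;
    }
}
```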

Looking at the write side of a system that uses event streams for storage, there are two main scenarios:

  1. Writing a changeset of a given event stream: concerns here are duplicate changeset elimination & detecting conflicts, besides the act of actually writing.
  2. Reading the entire event stream: the main concern here is reading all changesets of the event stream as fast as we can, in order.

I’m well aware I’ve omitted other concerns such as automatic event upgrading, event dispatching and snapshotting which, frankly, are distractions at this point.

Reading

Changesets As Files

Supposing that each changeset is, say, a file on disk, how would I know where to start reading? Various options, really. The picture above illustrates one option where – by using “&lt;streamid&gt;.txt” as a convention – the Stream Head File is loaded to bootstrap “walking the chain”, by virtue of having it point to the latest changeset document (represented as LatestChangesetRef) that makes up that stream. As each Changeset File is read, it provides a reference/pointer to the next Changeset File to read (represented by ParentRef). That reference is really the identity of the next changeset.

I hope I don’t need to explain why you need to keep those identifiers logical. Don’t make it a “physical” thing, like the path to the next changeset file. That would be really painful if you were ever to restructure the changeset files on disk. Instead you should delegate the responsibility of translating/resolving a logical identity into its physical location.
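In code, walking the chain might look like the following sketch, where the `resolve` delegate stands in for the component that translates a logical changeset identity into its physical location (`ChangesetRecord` and `ChainWalker` are illustrative names):

```csharp
using System;
using System.Collections.Generic;

// Illustrative record for a stored changeset; ParentRef is the *logical*
// identity of the parent changeset, never a physical path.
public class ChangesetRecord {
    public Guid Id;
    public Guid? ParentRef;
    public object[] Events;
}

public static class ChainWalker {
    // "resolve" stands in for whatever translates a logical changeset
    // identity into the stored document (file lookup, blob fetch, ...).
    public static List<ChangesetRecord> ReadStream(
        Guid? head, Func<Guid, ChangesetRecord> resolve) {
        var chain = new List<ChangesetRecord>();
        var current = head;
        while (current.HasValue) {
            var changeset = resolve(current.Value);
            chain.Add(changeset);
            current = changeset.ParentRef;
        }
        chain.Reverse(); // we walked newest-to-oldest; flip to chronological order
        return chain;
    }
}
```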

Other options for “where to start reading” could be:

  1. keeping the head of each stream in memory (causing “sticky” streams and dealing with recovery mechanisms).
  2. storing the head as a record in a database or blob storage with concurrency control

Now, reading each changeset file could become a bit costly if they’re scattered all over the disk. There’s nothing stopping you from throwing all those changeset documents in one big file, even asynchronously. This is where immutability and resolving identities can really help you out. It’s important to distinguish between what happens at the logical level and what happens at the physical level.

Alternative Changesets As Files

Yet another approach might be to keep an index file of all changesets (above represented by the Stream Index File) that make up the event stream (in an append-only fashion), thus relieving the changeset documents of having to know their parents.

Writing

Basically, this operation can be split up into writing the changeset document and updating the head (or index) of the event stream. The advantage here is that storing the changeset document does not require any form of transaction. This allows you to choose from a broader range of data-stores, as there really isn’t a requirement beyond the guarantee that they will remember and serve what you asked them to store. Updating the head of the event stream does require you to at least be able to detect that concurrent writes are happening or have happened, depending on how you want to resolve conflicts. As such, there’s no need to store both of them in the same data-store. Also notice that the duration of the transaction is reduced by taking the changeset itself out of the equation.
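Here is a sketch of that two-step write, with in-memory dictionaries standing in for the two (possibly different) data-stores; all names are mine:

```csharp
using System;
using System.Collections.Concurrent;

// Sketch of the two-step write: (1) store the changeset document without
// any transaction, then (2) update the head with a concurrency check.
public class SketchEventStore {
    private readonly ConcurrentDictionary<Guid, object> _changesets =
        new ConcurrentDictionary<Guid, object>();
    private readonly ConcurrentDictionary<string, Guid> _heads =
        new ConcurrentDictionary<string, Guid>();

    public bool TryAppend(string streamId, Guid changesetId, Guid? expectedHead, object changeset) {
        // Step 1: store the changeset document. A failure past this point
        // leaves at worst a dangling document, never a corrupt stream.
        _changesets[changesetId] = changeset;

        // Step 2: swap the head, detecting concurrent writers by comparing
        // against the head the caller last observed (compare-and-swap).
        if (expectedHead.HasValue)
            return _heads.TryUpdate(streamId, changesetId, expectedHead.Value);
        return _heads.TryAdd(streamId, changesetId);
    }

    public Guid? GetHead(string streamId) {
        Guid head;
        return _heads.TryGetValue(streamId, out head) ? head : (Guid?)null;
    }
}
```

Only step 2 needs concurrency control; a crash or lost race between the two steps merely leaves a dangling changeset document behind.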

When picking a changeset identity, you might be tempted to reuse the identifier of the message that caused the changes to happen (usually the Command’s message identifier). Don’t. Remember, retrying that same command might produce a different set of changes. How are you going to differentiate between rectifying a previous failure with a retry and some other thread trying to process the same message? It’s best to use identities/identifiers for just one purpose.

Model - Failure

What happens when you can’t update the head to your newly inserted changeset identity? You’ll be left with a dangling changeset that didn’t make it into “the circle of trust”. No harm done, except for wasting some storage. If the changesets were blobs in the cloud it might be useful to have a special purpose daemon to hunt these down and remove them (depending on how much storage costs versus the cost of building the daemon). In general you should optimize for having as few “concurrency conflicts” as possible (it’s bad for business).

Conclusion

I know there are a lot of holes in this story. That’s intentional: I have a lot of unanswered questions myself. I’m especially interested in any comments that can point me to prior art outside the realm of source control systems. I have no intention of using this in production. It’s just a mental exercise.

Acknowledgements

Most of what I mention here exists in one shape or another. By now some of you will have noticed the resemblance to certain aspects of Git’s internal object model, although I’m inclined to say it’s closer to what Mercurial does. The convention-based file storage can be found in the NoDB implementation by Chris Nicola. Concepts such as commits and event streams (and much more) can be found in the EventStore project.

Aggregates and their events

There is an interesting relationship between events and the aggregates that produce them, at least to those of us who are building infrastructure for the event sourced domain model.

On one hand we want events to be decoupled from infrastructure. The focus should be on tracking intent & business data, especially in those classes that model the domain. They trigger events in response to behavior and apply state changes, that’s it. On the other hand we want a changeset (i.e. a set of events produced by an aggregate – a.k.a. a commit) to provide enough meta-data so the infrastructure knows what aggregate/event stream it belongs to and what revision the aggregate is/was at, in order to detect concurrent changes. Infrastructure might also be interested in the command that caused the changeset to happen (e.g. for idempotent behavior) or it might want to associate execution related data (performance, tracing) with the changeset. And that’s just for writing.

When it comes to reading, typically all the past changesets are read from an event store in order to restore the current state of an aggregate. Sometimes – to optimize aggregate read performance – we might add support for snapshots. The thing about reading is that we’re not really all that interested in the changesets themselves, but rather in the events they contain, across all those changesets. Again, some meta-data will need to stick to the aggregate, either directly embedded in the aggregate or using a map which tracks the meta-data associated with an aggregate. Why? Because at some point, you’ll want to save the changes you made to an aggregate.
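That replay can be sketched generically; all names here are illustrative rather than taken from any framework:

```csharp
using System;
using System.Collections.Generic;

// Restoring current state: flatten the changesets (oldest first) into one
// event sequence and replay it. The changeset boundaries themselves don't
// matter to the aggregate, only the events they contain do.
public static class AggregateRestorer {
    public static TAggregate Restore<TAggregate>(
        IEnumerable<IEnumerable<object>> changesets,
        Func<TAggregate> factory,
        Action<TAggregate, object> applyEvent) {
        var aggregate = factory();
        foreach (var changeset in changesets)
            foreach (var @event in changeset)
                applyEvent(aggregate, @event); // replay only, don't re-record
        return aggregate;
    }
}
```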

Looking at event sourcing implementations you can see a lot of different ways of dealing with tracking meta-data:

In this example the “apply state change” methods are responsible for tracking the aggregate’s identifier and thus form the contract between aggregate and event. Mind you, only “aggregate construction” related events need to track this. Later on, while saving, the identifier (fetched by a “template” method) is used to tell the event storage service what aggregate/stream the events need to be associated with.

//Code courtesy of 
//https://github.com/gregoryyoung/m-r/blob/master/SimpleCQRS/Domain.cs
//in commit 5a7d7d0136e86c3d0cdd851cdf2d3de7d077f117

public abstract class AggregateRoot {
        public abstract Guid Id { get; }
        //Rest of members omitted for brevity
}

public class InventoryItem : AggregateRoot {
        private bool _activated;
        private Guid _id;

        private void Apply(InventoryItemCreated e) {
            _id = e.Id;
            _activated = true;
        }

        public override Guid Id {
            get { return _id; }
        }
        //Rest of members omitted for brevity
}

public interface IRepository<T> where T : AggregateRoot, new() {
        void Save(AggregateRoot aggregate, int expectedVersion);
        //Rest of members omitted for brevity
}

public class Repository<T> : IRepository<T> where T: AggregateRoot, new() {
        public void Save(AggregateRoot aggregate, int expectedVersion) {
            _storage.SaveEvents(aggregate.Id, aggregate.GetUncommittedChanges(), expectedVersion);
        }
        //Rest of members omitted for brevity
}

In the following example the API designers have gone a little further. They’ve created a contract between the aggregate and the event in the form of a specialized interface called ISourcedEvent. It has a method called ClaimEvent which is used to couple the event to the aggregate. Basically, the “apply state change” methods are alleviated from tracking the identifier (and version) of the aggregate – the base class (and friends) takes care of that. Of course, there’s no free lunch: your events need to derive from ISourcedEvent (or the SourcedEvent base class) to make this work.

//Code courtesy of 
//https://github.com/ncqrs/ncqrs/blob/master/Framework/src/Ncqrs/Eventing/Sourcing/EventSource.cs
//https://github.com/ncqrs/ncqrs/blob/master/Framework/src/Ncqrs/Eventing/Sourcing/SourcedEventStream.cs
//https://github.com/ncqrs/ncqrs/blob/master/Framework/src/Ncqrs/Eventing/Sourcing/ISourcedEvent.cs
//in commit c3ca2490fbf9d1e6ab0411b32bb0589b187b23a8
public abstract class EventSource : IEventSource {
        private Guid _eventSourceId;

        private readonly SourcedEventStream _uncommittedEvents = new SourcedEventStream();

        public Guid EventSourceId   {
            get { return _eventSourceId; }
            protected set {
                Contract.Requires<InvalidOperationException>(Version == 0);

                _eventSourceId = value;
                _uncommittedEvents.EventSourceId = EventSourceId;
            }
        }

        public long Version {
            get {
                return InitialVersion + _uncommittedEvents.Count;
            }
        }
        private long _initialVersion;

        public long InitialVersion {
            get { return _initialVersion; }
            protected set {
                Contract.Requires<InvalidOperationException>(Version == InitialVersion);
                Contract.Requires<ArgumentOutOfRangeException>(value >= 0);

                _initialVersion = value;
                _uncommittedEvents.SequenceOffset = value;
            }
        }

        protected EventSource() {
            InitialVersion = 0;
            EventSourceId = NcqrsEnvironment.Get<IUniqueIdentifierGenerator>().GenerateNewId();
        }

        protected EventSource(Guid eventSourceId) {
            InitialVersion = 0;
            EventSourceId = eventSourceId;
        }

        public virtual void InitializeFromHistory(IEnumerable<ISourcedEvent> history) {
            //Omitted for brevity

            foreach (var historicalEvent in history) {
                if (InitialVersion == 0) {
                    EventSourceId = historicalEvent.EventSourceId;
                }

                ApplyEventFromHistory(historicalEvent);
                InitialVersion++; // TODO: Thought... couldn't we get this from the event?
            }
        }

        internal protected void ApplyEvent(ISourcedEvent evnt) {
            _uncommittedEvents.Append(evnt);

            //Omitted for brevity
        }

        private void ApplyEventFromHistory(ISourcedEvent evnt) {
            //Omitted for brevity
        }

        public void AcceptChanges() {
            long newInitialVersion = Version;

            _uncommittedEvents.Clear();

            InitialVersion = newInitialVersion;
        }

        //Rest of members omitted for brevity
}

public class SourcedEventStream : IEnumerable<ISourcedEvent> {
        public void Append(ISourcedEvent sourcedEvent) {
            ClaimEvent(sourcedEvent);

            _events.Add(sourcedEvent);
        }

        protected void ClaimEvent(ISourcedEvent evnt) {
            //Omitted for brevity

            var nextSequence = LastSequence + 1;
            evnt.ClaimEvent(EventSourceId, nextSequence);
        }
}

public interface ISourcedEvent : IEvent {
        Guid EventSourceId { get; }

        long EventSequence { get; }

        void InitializeFrom(StoredEvent stored);

        void ClaimEvent(Guid eventSourceId, long sequence);
}

There’s nothing wrong with either of these approaches. You just have to be aware of how they work. Both deal with getting/setting meta-data in their specific way. The aggregate identifier and, optionally, its version will – most of the time – appear both as meta-data and event-data. The identifier will be used to identify aggregates, while the version will probably serve as a means for optimistic concurrency detection, both submitted as part of a command later on.
Yet, isn’t it strange we never wonder why we’re modelling it that way? I blame the “LoadFromHistory” method for only taking in a stream of events. This severely limits your options and forces you to “derive” both the aggregate identifier and version from the event stream. Why not make it explicit? Recently, while developing StreamyCore (a toy-around project of mine), I’ve come up with the following API:

public abstract class AggregateRootEntity : IInitializeAggregate, ITrackAggregateChanges {
  public const long InitialVersion = 0;
  private Guid _identifier;
  private long _baseVersion;
  private long _currentVersion;
  private List<IEvent> _events;

  protected AggregateRootEntity() { }

  //Useful for those who want to embed the metadata into their events
  protected Guid AggregateId { 
    get { return _identifier; } 
  }
  protected long AggregateVersion { 
    get { return _currentVersion; } 
  }

  protected void Initialize(Guid identifier) {
    //Used when you're creating an aggregate yourself
    _identifier = identifier;
    _baseVersion = InitialVersion;
    _currentVersion = InitialVersion;
    _events = new List<IEvent>();
  }

  protected void ApplyEvent(IEvent @event) {
    PlayEvent(@event);
    RecordEvent(@event);
    _currentVersion++;
  }

  private void PlayEvent(IEvent @event) {
    //Plays the event to get a state change
  }

  private void RecordEvent(IEvent @event) {
    //Records the event
  }

  void IInitializeAggregate.Initialize(IAggregateConstructionSet set) {
    //Used when you read an aggregate from the event store
    _identifier = set.AggregateIdentifier;
    _currentVersion = set.AggregateBaseVersion;
    _events = new List<IEvent>();
    foreach(var @event in set.Events) { 
      PlayEvent(@event); _currentVersion++; 
    }
    _baseVersion = _currentVersion;
  }

  bool ITrackAggregateChanges.HasChanges() { 
    return _baseVersion != _currentVersion; 
  }

  IAggregateChangeSet ITrackAggregateChanges.GetChanges() {
    //Used when you write an aggregate to the event store
    return new AggregateChangeSet(_identifier, GetType(), 
      _baseVersion, _currentVersion, _events.ToArray());
  }

  void ITrackAggregateChanges.AcceptChanges() {
    _baseVersion = _currentVersion;

    _events.Clear();
  }
}

public class BeautyPageant : AggregateRootEntity {
  private BeautyPageant() {}

  private BeautyPageant(Guid id, IEvent @event) {
    Initialize(id);
    ApplyEvent(@event);
  }

  public static BeautyPageant NewPageant(Guid id, string name, int yearInTheGregorianCalendar) {
    return new BeautyPageant(id,
      new NewBeautyPageantEvent(id, name, yearInTheGregorianCalendar));
  }

  public void ElectBeautyQueen(string nameOfThePoorThing) {
    ApplyEvent(new BeautyQueenElectedEvent(AggregateId, AggregateVersion + 1, nameOfThePoorThing));
  }

  private void Apply(NewBeautyPageantEvent @event) {  }

  private void Apply(BeautyQueenElectedEvent @event) {  }
}

The idea is to think of the identifier and version as totally separate concerns (i.e. separate from the events being applied) and make them explicit in the infrastructure APIs that need to deal with them. At the same time you have to cater for the scenario where the end-user new’s up the aggregate him-/herself. But with this approach there’s no requirement to track meta-data in the apply methods yourself, nor does it really force a base class/interface upon your events (even though I’m using one here (IEvent), albeit without behavior). Generally, I think of it as a different approach, not a better one.

Regardless of your personal preference, when designing your own API or reusing an existing one there are a couple of questions you should ask yourself:

  • Do I care about the coupling of my events to some base class or interface?
  • Do I consider my events to be immutable after construction?
  • Do I consider meta-data separate from event-data?
  • What kind of meta-data tracking do I want to do myself? Or do I want some base class to do that for me?
  • Who tracks the meta-data? The aggregate’s base class? An external map?
  • Am I comfortable with the “derivation” of meta-data from event-data?
  • How will a user of my aggregate base class initialize meta-data? Am I explicitly communicating how to do that?