The journeylism of @yreynhout

On CQRS, DDD(D), ES, …

Reply to “DDD, CQRS & ES: Lessons Learned”

Last Monday I attended a meetup of the Belgian Domain Driven Design group. One of the presentations was titled DDD, CQRS & ES: Lessons Learned, which – what did you expect? – caught my interest. Gitte, the presenter, explained the problems she and her team came across while applying the aforementioned set of acronyms. None of these stumbling blocks came as a surprise to me. It’s the same kind of issues that almost everyone faces when they first set out on this journey. The solutions are often trivial and diverse, but sometimes they’re not and may require a bit of explanation. I didn’t get a chance to discuss some of the alternatives with Gitte directly. You know how it goes, you end up talking to a lot of people but not everyone, and perhaps, in retrospect, it’s better to let things sink in, instead of blurting out what can only come across as condescending “I know better” type of remarks, which, honestly, is not my intent. So without further ado, let’s go over some of those problems:

Unique incremental numbers

Description

The basic problem here is that they wanted to assign each dossier a unique number, with the added requirement that it had to be incremental. She went on to explain that the read model (for user interface purposes) wasn’t a good place, since that was just a side-effect of replaying events and would easily lose track of the highest number. Also, the eventually consistent nature of that read model made it unsuitable for such purposes. In the end they settled on a singleton-esque, event-sourced aggregate that was made responsible for handing out unique numbers.

As an aside, if Greg Young earned a dollar for every time the “uniqueness” problem was discussed on the DDD/CQRS mailing list, he’d be able to buy a new laptop (I’m being modest).

Reply

The problem with “uniqueness” requirements is that, well, very often there’s a deeper underlying reason why people want them. In this case, I can’t shake the feeling that it was a mixture of (a) having a human-readable identifier that is easily communicated over the phone or in a letter, either among employees of the corporation or between employees and the customers/suppliers of the corporation, and (b) knowing the number of dossiers handled per year just by looking at the number. Obviously I’m going out on a limb in this particular case, but in general digging deeper into the domain gives a better understanding of why these requirements are in fact requirements. The aforementioned requirements could be solved in an entirely different manner than making an event-sourced number generator. Numbers could be assigned at a later point in time by ordering all dossier births sequentially, or we could have a number reservation service (a term used quite liberally here) that hands out exclusive numbers and keeps a time-based lock on said numbers for us, or a hi-lo reservation mechanism for numbers. But such efforts should be weighed against “real” requirements. What’s the business impact of handing out numbers in a different order, or having temporary gaps in the set of numbers, or what if the numbers do not correspond to the chronological order of events? Who is losing sleep over this? Now, the most common response to this is “But, but, but this was dead easy when we used our relational model. The database generated incremental numbers for us”. I’m sorry, but, but, but, you seem to confuse the requirement with the implementation at that point. Yes, you can leverage that implementation to satisfy the requirement, nothing wrong with doing that in your CRUD or anemic model, but how aware are you of that underlying requirement? Did you even stop and think about it? Because that’s what these acronyms just made you do …
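
To make the hi-lo option a bit more concrete, here is a minimal sketch of what such a reservation mechanism could look like. The HiLoDossierNumberSource name, the reserveBlock delegate and the block size are mine, not something from the talk: a block of numbers is reserved up front in a single atomic operation (e.g. an increment in a database), after which individual numbers are handed out in memory.

public class HiLoDossierNumberSource
{
  //Reserves a block of consecutive numbers somewhere durable and returns the first number of the block.
  readonly Func<int, long> _reserveBlock;
  readonly int _blockSize;
  long _next;
  long _upperBound;

  public HiLoDossierNumberSource(Func<int, long> reserveBlock, int blockSize)
  {
    _reserveBlock = reserveBlock;
    _blockSize = blockSize;
    _next = 0;
    _upperBound = 0; //forces a reservation on first use
  }

  //Not thread-safe: serialize access or add locking if multiple writers hand out numbers.
  public long Next()
  {
    if (_next == _upperBound)
    {
      _next = _reserveBlock(_blockSize);
      _upperBound = _next + _blockSize;
    }
    return _next++;
  }
}

A block that isn’t fully used leaves a gap, which is exactly the kind of trade-off to weigh against the “real” requirement.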

Akin to this problem is set-based validation, for which you can read this piece of prior art.

Refactoring

Description

For some reason they’ve found refactoring to be cumbersome, i.e. pulling behavior and corresponding events out of a particular aggregate and moving it into one or more new aggregates. Implementation-wise, they were using a derivative of Mark Nijhof’s Fohjin.

Reply

I don’t think the intent was to refactor here. Refactoring, i.e. restructuring code on the inside without changing external behavior, is much easier using an event-sourced model, especially since tests can be written using messages instead of the nasty coupling to implementation I usually get to see. The reason you break out to new aggregates is due to new insights, either around consistency boundaries or concepts that were implicit before or concepts that were misunderstood (Hm … how did this even pass analysis, design, coding, QA? The reality we live in, I guess). That’s not refactoring to me, but I don’t want to be splitting hairs either. Regardless, the problem remains. Yet, state-based models have to deal with this issue as well. You bend your code to a new structure and you transform the data into a shape that fits the new structure. Hopefully, you’ve been collecting enough historical data to be able to provide sane input to that data transformation. State-based models tend to “forget” historical facts because nobody deemed them to be important. Historical structural models are an even more painful experience I’ve blocked from my memory, so let’s not go there. Granted, tooling is more pervasive for structural models (or should I say the data that underpins them). But it’s not exactly a walk in the park either. It’s always going to be work, either way. It’s just different work.

One thing I noticed was that they were using a base class for their events (https://github.com/MarkNijhof/Fohjin/blob/master/Fohjin.DDD.Example/Fohjin.DDD.Events/DomainEvent.cs) with a property called “AggregateId”. That’s one source of trouble when you want to move events to another aggregate. Don’t do that. Just call it what it is, e.g. a CustomerId or an OrderId or a DossierId. The aggregate an event was produced by should cater for the identification of the corresponding stream, not the event itself (though e.g. an attribute/annotation based approach could work as well). Don’t get me wrong, you can copy all data you like into the event, including the aggregate id, the event id, the version. But I doubt it’s wise to make it the basis of your infrastructure. At least, such has been my experience. I could make a similar argument for the version and event id.
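
As a hypothetical illustration – the DossierArchived event and its fields are made up – an event can name its identifier for what it is, leaving the identification of the stream to wherever the event gets appended:

public class DossierArchived
{
  //The identifier is called what it is; no generic AggregateId base class.
  public readonly Guid DossierId;
  public readonly DateTime ArchivedOn;

  public DossierArchived(Guid dossierId, DateTime archivedOn)
  {
    DossierId = dossierId;
    ArchivedOn = archivedOn;
  }
}

Moving such an event to another aggregate later on doesn’t require touching a base class or the surrounding infrastructure; the stream it is appended to is what ties it to its producer.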

Another remark that struck me as odd was the fact that event migration is non-trivial. Odd, since I’ve always perceived event streams as lightweight, easy to transform, merge or split. This should be dead easy. If it’s not, you might want to start digging to see where the friction is coming from.

Da Big Fat Aggregate Cat

Description

Some of their aggregates were getting too big, resulting in too many lines of code, especially since decision and state changing behavior had been split into two different methods.

Reply

Their future solution to this issue is to look into splitting up the decision making and the state tracking into two different classes. The good news is, it’s been done before, no need to reinvent that wheel. The most common reasons for these obese aggregates are that they have too fine-grained behaviors, their behaviors take parameter lists that require scrolling off-screen, the consistency boundary is modeled after a noun, or we might be forcing an event-sourced approach onto what would be better served with a mixture of a document and events. It’s hard to tell without looking at the specifics, but it’s often an indicator that more investigation is needed instead of brushing it off as a code navigation issue. Vaughn Vernon’s book and papers, as well as this excellent post by Gojko Adzic, should be a gentle reminder of what care and consideration should go into designing your aggregates.
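
To give an idea of what that split could look like – the Dossier, DossierState and DossierClosed names and the Close behavior are made up, not taken from the presentation – the decision-making class only validates and produces events, while the state class merely folds events into the data the next decision needs:

public class DossierClosed
{
  public readonly Guid DossierId;

  public DossierClosed(Guid dossierId)
  {
    DossierId = dossierId;
  }
}

//Tracks only the state needed to make decisions.
public class DossierState
{
  public bool Closed { get; private set; }

  public void When(DossierClosed @event)
  {
    Closed = true;
  }
}

//Makes the decisions and produces events; never mutates state itself.
public class Dossier
{
  readonly DossierState _state;

  public Dossier(DossierState state)
  {
    _state = state;
  }

  public DossierClosed Close(Guid dossierId)
  {
    if (_state.Closed)
      throw new InvalidOperationException("The dossier was already closed.");
    return new DossierClosed(dossierId);
  }
}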

Tooling

Description

Tooling for building these types of systems isn’t as pervasive as it is for structural/state-based models. Be prepared to build a lot of it yourself.

Reply

From an operational point of view there might be some truth to this statement and there’s definitely room for improvement. But maybe that’s what I love about it. It’s more about cherry-picking existing tooling that fills the gaps that need filling. For the most part it’s about understanding the mechanics and the forces at play. There’s no wrong or right, there are just various trade-offs and specific requirements/needs. Trying to come up with something that satisfies a lot of the generic requirements would lead us down the path of the lowest common denominator or a Swiss army knife you configure using lots of xml, json, yaml or even code. Let’s not go there … ever again. I know, it sounds like hand-waving, but give me a specific problem, and I’ll try to give you a specific answer.

To put things into perspective, I’ve built a stream viewer/smuggler/transformer/rewriter on top of NEventStore, all in under one day’s worth of work. Not production quality, but good enough nonetheless. Alternatively, I could have chosen to leverage a document store, put all the events in it using a catch-up subscription and reaped the benefits of full text searching those events, all in under a few hours, all because I know how to marry a paradigm and a set of technologies … and this to me is the essential part.

In the end …

I’ve just reiterated some of the problems they’ve encountered. It’s good that people discuss these things out in the open, because that’s how you shorten the feedback loop, even if the solutions seem trivial or depend highly on context. I hope it’s clear that my replies should not be interpreted as glorifying the acronyms nor as bashing the presenter.

Trench Talk – Authorization

For a long time now, I’ve been dealing with the subject of authorization and its bigger brother, authentication. Not that dealing with these makes me an expert, mind you. But it did arouse my interest in the areas of RBAC (role based access control) and family (hierarchical RBAC, administrative RBAC, attribute based RBAC, …), RAD (resource access decision) from OMG fame, XACML (eXtensible Access Control Markup Language) from OASIS, and a few others I’ve lost track of over the years. Why do we embed authorization into software, though? I can understand the need in military and nuclear facilities, or when company secrets are involved. But beyond that it’s mainly a game of fear, risk, trust and workflow. In general, in end-user facing systems, there are two forces at play: one is the top-down, enterprisy, we-want-control-over-who-sees-or-does-what and the other is the this-is-my-stuff-and-i-say-who-gets-to-see-it-and-manipulate-it. It’s not uncommon to find both in one software system you’re working on. The reason is quite simple: administration overhead. Imagine controlling each and every employee’s role(s), exceptions to such role(s), role transitions over time, etc … and you’ve got 90,000 of ’em. While some organizations are – believe it or not – willing to pay money to their IT department (in a low-cost country, no doubt) to take care of this task, others have flattened their structure or adopted controlled delegation to overcome this. I guess, a lot depends on “the culture within”. Bottom line, to me, is that fine-grained, top-down control doesn’t scale very well – administratively speaking – and is entangled with topics of trust, risk, and – for those at the bottom – frustration. This is usually the point where we tend to see the “owner-centric” approach to authorization, where “owners” impose who gets to see or manipulate a particular resource or set of resources they own or co-own. If we’re lucky, that is.

Authorization models are fascinating. There have been many attempts at abstracting this domain and treating it as a “separate” problem. All in all we’ve been pretty successful at doing that. Think of all the products out there that incorporate the notion of storing and enforcing authorization “policies”. I’ve always been in environments where I couldn’t use off-the-shelf products for this particular problem. Mostly because it would have meant coupling to a particular technology and the inflexibility/inadequacy they brought with them. So, yes, in case you’re wondering, I’ve reinvented this wheel a couple of times. Why? Because, amongst other reasons, the data to base the access decision on was tightly coupled to the domain at hand. Depending on the required “freshness” of said data with regard to securing a particular resource you will want to design it one way or another.

What I like about Domain Driven Design is that it embraces the idea of entertaining multiple models to solve a particular problem. Of course, with choice comes responsibility, trying things out, failing, heck, even stupidity, but above all learning from those. Being focused and doing small, controlled experiments – verbally at the whiteboard or textually, as in a code probe – may mitigate the risk of going down the wrong path. Such endeavors depend on the skill and experience of those involved, IMO. Yet by no means is success guaranteed. Such is life … how depressing. Still, let’s look at the bright side … of life, that is.

Make the implicit explicit

Let’s look at a small example … Imagine a piece of software that allowed one or more people to collaborate on an art design. Below I’ve coded up the use case of inviting somebody to collaborate with somebody else on a particular design.

public class InviteArtDesignCollaboratorHandler : Handles<InviteArtDesignCollaborator>
{
  //invite somebody to collaborate with you on a certain art design.
  public void Handle(InviteArtDesignCollaborator message)
  {
    var inviter = this.personRepository.Get(new PersonId(message.InviterId));
    var invitee = this.personRepository.Get(new PersonId(message.InviteeId));
    var artDesign = this.artDesignRepository.Get(new ArtDesignId(message.ArtDesignId));
    var invitation = artDesign.InviteCollaborator(inviter, invitee);
    this.invitationRepository.Add(invitation);
  }
}

If we zoom in on the authorization aspect of this use case, what do we see in the above code? Nothing, that’s right. Who can invite collaborators to an art design? The original creator of such a design IF and only IF he is still a participant of the collection the art design is part of. Who can be invited to an art design? Somebody who already participates in the collection the art design is part of. Obviously, I’m making this up as I go. Yet, these aren’t uncommon requirements to come across. Let’s see how this manifests itself intermingled with the above code.

public class InviteArtDesignCollaboratorHandler : Handles<InviteArtDesignCollaborator>
{
  //invite somebody to collaborate with you on a certain art design.
  public void Handle(InviteArtDesignCollaborator message)
  {
    var inviter = this.personRepository.Get(new PersonId(message.InviterId));
    var invitee = this.personRepository.Get(new PersonId(message.InviteeId));
    var artDesign = this.artDesignRepository.Get(new ArtDesignId(message.ArtDesignId));
    var collection = this.collectionRepository.Get(artDesign.CollectionId);
    if (!artDesign.WasOriginallyCreatedBy(inviter))
      throw new NotAuthorizedException("The inviter is not the original creator of the art design.");
    if (!collection.HasAsParticipant(inviter))
      throw new NotAuthorizedException("The inviter is not a participant of the collection the art design is part of.");
    if (!collection.HasAsParticipant(invitee))
      throw new NotAuthorizedException("The invitee is not a participant of the collection the art design is part of.");
    var invitation = artDesign.InviteCollaborator(inviter, invitee);
    this.invitationRepository.Add(invitation);
  }
}

What if we were to separate the authorization responsibility from the actual use case? Why would we do that? Does changing the rules about who can be invited and who can invite fundamentally change this use case? Aren’t the axes of change different for these two? If yes, then the next piece of code might make more sense. Mind you, it’s a matter of preference at this point. Requirements might push it more in one or the other direction as to “the code feeling natural”.

public class InviteArtDesignCollaboratorAuthorizer : Authorizes<InviteArtDesignCollaborator>
{
  //invite somebody to collaborate with you on a certain art design.
  public void Authorize(InviteArtDesignCollaborator message)
  {
    var inviter = this.personRepository.Get(new PersonId(message.InviterId));
    var invitee = this.personRepository.Get(new PersonId(message.InviteeId));
    var artDesign = this.artDesignRepository.Get(new ArtDesignId(message.ArtDesignId));
    var collection = this.collectionRepository.Get(artDesign.CollectionId);
    if (!artDesign.WasOriginallyCreatedBy(inviter))
      throw new NotAuthorizedException("The inviter is not the original creator of the art design.");
    if (!collection.HasAsParticipant(inviter))
      throw new NotAuthorizedException("The inviter is not a participant of the collection the art design is part of.");
    if (!collection.HasAsParticipant(invitee))
      throw new NotAuthorizedException("The invitee is not a participant of the collection the art design is part of.");
  }
}

public class InviteArtDesignCollaboratorHandler : Handles<InviteArtDesignCollaborator>
{
  //invite somebody to collaborate with you on a certain art design.
  public void Handle(InviteArtDesignCollaborator message)
  {
    var inviter = this.personRepository.Get(new PersonId(message.InviterId));
    var invitee = this.personRepository.Get(new PersonId(message.InviteeId));
    var artDesign = this.artDesignRepository.Get(new ArtDesignId(message.ArtDesignId));
    var invitation = artDesign.InviteCollaborator(inviter, invitee);
    this.invitationRepository.Add(invitation);
  }
}

We can now separate the authorization test specifications – who can perform the use case under what circumstances – from the actual use case specifications – what is the side effect of the use case, when can it happen/can’t it happen depending on current state. Is doing all this desirable? I guess it depends on what you are getting out of it.
Fundamentally, those query methods (“WasOriginallyCreatedBy”, “HasAsParticipant”) are using projected and provided state (input) to determine what their boolean return value should be. If we could live with relaxed consistency in this regard, we could even project these values “at another point in time” and use those instead of these query methods. Heck, we’d be able to ditch these query methods from our model altogether. But again, it largely depends on what kind of consistency you are expecting vis-à-vis the aggregate you are affecting. It’s also debatable whether we are dealing with a different model or not. After distillation, would these query methods still be there? Why? Are they essential or circumstantial? These aren’t questions a blog post can answer, I’m afraid. They are highly contextual.

Keep it simple when appropriate

In scenarios with simpler requirements we could choose an approach reminiscent of aspect oriented programming or pipes and filters. Below are a couple of examples. The assumption is that some piece of code will inspect the handler’s Handle method, look for the Authorize attribute and instantiate either a decorating handler or a pipeline component to represent the authorization based on the attribute’s properties.

public class StartSeasonPortfolioHandler : Handles<StartSeasonPortfolio>
{
  //start a new season portfolio for a subsidiary.
  [Authorize(Role="SubsidiaryAdministrators")]
  public void Handle(StartSeasonPortfolio message)
  {
    var subsidiaryId = new SubsidiaryId(message.SubsidiaryId);
    var subsidiary = this.subsidiaryRepository.Get(subsidiaryId);
    var season = Season.From(subsidiary, message.Season);
    if (this.portfolioRepository.HasPortfolioForSeason(subsidiaryId, season)) 
      throw new SeasonPortfolioAlreadyStartedException("A portfolio was already started for the specified season of the subsidiary.");
    var portfolio = subsidiary.StartPortfolio(new PortfolioId(message.PortfolioId), new PortfolioName(message.Name), season);
    this.portfolioRepository.Add(portfolio);
  }
}

This example codifies that the caller must be a member of the role ‘SubsidiaryAdministrators’. Interestingly enough, you can’t really tell who “the caller” is from the above code, now can you? This was less obvious in the previous art design example. The assumption there was that the “inviter” was “the caller“. How do you know that for sure, though? It’s just data I could have easily spoofed. Alas, authentication – proving the caller’s identity – and message integrity – did somebody tamper with this message – are beyond the scope of what I want to discuss here (*). For now, let’s assume we know who the ambient caller is.
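
Under that assumption, the decorating handler hinted at earlier might look roughly like this. It is a sketch: the RoleAuthorizingHandler and ICallerContext names are mine, and the role string is assumed to have been read from the Authorize attribute on the inner handler’s Handle method at wire-up time.

public class RoleAuthorizingHandler<TMessage> : Handles<TMessage>
{
  readonly Handles<TMessage> _inner;
  readonly ICallerContext _caller;
  readonly string _role;

  public RoleAuthorizingHandler(Handles<TMessage> inner, ICallerContext caller, string role)
  {
    _inner = inner;
    _caller = caller;
    _role = role;
  }

  public void Handle(TMessage message)
  {
    //Authorization happens before the actual use case is invoked.
    if (!_caller.IsInRole(_role))
      throw new NotAuthorizedException("The caller is not a member of the '" + _role + "' role.");
    _inner.Handle(message);
  }
}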

public class StartSeasonPortfolioHandler : Handles<StartSeasonPortfolio>
{
  //start a new season portfolio for a subsidiary.
  [Authorize(Permission="CanStartSeasonPortfolio")]
  public void Handle(StartSeasonPortfolio message)
  {
    //omitted for brevity - same as above
  }
}

When dealing with fixed or hard-coded roles becomes cumbersome – administratively speaking – the next thing you’ll see is a shift in focus towards the permission to perform a certain operation. It’s obviously easier, since I could be a member of multiple, dynamic roles that have the CanStartSeasonPortfolio access decision set to allowed. Yet, what if multiple roles had conflicting access decisions? You’d need a strategy to resolve such conflicts. In finance, accounting, health care and military environments you may even encounter separation of duty.
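
One such strategy is “deny overrides”: an explicit deny on any of the caller’s roles wins, no matter how many other roles allow the operation. A minimal sketch, with hypothetical types:

public enum AccessDecision
{
  NotSet,
  Allowed,
  Denied
}

public static class AccessDecisionResolver
{
  //Deny overrides: a single Denied wins; otherwise any Allowed wins; otherwise NotSet.
  public static AccessDecision Resolve(IEnumerable<AccessDecision> decisionsPerRole)
  {
    var result = AccessDecision.NotSet;
    foreach (var decision in decisionsPerRole)
    {
      if (decision == AccessDecision.Denied)
        return AccessDecision.Denied;
      if (decision == AccessDecision.Allowed)
        result = AccessDecision.Allowed;
    }
    return result;
  }
}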

public class StartSeasonPortfolioHandler : Handles<StartSeasonPortfolio>
{
  //start a new season portfolio for a subsidiary.
  [Authorize]
  public void Handle(StartSeasonPortfolio message)
  {
    //omitted for brevity - same as above
  }
}

Command messages could prove to be natural authorization boundaries. In such a case, the name of the message translates to a particular permission, which makes for a very convention-driven way of modeling authorization. If only it were always this simple 🙂
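
A sketch of that convention – deriving the permission from the command’s type name – could be as small as this (the PermissionConvention name is mine):

public static class PermissionConvention
{
  //e.g. StartSeasonPortfolio translates to the "CanStartSeasonPortfolio" permission.
  public static string PermissionFor(Type commandType)
  {
    return "Can" + commandType.Name;
  }
}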

One thing that is very different from the art design example is that the notion of authorization wasn’t very coupled to the model we were dealing with. It felt somewhat more natural to tuck it away behind an attribute. There was no urge to consider it part of the model that dealt with the use case itself. Why is that? I think because no decisions were made based on state that was part of the model.

(*) If you’ve been infatuated with OAuth, OpenID, Kerberos, PKI, etc. in the past, I doubt I could bring anything new to the table.

Not just for writing

Up till now we’ve been very focused on preventing callers from invoking behavior they are not allowed to. However formidable that may be, I often find that’s only half of the story. When dealing with users or automated services, we want to guide them towards the happy path, no? We want to tell them: to the best of my knowledge, at the point in time you’re asking me, these are the state transitions you are allowed to perform. That’s when it hit me. The same authorization decisions you’re performing on the write side are also required on the read side, albeit in a form that could be very different. Obviously, we’re dealing with more latency here. As soon as you’ve computed or queried the access decision, it could be modified behind your back. It may have to travel across a network, it may have to manifest itself in a user interface (e.g. enable or disable a button/link) or await a user’s decision, it may manifest itself in an automated service choosing an alternate path through its logic. It’s a hint/clue, at best. Yet, often end users are quite content with these decisions.
That said, to me the key differentiator is the fact that permissions are not limited to state changing behavior. Whether or not you are allowed to view a particular piece of information is something that usually only manifests itself on the read side. Also note that the read side is often less harsh. It doesn’t throw an exception at you, rather it filters or hides information from you. Often I find that you’ll have a mixture of view and behavior permissions, especially on the read side.
Below you’ll find a very technology-specific example of how that might manifest itself in a web API.

[RoutePrefix("portfolios")]
public class PortfolioResourceController : ApiController
{
  [Route("{id}")]
  public IHttpActionResult Get(string id)
  {
    var identity = new ClaimsIdentity(RequestContext.Principal.Identity);
    return this.portfolioQueries.
      ById(id).
      Select(_ => Ok(_.CompleteAuthorization(identity, this.Url))).
      DefaultIfEmpty(NotFound()).
      Single();
  }
}

public class PortfolioResource 
{
  public PortfolioResource CompleteAuthorization(ClaimsIdentity identity, UrlHelper helper)
  {
    var links = new List<Link>();
    //Note: CanEdit|Delete|ViewPortfolioItems are extension methods on ClaimsIdentity that, internally, 
    //      use claims to answer the authorization requests in a similar way as how the
    //      write side would ask them.
    if(identity.CanEditPortfolio(Id)) 
      links.Add(new Link { Rel = "edit", Href = helper.GetLink<PortfolioResourceController>(_ => _.Put(Id)) });
    if(identity.CanDeletePortfolio(Id)) 
      links.Add(new Link { Rel = "delete", Href = helper.GetLink<PortfolioResourceController>(_ => _.Delete(Id)) });
    if(identity.CanViewPortfolioItems(Id)) 
      links.Add(new Link { Rel = "items", Href = helper.GetLink<PortfolioResourceController>(_ => _.GetItems(Id)) });

    return new PortfolioResource(Id, Name, Season, SubsidiaryId, links.ToArray());
  }
}

What’s important to realize is that we’ll probably start looking for similarity and deduplication (dare I say DRY) of the same logic that happens to be invoked on the write and read side, and trying to find a common home for it. Over the past months I saw an entire model emerge to deal with this and with something as difficult as the art design example. Again, I’m pretty sure this isn’t the only way I could’ve modeled it, but for now there were strong indicators this was a viable option. Below is another, more convoluted – it’s the last, I promise – example of what I’ve come up with.

public class PortfolioAccessPolicy
{
  public bool CanEdit(Subject subject)
  {
    return 
      subject.CanEditPortfolio(RolePermissionSet) && 
      subject.IsStarterOfPortfolio(StarterId);
  }

  public bool CanDelete(Subject subject)
  {
    return 
      subject.CanDeletePortfolio(RolePermissionSet) && 
      subject.IsStarterOfPortfolio(StarterId);
  }

  public bool CanViewItems(Subject subject)
  {
    return 
      subject.CanViewPortfolioItems(RolePermissionSet) ||
      subject.IsPortfolioSupervisor();
  }
}

//Usage on the read side
public class PortfolioResource 
{
  public PortfolioResource CompleteAuthorization(Subject subject, PortfolioAccessPolicy policy, UrlHelper helper)
  {
    var links = new List<Link>();
    if(policy.CanEdit(subject)) 
      links.Add(new Link { Rel = "edit", Href = helper.GetLink<PortfolioResourceController>(_ => _.Put(Id)) });
    if(policy.CanDelete(subject)) 
      links.Add(new Link { Rel = "delete", Href = helper.GetLink<PortfolioResourceController>(_ => _.Delete(Id)) });
    if(policy.CanViewItems(subject)) 
      links.Add(new Link { Rel = "items", Href = helper.GetLink<PortfolioResourceController>(_ => _.GetItems(Id)) });

    return new PortfolioResource(Id, Name, Season, SubsidiaryId, links.ToArray());
  }
}

//Usage on the write side
public class EditPortfolioAuthorizer : Authorizes<EditPortfolio>
{
  public void Authorize(Subject subject, EditPortfolio message)
  {
    var policy = this.policyRepository.Get(new PortfolioId(message.PortfolioId));
    if(!policy.CanEdit(subject))
      throw new NotAuthorizedException("The caller is not authorized to edit this portfolio.");
  }
}

Conclusion

My only hope in writing this down is that it will make you think about how you’re modeling authorization and the realization that there is no one way of modeling it. Whether you’re rolling your own or dealing with something off the shelf, know that your requirements will dictate a lot. If you’re in an environment where reads and writes are separated but crave some commonality, know there are ways to share. I do not pretend to have all the answers, merely sharing what I’ve learned so far and what worked for me.

Trench Talk: Assert.That(We.Understand());

After having written 2000+ eventsourcing-meets-aggregates specific given-when-then test specifications(*), you can imagine I started to notice both “problems” and “patterns”. Here’s a small overview …

(*) If you don’t know what those are, here’s a 50-minute refresher for you: http://skillsmatter.com/podcast/design-architecture/talk-from-greg-young … no TL;DW.

Variations along the given and when axis

Ever written a set of tests where the when stays the same but the givens vary for each test case? Why is that? What are those tests communicating at that point? They’re testing the same behavior (and its outcome) over and over again, but each time the SUT is in a (potentially) different state. With these tests, one particular behavior is put in the spotlight.
When you are exploring scenarios – hopefully using a tangible, visual DSL and together with someone knowledgeable about the problem domain – you can uncover these variations. How? By gently steering the conversation in a direction where you keep asking what the outcome is of a certain behavior, each time changing the preconditions.

Similarly, ever written a set of tests where the givens stay the same but the when varies for each test case? Why is that? They’re testing different behavior, but each time the SUT is in the same state. From what I’ve seen, these tests are focusing on a particular state in the lifecycle of the SUT and constrain the behavior that can happen at that point in time.
Again, in your exploratory conversations you can uncover these cases by focusing on a certain state and asking what the outcome of each behavior and/or behavior variation would be.

Obviously, conversations don’t always go the way you want them to, but I just wanted to point out the importance of language, and how listening attentively and asking the right questions enables you to determine appropriate input so you can assert that you’ve covered most – if not all – variations.
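
To make both variations a bit more tangible, here’s a sketch of the first one: the when stays fixed while the givens vary per test case. The NUnit-style [Test] methods, the Scenario syntax and the dossier messages are hypothetical stand-ins for whatever given-when-then DSL and domain you happen to use.

static readonly Guid DossierId = Guid.NewGuid();

[Test]
public void Closing_an_open_dossier_closes_it() {
  new Scenario().
    Given(new DossierOpened(DossierId)).
    When(new CloseDossier(DossierId)).
    Then(new DossierClosed(DossierId)).
    Assert();
}

[Test]
public void Closing_an_already_closed_dossier_is_rejected() {
  new Scenario().
    Given(new DossierOpened(DossierId), new DossierClosed(DossierId)).
    When(new CloseDossier(DossierId)).
    Throws<DossierAlreadyClosedException>().
    Assert();
}

Flip it around – the same givens with a varying when – and you get the second variation, constraining which behaviors are acceptable in one particular state.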

Verbosity

Writing these tests, verbosity strikes in odd ways. It often becomes difficult for readers of the codified test specification to differentiate between the essential and secondary givens or thens. Both the “Extract method” refactoring and the use of test data builders help a lot in reducing that verbosity and in bringing back the readability of the test. There’s a parallel to this when generating printable/readable output based on these test specifications for business people. Often this business-oriented reader will not care for a lot of the minute details that are in the givens, when or thens. No, what he wants is a narrative that describes the specific scenario, emphasizing the important ingredients. So, how do we get that? Custom formatting. Not by overriding .ToString() on your messages, although you could still do that for other, more verbose purposes, but by associating a specific “narrative writer” with your message or scenario. I acknowledge this is pretty niche, but in my opinion it’s important if you want to stick to executable specifications and not revert to narrative on one side, code on the other side. High coupling is a desired property in this area.
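
A sketch of what associating such a “narrative writer” with a message could look like – the interface, its name and the wording are mine, reusing the InviteArtDesignCollaborator command from the authorization post purely as an example – producing the prose a business reader cares about while leaving ToString() free for more verbose purposes:

public interface IWriteNarrativeFor<TMessage> {
  string Write(TMessage message);
}

public class InviteArtDesignCollaboratorNarrativeWriter : IWriteNarrativeFor<InviteArtDesignCollaborator> {
  public string Write(InviteArtDesignCollaborator message) {
    //Emphasize the ingredients the business reader cares about, drop the rest.
    return string.Format(
      "{0} invites {1} to collaborate on art design {2}.",
      message.InviterId, message.InviteeId, message.ArtDesignId);
  }
}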

Test data duplication

This may sound like it’s the same as verbosity, but it isn’t. It’s related, yes, but not the same thing. I started noticing the same data(**) being set up across a set of tests. Not a random set of tests, but a set of tests that focused on a certain feature, covered by one or more use cases. Basically, events and commands that were used in conjunction, with data that flowed from the givens, over to the when, into the thens. Obviously, this is to be expected since it’s the very nature of why one writes these tests. Now, you’d think that test data builders solve this problem. I’m inclined to say yes, if you allow them to be authored for a set of tests. That’s a convoluted way of saying there are no general purpose test data builders that will work in each and every scenario. Now, you could move the variation that exists into those tests, but then you’d notice the duplication and, frankly, the verbosity again. So, there’s some “thing” sitting in between the test data builders and that set of tests. I call it the Test Model, abbreviated to just Model. It captures the data, the events and commands, using methods and properties as I see fit, and is used in each of those tests. Some tests may put forward mild variations, either inline or as explicit methods, of existing events and/or commands, but that’s okay. My gut tells me I’m not at the end of the line here, but it’s as far as I’ve gotten.

(**) I might be using data and messages interchangeably. I blame the wine.
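
A sketch of what such a Test Model could look like (the members are made up): it centralizes the identifiers and the messages that a set of related tests keeps reusing, with mild variations exposed as explicit methods.

public static class Model {
  public static readonly Guid DossierId = Guid.NewGuid();
  public static readonly Guid HandlerId = Guid.NewGuid();

  //The event as most tests in this set need it.
  public static DossierOpened DossierOpened() {
    return new DossierOpened(DossierId);
  }

  //A mild, explicit variation for the tests that care about who handles the dossier.
  public static DossierAssigned DossierAssignedTo(Guid handlerId) {
    return new DossierAssigned(DossierId, handlerId);
  }
}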

Isolation

This must be my favorite advantage of writing test specifications this way: the safety net that allows me to restructure things on the inside without my tests breaking, because there is no coupling to an actual type. Refactoring on the inside does not affect any of the tests I’ve written using messages. I cherish this freedom enormously. Does that mean I don’t write any unit tests for the objects on the inside? No, but I consider those to be more brittle. That’s not a problem per se, that’s just something you need to be aware of. Now, how far does that safety net stretch, you might wonder? As long as you don’t change the structure nor the semantics of the messages, I’d say pretty far. Again, these messages are capturing your assumptions and understanding, your conversation with that person that knows a thing or two about the business, so better listen well, get them right and save yourself from a few refuctors.

Dependencies

Although I haven’t found much use for dependencies/services I needed to use on the inside, when I did, I found that this way of testing can work in full conjunction with mocking, stubbing, and/or faking. Why? Well, these dependencies are mostly important to the execution part of a specification, not its declaration. You can freely pass these along with the specification to the executor of the specification. The executor is responsible for making them available at the right place and at the right time (***).

(***) Remind me again what an inversion of control container does? 😉

Pipeline

Because the execution of test specifications is decoupled from their declaration, it’s well suited to making sure you’re not testing structurally invalid commands (or events if you want to go that far). Before executing your when, how hard would it be to validate that the command you’re executing is actually structurally valid? If your command’s constructor takes care of that, you won’t even be able to complete the declaration (****). If you’ve made some other piece of code responsible for validating that type of command, then that’s what is well suited to hook into the test specification execution pipeline. The pipeline is also suited to print out a readable form of the test specification as it is being executed.

(****) I have my reservations vis-à-vis said approach, but to each his own.

Tests are data

At the end of the tunnel, this is the nirvana you reach. Why write these test specifications by hand when you can leverage that battalion of acceptance testers and end-users? Give them the tooling to record their scenarios. Sure, there’s still value in capturing those initial conversations and those initial scenarios, but think of all the real world variations that end-users are generating on a day to day basis. To me, this is not some wild dream, this is the path forward, albeit littered with a few maintainability obstacles in my mind, but nothing I’m not willing to cope with.
On a smaller scale, I found that if you leverage the test case generation features of your unit testing framework, you can already take baby steps in this direction. Think of the variations above. How hard would it be to consider either the set of givens or whens as nothing more than a bunch of test case data you feed into the same test? Think about how many lines of duplicate code that would save.
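
For example, with NUnit’s TestCaseSource the varying givens become plain test case data fed into one and the same test (same hypothetical Scenario DSL and dossier messages as before):

public static IEnumerable<object> StatesThatRejectClosing() {
  yield return new DossierClosed(Model.DossierId);
  yield return new DossierTransferred(Model.DossierId);
}

[TestCaseSource("StatesThatRejectClosing")]
public void Closing_is_rejected_when_the_dossier_is_no_longer_open(object given) {
  new Scenario().
    Given(Model.DossierOpened(), given).
    When(new CloseDossier(Model.DossierId)).
    Throws<DossierCannotBeClosedException>().
    Assert();
}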

Conclusion

So there you have it, an experience report from the trenches. Overall, I’m very “content” having written tests this way. I’ve made a lot of mistakes along the way, but that was to be expected. It should come as no surprise that many of these hard learned lessons defined the shape of AggregateSource and AggregateSource.Testing.

Event Enrichment

Event or more generally message enrichment, the act of adding information to a message, comes in many shapes and sizes. Below I scribbled down some of my thoughts on the subject.

Metadata

Metadata, sometimes referred to as out of band data, is typically represented separately from the rest of the payload. Reusing the underlying protocol/service is the sane thing to do. HTTP, SOAP, Amazon AWS/Windows Azure API calls, Event Store‘s event metadata, etc … are just a few examples of how metadata could be carried around next to the payload. Other examples are explicit envelopes or message wrappers, such as the Transmission Wrapper and the Intermediate Control Act Wrapper in the HL7 V3 specification.
From a code perspective (and from my experience) the addition of metadata tends to happen in one place. Close to the metal boundary if you will, e.g. http handlers, message listeners, application services, etc …
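
As an example of that “one place”, here’s a hypothetical envelope assembled at the boundary (the Envelope type, the DescribeBeerRecipe command and the metadata keys are all made up):

public class Envelope<TMessage> {
  public readonly TMessage Message;
  public readonly IDictionary<string, string> Metadata;

  public Envelope(TMessage message, IDictionary<string, string> metadata) {
    Message = message;
    Metadata = metadata;
  }
}

//e.g. in an http handler or application service:
//var envelope = new Envelope<DescribeBeerRecipe>(message, new Dictionary<string, string> {
//  { "caller", principal.Identity.Name },
//  { "causation-id", causationId.ToString() },
//  { "received-on", DateTimeOffset.UtcNow.ToString("o") }
//});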

Separate concerns

Sometimes you need to augment a message with an extra piece of data, but it feels more natural to make another piece of code responsible for adding that data. A typical example I can think of are things that have a name but you only have an identifier to start with. In an event-sourced domain model(*), think of an aggregate that holds a soft reference (an identifier if you will) to another aggregate, but the event – about to be produced – would benefit from including the name of the referenced aggregate at that time. Many events may fall in the same boat. Having a dedicated piece of code doing the enrichment could make things more explicit in your code.
Other times you may have a model that is computationally intensive, and adding the extra data would result in suboptimal data access. How so? Well, each operation would require, say, a query, possibly querying the same data over and over again. While caching could mitigate some of this, having another piece of code do the enrichment could allow you to deal with this situation more effectively (e.g. batching). Not only that, but it could also make it very explicit what falls into the category of enrichment and what does not. This nuance may become even more important when you’re working as a team.
Sometimes the internal shape of your event may not be the external shape of your event, meaning they may need less, more, or even different data for various consumers. Whether such a transformation really is an enrichment is debatable. But assuming it is, it’s just another one of those situations where enrichment makes sense.
An advantage of doing enrichment separately is that certain things never enter your model. As an example, an aggregate in an event-sourced domain model, does it really need the version it is at or its identity(**)? Could I not tack that information onto the related events as I’m about to persist them? Sure I can. No need to drag(**) those things into my model.
The enrichment could be implemented using an explicit event enricher that runs either synchronously in your event producing pipeline or asynchronously, depending on what makes the most sense. Using event builders has proven to be a killer combo.

public class BeerNameEnricher : 
  IEnrich<BeerRecipeDescribed>, 
  IEnrich<BeerRecipeAltered> {

  Dictionary<string, string> _lookupNameOfBeerUsingId;

  public BeerNameEnricher(Dictionary<string, string> lookupNameOfBeerUsingId) {
    _lookupNameOfBeerUsingId = lookupNameOfBeerUsingId;
  }

  public BeerRecipeDescribed Enrich(BeerRecipeDescribed @event) {
    return @event.UsingBeerNamed(_lookupNameOfBeerUsingId[@event.BeerId]);
  }

  public BeerRecipeAltered Enrich(BeerRecipeAltered @event) {
    return @event.UsingBeerNamed(_lookupNameOfBeerUsingId[@event.BeerId]);
  }
}

(*): Familiarity with Domain Driven Design is assumed.
(**): Not every situation warrants this.

EventBuilders – Revisited

Introduction

I’ve been using event builders for some time now. With time and practice comes experience (at least that’s the plan), both good and bad. Investing in event or – more generally – message builders means first and foremost investing in language. If you’re not willing to put in that effort, don’t bother using them, at least not for the purpose of spreading them around your entire codebase. That piece of distilled wisdom applies equally to messages themselves, as far as I’m concerned. Builders are appealing because, if you do make the investment in embedding proper language, they can make your code a bit more readable. Granted, in a language like C# (at least the recent versions), that might have become less of a concern since things like object initializers and named arguments can go a long way toward improving readability.

Builders are very similar to Test Data Builders. In fact, if you’re writing test specifications that use some sort of Given-When-Then syntax, builders can be useful to take the verbosity out of those specifications. You can tuck away test-specific, pre-initialized builders behind properties or behind test-suite specific classes. A lesser known feature of builders (especially the mutable kind – more about that below) is that you can pass them around, getting them to act as data collectors, because the state they need might not be available at their construction site(*). A well crafted model might rub this even more in your face (think about all the data in your value objects, entities, etc …). If you take a more functional approach – the fashionable thing to do these days – to passing them around, you can use immutable builders and hand back a new copy with the freshly collected values.

Messages, as in their representation in code, go hand in hand with serialization. There are many ways of doing serialization, with hand rolled, code generated and reflection based being the predominant ones. Sometimes the serialization library you depend upon comes with its own quirks. At that point, builders could be useful to insulate the rest of your code from having to know about those quirks or how to handle them. Whether you really need another “abstraction” sitting in between is debatable.

Composition is often overlooked when defining messages, resulting in flat, dictionary-like data dumpsters. Yet both json and xml – probably the most predominant textual serialization formats – allow, by their very nature, information to be defined in a hierarchical way, ergo composing messages from smaller bits of information. Not that I particularly believe that doing so is tied to the chosen serialization format. This is another area where builders can help, since they could be conceived around these highly cohesive bits of information, at least if your model has a similar shape (not that it has to). Message builders could then leverage message part builders, or code could use message part builders to feed the proper data into message builders. This ties in nicely with the builder being a data collector.

(*) Construction site, i.e. the place where they are created.

A world of flavors

Below I’ll briefly glance over various flavors I’ve used myself or seen floating around. This post is a bit heavy on the (C#) code side of things and repetitive (on purpose, I might add). For demonstration purposes, I’ve shamelessly stolen and mutated a bit of code from The CQRS Journey.

Flavor 1 – Mutable event with getters and setters – no builder

namespace Flavor1_MutableEvent_GettersNSetters_NoBuilder {
  public class SeatTypeCreated {
    public Guid ConferenceId { get; set; }
    public Guid SeatTypeId { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public int Quantity { get; set; }

    // Optional ctor (if you're not happy with the datatype defaults)
    public SeatTypeCreated() {
      ConferenceId = Guid.Empty;
      SeatTypeId = Guid.Empty;
      Name = string.Empty;
      Description = string.Empty;
      Quantity = 0;
    }

    public override string ToString() {
      return string.Format("New seat type created '{0}' ({1}) for conference '{2}': {3}. Initial seating quantity is {4}.", Name, SeatTypeId, ConferenceId, Description, Quantity);
    }
  }

  public static class SampleUsage {
    public static void Show() {
      var _ = new SeatTypeCreated {
        ConferenceId = Guid.NewGuid(),
        SeatTypeId = Guid.NewGuid(),
        Name = "Terrace Level",
        Description = "Luxurious, bubblegum stain free seats.",
        Quantity = 25
      };

      Console.WriteLine(_);

      _.Quantity = 35;

      Console.WriteLine(_);
    }
  }
}

This is what I call the I’m in a hurry version where you don’t see nor feel the need to have builders and you’re not particularly worried about events/messages getting mutated. It’s still pretty descriptive due to the object initializers. There’s a time and place for everything.

“Mutable messages are an anti-pattern. They are the path to a system that is held together with duct tape and bubble gum.” – Greg Young anno 2008.

I don’t know if Greg still feels as strongly about it today. You’ll have to ask him. Great quoting material is all I can say.

Flavor 2 – Immutable event with getters and readonly fields – no builder

namespace Flavor2_ImmutableEvent_GettersNReadOnlyFields_NoBuilder {
  public class SeatTypeCreated {
    readonly Guid _conferenceId;
    readonly Guid _seatTypeId;
    readonly string _name;
    readonly string _description;
    readonly int _quantity;

    public Guid ConferenceId {
      get { return _conferenceId; }
    }

    public Guid SeatTypeId {
      get { return _seatTypeId; }
    }

    public string Name {
      get { return _name; }
    }

    public string Description {
      get { return _description; }
    }

    public int Quantity {
      get { return _quantity; }
    }

    public SeatTypeCreated(Guid conferenceId, Guid seatTypeId, string name, string description, int quantity) {
      _conferenceId = conferenceId;
      _seatTypeId = seatTypeId;
      _name = name;
      _description = description;
      _quantity = quantity;
    }

    public override string ToString() {
      return string.Format("New seat type created '{0}' ({1}) for conference '{2}': {3}. Initial seating quantity is {4}.", _name, _seatTypeId, _conferenceId, _description, _quantity);
    }
  }

  public static class SampleUsage {
    public static void Show() {
      var _ = new SeatTypeCreated(
        conferenceId: Guid.NewGuid(),
        seatTypeId: Guid.NewGuid(),
        name: "Terrace Level",
        description: "Luxurious, bubblegum stain free seats.",
        quantity: 25
      );

      Console.WriteLine(_);

      var __ = new SeatTypeCreated(_.ConferenceId, _.SeatTypeId, _.Name, _.Description, 35);

      Console.WriteLine(__);
    }
  }
}

This is the immutable companion to the previous one. Named arguments cater for the readability in this one. A bit heavy on the typing if you want to mutate the event. It also implies you collect ALL information before you’re able to construct it.

Flavor 3 – Mutable event with getters and setters – implicit builder

namespace Flavor3_MutableEvent_GettersNSetters_ImplicitBuilder {
  public class SeatTypeCreated {
    public Guid ConferenceId { get; set; }
    public Guid SeatTypeId { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public int Quantity { get; set; }

    public SeatTypeCreated AtConference(Guid identifier) {
      ConferenceId = identifier;
      return this;
    }
    
    public SeatTypeCreated IdentifiedBy(Guid identifier) {
      SeatTypeId = identifier;
      return this;
    }
    
    public SeatTypeCreated Named(string value) {
      Name = value;
      return this;
    }
    
    public SeatTypeCreated DescribedAs(string value) {
      Description = value;
      return this;
    }
    
    public SeatTypeCreated WithInitialQuantity(int value) {
      Quantity = value;
      return this;
    }

    // Optional ctor (if you're not happy with the datatype defaults)
    public SeatTypeCreated() {
      ConferenceId = Guid.Empty;
      SeatTypeId = Guid.Empty;
      Name = string.Empty;
      Description = string.Empty;
      Quantity = 0;
    }

    public override string ToString() {
      return string.Format("New seat type created '{0}' ({1}) for conference '{2}': {3}. Initial seating quantity is {4}.", Name, SeatTypeId, ConferenceId, Description, Quantity);
    }
  }

  public static class SampleUsage {
    public static void Show() {
      var _ = new SeatTypeCreated {
          Quantity = 35 //does not do what you expect ...
        }.
        AtConference(Guid.NewGuid()).
        IdentifiedBy(Guid.NewGuid()).
        Named("Terrace Level").
        DescribedAs("Luxurious, bubblegum stain free seats.").
        WithInitialQuantity(25);

      Console.WriteLine(_);

      _.Quantity = 35;

      Console.WriteLine(_);
    }
  }
}

This is what I call the “AWS” variety since it’s what is being used in Amazon’s SDK for .NET. You get the readability of the builder with methods right on the message object itself and after each method you have access to the mutable instance of the message. What’s not to like, except for the mutability?

Flavor 4 – Mutable event with getters and setters – explicit builder

namespace Flavor4_MutableEvent_GettersNSetters_ExplicitBuilder {
  public class SeatTypeCreated {
    public Guid ConferenceId { get; set; }
    public Guid SeatTypeId { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public int Quantity { get; set; }

    // Optional ctor (if you're not happy with the datatype defaults)
    public SeatTypeCreated() {
      ConferenceId = Guid.Empty;
      SeatTypeId = Guid.Empty;
      Name = string.Empty;
      Description = string.Empty;
      Quantity = 0;
    }

    // Optional convenience method
    public SeatTypeCreatedBuilder ToBuilder() {
      return new SeatTypeCreatedBuilder().
        AtConference(ConferenceId).
        IdentifiedBy(SeatTypeId).
        Named(Name).
        DescribedAs(Description).
        WithInitialQuantity(Quantity);
    }

    public override string ToString() {
      return string.Format("New seat type created '{0}' ({1}) for conference '{2}': {3}. Initial seating quantity is {4}.", Name, SeatTypeId, ConferenceId, Description, Quantity);
    }
  }

  public class SeatTypeCreatedBuilder {
    Guid _conferenceId;
    Guid _seatTypeId;
    string _name;
    string _description;
    int _quantity;

    // Optional ctor (if you're not happy with the datatype defaults)
    public SeatTypeCreatedBuilder() {
      _conferenceId = Guid.Empty;
      _seatTypeId = Guid.Empty;
      _name = string.Empty;
      _description = string.Empty;
      _quantity = 0;
    }

    public SeatTypeCreatedBuilder AtConference(Guid identifier) {
      _conferenceId = identifier;
      return this;
    }

    public SeatTypeCreatedBuilder IdentifiedBy(Guid identifier) {
      _seatTypeId = identifier;
      return this;
    }

    public SeatTypeCreatedBuilder Named(string value) {
      _name = value;
      return this;
    }

    public SeatTypeCreatedBuilder DescribedAs(string value) {
      _description = value;
      return this;
    }

    public SeatTypeCreatedBuilder WithInitialQuantity(int value) {
      _quantity = value;
      return this;
    }

    public SeatTypeCreated Build() {
      return new SeatTypeCreated {
        ConferenceId = _conferenceId,
        SeatTypeId = _seatTypeId,
        Name = _name,
        Description = _description,
        Quantity = _quantity
      };
    }
  }

  public static class SampleUsage {
    public static void Show() {
      var _ = new SeatTypeCreatedBuilder().
        AtConference(Guid.NewGuid()).
        IdentifiedBy(Guid.NewGuid()).
        Named("Terrace Level").
        DescribedAs("Luxurious, bubblegum stain free seats.").
        WithInitialQuantity(25).
        Build();

      Console.WriteLine(_);

      _.Quantity = 35;

      Console.WriteLine(_);

      var __ = _.ToBuilder().
        WithInitialQuantity(45).
        Build();

      Console.WriteLine(__);
    }
  }
}

This is just a simple variation on the above with the builder pulled out of the message object. Might come in handy if you don’t have control over the messages but you still fancy builders (the ToBuilder could become an extension method in that case).

Flavor 5 – Immutable event with getters and readonly fields – implicit builder

namespace Flavor5_ImmutableEvent_GettersNReadOnlyFields_ImplicitBuilder {
  public class SeatTypeCreated {
    readonly Guid _conferenceId;
    readonly Guid _seatTypeId;
    readonly string _name;
    readonly string _description;
    readonly int _quantity;

    public Guid ConferenceId {
      get { return _conferenceId; }
    }

    public Guid SeatTypeId {
      get { return _seatTypeId; }
    }

    public string Name {
      get { return _name; }
    }

    public string Description {
      get { return _description; }
    }

    public int Quantity {
      get { return _quantity; }
    }

    public SeatTypeCreated() {
      _conferenceId = Guid.Empty;
      _seatTypeId = Guid.Empty;
      _name = string.Empty;
      _description = string.Empty;
      _quantity = 0;
    }

    SeatTypeCreated(Guid conferenceId, Guid seatTypeId, string name, string description, int quantity) {
      _conferenceId = conferenceId;
      _seatTypeId = seatTypeId;
      _name = name;
      _description = description;
      _quantity = quantity;
    }

    public SeatTypeCreated AtConference(Guid identifier) {
      return new SeatTypeCreated(identifier, _seatTypeId, _name, _description, _quantity);
    }

    public SeatTypeCreated IdentifiedBy(Guid identifier) {
      return new SeatTypeCreated(_conferenceId, identifier, _name, _description, _quantity);
    }

    public SeatTypeCreated Named(string value) {
      return new SeatTypeCreated(_conferenceId, _seatTypeId, value, _description, _quantity);
    }

    public SeatTypeCreated DescribedAs(string value) {
      return new SeatTypeCreated(_conferenceId, _seatTypeId, _name, value, _quantity);
    }

    public SeatTypeCreated WithInitialQuantity(int value) {
      return new SeatTypeCreated(_conferenceId, _seatTypeId, _name, _description, value);
    }

    public override string ToString() {
      return string.Format("New seat type created '{0}' ({1}) for conference '{2}': {3}. Initial seating quantity is {4}.", _name, _seatTypeId, _conferenceId, _description, _quantity);
    }
  }

  public static class SampleUsage {
    public static void Show() {
      var _ = new SeatTypeCreated().
        AtConference(Guid.NewGuid()).
        IdentifiedBy(Guid.NewGuid()).
        Named("Terrace Level").
        DescribedAs("Luxurious, bubblegum stain free seats.").
        WithInitialQuantity(25);

      Console.WriteLine(_);

      var __ = _.WithInitialQuantity(35);

      Console.WriteLine(__);
    }
  }
}

This is an immutable version of the “AWS” variety, giving you a new event upon each call. Great if you intend to have different branches off of the same message, such as for testing purposes. Pushes you down the functional alley if you want to collect data, since each mutation gives you a new instance. It’s probably my favorite when dealing with flat data structures. I’m no expert, but I’m pretty sure the immutable versions are going to use more memory (or at least annoy the GC’s Gen 0). Whether that’s something you should be overly focused on highly depends on your particular context.

“In the land of IO, the blind man, who doesn’t measure, optimizes for the wrong thing first.” – Yves anno 2013

Flavor 6 – Immutable event with getters and readonly fields – Mutable explicit builder

namespace Flavor6_ImmutableEvent_GettersNReadOnlyFields_MutableExplicitBuilder {
  public class SeatTypeCreated {
    readonly Guid _conferenceId;
    readonly Guid _seatTypeId;
    readonly string _name;
    readonly string _description;
    readonly int _quantity;

    public Guid ConferenceId {
      get { return _conferenceId; }
    }

    public Guid SeatTypeId {
      get { return _seatTypeId; }
    }

    public string Name {
      get { return _name; }
    }

    public string Description {
      get { return _description; }
    }

    public int Quantity {
      get { return _quantity; }
    }

    public SeatTypeCreated(Guid conferenceId, Guid seatTypeId, string name, string description, int quantity) {
      _conferenceId = conferenceId;
      _seatTypeId = seatTypeId;
      _name = name;
      _description = description;
      _quantity = quantity;
    }

    // Optional convenience method
    public SeatTypeCreatedBuilder ToBuilder() {
      return new SeatTypeCreatedBuilder().
        AtConference(ConferenceId).
        IdentifiedBy(SeatTypeId).
        Named(Name).
        DescribedAs(Description).
        WithInitialQuantity(Quantity);
    }

    public override string ToString() {
      return string.Format("New seat type created '{0}' ({1}) for conference '{2}': {3}. Initial seating quantity is {4}.", _name, _seatTypeId, _conferenceId, _description, _quantity);
    }
  }

  public class SeatTypeCreatedBuilder {
    Guid _conferenceId;
    Guid _seatTypeId;
    string _name;
    string _description;
    int _quantity;

    // Optional ctor (if you're not happy with the datatype defaults)
    public SeatTypeCreatedBuilder() {
      _conferenceId = Guid.Empty;
      _seatTypeId = Guid.Empty;
      _name = string.Empty;
      _description = string.Empty;
      _quantity = 0;
    }

    public SeatTypeCreatedBuilder AtConference(Guid identifier) {
      _conferenceId = identifier;
      return this;
    }

    public SeatTypeCreatedBuilder IdentifiedBy(Guid identifier) {
      _seatTypeId = identifier;
      return this;
    }

    public SeatTypeCreatedBuilder Named(string value) {
      _name = value;
      return this;
    }

    public SeatTypeCreatedBuilder DescribedAs(string value) {
      _description = value;
      return this;
    }

    public SeatTypeCreatedBuilder WithInitialQuantity(int value) {
      _quantity = value;
      return this;
    }

    public SeatTypeCreated Build() {
      return new SeatTypeCreated(_conferenceId, _seatTypeId, _name, _description, _quantity);
    }
  }

  public static class SampleUsage {
    public static void Show() {
      var _ = new SeatTypeCreatedBuilder().
        AtConference(Guid.NewGuid()).
        IdentifiedBy(Guid.NewGuid()).
        Named("Terrace Level").
        DescribedAs("Luxurious, bubblegum stain free seats.").
        WithInitialQuantity(25).
        Build();

      Console.WriteLine(_);

      var builder = _.ToBuilder();

      var __ = builder.
        Named("Balcony level").
        DescribedAs("High end seats. No smoking policy.").
        WithInitialQuantity(45).
        Build();

      Console.WriteLine(__);

      var ___ = builder.
        WithInitialQuantity(45).
        Build(); //Probably not the result you expect

      Console.WriteLine(___);
    }
  }
}

This kind of builder is great if you want to collect data using the same instance before producing the immutable message.

Flavor 7 – Immutable event with getters and readonly fields – Immutable explicit builder

namespace Flavor7_ImmutableEvent_GettersNReadOnlyFields_ImmutableExplicitBuilder {
  public class SeatTypeCreated {
    readonly Guid _conferenceId;
    readonly Guid _seatTypeId;
    readonly string _name;
    readonly string _description;
    readonly int _quantity;

    public Guid ConferenceId {
      get { return _conferenceId; }
    }

    public Guid SeatTypeId {
      get { return _seatTypeId; }
    }

    public string Name {
      get { return _name; }
    }

    public string Description {
      get { return _description; }
    }

    public int Quantity {
      get { return _quantity; }
    }

    public SeatTypeCreated(Guid conferenceId, Guid seatTypeId, string name, string description, int quantity) {
      _conferenceId = conferenceId;
      _seatTypeId = seatTypeId;
      _name = name;
      _description = description;
      _quantity = quantity;
    }

    // Optional convenience method
    public SeatTypeCreatedBuilder ToBuilder() {
      return new SeatTypeCreatedBuilder().
        AtConference(ConferenceId).
        IdentifiedBy(SeatTypeId).
        Named(Name).
        DescribedAs(Description).
        WithInitialQuantity(Quantity);
    }

    public override string ToString() {
      return string.Format("New seat type created '{0}' ({1}) for conference '{2}': {3}. Initial seating quantity is {4}.", _name, _seatTypeId, _conferenceId, _description, _quantity);
    }
  }

  public class SeatTypeCreatedBuilder {
    readonly Guid _conferenceId;
    readonly Guid _seatTypeId;
    readonly string _name;
    readonly string _description;
    readonly int _quantity;

    // Optional ctor content (if you're not happy with the datatype defaults)
    public SeatTypeCreatedBuilder() {
      _conferenceId = Guid.Empty;
      _seatTypeId = Guid.Empty;
      _name = string.Empty;
      _description = string.Empty;
      _quantity = 0;
    }

    SeatTypeCreatedBuilder(Guid conferenceId, Guid seatTypeId, string name, string description, int quantity) {
      _conferenceId = conferenceId;
      _seatTypeId = seatTypeId;
      _name = name;
      _description = description;
      _quantity = quantity;
    }

    public SeatTypeCreatedBuilder AtConference(Guid identifier) {
      return new SeatTypeCreatedBuilder(identifier, _seatTypeId, _name, _description, _quantity);
    }

    public SeatTypeCreatedBuilder IdentifiedBy(Guid identifier) {
      return new SeatTypeCreatedBuilder(_conferenceId, identifier, _name, _description, _quantity);
    }

    public SeatTypeCreatedBuilder Named(string value) {
      return new SeatTypeCreatedBuilder(_conferenceId, _seatTypeId, value, _description, _quantity);
    }

    public SeatTypeCreatedBuilder DescribedAs(string value) {
      return new SeatTypeCreatedBuilder(_conferenceId, _seatTypeId, _name, value, _quantity);
    }

    public SeatTypeCreatedBuilder WithInitialQuantity(int value) {
      return new SeatTypeCreatedBuilder(_conferenceId, _seatTypeId, _name, _description, value);
    }

    public SeatTypeCreated Build() {
      return new SeatTypeCreated(_conferenceId, _seatTypeId, _name, _description, _quantity);
    }
  }

  public static class SampleUsage {
    public static void Show() {
      var _ = new SeatTypeCreatedBuilder().
        AtConference(Guid.NewGuid()).
        IdentifiedBy(Guid.NewGuid()).
        Named("Terrace Level").
        DescribedAs("Luxurious, bubblegum stain free seats.").
        WithInitialQuantity(25).
        Build();
      Console.WriteLine(_);

      var builder = _.ToBuilder();

      var __ = builder.
        Named("Balcony level").
        DescribedAs("High end seats. No smoking policy.").
        WithInitialQuantity(45).
        Build();

      Console.WriteLine(__);

      var ___ = builder.
        WithInitialQuantity(45).
        Build(); //The result you expect

      Console.WriteLine(___);
    }
  }
}

This one is only useful in the odd case you’d like to branch off of the builder. It might be worth reading Greg’s musings on this flavor and the previous one.

Flavor 8 – Immutable event with getters and readonly fields – Implicit builder combined with mutable explicit builder

namespace Flavor8_ImmutableEvent_GettersNReadOnlyFields_ImplicitBuilderCombinedWithMutableExplicitBuilder {
  public class SeatTypeCreated {
    readonly Guid _conferenceId;
    readonly Guid _seatTypeId;
    readonly string _name;
    readonly string _description;
    readonly int _quantity;

    public Guid ConferenceId {
      get { return _conferenceId; }
    }

    public Guid SeatTypeId {
      get { return _seatTypeId; }
    }

    public string Name {
      get { return _name; }
    }

    public string Description {
      get { return _description; }
    }

    public int Quantity {
      get { return _quantity; }
    }

    public SeatTypeCreated() {
      _conferenceId = Guid.Empty;
      _seatTypeId = Guid.Empty;
      _name = string.Empty;
      _description = string.Empty;
      _quantity = 0;
    }

    SeatTypeCreated(Guid conferenceId, Guid seatTypeId, string name, string description, int quantity) {
      _conferenceId = conferenceId;
      _seatTypeId = seatTypeId;
      _name = name;
      _description = description;
      _quantity = quantity;
    }

    public SeatTypeCreated AtConference(Guid identifier) {
      return new SeatTypeCreated(identifier, _seatTypeId, _name, _description, _quantity);
    }

    public SeatTypeCreated IdentifiedBy(Guid identifier) {
      return new SeatTypeCreated(_conferenceId, identifier, _name, _description, _quantity);
    }

    public SeatTypeCreated Named(string value) {
      return new SeatTypeCreated(_conferenceId, _seatTypeId, value, _description, _quantity);
    }

    public SeatTypeCreated DescribedAs(string value) {
      return new SeatTypeCreated(_conferenceId, _seatTypeId, _name, value, _quantity);
    }

    public SeatTypeCreated WithInitialQuantity(int value) {
      return new SeatTypeCreated(_conferenceId, _seatTypeId, _name, _description, value);
    }

    public Builder ToBuilder() {
      return new Builder().
        AtConference(ConferenceId).
        IdentifiedBy(SeatTypeId).
        Named(Name).
        DescribedAs(Description).
        WithInitialQuantity(Quantity);
    }

    public override string ToString() {
      return string.Format("New seat type created '{0}' ({1}) for conference '{2}': {3}. Initial seating quantity is {4}.", _name, _seatTypeId, _conferenceId, _description, _quantity);
    }

    public class Builder {
      Guid _conferenceId;
      Guid _seatTypeId;
      string _name;
      string _description;
      int _quantity;

      // Optional ctor (if you're not happy with the datatype defaults)
      internal Builder() {
        _conferenceId = Guid.Empty;
        _seatTypeId = Guid.Empty;
        _name = string.Empty;
        _description = string.Empty;
        _quantity = 0;
      }

      public Builder AtConference(Guid identifier) {
        _conferenceId = identifier;
        return this;
      }

      public Builder IdentifiedBy(Guid identifier) {
        _seatTypeId = identifier;
        return this;
      }

      public Builder Named(string value) {
        _name = value;
        return this;
      }

      public Builder DescribedAs(string value) {
        _description = value;
        return this;
      }

      public Builder WithInitialQuantity(int value) {
        _quantity = value;
        return this;
      }

      public SeatTypeCreated Build() {
        return new SeatTypeCreated(_conferenceId, _seatTypeId, _name, _description, _quantity);
      }
    }
  }

  public static class SampleUsage {
    public static void Show() {
      var _ = new SeatTypeCreated().
        AtConference(Guid.NewGuid()).
        IdentifiedBy(Guid.NewGuid()).
        Named("Terrace Level").
        DescribedAs("Luxurious, bubblegum stain free seats.").
        WithInitialQuantity(25);
      Console.WriteLine(_);

      var builder = _.ToBuilder();

      var __ = builder.
        Named("Balcony level").
        DescribedAs("High end seats. No smoking policy.").
        WithInitialQuantity(45).
        Build();

      Console.WriteLine(__);

      var ___ = _.
        WithInitialQuantity(45); //The result you expect

      Console.WriteLine(___);
    }
  }
}

A slight variation that may prove useful if you know by convention that builders are mutable and events/messages are not. This would allow you to pass a builder around whenever opportunity knocks and get back to the immutable shape when you’re done using the builder. I’ll spare you flavor 9 which combines explicit mutable and immutable builders with implicit builders.

Congratulations, achievement unlocked “scrolled to the end“.

Conclusion

By no means is this list of flavors exhaustive. Other programming languages might have constructs which make authoring this way easier. Overall, I’m quite content and comfortable with using builders, having shifted somewhat more to the immutable side of the flavor spectrum. I’m pretty sure they’re not for everybody, and that’s just fine. The choice of which flavor to use highly depends on the scenario at hand, how comfortable a team is with them in general, and what benefits can be gained from them. I also feel you shouldn’t try to force the same type of builder on all events/messages, since for simple ones the perceived complexity might not be warranted. A consistent approach is not something I’m particularly fond of, since it’s mainly driven by the human desire to do everything the same way, not by reasoning about what would be the best choice in a particular situation. Then again, I probably think too much – pretty sure some will think I’ve gone bananas. Maybe I have … what a way to end a post. Later!

Object Inheritance

When mentioning “object inheritance” most people immediately think of “class inheritance“. Alas, that’s not what it is. Quoting from Streamlined Object Modeling*:

Object inheritance allows two objects representing a single entity to be treated as one object by the rest of the system.

Put a different way, where class inheritance allows for code reuse, object inheritance allows for data reuse.

Application

Where could this be useful? Depending on the domain you are in, I’d say quite a few places. Whenever you replicate data from one object to another object, you should stop and consider if “object inheritance” could be applicable. While object inheritance does impose some constraints, it also saves you from writing reconciliation code to synchronize two or more objects (especially in systems that heavily rely on messaging).
A common example I like to use is the one of a video title and a video tape (**). From a real world point of view, a video tape has both the attributes of the tape itself and those of the title. Yet, if I model this as two separate objects, I run into trouble (a.k.a. synchronization woes). If I copy the title information to the tape upon creation of the tape, and I made an error in, say, the release date of the title, I now have to replicate that correction to the “affected” tapes. Don’t get me wrong, sometimes this kind of behavior is desirable, i.e. it makes sense from a business perspective. But what if it doesn’t? That’s where “object inheritance” comes in. To a large extent, it can cover the scenarios that a synchronization based solution can, IF its constraints are met.

Constraints

  • Localized data: “Object inheritance” assumes that the data of both objects lives close together. That might be a deal breaker for larger systems, or at least an indication that you’d have to consider co-locating the data of “both” objects.
  • Immutable data: the data of one of the objects, the “parent”, is immutable during object inheritance. From an OOP perspective that means you can only invoke side-effect free functions on a “parent”.
  • “Parent” object responsibilities: the parent object contains information and behaviors that are valid across multiple contexts, multiple interactions, and multiple variations of an object.
  • “Child” object responsibilities: the child object represents the parent in a specialized context, in a particular interaction, or as a distinct variation.

The aforementioned book also states other constraints such as the child object exhibiting the parent object’s “profile” (think interface), but I find those less desirable in an environment that uses CQRS. For more in-depth information, I strongly suggest you read it. It has a wealth of information on this very topic.

Code

In its simplest form “object inheritance” looks and feels like object composition.

The composition can be hidden behind a repository’s facade from the code that consumes the child object, as shown below.

The gist of “object inheritance” is that the child object (the video tape) asks the parent (the video) for data or invokes side-effect free functions on the parent to accomplish its task.
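
To make that a bit more tangible, here’s a minimal sketch using the video example. The type and member names (and the made-up rental rule) are illustrative, not taken from the book or from any real model:

using System;
using System.Collections.Generic;

public class VideoTitle { //the "parent": immutable here, exposing data and side-effect free functions
  readonly string _name;
  readonly DateTime _releaseDate;

  public VideoTitle(string name, DateTime releaseDate) {
    _name = name;
    _releaseDate = releaseDate;
  }

  public string Name { get { return _name; } }

  public bool WasReleasedBefore(DateTime pointInTime) { //side-effect free function
    return _releaseDate < pointInTime;
  }
}

public class VideoTape { //the "child": represents the title in the context of one physical tape
  readonly Guid _tapeId;
  readonly VideoTitle _title;
  bool _rentedOut;

  public VideoTape(Guid tapeId, VideoTitle title) {
    _tapeId = tapeId;
    _title = title;
  }

  public string Title { get { return _title.Name; } } //data reuse: the tape holds no copy of the title's name

  public bool RentedOut { get { return _rentedOut; } }

  public void RentOut(DateTime asOf) {
    //The child asks the parent for data or invokes side-effect free functions to do its job.
    //(the "no rentals of titles older than 50 years" rule is made up for the sake of the example)
    if (_title.WasReleasedBefore(asOf.AddYears(-50)))
      throw new InvalidOperationException("Archive titles can no longer be rented out.");
    _rentedOut = true;
  }
}

//The composition can be hidden behind a repository facade: consumers just ask for a tape.
public class VideoTapeRepository {
  readonly Dictionary<Guid, Guid> _titleOfTape = new Dictionary<Guid, Guid>(); //tape id -> title id
  readonly Dictionary<Guid, VideoTitle> _titles = new Dictionary<Guid, VideoTitle>(); //in-memory stand-ins for real storage

  public VideoTape Get(Guid tapeId) {
    return new VideoTape(tapeId, _titles[_titleOfTape[tapeId]]);
  }
}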

Lunch -> Not Free

Whether you go down the route of “object inheritance” or (message based) “synchronization”, you will have to think about object life-cycles (both parent and child). Sometimes it’s desirable for the child to have a still view of the parent, a snapshot if you will. Other times you may want the child to always see the parent in its most current form. Children can even change parent during their life-cycle. Other children may want a more “controlled” view of the parent, rolling forward or backward based on their needs or context. You can get pretty sophisticated with this technique, especially in an event sourced domain model since it’s very well suited to roll up to a certain point in time or a certain revision of a parent. In the same breath, I should also say that you can very well shoot yourself in the foot with this technique, especially if the composition is to be taking place in the user interface, in which case there’s no point in using this. It’s also very easy to bury yourself in the pit of abstraction when talking about “child” and “parent”, so do replace those words with nouns from your domain, and get your archetypes right.

All in all, I’ve found this to be a good tool in the box that plays nicely with aggregates, event sourcing, and even state-based models that track the concept of time. It’s not something I use every day, but whenever I did, I ended up with less code. YMMV.

(*) Streamlined object modeling is a successor to Peter Coad’s “Modeling In Color“. A Kindle version of the former is available.
(**) I’m from the “cassette & video” ages.

The Money Box

Once in a while I hear/read people struggle with projection performance. The root causes of their performance issues are diverse:

  • using the same persistence behavior during projection rebuild as during live/production build
  • use of an ill-fitting technology (here’s looking at you, EF) or use of said technology in the wrong way
  • too wide a persistence interface, which makes it difficult to optimize for performance
  • no batching support or batching as an afterthought
  • not thinking about the implication of performing reads during projection

Ever heard of the book Refactoring to Patterns? It has a nice refactoring in there, called Move Accumulation to Collecting Parameter that refers to the Collecting Parameter pattern on C2. How would this help with thy projections? Well, what if you could decouple the act of performing an action from collecting what is required to be able to perform an action? Put another way, what if you decouple the act of executing SQL DML statements from collecting those statements during projection (a.k.a. event handling)? So, instead of …
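
… something along these lines – a sketch with illustrative names, in which the projection handler talks to the database itself, one round trip per event …

using System;
using System.Data.SqlClient;

public class SeatTypeCreated { //trimmed-down event, reused here purely for illustration
  public Guid ConferenceId { get; set; }
  public Guid SeatTypeId { get; set; }
  public string Name { get; set; }
}

public class SeatTypeProjectionHandler {
  readonly string _connectionString;

  public SeatTypeProjectionHandler(string connectionString) {
    _connectionString = connectionString;
  }

  public void Handle(SeatTypeCreated @event) {
    using (var connection = new SqlConnection(_connectionString))
    using (var command = connection.CreateCommand()) {
      command.CommandText = "INSERT INTO SeatType (Id, ConferenceId, Name) VALUES (@Id, @ConferenceId, @Name)";
      command.Parameters.AddWithValue("@Id", @event.SeatTypeId);
      command.Parameters.AddWithValue("@ConferenceId", @event.ConferenceId);
      command.Parameters.AddWithValue("@Name", @event.Name);
      connection.Open();
      command.ExecuteNonQuery(); //flushed right here, whether live or rebuilding
    }
  }
}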

… we add another level of indirection …
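
A sketch of what that indirection might look like – the member shape of IProjectionSqlOperations is an assumption on my part; the point is that the handler merely requests operations (the event is the same as in the sketch above):

using System.Data.SqlClient;

public interface IProjectionSqlOperations {
  //One narrow member is enough for this sketch: INSERT, UPDATE and DELETE all fit through here.
  void ExecuteNonQuery(string text, params SqlParameter[] parameters);
}

public class SeatTypeProjectionHandler {
  readonly IProjectionSqlOperations _operations;

  public SeatTypeProjectionHandler(IProjectionSqlOperations operations) {
    _operations = operations;
  }

  public void Handle(SeatTypeCreated @event) {
    //No reads, and no promise as to when (or how) this reaches storage.
    _operations.ExecuteNonQuery(
      "INSERT INTO SeatType (Id, ConferenceId, Name) VALUES (@Id, @ConferenceId, @Name)",
      new SqlParameter("@Id", @event.SeatTypeId),
      new SqlParameter("@ConferenceId", @event.ConferenceId),
      new SqlParameter("@Name", @event.Name));
  }
}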

The most noticeable differences are the decoupling from persistence technology(*), no reads, and no promises with regard to when the requested operations will be executed/flushed to storage. Usually, the IProjectionSqlOperations interface will have a very small surface (i.e. low member count), covering INSERT, UPDATE and DELETE. During live/production projection building you could have an implementation of this interface (a.k.a. strategy) that flushes as soon as an operation is requested.

However, the more interesting implementations are the ones that are used during rebuild.
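
For instance, a rebuild-time strategy could turn each requested operation into a statement object and hand it off to an observer. This sketch reuses the IProjectionSqlOperations shape from above; the ISqlStatement shape and the use of IObserver<T> as the contract are assumptions made to keep the example short:

using System;
using System.Data.SqlClient;

public interface ISqlStatement {
  string Text { get; }
  SqlParameter[] Parameters { get; }
}

public class SqlNonQueryStatement : ISqlStatement {
  readonly string _text;
  readonly SqlParameter[] _parameters;

  public SqlNonQueryStatement(string text, SqlParameter[] parameters) {
    _text = text;
    _parameters = parameters;
  }

  public string Text { get { return _text; } }
  public SqlParameter[] Parameters { get { return _parameters; } }
}

public class ObservingProjectionSqlOperations : IProjectionSqlOperations {
  readonly IObserver<ISqlStatement> _observer;

  public ObservingProjectionSqlOperations(IObserver<ISqlStatement> observer) {
    _observer = observer;
  }

  public void ExecuteNonQuery(string text, params SqlParameter[] parameters) {
    //Collect, don't execute - the observer decides when these hit the database.
    _observer.OnNext(new SqlNonQueryStatement(text, parameters));
  }
}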

This implementation translates the requested operations into sql statement objects (abstracted by ISqlStatement) and pushes them onto something that observes these sql statements. The observer couldn’t care less what the actual sql statements are (that happened in the projection handler above). The simplest observer implementation could look something like this …
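
For example, building on the ISqlStatement and observer contract sketched above:

using System;
using System.Collections.Generic;

public class CollectingSqlStatementObserver : IObserver<ISqlStatement> {
  readonly List<ISqlStatement> _statements = new List<ISqlStatement>();

  //The money in the box: whatever was collected, ready to be flushed by somebody else.
  public IEnumerable<ISqlStatement> Statements { get { return _statements; } }

  public void OnNext(ISqlStatement statement) { _statements.Add(statement); }
  public void OnCompleted() { }
  public void OnError(Exception error) { }
}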

Of course, collecting in and by itself is not all that useful. You have to do something with what you’ve collected (“the money in the box”). Let’s look at another observer that takes a slightly different approach.
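
Here’s a sketch of such an observer, again building on the shapes above. The parameter renaming is a naive trick to keep the example short; a real implementation would be more careful:

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Text;

public class FlushingSqlStatementObserver : IObserver<ISqlStatement> {
  const int FlushThreshold = 50; //hard-coded threshold, as mentioned below
  readonly string _connectionString;
  readonly List<ISqlStatement> _statements = new List<ISqlStatement>();

  public FlushingSqlStatementObserver(string connectionString) {
    _connectionString = connectionString;
  }

  public void OnNext(ISqlStatement statement) {
    _statements.Add(statement);
    if (_statements.Count >= FlushThreshold) Flush();
  }

  public void OnCompleted() { Flush(); }
  public void OnError(Exception error) { } //error handling left as an exercise

  void Flush() {
    if (_statements.Count == 0) return;
    using (var connection = new SqlConnection(_connectionString)) {
      connection.Open();
      using (var command = connection.CreateCommand()) {
        var text = new StringBuilder();
        var index = 0;
        foreach (var statement in _statements) {
          var statementText = statement.Text;
          foreach (var parameter in statement.Parameters) {
            //Naive rename to avoid parameter name collisions within the batch
            //(assumes no parameter name is a prefix of another one).
            var uniqueName = parameter.ParameterName + "_" + index;
            statementText = statementText.Replace(parameter.ParameterName, uniqueName);
            command.Parameters.AddWithValue(uniqueName, parameter.Value);
          }
          text.Append(statementText).Append(";");
          index++;
        }
        command.CommandText = text.ToString();
        command.ExecuteNonQuery(); //one round trip per batch - mind SQL Server's parameter limit per command
      }
    }
    _statements.Clear();
  }
}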

Without diving too much into the details, this observer flushes statements to the database as soon as a hard-coded threshold is reached. It does so in a batch-like fashion to minimize the number of roundtrips, while still adhering to the limitations that come with this particular ADO.NET data provider. Other implementations use SqlBulkCopy to maximize performance (but come with their own limitations). Depending on what memory resources a server has, you could get pretty creative as to which strategy you choose to rebuild a large projection.

Conclusion

I’ve shown you SQL centric projections, but please, do step out of the box. Nothing is stopping you from producing and collecting “HttpStatements” for your favorite key-value store or “FileOperations” for your file-based projections. Nothing is stopping you from making different choices for the producing and consuming side. Nothing is stopping you from doing the “statement” execution in an asynchronous and/or parallel fashion. It’s just a matter of exploring your options and using what works in your environment. Next time I’ll show you how reading fits into all this …

(*): Yeah, yeah, I know, too much abstraction kills kittens …

Using Windows Azure to build a simple EventStore

Continuing my journey, I figured it shouldn’t be that hard to apply “Your EventStream is a linked list” using Windows Azure Storage Services.

Disclaimer: Again, this is just a proof of concept, a way to get started, not production code … obviously. I’m assuming you know your way around Windows Azure (how to set up an account, stuff like that) and how the .NET SDK works.

Just like with Amazon Web Services, the first things to create are the containers that will harbor the changesets and aggregate heads. Unlike the AWS example, I’ve chosen to store both using the same service, Windows Azure Blob Storage. Why? Because it offers optimistic concurrency by leveraging ETags. The only unfortunate side-effect is that it seems to force me to break CQS, but I can live with that for now. Windows Azure Blob Storage provides convenience methods to create the containers only when they don’t exist.

//Setting up the Windows Azure Blob Storage container to store changesets and aggregate heads

const string PrimaryAccessKey = "your-key-here-for-testing-purposes-only-ofcourse-;-)";
const string AccountName = "youraccountname";
const string ChangesetContainerName = "changesets";
const string AggregateContainerName = "aggregates";

var storageAccount =
  new CloudStorageAccount(
      new StorageCredentialsAccountAndKey(
            AccountName,
            PrimaryAccessKey),
      false);

var blobClient = storageAccount.CreateCloudBlobClient();

var changesetContainer = blobClient.GetContainerReference(ChangesetContainerName);
changesetContainer.CreateIfNotExist();

var aggregateContainer = blobClient.GetContainerReference(AggregateContainerName);
aggregateContainer.CreateIfNotExist();

Now that we’ve set up the infrastructure, let’s tackle the scenario of storing a changeset. The flow is pretty simple. First we try to store the changeset in the changeset container as a blob.

public class ChangesetDocument {
      public const long InitialVersion = 0;

      //Exposing internals to make the sample easier
      public Guid ChangesetId { get; set; }
      public Guid? ParentChangesetId { get; set; }
      public Guid AggregateId { get; set; }
      public long AggregateVersion { get; set; }
      public string AggregateETag { get; set; }

      public byte[] Content { get; set; }
}

//Assuming there's a changeset document we want to store,
//going by the variable name 'document'.

var changesetBlob = changesetContainer.GetBlobReference(document.ChangesetId.ToString());
var changesetUploadOptions = new BlobRequestOptions {
      AccessCondition = AccessCondition.None,
      BlobListingDetails = BlobListingDetails.None,
      CopySourceAccessCondition = AccessCondition.None,
      DeleteSnapshotsOption = DeleteSnapshotsOption.None,
      RetryPolicy = RetryPolicies.NoRetry(),
      Timeout = TimeSpan.FromSeconds(90),
      UseFlatBlobListing = false
};
changesetBlob.UploadByteArray(document.Content, changesetUploadOptions);

const string AggregateIdMetaName = "aggregateid";
const string AggregateVersionMetaName = "aggregateversion";
const string ChangesetIdMetaName = "changesetid";
const string ParentChangesetIdMetaName = "parentchangesetid";

//Set the meta-data of the changeset
//Notice how this doesn't need to be transactional
changesetBlob.Metadata[AggregateIdMetaName] = document.AggregateId.ToString();
changesetBlob.Metadata[AggregateVersionMetaName] = document.AggregateVersion.ToString();
changesetBlob.Metadata[ChangesetIdMetaName] = document.ChangesetId.ToString();
if(document.ParentChangesetId.HasValue)
      changesetBlob.Metadata[ParentChangesetIdMetaName] = document.ParentChangesetId.Value.ToString();
changesetBlob.SetMetadata();

If that goes well, we try to upsert the head of the aggregate to get it to point to this changeset. Below, we’re using the ETag of the aggregate head blob as a way of doing optimistic concurrency checking. It caters for both the initial (competing inserters) and update (competing updaters) concurrency.

public static class ExtensionsForChangeDocument {
      public static AccessCondition ToAccessCondition(this ChangesetDocument document) {
            if(document.AggregateVersion == ChangesetDocument.InitialVersion) {
                  return AccessCondition.IfNoneMatch("*");
            }
            return AccessCondition.IfMatch(document.AggregateETag);
      }
}

//Upsert the aggregate
var aggregateBlob = aggregateContainer.GetBlobReference(document.AggregateId.ToString());
var aggregateUploadOptions = new BlobRequestOptions {
      AccessCondition = document.ToAccessCondition(),
      BlobListingDetails = BlobListingDetails.None,
      CopySourceAccessCondition = AccessCondition.None,
      DeleteSnapshotsOption = DeleteSnapshotsOption.None,
      RetryPolicy = RetryPolicies.NoRetry(),
      Timeout = TimeSpan.FromSeconds(90),
      UseFlatBlobListing = false
};
aggregateBlob.UploadByteArray(document.ChangesetId.ToByteArray(), aggregateUploadOptions);
//Here's where we are breaking CQS if we'd like to cache the aggregate.
//This won't be a problem if we're re-reading the aggregate upon each behavior.
var eTag = aggregateBlob.Properties.ETag;

If the UploadByteArray operation throws a StorageClientException indicating that “The condition specified using HTTP conditional header(s) is not met.”, we know we ran into an optimistic concurrency conflict. In such a case, it’s best to repeat the entire operation.
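
A minimal retry sketch – AppendChangeset and BuildChangesetDocument are hypothetical wrappers around the blob upload and head upsert shown above, and the assumption that the failed conditional check surfaces as an HTTP 412 should be verified against the storage client version you’re using:

const int MaxAttempts = 3;
for (var attempt = 1; ; attempt++) {
  try {
    //Hypothetical wrapper around: upload changeset blob + set metadata + upsert aggregate head.
    AppendChangeset(BuildChangesetDocument());
    break;
  } catch (StorageClientException exception) {
    //Assumption: the failed conditional header check surfaces as HTTP 412 (PreconditionFailed).
    if (exception.StatusCode != HttpStatusCode.PreconditionFailed || attempt == MaxAttempts)
      throw;
    //Somebody else won the race: re-read the aggregate head, rebase and try again.
  }
}
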
Now that we’ve dealt with writing, let’s take a look at reading. First we need to fetch the pointer to the last stored and approved changeset identifier.

var aggregateBlob = aggregateContainer.GetBlobReference(aggregateId.ToString());
var aggregateDownloadOptions = new BlobRequestOptions {
      AccessCondition = AccessCondition.None,
      BlobListingDetails = BlobListingDetails.None,
      CopySourceAccessCondition = AccessCondition.None,
      DeleteSnapshotsOption = DeleteSnapshotsOption.None,
      RetryPolicy = RetryPolicies.NoRetry(),
      Timeout = TimeSpan.FromSeconds(90),
      UseFlatBlobListing = false
};
var changesetId = new Guid?(new Guid(aggregateBlob.DownloadByteArray(aggregateDownloadOptions)));
var eTag = aggregateBlob.Properties.ETag;

Now that we’ve bootstrapped the reading process, we keep reading each changeset until there’s no more changeset to read. Each approved changeset contains metadata that points to the previous approved changeset. It’s the responsibility of the calling code to do something useful with the read changesets (e.g. deserialize the content and replay each embedded event into the corresponding aggregate).

while(changesetId.HasValue) {
      var changesetBlob = changesetContainer.GetBlobReference(changesetId.Value.ToString());
      var changesetDownloadOptions = new BlobRequestOptions {
            AccessCondition = AccessCondition.None,
            BlobListingDetails = BlobListingDetails.None,
            CopySourceAccessCondition = AccessCondition.None,
            DeleteSnapshotsOption = DeleteSnapshotsOption.None,
            RetryPolicy = RetryPolicies.NoRetry(),
            Timeout = TimeSpan.FromSeconds(90),
            UseFlatBlobListing = false
      };
      var content = changesetBlob.DownloadByteArray(changesetDownloadOptions);
      changesetBlob.FetchAttributes();
      var document = new ChangesetDocument {
            AggregateETag = eTag,
            AggregateId = new Guid(changesetBlob.Metadata[AggregateIdMetaName]),
            AggregateVersion = Convert.ToInt64(changesetBlob.Metadata[AggregateVersionMetaName]),
            ChangesetId = new Guid(changesetBlob.Metadata[ChangesetIdMetaName]),
            Content = content,
      };
      if (changesetBlob.Metadata[ParentChangesetIdMetaName] != null)
            document.ParentChangesetId = new Guid(changesetBlob.Metadata[ParentChangesetIdMetaName]);

      yield return document;

      changesetId = document.ParentChangesetId;
}

The only *weird* thing with this code is that I’m propagating the aggregate’s head ETag using the changesets. It’s a modeling issue I’ll have to revisit ;-).
But that’s basically all there is to it. In reality, you’ll need a lot more metadata and error handling to make this a success. I should point out that the performance of this service, consumed on premise, was better than what I experienced with AWS.

Conclusion

Using Windows Azure Storage Services is not so different from using the Amazon Web Services in this case. However, this overall technique suffers from a few drawbacks. As mentioned before, upon a concurrency conflict, you might be wasting some storage space. Another drawback is the fact that you need to read all changesets that make up the history of an aggregate (or event source, if you want to decouple it from DDD terminology) before being able to apply the first event. There are ways around this, such as storing all the changeset identifiers in the aggregate head if the total number of behaviors in an aggregate is low on average. You could even partition the aggregate head into multiple documents, using nothing but its version number or count of applied behaviors to partition, but that’s the subject of a future exploration. I apologize if this post is somewhat of a copy-paste of its AWS counterpart, but given the goal and similarities that was to be expected :-).

Using Amazon Web Services to build a simple Event Store

My post on “Your EventStream is a linked list” might have been somewhat abstract (a more prosaic version can be found here). By way of testing my theory, I set out to apply it using the Amazon Web Services (AWS) stack. I used AWS DynamoDB as the transactional medium that would handle the optimistic concurrency, and AWS S3 to store the changesets.

Disclaimer: This is just a proof of concept, a way to get started. It’s not production code … obviously 🙂 I’m assuming you know your way around AWS (how to set up an account, stuff like that) and how the .NET SDK works.

The first thing to do is to create the container for the changesets. In AWS S3 those things are called buckets. This is a one-time operation. You might want to make this conditional at the startup of your application/worker role/what-have-you. The AWS S3 API provides mechanisms to query for buckets, so you should be able to pull that off. On the other hand, when creating buckets using the REST API, the HTTP PUT verb is used, and we all know that PUT is supposed to be idempotent. You might want to read up on the PUT Bucket API. One last thing: bucket names are unique across all of S3.

//Setting up the Amazon S3 bucket to store changesets in
const string ChangesetBucketName = "yourorganization_yourboundedcontextname_changesets";

var s3Client = AWSClientFactory.CreateAmazonS3Client();

s3Client.PutBucket(
  new PutBucketRequest().
    WithBucketName(ChangesetBucketName).
    WithBucketRegion(S3Region.EU)); //Your region might vary

The same thing needs to happen in AWS DynamoDB. There, the containers are called tables. Again, this is a one-time, conditional operation. You might want to read up on the specifics of table naming and uniqueness. The throughput provisioning is not something you want to be static. Monitor the load on your system (there are notification events/alerts for that), both in terms of the number of reads/writes and the bytes used for storage, and use that information to tune the table throughput settings.

//Setting up the Amazon DynamoDB table to store aggregates in
const string AggregateTableName = "yourboundedcontextname_aggregates";
const string AggregateIdName = "aggregate-id";

//At the time of writing, DynamoDBClient didn't make it yet
//into the AWSClientFactory.
var dynamoDbClient = new AmazonDynamoDBClient();

dynamoDbClient.CreateTable(
  new CreateTableRequest().
    WithTableName(AggregateTableName).
    WithKeySchema(
      new KeySchema().
        WithHashKeyElement(
          new KeySchemaElement().
            WithAttributeName(AggregateIdName).
            WithAttributeType("S")
        ).
        WithRangeKeyElement(null)
    ).
    WithProvisionedThroughput(
      new ProvisionedThroughput().
        WithReadCapacityUnits(300).
        WithWriteCapacityUnits(500)
    )
  );

Now that we’ve set up the infrastructure, let’s tackle the scenario of storing a changeset. The flow is pretty simple. First we try to store the changeset in AWS S3 as an object in the configured bucket.

//An internal representation of the changeset
public class ChangesetDocument {
  public const long InitialValue = 0;

  //Exposing internal state here to
  //simplify the example.
  public Guid AggregateId { get; set; }
  public long AggregateVersion { get; set; }
  public Guid ChangesetId { get; set; }
  public Guid? ParentChangesetId { get; set; }

  public byte[] Content { get; set; }

  public Stream GetContentStream() {
    return new MemoryStream(Content, writable: false);
  }
}

//Assuming there's a changeset document we want to store,
//going by the variable name 'document'.

const string AggregateIdMetaName = "x-amz-meta-aggregate-id";
const string AggregateVersionMetaName = "x-amz-meta-aggregate-version";
const string ChangesetIdMetaName = "x-amz-meta-changeset-id";
const string ParentChangesetIdMetaName = "x-amz-meta-parent-changeset-id";

var putObjectRequest = new PutObjectRequest().
  WithBucketName(ChangesetBucketName).
  WithGenerateChecksum(true).
  WithKey(document.ChangesetId.ToString()).
  WithMetaData(ChangesetIdMetaName, document.ChangesetId.ToString()).
  WithMetaData(AggregateIdMetaName, document.AggregateId.ToString()).
  WithMetaData(AggregateVersionMetaName, Convert.ToString(document.AggregateVersion));

if (document.ParentChangesetId.HasValue) {
  putObjectRequest.WithMetaData(ParentChangesetIdMetaName, document.ParentChangesetId.Value.ToString());
}

putObjectRequest.WithInputStream(document.GetContentStream());

s3Client.PutObject(putObjectRequest);

If that goes well, we try to create an item in the configured AWS DynamoDB table. The expected values below are the AWS DynamoDB way of doing optimistic concurrency checking. They cater for both the initial (competing inserters) and update (competing updaters) concurrency.

In this example I’m using an incrementing version number to do that. Strictly speaking, I could be using the changeset identifier for that, but an incrementing version number is easier to relate to when you come from an ORM/database background.

const string AggregateVersionName = "aggregate-version";
const string ChangesetIdName = "changeset-id";

public static class ExtensionsForChangesetDocument {
  public static KeyValuePair<string, ExpectedAttributeValue>[] ToExpectedValues(this ChangesetDocument document) {
    var dictionary = new Dictionary<string, ExpectedAttributeValue>();
    if(document.AggregateVersion == ChangesetDocument.InitialValue) {
      //Make sure we're the first to create the aggregate
      dictionary.Add(AggregateIdName,
        new ExpectedAttributeValue().
          WithExists(false)
      );
    } else {
      //Make sure nobody changed the aggregate behind our back 
      dictionary.Add(AggregateIdName,
        new ExpectedAttributeValue().
          WithExists(true).
          WithValue(
            new AttributeValue().
              WithS(document.AggregateId.ToString()))
      );
      dictionary.Add(AggregateVersionName,
        new ExpectedAttributeValue().
          WithValue(
            new AttributeValue().
              WithN(Convert.ToString(document.AggregateVersion - 1)))
      );
    }
    return dictionary.ToArray();
  }

  public static KeyValuePair<string, AttributeValue>[] ToItemValues(this ChangesetDocument document) {
    //The relevant values to store in the item.
    var dictionary = new Dictionary<string, AttributeValue> {
      {AggregateIdName, new AttributeValue().WithS(document.AggregateId.ToString())},
      {AggregateVersionName, new AttributeValue().WithN(Convert.ToString(document.AggregateVersion))},
      {ChangesetIdName, new AttributeValue().WithS(document.ChangesetId.ToString())},
    };
    return dictionary.ToArray();
  }
}

dynamoDbClient.PutItem(
  new PutItemRequest().
    WithTableName(AggregateTableName).
    WithExpected(document.ToExpectedValues()).
    WithItem(document.ToItemValues())
  );

If the PutItem operation throws an exception indicating a “ConditionalCheckFailed”, we know we ran into an optimistic concurrency conflict. In such a case, it’s best to repeat the entire operation.

I’ve left duplicate command processing elimination as an exercise for the reader 😉

The only thing left to do is to show the reverse operation: reading. First we need to fetch the pointer to the last stored and approved changeset identifier. I hope you do realize that a consistent read is not strictly necessary, since the likelihood of concurrency should be low.

var getItemResponse = dynamoDbClient.GetItem(
  new GetItemRequest().
    WithTableName(AggregateTableName).
    WithKey(
      new Key().
        WithHashKeyElement(
          new AttributeValue().
            WithS(aggregateId.ToString())).
        WithRangeKeyElement(null)).
    WithConsistentRead(false).
    WithAttributesToGet(ChangesetIdName)
  );

Now that we’ve bootstrapped the reading process, we keep reading each changeset until there’s no more changeset to read. Each approved changeset contains metadata that points to the previous approved changeset. It’s the responsibility of the calling code to do something useful with the read changesets (e.g. deserialize the content and replay each embedded event into the corresponding aggregate).

var changesetId = new Guid?(new Guid(getItemResponse.GetItemResult.Item[ChangesetIdName].S));
while(changesetId.HasValue) {
  var getObjectResponse = s3Client.GetObject(
    new GetObjectRequest().
      WithBucketName(ChangesetBucketName).
      WithKey(changesetId.Value.ToString()));
  var document = new ChangesetDocument {
    AggregateId = new Guid(getObjectResponse.Metadata[AggregateIdMetaName]),
    AggregateVersion = Convert.ToInt64(getObjectResponse.Metadata[AggregateVersionMetaName]),
    ChangesetId = new Guid(getObjectResponse.Metadata[ChangesetIdMetaName]),
    Content = getObjectResponse.ResponseStream.ToByteArray()
  };
  var parentChangesetIdAsString = getObjectResponse.Metadata[ParentChangesetIdMetaName];
  if(parentChangesetIdAsString != null) {
    document.ParentChangesetId = new Guid(parentChangesetIdAsString);
  }

  changesetId = document.ParentChangesetId;

  yield return document;
}

And that’s basically all there is to it. In reality, you’ll need a lot more metadata and error handling to make this a success. I should point out that the performance of these services, consumed on premise, was abominable. I’m assuming (hoping) that if you consume them from an AWS EC2 instance, performance will be much better.

Conclusion

A clever reader will notice that this technique can easily be transposed to other document/data stores. The specifics will vary, but the theme will be the same. The reason for applying this technique has mainly to do with the inherent constraints of certain datastores. The transactional ones – especially in cloudy environments – are limited with regard to the number of bytes they can store, which makes them less desirable for storing payloads in (prediction of payload size can be hard at times). There are easy ways to side-step concurrency problems, such as serializing all aggregate access to one worker/queue, but that’s another discussion.

Your EventStream is a linked list

A week ago I had an interesting twonversation with Jérémie Chassaing and Rinat Abdullin about event streams. I mentioned how, in the past, I had been toying with event streams as linked lists of changesets (to the aficionados of Jonathan Oliver’s EventStore, this is very much akin to what he calls Commits). As a way of documenting some of my thoughts on the subject, I’m putting up some schematics here.

Model of an event stream

[Figure: Model]

From the above picture you can deduce that an event stream is really a simple thing: a collection of changesets. A changeset itself is a collection of events that occurred (how you got there is not important at this point). Event streams and changesets both have a form of unique identity. Head marks the identity of the latest known changeset. A changeset is immutable. For the sake of simplicity I’ve omitted any form of headers/properties you can attach to an event stream, a changeset and/or an event.

Changesets as a LinkedList

[Figure: EventStream - Model]

Each changeset knows its “parent”, i.e. the changeset that it should be appended to, except for the very first changeset, which obviously does not have a “parent” (strictly speaking you could have an explicit terminator “parent”). Chronologically, changeset 1 came before changeset 2, changeset 2 came before changeset 3, and so on.

Looking at the write side of a system that uses event streams for storage, there are two main scenarios:

  1. Writing a changeset of a given event stream: concerns here are duplicate changeset elimination & detecting conflicts, besides the act of actually writing.
  2. Reading the entire event stream: the main concern here is reading all changesets of the event stream as fast as we can, in order.

I’m well aware I’ve omitted other concerns such as automatic event upgrading, event dispatching, snapshotting which, frankly, are distractions at this point.

Reading

[Figure: Changesets As Files]

Supposing that each changeset is, say, a file on disk, how would I know where to start reading? There are various options, really. The picture above illustrates one option where – by using “<streamid>.txt” as a convention – the Stream Head File is loaded to bootstrap “walking the chain”, by virtue of having it point to the latest changeset document (represented as LatestChangesetRef) that makes up that stream. As each Changeset File is read, it provides a reference/pointer to the next Changeset File to read (represented by ParentRef). That reference is really the identity of the next changeset.

I hope I don’t need to explain why you need to keep those identifiers logical. Don’t make them a “physical” thing, like the path to the next changeset file. That would be really painful if you were ever to restructure the changeset files on disk. Instead, you should delegate the responsibility of translating/resolving a logical identity into its physical location.
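
A tiny sketch of that delegation – the names are mine, purely for illustration:

public interface IChangesetLocationResolver {
  //Translates a logical changeset identity into wherever the changeset physically lives today.
  string ResolveToPath(Guid changesetId);
}

public class ConventionBasedChangesetLocationResolver : IChangesetLocationResolver {
  readonly string _rootDirectory;

  public ConventionBasedChangesetLocationResolver(string rootDirectory) {
    _rootDirectory = rootDirectory;
  }

  public string ResolveToPath(Guid changesetId) {
    //Restructuring the files on disk only affects this one spot, not the stored references.
    return Path.Combine(_rootDirectory, changesetId.ToString("N") + ".txt");
  }
}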

Other options for “where to start reading” could be:

  1. keeping the head of each stream in memory (causing “sticky” streams and having to deal with recovery mechanisms).
  2. storing the head as a record in a database or in blob storage, with concurrency control.

Now, reading each changeset file could become a bit costly if they’re scattered all over the disk. There’s nothing stopping you from throwing all those changeset documents into one big file, even asynchronously. This is where immutability and resolving identities can really help you out. It’s important to distinguish between what happens at the logical level and what happens at the physical level.

[Figure: Alternative Changesets As Files]

Yet another approach might be to keep an index file of all changesets (above represented by the Stream Index File) that make up the event stream (in an append only fashion), thus alleviating the changeset documents from having to know their parents.

Writing

Basically, this operation can be split up into writing the changeset document and updating the head (or index) of the event stream. The advantage here is that storing the changeset document does not require any form of transaction. This allows you to choose from a broader range of data-stores, as there really isn’t a requirement beyond the guarantee that they will remember and serve what you asked them to store. Updating the head of the event stream does require you to at least be able to detect that concurrent writes are happening or have happened, depending on how you want to resolve conflicts. As such, there’s no need to store both of them in the same data-store. Also notice that the duration of the transaction is reduced by taking the changeset itself out of the equation.
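
To make the split explicit, here’s a sketch with hypothetical store abstractions – the names are illustrative, and the shape of the conflict detection (a compare-and-swap here) will differ per data-store:

public interface IChangesetStore {
  //No transaction required: it just needs to remember and serve what you asked it to store.
  void Put(Guid changesetId, byte[] content);
}

public interface IStreamHeadStore {
  //Must at least be able to detect concurrent writes, e.g. via an ETag, a version check or a compare-and-swap.
  bool TryCompareAndSwap(Guid streamId, Guid? expectedHead, Guid newHead);
}

public static class EventStreamWriter {
  public static bool TryAppend(
    IChangesetStore changesets, IStreamHeadStore heads,
    Guid streamId, Guid? expectedHead, Guid changesetId, byte[] content) {
    //Step 1: store the changeset itself, outside of any transaction.
    changesets.Put(changesetId, content);
    //Step 2: the only contended operation - point the head at the new changeset.
    //On false, a dangling changeset remains, which is harmless apart from the wasted storage.
    return heads.TryCompareAndSwap(streamId, expectedHead, changesetId);
  }
}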

When picking a changeset identity, you might be tempted to reuse the identifier of the message that caused the changes to happen (usually the Command’s message identifier). Don’t. Remember, retrying that same command might produce a different set of changes. How are you going to differentiate between rectifying a previous failure with a retry and some other thread trying to process the same message? It’s best to use identities/identifiers for just one purpose.

[Figure: Model - Failure]

What happens when you can’t update the head to your newly inserted changeset identity? You’ll be left with a dangling changeset that didn’t make it into “the circle of trust”. No harm done, except for wasting some storage. If the changesets were blobs in the cloud, it might be useful to have a special purpose daemon hunt these down and remove them (it depends on how much the storage costs versus the cost of building the daemon). In general you should optimize for having as few “concurrency conflicts” as possible (it’s bad for business).

Conclusion

I know there are a lot of holes in this story. That’s intentional; I have a lot of unanswered questions myself. I’m especially interested in any comments that can point me to prior art outside the realm of source control systems. I have no intention of using this in production. It’s just a mental exercise.

Acknowledgements

Most of what I mention here exists in one shape or another. By now some of you will have noticed the resemblance to certain aspects of Git’s internal object model, although I’m inclined to say it’s closer to what Mercurial does. The convention-based file storage can be found in the NoDB implementation by Chris Nicola. Concepts such as commits and event streams (and much more) can be found in the EventStore project.