How to Stop Your Event-Driven Architecture from Turning Into Chaos
The three most common pitfalls organizations fall into when building event-driven architectures.
Event-driven architecture is becoming more and more accessible. It isn’t anything new (it has been around for decades), but with the rise of the cloud and managed services it’s easier than ever to be “event-driven”.
The steps normally look like this:
Create the message
Call an SDK to publish the message (e.g. Azure Event Grid, Confluent, Google Pub/Sub, Amazon SQS).
Connect consumers (now or in the future).
That’s it. All done in less than 100 lines of code (give or take).
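To show how little code those steps need, here is a minimal sketch. It uses a hypothetical in-memory broker (`InMemoryBroker`) instead of a real SDK, and the topic name, event type and payload fields are all made up for illustration. The consumer is connected before publishing only because this toy broker delivers synchronously and keeps no history.

```python
import json
import uuid
from collections import defaultdict
from datetime import datetime, timezone


class InMemoryBroker:
    """Hypothetical stand-in for a managed broker SDK; not a real client."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Serialize once, the way a real broker carries text or bytes on the wire.
        payload = json.dumps(message)
        for handler in self._subscribers[topic]:
            handler(json.loads(payload))


# Step 1: create the message (illustrative fields only).
message = {
    "id": str(uuid.uuid4()),
    "type": "OrderPlaced",
    "time": datetime.now(timezone.utc).isoformat(),
    "data": {"orderId": "order-123", "total": 42.5},
}

# Step 3: connect a consumer.
broker = InMemoryBroker()
received = []
broker.subscribe("orders", received.append)

# Step 2: publish the message.
broker.publish("orders", message)
```

Nothing in this sketch says who owns the event, what the contract is, or who is allowed to depend on it, which is exactly the gap the rest of this post is about.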
We have connected our systems together over a distributed channel, and now we have a loosely coupled, event-driven architecture… but in reality all we have is a distributed big ball of mud.
Over the past 10 years I have spoken to hundreds of companies adopting event-driven architectures, and almost all of them run into the same problems and develop some level of chaos.
I quickly want to highlight some of the common problems and provide you with some resources to help you avoid them.
Problem 1: Distributed big ball of mud
As the barrier to entry lowers for event-driven architecture, it becomes easier and easier to create big balls of mud. The first time I came across the term “big ball of mud” was in Brian Foote and Joseph Yoder’s paper “Big Ball of Mud”.
Foote and Yoder highlight characteristics of this problem, which IMO fit very well with event-driven architecture:
Unregulated growth
Structure of the system may never have been well defined
Information is shared freely among distant elements of the system
Unregulated growth is a common problem. Producers and consumers are disconnected, and consumers come and go as they please. The growth of your architecture gets out of control, and you lose track of what the hell is going on (more on this later).
As the barrier to entry lowers, it’s easy to build systems that were never well defined in the first place. The domains are sparsely defined, and it’s easy to just raise events and move on with your life.
Information is shared freely across your system. Event design gets neglected, you start to expose implementation details, and your contracts end up all over the place.
Everything starts simple and is fine, but quickly develops into a big ball of mud.
Problem 2: Lack of event design and too much coupling
Many companies building APIs today follow best practices when it comes to versioning, naming conventions and documentation (e.g. OpenAPI). It’s quite rare these days that you can go into a big organization and just throw an API out there without any of these concepts being considered.
But when it comes to event design, it’s a free-for-all (in most cases and companies). There is a lack of care for the design of the event itself: what goes in it, and how consumers can couple directly back to the producer through the schema.
Implementation details start to leak into the event. Producers lose track of who is consuming their events, so they run the risk of breaking downstream consumers they have no idea about. Consumers start to conform to the payload of the event, and the coupling increases. Events across the architecture vary in size and standards, and lack a versioning strategy.
It becomes harder and harder to manage events, and you end up in a complete mess.
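As a rough illustration of the difference, the sketch below contrasts an event that simply mirrors an internal database row with one mapped to an explicit, versioned contract. All names here (`internal_row`, `OrderPlacedV1`, `to_public_event` and the field names) are hypothetical.

```python
from dataclasses import asdict, dataclass

# Leaky: publishing this row as-is couples consumers to internal column names.
internal_row = {
    "orders_tbl_pk": 9812,
    "cust_fk": 55,
    "total_cents": 4250,
    "internal_retry_count": 2,
}


@dataclass(frozen=True)
class OrderPlacedV1:
    """Hypothetical public contract: only fields consumers should rely on."""
    order_id: str
    customer_id: str
    total_cents: int
    schema_version: int = 1


def to_public_event(row: dict) -> dict:
    """Map the internal row to the public contract, dropping internal fields."""
    event = OrderPlacedV1(
        order_id=f"order-{row['orders_tbl_pk']}",
        customer_id=f"customer-{row['cust_fk']}",
        total_cents=row["total_cents"],
    )
    return {"type": "OrderPlaced", "data": asdict(event)}


public_event = to_public_event(internal_row)
```

With a mapping like this, the producer can rename a column or add internal bookkeeping fields without touching the payload consumers depend on, and the explicit `schema_version` gives it somewhere to signal a breaking change.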
Problem 3: Lack of discoverability
“Producers don’t know about consumers” is what we read online. We feel like it’s written in stone and that we must follow it. This thought travels with us, and we start to lose track of who is producing what, which versions exist, who is consuming, what the payloads look like, and who owns what. We lose visibility of our architecture.
Things are simple at the start. Discoverability is not a problem until it creeps up on you. People start to ask questions about contracts, consumers, breaking changes, managing the events, and much more.
It becomes hard for architects, developers and product folks to understand the architecture. Rework happens across your organization because no one really knows what events you already have, or even how to consume them.
Teams start to get frustrated, and lose trust in “event-driven architecture”.
Almost all the companies I have spoken to have some level of these problems and chaos.
So the question is, what can you do to help?
What’s the solution?
You don’t have to fall into the same trap everyone else is facing. Some simple techniques can help you avoid huge costs to your organization and manage some of this chaos and complexity.
The TLDR:
Distributed big ball of mud - Don’t fall into the trap of an implementation-first mindset. Stop, think, design and model. There are some great techniques, like EventStorming or EventModeling, that help you capture the domain and start to plan. More resources are below.
Lack of event design and coupling - How you consume the events couples you back to the producer. Don’t just throw anything in your message. Think about the tradeoffs, explore event patterns, and look at standards like CloudEvents. Explore bounded context maps to help.
Lack of discoverability - Don’t let your events and architecture become undiscoverable. Help your teams find the information they need. Start simple with a basic README file, then explore specifications like AsyncAPI, or join me on my mission to build the open source documentation tool EventCatalog.
To dive deeper here are some links I recommend:
Look for the warning signs of big balls of mud
EDA Visual: Big ball of mud (resources to help you dive deeper)
Stop the implementation-first mindset and think about the structure of your system. Explore domain discovery techniques like EventStorming and EventModeling to help you.
Here I have a video on big balls of mud and how they start and how you can fix them
Here is a visual on bounded contexts, what they are and how they can help.
Event design + coupling
Explore standards like CloudEvents, which can help you bring consistency to your messages.
I have a visual here to help you understand why event design is important.
Don’t expose too much information in your events. Here is another visual to help.
There is a difference between internal (private) and public events.
Move to “event-first thinking”. I have a deeper-dive talk on this subject here.
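To give a feel for what a standard like CloudEvents buys you, here is a small sketch that wraps a payload in a CloudEvents 1.0 style envelope using plain dictionaries rather than an SDK. The attribute names (`specversion`, `id`, `source`, `type`, `time`, `datacontenttype`, `data`) come from the specification; the helper functions, the event type and the source value are assumptions made up for this example.

```python
import uuid
from datetime import datetime, timezone

# Attributes the CloudEvents 1.0 specification marks as required.
REQUIRED_ATTRIBUTES = ("specversion", "id", "source", "type")


def make_cloud_event(event_type, source, data):
    """Wrap a payload in a CloudEvents-style envelope (hypothetical helper)."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": data,
    }


def missing_attributes(event):
    """Return the required attributes that are absent or empty."""
    return [name for name in REQUIRED_ATTRIBUTES if not event.get(name)]


event = make_cloud_event(
    "com.example.order.placed.v1",  # illustrative type name
    "/orders-service",              # illustrative source
    {"orderId": "order-123"},
)
```

Every event then carries the same metadata in the same place, so consumers and tooling can route, filter and trace events without knowing anything about the payload.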
Lack of discoverability
Help your teams find the information they need. A simple README is better than nothing.
Explore tools and specs like AsyncAPI or EventCatalog to help you.
Why not use AI to help you ask questions about your architecture? There may be some value here for you.
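If you want to start with the “simple README” idea, one possible approach is to keep a tiny registry of events next to your code and generate the README text from it. This is only a sketch: `EventEntry`, `consumers_of`, `render_readme` and the sample teams and services are all hypothetical, and it is not how AsyncAPI or EventCatalog work.

```python
from dataclasses import dataclass, field


@dataclass
class EventEntry:
    """One row of a hypothetical event registry kept alongside the code."""
    name: str
    version: int
    owner: str
    producer: str
    consumers: list = field(default_factory=list)


def consumers_of(entries, name):
    """Return the known consumers across every version of an event."""
    found = []
    for entry in entries:
        if entry.name == name:
            found.extend(entry.consumers)
    return sorted(set(found))


def render_readme(entries):
    """Render the registry as plain text suitable for a README."""
    lines = ["Events", ""]
    for entry in sorted(entries, key=lambda e: (e.name, e.version)):
        consumers = ", ".join(entry.consumers) or "none known"
        lines.append(f"{entry.name} v{entry.version}")
        lines.append(f"  owner: {entry.owner}")
        lines.append(f"  producer: {entry.producer}")
        lines.append(f"  consumers: {consumers}")
    return "\n".join(lines)


registry = [
    EventEntry("OrderPlaced", 1, "orders-team", "orders-service",
               ["billing-service", "shipping-service"]),
    EventEntry("PaymentFailed", 1, "payments-team", "payments-service"),
]

print(render_readme(registry))
```

Even something this small answers the questions that come up first: who owns an event, which version is live, and who might break if it changes.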
I hope some of these resources can help you learn more and avoid some of the common pitfalls I see folks make. If you have any questions let me know, happy to help!
If you want to join over 1,100 of us on Discord, feel free! Bring your EDA questions, or anything else you have.




