Here we are at part four of my chronicle of converting a monolith to microservices! Let's talk long-running processes!
Our microservices system has some long-running processes. An example might look something like this:
- A scheduled event fires off
- Query a table to check our inventory of phone numbers
- Call an API to purchase some phone numbers
- Insert them into a table
My company has years of experience using RabbitMQ, so we decided early on to use it for messaging. For long-running processes, we considered MassTransit. MassTransit is a service bus that supports RabbitMQ as a transport. Not only that, MassTransit has a built-in feature to handle orchestration problems like ours. It's called a saga.
A saga is a long-lived transaction managed by a coordinator, which sounds just about perfect for what we need. The saga maintains the state of the overall transaction in one central place. What we didn't like was exactly that: having a central manager for our workflows. Centralized state provides durability, but it also creates a bottleneck, because there's a lot of back-and-forth messaging to and from the saga, and that can slow things down. Enter the routing slip.
The Routing Slip
As Jimmy Bogard explains in the embedded video, the routing slip is a way to handle a long-running workflow without a central bottleneck. He describes how sagas became a major bottleneck and didn't perform fast enough for his needs. It's a worthwhile video.
The routing slip concept is similar to going to Subway and getting a sandwich. You travel from station to station with your sandwich and give instructions based on its current state. The stations are like endpoints, and the sandwich carries the state. Likewise, our routing slip holds the steps and state of a long-running process, and each endpoint does the work for its step.
So in our code, the routing slip is a class that can contain multiple steps. The slip is embedded in a message that gets dropped into a queue to start the process. When a step completes, the current endpoint checks the state and the slip to decide what to do next, for example routing the message on to another endpoint or ending the process.
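To make that a bit more concrete, here is a rough sketch of the idea in Python. It is not our production code and it doesn't use MassTransit or RabbitMQ: the `RoutingSlip` class, the endpoint functions, the state keys, and the in-memory queues are all made-up stand-ins, based on the phone number example from earlier.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RoutingSlip:
    """Hypothetical slip: travels inside the message, no central coordinator."""
    steps: list                                  # endpoint names still to visit
    state: dict = field(default_factory=dict)    # data shared between steps
    log: list = field(default_factory=list)      # steps already completed


# Hypothetical endpoints. Each one does its own work and updates the state.
def check_inventory(slip):
    needed = max(0, slip.state["target"] - slip.state["on_hand"])
    slip.state["needed"] = needed
    if needed == 0:
        slip.steps.clear()  # nothing to buy, so skip the remaining steps


def purchase_numbers(slip):
    # Stand-in for the real call to a phone number API.
    count = slip.state["needed"]
    slip.state["purchased"] = [f"555-01{i:02d}" for i in range(count)]


def store_numbers(slip):
    # Stand-in for inserting the purchased numbers into a table.
    slip.state["on_hand"] += len(slip.state["purchased"])


ENDPOINTS = {
    "check_inventory": check_inventory,
    "purchase_numbers": purchase_numbers,
    "store_numbers": store_numbers,
}


def handle(slip, queues):
    """Run the current step, then forward the slip to the next step's queue."""
    current = slip.steps.pop(0)
    ENDPOINTS[current](slip)
    slip.log.append(current)
    if slip.steps:
        queues[slip.steps[0]].append(slip)


def run(slip):
    """Simulate the message bus with one in-memory queue per endpoint."""
    queues = {name: deque() for name in ENDPOINTS}
    queues[slip.steps[0]].append(slip)  # dropping the message starts the process
    while any(queues.values()):
        for queue in queues.values():
            if queue:
                handle(queue.popleft(), queues)
    return slip
```

Calling `run(RoutingSlip(steps=["check_inventory", "purchase_numbers", "store_numbers"], state={"target": 5, "on_hand": 2}))` walks the slip through all three endpoints. The thing to notice is that `handle` only looks at the slip itself to decide where the message goes next, so nothing in the middle has to track the workflow.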
We'll see how this concept works in production sometime soon!