The centerpiece of every financial exchange is its matching engine. This is what allows traders to enter and exit markets, buying and selling at the best price currently available. As their market orders are filled from the limit orders of other traders in the order book, the balance between the best available buy/sell prices adjusts, setting a new market price.

In computer science terms, a matching engine is a state machine: it holds an internal state (the order book) that changes each time it receives a new input, and it emits outputs as a result. In trading, the inputs are orders from traders and the outputs are updates reporting whether each order was filled completely, filled partially, or rejected.
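
To make the state machine idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions rather than a description of any real exchange: the names `Order`, `Book`, and `apply_order` are hypothetical, only limit orders are handled, prices are integers, and resting orders are matched by price first and arrival time second.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: int
    side: str   # "buy" or "sell"
    price: int  # limit price in integer ticks (assumed unit)
    qty: int


@dataclass
class Book:
    # Internal state: price -> FIFO queue of resting orders.
    bids: dict = field(default_factory=dict)
    asks: dict = field(default_factory=dict)


def apply_order(book: Book, order: Order) -> list:
    """One state transition: mutate the book and return the output events."""
    if order.qty <= 0 or order.side not in ("buy", "sell"):
        return [("rejected", order.order_id)]
    is_buy = order.side == "buy"
    opposite = book.asks if is_buy else book.bids
    own = book.bids if is_buy else book.asks
    events = []
    remaining = order.qty
    while remaining > 0 and opposite:
        # Best opposing price: lowest ask for a buy, highest bid for a sell.
        best = min(opposite) if is_buy else max(opposite)
        crosses = best <= order.price if is_buy else best >= order.price
        if not crosses:
            break
        queue = opposite[best]
        resting = queue[0]  # oldest order at this price level
        traded = min(remaining, resting.qty)
        resting.qty -= traded
        remaining -= traded
        events.append(("fill", order.order_id, resting.order_id, best, traded))
        if resting.qty == 0:
            queue.popleft()
            if not queue:
                del opposite[best]
    if remaining > 0:
        # Any unfilled quantity rests in the book as a new limit order.
        own.setdefault(order.price, deque()).append(
            Order(order.order_id, order.side, order.price, remaining))
        events.append(("resting", order.order_id, remaining))
    return events
```

Because `apply_order` depends only on the current book and the incoming order, feeding the same orders in the same sequence always produces the same book and the same events. The replication and recovery techniques discussed later rely on that property.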

For a matching engine to be able to handle large volumes of orders and to retain an accurate record of what took place, it must be connected to various networking and storage components. As we’ll see below, every design choice affects the performance of the final system, as well as what it can and can’t do.

Performance vs reliability

The above diagram is a very basic matching engine design. Traders interact with it via an API gateway, the matching engine responds, and a separate database component keeps a record of all the various interactions that took place.

When designing exchange infrastructure, developers face a trade-off between reliability and performance: focusing too much on one comes at the expense of the other.

The above design fails on both fronts. The problem is that it quickly becomes overwhelmed by a high volume of orders (poor performance). Also, as a standalone system with no backups, it’s a single point of failure (poor reliability).

Introducing replication

The image below is a simple iteration on the first design. Here, multiple copies of every component run in parallel. Replication provides redundancy should any individual component fail or become overwhelmed. However, it also reduces the system’s performance, particularly because of database replication.

Picture

Consistency and consensus

Replication comes with the added problem of keeping the copies consistent. A single instance can simply be regarded as canonical, but with several copies the system has to reach consensus on the current state. If any instance deviates in state from the others, the reliability of the system immediately takes a hit.

The technical challenge here is to ensure that each copy receives its inputs at the same time and in the correct order without adversely affecting the performance of the system or introducing even more points of potential failure.
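
One way to picture the ordering requirement is a sequencer that stamps every input with a sequence number, with each copy applying inputs strictly in that order. The sketch below is an assumption made for illustration (the `Sequencer` and `Replica` classes are hypothetical and run in a single process); it is not a consensus algorithm in itself.

```python
import itertools


class Sequencer:
    """Assigns a strictly increasing sequence number to every input."""

    def __init__(self):
        self._counter = itertools.count(1)

    def stamp(self, message):
        return (next(self._counter), message)


class Replica:
    """Applies inputs in sequence order, whatever order they arrive in."""

    def __init__(self):
        self.next_seq = 1   # sequence number expected next
        self.pending = {}   # inputs that arrived early, keyed by sequence number
        self.applied = []   # stand-in for feeding the matching engine

    def receive(self, seq, message):
        if seq < self.next_seq:
            return  # duplicate delivery of an input already applied
        self.pending[seq] = message
        # Drain every input that is now contiguous with what was applied.
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
```

A replica that receives inputs 3, 2, 1 holds the first two back until input 1 arrives, then applies all three in order, so it ends up in the same state as a replica that received them as 1, 2, 3.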

What’s required here is a bit of distributed systems thinking. Consensus algorithms can be used to ensure that inputs arrive at all matching engine instances at the right time and in the right order.

A leader node is responsible for propagating inputs to all matching engine copies. As long as a majority of nodes agree on the current state, it’s business as usual. If the leader becomes unavailable for whatever reason, a new leader is elected to continue propagating inputs to all copies.
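
The majority rule and leader hand-over can be sketched as a toy model. This is deliberately much simpler than production consensus algorithms such as Raft or Paxos: the `Cluster` class is hypothetical, there is no networking or term numbering, nodes that fail never rejoin, and the election rule (longest log, lowest id on ties) is an assumption chosen for illustration.

```python
def has_quorum(count: int, cluster_size: int) -> bool:
    """True when `count` nodes form a strict majority of the cluster."""
    return count > cluster_size // 2


class Cluster:
    def __init__(self, node_ids):
        self.logs = {n: [] for n in node_ids}  # replicated input log per node
        self.alive = set(node_ids)
        self.leader = min(node_ids)

    def elect_leader(self):
        # Prefer the live node with the longest log; break ties by lowest id.
        self.leader = min(self.alive, key=lambda n: (-len(self.logs[n]), n))
        return self.leader

    def propose(self, entry) -> bool:
        """Replicate one input. Succeeds only if a majority is reachable."""
        if not has_quorum(len(self.alive), len(self.logs)):
            return False  # no majority: refuse the input rather than diverge
        if self.leader not in self.alive:
            self.elect_leader()
        for node in self.alive:
            self.logs[node].append(entry)
        return True
```

In a three-node cluster, losing one node still leaves a majority of two, so inputs keep flowing under a newly elected leader. Losing a second node leaves no majority, and the sketch refuses further inputs instead of letting a lone copy drift from the others.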

How about system recovery?

The current state of a matching engine is determined by the series of changes made on its initial state by each subsequent input. If, for whatever reason, the system experiences downtime, it’s possible to replay all the inputs the system received in the correct order and thus to arrive at the last state before the disruption occurred.

However, a problem arises when you consider the extent of a financial exchange’s event logs. Due to their sheer size, it can be impractical to play all events forward from the very beginning every time a recovery has to take place.

To solve this issue, snapshots of the current state are taken at regular intervals. Recovery then starts from the most recent snapshot, which represents a local point of consensus, and only the inputs received after it need to be played forward to reach the desired point.
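
Snapshot-plus-replay recovery can be sketched as follows. The event format, the `Journal` class, and the snapshot interval are assumptions made for illustration; the state here is simply the open quantity per order id rather than a full order book.

```python
import copy


def apply_event(state: dict, event: tuple) -> None:
    """Deterministic transition on a state of {order_id: open quantity}."""
    kind, order_id, qty = event
    if kind == "add":
        state[order_id] = qty
    elif kind == "fill":
        state[order_id] -= qty
        if state[order_id] == 0:
            del state[order_id]
    elif kind == "cancel":
        state.pop(order_id, None)


class Journal:
    def __init__(self, snapshot_every: int):
        self.snapshot_every = snapshot_every
        self.events = []             # full ordered event log
        self.snapshots = [(0, {})]   # (events covered, copy of state)
        self.state = {}              # live in-memory state

    def record(self, event: tuple) -> None:
        self.events.append(event)
        apply_event(self.state, event)
        if len(self.events) % self.snapshot_every == 0:
            self.snapshots.append((len(self.events), copy.deepcopy(self.state)))

    def recover(self):
        """Rebuild state from the latest snapshot plus the events after it."""
        covered, snapshot = self.snapshots[-1]
        state = copy.deepcopy(snapshot)
        for event in self.events[covered:]:
            apply_event(state, event)
        return state, len(self.events) - covered
```

With a snapshot every three events and five events recorded, `recover()` replays only the last two events instead of all five. The interval is a trade-off: frequent snapshots shorten recovery but cost more time and storage while the system is running.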

Storage considerations

Record keeping is crucial for regulatory, clearing, settlement, and recovery needs. However, the matching itself takes place in RAM so that the engine can perform in line with the demands of modern financial exchanges.

RAM, though, is insufficient for keeping a persistent record of events, which is why a separate system is required to preserve all client interactions for future reference.

In addition to the databases we saw above, a separate storage component is used so that the matching engine is not required to communicate directly with the databases (which would impact performance).

This component receives matching engine outputs and asynchronously writes them to a database in order to maintain system performance. The database can then be queried via a separate API.
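
The asynchronous hand-off described above can be sketched with a queue and a background thread. The `AsyncJournalWriter` class and its `sink` parameter are hypothetical; `sink` stands in for whatever call actually writes to the database, which the source does not specify.

```python
import queue
import threading


class AsyncJournalWriter:
    """Accepts engine outputs on the hot path and persists them off it."""

    _STOP = object()  # sentinel that tells the writer thread to exit

    def __init__(self, sink):
        self._sink = sink            # callable that persists one event (assumed)
        self._queue = queue.Queue()  # unbounded, so publish() does not block
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def publish(self, event) -> None:
        # Called by the matching engine: enqueue and return immediately.
        self._queue.put(event)

    def _run(self) -> None:
        while True:
            event = self._queue.get()
            if event is self._STOP:
                break
            self._sink(event)        # the slow write happens on this thread

    def close(self) -> None:
        # Flush everything queued so far, then stop the writer thread.
        self._queue.put(self._STOP)
        self._thread.join()
```

The matching engine only pays the cost of an in-memory enqueue, while events are still written in the order they were produced. Note that events still sitting in the queue are lost if the process crashes, which is one reason a design like this is paired with the replication and replay mechanisms covered earlier.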

Throughput considerations

Finally, we come to throughput. Nowadays, both retail and institutional venues have strict throughput requirements. It’s crucial for matching engines to be engineered to handle surges in trading activity that are far greater than the average.

So far, we’ve discussed replication as a resilience strategy. However, matching engine copies can also be used to scale a system horizontally, allowing more orders to be processed simultaneously.

An effective way to increase a system’s throughput is to divide the various assets on offer into multiple segments and then to use a separate instance of the matching engine for each of these. This can be done for the entire symbol roster, or just for the most traded ones.

For example, to allow a crypto exchange to scale horizontally, a separate matching engine instance can be dedicated to BTC pairs, another to ETH pairs, with less liquid crypto assets grouped together in their own separate matching engine instances.
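
A routing layer for this kind of segmentation might look like the sketch below. The `ShardRouter` class, the engine names, and the `BASE/QUOTE` symbol format are assumptions for illustration. In particular, the sketch treats a "BTC pair" as one whose base asset is BTC, which is only one possible reading.

```python
import zlib


class ShardRouter:
    """Maps a trading symbol to the matching engine instance that owns it."""

    def __init__(self, dedicated: dict, shared: list):
        self.dedicated = dedicated  # base asset -> dedicated engine name
        self.shared = shared        # engines that share the less liquid assets

    def shard_for(self, symbol: str) -> str:
        base = symbol.split("/")[0]  # assumes symbols look like "BTC/USDT"
        if base in self.dedicated:
            return self.dedicated[base]
        # crc32 is stable across processes, unlike Python's built-in hash()
        # for strings, so every gateway routes a symbol to the same engine.
        index = zlib.crc32(base.encode("utf-8")) % len(self.shared)
        return self.shared[index]


router = ShardRouter(
    dedicated={"BTC": "engine-btc", "ETH": "engine-eth"},
    shared=["engine-alt-0", "engine-alt-1"],
)
```

Here `router.shard_for("BTC/USDT")` returns `"engine-btc"`, while a less liquid symbol always lands on the same one of the shared engines. All orders for a given symbol must reach the same instance, since that instance holds the only order book for it.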

Conclusion

We hope you found this introduction to matching engine design considerations interesting and informative. It’s a piece of technology that often doesn’t get the attention it deserves, despite it being the backbone of every trade across all asset classes, facilitating the efficient exchange of trillions of dollars’ worth of value every day.

The post Matching Engine 101: The Challenges of Matching Orders Quickly and Reliably appeared first on BeInCrypto.
