RabbitMQ is used by 1846 big companies, including Reddit and Trivago. Meanwhile, 70% of Fortune 500 companies have used Apache Kafka. But why, and what are the strengths of these platforms? Project type and client need will determine which one is the best choice.
What is message broker software?
A message broker is a program that acts as a middleman between applications, relaying their messages to each other. This means that applications don’t have to be aware of each other, or be in direct communication, to do their jobs. But why is this useful, and why are message brokers so important?
Let’s say a company wants to update an inventory list so customers can see which items are available for purchase. This requires communication between the inventory log and the client-side landing page.
When a service handles this with synchronous communication, such as a call to a REST API, the caller needs an immediate response before the request/response cycle can complete. This can cause serious problems.
The web server that should accept the request might be offline or slow to respond, in which case no response will be forthcoming. This means the process cannot complete. Where a reply is required, this can lead to system failure or to the process getting stuck in a retry loop. This is the main issue with synchronous communication.
A general-purpose message broker solves this problem by introducing a means of message queuing. Here, messages can be held between the two communicating parties, the “producer” and the “consumer.”
A request is sent from the producer. We can think of this request as a message. The message waits in the queue until the consumer receives it. If there is an issue with the consumer side, the message simply waits in place and system operability is not affected.
In our e-commerce example, an inventory page (the producer) might want to populate an item in a checkout page (the consumer). Because the exchange is asynchronous, the producer can continue functioning after the message is sent.
Unread messages safely wait in a queue if a response is not immediately sent by the consumer, allowing the producer to continue sending messages and data to other consumers.
As an intermediary, a message broker can create multiple message queues for multiple consumers. Message brokers remove the producer–consumer dependency. This decoupling allows apps to continue unimpeded.
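The decoupling described above can be sketched with Python's standard library. This is only an in-process stand-in for a real broker: the `inventory_updates` queue, the message strings, and the `None` stop signal are all assumptions made for illustration.

```python
import queue
import threading

# Hypothetical in-process stand-in for a broker-managed message queue.
inventory_updates = queue.Queue()

def producer(items):
    """Publish each inventory update and return without waiting for a consumer."""
    for item in items:
        inventory_updates.put(item)

def consumer(received):
    """Read messages until the None stop signal arrives."""
    while True:
        message = inventory_updates.get()
        if message is None:
            break
        received.append(message)

# The producer finishes before any consumer exists; the messages simply wait.
producer(["shoes: 12 in stock", "hats: 0 in stock"])
waiting = inventory_updates.qsize()

# A consumer that comes online later still receives everything, in order.
received = []
worker = threading.Thread(target=consumer, args=(received,))
worker.start()
inventory_updates.put(None)  # illustrative stop signal, not a broker feature
worker.join()

print(waiting, received)
```

The producer never blocks on the consumer, which is the property that a synchronous request/response call lacks.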
What is Apache Kafka?
Events and topics
Apache Kafka is an open-source distributed streaming platform. The producer and consumer process here is based on stream processing, where the system creates and maintains data records of events. These records are organized into named streams called “topics.”
These topics contain the messages and as such can be thought of as streaming data that has been stored for analysis by consumers.
In a general-purpose message broker, consumers subscribe to a queue and have the messages sent to them directly. In Kafka, by contrast, consumers request batches of records from a topic, either from Kafka’s stored stream history or from its real-time stream, instead of having them pushed from a queue.
Broadcast and partition
Here, the consumer is like a receiver picking up broadcasts of channels it is interested in. The producer continues streaming and the consumers can accept at that moment or go back to find specific information.
This is possible because Kafka stores messages in partitions, and each record within a partition receives a unique sequential ID, which is called the offset. Every time a new message is appended to a partition, it is given the next number in the sequence, one higher than the offset before it.
Partitions can be replicated across servers, so a large number of consumers can request records and use the service while high throughput is still maintained. Because every record has a numerical offset, the order of messages within a partition is preserved, and a consumer can pinpoint exactly where in the stream an event occurred.
These topics are saved to disk across a cluster of servers. Kafka does not delete a message when a consumer acknowledges it; instead, messages can be set to expire after a configured retention period, such as a number of hours, or once a size limit is reached. If physical storage space is sufficient, Kafka can also be configured to store messages permanently, and it therefore offers advantages for analyzing data over time.
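A minimal sketch of the offset idea, assuming a toy in-memory `Partition` class (hypothetical, not Kafka's client API): records are only ever appended, the list index plays the role of the offset, and each consumer chooses where to start reading.

```python
class Partition:
    """Toy append-only log; the list index plays the role of the offset."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Store a record and return the offset it was assigned."""
        offset = len(self._records)
        self._records.append(value)
        return offset

    def read(self, offset, max_records=10):
        """Return up to max_records (offset, value) pairs starting at offset."""
        chunk = self._records[offset:offset + max_records]
        return list(enumerate(chunk, start=offset))

partition = Partition()
offsets = [partition.append(event) for event in ["created", "paid", "shipped"]]

# Reading does not remove anything, so consumers can start from different offsets.
live_view = partition.read(2)               # only the newest record
replay = partition.read(0, max_records=2)   # go back to the beginning

print(offsets, live_view, replay)
```

Because reading never deletes a record, one consumer can follow the newest events while another replays history from an earlier offset.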
This lends itself to horizontal scalability: with every new disk, node, server, and cluster added, Kafka can retain more information and respond to more consumers.
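One common way to spread a topic over several partitions is to hash a record key, so that records sharing a key always land in the same partition and keep their relative order. The sketch below assumes keyed records and uses `zlib.crc32` purely as a stand-in hash; Kafka's own partitioner uses a different hash function, and the `choose_partition` helper and the order keys are hypothetical.

```python
import zlib

def choose_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition index with a stable hash (illustrative only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]

events = [("order-1", "created"), ("order-2", "created"), ("order-1", "paid")]
for key, event in events:
    # Every record with the same key is appended to the same partition.
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, event))

for index, records in enumerate(partitions):
    print(index, records)
```

Adding partitions, and servers to host them, increases the capacity available for both writing and reading.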
Use cases of Apache Kafka
- Data analysis: Kafka’s ability to retain messages and their offset information allows apps to build pictures of consumers or processes over time and offer more tailor-made information or solutions.
- Ensuring resilience: As apps grow and become more sophisticated, the chance for faults grows. Kafka’s ability to store and replicate topics means the system is extremely resilient.
- Application crashes/data loss: Kafka’s stored data remains available for retrieval for as long as it is retained, so developers can restore the system accurately.
- Information streaming: Because the producer API of Kafka is constantly broadcasting and streaming data, it is easy to configure a consumer that displays this information in a useful UI.
- Live events: Live data rendered on a clear UI can power apps for cargo or transportation tracking, or show the location of objects in real time.
- Event sourcing: Kafka can retain messages, read or unread, permanently. This is useful for instances where precise events need to be pinpointed.
- Auditing: When a company’s processes are audited for efficiency, Kafka preserves the relevant events and makes them easily retrievable for analysis.
What is RabbitMQ?
An open-source message broker, RabbitMQ allows for asynchronous communication between apps and pages.
The message exchange
RabbitMQ is a message broker that develops the function of the message queue into something more sophisticated, using message attributes such as keys and headers to support more complex routing scenarios.
Under the Advanced Message Queuing Protocol (AMQP), producers publish messages to an exchange rather than to a single message queue. Multiple queues can be configured for multiple consumers, and the exchange decides which queues receive each message. This stops consumers receiving irrelevant messages.
Each queue is linked to the exchange by what is called a “binding.” A binding can carry a label called a “binding key,” which the exchange uses when deciding where to route a message.
Consumers are linked to specific queues, and those queues will be populated only by messages relevant to them, after they have been released from the exchange. The system is similar to a post office.
The RabbitMQ routing methods
RabbitMQ has several different routing methods for getting each message to the right queues, which makes it different from a traditional message broker. The speed of these exchanges allows for low-latency messaging.
Fanout and direct exchanges
- Fanout exchange: The producer sends a message, which is then duplicated by the exchange and forwarded to every queue, and so to every consumer.
- Direct exchange: Direct exchanges deliver messages by using a routing key.
What is the routing key?
- The routing key: A label that the producer attaches to each message. A direct exchange compares it with the binding keys of its queues and delivers the message only to a queue whose binding key matches exactly, where it then waits to be received by the consumer linked to that queue.
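The difference between fanout and direct routing can be sketched with a toy `Exchange` class. This is a hypothetical in-memory model, not RabbitMQ's client API; the queue names and keys are assumptions made for illustration.

```python
from collections import defaultdict

class Exchange:
    """Toy exchange that copies messages into bound queues."""

    def __init__(self, kind):
        self.kind = kind      # "fanout" or "direct"
        self.bindings = []    # list of (binding_key, queue_name)

    def bind(self, queue_name, binding_key=""):
        self.bindings.append((binding_key, queue_name))

    def publish(self, queues, message, routing_key=""):
        for binding_key, queue_name in self.bindings:
            # Fanout ignores keys; direct requires an exact key match.
            if self.kind == "fanout" or binding_key == routing_key:
                queues[queue_name].append(message)

queues = defaultdict(list)

fanout = Exchange("fanout")
fanout.bind("email")
fanout.bind("audit")
fanout.publish(queues, "price changed")  # copied to every bound queue

direct = Exchange("direct")
direct.bind("email", binding_key="notify")
direct.bind("audit", binding_key="log")
direct.publish(queues, "order shipped", routing_key="notify")  # email only

print(dict(queues))
```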
The topic exchange
- Topic exchange: The binding and routing keys are generally named for the content of the messages. For example, in an e-commerce setting, a message might be concerned with updating a purchase order.
- Purchase example: The routing key might then be “purchase.shoes” while the binding key is the pattern “purchase.*”, where the wildcard “*” stands for exactly one word. This pattern match between the routing and binding keys allows the exchange to recognize the shared topic and so send messages to the appropriate queue. This avoids the need to create separate queue and consumer groups for each inventory item.
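In RabbitMQ's topic exchanges, keys are dot-separated words, and a binding key may contain two wildcards: `*` stands for exactly one word and `#` for zero or more words. The `topic_matches` function below is a hypothetical sketch of that matching rule, not RabbitMQ's own implementation.

```python
def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Return True if routing_key satisfies the binding_key pattern."""

    def match(pattern, words):
        if not pattern:
            return not words  # both exhausted means a full match
        head, rest = pattern[0], pattern[1:]
        if head == "#":
            # '#' may absorb zero or more of the remaining words.
            return any(match(rest, words[i:]) for i in range(len(words) + 1))
        if not words:
            return False
        if head == "*" or head == words[0]:
            return match(rest, words[1:])  # consume exactly one word
        return False

    return match(binding_key.split("."), routing_key.split("."))

print(topic_matches("purchase.*", "purchase.shoes"))
print(topic_matches("purchase.*", "purchase.shoes.red"))
print(topic_matches("purchase.#", "purchase.shoes.red"))
```

A single queue bound with a wildcard pattern can therefore receive messages for many inventory items.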
The importance of metadata
- Headers exchange: Similar to a topic exchange, except that the routing key is ignored in favor of metadata contained in the message headers. This allows more attributes to be matched than could be contained in a routing key.
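Header matching can be sketched as a comparison between the arguments of a binding and the headers of a message. In RabbitMQ the binding's `x-match` argument selects whether all listed headers must match or any one is enough. The `headers_match` function and the header names below are hypothetical, and the sketch leaves out details of the real matching rules.

```python
def headers_match(binding_args: dict, message_headers: dict) -> bool:
    """Compare a binding's arguments with a message's headers."""
    mode = binding_args.get("x-match", "all")  # "all" is the default mode
    required = {k: v for k, v in binding_args.items() if k != "x-match"}
    hits = [message_headers.get(k) == v for k, v in required.items()]
    return all(hits) if mode == "all" else any(hits)

eu_shoes = {"x-match": "all", "category": "shoes", "region": "eu"}
shoes_or_eu = {"x-match": "any", "category": "shoes", "region": "eu"}

message = {"category": "shoes", "region": "us", "size": "42"}

print(headers_match(eu_shoes, message))     # region differs, so "all" fails
print(headers_match(shoes_or_eu, message))  # category matches, so "any" passes
```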
The default exchange
- Default exchange: This operates the same way as a direct exchange, except that every queue is automatically bound to it with a binding key equal to the queue’s own name. A message whose routing key is a queue name therefore goes directly to that queue when it leaves the exchange.
- Key-free messaging: The default exchange is useful when the producer and consumer groups do not know each other, as in the case of disparate microservices in a large application, and therefore have not agreed on a binding key. Messages can be sent knowing only the queue’s name.
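The default exchange can be pictured as a lookup by queue name. The `Broker` class below is a hypothetical in-memory sketch, not RabbitMQ's API; it only illustrates that declaring a queue is enough to make it reachable by name.

```python
class Broker:
    """Toy broker whose default exchange routes by queue name."""

    def __init__(self):
        self.queues = {}

    def declare_queue(self, name):
        # Declaring a queue implicitly makes it reachable under its own name.
        self.queues.setdefault(name, [])

    def publish_default(self, routing_key, message):
        """Deliver to the queue named routing_key; return False if none exists."""
        if routing_key not in self.queues:
            return False  # no such queue, so the message is unroutable
        self.queues[routing_key].append(message)
        return True

broker = Broker()
broker.declare_queue("invoices")

delivered = broker.publish_default("invoices", "invoice #1 ready")
missed = broker.publish_default("reports", "weekly summary")

print(delivered, missed, broker.queues)
```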
Use cases of RabbitMQ
- Flexibility in message routing: A range of complex routing scenarios is available to deliver messages, rather than simple point-to-point messaging.
- Data exploitation: The routing scenarios are mediated via item metadata, and this allows different exchange methods to work, rather than having to conform to the simple infrastructure of a traditional message broker.
- Message priority: RabbitMQ can create a priority queue, with a priority value in each message’s metadata determining which messages are delivered first.
- Wide language support: RabbitMQ allows producers and consumers written in different programming languages to communicate. The success of pub/sub communication patterns is therefore not hindered by language exclusivity.
- Legacy protocol support: Although RabbitMQ uses AMQP by default, it can support other protocols, such as MQTT and STOMP, thanks to the range of plug-ins available.
- Vertical scaling: An application can become larger and more complex, adding more resources, including producers and consumers. All of this can be done while the queuing system remains the same, requiring no additions or extra code.
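The message priority use case listed above can be sketched with a heap. The `PriorityMessageQueue` class is a hypothetical stand-in for a broker-side priority queue; it assumes that a larger number means a more urgent message and that messages of equal priority leave in arrival order.

```python
import heapq
import itertools

class PriorityMessageQueue:
    """Toy queue that releases higher-priority messages first."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker preserving arrival order

    def publish(self, message, priority=0):
        # heapq pops the smallest item, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._arrival), message))

    def consume(self):
        """Remove and return the most urgent message."""
        return heapq.heappop(self._heap)[2]

pq = PriorityMessageQueue()
pq.publish("newsletter", priority=0)
pq.publish("payment failed", priority=9)
pq.publish("restock reminder", priority=0)

delivery_order = [pq.consume() for _ in range(3)]
print(delivery_order)
```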
RabbitMQ vs Apache Kafka
The types of projects and uses that Kafka and RabbitMQ lend themselves to should now be clear.
You should use RabbitMQ if:
- You need to manage pub/sub communication patterns where different messages are sent to multiple consumers, allowing the producer to continue with other tasks.
- You have a variety of message types and consumers that would benefit from the variety of routing protocols offered by the RabbitMQ exchange.
- You have long-running tasks and want to run reliable background jobs with no interference from the messaging process or the processing data.
- You are using multiple microservice APIs written in different languages and need their communication to be standardized by RabbitMQ as a middleman, or you are running a system that uses legacy protocols and needs to be integrated with the rest of the application.
You should use Apache Kafka if:
- You have a project or app that requires the long-term storage of data.
- Your project relies on the ability to analyze the history of data changes.
- You need to replicate and resend messages to multiple consumers asynchronously.
- You would like to present live messaging from the producer via a client-side UI.
- You have an enormous number of consumers requesting topics; the horizontal scalability of Kafka allows for the necessary replication and high throughput required.