Apache Kafka
Apache Kafka is an event streaming platform. It provides three core capabilities:
- Publishing and subscribing to streams of events, including continuously importing/exporting data from other systems such as databases, sensors, mobile devices, and cloud services (a minimal producer sketch follows this list).
- Storing streams of events durably and reliably for as long as you want.
- Processing streams of events as they occur or retrospectively.
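As a minimal sketch of the publish side, the following uses the official Java client to send a single event. The broker address `localhost:9092` and the topic name `events` are placeholder assumptions:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // try-with-resources closes the producer, which flushes any buffered records.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish one key/value event to the hypothetical "events" topic.
            producer.send(new ProducerRecord<>("events", "sensor-1", "temperature=21.3"));
        }
    }
}
```

A consumer works symmetrically: it subscribes to one or more topics and polls the brokers for new records.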
How is Kafka structured?
Kafka uses a client-server architecture in which servers and clients communicate over a TCP-based network protocol.
- Servers:
  - Distributed; a cluster can span multiple datacenters or regions.
  - Some servers act as brokers, while others run Kafka Connect to continuously import/export data from/to other data sources.
  - Fault tolerant.
- Clients: applications and services that publish and subscribe to Kafka topics. Libraries such as Kafka Streams provide integrated stream-processing functions on top of the basic producer and consumer clients (see the sketch after this list).
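As one sketch of what Kafka Streams provides, the topology below reads from one topic, transforms each value, and writes to another. The application id `uppercase-example`, the broker address, and the topic names are placeholder assumptions:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-example"); // assumed app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Read from "events", uppercase each value, write to "events-upper".
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("events");
        source.mapValues(value -> value.toUpperCase()).to("events-upper");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```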
How does Kafka work?
Requirements
- High throughput to support high-volume event streams such as real-time log aggregation.
- Handle large backlogs created by periodic data loads from offline systems.
- Low latency to handle traditional messaging use cases.
- Easy to scale out as a distributed system.
- Fault tolerance (see the topic-creation sketch after this list).
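Partitioning and replication are the mechanisms behind the last two requirements: partitions let a topic be spread across brokers and consumed in parallel, and replicas let the cluster survive broker failures. A minimal sketch using the Java AdminClient, with the broker address, topic name, and counts as placeholder assumptions:

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions for parallelism (scaling),
            // replication factor 3 so the topic can survive up to 2 broker failures.
            NewTopic topic = new NewTopic("events", 6, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```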