Apache Kafka

Apache Kafka is an event streaming platform. It provides three core capabilities:

  1. Publishing and subscribing to streams of events, including continuously importing/exporting your data from other systems such as databases, sensors, mobile devices, and cloud services.
  2. Storing streams of events durably and reliably for as long as you want.
  3. Processing streams of events as they occur or retrospectively.
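The storage and processing capabilities both follow from Kafka's core abstraction: a topic is an append-only log in which every event gets a stable offset. A minimal in-memory sketch (not the Kafka API; class and method names here are illustrative) shows why the same log serves both real-time consumers and retrospective replay:

```python
class TopicLog:
    """Toy model of a Kafka topic: an append-only log with offsets."""

    def __init__(self):
        self._log = []  # events are appended, never mutated

    def publish(self, event):
        """Append an event and return its offset in the log."""
        self._log.append(event)
        return len(self._log) - 1

    def read_from(self, offset):
        """Return all events at or after `offset` (replay or tail)."""
        return self._log[offset:]

log = TopicLog()
log.publish({"sensor": "s1", "temp": 21.5})
log.publish({"sensor": "s2", "temp": 19.0})

# A new consumer can start from offset 0 and reprocess history,
# while a caught-up consumer reads only from its last offset.
history = log.read_from(0)
latest = log.read_from(1)
```

Because events are never deleted on read, each consumer tracks its own offset independently, which is how one stream can feed many applications.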

What is Kafka's architecture?

Kafka has a server-client architecture in which servers and clients communicate over TCP.

  1. Servers:
    • Distributed; a cluster can span multiple datacenters and regions.
    • Some servers act as brokers, while others run Kafka Connect to continuously import/export data from/to other systems.
    • Fault tolerant.
  2. Clients: applications that publish/subscribe to Kafka topics. The Kafka Streams library builds stream-processing operations on top of the basic clients.
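When a client publishes to a topic, it must pick one of the topic's partitions. By default, Kafka's Java producer hashes the record key (with murmur2) so that all events sharing a key land on the same partition and stay ordered. A hedged sketch of that idea, using `md5` as a dependency-free stand-in hash (the function name and choice of hash are illustrative, not Kafka's actual code):

```python
import hashlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    # Kafka's Java producer uses murmur2 on the key; md5 is used
    # here only to keep the sketch deterministic and stdlib-only.
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Same key -> same partition, which preserves per-key ordering.
p1 = choose_partition(b"device-42", 6)
p2 = choose_partition(b"device-42", 6)
```

Keyed partitioning is why a consumer processing one partition can assume it sees all events for a given key, in order.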

How does Kafka work?

Requirements

  1. High throughput, to support high-volume event streams such as real-time log aggregation.
  2. The ability to absorb large backlogs from periodic loads out of offline systems.
  3. Low latency, to serve traditional messaging use cases.
  4. Easy scaling across distributed systems.
  5. Fault tolerance.
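One technique behind the throughput requirement is producer-side batching: rather than sending each record in its own request, the real Kafka producer accumulates records into per-partition batches (governed by its `batch.size` and `linger.ms` settings). A minimal sketch of the batching idea, with illustrative names and no network layer:

```python
class BatchingProducer:
    """Toy sketch of producer-side batching; not the Kafka client API."""

    def __init__(self, batch_size, transport):
        self.batch_size = batch_size
        self.transport = transport  # callable that ships a list of records
        self._buffer = []

    def send(self, record):
        """Buffer a record; ship the batch once it is full."""
        self._buffer.append(record)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Ship whatever is buffered, even a partial batch."""
        if self._buffer:
            self.transport(self._buffer)
            self._buffer = []

sent_batches = []
producer = BatchingProducer(batch_size=3, transport=sent_batches.append)
for i in range(7):
    producer.send(i)
producer.flush()  # drain the remainder, as closing the producer would
# sent_batches is now [[0, 1, 2], [3, 4, 5], [6]]
```

Batching amortizes per-request overhead (syscalls, network round trips, compression) across many records, trading a small amount of latency for much higher throughput.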
· Open-source projects, System design