Apache Kafka vs. Amazon Kinesis
Like many of the offerings from Amazon Web Services, Amazon Kinesis is modeled after an existing open source system, in this case Apache Kafka.
Amazon Kinesis has built-in cross-replication, while Kafka requires you to configure replication yourself. Cross-replication is the idea of syncing data across logical or physical data centers. It is not mandatory, and you should set it up only if you need it.
Engineers sold on the value proposition of Kafka and Software-as-a-Service, or more specifically Platform-as-a-Service, have options besides Kinesis and Amazon Web Services. Keep an eye on http://confluent.io.
When to use Kafka or Kinesis?
Kafka and Kinesis are often chosen as integration systems in enterprise environments, much like traditional message brokers such as ActiveMQ or RabbitMQ. Integration between systems is assisted by Kafka clients in a variety of languages, including Java, Scala, Ruby, Python, Go, Rust, and Node.js.
Another common use case is website activity tracking, where the captured events are either processed in real time or loaded into Hadoop or analytic data warehousing systems for offline processing and reporting.
An interesting recent development is the use of Kafka and Kinesis in stream processing. More and more applications and enterprises are building architectures that include processing pipelines consisting of multiple stages. For example, a multi-stage design might consume raw input data from Kafka topics in stage 1. In stage 2, the data is aggregated, enriched, or otherwise transformed. Then, in stage 3, the data is published to new topics for further consumption or follow-up processing during a later stage.
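To make the three stages concrete, here is a minimal sketch in Python. It does not talk to a real Kafka or Kinesis cluster: the InMemoryBroker class, the topic names "raw-clicks" and "page-view-counts", and the "user,page" record format are all hypothetical stand-ins chosen for illustration. In a real pipeline, a Kafka or Kinesis client would take the place of the publish and consume calls.

```python
from collections import defaultdict


class InMemoryBroker:
    """Hypothetical stand-in for a broker: maps topic name -> list of records."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, record):
        self.topics[topic].append(record)

    def consume(self, topic):
        # Yield records in the order they were published.
        yield from self.topics[topic]


def stage1_consume(broker, raw_topic):
    """Stage 1: consume raw "user,page" lines and parse them into dicts."""
    for line in broker.consume(raw_topic):
        user, page = line.split(",")
        yield {"user": user, "page": page}


def stage2_aggregate(events):
    """Stage 2: aggregate the parsed events into view counts per page."""
    counts = defaultdict(int)
    for event in events:
        counts[event["page"]] += 1
    return dict(counts)


def stage3_publish(broker, counts, out_topic):
    """Stage 3: publish the aggregated results to a new topic."""
    for page, views in sorted(counts.items()):
        broker.publish(out_topic, {"page": page, "views": views})


def run_pipeline(broker, raw_topic="raw-clicks", out_topic="page-view-counts"):
    events = stage1_consume(broker, raw_topic)
    counts = stage2_aggregate(events)
    stage3_publish(broker, counts, out_topic)


if __name__ == "__main__":
    broker = InMemoryBroker()
    for line in ["alice,/home", "bob,/home", "alice,/pricing"]:
        broker.publish("raw-clicks", line)
    run_pipeline(broker)
    print(list(broker.consume("page-view-counts")))
```

Each stage only knows about the topic it reads from and the topic it writes to, which is what lets a later stage, or an entirely separate application, pick up the output topic for follow-up processing.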
Keep an eye on supergloo.com for more articles and tutorials on Kafka, Kinesis, and other stacks used for stream-based data processing and pipelines.
Featured image credit https://flic.kr/p/7XWaia