What is Kafka? An informal explanation for non-techies

04.10.2019

Apache Kafka is a well-engineered piece of technology, and a lot has already been written about it. This blog doesn’t attempt to explain Kafka technically, nor is it aimed at a technical audience. Instead, it is for readers who have little or no technical background and want to understand the basics of Kafka in simple terms.

Kafka is a messaging system

Stated simply, Kafka is a messaging system. Obviously, that raises the question “What is a messaging system?” A messaging system is a piece of software that passes messages from the systems that send them to the systems that receive them.

Ever since organisations started using systems and applications to fulfil their business needs, there has always been a need for these systems and applications to communicate with each other. There are many solutions addressing this need, and the most common (and successful) one is messaging. Messaging allows systems that produce messages (simply called producers or publishers) to communicate with other systems that consume messages (called consumers or subscribers). Kafka is one such publisher-subscriber (pub-sub) messaging solution.
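For readers curious about what publish-subscribe looks like in practice, here is a minimal sketch in Python. It is a toy that lives entirely in memory and is not how Kafka itself is built; the class name MessageBroker, the "orders" topic and the two inboxes are all made up for illustration.

```python
from collections import defaultdict


class MessageBroker:
    """Toy in-memory broker (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps a topic name to the list of subscriber callbacks.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a consumer that wants messages on this topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Hand the message to every subscriber of the topic."""
        for callback in self._subscribers[topic]:
            callback(message)
        # Report how many consumers received the message.
        return len(self._subscribers[topic])


broker = MessageBroker()

# Two consuming systems, each collecting messages in its own inbox.
billing_inbox, shipping_inbox = [], []
broker.subscribe("orders", billing_inbox.append)
broker.subscribe("orders", shipping_inbox.append)

# One producing system publishes a single message.
delivered = broker.publish("orders", "order #1: 2 books")
```

The producer never needs to know who is listening: it publishes once, and both billing and shipping receive their own copy.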

Why Kafka?

Let’s start from the beginning. Suppose you have started a new business. To run it, you have put in place various systems and applications: your IT infrastructure. For your business to operate efficiently, these systems and applications need to exchange data, so you add a messaging solution to enable this communication. As your business grows, so does your IT infrastructure and, consequently, your need for additional messaging capacity. Eventually, you reach a point where your existing messaging solution struggles to keep up with the increased volume of messages flowing in and out of your systems and applications, and your messaging server may even break down. In addition, continuous data (such as clickstream, sensor or IoT data) might also be arriving at your systems. This continuous flow of data is called streaming, and it demands a new approach to messaging. Kafka offers the answer.

Kafka is designed to grow with your business (scalable), runs across multiple servers (distributed), is reliable (when set up to keep copies of each message on more than one server, it does not lose a message once it has received it) and, among its many capabilities, can handle a continuous flow of data (streaming). Because it is distributed, the failure of a single server need not bring your messaging to a halt. Kafka can also serve a large number of producers and consumers with little impact on performance.

Now we can redefine Kafka as a fast, reliable, scalable, distributed messaging and streaming platform.

Does that mean that you need to remove your existing messaging systems? Well, it depends. Kafka can replace your existing messaging systems, but it can also work alongside them. Either way, it will require some changes to your existing infrastructure, which in turn will require a certain level of technical expertise.

Applying Kafka in the real world

A technology is often easier to understand through its real-world applications. The following are just a few applications of Kafka from our experience working with different companies and on different projects:

Streaming: Large organisations such as banks and e-commerce companies often need to move data continuously from one system to another. In banking, for example, a large amount of data is continuously pushed to different systems for purposes such as risk assessment or regulatory compliance. Similarly, in e-commerce, data such as user activity is continuously tracked and pushed to analytical systems for insights, recommendations and so on.

Messaging: Business systems in sectors such as manufacturing, utilities, energy and the sciences generate large volumes of data (messages) that need to be processed by numerous other systems to address different needs. Kafka is well suited to such scenarios: messages are pushed to Kafka and can be processed by different applications either almost immediately (called near-real time) or later.

Data Pipelines: Kafka is often used in building big data pipelines. This means Kafka is used as a tool to bring large amounts of data into big data platforms such as Hadoop. This process is called ingestion and is the most common way of bringing external data into big data environments.
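The idea from the Messaging example, that different applications can process the same messages either right away or later, can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions: the class MessageLog and the consumer names "dashboard" and "nightly_report" are hypothetical, and a real Kafka setup stores messages on disk across several servers rather than in a Python list.

```python
class MessageLog:
    """Toy message log (hypothetical): messages are kept after being
    read, and each consumer remembers how far it has read."""

    def __init__(self):
        self._messages = []    # every message, in arrival order
        self._positions = {}   # consumer name -> next index to read

    def append(self, message):
        """A producer adds a message to the end of the log."""
        self._messages.append(message)

    def read_new(self, consumer_name):
        """Return the messages this consumer has not yet seen."""
        start = self._positions.get(consumer_name, 0)
        new_messages = self._messages[start:]
        # Remember where this consumer should resume next time.
        self._positions[consumer_name] = len(self._messages)
        return new_messages


log = MessageLog()

log.append("meter reading 1")
first_batch = log.read_new("dashboard")        # reads almost immediately

log.append("meter reading 2")
second_batch = log.read_new("dashboard")       # only the new message

# A slower application reads much later and still gets everything.
report_batch = log.read_new("nightly_report")
```

Because reading a message does not remove it, the fast consumer and the slow consumer each work at their own pace without getting in each other's way.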

Final thoughts

Over the years, we have worked with various messaging and streaming solutions. We’ve found Kafka to be one of the most reliable, scalable and powerful technologies, solving many of the data messaging and streaming problems that companies face as they reach a critical size.

If you would like to find out more about our Kafka services, do give us a call at +44 (0)203 475 7980 or email us at marketing@whishworks.com

