Popular Topics

What is Kafka? The top 5 things you should know


Kafka is a community-developed, distributed, event-driven streaming platform, originally created at LinkedIn and open-sourced in 2011. It is based on the abstraction of a distributed commit log.
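To make the commit-log abstraction concrete, here is a minimal in-memory sketch in Java. It assumes a single log on a single machine. Records are only ever appended, each one receives a sequential offset, and readers pull from an offset they track themselves, so reading never removes anything. The names `CommitLog`, `append`, `read` and `endOffset` are hypothetical and are not Kafka's API. Real Kafka also partitions, persists and replicates the log across brokers, and none of that is modelled here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, single-process sketch of a commit log: an append-only
 * sequence of records addressed by offset. Not Kafka's actual API.
 */
public class CommitLog {
    private final List<String> records = new ArrayList<>();

    /** Appends a record to the end of the log and returns its offset. */
    public synchronized long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    /** Returns up to maxRecords records starting at offset, without removing them. */
    public synchronized List<String> read(long offset, int maxRecords) {
        if (offset < 0 || offset > records.size()) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        if (maxRecords < 0) {
            throw new IllegalArgumentException("maxRecords " + maxRecords);
        }
        int from = (int) offset;
        int to = (int) Math.min((long) records.size(), (long) from + maxRecords);
        return new ArrayList<>(records.subList(from, to));
    }

    /** The offset the next appended record will receive. */
    public synchronized long endOffset() {
        return records.size();
    }

    public static void main(String[] args) {
        CommitLog log = new CommitLog();
        log.append("order-created");
        log.append("order-paid");
        log.append("order-shipped");

        // A reader keeps its own position and advances it after each batch.
        long position = 0;
        List<String> batch = log.read(position, 2);
        position += batch.size();
        System.out.println(batch);                   // [order-created, order-paid]
        System.out.println(log.read(position, 10));  // [order-shipped]

        // A second reader can replay the whole log from offset 0.
        System.out.println(log.read(0, 10));         // [order-created, order-paid, order-shipped]
    }
}
```

Because the log is never mutated by reads, any number of independent readers can consume the same records at their own pace, which is the property the commit-log abstraction is meant to provide.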

What is Kafka? An informal explanation for non-techies

by Gaurav Bhardwaj

Apache Kafka is a well-engineered piece of technology, and a lot has already been written about it. This post doesn't attempt to explain Kafka technically, nor is it aimed at technical audiences. In fact, this article is for non-technical [...]

WHISHWORKS is the sponsor of the "Open Enterprise Hadoop Roadshow - London"


WHISHWORKS is the official sponsor of the Open Enterprise Hadoop Roadshow in London.

Find out how Hadoop can help your organization turn data into business value, and learn more about Open Enterprise Hadoop.

Installing SolrCloud on Hadoop

by Vijaya Bhoomireddy

Apache Solr is one of the most powerful open-source search platforms available. In this article, we discuss how to set up a Solr cluster, i.e. SolrCloud, on Hadoop so that Solr index data is stored in HDFS and is made available for [...]

Custom Processing using Apache Pig UDFs (User Defined Functions)

by Rakesh Gupta

Pig UDFs can easily be implemented in Java. Below are the steps to create a UDF using Eclipse:

  • Create a normal Java project and a Java class (the UDF) that extends one of the EvalFunc, StoreFunc, LoadFunc or FilterFunc base classes.
  • Override the exec() function to [...]