Apache Spark is a fast, general-purpose engine for big data processing, with built-in modules for streaming, SQL, machine learning, and graph processing.
At WHISHWORKS we have worked extensively with Apache Spark across many Big Data projects, including:
Implementation of robust production data pipelines at scale.
Implementation of multiple Spark- and NiFi-based IoT pipelines.
Numerous projects requiring Spark applications to perform efficiently on YARN clusters.
Introduction of the SMACK stack (Spark, Mesos, Akka, Cassandra, and Kafka) into our Big Data roadmap.
Development of reusable component registries, drawing on our production experience, that cut development time for enterprise-grade search solutions built with Spark and Apache Solr by almost 50%.
Extensive experience in building and running production-grade data pipelines on cloud platforms such as AWS and Azure.
Multiple use cases involving streaming data processing, interactive analytics, batch processing, and machine learning.