Apache Spark Support

Apache Spark consulting, implementation, optimisation and support

Our Spark Specialism

Apache Spark is a fast and general engine for Big Data processing, with built-in modules for streaming, SQL, Machine Learning and graph processing. At WHISHWORKS we have worked extensively with Apache Spark in many Big Data projects:

• Implementation of robust production data pipelines at scale.

• Implementation of multiple Spark- and NiFi-based IoT pipelines.

• Numerous projects requiring Spark applications to perform efficiently on YARN clusters.

• Introduction of the SMACK (Spark, Mesos, Akka, Cassandra and Kafka) stack into our Big Data roadmap.

• Development of reusable component registries, based on our extensive production experience, that reduce development time for enterprise-grade search solutions built on Spark and Apache Solr by almost 50%.

• Extensive experience in building and running production-grade data pipelines on cloud platforms such as AWS and Azure.

• Multiple use cases involving streaming data processing, interactive analytics, batch processing and Machine Learning.

  • Consulting

    • Needs Analysis
    • Architectural Consulting
    • Spark Cluster Architecture Review & Design
    • Identification of Use Cases

  • Managed services

    • Cluster Administration & Optimisation
    • Tailored Services
    • Staff Augmentation

  • Application support

    • Implementation of high-performance Spark applications for real-time, batch, streaming and offline analytics.
    • Full Spark stack delivery: Spark SQL, MLlib, Spark Streaming and GraphX.
    • High-quality SQL queries that run seamlessly on the Spark engine, backed by AWS (S3 and Redshift) or Azure Blob/Table Storage.
    • High-performance Spark-based data pipelines, built by strictly following a Test-Driven Development approach.

  • Deployment & Application Delivery

    • Support and issue management for existing open-source Spark clusters.
    • Support to maintain Spark SLAs and SLOs consistently, including Spark SQL read/write speed optimisation.
    • Multi-user Spark cluster sharing.
    • 24×7 or tailored support packages, with incident management and reporting.