Big Data glossary

Deciphering the basics of Big Data

What is Big Data?

Variety    Velocity    Volume

Big Data is a term used to describe data sets, both structured (e.g. relational databases) and unstructured (e.g. free-text patient records), that are too large or complex for traditional applications to process. Today's Big Data solutions allow companies to access, store, process and analyse this multitude of data to reveal patterns, trends and associations.

Variety: Refers to the different types of data formats (text, video, images, etc.). Big Data is characterised by the many different formats of the data.

Velocity: Refers to the speed with which the data flows. Big Data is characterised by the high speed with which data flows.

Volume: Refers to the sheer amount of data. Big Data is characterised by very high volumes of data.

What is a Data Lake?

Extract    Load    Transform

A data lake is a single repository for data in its native, raw format. In contrast to the 'traditional' Data Warehouse approach, the structure and requirements of the data stored in a data lake are not defined until the data is needed. This makes the data far more broadly usable and drives down costs, as storage is no longer tied to a specific use case.

Extract Transform Load (ETL): Data Warehouses employ ETL techniques that transform the data before storing it.

Extract Load Transform (ELT): Data Lakes store the data in its raw format and transform it only when it is needed. The sketch below contrasts the two approaches.
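
To make the contrast concrete, here is a minimal sketch in plain Python; the records, the target schema and the transform function are hypothetical examples, not part of any particular product.

    # ETL vs ELT, illustrated with plain Python. The records and the
    # transform below are hypothetical examples.

    raw_records = [
        {"id": 1, "name": " Alice ", "signup": "2021-03-14"},
        {"id": 2, "name": "Bob", "signup": "2021-06-02"},
    ]

    def transform(record):
        """Normalise a record to the target schema."""
        return {"id": record["id"], "name": record["name"].strip().lower()}

    # ETL (Data Warehouse): transform first, then load the conformed rows.
    warehouse = [transform(r) for r in raw_records]

    # ELT (Data Lake): load everything as-is; transform lazily, at query time.
    data_lake = list(raw_records)
    query_result = [transform(r) for r in data_lake if r["id"] == 2]

    print(warehouse)
    print(query_result)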

What is Data Fabric?

Cold Data   Warm Data   Hot Data

Data Fabric is not an application or a piece of software. It is a strategic approach to data and storage, focused on how data is stored, managed, transferred and maintained. This covers a much wider spectrum, including but not limited to on-premises systems, offsite cloud-hosted systems, data backups and archives, and other silos.

By not being limited to a single-cluster view, organisations can better plan and manage their data. One of the better ways to manage it is to classify the data by age (a simple classification rule is sketched after the list):

Cold Data: old, archived data

Warm Data: data that is a few days/weeks old

Hot Data: newly arrived data
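
A minimal sketch of such an age-based classification in Python; the seven-day and ninety-day thresholds are hypothetical policy choices, not something the Data Fabric approach prescribes.

    from datetime import datetime, timedelta, timezone

    # Hypothetical tiering thresholds; the exact cut-offs are a policy
    # choice, not fixed by the Data Fabric approach itself.
    WARM_AFTER = timedelta(days=7)
    COLD_AFTER = timedelta(days=90)

    def temperature(last_written: datetime) -> str:
        """Classify data as hot, warm or cold by its age."""
        age = datetime.now(timezone.utc) - last_written
        if age < WARM_AFTER:
            return "hot"
        if age < COLD_AFTER:
            return "warm"
        return "cold"

    now = datetime.now(timezone.utc)
    print(temperature(now - timedelta(days=2)))    # hot
    print(temperature(now - timedelta(days=30)))   # warm
    print(temperature(now - timedelta(days=400)))  # cold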

What is Artificial Intelligence?

Machine Learning   Deep Learning   Natural Language Processing

Artificial intelligence (AI) is an area of computer science that combines large amounts of data with fast, iterative processing and advanced algorithms, enabling a machine to learn from patterns within the data and, ultimately, think and perform tasks like a human. Machine Learning, Deep Learning and Natural Language Processing are all branches of AI, each focusing on a different aspect of the field. Some of the most popular AI examples include computers playing chess, self-driving cars, chatbots and virtual assistants such as Alexa and Siri.

Machine Learning: with machine learning, software applications learn from processing data inputs, without human intervention (a minimal example follows these definitions).

Deep Learning: a subfield of machine learning, deep learning imitates the human brain by building artificial neural networks.

Natural Language Processing: a branch of AI, NLP focuses on how to program computers to process and analyse natural language text and speech.
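
As a concrete illustration, here is a minimal machine-learning sketch; scikit-learn and the toy study-hours data are our own assumptions for the example, not anything the glossary prescribes.

    # A minimal machine-learning example using scikit-learn (an assumed
    # library choice). The model learns a pattern from example data
    # rather than being explicitly programmed with a rule.
    from sklearn.linear_model import LogisticRegression

    # Toy, hypothetical training data: hours studied -> passed the exam.
    X = [[1], [2], [3], [8], [9], [10]]
    y = [0, 0, 0, 1, 1, 1]

    model = LogisticRegression()
    model.fit(X, y)                  # learn from the data inputs
    print(model.predict([[6.5]]))    # predict for an unseen input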

What is Big Data Analytics?

Descriptive   Diagnostic   Predictive   Prescriptive

Data analytics is the science of analysing raw data in order to derive useful insights. It can be classified into four types (a small worked example follows the list):

Descriptive analytics: a technique that summarises raw, historical data from multiple sources in order to provide insights into the past and help answer the question of what has happened.

Diagnostic analytics: examines historical data from different periods of time, in order to identify patterns, trends and correlations to help uncover the root cause behind a problem. 

Predictive analytics: uses data from Descriptive and Diagnostic Analytics, and applies statistical models to make predictions for future events. 

Prescriptive analytics: helps organisations decide what action to take to solve a problem, based on outcomes from similar past events. It requires both Descriptive and Predictive Analytics to determine the best solution or course of action among the available options.
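
A minimal descriptive-analytics sketch using pandas; the library choice, column names and sales figures are all hypothetical.

    # Descriptive analytics in miniature: summarise historical data to
    # show what has happened. pandas is an assumed library choice, and
    # the figures are hypothetical.
    import pandas as pd

    sales = pd.DataFrame({
        "region":  ["North", "North", "South", "South"],
        "month":   ["Jan", "Feb", "Jan", "Feb"],
        "revenue": [120, 135, 80, 95],
    })

    # Total and average revenue per region: a summary of the past.
    summary = sales.groupby("region")["revenue"].agg(["sum", "mean"])
    print(summary)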

What are Messaging Frameworks?

Messaging Queue   Distributed Messaging Pub-Sub   Distributed Stream Processing

Messaging frameworks translate a message from the formal messaging protocol of the sender to the formal messaging protocol of the receiver. There are three types of frameworks:

Messaging Queue Frameworks: The traditional message queue paradigm, best suited to cases where there is a fixed, end-to-end messaging path between a known sender and receiver.

Distributed Messaging Pub-Sub Frameworks: Publish-subscribe is a sibling of the message queue paradigm. This pattern provides greater network scalability and a more dynamic network topology, at the cost of reduced flexibility to modify the publisher and the structure of the published data (the core pattern is sketched below).

Distributed Stream Processing Frameworks: Stream processing frameworks are runtime libraries that help developers write code to process streaming data without dealing with lower-level streaming mechanics.
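
The publish-subscribe pattern at its core, as a minimal in-memory Python sketch; the Broker class here is purely illustrative, and a real distributed framework adds persistence, partitioning and delivery guarantees on top of this idea.

    # A minimal in-memory publish-subscribe sketch. The Broker class is
    # purely illustrative; distributed frameworks add persistence,
    # partitioning and delivery guarantees on top of this core idea.
    from collections import defaultdict
    from typing import Callable, Dict, List

    class Broker:
        def __init__(self) -> None:
            self._subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

        def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
            self._subscribers[topic].append(handler)

        def publish(self, topic: str, message: str) -> None:
            # The publisher does not know who, or how many, the subscribers are.
            for handler in self._subscribers[topic]:
                handler(message)

    broker = Broker()
    broker.subscribe("orders", lambda m: print(f"billing saw: {m}"))
    broker.subscribe("orders", lambda m: print(f"shipping saw: {m}"))
    broker.publish("orders", "order #42 created")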

How can WHISHWORKS help your business?

We work hard to be the best, and we are proud to be the data specialists of choice for a wide variety of brands. If you want a deeper understanding of your own data's potential, we are the right people to talk to.

Mulesoft
Salesforce
Microsoft Azure
HPE MapR
Confluent
Cloudera
Databricks