Believe it or not, data mining for business intelligence is not dissimilar to mining in the physical world. There, a precious metal extracted from the earth is transferred to a processing plant, filtered, refined and fashioned into jewellery or other products used in industry. Data follows a similar process, from its raw state to the delivery of valuable business insight.
What you put in is what you’ll get out
Before you begin, you’ll need to know which sources of data are important for your business analysis. Pick what you need from the variety of data sources available to you, from simple numerical measurements and text documents to more complex information. Business-oriented Big Data is typically machine data like server logs, network logs, and RFID logs; transaction data like that from websites or retail stores; and cloud data like stock ticker prices or social media feeds. This data is often unstructured (strings of text, images or audio files) or semi-structured (log data with a timestamp or IP address).
If you are not clear about the type of analysis you need at the outset, your results will be flawed. Be clear about the answers you seek - is it about volumes, a trend, or something else? In addition, you must prepare a data analysis roadmap that outlines how you will integrate and segment the data in your business's information store.
How and where data is captured
Structured data is daily transactional information collected from operations, sales, inventory and so on, and stored in a database that has a structure or schema. Unstructured data like social media content, however, has no predefined schema to be stored in. The volume of data collected today is vast and includes visitor interaction with your website, called click-stream data. Analysts also collect information about what people are saying about your products online. Data sources like this reflect trends in sentiment around your company and its products. They also provide opportunities for innovation, and they contain valuable information about your competitors.
How to extract the gold and not the dust from your data mine
The size and complexity of unstructured data make it essential to have a distributed storage and processing framework. Hadoop is well suited to this. It is designed to support Big Data, that is, data that is too big for traditional database technologies to accommodate. It is an open-source platform and can be used in numerous applications specific to video, audio, image, and text file analysis. To process unstructured data, you need to extract the relevant information and give it a structure. Analysing unstructured data typically involves complex algorithms.
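As a small illustration of extracting the relevant information and giving it a structure, the sketch below turns raw log lines into records with named fields. It uses plain Python rather than Hadoop, and the log layout, the sample lines and the name parse_log_line are all assumptions made for this example, not a format taken from any particular system.

```python
import re
from datetime import datetime

# Hypothetical log layout: "<ISO timestamp> <IPv4 address> <free-text message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+"
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"(?P<message>.*)$"
)


def parse_log_line(line):
    """Return a structured record for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": datetime.strptime(match.group("timestamp"), "%Y-%m-%dT%H:%M:%S"),
        "ip": match.group("ip"),
        "message": match.group("message"),
    }


# Illustrative sample input; the second line is deliberately malformed.
raw_lines = [
    "2015-03-01T09:15:02 192.168.0.7 GET /products/42",
    "corrupted entry with no recognisable fields",
    "2015-03-01T09:15:05 10.0.0.3 POST /checkout",
]

# Keep only the lines that could be given a structure.
records = [r for r in (parse_log_line(l) for l in raw_lines) if r is not None]
```

Lines that do not fit the assumed layout are dropped rather than guessed at, which keeps the "dust" out of the structured result. In a distributed framework the same per-line parsing would typically run in parallel across many machines.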
Where and how to use your data
Big Data needs to be filtered, transformed and sorted before it is loaded into a data warehouse. The type of data analysis that needs to be performed depends entirely on the business need, as does the selection of the tool used for the analysis. That tool should be able to integrate data from multiple sources and should not make assumptions about where that data comes from or how it is organised.
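The filter, transform, sort and load sequence can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the transaction records, the field names and the helper functions prepare and load are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical raw transaction records; None marks a missing amount.
raw_transactions = [
    {"store": " london ", "amount": "19.99", "day": "2015-03-02"},
    {"store": "Leeds", "amount": None, "day": "2015-03-01"},
    {"store": "leeds", "amount": "5.00", "day": "2015-03-01"},
]


def prepare(rows):
    """Filter, transform and sort raw rows into (day, store, amount) tuples."""
    # Filter: drop rows that have no amount.
    kept = [r for r in rows if r["amount"] is not None]
    # Transform: normalise store names and convert amounts to numbers.
    cleaned = [
        (r["day"], r["store"].strip().title(), float(r["amount"]))
        for r in kept
    ]
    # Sort: order by day, then store, then amount.
    return sorted(cleaned)


def load(rows):
    """Load prepared rows into an in-memory SQLite table (a warehouse stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day TEXT, store TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


prepared = prepare(raw_transactions)
warehouse = load(prepared)
total = warehouse.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

Keeping the preparation step separate from the load step means the same cleaning logic can be reused whichever analysis tool or storage system the business need eventually calls for.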
The infrastructure required for analysing Big Data must be able to support deeper analytics such as statistical analysis and data mining on a wide variety of data types stored in a diverse range of systems. It must be scalable enough to cope with extreme volumes of data, respond quickly, and allow for decision-automation based on the results of those analytical models.
If you would like to find out more about how Big Data could help you make the most out of your current infrastructure while enabling you to open your digital horizons, do give us a call at +44 (0)203 475 7980 or email us at email@example.com.
HISHWORKS is a Hortonworks Gold Consulting Partner and Reseller, an Authorised Reseller and Certified Consulting Partner for MapR, and a Cloudera Silver Partner.