First, it is important to define the term big data and differentiate between data analysis and data analytics.
Big data refers to data sets so large or complex that traditional data processing applications are inadequate. It is characterised by high volume, high velocity (it accrues quickly, e.g. transaction data) and variety (it can be structured or unstructured, e.g. videos and emails); these are sometimes referred to as the three Vs of big data. The main challenges include capture, storage, analysis and visualization.
Data analysis refers to the extensive use of statistics, with or without the aid of computerized tools, to gain insights/knowledge from the data.
Data analytics is a discipline rather than a tool. It uses data analysis and other data science tools to recommend actions or aid decision-making; it is thus concerned with the whole process, from the analysis, to the insights generated, to the decisions made from those insights.
Big data analytics is the process of examining big data to uncover hidden patterns and other useful information that can be used to make better decisions in the application context, which is mostly a business environment.
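The distinction between data analysis and data analytics drawn above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the sales figures are made up, and the names `analyse` and `recommend`, as well as the decision rule itself, are hypothetical and not taken from any particular tool.

```python
from statistics import mean

# Hypothetical weekly unit sales for two products (made-up numbers).
weekly_sales = {
    "product_a": [120, 135, 150, 160],
    "product_b": [200, 180, 150, 130],
}


def analyse(sales):
    """Data analysis: summarise the data with simple statistics."""
    return {
        name: {"mean": mean(units), "change": units[-1] - units[0]}
        for name, units in sales.items()
    }


def recommend(summary):
    """Data analytics: turn each insight into a suggested action.

    The rule used here is a hypothetical placeholder for a real
    business decision process.
    """
    return {
        name: "increase stock" if stats["change"] > 0 else "review pricing"
        for name, stats in summary.items()
    }


summary = analyse(weekly_sales)   # insights: averages and trends
actions = recommend(summary)      # decisions based on those insights
print(summary)
print(actions)
```

Here `analyse` stops at the insight (the average and the change in sales), while `recommend` carries that insight through to a decision, which is the part that makes the process analytics rather than analysis alone.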
Why are big data analysis and analytics essential?
First, big data analytics provides business intelligence through standard and ad hoc business reports, which might answer questions such as why consumers behave the way they do and which individual consumer factors are associated with particular product choices or purchase preferences. It can also be proactive, through approaches like optimization, predictive modelling and forecasting, thus aiding decision-making for the future. Finally, it helps businesses manage efficiently the huge volume and variety of data they have to deal with: by using big data analytics, you can extract only the relevant information from terabytes of data in an efficient and speedy manner.
What are some of the tools available for big data analytics?
Due to the sheer volume, variety and velocity associated with big data, traditional analysis tools based on relational databases are limited in their capacity for big data analysis and analytics. Institutions that need to analyse big data are adopting customised technologies that have been, or are being, developed to offer a platform for big data analytics.
Some of the technologies in place include:
Apache Hadoop: an open source framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Tools related to Hadoop include YARN, MapReduce, Spark, Hive and Pig.
Cloudera Enterprise: aims to help users become information-driven by combining the best of the open source community with the enterprise capabilities they need to succeed with Apache Hadoop in their organization. It is designed specifically for mission-critical environments and includes CDH, the world’s most popular open source Hadoop-based platform, as well as advanced system management and data management tools.
Hortonworks Data Platform (HDP): an enterprise-grade data management platform that enables a centralized architecture for running batch, interactive and real-time analytics and data processing applications simultaneously across a common shared dataset. It is also built on Apache Hadoop and powered by YARN.
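To give a feel for the "simple programming models" that Hadoop-based platforms rely on, the MapReduce idea can be sketched in plain Python as a word count. This is a single-machine illustration only, not the Hadoop API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample documents are assumptions made for the example. In a real cluster, the framework would run the map and reduce steps in parallel on many machines and handle the grouping between them.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key.

    In a distributed framework this step is performed between the
    map and reduce phases; here it is a simple in-memory grouping.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of text records."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input records, made up for illustration.
documents = ["big data big insights", "data drives decisions"]
counts = word_count(documents)
print(counts)
# → {'big': 2, 'data': 2, 'insights': 1, 'drives': 1, 'decisions': 1}
```

Because each call to `map_phase` looks at one record only, and each call to `reduce_phase` looks at one key only, the work can be split across many machines, which is what makes this model suitable for data sets too large for a single computer.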