Wednesday 6 September 2017

What is Big Data



Big data is a term for data sets that are so large that traditional data processing application software is unable to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, querying, updating and information privacy.

Relational database management systems (RDBMS) and desktop statistics and visualisation packages often have difficulty handling big data. The work may instead require "big software running on lots of servers". What counts as "big data" varies depending on the capabilities of the users and their tools, and expanding capabilities make big data a moving target. "For some organisations, facing lots of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration."


Characteristics

Big data can be described by the following characteristics:
Volume
The amount of generated and stored data. The size of the data determines the value and potential insight, and whether it can actually be considered big data or not.
Variety
The type and nature of the data. This helps people who analyse it to effectively use the resulting insight.
Velocity
The speed at which the data is generated and processed to meet the demands placed on it.
Variability
Inconsistency of the data set can hamper processes to handle and manage it.

Factory work and cyber-physical systems may be described by a "6C system":
  • Customization
  • Content/context 
  • Cyber 
  • Connection
  • Cloud
  • Community
Data must be processed with advanced tools (analytics and algorithms) to reveal meaningful information. For example, to manage a factory one must consider both visible and invisible issues with various components. Information generation algorithms must detect and address invisible issues such as machine degradation and component wear on the factory floor.
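As a toy illustration of that idea, the sketch below flags sensor readings that drift sharply away from their recent history, a crude stand-in for degradation detection. The sensor, the window size and the threshold are all hypothetical, not taken from any real factory system.

import statistics

def flag_degradation(readings, window=10, threshold=3.0):
    # Flag readings more than `threshold` standard deviations away
    # from the trailing window's mean (a crude degradation signal).
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mean = statistics.mean(history)
        spread = statistics.stdev(history) or 1e-9  # avoid divide-by-zero
        if abs(readings[i] - mean) > threshold * spread:
            flagged.append(i)
    return flagged

# Hypothetical vibration amplitudes from a machine spindle; the final
# reading jumps well outside the normal band and gets flagged.
vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0, 3.5]
print(flag_degradation(vibration))  # -> [10]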

So, how do you do big data analysis?

Unstructured and semi-structured data types typically don't fit well in traditional data warehouses, which are based on relational databases oriented to structured data sets. Furthermore, data warehouses may not be able to handle the processing demands posed by big data sets that need to be updated frequently, or even continually, as in the case of real-time data on stock trading, the online activities of website visitors or the performance of mobile applications.
As a result, many organizations that collect, process and analyze big data turn to NoSQL databases as well as Hadoop and its companion tools, including:
  • YARN
  • MapReduce
  • Spark
  • HBase
  • Hive
  • Kafka
  • Pig
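To make the MapReduce model in that list concrete, here is the classic word-count job written as two small Python scripts for Hadoop Streaming. The input and output paths and the streaming jar name in the launch command are placeholders, not real cluster settings.

#!/usr/bin/env python3
# mapper.py -- emit a (word, 1) pair for every word read from stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")

#!/usr/bin/env python3
# reducer.py -- sum counts per word; Hadoop sorts mapper output by key,
# so all lines for the same word arrive consecutively.
import sys

current, total = None, 0
for line in sys.stdin:
    if not line.strip():
        continue
    word, count = line.rstrip("\n").split("\t")
    if word != current:
        if current is not None:
            print(f"{current}\t{total}")
        current, total = word, 0
    total += int(count)
if current is not None:
    print(f"{current}\t{total}")

A typical launch command (paths illustrative) would look like:

hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /data/in -output /data/out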



Increasingly, however, big data analytics users are adopting the concept of a Hadoop data lake that serves as the primary repository for incoming streams of raw data. In such architectures, data can be analyzed directly in a Hadoop cluster or run through a processing engine like Spark.
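As a rough sketch of that pattern, the PySpark snippet below reads raw JSON events straight out of a data-lake directory and summarizes them on the cluster. The path and the event_type and url column names are assumptions made for illustration.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lake-exploration").getOrCreate()

# Hypothetical landing zone of raw JSON clickstream events in HDFS.
events = spark.read.json("hdfs:///datalake/raw/clickstream/")

# Analyze directly on the cluster: top pages by view count.
(events.filter(events.event_type == "page_view")
       .groupBy("url")
       .count()
       .orderBy("count", ascending=False)
       .show(10))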

As in data warehousing, sound data management is a crucial first step in the big data analytics process. Data stored in the Hadoop Distributed File System must be organized, configured and partitioned properly to get good performance on both extract, transform and load (ETL) integration jobs and analytical queries.
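One common way to achieve that organization, sketched below under assumed names, is to write curated data as Parquet files partitioned by a query-friendly column, so both ETL jobs and analytical queries can prune whole directories instead of scanning everything. The paths, the orders feed and the order_date column are hypothetical.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lake-etl").getOrCreate()

# Hypothetical raw CSV feed landed by an ingestion job.
raw = spark.read.option("header", "true").csv("hdfs:///datalake/raw/orders/")

# Partitioning by a low-cardinality column lets downstream jobs and
# queries skip irrelevant directories rather than reading every file.
(raw.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("hdfs:///datalake/curated/orders/"))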

Once the data is ready, it can be analyzed with the software commonly used in advanced analytics processes. That includes tools for data mining, which sift through data sets in search of patterns and relationships; predictive analytics, which builds models for forecasting customer behavior and other future developments; machine learning, which taps algorithms to analyze large data sets; and deep learning, a more advanced offshoot of machine learning.
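To give one small, self-contained taste of the predictive analytics step, the sketch below fits a logistic regression model to a tiny, made-up churn data set using scikit-learn. Every feature, label and number here is invented for illustration; real big data work would train on far larger samples pulled from the cluster.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Invented features: weekly visits, minutes on site, support tickets.
X = np.array([[3, 40, 0], [1, 5, 2], [5, 60, 0], [0, 2, 3],
              [4, 55, 1], [1, 8, 4], [6, 70, 0], [0, 3, 5]])
# Invented labels: 1 = customer churned, 0 = customer retained.
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))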

Text mining and statistical analysis software can also play a role in the big data analytics process, as can mainstream BI software and data visualization tools. For both ETL and analytics applications, queries can be written in batch-mode MapReduce; programming languages, such as R, Python and Scala; and SQL, the standard language for relational databases that's supported via SQL-on-Hadoop technologies.
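As an example of the SQL-on-Hadoop route mentioned above, the snippet below registers a Parquet data set as a temporary view and queries it with standard SQL through Spark. The table path and the order_date and amount columns are assumptions carried over from the earlier sketches.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-on-hadoop").getOrCreate()

# Curated Parquet data produced by an earlier ETL step (hypothetical path).
spark.read.parquet("hdfs:///datalake/curated/orders/") \
     .createOrReplaceTempView("orders")

# Standard SQL over files stored in Hadoop, no relational database needed.
spark.sql("""
    SELECT order_date, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_date
    ORDER BY order_date
""").show()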
