Big Data

Big Data is just like regular data, except that there is lots of it.  Actually, that’s not fair.

Regular data typically lives in a relational database on a number of SQL database servers.  It grows in a predictable way.  BD is in lots of places and lots of formats, and it grows rapidly, both because the content in each location keeps increasing and because we add new data sources as we realize we can use them.  We, of course, don’t have to use ALL of each data source, or all the sources, at any one time.  That also assumes we are using the data to answer questions or queries, and not hoarding it like Smaug the dragon.

You can describe BD with the three Vs: velocity, volume and variety.  Variety ranges across flat files, relational data in a relational database management system, data warehouses and data marts, NoSQL data and so on.
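To make "variety" a bit more concrete, here is a minimal sketch in Python of the same kind of record arriving as a flat file (CSV), as JSON and as a row in a relational table, and being pulled into one common shape.  Everything in it is made up for illustration: the sample data, the sales table and the function names are assumptions, not part of any real system.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data: the same kind of record in two text formats.
CSV_TEXT = "id,city,amount\n1,Leeds,10.5\n2,York,7.0\n"
JSON_TEXT = '[{"id": 3, "city": "Hull", "amount": 3.25}]'


def from_csv(text):
    """Flat file: every value arrives as a string and has to be typed."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"id": int(row["id"]), "city": row["city"], "amount": float(row["amount"])}


def from_json(text):
    """JSON documents: values are already typed, but we normalize anyway."""
    for doc in json.loads(text):
        yield {"id": int(doc["id"]), "city": doc["city"], "amount": float(doc["amount"])}


def from_sqlite(conn):
    """Relational table: rows come back as tuples in column order."""
    query = "SELECT id, city, amount FROM sales ORDER BY id"
    for id_, city, amount in conn.execute(query):
        yield {"id": id_, "city": city, "amount": amount}


def load_all():
    """Gather records from all three sources into one list of dicts."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE sales (id INTEGER, city TEXT, amount REAL)")
    conn.execute("INSERT INTO sales VALUES (4, 'Leeds', 1.25)")
    records = list(from_csv(CSV_TEXT)) + list(from_json(JSON_TEXT)) + list(from_sqlite(conn))
    conn.close()
    return records


if __name__ == "__main__":
    for record in load_all():
        print(record)
```

Each source needs its own small reader, but once the records share a shape, the rest of the analysis doesn’t care where they came from.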

A core component of the analysis of this data is an ETL (extract, transform, load) framework, but the data doesn’t have to be loaded into a traditional warehouse with a star or snowflake schema.  Data in other formats can be processed by Hadoop, which uses a MapReduce layer (an open-source implementation of the programming model Google published) and an HDFS (Hadoop Distributed File System) layer to help deal with the Volume of BD, running on a cluster with one master and multiple slaves.
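The MapReduce idea is easier to see in miniature.  The sketch below is plain Python running in one process, not Hadoop’s API: it only imitates the map, shuffle and reduce steps with a word count.  The function names and the sample lines are assumptions made up for illustration; on a real cluster the map and reduce work would be spread across the machines holding the data.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is data", "data grows"]
    print(word_count(sample))
```

The point of splitting the job this way is that each map call only needs its own line and each reduce call only needs the values for its own key, so both steps can be handed out to many machines at once.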