Assignment03: Who Defines Big Data?
To start with, there is no exact definition of big data. The buzzword refers to massive
datasets whose size keeps increasing with time; today's big data may well be called
tomorrow's small data. For now, data is called big data if it cannot be handled with the
traditional database technology available and therefore creates a problem. Different
organizations have come up with their own definitions of big data:
IBM defines big data as “Big data is a term applied to data sets whose size or type is beyond
the ability of traditional relational databases to capture, manage, and process the data with low-
latency. And it has one or more of the following characteristics – high volume, high velocity,
or high variety.”1
Oracle still considers Gartner’s definition of big data the go-to definition, which states that
“Big data is high-volume, high-velocity and/or high-variety information assets that demand
cost-effective, innovative forms of information processing that enable enhanced insight,
decision making, and process automation.”2
Oracle itself defines big data as “Big data is larger, more complex data sets, especially from new
data sources. These data sets are so voluminous that traditional data processing software just
can’t manage them. But these massive volumes of data can be used to address business
problems you wouldn’t have been able to tackle before.”3
Amazon defines big data as “Big data can be described in terms of data management
challenges that – due to increasing volume, velocity and variety of data – cannot be solved with
traditional databases. While there are plenty of definitions for big data, most of them include
the concept of what’s commonly known as the “three V’s” of big data: Volume, Variety and
Velocity.”4
Microsoft defines big data as “Big data is the term increasingly used to describe the process of
applying serious computing power – the latest in machine learning and artificial intelligence –
to seriously massive and often highly complex sets of information.”5
Google defines big data as “Big data refers to data that would typically be too expensive to
store, manage, and analyse using traditional (relational and/or monolithic) database systems.
Usually, such systems are cost-inefficient because of their inflexibility for storing unstructured
data (such as images, text, and video), accommodating “high-velocity” (real-time) data, or
scaling to support very large (petabyte-scale) data volumes.”6
No standard authority imposes an exact definition of big data, but most of the definitions above
amount to the same thing. Each essentially mentions the 3 V’s of big data: Volume, Velocity and
Variety. Veracity and Value are two further V’s that were linked to big data later. Volume is the size
of the data captured from different sources: for some organizations, 10 terabytes already counts as
big data, whereas for large big data analytics companies only hundreds of petabytes make a large
dataset. Velocity is the rate at which huge amounts of data flow in; since most applications nowadays
operate in real time, an immediate response to every action is required. Variety refers to the different
types of data: today we collect data from many different repositories, and it may be structured,
semi-structured or unstructured (text, audio, video, etc.), requiring additional processing to extract
information. Value concerns how much useful insight can actually be generated from the collected
data. Veracity is the most complex of the five concepts to deal with; it concerns how trustworthy the
captured dataset is and how to handle uncertain and abnormal data, which is often highly
unstructured.
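To make the Variety dimension concrete, the short sketch below (hypothetical records, Python standard library only) shows why semi-structured data resists a fixed relational schema: each record carries a different set of fields, so no single table layout fits them all without nulls or schema changes.

```python
import json

# Hypothetical semi-structured records, as they might arrive from
# different sources (web logs, sensor feeds, user profiles).
raw = """
[{"user": "alice", "clicks": 42},
 {"sensor_id": 7, "temp_c": 21.5, "ts": "2018-09-16T10:00:00"},
 {"user": "bob", "bio": "likes data", "tags": ["ml", "sql"]}]
"""
records = json.loads(raw)

# A relational table needs one fixed column set; these records share none.
all_fields = set().union(*(r.keys() for r in records))
common_fields = set(records[0]).intersection(*(r.keys() for r in records[1:]))

print(sorted(all_fields))  # every column a single table would need
print(common_fields)       # fields present in ALL records: empty here
```

A relational schema would need a column for every field in `all_fields`, mostly filled with nulls, which is exactly the cost-inefficiency the Google definition above points to.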
Companies Offering Big Data Solutions:
1. Oracle’s Big Data Solution
As mentioned earlier, Oracle defines big data as “Big data is larger, more complex data sets,
especially from new data sources. These data sets are so voluminous that traditional data
processing software just can’t manage them. But these massive volumes of data can be used to
address business problems you wouldn’t have been able to tackle before.” Put simply, big data
means large collections of structured, semi-structured and unstructured datasets coming from
different domains. Because the datasets are so large, analysing them with traditional database
technologies is a real challenge for big data companies. At the same time, because these datasets
carry so much information, they can be used to address real-world problems and to find unique
ways of providing solutions. Oracle focuses mainly on the 3 V’s – Volume, Velocity and Variety –
in its big data solutions. However, Oracle’s definition has some weaknesses. First, it calls big data
“larger, more complex data sets” without saying how large the data should be or how the
complexity of the data is determined: for some companies a few terabytes of data is large, while
for others thousands of petabytes make big data, and the definition misses this important point.
Second, it suggests that big data mostly comes from new data sources, yet it is quite possible for
known sources to provide datasets that are larger and more complex than those from new sources.
Finally, it emphasises large volume but says nothing about the type of the data: a dataset can be
small yet so unstructured that it cannot be handled with current technologies, and it would still be
called big data.
2. IBM’s Big Data Solution
As mentioned earlier, IBM defines big data as “Big data is a term applied to data sets whose
size or type is beyond the ability of traditional relational databases to capture, manage, and
process the data with low-latency. And it has one or more of the following characteristics –
high volume, high velocity, or high variety.” In other words, data is called big data when it is
huge, comes from multiple sources, and is difficult to work with using traditional databases,
which consume a large amount of time to capture, store or process it. This definition focuses
mainly on the 3 V’s – Volume, Velocity and Variety. However, it omits the Value and Veracity
parameters, which are equally important: they indicate how much uncertainty a dataset
contains and how much we can rely on the data.
3. Microsoft’s Big Data Solution
As mentioned earlier, Microsoft defines big data as “Big data is the term increasingly used
to describe the process of applying serious computing power – the latest in machine learning
and artificial intelligence – to seriously massive and often highly complex sets of information.”
In simpler words, big data most often refers to large datasets that require enormous computing
power to process. The definition says nothing about the different sources from which the data
is collected, which are equally important. It also omits the Value and Veracity parameters,
which indicate how trustworthy the data is and how much we can rely on it.
1 “Analytics.” The Analytics Maturity Model (IT Best Kept Secret Is Optimization). Accessed
September 16, 2018. https://www.ibm.com/analytics/hadoop/big-data-analytics.
2 “What Is Big Data? – Gartner IT Glossary – Big Data.” Gartner IT Glossary. December 19, 2016.
Accessed September 16, 2018. https://www.gartner.com/it-glossary/big-data.
3 “Oracle Big Data.” Slowly Changing Dimensions. Accessed September 16, 2018.
4 “What Is Big Data? – Amazon Web Services (AWS).” Amazon. Accessed September 16, 2018.
5 Microsoft. “The Big Bang: How the Big Data Explosion Is Changing the World.” Stories. October
22, 2014. Accessed September 16, 2018. https://news.microsoft.com/2013/02/11/the-big-bang-how-
6 “What Is Big Data? | Cloud Big Data Solutions | Google Cloud.” Google. Accessed September
16, 2018. https://cloud.google.com/what-is-big-data/.