
What is Big Data

Several definitions exist. I have included some site links and comments from several web sources.


“Big data is being generated by everything around us at all times. Every digital process and social media exchange produces it. Systems, sensors and mobile devices transmit it. Big data is arriving from multiple sources at an alarming velocity, volume and variety. To extract meaningful value from big data, you need optimal processing power, analytics capabilities and skills.”

“Big data is an evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information”


“Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.”

Big data is a collection of data from traditional and digital sources inside and outside your company that represents a source for ongoing discovery and analysis ….

“Big data is a buzzword, or catch-phrase, meaning a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques. In most enterprise scenarios the volume of data is too big or it moves too fast or it exceeds current processing capacity”

Berkeley School of Information

““Big data.” It seems like the phrase is everywhere. The term was added to the Oxford English Dictionary in 2013 and appeared in Merriam-Webster’s Collegiate Dictionary in 2014. Now, Gartner’s just-released 2014 Hype Cycle shows “big data” passing the “peak of inflated expectations” and moving on its way down into the “trough of disillusionment.” Big data is all the rage. But what does it actually mean?

A commonly repeated definition cites the three Vs: volume, velocity, and variety. But others argue that it’s not the size of data that counts, but the tools being used or the insights that can be drawn from a dataset.”


My opinion:

Big data is structured or unstructured data generated by machines, AI, humans, and systems, that is, by any object that can produce data or information in any form, and that has volume, velocity, and variety. This data can be stored in a data store using computer-based tools.







This is the start of my journey with Hadoop and most of the tools used to access, control, monitor, edit, modify, and read data. I will try to share this journey with you, but please note that it will be very technical most of the time. I will make assumptions based on my reading, and what I write is my view of how things work. If something is incorrect, post a comment and I will try to get you the correct information.

This is a big world using big data (no pun intended). It is also a confusing world, and I will try to simplify it for everyone reading this blog, including myself. I like simplicity.

By no means am I an expert on this. I am just learning and sharing what I learn with you. You are welcome to follow my journey and make it your own.

My first source of information is the Hortonworks University Self-Paced Learning Library. I am currently in the process of completing the online self-paced training they provide. It is good training, so go for it and enjoy it.
At this stage there are over 250 modules to complete, and I am aware that they will be adding more soon, introducing more modules on their platform called HDP. I will jump around, trying to do labs with you, install the software, and assist with configuration. Let's see how it goes. As you can imagine, this is going to be a lot of work, so this journey will span a few months.

Also, set up your own VM so you can test, play, and enjoy.

Just a note: I am not associated with Hortonworks in any way, nor they with me. As indicated, they are my starting point for learning and developing big data solutions. Please note that I only use them as a reference for information. I will be using terms that they use and may own, and I will also try to explain how their technology fits together as I understand it.

Let's start as follows.

What is big data: