Big data is a term used to acknowledge the fact that data is growing at an exponential rate. However, data is of no use until we derive facts from it and arrive at conclusions. Put simply, Big Data means "a lot of data": if the amount of data is so massive that it cannot be stored or managed by existing tools such as relational databases, it is known as Big Data. Big Data is big because it grows continuously, at an exponential rate, along with the development of technology. Before we go further into Big Data, let's get an idea of the units used to measure data.
1 Bit = Single Binary Digit (1 or 0)
1 Byte = 8 bits
1 Kilobyte (KB) = 1,024 Bytes
1 Megabyte (MB) = 1,024 Kilobytes
1 Gigabyte (GB) = 1,024 Megabytes
1 Terabyte (TB) = 1,024 Gigabytes
1 Petabyte (PB) = 1,024 Terabytes (Big Data starts from here onwards)
1 Exabyte (EB) = 1,024 Petabytes
1 Zettabyte (ZB) = 1,024 Exabytes
1 Yottabyte (YB) = 1,024 Zettabytes
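To make the list above concrete, here is a small sketch that converts a raw byte count into the largest unit that keeps the value at or above 1, using the same 1,024-based steps as the list. The function names `human_readable` and `to_bytes` are hypothetical helpers for illustration, not part of any library.

```python
# Units in the same order as the list above; each step is a factor of 1,024.
UNITS = ["Bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1024:
            return f"{value:.2f} {unit}"
        value /= 1024  # move up one unit
    return f"{value:.2f} {UNITS[-1]}"  # anything this large is reported in YB

def to_bytes(amount: float, unit: str) -> int:
    """Convert an amount in a given unit back to bytes (unit i = 1024**i bytes)."""
    return int(amount * 1024 ** UNITS.index(unit))

print(human_readable(2.5e18))  # 2.5 quintillion bytes
```

With these binary units, 2.5 quintillion bytes comes out at roughly 2.17 EB. Note that storage vendors often use decimal (1,000-based) units instead, which gives slightly different figures.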
The infographic on the growth rate of the Internet clearly shows why data is important to us and how it grows faster with every passing minute. Some examples of Big Data include blogs (web logs), social networking sites like Facebook and Twitter, Internet search indexes, genomics, finance, CCTV video archives and call detail records. According to a survey, 2.5 quintillion bytes of data are created every day, and 90% of the data in the world today was created within the past two years.
Big Data can help solve bigger problems by combining huge amounts of data on one hand with an intelligent analytical tool on the other. IBM's WATSON is one of those powerful machines, able to analyze terabytes of data in under 2 seconds. No matter how much I explain about Big Data, it's of no use until you visualize it. The real issue is making sense of big data and finding patterns that help organizations make better business decisions.
For example, some sites are now gathering the detailed mouse movements of customers in real time as they move around pages. This generates literally millions of coordinates and data points for every user, allowing companies unparalleled insights into what users are doing when they’re on a page.
Deriving information from unstructured data:
Every time we receive an email about our company's financial results, we tend to skip reading it unless we are managers or in an executive role. But trust me, it's very important to know the financial results. Here's a simple way to analyze this data and extract the most important information from it. That's the key to Big Data.
The original press release of IBM's 4Q results can be read at: http://www.ibm.com/investor/4q11/press.phtml
Now let's read the same results in a better way. In the accompanying interactive chart, the results can be easily visualized by typing search keywords such as revenues, fourth, 2010, 2011, EPS, growth markets, increased and decreased.
So what we did was find important information in raw, unstructured data and then analyze it for a business purpose.
"Big Data" poses a similar challenge: deriving useful information from such unstructured data. Data growth can be summarized mainly in terms of three factors: Volume, Variety and Velocity.
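The keyword search described above can be approximated in a few lines: split the text into sentences and collect, for each keyword, the sentences that mention it. This is a minimal sketch, assuming a naive sentence splitter and whole-word, case-insensitive matching; `find_sentences` is a hypothetical helper, and `SAMPLE` is invented placeholder text, not a quote from IBM's press release.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def find_sentences(text: str, keywords: list[str]) -> dict[str, list[str]]:
    """Map each keyword to the sentences that mention it (whole word, any case)."""
    sentences = split_sentences(text)
    hits = {}
    for kw in keywords:
        pattern = re.compile(rf"\b{re.escape(kw)}\b", re.IGNORECASE)
        hits[kw] = [s for s in sentences if pattern.search(s)]
    return hits

# Invented placeholder text standing in for an unstructured report.
SAMPLE = ("Revenues increased in the fourth quarter. EPS increased year over year. "
          "Sales in growth markets increased. Costs in one division decreased.")

for kw, sents in find_sentences(SAMPLE, ["revenues", "EPS", "growth markets", "decreased"]).items():
    print(kw, "->", sents)
```

A real analysis would add stemming, numbers and context around each hit, but even this crude filter pulls the key statements out of a wall of text.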
Data volume increases mainly because of a company's day-to-day activities, such as RFID for employee tracking, daily transactions within the enterprise and by customers, and other routine tasks.
For example, in a telecom company the data volume increases with the number of customers and their daily use of cell phones. The stored data consists of call details such as call time, location, call duration, cost per unit depending on the plan, billing, SMS, backups, historical information, etc. The data typically grows at a steady rate, with festivals being an exception. In this case, excessive data is not only a storage issue but also a massive analysis issue.
However, historical data is available here for analysis and can provide feedback that helps retain customers, for example by offering the best plans to a specific customer base. If a group of people has a low calling frequency between 2pm and 5pm on a given day, then a 50% discount on afternoon call rates could boost their calling activity. SMS plans can be offered only to customers in the student category. Networks are jammed during festivals like Christmas (globally) or Diwali (India), so special workload deployers and load balancers can be put in place during such peak periods.
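The afternoon-discount idea can be sketched directly over call detail records: for each customer, compute the share of calls starting between 14:00 and 17:00 and flag those below a cut-off. The `CallRecord` fields, the helper names and the 10% threshold are all hypothetical choices for illustration; a real CDR schema has many more fields and would be processed with a distributed tool rather than in memory.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CallRecord:
    # Hypothetical, simplified call detail record (CDR).
    customer_id: str
    start: datetime
    duration_sec: int

def afternoon_share(records: list[CallRecord]) -> dict[str, float]:
    """Fraction of each customer's calls that start between 14:00 and 17:00."""
    total = defaultdict(int)
    afternoon = defaultdict(int)
    for r in records:
        total[r.customer_id] += 1
        if 14 <= r.start.hour < 17:
            afternoon[r.customer_id] += 1
    return {c: afternoon[c] / total[c] for c in total}

def discount_candidates(records: list[CallRecord], threshold: float = 0.1) -> list[str]:
    """Customers whose afternoon share is below the (hypothetical) threshold."""
    shares = afternoon_share(records)
    return sorted(c for c, s in shares.items() if s < threshold)

calls = [
    CallRecord("A", datetime(2012, 1, 10, 9, 0), 120),
    CallRecord("A", datetime(2012, 1, 10, 15, 30), 60),
    CallRecord("B", datetime(2012, 1, 10, 20, 15), 300),
]
print(discount_candidates(calls))
```

Here customer B never calls in the afternoon, so B would be the one offered the discount.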
If the variety of data increases rather than its volume, analyzing it becomes a more complex task, since there is no common pattern in the data. Types of data to analyze include information from social media and mobile devices, documents, emails, video, images, audio, stock data, financial transactions and much more. Analyzing such information requires linking all these types of data before arriving at a conclusion.
A simple example: using GMail, you have exchanged many emails with your friends about your vacation to Norway. You have also chatted with your girlfriend about it on GTalk, searched Google for flights to Norway, and looked up directions from Oslo Airport to Bergen on Google Maps.
Google is smart. It tracks your GMail, Google search queries, GTalk chats, Maps searches and many other services. So the next time you are reading an article on the latest elections and the website shows advertisements, you may suddenly start seeing ads like "10% discount on flights to Oslo, on booking with your CitiBank Credit Card", "Best Hotels near Bergen" and "Travel from Oslo to Bergen with Norway Airlines". If you have chatted about something like "Honeymoon" on GTalk, you may also see ads like "SOTC Honeymoon packages" or "Norway Honeymoon packages", and the tracking continues. If you look closely, there is more variety in the data than any single pattern, yet data from different sources such as mail, chats, searches and maps has been linked to derive context-aware advertisements.
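A toy version of this linking step: collect one user's text from several sources, keep terms that show up in at least two different sources, and match them against an ad catalog. This is only an illustration of cross-source linking under simple assumptions (bag-of-words terms, a tiny stopword list, a hypothetical `catalog` of keyword/ad pairs); it is not how Google's ad systems actually work.

```python
import re
from collections import defaultdict

STOPWORDS = {"to", "the", "for", "of", "our", "a", "on", "with"}

def link_topics(events: list[tuple[str, str]], min_sources: int = 2) -> set[str]:
    """Terms that appear in at least `min_sources` distinct sources."""
    sources_per_term = defaultdict(set)
    for source, text in events:
        for term in set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS:
            sources_per_term[term].add(source)
    return {t for t, srcs in sources_per_term.items() if len(srcs) >= min_sources}

def pick_ads(topics: set[str], catalog: list[tuple[str, str]]) -> list[str]:
    """Ads whose keyword is among the linked topics, in catalog order."""
    return [ad for keyword, ad in catalog if keyword in topics]

# Hypothetical activity of one user across different services.
events = [
    ("mail", "Plans for our vacation to Norway"),
    ("chat", "Thinking of Norway for the honeymoon"),
    ("search", "flights to Oslo Norway"),
    ("maps", "Oslo Airport to Bergen"),
]
catalog = [("oslo", "Discounted flights to Oslo"),
           ("bergen", "Best hotels near Bergen"),
           ("norway", "Norway honeymoon packages")]
print(pick_ads(link_topics(events), catalog))
```

No single source says much on its own; it is the overlap between sources ("norway" in three of them, "oslo" in two) that produces the context.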
Velocity means both how fast data is being produced and how fast the data must be processed to meet the demand.
An example is any real-time application such as Facebook. The number of users, and the amount of data per user, is growing very fast because users are always connected and updating in real time. Updates from friends appear within a fraction of a second.
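Velocity is usually handled with streaming techniques rather than batch queries. One common building block is a sliding-window counter, which answers "how many events arrived in the last N seconds?" without storing the whole history. The sketch below assumes events arrive in non-decreasing timestamp order; the class name `SlidingWindowCounter` is hypothetical.

```python
from collections import deque

class SlidingWindowCounter:
    """Counts events whose timestamps fall in the last `window` seconds."""

    def __init__(self, window: float):
        self.window = window
        self.timestamps = deque()

    def add(self, ts: float) -> None:
        # Assumes events arrive in non-decreasing timestamp order.
        self.timestamps.append(ts)
        self._evict(ts)

    def count(self, now: float) -> int:
        self._evict(now)
        return len(self.timestamps)

    def _evict(self, now: float) -> None:
        # Drop events at or before the start of the window (now - window).
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()

updates = SlidingWindowCounter(window=1.0)
for t in (0.0, 0.5, 0.9, 1.2):
    updates.add(t)
print(updates.count(1.2))  # events in (0.2, 1.2]
```

Because old events are evicted as time moves forward, memory stays proportional to the traffic within one window, not to the total stream.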
We generate millions of data points every day, which applications can turn into pattern-based information that can be sold at a better price than before. After all, every new technology is meant to be consumed. And yes, for businesses today it's more about the right data than about Big Data.