The tech sector of the global economy generates new buzzwords almost as rapidly as it creates innovative devices and services. Remaining conversant in new ideas and technologies requires near-constant attention to the marketplace and its thought leaders. Recently, that attention has rightly focused on big data and the related field of big data analytics. So what do these terms mean, especially for enterprise-level businesses?
Not surprisingly, big data refers to extremely large and complex data sets. Over the last thirty years, the world’s per capita capacity to store information has doubled roughly every 40 months. As a further measure of the magnitude of big data, by 2012 people worldwide were creating 2.5 exabytes of data per day. A comparable figure for this year doesn’t exist, but the number will certainly only increase over time. New data storage systems keep pace with the need to retain this vast amount of data, and their increasing sophistication lets users at many levels recall and access information derived from that data at dizzying speeds.
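To put the 40-month doubling figure in perspective, the short Python sketch below works out what it implies over thirty years. The function name `growth_factor` is purely illustrative, and the calculation assumes the doubling period stayed constant for the whole span, which is a simplification.

```python
def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """Factor by which a quantity grows if it doubles every `doubling_months`."""
    return 2 ** (elapsed_months / doubling_months)


# Thirty years is 360 months, i.e. nine doubling periods of 40 months each.
thirty_year_factor = growth_factor(30 * 12, 40)
print(f"Growth over 30 years: about {thirty_year_factor:.0f}x")
```

Under that assumption, nine doublings amount to a roughly 512-fold increase in per capita storage capacity.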
But therein lie some of the issues: transforming such enormous stored data sets into anything resembling actionable or useful information represents a daunting challenge. In fact, the challenge extends beyond analysis to the capture, curation, search, transfer and visualization of big data. To lend greater depth and clarity to the definition of big data, most would use a range from dozens of terabytes to several petabytes in a single data set, although the exact threshold remains a moving target.
The manipulation of such large data sets exceeds the capacity of readily available database management systems and traditional relational databases. But various alternatives exist to enable big data analytics to function efficiently and effectively. They include crowdsourcing, genetic algorithms, signal processing, natural language processing and machine learning. Other solutions to the problems associated with big data analytics sound more recognizable, such as massively parallel processing (MPP) databases, cloud-based infrastructure, distributed databases and data-mining grids. These specific technologies, however, are beyond the scope of this article.
The implications of these issues for the private sector are of greater concern here. Big data analytics and big data visualization can be illustrated using a number of examples from our experience. Amazon, for instance, suggests products of interest to each individual user who logs into its web site. That seemingly simple task requires the instantaneous analysis of the millions of transactions the company performs per minute, combined with an examination of the individual user’s profile, all in relation to millions of other user profiles. Having made these comparisons, the system then needs to review the millions of discrete items in the product catalog from which to make suggestions. To store the data generated by its millions of back-end operations every day, as well as the nearly half-a-million third-party seller queries, Amazon owns the three largest Linux-based databases in the world. They contain a combined 51 terabytes of data.
A few more related examples drive home the magnitude of the problem for enterprise-level organizations. Wal-Mart completes millions of customer transactions hourly and stores them in a single database estimated at 2.5 petabytes. That is more than 167 times the total information in all the volumes in the US Library of Congress. Facebook stores and manages more than fifty billion photos. Finally, the total volume of business data worldwide doubles every 1.2 years.
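Two of the figures above can be unpacked with simple arithmetic. The sketch below is only a back-of-the-envelope check: the function names are illustrative, and the conversion of 1 petabyte to 1,000 terabytes is an assumption, since the article does not say whether decimal or binary units are meant.

```python
TB_PER_PB = 1000  # assumption: decimal units; the article does not specify


def implied_reference_size_tb(total_pb: float, multiple: float) -> float:
    """Size in TB of a collection that `total_pb` petabytes is `multiple` times larger than."""
    return total_pb * TB_PER_PB / multiple


def annual_growth_rate(doubling_years: float) -> float:
    """Fractional growth per year for a quantity that doubles every `doubling_years`."""
    return 2 ** (1 / doubling_years) - 1


library_tb = implied_reference_size_tb(2.5, 167)
rate = annual_growth_rate(1.2)
print(f"Implied Library of Congress size: at most about {library_tb:.1f} TB")
print(f"Implied annual growth of business data: about {rate:.0%}")
```

On these assumptions, the comparison puts the Library of Congress collection at no more than about 15 terabytes, and a 1.2-year doubling time corresponds to business data growing by roughly 78 percent per year.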
Beyond the sheer volume of data involved, enterprises need to tackle other issues as well. Query load can be a major problem, as many big data visualization systems have trouble handling a large number of concurrent users and queries. This may result in unacceptably long wait times for the casual data user. Pricing presents a problem as well. Since companies resist paying for different users to access the same information multiple times, per-query pricing models are impractical. Systems providers need to address both concerns, and enterprises designing a big data analytics capability should raise them when evaluating potential partners. Before implementing a big data visualization solution, enterprise managers need to fully understand their own requirements and select technologies that meet them.
What’s More to Learn
If you want to learn more about big data analytics, take a look at Metric Insights, which bridges the last mile to Business Intelligence and Big Data. Metric Insights lets your users cut through the noise, focus immediately on the critical business issues that warrant their attention, and take action. Its Push Intelligence platform connects quickly and easily to your existing business intelligence tools, big data and SaaS applications. Metric Insights uniquely delivers a patented KPI warehouse along with collaboration and notification technologies that tell you when your key business metrics have changed and, more importantly, why.