Hadoop for the Enterprise

Turning Big Data Insights into Action

Apache Hadoop has become a key platform for big data analytics thanks to its flexibility, reliability, and scalability, and its ability to suit the needs of developers, data scientists, and enterprise IT alike. A fast and economical way to leverage the massive amounts of data produced by new sources such as social media, mobile sensors, and Internet of Things (IoT) devices, Hadoop has become the preferred platform for storing, processing, and analyzing large multi-structured datasets.

Originally developed in the mid-2000s, with much of the early work done by engineers at Yahoo!, Hadoop was quickly embraced by the open source community as well as consumer-facing Internet giants such as Google and Facebook. In recent years, Hadoop has been adopted by enterprises that similarly want to gain actionable insight from new data sources. IDC has predicted that the big data market, which includes the Hadoop software market, will be worth $48.6 billion by 2019.

Hadoop is a game changer for enterprises, transforming the economics of large-scale data analytics. It eliminates data silos and reduces the need to migrate data between storage and analytics systems. The result is a more holistic view of customers and operations, which leads to quicker and more effective business insights. Its extensibility and numerous integrations can power a new generation of data-aware business applications.

“[Hadoop’s] refreshingly unique approach to data management is transforming how companies store, process, analyze and share big data,” according to Forrester analyst Mike Gualtieri. “Forrester believes that Hadoop will become must-have infrastructure for large enterprises.”

Despite its many advantages, transitioning to Hadoop can be challenging for enterprises that rely on proprietary data solutions and whose staff are familiar with SQL analytics tools. In particular, analyzing data stored in Hadoop requires highly skilled data scientists, who are in perpetual short supply. To address this, Pivotal offers its Hadoop Native SQL Database, Pivotal HDB (part of the Pivotal Data Suite), which enables data scientists, business analysts, and even casual business users to analyze data in Hadoop using familiar SQL syntax.

Pivotal HDB, powered by Apache HAWQ (incubating), is the Hadoop native SQL database for data science and machine learning workloads and is the key to unlocking the business value of Hadoop. It gives data scientists and analysts the ability to perform advanced SQL analytics on data directly where it lives—in Hadoop—eliminating the need to sample data or move it to another platform. Released in early 2016, Pivotal HDB 2.0 includes performance, scalability and efficiency improvements, resulting in even lower cost-per-query and allowing practitioners to ask more questions of their data more often.

Pivotal HDB, which runs natively on any ODPi-certified Hadoop distribution, including the Hortonworks Data Platform (HDP), also supports leading data science tools such as Apache MADlib, GraphLab (OpenMPI) and user-defined functions, as well as languages popular with data scientists such as R, Java and Python. It also integrates with Spring ecosystem projects such as Spring Cloud Data Flow, simplifying the development of data-driven applications and services. Additionally, Pivotal GemFire brings real-time analytics to Hadoop, enabling businesses to process data and make critical business decisions immediately.

Hadoop provides must-have capabilities for today’s digital enterprise. Together with Pivotal HDB, Hadoop allows enterprises to store, process and analyze all manner of data (structured, unstructured and everything in between) to uncover actionable insights that move the business forward. Pivotal’s engineers and developers, many of whom were integral to Hadoop’s development and evolution, have built an enterprise-grade Hadoop native SQL database, Pivotal HDB, and development continues today. Read more about their continued work on Pivotal HDB on the Pivotal blog.