The Most Advanced Hadoop® Native SQL Analytic Database
Increasingly, data for a new generation of applications is in Apache™ Hadoop® because it has drastically changed the economics and capabilities for capturing and storing today’s unrelenting data growth. However, enterprise users are often frustrated by Hadoop’s immature querying capabilities. Pivotal HDB combines the familiarity of a full ANSI SQL interface, the performance of a massively parallel processing (MPP) engine, and the power of in-database analytics for advanced analytics, data science, and machine learning at scale to help companies harness the strength of Hadoop for business transformation.
Save Time and Simplify By Running Unmodified SQL in Hadoop
Pivotal HDB’s support for standard SQL, interactive queries, and a broad range of scalar and aggregate functions (window functions, rollups and cubes, correlated sub-queries and more) presents analysts with a familiar, powerful interface that increases productivity. Via ODBC and JDBC drivers, users can also leverage a large ecosystem of data analysis and visualization tools. Pivotal HDB also supports extensions for popular languages, including PL/Python, PL/R, PL/Java, PL/pgSQL, and PL/Perl.
Strong Performance for Mixed Workloads
Pivotal HDB executes SQL inside the Hadoop cluster and operates directly on Hadoop Distributed File System (HDFS) data without translating queries to MapReduce. This delivers far faster performance than other Hadoop querying tools. Pivotal HDB’s MPP architecture partitions tables so that work can proceed in parallel across partitions, and prunes partitions that a query cannot match to reduce the amount of data processed. Partitioning, combined with various other approaches, provides performance at scale for mixed workloads, including interactive queries on large datasets, predictive and analytic workloads, and transactional workloads.
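To make the pruning idea concrete, here is a minimal Python sketch. It is not Pivotal HDB's implementation: the `Partition` class, the `prune` helper, the quarterly `sales_*` partitions, and the row counts are all hypothetical and exist only to show how a range predicate lets an engine skip partitions that cannot contain matching rows.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Partition:
    """One range partition of a hypothetical sales table, keyed by sale date."""
    name: str
    start: date      # inclusive lower bound
    end: date        # exclusive upper bound
    row_count: int   # illustrative figure, not real data


def prune(partitions, lo, hi):
    """Keep only partitions whose range overlaps the half-open range [lo, hi)."""
    return [p for p in partitions if p.start < hi and p.end > lo]


partitions = [
    Partition("sales_2016_q1", date(2016, 1, 1), date(2016, 4, 1), 1_000_000),
    Partition("sales_2016_q2", date(2016, 4, 1), date(2016, 7, 1), 1_200_000),
    Partition("sales_2016_q3", date(2016, 7, 1), date(2016, 10, 1), 900_000),
    Partition("sales_2016_q4", date(2016, 10, 1), date(2017, 1, 1), 1_500_000),
]

# Predicate: sale_date >= 2016-05-01 AND sale_date < 2016-08-01
selected = prune(partitions, date(2016, 5, 1), date(2016, 8, 1))
rows_scanned = sum(p.row_count for p in selected)
rows_total = sum(p.row_count for p in partitions)
print([p.name for p in selected], rows_scanned, "of", rows_total)
```

In this toy setup only the second and third quarter partitions overlap the predicate, so fewer than half of the rows need to be read. The surviving partitions could then be scanned in parallel.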
For interactive queries on large datasets, Pivotal HDB provides strong performance for a broad range of query types, validated by results from all 99 types of queries included in the TPC-DS benchmark. Interactivity for these queries, at linear scale, is delivered via a number of performance optimizations:
- Dynamic pipelining: Minimizes the data transport overhead in performing complex SQL joins on Hadoop.
- Powerful, cost-based query optimizer: Generates query execution plans after performing all possible optimizations and then chooses the best plan. Efficiently handles the most demanding queries involving more than fifty joins on large data sets.
- Concurrent queries: Uses prioritized resource queues to deliver significantly higher query throughput compared to other Hadoop SQL engines.
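As a rough illustration of what "cost-based" means in the list above, the following Python sketch enumerates join orders for three tables and picks the one with the lowest estimated cost. It is a toy model, not the Pivotal HDB optimizer: the table names, cardinalities, selectivities, and the cost formula (sum of intermediate result sizes) are all assumptions made for the example.

```python
from itertools import permutations

# Hypothetical table cardinalities (row counts); not taken from the source.
CARDINALITY = {"orders": 1_000_000, "customers": 50_000, "regions": 20}

# Hypothetical selectivities for table pairs that share a join key.
SELECTIVITY = {
    frozenset({"orders", "customers"}): 1 / 50_000,
    frozenset({"customers", "regions"}): 1 / 20,
}


def plan_cost(order):
    """Estimate a left-deep plan's cost as the sum of its intermediate result sizes."""
    joined = {order[0]}
    rows = CARDINALITY[order[0]]
    cost = 0.0
    for table in order[1:]:
        # Apply every join predicate linking the new table to tables already
        # joined; with no such predicate the step is a cross product.
        sel = 1.0
        for other in joined:
            sel *= SELECTIVITY.get(frozenset({table, other}), 1.0)
        rows = rows * CARDINALITY[table] * sel
        cost += rows
        joined.add(table)
    return cost


def best_plan(tables):
    """Exhaustively enumerate join orders and return the cheapest one."""
    return min(permutations(tables), key=plan_cost)


tables = ("orders", "customers", "regions")
chosen = best_plan(tables)
print(chosen, plan_cost(chosen))
```

Under these assumed statistics, the cheapest plans join the two small tables first and bring in the large `orders` table last. Exhaustive enumeration grows factorially with the number of tables, so it is only practical for small examples like this one.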
Pivotal HDB is the first Hadoop-native SQL engine to support ACID-compliant transactions. This enables the system to isolate concurrent activity on Hadoop and to roll back modifications when a fault occurs.
In-Database Analytics for Rapid Data Science
Pivotal HDB delivers high-performance machine learning functions via the open source Apache MADlib (incubating) library. MADlib provides data-parallel implementations of mathematical, statistical, and machine learning methods for structured and unstructured data. MADlib functions are implemented as user-defined functions accessible through standard SQL syntax. With MADlib, Pivotal HDB simplifies the process of data manipulation, model training, and evaluation to accelerate a wider range of analytics use cases.
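Because MADlib functions are exposed as SQL user-defined functions, training a model can amount to issuing a single SELECT statement. The Python sketch below shows the shape of such a call using MADlib's `linregr_train` function. Everything else is an assumption for illustration: the `houses` table and its columns, the `houses_model` output table, the `%s` placeholder style (which depends on the database driver), and the `RecordingCursor` stand-in, which records the statement rather than connecting to a real cluster.

```python
class RecordingCursor:
    """Stand-in for a DB-API cursor; records statements instead of running them."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, params=None):
        self.calls.append((sql, params))


def train_linear_regression(cursor, source_table, output_table, dependent, independent):
    """Invoke MADlib's linear regression training function through plain SQL."""
    cursor.execute(
        "SELECT madlib.linregr_train(%s, %s, %s, %s)",
        (source_table, output_table, dependent, independent),
    )


# Hypothetical table and column names, used only to show the call pattern.
cursor = RecordingCursor()
train_linear_regression(
    cursor, "houses", "houses_model", "price", "ARRAY[1, tax, bath, size]"
)
sql, params = cursor.calls[0]
print(sql, params)
```

With a real connection, the same function could be handed a live cursor from whichever ODBC or JDBC-equivalent Python driver is in use, and the fitted coefficients would then be read from the output table with an ordinary query.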
More Data Sources for Deeper Insights
Pivotal HDB provides query federation using Pivotal eXtension Framework (PXF). PXF can federate data across other analytic data warehouses (ADW), enterprise data warehouses (EDW), HDFS, HBase, and Apache Hive instances. The PXF Hive plug-ins automatically detect source tables in the following formats: delimited text, SequenceFile, RCFile, ORCFile, Parquet, and Avro. PXF can also extend the data federation framework to new types of data sources. With HCatalog integration, HDB can read Hive schemas and seamlessly federate queries that include Hive data.
Pivotal HDB supports fast parallel load/unload of massive data volumes at scale, without the bottleneck of a master node. A diverse set of data sources and sinks is supported.
Enterprise-Grade Operational Flexibility
Pivotal HDB provides availability and management features that enterprises rely on for mission-critical applications:
- Fault Tolerance and High-Availability: Pivotal HDB tolerates disk-level and node-level failures, ensuring business continuity.
- Native Hadoop Management: Pivotal HDB plugs in with the Apache Ambari installation, management and configuration framework. This provides a Hadoop-native mechanism for installation, deployment, and monitoring of cluster resources. YARN integration allows HDB to share resources with other modules in the cluster.
- Out-of-box Integration with Hortonworks HDP: Allows customers to get started quickly on a leading Hadoop distribution without being concerned with integration.
- Compliant with ODPi framework and reference specifications: Allows portability of applications developed for the Hadoop ecosystem across multiple vendor distributions.
Pivotal HDB supports several deployment models, both on premise and in the cloud, including commodity hardware, virtualized IaaS, and EMC Elastic Cloud Storage (ECS), with an option of using EMC Isilon as the HDFS filesystem.
The polymorphic storage framework in Pivotal HDB is designed to minimize data movement and ETL, provide multiple storage options to handle diverse analytical workloads, and support Hadoop native file formats. Pivotal HDB supports a row-based format (Avro), a column storage format (Parquet), and other native HDFS file formats.
Pivotal HDB enables enterprises to use existing SQL skills and tools to rapidly perform advanced analytics, data science, and machine learning at scale. It operates natively within Hadoop with out-of-the-box support for Apache Ambari and for native Hadoop file formats.