The Open Source Massively Parallel Data Warehouse

Powerful and interactive analytics on large volumes of data


Pivotal Greenplum is a commercial, fully featured data warehouse built on the open source Greenplum Database. It provides powerful and rapid analytics on petabyte-scale data volumes. Geared toward big data analytics, Greenplum is powered by the world’s most advanced cost-based query optimizer, which delivers high analytical query performance on large data volumes.

Proven Open Source Technology

A massively parallel processing architecture, hardened over the course of a decade, provides advanced performance, workload management and security capabilities. All of it is open source and available to all contributors on the community site.

Industry-Leading Performance

Cost-based query optimization designed for big data workloads scales interactive and batch-mode analytics to petabyte-size datasets without degrading query performance or throughput.

Deep Analytics

Use standards-compliant SQL to support your Business Intelligence and reporting workloads. Apply out-of-the-box, scalable machine learning algorithms to deliver new analytical models and data science insights. Third-party visualization and analytical tools have certified support.


Greenplum incorporates key performance capabilities, flexible data analytics, robustness, seamless integration with analytics stacks and a database management framework focused on reducing total cost of ownership.

Massively-Parallel, Shared-Nothing Database

  • Shared-nothing architecture that automates parallel processing of data and queries.
  • Petabyte-scale parallel loading, based on massively parallel processing (MPP) Scatter/Gather Streaming™ technology.
  • Industry-leading, in-database compression technology.
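
As a rough illustration of the shared-nothing idea in the first bullet, the Python sketch below hash-distributes rows across independent segments, lets each segment aggregate only its own rows, and then merges the partial results. It is a toy model, not Greenplum code: the segment count, the CRC32 hash and every function name are assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_SEGMENTS = 4  # hypothetical segment count, chosen for illustration


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution key to a segment using a stable hash."""
    return crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments: int = NUM_SEGMENTS):
    """Place each (key, amount) row on exactly one segment."""
    segments = [[] for _ in range(num_segments)]
    for key, amount in rows:
        segments[segment_for(key, num_segments)].append((key, amount))
    return segments


def local_sum(segment_rows):
    """Aggregate only the rows that live on this segment."""
    totals = defaultdict(int)
    for key, amount in segment_rows:
        totals[key] += amount
    return dict(totals)


def merge_partials(partials):
    """Combine per-segment results; keys never overlap across segments."""
    merged = {}
    for partial in partials:
        merged.update(partial)
    return merged


def parallel_sum_by_key(rows, num_segments: int = NUM_SEGMENTS):
    """Run the per-segment aggregation concurrently, then merge."""
    segments = distribute(rows, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(local_sum, segments))
    return merge_partials(partials)
```

Because every row with a given key lands on the same segment, the per-segment sums can be merged without any cross-segment coordination, which is the property that lets this style of architecture add segments to scale out.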

Flexibility in Storage and Analytics

  • The Pivotal Query Optimizer, a cost-based SQL query optimizer, creates query plans that execute complex joins at breakthrough performance on big data volumes.
  • Polymorphic Data Storage™, processing and compression deliver optimal performance and storage efficiency.
  • Flexible partitioning of tables at multiple levels.
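
To make multi-level partitioning more concrete, here is a minimal Python sketch of a table partitioned first by month and then by region, where a filtered query reads only the one partition that can contain matching rows. This is a conceptual toy under stated assumptions, not how Greenplum stores data: the `PartitionedTable` class, its methods and the choice of partition keys are all hypothetical.

```python
from collections import defaultdict
from datetime import date


class PartitionedTable:
    """Toy two-level partitioning: by (year, month), then by region."""

    def __init__(self):
        # Each (year, month, region) key holds its own list of rows.
        self.partitions = defaultdict(list)

    def insert(self, sale_date: date, region: str, amount: int) -> None:
        """Route a row to the partition chosen by its date and region."""
        key = (sale_date.year, sale_date.month, region)
        self.partitions[key].append((sale_date, region, amount))

    def total(self, year: int, month: int, region: str):
        """Return (sum of amounts, rows scanned) for one month and region.

        Only the single matching partition is read; all others are skipped.
        """
        rows = self.partitions.get((year, month, region), [])
        return sum(amount for _, _, amount in rows), len(rows)
```

The second value returned by `total` shows how many rows were actually scanned, which stays small even as other partitions grow.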

Advanced Analytics Platform

  • Optimized for high-volume batch jobs, low-latency interactive jobs and high-throughput trickle micro-batch jobs.
  • Rich SQL dialect with complex join operations.
  • Supports Apache MADlib (incubating), a library of massively parallel, in-database machine learning and statistical algorithms for advanced analytics.
  • Extensibility framework for custom analytics and database functions.

Built for Robustness

  • Supports business continuity features such as high availability, intelligent fault detection, fast online differential recovery, full and incremental backup, and disaster recovery.
  • A rich set of security and authentication features addresses enterprise policy and regulatory requirements.

Seamless Integration with Analytics Stacks

  • Integrates with heterogeneous Hadoop environments.
  • Offers comprehensive SQL support with online analytical processing (OLAP) extensions.
  • Integrates with in-memory data grids and object stores to post-process structured data.

Best-in-Class Data Management Framework

  • Servers can be added while the database remains online and fully available.
  • Performance monitoring framework allows separation of hardware and software issues.
  • One unified framework for monitoring, administration and workload management.

Flexibility in Deployment Models

  • Software: Packaged software distribution for integration with user-provided commodity hardware running Linux.
  • Appliance: EMC Data Computing Appliance (DCA), a fully integrated hardware and software solution available in sizes from a quarter rack with 4 nodes to hundreds of nodes.
  • Virtualized IaaS: In a virtualized compute plus storage environment.


Shaping the Future of Data Warehousing through Open Source Software
