Extraction, Transformation & Loading (ETL)
Ease data integration with frameworks for continuous loading of streaming and microbatch data.

Attach Pivotal Greenplum to data extraction, transformation, and loading (ETL) pipelines to integrate data between systems. Connect to any upstream data source, perform necessary preparation steps, and load in parallel into a target Greenplum cluster.

Extract from any data source

Get data out of disparate systems for analysis in Pivotal Greenplum. ETL frameworks provide connectors to many kinds of data sources, including proprietary and legacy systems.

Architect your data lineage

Know where your data comes from and how it flows. Apply data governance, access controls, and privacy protections from source to target to meet compliance requirements.
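One way a pipeline can make lineage concrete is to carry a trail of metadata alongside each record as it passes through systems and steps. A minimal sketch, assuming illustrative record, system, and step names (not the API of any particular ETL framework):

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """A data record plus the lineage trail it has accumulated."""
    payload: dict
    lineage: list = field(default_factory=list)

def tag(record: Record, system: str, step: str) -> Record:
    """Append a lineage entry naming the system and processing step."""
    record.lineage.append({"system": system, "step": step})
    return record

# A record pulled from a hypothetical CRM source, then masked before
# loading -- its lineage records both touches for later audit.
rec = Record(payload={"customer": "Acme", "ssn": "***"})
rec = tag(rec, "crm_prod", "extract")
rec = tag(rec, "etl", "mask_pii")
print([e["step"] for e in rec.lineage])  # ['extract', 'mask_pii']
```

With the trail attached to the data itself, access-control and privacy checks can be enforced wherever the record lands, not only at the source.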

Standardize steps for data wrangling

Define pipelines for different steps of data transformation, mapping, cleansing, privacy protection, and augmentation in preparation for loading into Pivotal Greenplum.
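Standardized wrangling steps compose naturally as an ordered list of functions applied to each row. A minimal sketch, with illustrative cleansing and privacy-protection steps (the field names and masking rule are assumptions, not from any specific tool):

```python
def cleanse(row: dict) -> dict:
    """Trim string values and drop empty fields."""
    out = {}
    for key, value in row.items():
        if isinstance(value, str):
            value = value.strip()
        if value not in ("", None):
            out[key] = value
    return out

def mask_email(row: dict) -> dict:
    """Privacy protection: redact the local part of an email address."""
    if "email" in row:
        _, _, domain = row["email"].partition("@")
        row = {**row, "email": "***@" + domain}
    return row

PIPELINE = [cleanse, mask_email]  # transformation steps, in order

def run(row: dict) -> dict:
    for step in PIPELINE:
        row = step(row)
    return row

print(run({"name": "  Ada ", "email": "ada@example.com", "note": ""}))
# {'name': 'Ada', 'email': '***@example.com'}
```

Because each step has the same row-in, row-out shape, steps can be reused, reordered, and unit-tested independently before the result is loaded into Greenplum.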

Move data at any speed

Handle different latency requirements—from bulk and batch loading to frequent updates in microbatches to continuous streaming of data. Take advantage of Pivotal Greenplum’s parallel loading to speed data ingestion.
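The middle of that latency spectrum, microbatching, can be pictured as a buffer that flushes whenever a batch reaches a size or age threshold. A minimal sketch, where the flush callback stands in for an actual parallel load into Greenplum (for example, via gpfdist-backed external tables):

```python
import time

class MicroBatcher:
    """Buffer incoming rows; flush when the batch is full or too old."""

    def __init__(self, flush, max_rows=1000, max_age_s=5.0):
        self.flush = flush          # callback that loads one batch
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.buf = []
        self.started = None

    def add(self, row):
        if not self.buf:
            self.started = time.monotonic()
        self.buf.append(row)
        age = time.monotonic() - self.started
        if len(self.buf) >= self.max_rows or age >= self.max_age_s:
            self.flush(self.buf)
            self.buf = []

batches = []  # stands in for the target cluster
loader = MicroBatcher(batches.append, max_rows=3)
for i in range(7):
    loader.add({"id": i})
print([len(b) for b in batches])  # [3, 3]; one row still buffered
```

Tuning `max_rows` and `max_age_s` moves the same pipeline along the spectrum: large, slow batches for bulk loads; small, frequent ones approaching streaming.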

Automate and monitor data pipelines

Deploy automation to perform pipeline steps and monitor execution. Ensure SLAs are met, processing steps scale, data is available, and latency is minimized.
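At its core, that automation means running each step in order, recording its latency, and retrying on transient failure. A minimal sketch, assuming illustrative step names and a simple fixed retry count:

```python
import time

def run_pipeline(steps, retries=2):
    """Run named steps in order; retry failures; return per-step latency."""
    metrics = {}
    for name, step in steps:
        for attempt in range(retries + 1):
            start = time.monotonic()
            try:
                step()
                metrics[name] = time.monotonic() - start  # seconds
                break
            except Exception:
                if attempt == retries:
                    raise  # exhausted retries: surface the failure
    return metrics

calls = []
steps = [
    ("extract", lambda: calls.append("extract")),
    ("transform", lambda: calls.append("transform")),
    ("load", lambda: calls.append("load")),
]
metrics = run_pipeline(steps)
print(list(metrics))  # ['extract', 'transform', 'load']
```

Comparing the recorded latencies against an SLA threshold, and alerting when a step exceeds it, is the monitoring half of the same loop.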

Tailor for specific use cases

Meet specific business requirements. ETL partners provide solutions for common use cases such as master data management, customer engagement, data migration, streaming, and IoT applications.


Confluent

Founded by the team that built Apache Kafka®, Confluent builds a streaming platform that enables companies to easily access data as real-time streams.

IBM InfoSphere

IBM InfoSphere DataStage is an ETL tool in the IBM Information Platforms Solutions suite and the IBM InfoSphere family.


Informatica

Informatica is a leading provider of data integration products for ETL, data masking, data quality, data replication, data virtualization, and master data management.


StreamSets

StreamSets is open source, enterprise-grade, continuous big data ingestion infrastructure that accelerates time to analysis by bringing transparency and processing to data in motion.