Attach Pivotal Greenplum to data pipelines that extract, transform, and load (ETL) data into Greenplum. Integrate data between different systems. Connect to any upstream data source, perform the necessary preparation steps, and load in parallel into a target Greenplum cluster. Use change data capture (CDC) to store the history of create, update, and delete events from upstream transactional databases, or to replicate the current state of those databases into Pivotal Greenplum. Enable users to run analytical queries on transactional datasets without impacting upstream database performance.
Get data out of disparate systems for analysis in Pivotal Greenplum. ETL frameworks have connectors to many different kinds of data sources, including proprietary and older systems.
Define pipelines for different steps of data transformation, mapping, cleansing, privacy protection, and augmentation in preparation for loading into Pivotal Greenplum.
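As a rough illustration of such a pipeline, the sketch below chains a cleansing step, a privacy-protection step, and an augmentation step over rows before they would be loaded. Everything in it is an assumption made for illustration: the field names `id` and `email`, the `source_system` tag, and the helper names are hypothetical and do not come from any particular ETL framework.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, str]
# A step returns the transformed row, or None to drop the row.
Step = Callable[[Row], Optional[Row]]


def cleanse(row: Row) -> Optional[Row]:
    """Trim whitespace and drop rows with an empty (hypothetical) 'id' field."""
    cleaned = {key: value.strip() for key, value in row.items()}
    return cleaned if cleaned.get("id") else None


def mask_email(row: Row) -> Optional[Row]:
    """Privacy protection: replace the email with a SHA-256 digest."""
    masked = dict(row)
    if "email" in masked:
        digest = hashlib.sha256(masked["email"].encode("utf-8")).hexdigest()
        masked["email"] = digest
    return masked


def augment(row: Row) -> Optional[Row]:
    """Augmentation: tag each row with an assumed source-system name."""
    enriched = dict(row)
    enriched["source_system"] = "crm"  # hypothetical value
    return enriched


def run_pipeline(rows: Iterable[Row], steps: List[Step]) -> List[Row]:
    """Apply the steps in order; a step that returns None drops the row."""
    output: List[Row] = []
    for row in rows:
        current: Optional[Row] = row
        for step in steps:
            current = step(current)
            if current is None:
                break
        if current is not None:
            output.append(current)
    return output
```

Calling `run_pipeline(rows, [cleanse, mask_email, augment])` yields only the rows that survive cleansing, with masked emails. A plain unsalted hash is used here only to keep the sketch short; a real privacy step would likely need a stronger scheme.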
Send transactional data to Pivotal Greenplum for reporting and analytical queries to avoid impacting the transaction processing speeds of upstream application databases.
Use Pivotal Greenplum scalability to your advantage. Store entire change histories for analysis—even when they greatly exceed the data size of current transactional stores.
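To make the idea of a stored change history concrete, here is a minimal sketch of replaying a list of change events to reconstruct table state at any point in the history. The event shape (operation, primary key, payload) and the `replay` function are assumptions for illustration only, not the format of any specific CDC tool.

```python
from typing import Dict, List, Optional, Tuple

# (operation, primary key, changed columns); the payload is None for deletes.
ChangeEvent = Tuple[str, int, Optional[Dict[str, str]]]


def replay(events: List[ChangeEvent]) -> Dict[int, Dict[str, str]]:
    """Apply change events in order and return the resulting rows by key."""
    state: Dict[int, Dict[str, str]] = {}
    for op, key, payload in events:
        if op in ("insert", "update"):
            # Merge changed columns into the existing row, if there is one.
            state[key] = {**state.get(key, {}), **(payload or {})}
        elif op == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return state
```

Replaying the full history gives the current state, while replaying a prefix of it gives the state as of an earlier event. That is the kind of analysis a retained change history allows and a current-state copy does not.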
Handle different latency requirements—from bulk and batch loading to frequent updates in microbatches to continuous streaming of data. Take advantage of Pivotal Greenplum’s parallel loading to speed data ingestion.
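Microbatching can be sketched as grouping an incoming stream of records into fixed-size chunks, each of which would then be handed to a loader. The `microbatches` helper below is a hypothetical illustration that batches by record count only; it does not perform any loading itself.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def microbatches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # source exhausted
        yield batch
```

A small `batch_size` moves toward streaming behavior and a large one toward bulk loading. Production pipelines typically also flush on a time interval so that a slow source does not hold back a partial batch indefinitely.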
Send data event messages via Apache Kafka pipelines to scale to any velocity. Make messages available for consumption by different applications in addition to loading into Pivotal Greenplum.
Attunity Replicate helps organizations accelerate data replication, ingestion, and streaming across a wide range of heterogeneous databases, data warehouses, and Big Data platforms.
Founded by the team that built Apache Kafka®, Confluent builds a streaming platform that enables companies to easily access data as real-time streams.
Gplink makes it possible to create an external table in Pivotal Greenplum that connects to any JDBC data source through a gpfdist process.
HVR is a real-time data integration solution for Pivotal Greenplum with a rich feature set including log-based change data capture (CDC), bulk loading and data validation.
IBM InfoSphere DataStage is an ETL tool and part of the IBM Information Platforms Solutions suite and IBM InfoSphere.
Informatica is a leading provider of data integration products for ETL, data masking, data quality, data replication, data virtualization, and master data management.
Outsourcer automates loading data into Pivotal Greenplum from Oracle and SQL Server.
StreamSets is an open source, enterprise-grade, continuous big data ingest infrastructure that accelerates time to analysis by bringing unprecedented transparency and processing to data in motion.