Data Pipelines
Extract, transform, and load (ETL) data into Pivotal Greenplum. Continuously stream data and events. Replicate data between systems, or migrate data to Pivotal Greenplum.

Attach Pivotal Greenplum to data pipelines for extraction, transformation, and loading (ETL) of data into Greenplum. Integrate data between different systems. Connect to any upstream data source, perform the necessary preparation steps, and load in parallel into a target Greenplum cluster. Capture historical creation, update, and deletion events (change data capture, CDC) from upstream transactional databases, or replicate the current state of those databases into Pivotal Greenplum. Enable users to run analytical queries on transactional datasets without impacting upstream database performance.
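As a concrete illustration of parallel loading, here is a minimal sketch using psycopg2 (Greenplum speaks the PostgreSQL wire protocol). It assumes a gpfdist process is already serving CSV files at etl-host:8081 and that the connection settings, the external table ext_orders, and the target table orders are all placeholders.

```python
# Minimal sketch of a parallel load into Greenplum via gpfdist.
# Host names, credentials, and table names are placeholders.
import psycopg2

conn = psycopg2.connect(host="gpmaster", dbname="analytics",
                        user="gpadmin", password="secret")
conn.autocommit = True
cur = conn.cursor()

# Readable external table: every segment pulls its share of the files
# directly from gpfdist, so the load runs in parallel across the cluster.
cur.execute("""
    CREATE EXTERNAL TABLE ext_orders (
        order_id   bigint,
        customer   text,
        amount     numeric,
        created_at timestamp
    )
    LOCATION ('gpfdist://etl-host:8081/orders*.csv')
    FORMAT 'CSV' (HEADER)
""")

# Load into the target table (assumed to exist) with INSERT ... SELECT.
cur.execute("INSERT INTO orders SELECT * FROM ext_orders")
cur.close()
conn.close()
```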

Extract from any data source

Get data out of disparate systems for analysis in Pivotal Greenplum. ETL frameworks have connectors to many different kinds of data sources, including proprietary and older systems.
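As one possible shape for the extraction step, the sketch below uses Python's built-in sqlite3 module as a stand-in for an arbitrary upstream system; a real pipeline would swap in the connector for its actual source (a JDBC/ODBC driver, REST client, flat-file reader, and so on). The file path and table name are assumptions.

```python
# Extraction sketch: pull rows from a source system into a CSV file
# in the directory gpfdist serves, so Greenplum can load it in parallel.
import csv
import sqlite3

src = sqlite3.connect("legacy_app.db")   # placeholder source system
cur = src.cursor()
cur.execute("SELECT order_id, customer, amount, created_at FROM orders")

with open("/data/gpfdist/orders_0001.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["order_id", "customer", "amount", "created_at"])
    writer.writerows(cur)   # the cursor iterates over result rows

src.close()
```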

Standardize steps for data wrangling

Define pipelines for different steps of data transformation, mapping, cleansing, privacy protection, and augmentation in preparation for loading into Pivotal Greenplum.
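One way to standardize those steps is to express each one as a small, reusable function and compose them into a pipeline. The field names and masking rule below are illustrative only.

```python
# Sketch of standardized wrangling steps applied before loading.
import hashlib

def cleanse(row):
    """Trim whitespace and normalize empty strings to None."""
    return {k: (v.strip() or None) if isinstance(v, str) else v
            for k, v in row.items()}

def mask_pii(row):
    """Privacy protection: replace the email with a one-way hash."""
    if row.get("email"):
        row["email"] = hashlib.sha256(row["email"].encode()).hexdigest()
    return row

def augment(row):
    """Augmentation: derive a column analysts will query on."""
    row["amount_band"] = "high" if row["amount"] >= 1000 else "low"
    return row

def transform(rows):
    """Apply the standard steps, in order, to a stream of rows."""
    for row in rows:
        yield augment(mask_pii(cleanse(row)))
```

Because each step is a small pure function, the same steps can be reordered and reused across pipelines feeding different Greenplum targets.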

Analyze transactional data

Send transactional data to Pivotal Greenplum for reporting and analytical queries to avoid impacting the transaction processing speeds of upstream application databases.
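For example, a heavy reporting aggregate can run against the replicated data in Greenplum rather than the source system. This sketch reuses the placeholder connection settings and orders table from the earlier examples.

```python
# Reporting query against Greenplum instead of the OLTP database.
# Connection details and the orders table are placeholders.
import psycopg2

with psycopg2.connect(host="gpmaster", dbname="analytics",
                      user="report_ro", password="secret") as conn:
    with conn.cursor() as cur:
        # An aggregate like this would contend with transactional
        # workloads if run on the upstream application database.
        cur.execute("""
            SELECT date_trunc('day', created_at) AS day,
                   count(*)    AS orders,
                   sum(amount) AS revenue
            FROM orders
            GROUP BY 1
            ORDER BY 1
        """)
        for day, orders, revenue in cur.fetchall():
            print(day, orders, revenue)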

Store entire change histories

Use Pivotal Greenplum scalability to your advantage. Store entire change histories for analysis—even when they greatly exceed the data size of current transactional stores.
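A common shape for such a history is an append-only table that records every CDC event with its operation type and timestamp, rather than overwriting rows in place. The schema below is a sketch; Greenplum's append-optimized, column-oriented storage suits this write-once, scan-heavy pattern.

```python
# Sketch of a change-history table for CDC events. Schema, names,
# and connection settings are illustrative placeholders.
import psycopg2

conn = psycopg2.connect(host="gpmaster", dbname="analytics",
                        user="gpadmin", password="secret")
conn.autocommit = True
cur = conn.cursor()
cur.execute("""
    CREATE TABLE orders_history (
        op         char(1),      -- 'I'nsert, 'U'pdate, or 'D'elete
        changed_at timestamp,    -- when the change occurred upstream
        order_id   bigint,
        customer   text,
        amount     numeric
    )
    WITH (appendonly=true, orientation=column)
    DISTRIBUTED BY (order_id)
""")
cur.close()
conn.close()
```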

Move data at any speed

Handle different latency requirements—from bulk and batch loading to frequent updates in microbatches to continuous streaming of data. Take advantage of Pivotal Greenplum’s parallel loading to speed data ingestion.
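In the microbatch case, one simple pattern is to buffer incoming rows and push each batch to Greenplum with COPY, which is far faster than row-at-a-time INSERTs. The helper below is a sketch; the target orders table is a placeholder.

```python
# Microbatch loading sketch: flush buffered rows to Greenplum via COPY.
import csv
import io

def load_microbatch(conn, rows):
    """Serialize a batch of row tuples to CSV and COPY it in."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    with conn.cursor() as cur:
        cur.copy_expert("COPY orders FROM STDIN WITH CSV", buf)
    conn.commit()
```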

Leverage Apache Kafka pipelines

Send data event messages via Apache Kafka pipelines to scale to any velocity. Make messages available for consumption by different applications in addition to loading into Pivotal Greenplum.
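To sketch what consuming such a pipeline can look like, the example below uses the kafka-python client to read order events from a topic and load them into Greenplum in small batches, reusing the load_microbatch() helper from the previous sketch. The topic name, broker address, and event fields are assumptions.

```python
# Sketch: consume Kafka events and microbatch them into Greenplum.
import json
import psycopg2
from kafka import KafkaConsumer

consumer = KafkaConsumer("orders",
                         bootstrap_servers="kafka:9092",
                         value_deserializer=lambda v: json.loads(v))
conn = psycopg2.connect(host="gpmaster", dbname="analytics",
                        user="gpadmin", password="secret")

batch = []
for msg in consumer:
    e = msg.value
    batch.append((e["order_id"], e["customer"],
                  e["amount"], e["created_at"]))
    if len(batch) >= 500:          # flush every 500 events
        load_microbatch(conn, batch)
        batch.clear()
```

Other applications can consume the same topic independently, which is what makes Kafka a natural fan-out point in front of Greenplum.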

Attunity (partner)

Attunity Replicate empowers organizations to accelerate data replication, ingestion, and streaming across a wide range of heterogeneous databases, data warehouses, and big data platforms.

Confluent (partner)

Founded by the team that built Apache Kafka®, Confluent builds a streaming platform that enables companies to easily access data as real-time streams.

Gplink (community)

Gplink makes it possible to create an external table in Pivotal Greenplum that connects to any JDBC data source through a gpfdist process.

HVR Software (partner)

HVR is a real-time data integration solution for Pivotal Greenplum with a rich feature set, including log-based change data capture (CDC), bulk loading, and data validation.

IBM InfoSphere (partner)

IBM InfoSphere DataStage is an ETL tool that is part of the IBM Information Platforms Solutions suite and IBM InfoSphere.

Informatica (partner)

Informatica is a leading provider of data integration products for ETL, data masking, data quality, data replication, data virtualization, and master data management.

Outsourcer (community)

Outsourcer automates the loading of data into Pivotal Greenplum from Oracle and SQL Server.

StreamSets (partner)

StreamSets is an open-source, enterprise-grade, continuous big data ingestion infrastructure that accelerates time to analysis by bringing unprecedented transparency and processing to data in motion.
