Spring Cloud Data Flow is a cloud-native orchestration service for composable microservice applications on modern runtimes. With Spring Cloud Data Flow, developers can create and orchestrate data pipelines for common use cases such as data ingest, real-time analytics, and data import/export.
Moving data out of its native repositories for statistical analysis is slow, and it restricts machine learning to a subset of the data. Apache MADlib is an open source library of machine learning algorithms designed to run on scale-out systems. Because the algorithms run in the database where the data lives, data scientists can quickly build features of an analytical model against large data sets in Greenplum, Apache HAWQ, and PostgreSQL.