March 19, 2019

GREENPLUM SUMMIT

AT POSTGRESCONF   |   NEW YORK   |   MAR 18-22, 2019
Greenplum Summit is where decision makers, data scientists, analysts, DBAs, and developers will meet to discuss, share, and shape the future of advanced data technologies.

Greenplum Database® is “Massively Parallel Postgres” for analytics, machine learning, and AI. The open source Greenplum database is designed to run on any platform, including on-premises, public/private clouds, and containers. It provides powerful and rapid analytics on petabyte-scale data volumes. PostgresConf is a community-driven conference series delivering the largest education and advocacy platform to the Postgres ecosystem. Over 800 attendees are anticipated at PostgresConf 2019, and we want you to be part of it.

Pivotal, as a Diamond Sponsor, has partnered with PostgresConf 2019 to present Greenplum Summit, a full-day event dedicated to Greenplum Database. At Greenplum Summit you’ll examine customer case studies, develop new skills through in-depth tutorials, share emerging best practices in Postgres-based data analytics, and envision the future of data technology while networking with your peers.

Why You Should Attend

Greenplum Summit is where the best minds in massively parallel processing (MPP), open source, Postgres, and advanced analytics/AI come together. At this event you will:

  • Understand how MPP data platforms and powerful ANSI SQL analytics are driving business transformation
  • Find new ways to derive more business value from your data assets by applying machine learning, graph, geospatial, text, and other advanced analytics to your use cases
  • Learn how to take new data analytics and AI projects from experimentation to an operational business solution
  • Discover training and career opportunities for database and analytics professionals
  • Meet face-to-face with the developers and executives driving Greenplum Database innovation

Stay connected

Join the conversation on Twitter! Follow @PivotalData and @Greenplum and #ScaleMatters for all the news and updates.

Venue

Greenplum Summit and PostgresConf 2019 are taking place at the Sheraton New York Times Square. Located in the center of everything, the Sheraton is close to many of New York’s most famous landmarks. Discounted hotel reservations are available.



811 7th Avenue, W 53rd St
New York, NY 10019

Monday, March 18 @ PostgresConf
Time TBD

Accelerated, Hands-on Greenplum Training Course
Petabyte Scale Data Warehousing with Open Source Greenplum Database
Marshall Presser, Data Engineer, Pivotal

It's more than just storing and retrieving data. Equally important are loading high-volume data in parallel and running analytics in the database. This hands-on session will lead you through the entire process of creating, loading, and analyzing data in the Greenplum MPP database. It's PostgreSQL, but bigger and DWH-focused.

By the end of this workshop, attendees will have learned modern DWH techniques on a PostgreSQL-based Massively Parallel Processing platform. This includes the basic architecture of the Greenplum Database, the parallel techniques for loading, querying, and analyzing structured and semi-structured data, as well as the tools Greenplum provides for doing analytics in the database.

Workshop Agenda:

  1. Introduction to MPP and Greenplum
  2. Distribution -- a key to good performance in Greenplum
  3. Parallel loading -- loading multiple terabytes per hour
  4. Loading from S3 and external connectivity
  5. Polymorphic storage and external partitions
  6. Compare external tables to Foreign Data Wrappers
  7. Partitioning vs. Distribution -- how they interact
  8. Difference between PG and GP partitions
  9. Query response time exercises
  10. Running Analytics in Greenplum: MADlib exercise
  11. Analyzing Free-Form Text with Solr and GPText
  12. Monitoring and Managing Greenplum with Command Center
  13. Managing Concurrency with Resource Groups and Workload Manager
  14. Running PL/Python and PL/R as Trusted Languages with PL/Container

Pre-requisites: Laptop with a modern browser and SSH client; Instruction on using SSH on Windows; Basic knowledge of SQL

Users will connect to a cloud-based Greenplum cluster.

There will be a maximum of 25 attendees.

Suggested pre-work:

Videos on YouTube Channel

GP Database basics

GP & analytics

GP & MADlib

Tuesday, March 19 @ Greenplum Summit
9:00am–9:20am

Pivotal Greenplum: Postgres-Based. Multi-Cloud. Built for Analytics & AI
Keaton Adams, Advisory Data Engineer, Pivotal

Welcome to Greenplum Summit 2019! We are excited to come together once again to share insights and updates on the latest advances to the world's leading fully-featured, multi-cloud, open-source, Postgres-based, Massively Parallel Advanced Analytical database. In the presentations and technical deep-dives that are in the lineup for this year's Summit, you will discover just how far Greenplum has progressed in the areas of integrated analytics, system configuration and monitoring, and ease of deployment, along with advances in industry-leading performance—all delivered by an energetic team focused on open-source innovation.

This session gives an overview of Pivotal Greenplum, a platform engineered to analyze data at speed and scale, providing the flexibility customers require for the integration of a wide variety of data sets, protecting and isolating workloads with the latest available container technology, with the ability to perform advanced analytics using integrated tools such as MADlib, GPText, and PostGIS, along with a host of well-known procedural languages. All of this is accomplished through familiar tools and features that a Postgres architect, administrator, and end user will quickly adopt in order to bring powerful analytics and insights to the organization that they serve.

9:30am–9:50am

The Present and Future of Greenplum, a Massively Parallel Postgres Database
Ivan Novick, Product Manager, Greenplum Database, Pivotal

Greenplum Database is at the forefront of global R&D for large-scale big data and analytics use cases. In this session, we will outline the new capabilities and power in Greenplum Database Version 6, as well as summarize the ongoing engineering work in progress including Postgres merging, analytics in a post-Hadoop world, GPU acceleration, high concurrency mixed workloads, Apache Kafka integration, elasticity, disaster recovery and backup, and manageability at scale.

10:00am–10:50am

AI on Greenplum Using Apache MADlib and MADlib Flow
Frank McQuillan, Director of Product Management, Pivotal and Jarrod Vawdrey, Data Scientist, Pivotal

Advanced analytics and machine learning are rapidly growing in importance in enterprise computing. Key enterprise data typically resides in relational form, and it is inefficient to copy data between systems to perform analytical operations.

In addition to leveraging the rich set of Postgres analytics like window functions, Greenplum offers machine learning, graph analytics, statistics, and data transformations via the mature Apache MADlib open source project. These capabilities are all Postgres-compatible but designed for massively parallel use cases.

When it comes to deploying to production, modern enterprise AI deployments are ecosystems of machine learning solutions that tightly integrate a feedback loop triggering automated updates to the underlying algorithms, and thus creating closed loop machine learning systems. MADlib Flow has been designed for containerized deployment of AI pipelines to Kubernetes, where Cloud Foundry with Postgres play a key role for low latency prediction.

In this session, we will give an overview of Apache MADlib on Greenplum and MADlib Flow. Topics will include scalability results, roadmap, and an example of a real-time financial transaction fraud prevention system that continuously learns new threat signatures.

11:00am–11:20am

A Modern Interface for Data Science on Postgres and Greenplum
Scott Hajek, Senior Data Scientist, Pivotal

Data scientists today expect to work with tools that have good abstractions and interfaces. Pure SQL is not the best interface for data science, but the power and scale of SQL-based systems can be beneficial. This talk introduces a modern interface for Postgres and Greenplum that appeals to data scientists.

The importance of good abstractions and interfaces can be seen in the dominance of R, Python, and PySpark in the data science field and the similarity between their notions of dataframes. Data scientists (DS) do not relish the thought of writing SQL strings by hand. Nor, for that matter, do application developers, which is why they prefer object-relational mappers like ActiveRecord, Django, etc. In addition to the cognitive benefits of abstraction, such frameworks cut out error-prone manual steps, avoid dangerous string formatting, and enable more robust testing.

So why wouldn’t a data scientist just avoid SQL-based platforms altogether? Relational databases such as Postgres offer rich analytical abilities and stability, and their MPP variants offer massive scale in storage and distributed processing. Data scientists would value the ability to harness the scale of such systems while having nice abstractions to work with.

Ibis offers DS pythonistas the best of both those worlds. It is a framework for specifying queries and transformations with deferred execution on big data platforms. It looks and feels similar to DataFrame-based tools like pandas and PySpark. Lazy execution with client-side error checking helps by making certain mistakes fail fast, and it encourages delegating all the processing to the database. Ibis already supports Postgres and thus already works with a lot of the functionality on Greenplum. Minor extensions can be made to add the functionality that is special to GPDB. Ibis supports several other pluggable backends, so code written for Postgres/Greenplum could easily be run against other systems like BigQuery, HDFS, and Impala.

11:30am–11:50am

How Baker Hughes, a GE Company, Migrated Its Data Lake to AWS and Greenplum
Jayaraman Thiagarajan, Senior Director, Data & Analytics, Baker Hughes

Baker Hughes, a leading oil & gas company, established its Big Data presence by building its mission-critical Data Lake on AWS. The project consolidated and migrated enterprise data from 45+ data sources, including ERP and non-ERP data, into Greenplum Database on AWS, amounting to petabyte-scale storage in a highly complex computing environment.

The challenges included ingesting a high volume of enterprise data (8.5 billion records) into Greenplum Database and creating analytical and consumption layers so that business users can consume business-critical information through the Tableau visualization tool. The tech stack includes the Greenplum, HVR, Talend, and Tableau ecosystems. The ONE DataLake on AWS is one of BHGE's major strategies for all its future initiatives in the data & analytics space, making the best use of AWS and Greenplum's MPP capabilities.

12:00pm–12:50pm

Lunch

1:00pm–1:20pm

Greenplum and Kafka - The Single Data Platform at Insurance Australia Group
Kieran Clulow, Director, Data Engineering, Insurance Australia Group

Insurance Australia Group Limited (IAG) is a multinational insurance company headquartered in Sydney, Australia. IAG has about 15,000 employees across Australia, New Zealand and South East Asia. It has annual revenue of $12B in insurance premiums and a market capitalisation over $16B. IAG has operated in Australia for 165 years and is an organisation built around the purpose of making the world a safer place.

IAG has grown by acquiring iconic brands such as NRMA and SGIO, and along the way the insurer has been working to reduce the number of policy systems and claims systems it uses, creating a unified platform.

In the past 12 months, IAG has also consolidated its legacy data warehouses into a Single Data Platform with Greenplum and integrated it with Kafka to break down data silos between business applications, with a view to enabling new use cases such as IoT.

1:30pm–1:50pm

Driving Data Science at Scale Using Postgres and Greenplum with Dataiku
Nicolas Gakrelidz, Technical Alliances Manager, Dataiku

This session will give Data Professionals (Analytics Leaders, Data Engineers, Data Scientists, Data Analysts) a roadmap for navigating the path to Enterprise AI and driving data science at scale using Postgres/Greenplum with Dataiku.

“Digital transformation” is the operative phrase in strategic plans at enterprises across all industries. Organizations must use data to continuously develop more innovative operations, processes and products. This means embracing the shift to Enterprise AI, using the power of machine learning to enhance – not replace – humans.

To do so effectively, organizations need to:

  1. Connect Technology and Subject Matter Experts: bring all the people, from business people to analysts to data scientists, together. This happens via horizontal (team-wide) and vertical (cross-team) collaboration.
  2. Embrace Self-Service: Enable self-service analytics by creating the tools for day-to-day analysis and agile use of data.
  3. Operationalize Machine Learning: Get models out of a sandbox environment and into production to deliver real results.
  4. Build For Tomorrow: Deliver short-term projects successfully while driving an enterprise transformation strategy.

Specifically, organizations need to be able to effectively leverage the data in Postgres and Greenplum to drive this enterprise transformation. This session will explain how to make it happen covering topics including:

  • How data scientists work - processes, tools, languages, data types and more
  • Making Data Teams more productive
  • Defining technical requirements for doing data science at scale
  • Postgres and Greenplum key capabilities
  • Demo: Enterprise AI in action with Dataiku, Postgres and Greenplum
  • How to take advantage of key Postgres and Greenplum features including Apache MADlib, PostGIS, GPText for text analytics, and more

2:00pm–2:50pm

Greenplum Expert Panel: Greenplum Operations at Scale
Ailun Qin, VP, Morgan Stanley
Additional panelists TBA
Moderator: Greg Chase, Greenplum Business Development, Pivotal

This panel will feature three leaders from organizations that run extensive operations teams that manage Greenplum Database at large scale for production and business critical use cases. We will dig into pressing issues for operations leaders as they look to have stability and order in their deployments.

3:00pm–3:30pm

Bringing Cloud Databases On-Premises with Greenplum and Kubernetes
Oz Basarir, Staff Product Manager, Pivotal

This session will showcase how customers are using Greenplum on Kubernetes. We will start with an introduction to the product and the various partners and components that make up the ecosystem of AI, BI, ETL, data preparation and data science tools. Then, we will explain how customers can develop data-driven smart apps using this platform and operationalize AI. Finally, we will provide technical details of customer use cases.

3:30pm–3:50pm

Break

4:00pm–4:20pm

Greenplum and the Power of The Cloud - The Marketplace Offerings across AWS, Azure, and GCP
Jon Roberts, Principal Engineer, Data Innovation Lab, Pivotal

Learn about the Pivotal Greenplum products in the cloud marketplaces, as well as their unique, cloud-only benefits.

  • Deployment Demo
  • Use Cases
  • Cloud Features

4:30pm–4:50pm

Achieve Extreme Simplicity and Performance with Greenplum Building Blocks
Derek Comingore, Manager, Data Engineering, Pivotal

Appliances have been the enterprise standard for retaining and running data warehousing systems for decades. The driving force behind the appliance model’s massive adoption has been simplicity. Enterprise customers have sacrificed both flexibility and openness for simplicity in the appliance era. Pivotal has been busy designing an open and modern reference architecture, known as Greenplum Building Blocks (GBB), that combines aspects of the traditional appliance model with highly sought-after flexibility.

Advanced, commodity hardware including NVMe technology is being leveraged to achieve rapid analytics and artificial intelligence. The entire GBB stack is driven by our open-source massively parallel Postgres offering, Greenplum Database. In this session, Pivotal will provide both an introductory overview and demonstration of the GBB platform.

5:00pm–5:20pm

Maximize Greenplum For Any Use Cases: Decoupling Compute and Storage
Shivram Mani, Principal Sr Engineering Manager, Pivotal and Francisco Guerrero, Software Engineer, Pivotal

Traditional data warehouses are deployed with dedicated on-premises compute and storage. As a result, compute and storage must be scaled together, and clusters must be kept running continuously in order to provide data availability at all times. In the cloud, compute and storage can be decoupled by taking advantage of the ability to request on-demand infrastructure. Greenplum in Kubernetes brings the ability to scale compute horizontally, while S3 and Azure cloud provide storage. This means they can be scaled separately depending on the data engineers’ needs, separating data processing from storage.

In this presentation, we will demonstrate the ability to decouple compute and storage in the cloud using Greenplum and Platform Extension Framework (PXF). Deploying a Greenplum cluster in Kubernetes will give us an elastic MPP database engine. Moreover, PXF will allow us to access data residing in multiple clouds. As a result, we expect increased resource utilization and flexibility, while lowering infrastructure costs.

5:30pm–5:50pm

How We Saved the American Taxpayer $150M—and Counting
Jarrod Vawdrey, Data Scientist, Pivotal

We often hear of the waste and exorbitant spending of programs run by the federal government, but rarely do we hear about self-imposed cost cutting or cost savings. Join me as I describe this real-life unicorn scenario, where the clever application of Postgres technologies, combined with the technical know-how of a world-class data science team, saved enough money to pay for the National Endowment for the Arts! This program is a game changer for the government, one we look to replicate across other departments and agencies to streamline an often inefficient organization.

6:00pm–7:00pm

Networking Happy Hour

Wednesday, March 20 @ PostgresConf

We’ve highlighted the Greenplum and Pivotal sessions being presented at PostgresConf below. Visit the PostgresConf website to see the complete agenda for March 20-22.

Time TBD
KEYNOTE

Massively Parallel Postgres: Scale Matters
Jacque Istok, Head of Data, Pivotal and Ivan Novick, Product Manager, Greenplum Database, Pivotal

More than 2.5 quintillion bytes of data are created each and every day—and at that rate: Scale Matters. Database workloads at scale are driving some of the most impactful use cases in the world, helping to solve both industry and government’s most interesting problems. Join two of Pivotal’s data thought leaders to hear about how to solve these problems leveraging Postgres at scale, and learn what’s next when it comes to creating a modern data ecosystem for a cloud native world.

Time TBD
TECHNICAL SESSION

Bringing DevOps to Data Science: Operationalize AI Leveraging Postgres
Sridhar Paladugu, Advisory Data Engineer and Jarrod Vawdrey, Data Scientist, Pivotal

Successful enterprise AI applications in 2019 are ecosystems of machine learning solutions that tightly integrate a feedback loop triggering automated updates to the underlying algorithms - creating closed loop machine learning systems.

To build and scale these systems efficiently, enterprises need reliable, highly performant, and extensible data tools, not only to wrangle and prepare complex, disjoint structured and unstructured data but also to build and deploy machine learning algorithms. The Postgres community offers a diverse set of projects: PostGIS for geospatial, Apache MADlib for Postgres-based machine learning, the Massively Parallel Postgres analytics engine Greenplum for big data, procedural language extensions to the Python and R package ecosystems, and now MADlib Flow for containerized deployment of AI pipelines to Kubernetes. Together, they make Postgres one of the most compelling software solution stacks for enterprise AI available today.

During this 50-minute breakout session, the presenters will:

  • Highlight the advantages of Postgres and Postgres community projects for enterprise AI
  • Recommend a deployment template for closed loop machine learning solutions using Postgres community projects
  • Provide a pre-release preview of the MADlib Flow ML pipeline deployment project
  • Showcase the art of the possible with Postgres as an enterprise AI software solution, with a demo of a real-time financial transaction fraud prevention system, built using Greenplum, MADlib, and Kubernetes, that continuously learns new threat signatures and scales to meet high transaction throughput and low-latency response requirements



Join Us

Greenplum Summit is part of the larger PostgresConf 2019 event where you can join the Postgres community to take part in educational sessions, networking opportunities, the sponsor expo, and in-depth pre-conference training.

There is no additional cost to attend Greenplum Summit. Simply purchase a PostgresConf Platinum or Gold pass, or a Day Pass for March 19, and your ticket will give you full access to Greenplum Summit.

Get a 25% discount on PostgresConf registration with promo code 2019_GREENPLUM
Register