Case Study: Pivotal HDB Shows Simplicity of ODPi Interoperability
Pivotal HDB, the Apache Hadoop® native SQL database powered by Apache HAWQ (incubating), is the first big data application successfully tested to be interoperable with multiple ODPi Runtime Compliant distributions. HDB has passed internal testing on both the ODPi Reference Implementation of Hadoop and one of the first ODPi Runtime Compliant distributions, Hortonworks HDP. This is a big milestone for Pivotal and ODPi, and Pivotal product manager Vineet Goel shares how the efforts of ODPi are paying off in simplifying development of Hadoop native technologies.
You’re Thinking About Investing In Big Data? Consider These 5 Things
While many companies have taken on big data projects with success and others have faced challenges, there are some key questions to nail down before the project turns from an idea into a proposal. In this post, data market strategist Jeff Kelly poses 5 of the most important questions you can ask yourself about leadership, skills gaps, change management, and more.
Fail Fast And Ask More Questions Of Your Data With HDB 2.0
Pivotal HDB 2.0, the Hadoop Native Database powered by Apache HAWQ (incubating), became generally available in May 2016. This release marks a major milestone in the technology’s evolution from its massively parallel processing (MPP) roots towards a new category of cloud-scale analytical database, deeply integrated with the Apache Hadoop ecosystem. So, the technology is cool, but why does this really matter? In this post we look at this release through the lens of digital transformation requirements.
How To Deal With Class Imbalance And Machine Learning On Big Data
As a member of Pivotal Data Science Labs, Greg Tam penned this article as an exploration of class imbalance problems in machine learning. Pulling insights from real-world scenarios, Tam walks through a process for evaluating machine learning models using several techniques and provides examples of the SQL, which he runs on Pivotal HAWQ, the Hadoop Native SQL database.
An Open Source Reference Architecture For Real-Time Stock Prediction
Since the inception of stock markets, traders have used information to make decisions. Over time, this led to algorithmic trading, where machines use machine learning and predictive models to make trade decisions. Many other industries also use real-time predictions to optimize business decisions, including internet advertising, energy, logistics, and travel, to name a few. This post explains an architecture where open source components, including Apache Hadoop, can be used for real-time stock prediction.
Apache HAWQ: Next Step In Massively Parallel Processing
Historically, massively parallel processing (MPP) engines and big data batch jobs, like Hadoop’s MapReduce, have had different benefits and drawbacks to their architectures. With the current release of Apache HAWQ (incubating), the best of these two separate worlds has been combined, and the design addresses previous limitations on both sides. This article, by one of Pivotal’s top big data engineers, explains the rationale and capabilities of the HAWQ engine.
Data Science Deep Dive: Applying Machine Learning To Customer Churn
In this post, Esther Vasiete, from the Pivotal Data Science Team, explains how data science and machine learning are used for predicting which customers have a high probability of leaving, also known as churn. Using examples from an actual customer engagement in the networking and communications sector, she outlines the impact of such programs and the approach to modeling data sources, features, scoring, and predictions using technologies such as Apache Hadoop and Apache MADlib. She also provides examples of churn indicators, gives sample code, and sets the context for using machine learning in real-time applications such as CRM.
Introducing The Newly Redesigned Apache HAWQ
In September 2015, Pivotal announced that we donated the Pivotal HAWQ core to the Apache Software Foundation (ASF), and it is now an officially incubating project. Apache HAWQ is a redesign of the HAWQ architecture to enable greater elasticity to meet the requirements of a growing user base. With the addition of YARN support and its acceptance as an Apache project, HAWQ is now more than ever a truly Hadoop Native SQL Engine. This blog is a technical primer on the background and architecture of Apache HAWQ.
Christian Tzolov on Open Source Engineering, the ODP and Pivotal
Christian Tzolov has worked on some really amazing projects in his life—artificial intelligence, data science, and big data to name a few. As a big data and Apache Hadoop® specialist on our Field Engineering team, he spends a lot of time working on open source projects and helping customers solve problems. In this post, we get an in-depth Q&A where he shares quite a bit about his work with Apache® BigTop, Apache Ambari, Apache Zeppelin (incubating), Apache Crunch and more.
The Way to Hadoop Native SQL
In September 2015, Pivotal announced it had open sourced HAWQ and MADlib, contributing them to the Apache Software Foundation (ASF), where they are now officially listed as incubating. In this post, Pivotal’s data engineering leader, Gavin Sherry, explains why HAWQ and MADlib are needed to create a Hadoop Native SQL infrastructure, and why the only way forward is through open governance and curation managed by the ASF.