Changing the Economics of Big Data. Forever.
Enterprises are aware that big data analytics and applications have emerged as essential foundations for competitive differentiation. However, every enterprise has different environments, applications and data requirements.
While Hadoop offerings continue to disrupt the Big Data market and will certainly serve as the core of big data management solutions, enterprises must often work with existing traditional database, data management and legacy data warehouse solutions.
Pivotal has defined this merging of traditional and next generation data management solutions as the Business Data Lake.
STORE EVERYTHING, ANALYZE ANYTHING, BUILD THE RIGHT THING.
Pivotal Big Data Suite delivers the world's most advanced, flexible and budget-friendly Big Data analytic platform. This enables customers to leverage any mixture of Pivotal's powerful capabilities, when and where they need them, resulting in a powerful, flexible and unbounded toolset for enterprise analytics.
Join us for this “not-to-miss” webinar and learn how the Pivotal Big Data Suite can:
- Easily enable you to build the next modern data infrastructure – the Business Data Lake
- Save you significant costs for storing all your data with Pivotal HD
- Provide you with an interchangeable package of Pivotal’s leading Big Data technologies to fit varied business needs
- Allow you to become a Data-Driven Enterprise
Michael: Hello and welcome to today's webinar. My name is Michael Cucchi and you've joined the Pivotal webinar on the announcements surrounding our big data suite. We're going to get started right now. Thanks for joining.
Just one notice: the webinar will be recorded for on demand viewing and you can receive a link to that after the event. All right. We're going to dive right in.
We're going to leave some time at the end for Q&A. If you have questions as we work through the material and the announcement today, please just send them in the chat window to the host or the presenter and we will take a break towards the end and try and leave 10 minutes to address questions.
Today, we're going to be diving in on Pivotal data offerings and specifically a new product offering called the Pivotal big data suite. Pivotal is a lot more than just data.
We have assembled a very unique collection of technologies and human beings that we think powers the next generation of platform to deliver data-driven applications and help enterprises innovate like the Internet leaders do. That means leveraging the same sets of technology to collect big data and then utilizing the world's best analytics capabilities on top of that collected data, which will power these next generation applications, which in turn themselves generate new data sources for us to act on.
We feel our assembled technologies and human beings really help enterprises build out an environment and a platform where they can iterate to innovate on their application delivery.
Once they do that, leveraging our Cloud Foundry platform as a service, we can then accelerate their ability to iterate around what we call the 'virtuous circle.'
Today, we'll be diving into our data offerings. At Pivotal, we are lucky enough to have one of the broadest portfolios of data capabilities available on the market today. We'll talk about the different aspects of those data offerings.
It's really taking the data management capabilities we have and applying advanced analytics and value added services on top of that collection of data.
We have a number of the world's best data scientists, who make up an organization called 'the Pivotal data lab.' That's a group of 60 to 70 data engineers and data scientists that can assist our customers and our prospects in both architecting and designing the right mixture of technologies and also implementing data science methodologies and algorithms to provide insights from that collected data source.
Pivotal labs is a collection of 500 plus agile developers who are able to come in and actually impart the knowledge of agile development to our customer base, to not only design applications, but also leave behind best practices so that enterprises can then on their own go off and innovate in application design.
Finally, I mentioned this earlier, but all of these can be hosted on top of Pivotal Cloud Foundry, which is our enterprise offering of Cloud Foundry, the world's most advanced platform as a service.
When you combine all these together, you end up with a really seamless ecosystem and Pivotal is able to iterate and bridge you into the next generation of applications.
Let's dive into our data solutions a little bit deeper and just why Pivotal exists today. About a year ago now, Pivotal was created. It was a set of technologies spun out of EMC and VMware, with a subsequent investment from GE.
We were really created to help solve this huge challenge of what we call the big data utility gap on this slide. You can see that there's this influx of new data sources being generated by customers, but also by what's termed the Internet of Things.
Sensor data is one of the fastest growing data sources in the world. It's not just our customers' data that we need to understand. It's also linking that data with social sources and sensor data feeds and device feeds. That's really generating just a tidal wave of new data sources.
When you take that set of new data and try to store it on our historic data management solutions, what happens is you're forced to not keep all that data.
As you can see here, we store only a portion of the data being generated. What's even more worrisome, and critical that we overcome, is the fact that once we have the data, we have to actually be able to prepare it for analysis, and only a really small percentage of that data is actually being analyzed. As you can see here, 0.5% of the 3% that's prepared is actually analyzed.
Then really, the worst-case scenario or the final step to get over is, once you've analyzed the data, you have to do something with the data analysis, the insights that were provided.
Our data scientists have a little joke: they say that when they're done with their work, they know they've failed if it just ends up in a PowerPoint on some executive deck as a chart, demonstrating some data points from the data.
Really what you have to do today to become competitive in the space is not only obtain the insight from the data, but then operationalize it in the form of a change to your business process or a new application that can power either a consumer innovation, a product innovation, or again a business process or efficiency innovation.
What Pivotal is built to do is help our customers collect this data, provide an extremely flexible methodology to analyze the data and then go that extra step and help customers actually operationalize that data in the form of new business processes or new applications.
We work with a lot of customers helping them take this journey. The truth is, that every customer is different and every environment is different. There's customers that have actually already started the steps to be able to capture these new data sources. They need help finding the insights inside of that data.
There's customers that have not and they're only storing 2% of the data that they really need to be storing. They need help from us in terms of just being able to archive and collect the data in order to obtain those insights.
Then we have customers that have done a great job of collecting the data and are doing very effective analytics on it, but they need help building those applications.
From those applications, we generate these new business models and then we iterate across them, which is the repeatable framework you see on this slide.
I'm about to introduce the term 'data lake' or 'business data lake.'
That's really the first step in this journey: expanding from a siloed data management solution into a much broader, open and seamless data infrastructure that we call the data lake. I'll define that for you in just a second.
Once you've actually built out this heterogeneous data management solution that provides access to all of those entitled users, so that they can get to the data and analyze that data, you're then able to start thinking about how you can change your application ecosystem and your business from that data infrastructure. As you can see here, all of this leads us to our platform as a service offering, where you'll be able to obtain in-platform data services but also connect your platform as a service out to external data lake resources in the form of connection brokers.
As you could see on the bottom of this slide, as we mature our data infrastructures and our application ecosystems, the impact that it has to the business really balloons.
What you might be thinking of as an IT innovation so that your operations team can say yes to your line of business owners when they have a technical request, it starts there building out an infrastructure, but very quickly that new infrastructure is changing efficiencies and cost scales and it's applying new technology values to new requirements.
That starts to impact your IT leaders and your line of business owners. Ultimately this is about innovating your entire business. What starts as an IT innovation very quickly starts to manifest across the entire organization.
Let's dive into what it takes to actually get on this journey and to be able to walk across the different stages of this maturation that has to happen. Really what's going on, if you step back 100 feet and take a look at the problem, is that we have new data challenges today.
What used to be a rather regulatory, focused and narrow use for data management and data warehousing has spawned these new use cases. Not only have the data sizes grown, but the way that we need to leverage that data and analyze that data has changed.
On this slide, you can see basically the three main use cases. There's a trend in the industry called the 'Lambda architecture.' This is just another take on that Lambda architecture.
You really need to be able to have real time, near time or interactive capabilities, and batch capabilities. When you apply these different use cases to a traditional data management solution, you have use cases that just don't fit the technology set.
If you throw real time data challenges at the traditional data warehouse, you run into real performance problems. If you throw extremely complex ad hoc queries and analytics against your data warehouse, you run into a problem.
There are use cases that the data warehouse is a perfect fit for. What we're talking about here is building out a variety of heterogeneous data management capabilities to supplement or surround your existing data management strategy.
For those of you that are start up companies, just building out a fresh architecture, terrific, a lot of the people on the call are jealous of you because you can start from scratch and build an ideal world and hopscotch a lot of the traditional investments.
For those of you that have a [one way 00:10:59] investment in data management technologies, what we're talking about is supplementing your existing investments with technologies that are feature fit and price fit for that use case, instead of trying to handle everything with a hammer just because that's your toolkit. We're talking about expanding your toolkit to fit these use cases.
For real time examples, Pivotal has customers today that, in real time, in sub-second time, are running analytic models on things like cellphone call connections and cellphone routing, for example, or making a fraud determination about a user or consumer on their website. That has to be done very quickly.
In an e-retail world, if a checkout process takes over three seconds, it's proven that shoppers will actually abandon the website and go use another website. We want to reduce fraud obviously and we want to make sure that we're not processing fraudulent purchases.
At the same time, we have to get that work done in an extremely time sensitive fashion. We have these real time use cases and we'll talk about how Pivotal helps customers overcome this challenge.
This is a bleeding edge requirement and if you toss the real time requirement towards a traditional data management infrastructure, you just wouldn’t be able to get the work done in time.
Then you have near time or what some people call interactive requirements on data. These are time sensitive, but not sub-second critical such as real time.
Here you have a human being waiting for an answer or an application doing a transactional behavior. That is time sensitive but it might take multiple seconds or minutes. This might be the ad hoc analytical queries that I mentioned or a transactional application interacting with a big data store.
Then you have batch requirements. Here's a great example of where we were just using technology that was too expensive for the requirement.
With data transformations, with batch loads, if you acquire a company and you're trying to inject all of the data from the new company's data infrastructure into your existing one, or you're running monthly or weekly financial analysis and processing jobs, nobody is going to die and millions of dollars are not going to be lost if it takes over an hour or potentially a day.
When you have these batch use cases, you really don't want to throw a high value, high technology solution at the problem.
This is the section of data management that is being extremely impacted by new technologies like Hadoop or HDFS, because now we have the ability to leverage commercial off-the-shelf hardware and, in a lot of cases, extremely cost effective or open source solutions to solve problems that we used to throw rather expensive enterprise data management technology at.
As these new use cases emerge, we want to approach them with different data management technologies. Most of these companies, again, those of us that have been around or are managing a company that's been in existence, are enterprises that have built out a data management infrastructure already.
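The three tiers described above can be sketched as a simple classification on how long a workload can wait for its answer. This is only an illustrative sketch: the talk gives the boundaries loosely ("sub-second", "multiple seconds or minutes", "over an hour or potentially a day"), so the numeric cut-offs and the names `Tier` and `classify` below are assumptions, not part of any Pivotal product.

```python
from enum import Enum


class Tier(Enum):
    REAL_TIME = "real time"
    INTERACTIVE = "interactive"
    BATCH = "batch"


# Hypothetical cut-offs chosen for illustration only; the talk does not
# give exact numbers for where one tier ends and the next begins.
REAL_TIME_MAX_SECONDS = 1.0         # "sub-second"
INTERACTIVE_MAX_SECONDS = 10 * 60   # "multiple seconds or minutes"


def classify(latency_budget_seconds: float) -> Tier:
    """Map how long a workload can wait for an answer to a processing tier."""
    if latency_budget_seconds <= 0:
        raise ValueError("latency budget must be positive")
    if latency_budget_seconds < REAL_TIME_MAX_SECONDS:
        return Tier.REAL_TIME
    if latency_budget_seconds <= INTERACTIVE_MAX_SECONDS:
        return Tier.INTERACTIVE
    return Tier.BATCH


if __name__ == "__main__":
    for name, budget in [("fraud check at checkout", 0.2),
                         ("analyst ad hoc query", 30.0),
                         ("monthly financial job", 3600.0)]:
        print(f"{name}: {classify(budget).value}")
```

The point of the classification is the one made in the talk: each tier is a candidate for a different, appropriately priced technology rather than one expensive system handling all three.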
What happens is we build out our infrastructure in silos. Finance has a data management infrastructure and manufacturing and marketing and IT has a data management infrastructure.
In most cases, all of these silos have those three requirements for real time, interactive and batch. You end up with data that is isolated in these systems.
In order for marketing to get the data from finance that would help them understand how their lead management and advertising placements are impacting the bottom line of the business, they have to actually extract that data from the finance silo, process it and move it into the marketing silo in order to obtain any insights from it.
It's extremely inefficient from a time perspective, but it's also very inefficient from a storage perspective and a cost perspective. What we want to do is find a way to obtain a commonality across all these data sources and knock these silos down.
We want to supply the real time, interactive and batch requirements with the right technology set and then entitle and enable those marketing people to get to that finance data, when they're permitted to, seamlessly and without requiring data movement.
When you take the data growth that I already talked about and you apply it to these silos, basically every one of these independent data investments is getting toppled over and is overflowing with data requirements. That's why all of us are forced to do drastic transforms on the data as we're processing it or, more critically, drop the data entirely and not keep it.
At the beginning of April, this month, Pivotal announced the availability of something called the 'Pivotal Big Data Suite.' When we designed it, we really were focused on providing our customers and prospects and the marketplace as a whole with these three facets of ability.
We wanted our customers to be able to store absolutely everything, so that they didn't have to transform that data and drop what could be critical data on the floor and not store it.
In fact, there's specifics that I will go into today that show that as your data size increases, the complexity of your algorithms simplifies.
When you're only keeping 5% of the data that you could be in your environment, you're forced to write these extremely complex, critical algorithms.
As you start to increase your data storage and keep more ancillary data around, you can use more brute force, less scientifically complex algorithms to obtain the same insights. In fact you can obtain them quicker.
We want to enable you to keep all of your data, regardless of data type, and we'll talk about the differences there. We want you to be able to keep it forever and put it in a single place, this idea of a business data lake that I mentioned, so that your marketing resources and your finance resources can get the data they need, when they need it.
When they get to that data, we want them to be able to analyze any of it. Pivotal has the benefit of decades of data management experience: massively parallel database capabilities, ad hoc analytical queries, and we're actually the pioneers of some of the most popular data science algorithm libraries in use today.
We are able to roll that strong heritage and intellectual property and technology into this new capability so that you have multiple methodologies to analyze your data, depending on those use cases we mentioned, whether it's a real time, an interactive or a batch analytical requirement. We have the right tool to bring to bear on that processing requirement.
All of this enables your enterprise to really go off and build the right thing, whether that's overhauling your fleet management or routing of delivery trucks, or overhauling fuel consumption for your organization, or staffing requirements, or the scheduling requirements for nurses and medical facilities, or the treatment requirements, or building that next consumer facing application or a value added service on top of an existing product offering you have.
Really what we're doing here is building a big data infrastructure so that your businesses can run off and innovate as needed to compete today.
I promised to dive into this business data lake a bit.
This is a simplified version. We'd be happy to talk with you about how a data lake implementation can happen in your environment, and everybody's is going to look different.
There are components of what you see on this slide that may or may not be a part of your solution. Really, what we're showing here is that in order to take advantage of big data today, you need to be able to take those requirements I mentioned of real time, interactive and batch, and you need to be able to ingest data sources in those multiple methodologies: real time, interactively and in batch.
Once you've ingested that data, you need to manage your systems and your human beings and your access privileges and your data management and workflow and data processing in a way that enables the exact same methodologies on the action tier or the operational tier.
You're ingesting in real time, interactively and in batch, you're moving the data to the right data infrastructure and you're giving the users access to that data infrastructure when and how they need it.
Then your applications and your data scientists and your application developers and your customers and your partners are all able to take action on that data on the right storage technology in all three of these methodologies: real time, interactive and batch. I'll give an example of that in just a minute.
Pivotal is really unique, as I mentioned, in having an extremely broad product offering both from the data management perspective and from an application infrastructure and application framework perspective.
As you can see here, we have a number of technologies that I'll dive into in just a second, from our enterprise Hadoop offerings into our massively parallel analytical database, Greenplum, and up into in-memory technologies called GemFire, SQLFire and GemFire XD.
Then very uniquely for us, we're able to provide and persist the data management technologies up into an application developer framework known as, 'Spring source.'
Using Spring XD, which you can see on this slide in multiple places, we're able to actually connect our data management or data fabric up into the application framework, so that developers, as they're developing applications, can seamlessly call these data management technologies.
When they have a real time data ingest requirement or a real time stream processing requirement, they're able to actually leverage that in the developer's framework, which streamlines a lot of the implementation requirement, but also delivers the power of big data into that developer's hand so that as they're developing applications, they could seamlessly leverage this underlying infrastructure.
On the data ingest portion of this requirement, we have multiple technologies for these multiple requirements. Whether it's a streaming or micro batch use case, we have GemFire, SQLFire or GemFire XD, which are in-memory technologies that can do both ingest and processing of data.
In the case of SQLFire, it's actually a full ANSI-compliant SQL in-memory database that can actually do transactional workloads for applications in-memory.
With GemFire, we have an in-memory data grid. In both cases, whether it's structured, semi-structured or unstructured data, you have this in-memory capability for stream processing, analytics, real time analytics processing, and scoring and modeling of data as it's ingested.
Something that's really interesting about these technologies is they can effectively remove the requirement for ETL entirely.
As the data is actually being ingested in real time, we're able to take action and process that data, which can remove the requirement for storing the data and subsequently extracting and transforming the data for use, so we can effectively process on ingest in real time without slowing down any of the performance or providing any extended latency for that data.
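The idea of processing on ingest, rather than storing first and running ETL later, can be sketched generically as follows. This is a plain Python illustration of the pattern only, not the GemFire or SQLFire API; `score_on_ingest`, `toy_fraud_model` and the `amount` field are hypothetical names invented for the example.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Record = Dict[str, Any]


def score_on_ingest(stream: Iterable[Record],
                    model: Callable[[Record], float],
                    threshold: float) -> Iterator[Record]:
    """Score each record as it arrives instead of storing it and
    extracting/transforming it later in a separate ETL step."""
    for record in stream:
        score = model(record)
        # Emit the enriched record immediately; nothing is staged first.
        yield {**record, "score": score, "flagged": score >= threshold}


def toy_fraud_model(record: Record) -> float:
    """Hypothetical stand-in for a real analytic model: larger purchase
    amounts get a higher score, capped at 1.0."""
    return min(record["amount"] / 1000.0, 1.0)


if __name__ == "__main__":
    events = [{"amount": 50.0}, {"amount": 2500.0}]
    for enriched in score_on_ingest(events, toy_fraud_model, threshold=0.8):
        print(enriched)
```

Because the function is a generator, each record is handled as soon as it is read from the stream, which is the property the talk attributes to the in-memory layer.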
Again here you see how that plugs in to Spring XD inside the developer framework so that they can call that in-memory layer, whether it's a SQL interface or data grid interface for applications to leverage the in-memory layer.
I skipped over this, but on the batch requirements, Pivotal has a unique solution called 'Data Loader' that can handle extremely parallel ingest requirements for your larger data sets.
Once that data's inside of the correct data management layer, we'll talk about the different data management infrastructures we can provide. You really need to provide the right set of analytic capabilities on top of it.
For this, we have a set of different technologies and we also obviously have an enterprise Hadoop distribution that provides all the popular capabilities, libraries, processes and applications that are included inside of a common Apache Hadoop distribution.
Things like HBase, Hive and MapReduce are all fully supported in our enterprise offering. I'll demonstrate to you in a second how the combination of these technologies enables you to have a single source of truth.
You're able to basically do real time processing, interactive processing and batch processing on the same data set. You're also able to leverage the Hadoop technologies that you know and love from the Apache distribution to process those same data types.
You can be doing in-memory processing on files that are resident inside of HDFS or you can persist data from in-memory down to HDFS. Then you can process that HDFS stored data either with our extremely powerful ad hoc analytical query engine, or with the common distribution capabilities inside of Apache Hadoop.
As for the data distillation capabilities, we have also been spending a lot of time writing unique connector capabilities and transactional distill capabilities across all these offerings, so that you can move data from the in-memory layer down to the persistent HDFS layer or out onto our extremely high-performance analytical database, Greenplum.
With this methodology, we're able to streamline data movement and process data at rest whenever possible.
Another point we're making on this previous slide is that Apache Hadoop, and big data management in general, is not easy. There's a significant learning curve associated with starting to leverage HDFS and Hadoop inside your infrastructure.
What we've really paid a lot of attention to is closing the skill gap for our customers, so that you don't have to run off and learn a whole bunch of MapReduce for example.
We have a number of 100% compliant SQL interfaces so that the SQL applications and SQL engineers and SQL data scientists don't have to reinvent the wheel in order to leverage the platform.
We are also able to natively run R and Python and Java calls and data science applications so that you don't have to do translations. We do this all through a broad user defined function capability, and we can natively execute Python, Java, R and obviously SQL.
The reason I was talking earlier about supplementing a traditional EDW methodology with our [inaudible 00:27:04] of a business data lake is all of these new requirements: not just the ingest and processing requirements we talked about, but also data types and data quality and the ability to integrate unstructured with semi-structured and structured data sources. Then of course there are the mixed workloads we talked about on the previous slide.
Now let's get into exactly what we announced and why we think it's so important. When we started innovating and introducing the idea of a business data lake to our customers and our prospects and analysts and the press, everybody nodded their head and said, 'Absolutely, that is what I want. Please build me a business data lake.'
The good news is that the very unique capabilities from Pivotal enabled us to come in and actually provide the majority of the requirements to build out that data lake.
We have a number of really strong partners: our federation partners, EMC and VMware, and a number of really strategic next generation service providers like Capgemini, for example.
We're able to come in and wrap our data management technologies with human intelligence and other third party software packages to build out these data lakes.
The good news is we had all of the right data management capabilities right inside of our portfolio. I'm just going to take a second to walk down them so that everybody's familiar with them.
We have about a decade of experience developing one of the world's highest performance and most advanced analytical databases, known as Greenplum DB. This is a massively parallel processing, ad hoc analytic database.
This is where you move your data sets in order to do extremely high performance complex analytics on them. We would argue that for the last decade, this is really what big data analytics was. It was powered by technologies like Greenplum, massively parallel processing databases.
About two years ago, Pivotal saw the momentum with HDFS and with Hadoop. I'm proud to say that we invested early in developing our own enterprise Hadoop distribution. That is now called 'Pivotal HD.'
Some of you on the call may have known that as Greenplum HD prior to Pivotal's inception.
We were very early adopters of Hadoop and we've been hard at work hardening and integrating standard Apache Hadoop capabilities with our proprietary analytic capabilities and services. That's all bundled up in what we call 'Pivotal HD.'
Because of that early investment and belief in Hadoop as a disruptor and truly the next foundation for data management, we believe HDFS is at the core of the next generation of data management.
We started again very early on porting the query engine and the query optimizer from Greenplum. We're taking this decade worth of analytical MPP query design. We ported it on top of HDFS, on top of Pivotal HD and that's called, 'HAWQ.'
That is a parallel query engine that is 100% SQL compliant. It's the highest performance SQL query engine on top of Hadoop available today. It inherited all of the intellectual property that we worked so hard to build into Greenplum over the last 10 years.
Then, out of VMware's assets coming to Pivotal, were GemFire and SQLFire, which are the in-memory technologies that I mentioned: both an in-memory data grid and a SQL compliant interface to that in-memory data grid, known as SQLFire.
At the same time that we started migrating and porting the Greenplum technology on to HDFS with HAWQ, we did the exact same thing with SQLFire. GemFire XD that you see here is SQLFire ported on top of HDFS.
I know that there is a lot of in-memory momentum in the market, which is terrific. We love the idea. We believe in-memory is a key component of the Hadoop stack, and we are once again bringing decades' worth of in-memory capabilities on top of Hadoop.
GemFire XD is truly differentiated from technologies like Apache Spark, for example. GemFire XD is a SQL compliant in-memory database that's able to actually completely offload OLTP requirements to in-memory.
Extremely high concurrency application, extremely low latency requirements, in-memory analytics processing on top of Hadoop, that's what you get with GemFire XD.
That's the good news. We had all the right tools in the tool belt to build out these next generation data lakes. The bad news was that it was very complicated to consume and to leverage these technologies.
As you can see on the right side of the slide, our unit of measure across all of this technology varies widely. When a customer said, 'Yes, please build me a data lake,' it was actually pretty hideous to determine what technologies were required.
This is actually one of the major headwinds I've heard talking to a lot of customers. When they see the data lake, they want to move into it and they don't know where to start.
When you look at their data sets and their processing requirements you might say, 'That is an MPP analytic database requirement. You really need to start with Greenplum.'
The problem is that you don't know what your next challenge is going to be with data processing. As you do those analytics with Greenplum, you might determine an analytic model that needs to run in-memory.
Now you're coming back and redesigning your data management strategy, and you have to go implement GemFire or SQLFire or GemFire XD to run the analytics that you determined with the analytics database.
You just don't know what you're going to need until you need it, is effectively what I'm saying. What we did was decide to tie together all of our capabilities into a single offering. The first thing we had to do was rationalize on the unit of measure.
Prior to this, you would have to determine how much data you wanted in the analytical database. How many nodes of Hadoop did your environment require? On top of that Hadoop, how much SQL query processing do you need on top of that HDFS layer? How many nodes of HAWQ?
Then as I pointed out earlier, how much GemFire, how much SQLFire? Are you going to want to do in-memory on top of HDFS because that's going to require GemFire XD.
What we did was we rationalized on the core, the CPU core. We think this is really important. As I started this webinar saying, we want our customers to be able to store everything forever. It just does not make sense to tax or charge our customers for every terabyte that they move into their big data store.
On one hand, we're telling the customer to keep all their data because it's going to help them innovate and it's going to fill their data lake with valuable data. They can do less complex queries across larger data sets. They'll be able to build new applications.
On the other hand, we're sending you a new bill for every terabyte that you add to the infrastructure. It just wasn't going to work.
Instead of rationalizing on node or on terabyte stored, we really viewed our offering in what we think is a very customer focused way. We feel like we're delivering unique value in the market with our value added services, with the ability to do the world's highest performance SQL queries on top of HDFS with HAWQ.
What links to that processing requirement is the CPU core. After rationalizing on CPU core, we then decided to bundle together our entire portfolio of data management capabilities. That's what this product announcement on April 2nd was: the 'Pivotal Big Data Suite.'
What this is, it's a subscription service that entitles you to leverage absolutely any of the data management technology that I've been mentioning this entire webinar.
You get to use GemFire, SQLFire, GemFire XD, the analytical database Greenplum and its inherited HAWQ query engine on top of the two, and you get to decide when and where you use them.
Probably most important is this bottom layer: with the subscription we're providing unlimited use of our enterprise Pivotal HD, our Hadoop offering, fully supported. You can actually now grow and store absolutely everything in your environment without the tax of a subsequent software cost for every additional terabyte or node that you put in your environment.
Subscribers to the big data suite can deploy ten, a hundred, a thousand, 4,000 nodes. Some of our largest customers are upwards of 2,000 nodes of Pivotal HD. You get to do that without incurring any additional cost.
Then on top of that Hadoop infrastructure, you're able to flexibly deploy any of these other five technologies that you see above the batch layer in this diagram.
Then outside of the subscription, we obviously can bring to bear our 60-plus world-leading data scientists to help you design the right mixture of these technologies and then design algorithms and data science to dig into the data that you're storing inside your data lake to build those insights.
Then of course there are the 500 to 600 agile developers inside of Pivotal Labs who can come in and help you learn agile development. They can independently develop big data driven applications for you, but more importantly they can teach your developers to self-sustain and move forward building the next generation of big data driven applications on top of this underlying flexible subscription.
Before I start taking questions let me just give you a little bit more detail on the offering. It is a subscription based solution as I mentioned. We're initially offering two- and three-year subscriptions to the entire data portfolio at Pivotal.
It's a software only subscription. It's core based as I mentioned. It scales per core. You would subscribe to the big data suite and you would obtain a concurrent core license, a number of CPU cores that you can leverage across any of the software technologies I mentioned.
We are providing discounts for the three-year subscription to encourage customers to invest in Pivotal as literally a partner for building out a big data infrastructure moving forward. Then there's this flexible licensing piece, which I'm going to talk about in just one second.
What I think is probably the biggest headline of this offering is the ability to seamlessly scale your Pivotal HD infrastructure without incurring subsequent costs as your data size grows.
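The contrast between per-terabyte and per-core pricing described above can be sketched in a few lines of Python. This is only an illustration: the function names and every price in it are hypothetical placeholders, not Pivotal's actual price list, and it assumes the licensed core count stays fixed while the stored data grows.

```python
def per_terabyte_cost(terabytes, price_per_tb):
    """Bill that grows with every terabyte stored (hypothetical pricing)."""
    return terabytes * price_per_tb


def per_core_cost(cores, price_per_core):
    """Bill tied only to licensed CPU cores (hypothetical pricing)."""
    return cores * price_per_core


def compare(data_sizes_tb, cores, price_per_tb, price_per_core):
    """Return (terabytes, per-TB bill, per-core bill) for each data size.

    Assumes the licensed core count does not change as data grows.
    """
    return [
        (tb,
         per_terabyte_cost(tb, price_per_tb),
         per_core_cost(cores, price_per_core))
        for tb in data_sizes_tb
    ]


if __name__ == "__main__":
    # Placeholder numbers chosen only to show the shape of the two curves.
    for tb, tb_bill, core_bill in compare([100, 200, 400], 64, 10, 50):
        print(f"{tb} TB: per-terabyte bill {tb_bill}, per-core bill {core_bill}")
```

With these placeholder inputs the per-terabyte bill doubles each time the data doubles, while the per-core bill stays flat, which is the point being made about storing everything without a growing software cost.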
What this type of technology enables is something that we call 'closed loop analytics.' Some of us call it 'self-learning analytics.' Let me walk through how this works. This slide is supposed to build but it's not because I'm presenting, so I apologize.
Basically, here's what can happen with the mixture of these real time, interactive and batch technologies inside of a data lake. You can effectively leverage GemFire, SQLFire or GemFire XD, which are our in-memory technologies, on ingest in real time, as the sensor feeds are coming back from a wind farm, or cellphone calls are being initiated, or an online application is making a transactional request.
In real time, in path, you can run analytics on that data as it's being ingested. As you're running those analytics and supplying this real time functionality, whether from an OLTP or an OLAP requirement perspective, you're simultaneously persisting that data down into your big data container of HDFS, Pivotal HD.
You're taking real time action. You're potentially transforming the data in real time in memory or you're potentially doing transactional request response in-memory and you're persisting the actual native data down into HDFS.
Then in Pivotal HD, you could leverage MapReduce from a traditional open source Apache perspective, or you could leverage HAWQ, our ad hoc analytical SQL query engine on top of Hadoop, on the same data set.
That can enable you to provide the next model refresh or the next analytical algorithm that you can then refresh or ingest back up into GemFire in memory. You're effectively iterating on your algorithm to make it more and more accurate over time based on real time results, or you're updating your online application in real time through this closed loop capability.
Your interactive requirements and your batch requirements are effectively happening offline and your real time requirements are happening online or in-line.
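The closed loop described above (score in memory on ingest, persist the raw data, rebuild the model offline, push the refreshed model back into memory) can be sketched as a toy in Python. Everything here is a hypothetical stand-in: the `ClosedLoop` class, its method names and the simple threshold model are illustrative assumptions, a plain list plays the role of the HDFS store, and no GemFire, HAWQ or Pivotal HD API is used.

```python
from statistics import mean, pstdev


class ClosedLoop:
    """Toy closed loop: in-memory scoring plus offline model refresh."""

    def __init__(self, threshold):
        self.threshold = threshold  # current model, held in memory
        self.store = []             # stand-in for the persistent data store

    def ingest(self, reading):
        """Real-time path: score the reading, then persist the raw value."""
        alert = reading > self.threshold
        self.store.append(reading)
        return alert

    def retrain(self, k=2.0):
        """Batch path: rebuild the threshold from everything stored so far.

        Uses mean + k * population standard deviation as an assumed,
        deliberately simple model. The new threshold replaces the
        in-memory one, closing the loop.
        """
        if self.store:
            self.threshold = mean(self.store) + k * pstdev(self.store)
        return self.threshold


if __name__ == "__main__":
    loop = ClosedLoop(threshold=100.0)
    for reading in [10, 12, 11, 13, 50]:
        loop.ingest(reading)            # nothing alerts under the initial model
    loop.retrain()                      # offline refresh from the stored data
    print(loop.threshold, loop.ingest(60))
```

The design choice worth noting is that the real-time path never waits on the batch path: `ingest` only reads the current model, and `retrain` swaps in a new one whenever the offline analysis finishes.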
Of course, because HAWQ is based on the Greenplum database technology and the query engine that's extremely mature inside of Greenplum, all of your standard analytical applications and data science applications and business intelligence applications can simply connect to the SQL compliant query engine and function down on that same exact common data store.
This type of methodology is getting you what I was talking about earlier about breaking down these silos. You have real time, interactive and batch access requirements from multiple stakeholders, whether it's a business intelligence requirement, a live application or a real time data ingest requirement.
It's happening on all of the same data sets, but the data is being managed with the technology that makes sense for the use case. In-memory for real time, a massively parallel query engine for interactive and obviously traditional Apache Hadoop capabilities for batch.
Lastly, I just want to talk a little bit about this flexible licensing capability because we think it's an extremely differentiated and unique offering in the space.
Many of you on the call are probably happy Greenplum database customers and you can continue to leverage the Greenplum database that you know and love.
Then you start to leverage this unlimited Pivotal HD capability and start storing the 97% of the data that you've probably been dropping on the floor or deleting. You start to build out this data lake and fill it with all this potentially critical data that you've never seen before, you're using the open source Apache or Pivotal provided query tools on that, and you find new insights inside that data.
With flexible licensing, you are able to either distribute licenses that have not been used yet or you could potentially decommission some of your Greenplum database infrastructure and re-leverage those licenses to bring up GemFire XD and HAWQ, so that all of a sudden you can now do powerful ad hoc analytic queries against your HDFS data store.
You can power or inject those insights, those algorithms and those new models into GemFire XD so that you can basically transition as required, or transition as value is derived from this other portion of your data management infrastructure.
Once again, you continue to leverage that MPP database you know and love today. You start storing the data types you've never stored before. As new value is derived from those data types, you are empowered to move your license allocations as long as you don't exceed the concurrent license limit.
You can, inside of a single day or even inside of a single hour, decommission some Greenplum, turn on HAWQ, turn on GemFire XD and obtain new insights. If you want to, decommission them again and go back to the original infrastructure.
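The flexible licensing idea above amounts to a shared pool of concurrent cores that can be moved between products at will, as long as the total in use never exceeds the subscribed limit. A minimal Python sketch under that assumption follows; the `CoreLicensePool` class and its methods are hypothetical and do not represent any real Pivotal licensing tool or API.

```python
class CoreLicensePool:
    """Hypothetical pool of concurrent CPU-core licenses shared across products."""

    def __init__(self, total_cores):
        self.total_cores = total_cores
        self.allocations = {}  # product name -> cores currently in use

    def in_use(self):
        """Total cores currently allocated across all products."""
        return sum(self.allocations.values())

    def set_allocation(self, product, cores):
        """Set a product's core count, refusing anything over the pool limit."""
        if cores < 0:
            raise ValueError("cores must be non-negative")
        current = self.allocations.get(product, 0)
        if self.in_use() - current + cores > self.total_cores:
            raise ValueError("allocation would exceed the concurrent core limit")
        self.allocations[product] = cores


if __name__ == "__main__":
    pool = CoreLicensePool(total_cores=64)
    pool.set_allocation("Greenplum", 64)    # everything on the MPP database
    pool.set_allocation("Greenplum", 32)    # decommission half of it
    pool.set_allocation("HAWQ", 16)         # reuse the freed cores
    pool.set_allocation("GemFire XD", 16)
    print(pool.allocations, pool.in_use())
```

Moving back to the original layout is the same operation in reverse: set the HAWQ and GemFire XD allocations to zero, then raise the Greenplum allocation again.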
With this subscription, we really feel that we're delivering to the market this flexible toolkit, but more importantly we're giving the customer and our prospects the power to leverage the right technology at the right time.
We're removing the handcuffs of saying, 'You have to buy more from us in order to be flexible about how you migrate and take that big data journey that we talked about,' at the very beginning of the slide deck.
We're running down on time. I want to leave a good 10 minutes for Q&A or give people back 10 minutes of their busy day. Obviously, today we discussed this thought leadership or reference design architecture that Pivotal introduced about seven months ago called the business data lake.
It's basically a heterogeneous mixture of data management technologies with a common processing and entitlement ingest and operational capability across the same data set.
With the big data suite, we're now delivering to the market the product and the consumption model that can power these data lakes. Better yet, they can enable customers to migrate into a data lake methodology over time at their own pace, without being locked in to a technology investment decision that might not be applicable in this rapidly changing environment where your requirements could change by the day.
With a combination of the business data lake and the Pivotal big data suite, we really feel we're delivering the capability to store everything, analyze anything and then subsequently build the right application from that underlying business data lake.
With that, I just want to leave you all with two links and ways to find out more information and contact us. You can go to gopivotal.com today and on the very front page you will see the announcement about the big data suite, or you can follow the URL that's at the bottom of this slide to get to the associated product pages.
Inside that page, gopivotal.com/bigdata, you will find a set of white papers and data sheets covering all the technologies and the business data lake that I mentioned.
We have a number of analyst reports and briefs on our announcement. I think there are three different analyst reports on both our vision of the business data lake and the impact that the big data suite is going to have to power those.
We've also developed a very easy-to-use value tool that's linked and available right on our website publicly that enables you to analyze how the big data suite can change both how much data you're storing and also empower you with [inaudible 00:47:47] set of analytical capabilities on that data that you've stored.
Finally, and last but not least, you can obtain the actual technologies we're talking about and you can evaluate this mixture of massively parallel analytical database capabilities, the ad hoc query engine on top of Hadoop, and our Pivotal HD enterprise Hadoop distribution.
You can in fact obtain a lot of this inside of a self-contained VM. Do hop on gopivotal.com to get full information, analyst opinions on what we just delivered to market and also actually test drive the technologies we're talking about.
With that, let me just quickly move over to the WebEx interface and handle any questions that we may have had while I was presenting.
Please comment on Intel buying into Cloudera. That's a great question actually. We obviously are really passionate about HDFS and the Hadoop ecosystem. We were excited to see Intel's investment in Cloudera. It really validates the early vision that we had that HDFS and Hadoop are going to power the next generation of data management.
We really do believe that at the core is HDFS. What I think we differentiate ourselves on is the fact that we are the only vendor in the space that is bridging these heritage technologies for data management onto the HDFS stack.
Again, on top of Pivotal HD which is our enterprise Hadoop distribution, you're able to leverage the same query engine and query optimizer that is inside of the Greenplum database.
That delivers extremely high performance ad hoc queries over HDFS, and then again the GemFire technology, which is extremely unique in the space, is now available on top of Pivotal HD.
The investment in Cloudera really tells us that HDFS is here to stay and it validates the fact that we've poured so much time, effort, intelligence and money into creating what we think is the most unique Hadoop stack available today.
There are some questions around the Hadoop distribution landscape. I think I'll use that question as an opportunity to point out that Pivotal HD is based on Apache Hadoop and is 100% open source compliant.
Then what we do is we harden the open source distribution. We do QA and regression testing across it. Then we do the exact same thing with all the popular packages like Hive, HBase, et cetera.
Then we actually do scale out testing, which is very unique to Pivotal. Pivotal maintains something called the analytics workbench, which is a thousand node cluster. It's the largest publicly available Hadoop cluster in the world today. A thousand nodes, and we actually do scale out testing of Pivotal HD before releasing it to the market.
We're the only vendor that does that, that tests our distribution at those scales. One other slight plug: customers can test drive the analytics workbench.
If any of the listeners today would like to actually obtain access to this thousand node cluster with the Pivotal HD stack available to them, please get in touch and we can reserve a time slot for you to actually hop on this thousand node cluster and test drive the world's largest Hadoop cluster.
The differentiation points for Pivotal are that it is an enterprise distribution but it is 100% compliant with and based on open source Apache. Then we add our own proprietary capabilities around install, management and monitoring.
We have some proprietary capabilities around virtualization and how Pivotal HD is hosted on virtual environments. When you host Hadoop on virtual infrastructure, you need to understand the virtual infrastructure underneath because Hadoop, the HDFS file system, was designed for bare metal infrastructure.
Without diving too deep there, the data redundancy built into the file system can be broken when it's run inside a virtual environment. You need to have an innate understanding of the virtual environment.
We've done some very advanced work there so that Pivotal HD is extremely reliable and robust and maintains data redundancy in virtual infrastructure.
We have a number of proprietary data ingest capabilities and data connector capabilities. I mentioned PXF, which is the Pivotal Extension Framework. This technology allows all of the different data management solutions I mentioned to access common data types along with Apache Hadoop open source data types like [inaudible 00:53:20]. The combination of all these makes our HD offering extremely differentiated.
For those of you that are in love with Apache Hadoop, the open source distribution: under the covers of Pivotal HD is the core of the open source Apache Hadoop distribution.
Spring XD, couple questions on that. I think we'll save that for a follow up webinar. We did release the next version of Spring Source which included the next version of Spring XD.
Spring XD is an extremely powerful data component of the Spring Source developer framework that allows you to make seamless calls from the application layer down into the data framework.
There's a question about Spring XD providing ETL capabilities. That's absolutely right and in fact you can do ETL inside the application as the data's being managed by the application before it's sent into the data management layer.
We believe that's an extremely important capability from Pivotal, and you'll see continued investment from Pivotal to link our application fabric capabilities, which include Spring Source and tc Server and a web server as well, along with a messaging queue and Redis key-value store.
We're going to be linking all of these powerful big data management capabilities into that application developer framework.
We're also subsequently linking all of the data management capabilities you've heard about today down into our platform as a service. The combination is going to provide developers, IT operations teams, data managers, data architects and engineers all with a common framework that is integrated very tightly, and again I believe Pivotal is one of the very few vendors on the market that can actually do that.
Interested to know updates on working with the data lakes architecture. Just get in touch with us. We'd be happy to come in and bring some of our data science resources and data architects from the Pivotal data labs to come in and work with you all to help assess your situations and provide an idea of how a data lake can impact your environment.
Can the data lake from Pivotal be on private? Absolutely. A lot of the technologies we're talking about here today can be installed on private infrastructure, both on bare metal physical hardware and in virtual environments, for all the technologies mentioned, whether that's a private cloud or a public cloud.
What Pivotal Cloud Foundry does as a platform as a service is span across your private cloud or your virtualized data center and out to your infrastructure as a service public cloud environment.
It provides you with a next generation hypervisor, if you will, or operating system that spans across and abstracts the heterogeneous infrastructure.
You can take the big data suite technologies and install them physically in your own data center. You can install them virtually in your own virtualized data center. You can install them on infrastructure as a service, or you can install all of these and leverage all of these on top of our platform as a service, which would create a unified environment across all those infrastructures I just mentioned.
Another question about private ... It looks like that was asked a few times. Questions about APAC and about data as a service. Pivotal does not provide infrastructure as a service. Pivotal does not provide data as a service today. Pivotal provides the technologies so that your enterprise can become a next generation data-driven enterprise.
There will be partners of ours that leverage Pivotal Cloud Foundry, our platform as a service, and our data services inside of that to provide data as a service and platform as a service offerings to you all to be consumed as a service.
Pivotal is not in the business of doing that. Pivotal provides the technology that service providers will leverage to do that or, better yet, that your enterprise will leverage to do it for yourself.
The last question, and we're out of time, and thank you for all the good questions and the participation. The technology that I've mentioned today is available in a single VM.
What's inside that single VM is Pivotal HD, the enterprise Hadoop distribution, HAWQ and GemFire XD. Effectively the Pivotal Hadoop stack is available inside of a single node VM.
You can download and leverage Greenplum database, GemFire or SQLFire as well, but the single node VM, obviously for technical reasons, is the integrated Hadoop stack of Pivotal HD, HAWQ and GemFire XD. Then you can download separate packages of Greenplum, SQLFire and GemFire.
All right. Thank you so much. Again, the webinar has been recorded. Please reach out to us, it's gopivotal.com/big-data for all the information and also for forms and ways to get in touch with us directly and hear from our data scientists or data architects.
Thanks so much and enjoy the rest of your week. Thank you for joining us.