The Business Data Lake – an Evolution in Data Infrastructure


The What You Can Do With Hadoop Webinar series contains both technical and business topics, use cases, thought leadership from industry experts, customer examples and more.

Guest Speaker: Steve Jones, Strategy Director – Big Data & Analytics, Capgemini

Capgemini and Pivotal are joining forces to take Hadoop to the next level of business transformation. Announcing our strategic partnership last December, Capgemini and Pivotal are co-innovating around Big Data and analytics to address the increasing volumes of data organizations are facing today. Our first area of focus showcases a new approach to data access and analytics: the Business Data Lake.

With a Business Data Lake, analysts can traverse through the data and move, transform and create analytical sandboxes on-demand to determine the 'integration value' of the information that lives in the data. Analysts can therefore enact consumption to answer complex questions, as the business requires, organizing and sifting through the chunks of data on-demand to provide answers. And IT achieves speed, cost reduction and increased security while providing on-demand data for analytics.

The Business Data Lake promises to increase the amount of data being captured and stored using Hadoop and more importantly increase the amount of data being analyzed and operationalized within the business – turning insight into action. Please join us for this exclusive webinar with Capgemini and learn how the Business Data Lake:

  • Can become the rich data repository for future business needs
  • Allows IT to address business needs in a timely and efficient manner
  • Utilizes Hadoop as the data substrate to help build Big Data applications of the future
  • Impacts business and operations with a greater degree of insight
  • Helps increase top line revenue
  • Plans to bring the enterprise and industrial Internet in parallel with the consumer Internet

Nikesh Shah: Thank you everyone for joining today for the What You Can Do With Hadoop webinar series. My name is Nikesh Shah. I'm the Senior Product Marketing Manager for Pivotal HD. Joining me today is Steve Jones, Strategy Director for Big Data and Analytics at Capgemini. We're excited to kick off the first installment of this monthly webinar series. This month's focus is on the Business Data Lake, which is really a new approach to capturing, storing, analyzing and even operationalizing enterprise data.

Before we jump into today's presentation, there are some housekeeping items to cover. Your phones have been placed in listen-only mode. If you wish to submit a question, please do so via the Webex Q&A chat box to all panelists. We'll have time at the end of today's session to answer your questions, but feel free to submit them as we move along. Please note, today's session will be recorded and available for replay under the resources tab on gopivotal.com.

At a very high level, the purpose of today's webinar is to give you the opportunity to learn more about why Capgemini chose Pivotal as their strategic co-innovation partner around the Business Data Lake. We'll talk about the ever-growing importance of data in the digital transformation and how to achieve local views with global governance, and really deep-dive on what a Business Data Lake is and how it can help enable enterprises to be more data-driven.

On December 4th of last year, Capgemini selected Pivotal to be their only co-innovation partner. The first step of this partnership is really focused on delivering this new approach to leveraging big data across the enterprise. By combining the expertise of Pivotal's engineering and data science with Capgemini's business information management expertise, we're creating a center of excellence in India with 500 dedicated experts and access to over 8000 information management practitioners and 6000 developers.

Capgemini has been advising Fortune 1000 organizations on how to leverage data and analytics to improve business performance since the inception of the digital era. With this newer world of cloud, social, big data and mobile, Capgemini selected Pivotal for its new approaches and ways of thinking about how to make apps, data and analytics a much more seamless and rapid process in the enterprise. This first co-innovation step will focus on Business Data Lakes, which will combine the big data volumes from new sources with legacy data to provide business real-time analytics on a robust platform.

With that, it's my pleasure to introduce Steve Jones, a reputed forefront advisor of new technologies for companies such as Google, IBM, Oracle, Pivotal as well as an author of Enterprise SOA Adoption Strategies and other papers and the Strategy Director for big data and analytics at Capgemini's business information management practice.

Steve Jones: Thanks very much, Nikesh. Now, I only have the phone. I'm trying to look at how I share my screen, which is going to be the front sharing application. Just a second, there we go. Everybody should be able to see a PowerPoint. I'll get on. Now, thank you very much. What I want to talk about now is the Business Data Lake and really what the Business Data Lake is about is the fact that the history of information in IT has been around a couple of things. One of the big ones has been ETL, Extract Transform and Load.

We've made those enterprise decisions really because of the cost of storage of information. That really brings us to the first driver: why should the business care? The first-principle question isn't what is the technology doing, or what can we do with it. It's why should the business care? At Capgemini, we've been doing a lot of research with MIT Sloan, the business school of MIT, to really look at what digital transformation delivers to businesses. What really matters in digital transformation is how information-driven the organization is, how able it is to leverage information to make smarter decisions.

This splits companies roughly into four categories. You've got the beginners, people who really haven't done much; the fashionistas, people who take a very point-solution, technology-oriented approach to it; the conservatives, who take a more managed approach; and then the digerati, the people who have actually got there in terms of agility. What's really interesting is when you look at it from a revenue perspective, clearly being a fashionista appears to be the way to go, because you go from being 4% behind the market in revenue to being 6% ahead.

That's a 10% leap on average in revenue by being a fashionista, by taking those point solutions. Being conservative, being more managed about it by stretching the governance and coming out with a more common solution for the business, resulted in a 6% reduction in revenues. Quite clearly from that perspective, it appears that being a technology fashionista is the way to go. However, we all know that revenue is vanity and profit is sanity. When you look at profitability, the fashionistas have an improvement in profitability over the beginners. That's around 13% on average.

The conservatives do significantly better in terms of profitability, a leap of 33%, and the digerati are above that as well, with an over 50% improvement in profitability from being digerati. Where it really comes down to is market valuation. This is where it gets really impressive, because the research found that beginners are valued higher than fashionistas, i.e. if you can't really demonstrate how you're taking a managed approach to it, you can't really show how you're repeatedly … going to become a digital organization, then the market penalizes you with a 5% reduction. Despite that improvement in profitability, despite that improvement in revenue, the market kills you.

Being conservative, you have to show this is the route we're taking; this is the roadmap we're taking towards becoming a digital organization. It gives you a 14% leap in your market valuation against the market average. That really says that it's about the effective use of data, it's about being able to demonstrate how you'll become the digital organization. It's not simply about using technology. That might sound like a strange thing for a technology-centric webinar to talk about. Really, it's about how you educate the business on why they should care.

This research is why, when you speak to business people, they need to care about information. With the Business Data Lake, we're looking at how the business can on-board all this information, manage all this information and do so in a way that delivers value time and again: not just a point solution, not just setting up a Hadoop cluster and solving one problem, but how can you actually solve the challenge for the whole business?

What, then, do we have today, if information really is so important? Well, we all know what we have. We have completely fragmented estates. We have operations running off an operational data store. We have people accessing the ERPs and line-of-business systems directly. We have some local business data marts within an organization. Today we're seeing 19 data warehouses, over 200 ERPs. This is the norm. Then spreadsheets everywhere; spreadsheets rule large areas of business.

I guess we may have a corporate data warehouse, an enterprise data warehouse. I'm not really making this sort of the corporate goals. Nonetheless, when we look at how consistently compromised these are, the fitness for purpose of those pieces, the detail they contain, the freshness, the fidelity of that information, let's look at that EDW for a second, which has been the traditional approach: I have this fragmented environment, let's replace it with an EDW. That will solve all our problems.

EDW is a fantastic corporate-level goal, delivering at a corporate level the detail required, at a sort of day-delayed or whatever type of pace. From a local business perspective, that EDW doesn't include all of the local information I need to run my business, because if I'm in a complex organization I can't possibly have all that information in one place. It's already fragmented in a course. We're putting EDW as an orbit. We'll talk about why in a second. Within Excel … Excel is perfectly fit for purpose; business people use Excel spreadsheets for a specific purpose. The data however lacks some detail. We all know the fidelity of it, the quality of it, is often deemed low.

It doesn't even … People don't know where all the pieces come from, yet they're still making decisions based on that. What this tells us is that the current landscape is one of consistently compromised views. We don't have even that conservative … that managed way of moving towards being a digital organization. Now, why does the single view fail? Well, the single view fails for a fairly simple reason. A single view fails because everyone views the world differently. In every division, there are people with personal KPIs. The sales guy has a different KPI to the finance guy. They're deliberately antagonistic. The supply chain guy has a different KPI.

All of those KPIs drive the need for that local view, because it's seen from that local perspective. Every division has their own KPIs. You get all these people together in a room, you get all of these business areas together, people with contending KPIs, looking at it from an operational perspective, from a customer perspective, from a supply chain perspective, from a finance perspective, and then you say, 'Guys, all we've got to do is agree on absolutely everything in our business.'

It's an impossible objective. Even when you look at just the human dynamics of getting people to agree, it takes too long. Then you start looking at the volumes of data in transactional systems. If you're selling a complex manufactured product that goes into satellites, for instance, versus selling something simple like a whiteboard marker, the invoice for that, the customer data you need for that, the procurement cycle of that are completely and absolutely different.

Trying to create a single view is IT trying to push a homogenized view on to a heterogeneous environment. That's why it fails. That's why we see data proliferation, because data has value. The business needs to create those local views, because that's how the business operates. What do we get even if we manage to get some single canonical form? Well, we get the change cycle. We get a new data requirement. We need to agree a standard definition. We update the schema. We update the ETL. We extract the new attributes. We load it into the EDW. We start again.

That can take weeks if you're good, months if you're bad and quarters or years depending on how long and complicated your schema environment is in loading corporate goals. At the same time, the ambition of IT is to decommission all of these local solutions. IT once in a while, all the ships are there in your local markets. I'll get the list of the requirements not met by the corporate data warehouse. Some of those are going to be far too expensive for the EDW. They're not just for you. We can't really justify the expense of putting it into a corporate level solution just for you. You don't really need which point of the business is lost, why you encourage on using the solution that goes.

You've got this contention. This is something that we've been failing at when we've just been looking internally. When we looked at core transactions, getting the whiteboard guys and the satellite guys to agree has been impossible. So has getting the finance guys and the sales guys to agree on what a customer is. To finance, it might be somebody who has actually paid a bill. For sales, it's somebody who I can potentially sell to; different definitions even for simple words, with things like MDM. We create pseudo promises like prospect.

Then, we look at the other data we've got coming: the emails, the documents, the partner data, the market data, the machine monitoring data. If I'm monitoring data from a television set these days, versus monitoring data coming from that satellite, I have very, very different sets of data that I need to interpret very differently. Then, there's the whole world of social media and human interactions, unstructured data, semi-structured data; these volumes are getting larger and larger.

The idea of having an organization sitting down in a meeting and creating a single schema that includes every single bit of information, and having every area of the business agree on that single view, is quite clearly an impossible dream. In IT, we really pushed that dream because it made our lives easier, because if we can just get you to agree on that schema, it makes that ETL easier. That means we don't have to store any more information than we want. We can just use that single view and build something once.

That comes down to really the heart of the problem today. The IT culture fights the business culture. In IT, we want one view for everybody. We want to stop those local marts. We want a long-term solution. We want to share everything. We want to govern everything. We want that control over everything. On the business side, it's all 'my': I want my view, the one that makes my job as easy as possible. I want to access what I want, when I want to. If I need something just for three weeks, because we're doing a marketing campaign and we've got a storm in the northeast of the U.S. I want to be … It's six months.

It's September before I get my updated data warehouse that enables me to manage storms in the northeast of the U.S. It's too late, because the summer has already passed and we're in the winter. I need that short-term change. I need to share where I need it. I don't need to share all of my information. Linked to 'Why do I have to agree on that?' is 'Why do I share everything?' Why do I need to agree on everything? If I'm the only person who needs that information, why do I need to share it? Tell me the value of governing that. From a business side, there isn't any. We build this contention between IT and business.

While IT is trying to push these single views, the business is saying, 'The local view wins.' What do we know from history? What does this result in? Well, we have very high usage of Excel and of those line-of-business solutions. The corporate data warehouses or EDWs have proportionately significantly lower usage. They're used at the corporate level. They're used at the financial consolidation level. When you come to the day-to-day operational level, people cannot use them for their day-to-day operations, because they do not have all of the information available.

Underpinning this is a challenge. In IT we have a strategy for the single view, the single approach across all the views. Peter Drucker put it best. He said, 'Culture eats strategy for breakfast.' We're fighting the business culture. The business culture is local. The business culture is based on those local views and local KPIs, on enabling collaboration where it matters, not on trying to govern everything everywhere.

What do we really need in the future? What's the real change that we are trying to make here? Well, the first thing is starting at the bottom. We need to store everything. We need to get away from this world in which we're prejudging the destination, because that's what the single canonical approaches do. You create a schema and you prejudge what needs to be stored. If it's not in the schema, we don't store it.

We know that as soon as a new requirement comes up, we have to go back through the cycle. Let's change the cycle. Let's store everything. We need unstructured data in there, structured data in there, external data. We need to store it cheaply. We can't pay $30,000, $80,000 per terabyte. We need to do it cheaply. We need to move away from ETL, Extract, Transform, Load. We need to move towards a world … If I've got everything stored, I don't need to do ETL anymore, because I've extracted it. I've loaded it into the device tier, which will be Hadoop.

Then I can distill. I can create a view with just the subset of everything that I need for my local challenge. I need business fine-tuning so I can create those views. I need to be able to re-use those information maps. If I create a view full of current orders in EMEA and somebody else wants to use it, they can re-use the view that I created. It needs to be able to give rapid change, so I can use distillation: because everything is now being landed, I can keep changing my distillation without ever having to go back to the source system, because I know I have everything.
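The land-then-distill idea described here can be sketched in a few lines. This is only an illustration, not how Pivotal HD or HAWQ are actually driven: it uses Python's built-in sqlite3 module as a stand-in for the SQL tier, and the table name landed_orders, the view name current_orders_emea and the sample rows are all hypothetical.

```python
import sqlite3


def build_lake():
    """Land raw order records as they are, then distill a local view on top."""
    conn = sqlite3.connect(":memory:")
    # Landing table: every record is kept; nothing is filtered by a target schema.
    conn.execute(
        "CREATE TABLE landed_orders "
        "(order_id TEXT, region TEXT, status TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO landed_orders VALUES (?, ?, ?, ?)",
        [
            ("o1", "EMEA", "open", 120.0),
            ("o2", "EMEA", "closed", 80.0),
            ("o3", "NA", "open", 200.0),
        ],
    )
    # Distillation: a reusable view holding just the subset one team needs.
    conn.execute(
        "CREATE VIEW current_orders_emea AS "
        "SELECT order_id, amount FROM landed_orders "
        "WHERE region = 'EMEA' AND status = 'open'"
    )
    return conn


def redistill(conn):
    """Change the distillation without going back to any source system."""
    conn.execute("DROP VIEW current_orders_emea")
    conn.execute(
        "CREATE VIEW current_orders_emea AS "
        "SELECT order_id, status, amount FROM landed_orders "
        "WHERE region = 'EMEA'"
    )
```

Because the landed table is never trimmed, redistill only rewrites the view definition; a second team could query the same view, or define its own, against the same landed data.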

I therefore need to encourage local requirements. I need to stop fighting the business culture. I need to say, 'If you want to I can speak to somebody today.' 'This means I could end up with 600 solutions.' Yes, you might. The business likes them. The business wants to work in that way. We need IT to concentrate its efforts where it delivers the value, not on trying to force homogeneous notions onto the business.

That enables us to build from the bottom. When you look at how organizations record financial information, financial information is captured in a myriad of different ways. The invoice fields will be different between different systems and different regions. The textual will be different. At the corporate level there'll be a book. The book says, 'This is how you record revenue. This is how you record margin.' Then, every division complies with the book. If they don't comply with the book at the lowest level, they comply with the book at the level they need to roll up.

They're encouraging that local view of the finance. Then on top, we're saying, 'Govern wherever it matters.' That financial book, that big cap from the CFO, governs only what matters. It doesn't say these are the fields in the invoice. It doesn't say this is exactly how you apply the individual local views, because those are local. It says this is what we need to view our finances at a corporate level. What we're doing in the Business Data Lake is taking what is proven to work in business culture within finance and applying it to the rest of the organization.

It's something that at Capgemini we've done for quite a few years in the MDM and reference data space, which is not doing customer MDM the way it was historically done, with something that included all the data that you might want about a customer. We changed that. We changed to what's the minimum needed to identify the customer and provide the cross reference across all the other systems and information that exist, because by having that cross reference I can create the total view, but I can create all of the subsets as well.

That means the fourth step: hold governance only where I'm sharing. If a division has some information that is very specific to them and they don't need to share it, but they need to report on it and they need to combine it with other information in the organization, let them do that. Don't require them to standardize that local information and go through a whole standardization process for information which is solely related to the local operational challenges. By storing everything and landing it as it is, changing ETL into distillation, encouraging local requirements and then governing where it matters, we're able to change to a new approach that matches how the business wants to work, back to that importance of digitization.

This is providing that conservative, managed approach, that information fabric via which the business can move forward. How does this work with Pivotal? Let's get to the technology, because at the end of the day there's no point showing a PowerPoint slide unless you can make it work. It's the difference between where the rubber hits the sky and where the rubber hits the road.

The first thing is store everything. That's Pivotal HD: being able to provision a Hadoop environment in a few minutes, being able to scale it, and having a radically different cost base. It just … It's hard to believe some of the numbers when we do this at Capgemini into the SQL. We're taking traditional appliance or large-scale data warehouse solutions and moving them onto Hadoop. We're talking about 80%-plus cost savings. It's significant. It's low cost. It's simplifying deployment, because I've landed everything. It means I've got everything available.

I can distill on demand. I can use technologies like HAWQ. One of the things I love about HAWQ is the fact that you can use it to do distillation. Use it as the SQL on all of your transactional information. Then actually, for the next piece, press the button, as it were, and have a full data warehouse, an MPP data warehouse, created within a Hadoop environment that the business can actually use. I have this distillation process where I'm using SQL for my structured data. That's really important.

One of the pieces we firmly believe at Capgemini is … As my speakers have previously [inaudible 00:22:47] is, 'Java guys are great. Java guys are good. We can do lots of stuff,' and I speak as a Java guy. We're more expensive than the SQL guys because we can do more things. When you're talking that language of data for businesses and minimizing the impact of change on the business, SQL is the language of data. By having HAWQ you can use that language of data.

We can then use Pivotal Data Dispatch to move the data between environments and deliver that distillation. We can create business-centric views. We can encourage those local deployments. There's the ability to use HAWQ to create that local data warehouse; perhaps each division creates their own local data warehouse, but on a shared infrastructure, moving away from these data silos to a common data substrate that they all share.

One of the things that software has enabled for us in terms of the big data piece is the importance of fast, with SQLFire and GemFire and GemFire XD. One of the pieces is the ability of the business to say, 'This needs to be fast. I've got tons of data I'm ingesting. I need it to react in real-time and do reporting in real-time. I want these calculations very, very fast.' In that case, use SQLFire to do that. Use an in-memory SQL database, because that's what matches that set of business requirements. That's the real piece.

The point here really is … I'll get in to it … is enabling that real-time element and having that ability to make the choice of in-memory or disk-based, putting it in the hands of the business, not prejudging and saying, 'We're going to do everything in memory.' You know what, some things don't require that speed. Some things do; some things don't. Therefore having a solution that can enable that flexibility is critical.

Then on top, the governance. Governance is about providing the global view above the local view. Master data, reference data, using what we call the information RADAR approach; we look at the whole identity data and then use that to do the cross reference. What does this look like? Now, I will move to slide 22. We get it … Hopefully we'll get. I'm going to load everything. The second thing I'd like to do with everything is keep the history. This is an important point. Predictive analytics is very much the future of where the value comes from. We're seeing Amazon patent shipping it before I've ordered it. That's all about predictive analytics.

Predictive analytics is about having the history. One of the things that we've already talked about in the Data Lake is that by landing everything you're automatically keeping the history. In 18 months, two years or at whatever point you are ready to do predictive analytics, you've built up the history of that information. You don't have to wait for another two years to construct the history, to keep the history. You then take the business scenarios … We'll just talk about three: North America operations, a marketing campaign and something for EMEA. You can distill all of our information to those local views.
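The point about keeping history by landing everything can be illustrated with an append-only record list: nothing is overwritten, so any earlier state can still be recovered later. This is a minimal sketch under stated assumptions; the function names land and state_at, the record layout and the use of ISO date strings are all hypothetical, and a real lake would hold these records in Hadoop rather than in a Python list.

```python
def land(history, key, value, as_of):
    """Append a new version of a record instead of overwriting the old one."""
    history.append({"key": key, "value": value, "as_of": as_of})


def state_at(history, key, as_of):
    """Return the latest value landed for key on or before as_of, else None.

    Dates are ISO strings (YYYY-MM-DD), which sort correctly as plain text.
    """
    versions = [r for r in history if r["key"] == key and r["as_of"] <= as_of]
    if not versions:
        return None
    return max(versions, key=lambda r: r["as_of"])["value"]
```

Because every version is retained, a model built two years from now can still ask what a customer's segment was on any past date, rather than seeing only the current value.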

Those local views are business-driven. North America has this. The EMEA has this. Marketing has something else. They all got their own views to match what they want. In IT, in our technology side, we're managing the bytes, the substrate. We're supporting the distillation. We have one information cloud to switch buzzwords in which they're creating their own local views, their local distillations on that global view of information.

It's automated. Now, if you ever want or need to share information, then that's where MDM and governance really come in. MDM and governance say, 'I need a single view of customer.' I can do that by having my customer MDM pulled in. I can use that to identify across my local views, 'Oh, here are all my customers.' I can pull in other information and say at the top level what your specific orders and invoices are.

In EMEA we have 28 fields in an invoice. In the U.S., we have 17; of those, 12 fields overlap. I don't care. At the top level I only care about five fields. I care about what we sold to the customer. I care about what revenue we got from it. That's all I'm worrying about for this particular goal. I've created a global view of customer and revenue, which could include product if I wanted, by using MDM to aggregate the local views. When you think about how our business actually works and operates, it doesn't work because of a single global entity; it's those local pieces collaborating that really drives the value.
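A small sketch of that idea: each region keeps its own invoice layout, and a minimal customer cross-reference is the only thing that has to be agreed in order to roll revenue up globally. Everything here is hypothetical and for illustration only, including the cross-reference table, the field names (kunde, netto, cust, total) and the sample figures; it is not a representation of any particular MDM product.

```python
from collections import defaultdict

# Hypothetical MDM cross-reference: (region, local customer id) -> global id.
CUSTOMER_XREF = {
    ("EMEA", "E-100"): "CUST-1",
    ("US", "U-7"): "CUST-1",
    ("US", "U-9"): "CUST-2",
}

# Local invoices keep their own field names and their own extra fields.
EMEA_INVOICES = [{"kunde": "E-100", "netto": 500.0, "vat_code": "DE19"}]
US_INVOICES = [
    {"cust": "U-7", "total": 300.0},
    {"cust": "U-9", "total": 50.0, "state_tax": 4.0},
]


def global_revenue(xref, emea_invoices, us_invoices):
    """Aggregate revenue per global customer, reading only the governed fields."""
    totals = defaultdict(float)
    for inv in emea_invoices:
        totals[xref[("EMEA", inv["kunde"])]] += inv["netto"]
    for inv in us_invoices:
        totals[xref[("US", inv["cust"])]] += inv["total"]
    return dict(totals)
```

The local-only fields (vat_code, state_tax) never need a shared definition; only the customer identity and the revenue figure are governed at the top level.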

Information governance: MDM and RDM pulled into the Business Data Lake to help create global views on local information. Why does that succeed? Because we're focusing governance in two places. We're focusing on MDM and RDM. I'm saying only agree where you collaborate. Only agree when more customer entities. I worked with a customer a few years ago where in South America they sell to consumers. In Europe they sell via mega distributors. There's no point having a global view of customer, for two reasons. 1) Those mega distributors aren't in South America. 2) Because a global view means nothing.

There's no value in having a single view of customers when the consumers are in South America and the distributors are in Europe. So I'm saying, 'We'll have a European view which focuses on distributors and a South American view which focuses on consumers.' What we did care about was only sales and revenue related to the product. We focused governance there, on product and invoice and revenue.

That's about focusing on where the business delivers its value. It's getting the business to say, 'This is where we want to collaborate.' Just governing that, then within the local views saying, 'This is how you distill for your local piece,' and concentrating the governance just within that smaller set. If anything is done not reference to the program … The more attributes you have to govern, the harder it gets on the … The biggest question you often get is 'Why do I have to agree on that?' By removing that question we really move away from this forced homogeneous approach toward one that leverages the business culture to make agile IT easier.

That's really what we're trying to say here. By using Hadoop as that substrate, by using distillation, you make your job in IT easier because you're not trying to change the business. Architecturally, what we've got underpinning that is, first, the ingest perspective. When we talk about sources, today we're talking about sensor data, sensor data streams in real-time. One of the great phrases I use … You don't get big data without fast data. Whether it's clickstream or sensor data, whatever it is, it's created continually and needs to be monitored continually.

I need to be able to ingest in real-time. It's not good enough just to drop it three times a day. I might need to react in real-time. That's where we look at using GemFire and GemFire XD to ingest information in real-time, for where we have to react in real-time. I mean dropping it to identify it. Micro-batch and traditional batch ingestion can still be done. On the insights side, jumping across, there therefore needs to be the ability to do those real-time insights. That's often pushing information back into the source system.

If the analytics identifies that a stock order needs to be made, I'm better off pushing that to SAP to make the order happen than having a report that somebody may or may not look at. It's really understanding, with the real-time insight, how do I make it actionable? How do I put it in front of somebody to make them commit an action, or have a system automatically perform it? It is not good enough now to have something that just copes with big; you've got to be able to do it fast. Interactive and batch obviously still count.
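The contrast between an actionable insight and a report can be sketched as a rule that calls back into the source system as events arrive. This is a toy illustration: the event layout, the reorder_point threshold and the place_order callback are hypothetical, and place_order merely stands in for whatever interface the source system (SAP in the example above) exposes; no real SAP or GemFire API is shown.

```python
def react(events, reorder_point, place_order):
    """Push a reorder to the source system as soon as stock runs low.

    events is an iterable of dicts like {"item": "marker", "stock": 8}.
    place_order is called once per item, the first time its stock is at or
    below reorder_point. Returns the items ordered, in order of detection.
    """
    placed = []
    for event in events:
        if event["stock"] <= reorder_point and event["item"] not in placed:
            place_order(event["item"])  # act immediately, not via a report
            placed.append(event["item"])
    return placed
```

For example, passing a list's append method as place_order collects the orders that would have been pushed, which makes the rule easy to check before wiring it to a real system.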

Then in the middle, what have we got? Well, at the bottom we've got Pivotal HD. We've got a data substrate that everything lands in. When data is finished from the transactional perspective, it ends up in Pivotal HD. We then have that distillation tier. We push the information through HAWQ into an MPP database or into SQLFire, for something that needs really lightning-fast in-memory analytics and those sorts of real-time insights pieces.

We use SQL as the processing tier. Sure, we can use MapReduce and NoSQL types of tiers. The major way to support the business with Hadoop is not to try and train every business user to use MapReduce. It's to enable them to use the SQL tools. Think about it: today we have sets of reports. You can recreate exactly that feeling. How do they consolidate on the infrastructure and make change much more rapidly? Take all of those fragmented data warehouses, consolidate them onto a single data substrate and recreate them with HAWQ, pulling in information from master data, reference data, data management.

On the top, making sure there is unified systems management in the operation. You share common commodity hardware or cloud-based infrastructure for all these technologies. It means I can scale more dynamically. I can share space between my GemFire, SQLFire, Hadoop and HAWQ. I can really look at managing a data infrastructure for my business and provisioning those local views as needed.

There's the summary of what we've been talking about, and we'll get on to the Q&A pieces in a bit. I'm just going to stress something on this one. This is all about insight at the point of action. Our history in IT has been a marginal view, driven by having ETL. What does ETL do? It just assumes the destination. It just assumes what you're loading into, and that a small data architecture team may have sat around and defined that schema. It assumes that is the final piece. We all know that change happens. We all know that new requirements come, right? The point of action is a local thing.

The point of action is where somebody makes a decision and makes it better. The key philosophy change here is moving away from a world in which we say, 'If you in the business could just conform … ' and the phrase conformed schema is often used, 'just conform to the single view. If you could just use this one enterprise data warehouse with this one schema, then it would be much easier for us in IT.' That hasn't succeeded. It won't succeed.

It has failed today and will fail to an even greater extent in future, because what we need to be doing in IT is recognizing where the point of action is, where better decisions may be made, and then working out how we deliver better insight and better analytics to that point of action. That comes down to doing four things. The first one is landing everything in Hadoop, storing everything. By everything we mean not just what the world looks like today, but storing the history of it, how it has changed every time. We need to move away from the single canonical 'we can define a schema that includes all the Facebook data and all the sensor data, everything ever, in a way that makes sense to everybody' approach.
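To make "store everything, including the history of how it changed" concrete, here is a minimal sketch of an append-only landing zone. The `LandingZone` class and its field names are hypothetical, and an in-memory Python list stands in for the Hadoop substrate; nothing here is a Pivotal HD API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LandingZone:
    """Hypothetical append-only store: records are never updated in place."""
    records: List[dict] = field(default_factory=list)

    def land(self, source: str, key: str, payload: dict, seen_at: int) -> None:
        # Keep the payload exactly as received; no target schema is imposed here.
        self.records.append(
            {"source": source, "key": key, "payload": dict(payload), "seen_at": seen_at}
        )

    def history(self, source: str, key: str) -> List[dict]:
        """Every version ever landed for this key, oldest first."""
        versions = [r for r in self.records if r["source"] == source and r["key"] == key]
        return sorted(versions, key=lambda r: r["seen_at"])

    def latest(self, source: str, key: str) -> Optional[dict]:
        """What the world looks like today for this key, if anything was landed."""
        versions = self.history(source, key)
        return versions[-1] if versions else None


if __name__ == "__main__":
    lake = LandingZone()
    lake.land("crm", "cust-1", {"city": "Leeds"}, seen_at=1)
    lake.land("crm", "cust-1", {"city": "York"}, seen_at=2)
    print(len(lake.history("crm", "cust-1")), lake.latest("crm", "cust-1")["payload"])
```

Because nothing is overwritten, a view of "today" and a view of "how it changed" can both be distilled later from the same landed data.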

Two, I'm going to encourage local. I've landed the data. Now, I'm going to encourage local requirements to be delivered. I'm going to use HAWQ to create a local view, a SQL view that makes sense to the local business, in the business language, in SQL, the language of data that they understand. I'm going to move away from the governance perspective. I'm going to govern only the common. I'm not going to try and create a schema that, if I printed it out in 12-point font, would cover several walls. I'm only going to focus governance on where the business says it delivers value. I'm going to use MDM approaches and RDM approaches to deliver that. I'm not going to try and govern everything.
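A local view of the kind described here can be sketched in plain SQL. The example below is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in for HAWQ, and the table name `landed_orders`, the view name `uk_customer_revenue` and the sample rows are all invented for the purpose.

```python
import sqlite3


def build_lake() -> sqlite3.Connection:
    """Land raw rows once; every division's data sits in the same table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE landed_orders (division TEXT, customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO landed_orders VALUES (?, ?, ?)",
        [("uk", "acme", 100.0), ("uk", "acme", 50.0),
         ("uk", "globex", 70.0), ("fr", "acme", 30.0)],
    )
    # The local view: only what one business unit asked to see, in its own terms.
    conn.execute(
        """CREATE VIEW uk_customer_revenue AS
           SELECT customer, SUM(amount) AS revenue
           FROM landed_orders
           WHERE division = 'uk'
           GROUP BY customer"""
    )
    return conn


def local_view(conn: sqlite3.Connection) -> list:
    """Query the distilled view the way a business user's SQL tool would."""
    return conn.execute(
        "SELECT customer, revenue FROM uk_customer_revenue ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    print(local_view(build_lake()))
```

The landed table is untouched by the view, so another group can define a different view over the same rows without a new load.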

I'm going to treat global as a local view. What I mean by that is saying to the CFO, 'What is it that you want to see?' 'I want to see customer and revenue.' 'Fine, we'll create that view.' I was speaking to a CFO last year and she said a fantastic thing, which was that they have a data warehouse of financial information. She found what she thought was an error. She phoned up the financial director in charge of that division and said, 'Well, there is a big error in these invoices.' The person said, 'Well, actually there isn't an error. You just don't understand how our local purchases work.'

Her point was that she couldn't see the KPI that she wanted to see; she could only see the weeds. Can we please create the local view that the CFO wants to see and not show all the weeds? If there is an issue on a KPI, she phones the financial director and it will be that person's responsibility to get her the information and fix it. It's really about trying to build a new data infrastructure that represents how the business works and how to deliver that insight at the point of action.

I'm going to skip back two slides just before we finish, on to this slide here. The piece to take away really on the Business Data Lake is these pieces on loading everything. The importance of Hadoop within this is that if you build separate clusters for separate solutions, then your ability to share is reduced. Having a single data substrate from which the business can distill any information they want is what will help drive collaboration and governance. Our experience from doing this for many years is that it works when you show people that they want information from another division that would make their job better.

That's the point at which there is justification to put governance in place. There's energy behind making that governance work. By not prejudging, by not doing ETL, by landing the data and then distilling the data, in IT we provide ourselves with the ability to make that conversation happen. We do it by moving away from a single canonical form, something that prejudges the destination, towards a world in which we're more rapidly able to adapt to what the business wants and enabled to deliver digitization and deliver that digital transformation.

With the Pivotal technology stack we now have a unified solution that can work alongside your existing data environment to improve it, or to create an entirely new platform via which you can achieve all of those goals. The reason why we work with Pivotal is that, for us, Hadoop is part of the story. It's the landing place. It's no good if you don't have a fast data story. It's no good if you don't have an integration story. It's no good if you don't have a great distillation story. It's really no good if you don't have a fantastic SQL story. Building the Business Data Lake was about creating an architecture which answers all of those questions, and that's really where the Pivotal technology stack came in. That's me finished. With that we just move to the questions and answers, Nikesh.

Nikesh Shah: Steve, thank you for that overview on the digital transformation and the Business Data Lake. We'll move over, as he mentioned, to the Q&A portion of today's session. Please remember you can submit your questions in the Q&A chat box to all panelists.

Steve Jones: Nikesh, do you want me to share it back with you again?

Nikesh Shah: Yeah, maybe you could. That would be great. Thanks for sharing. Okay, it seems that one question has come in and it … We talked about the Business Data Lake. How does it really tie into the existing data infrastructure that many organizations have, for example, like Teradata, in which organizations have already gone and invested millions of dollars? How can it complement the data architecture that exists today?

Steve Jones: That's a great question. I think that this is one of the areas of focus between the Pivotal management team and Capgemini. We really looked at it and there's no point having a rip-and-replace strategy of … Wouldn't it be great if nothing existed and we built it from scratch? There are really two things that we look at … Really leveraging those pieces and a lot of those systems, the Teradatas, the Exadatas … There is a lot of value in that corporate enterprise data warehouse. There are two key ways the Business Data Lake helps. There's actually a white paper we've done on the Business Data Lake that addresses that.

The first one is that there's a lot of data within a data warehouse which really isn't part of that value add at the end. We do the ETL process and then we do a series of semantic transformations within the data warehouse. By using the Business Data Lake we can do two things. First of all, change the ETL cycle. We can load the data into Hadoop and do the distillation straight away, loading everything rather than just prejudging what needs to go into the schema, which makes change much easier in the sort of Teradata or Exadata environments.

The second thing we can do is pull out some of those transformational layers and put them into HAWQ, because at the end of the day those are SQL-to-SQL transformational layers. That means that you can use the Teradatas and the Exadatas for what they're really, really good at: that fast reporting on predefined schemas. You move most of the transformation of your data into that format into HAWQ, which means you spend less money because you don't have to upgrade so often. You can upgrade a lot less often in Teradata. You can just focus on Teradata or Exadata having the data that is really needed for that reporting and not having all those transformational layers.

It really does two things for IT. One, changing the ETL cycle by moving to distillation and landing in Hadoop instead of doing ETL. Then the second one is moving out those layers, which are there for good reasons, to do schematic transformation from the loading into those destination schemas, and putting them into HAWQ to give me more spare capacity in the warehouse environment. I'm not upgrading as often, and I've focused the corporate warehouse on just those corporate requirements.

Nikesh Shah: Okay. I think there was a question from early on, from the very beginning, Steve, I think from one of your earlier slides: what are the cut-offs to distinguish the groups?

Steve Jones: What are the … Sorry?

Nikesh Shah: What are the cut-offs to distinguish the groups? I think it was early on in the presentation. I'm not even really sure what …

Steve Jones: I'm going to make an assumption about what that means, unless the questioner clarifies it … The groups … The groups within the business? This was exactly what I was doing earlier today. The answer is exactly as many as the business wants. If for instance you've got a marketing firm, and it wants to create a specific campaign for the World Cup this year, they want to create a data warehouse just to track that marketing campaign. IT would typically be very, very bad at supporting that, if it was willing to support it at all.

What we're really saying now is, 'You know what, you can do it.' Provision it, distill it from the information, because we've got all the information we need. Distill your own view. So it's really trying to go after … It's really looking at not just the traditional data mart and data warehouse spaces; it's also how do I support things where somebody would otherwise export the data and put it in a client solution or, most likely, put it into a local Access database or a local Excel spreadsheet. How could I put it in a more managed and reliable environment? That's what the group question really comes down to.

The groups are about what your business wants to report on, and your job within IT changes away from forcing a solution towards helping them identify where they have commonality. If, in fact, two groups want almost the same thing, it's about getting them to work together and saying, 'It will be cheaper for you if you two create one view that spans both of your pieces, and you'll have additional business benefits.' It changes that role in IT from forcing a single view to really helping the business to collaborate.

Nikesh Shah: Yeah. I think another question just came in. The gentleman is asking: are we heading in the direction that MDM and RDM applications will be hosted on top of HDFS as the data store?

Steve Jones: Well, I think in terms of the history, yes. Whether they are hosted in HDFS … probably maybe not so much, although the data flows through that. I think they will be hosted in in-memory solutions, on top of GemFire … I think that's probably more likely because, as you look at the speed of the business, you need real-time ingestion, real-time allocation and real-time enrichment of data. We're already seeing in the MDM space that the performance required of an MDM, as you look at websites and online transactions and sensor data, is getting faster and faster and faster. Will MDM and RDM become an integral part of your information environment? Absolutely.

Will the history of how it's changed be stored in HDFS? Absolutely. I think that's the piece. The real-time active tier, where does that go? Probably in memory. The history of how it's changed over time? Certainly in HDFS.

Nikesh Shah: Great. We've got another question. What about the security constraints that financial institutions have to add on top of HDFS, et cetera? How does the Business Data Lake handle the security concerns?

Steve Jones: I think one of the pieces here around our job in IT is that today a lot of the financial services security concern is that we have to do it multiple times: for every little local solution and every Excel spreadsheet that exists. The Excel spreadsheet has a sort of security, but shouldn't. We have to apply it to every single data warehouse as they demand. What we're really looking at with the Business Data Lake, and it really is a focus of working together and working with our side as well, is putting in place common security, so that distillation can apply common security models horizontally.

Instead of having to say, 'Every time there are any changes to an extract, I have to remake my security,' if I move to distillation, I put security into my distillation; I move to the Business Data Lake and put security between HDFS and my distillation tier. Suddenly I'm able to do that in a much more standardized way. Actually, in the financial services area, we really see it as a big benefit of moving towards a lake-type architecture: enabling a single implementation of security for access and auditability, rather than every single person putting in place a separate piece, some inside the security level, some outside.
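The idea of one security model applied at the distillation step, rather than once per extract, can be sketched as follows. This is an assumption-laden illustration: the `POLICY` table, the role names and the `distill` function are hypothetical and do not describe any actual Pivotal or Hadoop security feature.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical common policy: which columns each role may distill from the lake.
POLICY: Dict[str, Set[str]] = {
    "analyst": {"customer", "amount"},
    "auditor": {"customer", "amount", "account_number"},
}


def distill(rows: Iterable[dict], columns: List[str], role: str) -> List[dict]:
    """Build a local view, enforcing the shared policy in exactly one place."""
    allowed = POLICY.get(role, set())
    denied = sorted(set(columns) - allowed)
    if denied:
        # Every view goes through this check, so no view needs its own rules.
        raise PermissionError(f"role {role!r} may not read columns {denied}")
    return [{c: row[c] for c in columns} for row in rows]


if __name__ == "__main__":
    landed = [{"customer": "acme", "amount": 10.0, "account_number": "X1"}]
    print(distill(landed, ["customer", "amount"], "analyst"))
```

Since every local view is produced through the same function, changing the policy changes it for all views at once, which is the standardization being described.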

Nikesh Shah: Okay, great. Another question we have is how does the Business Data Lake help with the creation of big data apps?

Steve Jones: Well, I think one of the pieces is you can't have big data apps. That's a great statement. One of the real pieces is, when you think about big data apps, that's about the local view. The agile piece, where Pivotal has talked about having that agile cycle right out, is about providing access to the information. I need to be able to create an application to process that information. I need to integrate that information at the point of action, because fundamentally that's what big data apps are all about: using that insight at the point of action.

That's why we see this Business Data Lake architecture as this unification of real-time, active elements with traditional BI and analytics. In IT, we spent 30 years creating a hard wall between the real-time guys, the operational guys, and the BI and analytics guys. When we look at the Business Data Lake, what we're saying is that bridge has gone. There's no longer a wall between the areas. There's no longer a bridge you have to pass. It's a unified environment, which enables both the real-time and the big data applications, because what we need is access to the data, and we've got to deliver it in a local perspective. What you couldn't do is big data apps based on a single canonical form. That's really … It's absolutely a driver of the Business Data Lake.

Nikesh Shah: Okay. Well, I don't see any other questions coming in, so I do want to let you folks know that if you want to find out more information on the Business Data Lake co-innovation between Capgemini and Pivotal, you can check out either website for data sheets, white papers and technical perspectives, as Steve mentioned. You can also go on YouTube for some quick videos explaining the opportunity and the technology behind the Business Data Lake.

What's coming up next is that Pivotal is a lead sponsor at Strata, which is being held on February 11th to the 13th in Santa Clara, California. I just want to make sure that those who are attending know you can check out this panel discussion, along with the New York Stock Exchange, Kaiser Permanente and Wikibon, on Business Data Lakes on Thursday February 13th at 10:40 am. Please look out for next month's What You Can Do With Hadoop webinar, which will be held on Thursday February 20th. The topic will be the Pivotal Extension Framework, which is basically a fast, extensible framework for connecting MPP and in-memory real-time SQL processing engines to Hadoop … to the Hadoop ecosystem data stores like Hive, HBase and other known Hadoop data stores as well.

On behalf of my colleagues here at Pivotal and Capgemini, I'd like to thank you for your participation. Steve, I want to thank you personally for your time and for really taking a deeper dive on the Business Data Lake and what we're seeing in organizations. That concludes our session for today. Thank you for joining, everyone.

Steve Jones: Thank you.

Nikesh Shah: Bye now.

Steve Jones: Bye-bye.