Enabling Personalized Healthcare with Biomedical Informatics
Aridhia is a world-leading health informatics company developing technology that supports innovation, collaboration and the management of chronic diseases, stratified medicine and biomedical research through biomedical informatics and analytics. Using Hadoop and other big data and application technology from Pivotal for gene sequencing and mapping to uncover correlations between genotype and clinical data, Aridhia provides an unparalleled analytics platform aimed at revolutionizing the treatment and management of chronic diseases.
Learn how Aridhia is using Pivotal technology to maximize the potential of existing data within the healthcare system, enable healthcare providers to inform clinical outcomes from insights gained from complex datasets and provide a data science safe haven that promotes collaboration between healthcare, academia and industry.
Michael: Good morning. Good afternoon. Good evening. Welcome to our event today where we're going to be discussing collaborative data science and next generation health care analytics with our guest Aridhia. Thanks for joining and we're going to get rolling right now.
Today, we're joined by a very special guest. My name is Michael Cucchi, I'm the Senior Director of Product Marketing at Pivotal. I manage product marketing across all the big data, data science and analytic solutions that we provide, and we're really lucky to have joining me today Chris Roche, who is the Chief Commercial Officer at Aridhia. He's been with Aridhia for over a year now, working to reinvent their infrastructure so that they can power the next generation of health care analytics.
I'm not going to steal his thunder, but Aridhia is laser focused on solving chronic diseases, improving clinical trial accuracy and enabling personalized health care by putting data and powerful analytic tools directly in the hands of the data scientists and practitioners who need them today. They've been working with Pivotal to build out this new infrastructure, with the goal of maximizing the availability of data within the health care system to the users of the intelligence system they've built. They want clinical outcomes to be immediately available to their user base, and, as I mentioned earlier, they want users to be able to interact with data locally where appropriate while still building out a centralized management solution; at Pivotal we call that a business data lake. Without delay, I wanted to transition into Aridhia's use of our technologies, but first I wanted to paint a picture of the journey Aridhia is on, the journey that Pivotal is helping all enterprises embark on.
Chris is, again, Chief Commercial Officer at Aridhia. He's been there coming up on a year now, and before that he worked at EMC, where he played a key role growing the business from 5,000 to 60,000 people worldwide, just tremendous growth in big data management and storage management infrastructure. At Aridhia he leads all sales, all marketing and all commercial development, so it's a great perspective, and I'm really happy he's joining us. Let me quickly dive in with some background, and then he can give us all the details on how Aridhia is leveraging data analytics and building new applications. Segueing into that, what Pivotal is built to do is turn all enterprises into excellent software development companies.
Today we're changing industries. We're creating new industries. We're reinventing technologies and in the case of Aridhia we are literally pioneering new medical practices by using the principles you see on this slide. What we believe at Pivotal is that the world is going to be innovating and differentiating through applications and we think that what helps applications and businesses change is all powered by data driven decisions. It all starts with being able to capture large amounts of data, apply really advanced predictive and prescriptive analytics on that data and then generate innovation out of that insight whether that's an application or a business practice. Again, we want to enable enterprises to not only do this but do it extremely quickly so that they can iterate and advance their environments, their businesses and their applications very very aggressively and use them to compete.
We have a number of technologies, software, services and human beings that we pull together to do that. I concentrate on the data and analytics side of this circle, which is handled through a number of data management technologies that I'll touch on very briefly at the end of the webinar. We have a number of advanced analytics tools that can dig into the data stored inside these business data lakes, and we also have a number of advanced data scientists; you'll hear from Chris how Aridhia has built out its own data science expertise.
We have a group of data scientists that can come in and help kick start your enterprise, train data scientists at your site, or build actual machine learning or predictive algorithms and leave them with your company to continue to advance. Finally we have a set of agile developers, almost 500 agile developers, that can actually help you build next generation applications, web applications, mobile applications and enterprise applications from that foundation of data and analytics.
Lastly, we have a platform as a service which is called Pivotal Cloud Foundry based on the Open Source Cloud Foundry Initiative which allows those applications to be rapidly developed and iterated on inside of a platform as a service. So what you see in the circle is basically extremely powerful data management solutions, extremely intelligent analytics solutions, extremely agile development methodologies and all of that being hosted on a platform that can let you iterate very very quickly.
My last little piece of advice: we work with a lot of different companies, we show them that circle, and they say, that's great, I want it, how do I get there? The truth is that every person on this call today is working for an enterprise at a different stage of maturity towards the kind of data driven environment you're about to hear that Aridhia has achieved. It really is built from these major building blocks: collecting all the data you need, including public data, and mixing that with enterprise data; in Aridhia's case, aggregating data sets from many different studies or independent enterprises and pulling them together so that you have coherence across many data sets at once; and then being able to generate insights from that data.
It's not enough to just store it you've got to actually have the power to dig into it and get answers and then you've got to do something with it. We see a lot of enterprises have trouble with this. Many people are playing with Hadoop and starting to store a lot of data. Much much less of the market is actually doing advanced data science and then a small slice of the market is actually doing something with the data science. Changing the way that they're doing science, building new research tools based on analytics and still even less are able to do this extremely rapidly.
Finally, we think it takes a set of technologies to do this: big data, predictive and prescriptive analytics, agile development and eventually a platform as a service so you can do it very quickly. I talked about human beings, but whether you're providing this expertise through your own enterprise, outsourcing it, or coming to Pivotal for it, you really need to understand how to build the technical environment that's going to host your data. You need data scientists who understand how to build intelligence out of that data, you need next generation developers, and you eventually need to move your development and application approach to a platform as a service. That's what these pieces at the top of this slide are.
Without any further ado, and I apologize for being long-winded, let's get to Mr. Chris Roche. I'm going to stop sharing, Chris, and we'll pass Presenter to you so you can share your part of the presentation. Thank you so much again for joining us. It's invaluable to have you talk about how our technologies are being used in your business.
Chris Roche: Thanks, thanks very much, Mike. Thanks for the opportunity. Good morning, good afternoon and good evening to the listeners, and thanks for your time in coming to hear our story and our journey with Pivotal. As Mike introduced, this is around collaborative data science. The other element I would add, which is extremely important to us and is there on our title slide, is the data safe haven. It's really important in our industry that the data is safe and that information governance processes are adhered to, and I'll talk about how we've used the technology to put that together.
I looked at the attendee list and it's quite a broad audience, so I'll give you a brief introduction to Aridhia and the context of our marketplace, then spend a little time on the business model change we've made and the enabling technology change Pivotal has helped us with, and then finish off with what I hope are three or four really good use cases that will bring the point home. As Mike said, we're a clinically led, technology driven company. Our founders are Dr. David Sibbald and Professor Andrew Morris, who is the Dean of Medicine at the University of Edinburgh and the Chief Scientist for Scotland, and who leads Edinburgh's data science capability, certainly one of the largest in Europe.
We're a multidisciplinary team of data scientists, clinicians, computer scientists and software developers, with a broad clinical faculty that we bring in, and as Mike said we focus on clinical and health informatics. We've been building systems over a number of years focused on chronic disease, bringing phenotype and genotype data together, and on this idea of using applications to translate from research into clinical practice. More recently we've become very heavily involved in the genomics and stratified medicine part of the world.
Just for some context, I think everybody is aware that the cost of health care and the economics just don't work for a number of countries. If you look at the left side of this model, we're in a one-size-fits-all illness model: you get ill before you get treated, and one type of drug goes to many people. The dream of the personalized approach recognizes that patients don't respond in the same way to the same drug. How can we design and build drugs that are individually personalized to you, and move to a more predictive wellness model, a model where hopefully we can keep people well for longer, as opposed to the current model where people become ill and then stay chronically ill for a long period of time?
The great thing for our industry, and for Aridhia, is that data is at the center of this drive to change healthcare. On the left-hand side of this diagram there are the great advances in mapping the genome, digitalizing the person; there are also sensors, the fast data coming off people's bodies, whether they're wearing the sensors for [inaudible 00:11:05] or it's in a smart jacket. This great breadth of data coming at us is really helping to drive our industry and make medicine an information business.
Specifically, one way of looking at this is that medicine is moving into a long tail market, and anybody familiar with Chris Anderson's work on the long tail, and how it ultimately changed retail and banking, will understand this. It isn't about giving the same drug to everyone, or the same drug for many diseases; it's about designing an individual drug for a specific person, and that puts medicine right in the ultimate long tail.
Everybody is a member of their health system, and everybody visits it once or twice, so everybody is moving into the ultimate long tail market. As you digitize human beings through the genome, this brings some interesting challenges when you want to deal with the data. There's the industrialization of this: doing it at scale is one of the challenges we're looking at.
There's a multi-channel aspect to this. We've got different data coming in from different sources, whether it's the genome, the clinical record, observed data or sensor data. For us, this long tail model really comes down to risk stratification. How can we build risk models, like the risk-of-readmission models used in the UK or in Scotland, or use genomic data to build genomic risk-based models? How do we start predicting and stratifying people so that we can tailor treatment to them? But that's not the only challenge we face in health care and research.
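To make the risk stratification idea concrete, here is a minimal sketch of what a readmission-style risk model might look like. The features, coefficients and thresholds below are invented purely for illustration; a real model would be fitted to historical clinical data and validated, and these are not Aridhia's actual models.

```python
import math

# Hypothetical coefficients for an illustrative logistic readmission-risk
# model. A real model would be fitted to historical admissions data.
COEFFS = {"age": 0.03, "prior_admissions": 0.45, "num_conditions": 0.30}
INTERCEPT = -4.0

def readmission_risk(patient):
    """Return a probability-like risk score via logistic regression."""
    z = INTERCEPT + sum(COEFFS[k] * patient[k] for k in COEFFS)
    return 1.0 / (1.0 + math.exp(-z))

def stratify(risk, low=0.2, high=0.5):
    """Bucket patients into strata so care can be tailored to each group."""
    if risk >= high:
        return "high"
    return "medium" if risk >= low else "low"

patient = {"age": 72, "prior_admissions": 3, "num_conditions": 4}
print(stratify(readmission_risk(patient)))  # this patient lands in "high"
```

The point of stratification is the second step: once patients are bucketed, each stratum can receive a different intervention, which is exactly the "tailor treatment to them" idea described above.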
The other one is that we're just not moving fast enough. On average it takes about 17 years for a piece of research to move from the research world into clinical practice. A number of issues sit under that, and one is industrialization: researchers aren't paid to industrialize their [inaudible 00:12:57], to prove that the piece of research works. One area around this is reproducible research: how can I put my research up, knowing it's been built on a certain code base against a certain data type, and submit it for peer review? All these challenges are coming together.
What we're also noticing is that our industry's response to this is collaboration: how can we do things better together? No matter where you go in the world now, no matter what research institute or health care institute you visit, look at their webpages and they'll all talk about needing to collaborate more.
One thing that's changing is the way research is funded. There's a big funding model here in the EU called Horizon 2020, and unless you're three institutes working together in collaboration, with an idea of how you're going to operationalize your research into clinical practice, you won't get funded by the big funding vehicles now. These are some of the challenges that drove our business: how do we build collaboration to accelerate the translation of research into health outcomes? There's also a new agenda around the cost and funding model of health care, especially here in Europe, where health care systems are being asked not just to generate the health outcome but to treat their money as an investment rather than a cost center and start generating the wealth outcome as well.
We looked at these challenges, and we looked at the people who have dealt with big data challenges over recent years, the internet giants, the Amazons and the Googles, and how they applied consumerization techniques. We said, what if we apply consumerization to health care and the research domain, but specifically design in collaboration and privacy, because information governance is so important in our industry? If we did that, and we adopted these cloud, mobile and big data technologies, we could build a consumer-grade biomedical informatics platform and safe haven that the next generation could use to accelerate translation and health and wealth creation.
We took all our experience and worked with Pivotal on how to do this, and the first client we did it for was quite a groundbreaking one: the Stratified Medicine Scotland Innovation Centre. This was a groundbreaking approach to building a technology platform with a safe haven to support clinical trials and genomic data creation. We brought together, in an actual joint venture, the four main research universities of Scotland, all the hospital trust boards, a gene sequencing company and ourselves, to look at how we bring the genomic data and the observable data together to design the right drug for the right patient, and how we can help the pharmaceutical industry look at cohorts of genetically sequenced people so they can design better drugs. It's quite an innovation, and we think we were first to market on this. We've been going for just over a year now, so this is up and running.
The interesting thing is that when we came to build this, we could see there was going to be a major technology challenge. At the time we were building on our previous systems, on Microsoft technology, and we had already concluded that that technology wasn't scalable enough for genomics. We looked across the market and did a lot of research, and we wanted to work with a company who really embraced these ideas and the technology around consumerization. We didn't want someone who was going to try to give us an extension and say, if you just extend our technology a little bit it will work. We really wanted someone looking at something new, and at how the [inaudible 00:16:48] and the people there had embraced that.
At the time there were lots of people in the industry saying it's all about one data type: it's all about Hadoop, or it's all about SQL, or it's all about this. We knew we were going to face the challenge of mixed data types. When we were building out our Stratified Medicine Scotland Innovation Centre, what really attracted us to Pivotal was their approach. First, their approach to an integrative data fabric: they were probably the only people in the industry who weren't taking a religious one-side-or-the-other position. They were really clear that it was a platform, that your Hadoop, your SQL and subsequently your in-memory data all need to work together, and that they would build a platform to give you services. And it wasn't just their platform approach to data.
It was also their approach to licensing, which was very attractive: a very flexible approach that would follow my data model. What really clinched it relates to one of our key design principles, collaboration as a design principle: they also had a great piece of open source collaboration software called Chorus that allowed us to build some of our capability on top of it.
The other great thing is that they were scalable: the partnership, the history of EMC and the history of being a platform business. I didn't have any fear that they were going to step into one of these vertical stacks, become a vertical company and step onto my turf. They were there to build out a platform that my platform could sit on, and they would build services for me at the data layer and the licensing layer.
There were so many differentiators that we literally bet our business on working with them. We're an SME, a small-to-medium enterprise, and taking that approach was a big decision for us. What we're even happier with, having made the decision, is the business engagement. We're very collaborative, and we get great engagement with the exec team and with the local staff. The collaboration came through not just in their technology or their licensing but in their whole approach, and that was so important to us.
Once we had the success of building this unique innovation centre, we said, let's look at how we can completely change our business model. Earlier this year, very much mirroring Mike's view of capturing data, translating it into insight and iterating around that cycle, we decided to take our technology and offer it not just at genomic scale but at any scale. Lots of research projects and lots of health care don't have huge data sets; they have less than 100 GB of data. We said, why don't we serve our safe havens in a cloud-based environment that allows all data types to come together, with the right information governance, so that you invite your people and your data in. On top of that we started building applications, and this is where our choice of Pivotal really paid off with Cloud Foundry.
We were able to reduce our application development time significantly through Cloud Foundry, and we knew we needed to move away from trying to build one data model that ruled the whole of health care: instead, take this health care data lake approach and rapidly build the applications we need on top. When you're dealing, as we are, with rare disease, cystic fibrosis and multiple sclerosis projects, you don't know at the start what the application is going to be. You need to be able to build it rapidly, both to capture the data and to deliver the results, so it dovetailed really nicely with the speed of translation from research into clinical practice.
The final bit we built into our business model concerns collaboration: if a clinician helps us to design a model, which they do, we want them to be rewarded for that work, so we set up royalty schemes. We obviously have a joint venture in Scotland with a number of the trust boards, so [inaudible 00:20:58] they get jointly rewarded, but if a clinician helps us, we make sure that if we use that work on a global basis, we fund back into them. We thought that was quite unique: it takes collaboration into your business model, your data level and your translational level. We really thought that was a great model to take to market.
We encapsulated all of this in April of this year when we launched our new collaborative data science platform, AnalytiXagility, which delivers scalable safe havens that come preloaded with all the analytical tools and technologies you need. You invite your people and participants, whether across research, health care or elsewhere, and you bring the data in.
What the technology allowed us to do was expand our business model to another level. The base services we now have are these collaborative workspaces, all built on Pivotal, with analytic tools like preloaded R consoles, Hadoop and SQL, backed by the strongest and highest level of information governance. With the ability to use some of the Chorus technology, we've been able to build in very significant information governance protocols and auditing, all on top of fantastically scalable compute and storage if we need it.
That allowed us to stop worrying; Pivotal would help us with those services, and we could focus our small team on building extended services in health care research, personalized medicine and academia. For example, we built a de-identification service on top of our platform. We built a publishing service for researchers so that they could take code and models and immediately publish them to, say, the BMJ, the British Medical Journal, in their format, or we would write APIs out to, for example, Thomson Reuters genomic data sets. It's allowed us to focus on building applications, data APIs and data services on top of our core platform. Our business model has really taken off from making that change, and to bring that home I've picked three or four use cases that I hope will underpin not only each of those areas but the breadth of data and the breadth of analytics we're doing.
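As a rough illustration of what a de-identification step like that might do, here is a minimal sketch: direct identifiers are dropped and the patient ID is replaced with a salted one-way hash, so records can still be linked across data sets without revealing identity. The field names, salt handling and hashing scheme are assumptions for illustration, not Aridhia's actual service.

```python
import hashlib

# Assumption for illustration: one secret salt per safe haven, kept out of
# the data itself. Field names below are hypothetical.
SALT = b"project-specific-secret"
DIRECT_IDENTIFIERS = {"name", "address", "date_of_birth", "nhs_number"}

def pseudonym(patient_id: str) -> str:
    """Replace an identifier with a stable, salted one-way hash."""
    return hashlib.sha256(SALT + patient_id.encode()).hexdigest()[:16]

def de_identify(record: dict) -> dict:
    """Drop direct identifiers and pseudonymize the linkage key."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    clean["patient_id"] = pseudonym(record["patient_id"])
    return clean

record = {"patient_id": "P123", "name": "Jane Doe", "hba1c": 48}
print(de_identify(record))  # name is gone, patient_id is a pseudonym
```

Because the hash is deterministic for a given salt, the same patient maps to the same pseudonym across data sets, which preserves the ability to join records inside the safe haven.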
Many people globally will be well aware of the Genomics England Project, the largest coordinated full genome sequencing medical program in the world, where we're looking here in the UK to map 100,000 genomes. There's a fantastic video on this on the Genomics England website. Genomics England knew they needed new enabling technologies, and they ran a competition earlier in the year to find technologies that would allow rapid sequencing and interpretation. We thought, let's test what we've built, and we're really pleased to announce that we were one of the winning companies, specifically around the analytics.
We focused on the industrialization: how do we plug any sequencing pipeline into our platform, validate and instrument that pipeline, and then link the data into clinical practice so there's clinical relevance on the clinical side? We couldn't have done that without the Pivotal technology and its scalability when we went into that competition. Our partnership with a global company really helped us; obviously Reuters was involved, along with the University of Glasgow and our Stratified Medicine people in Scotland, so we were really pleased. It validated our model and what we were doing to win probably one of the most unique projects in the world, and we're delivering that project at the moment and looking to see what more we can do in the program.
The second case study I'll talk about is something very relevant to Scotland. Scotland, I don't know if you're aware, has one of the highest incidences of multiple sclerosis, MS. It's a huge challenge in our health care system. We've been collaborating with the Anne Rowling Regenerative Clinic and the Chandran Laboratory at the University of Edinburgh, and there's a fantastic TED talk from Professor Chandran about predicting an individual's future disease activity. If you're unfortunately diagnosed with multiple sclerosis, there are different forms of MS and your disease pathway will be different in each. How can we take in an early-stage brain scan, genomic data and clinical data, so imaging, genomics and clinical data together, and build models that allow us to try to predict that disease activity?
Our first engagement in this, which is really interesting, was building a rapid app using the Cloud Foundry technology to collect more data from the patients themselves. You can imagine that an app has to be very specific for an MS patient, down to things like the size of the buttons and how patients interact with it, so you can't build a standard one and try to fit it to all. You've got to be able to rapidly deploy, rapidly capture the data, and keep designing from there. The strength of the platform is that we can take in different data types, develop rapid applications on top and integrate them, with de-identification services in the middle, so I thought that was quite a nice use case. The platform has allowed us to go and win that type of business.
The next one: in my previous job I had lots of people from lots of industries telling me their data was critical, so I thought, let's focus on some truly critical, operational data in the health care system. This is a project we're working on in collaboration with Philips and one of the hospitals, around traumatic brain injured patients in the ICU, the intensive care unit. We're taking a number of their biomarkers in real time and looking to see if we can predict whether they'll go into decline. There's a special algorithm we've got there, and the platform has allowed us to help in two ways.
One is simply the speed of being able to process the data: the power of in-memory analytics has allowed us to do that. The other is that the collaboration elements of the platform have allowed our data scientists to collaborate with the clinicians in the lab on writing better code, without having to email data sets around or lose track of who's working on what; we can just work on it together. We help them learn what we know about in-memory analytics and about how the code is written, getting processing down from hours to seconds so that a clinician can take a better decision, so we thought that was an interesting use case.
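A heavily simplified sketch of that kind of real-time check might look like the following: watch a biomarker stream and flag readings that fall well below a rolling baseline. The window size, threshold and readings here are invented for illustration and bear no relation to the actual clinical algorithm.

```python
from collections import deque

def decline_alerts(readings, window=5, drop_fraction=0.2):
    """Yield indices where a reading falls well below the rolling mean
    of the previous `window` readings (illustrative thresholds only)."""
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(recent) == window:
            baseline = sum(recent) / window
            if value < baseline * (1 - drop_fraction):
                yield i
        recent.append(value)

# Invented biomarker stream with one sharp drop at index 6.
readings = [98, 97, 98, 99, 98, 97, 70, 96, 97]
print(list(decline_alerts(readings)))  # -> [6]
```

The point of the in-memory approach described above is that checks like this can run over the stream as it arrives, in seconds rather than hours, rather than after a batch export.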
The third one steps into education. We're doing a lot of work in the education sphere, because we're very keen on driving capability building, and we know there's going to be a shortage of data scientists in the world. We've taken the approved, accredited data science course from EMC and loaded it into our platform. We rang up 10 universities and asked whether they'd like to see a one-week course run on it, and they all signed up and said, that's just fantastic, we'd absolutely love to see that in action.
The workspace idea, the ability to invite students in to run their coursework, to have prebuilt models in there, maybe some code that isn't quite correct so they've got to work out what's wrong with it and then write new code themselves, all in one environment where they don't have to keep swapping out or moving things: the user experience was fantastic for them. There's been a real buzz around using this to underpin degrees, masters programs and boot camps to build this capability up, so that's been really positive for us. The platform allowed us to diversify our business into the education market as well.
The final one: we have a lot of health care systems come to us and say, we want to take an early step, we don't know what to do, can you show us what's possible? We call these health care data challenge days; some people call them hackathons. We've run a number now where we've been able to use the platform to [inaudible 00:29:41] and load up prescribing and admissions data, then look to see if there are any patterns between what's being prescribed and readmissions. That was one exercise we did over five days.
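The prescribing-versus-readmissions question from a challenge day like that can be sketched as a simple aggregation: for each prescribed drug, what fraction of patients were later readmitted? The data below is invented purely for illustration.

```python
from collections import defaultdict

def readmission_rate_by_drug(prescribing, readmitted):
    """Compute the readmission rate per drug from (patient, drug) pairs
    and a set of readmitted patient IDs (toy example data)."""
    totals, events = defaultdict(int), defaultdict(int)
    for patient, drug in prescribing:
        totals[drug] += 1
        if patient in readmitted:
            events[drug] += 1
    return {drug: events[drug] / totals[drug] for drug in totals}

# Invented example: two drugs, four patients, three readmissions.
prescribing = [("p1", "drugA"), ("p2", "drugA"), ("p3", "drugB"), ("p4", "drugB")]
readmitted = {"p1", "p3", "p4"}
print(readmission_rate_by_drug(prescribing, readmitted))
```

In a real challenge day the interesting work is in joining and cleaning the two data sets and controlling for confounders; this sketch only shows the shape of the final question.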
The other was with the University of Manchester, where we took in all their city data and then, for a day, 100 or so scientists had a big hackathon on what they could find in the data. Next week, for the whole of North London NHS, we're running a private hackathon with their people, and we've loaded in a number of their different data sets. The flexibility of the platform and the fact that it's very easy to provision, without worrying about scale, has allowed us to move our business model again into running these data challenge days very quickly: provision them quickly, invite people into their own private workspace, give them the data there and let them work on it.
We've had a really great time with Pivotal. They've been great friends to us on the technology. We love what they're doing on the technology, at the data level. We love the stand they've taken in the platform business. We absolutely love the way they've taken Cloud Foundry, and that's just helped us add another dimension to our business. They take it through to the licensing model, they take it through to the engagement model, so we've been really happy. It was a big decision for us about 18 months ago to say, are we going to move and change our whole technology base? We're pleased we did it and we're really happy with the results we're getting out.
Hopefully you've seen some really [inaudible 00:31:15] use cases so Mike that's me done. I will hand it back to you now I believe.
Michael: Awesome, thank you so much, and we're going to just quickly switch who's running the preso... and there you go. All right, hopefully everybody can see Behind the Scenes, so that was amazing. I have to say, I've been in technology a long time, and it's always been for more enterprise-application-focused engagements, but seeing big data and advanced data science make a difference for real-world human beings in our society is amazing. This is a great example of where a bunch of technology, data management, data scientists and people came together, but what's really happening is we're changing the future of health care and hopefully accelerating the time to solve some of these major diseases.
We have the pleasure of working with companies like Aridhia to do this in health care. We also work with companies like GE now in the energy sector, and again there are just really altruistic benefits: reducing the cost to deliver power, extending power and energy to areas that don't have it, reacting to outages dynamically and predictively to reduce impacts. We do similar stuff with manufacturing and agriculture: trying to enable higher productivity from farming in regions that are tougher to farm, producing things like smart tractors, and advancing food production around the globe. This is one of those connections between technology and society that is just amazing to see, so Chris, thank you so much.
I wanted to accentuate a couple of the things you talked about from a technical perspective and talk about the underlying infrastructure that's powering the business and research changes that you guys are enacting so powerfully. You actually said in-memory analytics a couple of times, and what that is, basically, is the ability to take action in real time.
What Pivotal realized, even back when we were split into multiple entities, was that there were new use cases for data emerging, and as Chris said, the pioneers in technology, the Facebooks and Twitters and Googles and Amazons and Yahoos of the world, were all leveraging these next generation data management solutions because they had figured out early on that they were going to be able to differentiate and compete by leveraging data in new and dynamic ways.
What really happened was, a matter of five or six years ago, we were handling data through a very regimented methodology. We would take data into our environments. We would process it. We would adapt it. We would truncate it. We would chop it off and only keep what we felt we were really going to need, and so we built systems of record to do this for us. This is where traditional business intelligence resides, but now we have these new use cases for data, and they split into the three categories you see on this slide.
There are the real time use cases that, again, you heard Chris mention: real time analytics, or taking big data and big data science and putting it in the hands of a user or an application that's actually leveraging it in subsecond timing, impacting a real time experience with massive amounts of data and very advanced data science. That's the real time situation, so that would be, again, real time predictive data science powering treatments with Aridhia, or powering the methodology around an actual clinical trial in real time as the user is submitting the trial or collecting information.
You can see a couple of examples of this in other industries, like routing cell phone calls predictively across telco networks, or doing fraud prediction or pricing prediction. For financial companies, it's analyzing their entire portfolio risk instantaneously across tens of thousands of investors. Really powerful things, but ultimately what's important there is that it's happening in subsecond timing. Inside of that real time space we see analytical use cases, like Chris talked about very effectively, and then we see transactional: taking real time action or powering a transactional or interactive application in real time.
Then there's near real time which is basically traditional data science or predictive data science where a human being is asking really advanced complex questions of data and they're waiting there for that answer. We're talking here minutes of time that we're willing to wait to get an answer back from data but we're also talking about really advanced predictive and prescriptive machine learning algorithms. We call this ad hoc analysis of data because the questions change very very dynamically and they're very difficult questions to answer and they're moving across really large data sets but we still care about it happening quickly.
Then finally we have batch, which is really what the technology of Hadoop has come along to innovate and change: the way that we handle batch analytics and batch transactional requirements. These are the types of processes that could take days or a weekend, or run when you acquire another company, or when Aridhia is ingesting a large data set from an external study, for example. They can wait a number of hours for that data to get into their environment, and nobody is going to lose their life and a clinical trial is not going to be impacted and miss deadlines for submission, etcetera.
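To make the three tiers concrete, here's a minimal sketch, purely illustrative and not from Pivotal's stack, that routes a workload to a tier by its latency budget. The tier names come from the discussion above, but the thresholds are assumptions chosen for illustration:

```python
# Illustrative only: map a workload's latency budget to one of the three
# data use-case tiers discussed above. Thresholds are assumed, not official.

def pick_tier(latency_budget_seconds: float) -> str:
    """Return the tier a workload belongs to, given its latency budget."""
    if latency_budget_seconds < 1:
        return "real-time"        # subsecond: in-memory technologies
    if latency_budget_seconds < 60 * 60:
        return "near-real-time"   # seconds to minutes: ad hoc data science
    return "batch"                # hours or days: Hadoop-style processing

print(pick_tier(0.05))    # e.g. fraud scoring on a live transaction
print(pick_tier(120))     # e.g. a data scientist's ad hoc query
print(pick_tier(86400))   # e.g. overnight ingest of an external study
```

The point of the sketch is simply that one architecture must serve all three branches, which is the motivation for the mixed stack described next.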
What happened is that these three very distinct use cases for data rose up in the last decade, and if you throw a traditional data management methodology at them, you run into real problems. Traditional data management solutions simply cannot provide real time performance. They just cannot process, ingest, and do real time analytics. They're not built for it, and they're not based on in-memory technologies, which is really the way we solve real time challenges. Traditional methods simply can't solve real time, and they are breaking or stretching and severely limiting near real time or interactive business.
What we started this webinar talking about was making sure you can keep all of your data, and if you throw a traditional, expensive data management solution at near real time or interactive workloads, you end up, first of all, not meeting the capacity and performance requirements, but second of all, it becomes extremely expensive to store the huge, massive data sets that we're all talking about today. That same problem continues into the batch world, where we certainly used to leverage our traditional data management or data warehouse infrastructure for batch processing, but again, it's extremely expensive to run non-business-critical data management workloads and historical data science questions there. You want to leverage technologies like Hadoop to do that, because they provide a drastically better price-performance ratio to your business.
When we took a look at those new challenges on data, we stepped back, looked at the traditional ways to tackle those problems, and saw the gaps. We worked with our partners and with our advanced customers like Aridhia to design this next generation architecture, a reference architecture for how to mix data management solutions and methodologies, your human beings, your data scientists, and your workflow management, along with open source and third party tools, like Chris mentioned with Chorus. The goal is to build a stack that can supply the three different requirements, real time, interactive and batch, and do that both at the moment of ingesting data and then, obviously, subsequently as you move to take action on your data.
What you see here is a drastically simplified version of what Aridhia and our advanced customers are building out, which is a business data lake. It effectively means building out the capability to do real time, interactive and batch methodologies on data, and also aggregating your data into a centralized approach, so that your data scientists, your business analysts, your applications and your application development teams can all get to a single point of truth with your data set. Your data is then managed in such a way that it moves through the data management technology tier, and I'm going to circle that really quickly, so we're talking now here inside this orange box.
We want to make sure that your users get instant access to data. We want to make sure that it's raw data. We want to make sure that we're not chopping or truncating or losing critical data granularity as you're bringing it into your environment. Keep all your data, retain it in a centralized, single point of truth, and then move that data to the right technology for the right use case. What that looks like for us and for Aridhia is a mixture of data management technologies that are integrated into a single solution, and to what Chris was saying about our licensing perspective, what we saw early on was that customers were going to have meshed requirements for data management solutions, but more importantly, that as they mature those requirements are going to change over time.
If you know you want to do advanced data science, you might leverage what's called a massively parallel analytics database, which is on the left here, advanced analytics. You may start to develop a proclivity and an expertise around data science, and then you realize, wow, I've really got to start storing data sets I've never stored before. All of a sudden the idea of implementing Hadoop in your environment makes a lot of sense, because for a low cost you can store social, public, private and third party data all in the same place. Hadoop is really terrific at allowing you to store structured, unstructured and semi-structured data at a low cost, but then you're going to reach this inflection point where you have a lot of data inside of your Hadoop infrastructure and you're going to want to start to do advanced analytics on top of Hadoop. Then, as you find new models and new insights out of that Hadoop analytics, you're going to want to inject them into a real time experience, whether that's for an analyst, a business process, or a truly real time application.
Again, when we squinted at that reference architecture, we focused on our portfolio and asked: what technologies are going to enable the enterprise, and enable a company like Aridhia, to build it? These were the building blocks that we arrived at. What we did, and this is really unique across the industry, was take all of these moving parts and bundle them together inside of a flexible subscription service.
Aridhia can leverage Greenplum Database for advanced analytics on their data sets, and they can actually migrate their licensing and their investment with Pivotal over time into a technology called HAWQ, which does advanced analytics over Hadoop. Again, they can shift and redistribute their licenses across in-memory technologies as they move from a data science exploratory stage into trying to operationalize that data science in the form of real time analytics or an application. They can literally decommission some Greenplum Database and reuse those licenses inside of these in-memory technologies.
That's really it for the advertisement, but we feel this is a really unique offering. Pivotal is one of the only technology companies in the world that has the breadth of offering in data, the expertise in data science, and finally the ability to build applications on top of all that and host it in a platform as a service. Aridhia is a shining example of a company that, in just a few months' time, has really accelerated and leapfrogged the industry and innovated their infrastructure, so that their line of business, their data science and their health care insights can really accelerate unfettered.
Here's some contact information and while we do that I'm going to bring up the chat window and Chris and I can handle some of these questions that have been coming in. If maybe Christina or somebody wants to read them out that's fine but here I see one right here. Chris, please. First question is for you.
Can you share how big the total data footprint is that you're working with and maybe a little bit about data growth because I know you're just starting to lean in on the infrastructure? Are we talking hundreds of terabytes here?
Chris Roche: Yes, and it varies across different customers. What we have to build in is that we have some customers who have an analytics challenge, and we needed to be able to provision things very quickly for them on a small data size, but obviously help address that analytics challenge. Then at the other end of it we have clients like Genomics England, where the initial scoping is the 100,000 genomes they want to do. In Scotland we scoped it to about 10,000 full genomes, but then we just needed something where we know that 100,000 is only the tip of the iceberg. There are 60-odd million people in the UK, and the logical conclusion is that at some point you've got to assume everybody will have some part of their genome become relevant.
What we wanted to do was pick the technology that would allow us to scale past things that we could even think of today, yeah. A couple of things we've done to do that: we've obviously chosen your technology because we know how well it scales, and we've positioned some of our technologies in the Edinburgh High Compute Center, where the UK's other huge high performance compute facility is, so we can tap into there as well. We've built for scale, but we can provision down to literally tiny, tiny sizes. You wouldn't call it a lake, you'd call it a puddle, where somebody still needs that advanced collaboration and those advanced analytics, so absolutely. It goes from a few gigabytes right the way up to the many, many petabytes that we deal with at the moment.
Michael: Awesome. Another great question here. How do you handle native apps for genomics?
I'll take the first half, and then Chris, if you have some color to add, please do. Again, one of the interesting, unique things about Pivotal is that we basically inherited a number of enterprise leading technologies from our parent companies EMC and VMware when we were created. One of the cool things is that Greenplum Database is an extremely advanced predictive analytics database. At its engine is an amazing ad hoc parallel query engine and query optimizer. That's what interfaces: it's a PostgreSQL based solution, so you ask questions of it and it runs off and does advanced predictive analytics.
We ported that technology on top of Hadoop. That's what's called HAWQ, so what's nice about that is we've inherited the maturity to plug into SQL based and ANSI SQL compliant applications and analytics tools natively. That's one of the benefits of working with Pivotal, because Hadoop is a really young technology. It's an amazing innovation, and we believe very wholeheartedly in it, but it's a young open source initiative that's developing rapidly.
One of the things it's missing is the advanced tool sets that provide ecosystem maturity, so that the applications and interfaces that you're all used to working with can actually make queries against HDFS or Hadoop based data, and that's one of the unique benefits of Pivotal. The HAWQ query engine is based on Greenplum Database, which is ANSI SQL compliant, 100% SQL compliant, so SQL based and Postgres based user interface tools just work.
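As a hedged illustration of what "massively parallel" means here, this toy sketch mimics the scatter-gather pattern an MPP engine such as Greenplum or HAWQ uses: each segment computes a partial aggregate over its local shard of the data, and a master merges the partials. A real engine plans and optimizes full SQL; this simplifies to a single average and is not Pivotal code:

```python
# Toy scatter-gather sketch of MPP-style aggregation (illustrative only).
from concurrent.futures import ThreadPoolExecutor

def partial_agg(segment_rows):
    """Each segment computes a partial (sum, count) over its local rows."""
    return sum(segment_rows), len(segment_rows)

def parallel_avg(segments):
    """The master scatters work to segments, then merges partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_agg, segments))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

# Data sharded across three hypothetical segments:
segments = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(parallel_avg(segments))  # 5.0
```

The design point is that each segment only touches its own shard, so adding segments scales the scan linearly, which is why the same SQL can run over very large Hadoop-resident data sets.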
Chris do you have any insights in terms of specific genomic tools that you're leveraging?
Chris Roche: Nothing specific, because we're on the cutting edge. We're discussing with another partner who is bringing in different things on how we can build their technologies in, but I think you hit the nail on the head: it's that integration back to legacy, and that ability to reuse what you've got but then rapidly build the new things, because [inaudible 00:48:52] in the genomics world is so new around everything. A lot of what we're building, we're building from the start, which is good in some ways, but the reasons we chose this were exactly the ones you pointed out: it allows us to reuse some of the great work that's been done already, but it also allows us to rapidly develop new tools and new apps, which is of course the focus of where we're going. How do we rapidly build these new apps, given that we're going to hit challenges that we don't know that we don't know at the moment?
Michael: Yeah, so question for you on basically your data sourcing and pipelining. Can you describe one or two pipelines of joint genomic and [EHR 00:49:33] data analysis, if you can? If not that's fine. I have a bunch of other questions we can move to.
Chris Roche: On the information governance I can't. There's some NDA rules. I can't explain that. If people want to email me separately and then we can look at signing an NDA on that type of stuff. That's probably difficult for me to talk about some of that specifically especially where we are in some of our bids at the moment.
Michael: Excellent so question on the data lake in general. How do you know it's necessary?
The data lake architecture is a methodology. It basically has a similar structure to something called the lambda architecture, which you can all look up. It's basically the idea of supplying capabilities for multiple use cases on data: real time, interactive and batch. The question was how you know it's absolutely necessary to have a data lake architecture.
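For reference, the lambda architecture mentioned above can be sketched minimally: a batch layer holds precomputed views over historical data, a speed layer covers events that arrived since the last batch run, and a serving layer merges the two at query time. The names and numbers below are illustrative assumptions, not from any specific product:

```python
# Minimal lambda-architecture sketch (illustrative assumptions throughout).
from collections import Counter

batch_view = Counter({"admissions": 1000, "prescriptions": 5000})  # batch layer: precomputed
speed_view = Counter()                                             # speed layer: recent events

def ingest_recent(event_type):
    """Speed layer: incremental, low-latency updates for new events."""
    speed_view[event_type] += 1

def query(event_type):
    """Serving layer: merge the batch and speed views at query time."""
    return batch_view[event_type] + speed_view[event_type]

ingest_recent("admissions")
ingest_recent("admissions")
print(query("admissions"))  # 1002
```

The data lake idea discussed here generalizes this: keep all raw data centrally, and let each tier (real time, interactive, batch) serve the queries it is best suited for.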
It really comes down to the requirements on your data and what your business is going to leverage your data for. One could argue you're building a data lake the minute you store data; you have a data lake already. It may be made up of one siloed data management technology, which might be an [EDW 00:50:46], or you might be a startup leveraging just Hadoop because you started fresh and you're using Hadoop as your foundation level.
What will happen is, as you mature, you'll move from just storing data to wanting to analyze it, and you'll start doing more and more complex analysis on that data. All of a sudden you're going to have different requirements and tools that you're going to need [inaudible 00:51:07] supply your lines of business with those capabilities, and then the performance requirements will start to edge up, and finally, when you move to operationalize this stuff, that's really where the real time components come in.
I would argue a data lake is a concept, an ideology, but ultimately, as you mature, you're going to find that you need different tiers of data management methodologies, and one way to tackle them is this data lake architecture design. You can start with just Hadoop and open source tools, and then as you get to more advanced analytics you're going to need something like Pivotal's HAWQ, which can do massively parallel ad hoc queries, extremely quickly and in an extremely compliant manner. Again, you'll just start to mature and you'll have to add a lot of this to your environment.
Pivotal and all of our partners have data architects and we'd be happy to come in and provide insight in terms of how much of each of these technologies you're going to need as you move to build that out because it is important. You may need more Hadoop, more real time, or more advanced analytic engines in each situation. It really depends on how you're using the technology.
Question for you Chris, where were some of the challenges of implementing the Pivotal platform as you move to production?
Chris Roche: Some of the challenges. We've had a great experience with Pivotal, let me say that up front. The challenges haven't really been around the technology per se. We have a lot of challenges in our industry around information governance, so let's be frank, anything happening on the technology side wasn't going as slowly as some of the information governance rules that we have to adhere to and focus on. On the technology I've been really impressed at how smoothly we've been able to build our stack. I'm racking my brains now, because day-to-day it's information governance that I'm looking after, and getting people to work together.
It's been great, I don't want to make stuff up, it's been quite seamless. We've taken a number of routes: we've run some of the technology on VMs, we've run some of the technologies on the prebuilt EMC DCA, and for our data lake in genomics that you were talking about, we've integrated it into EMC's Isilon stack, and that's all worked right from design right the way through. We've found it very easy and very user friendly to deal with. I'm sure we're pushing the boundaries of things, and one day we had a little bump in the road, but again, this comes back to the collaboration of working together. It's been really easy to get those things solved.
I was just talking to some of the guys today. They're doing some scalability testing on the genomics, and they're saying, well, we're just trying to get that right, but there's no challenge there. Anything we come up against is small, and the service we get is pretty good. I think the challenges in our business at the moment lie elsewhere: getting people to work together, getting research teams to want to work together, getting acceptance of a cloud based safe haven, and making sure you get the message across that the information governance is all good.
Michael: I hear that a lot. That the largest challenge is changing the business and the business process and the IT organization. This is an entire change not just from a technology perspective. It's the way you build your business and do development or do your business on a daily basis that has to adapt and take advantage.
From a technical perspective, I think data governance is a great point. There was one question on security in the chat, and I think that's a great point too. The Hadoop ecosystem, again, is relatively young. Pivotal has a number of advanced services we can add to the Hadoop stack that can help to manage data movement, data governance, etcetera. We have an extensive partner network, and not everything comes from Pivotal, so for things like data masking and data encryption, for example, there are partners we can bring to bear to help lock down and create a highly governed data infrastructure.
One of the other challenges I'll just say up front is adopting the new technologies and linking them to historic technologies. Some of our customers begin to build out their next gen environment with us and put all new projects, everything net new, in that new environment. Others are trying to bridge and build meshed data management infrastructures back to their existing traditional environments, and that adds some complexity. The last, I think, would be planning. First of all, a lot of planning, and I think Chris just said that. You really want to take your time to understand what types of data you're dealing with and what the access, ingest and processing requirements are across them. That helps build out this meshed infrastructure.
I think the last is just a lot of communication. As you leverage this new stuff again you're reinventing your infrastructure which is going to force change in the business and finding line of business champions is a big piece of making that happen like Chris for example.
I think we're right up against the time. It was a great webinar. Chris, excellent presentation. Awesome case studies. I thought that was terrific. I really appreciate all of the great questions. There were just a couple we didn't answer. We'll definitely follow up with you guys offline and again, I think Christina pointed this out but the contents of this webinar will be recorded and made available within 48 hours and again, Chris thank you so much and so proud to see what you're doing at Aridhia. Please keep up the great work and we'll stay in touch.
Chris Roche: Yeah, thank you very much and my details are on the screen. If anybody wants to ask me questions offline please just send them through.
Michael: Awesome. Thanks everybody. Have a great rest of your day.