Wednesday, November 6, 2013

Efficient big data capabilities help Cerner drive needed improvements into healthcare outcomes

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: HP.

The next edition of the HP Discover Podcast Series delves into how a healthcare solutions provider leverages big-data capabilities. We’ll see how Cerner has deployed the HP Vertica Analytics platform to help their customers better understand healthcare trends, as well as to help them better run their own systems.

To learn more about how high-performing and cost-effective big data processing forms a foundational element to improving healthcare quality and efficiency, join Dan Woicke, Director of Enterprise Systems Management at Cerner Corp. based in Kansas City, Missouri.

The discussion, which took place at the recent HP Vertica Big Data Conference in Boston, is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Here are some excerpts:
Gardner: We're going through some major transitions in how healthcare payments are going to be made -- and how good care is defined. We're moving from pay for procedures to more pay for outcomes. So tell me about Cerner, and why big data is such a big deal.

Woicke: The key element here is that the payment structure is changing to more of an outcome model. In order for that to happen, we need to get all the sources of data from many, many disparate systems, bring them in, and let our analysts work on what the right trends are and predict quality outcomes, so that you can repeat those and stay profitable in the new system.

My direct responsibility is to bring in massive amounts of performance data. This is how our Cerner Millennium systems are running.
We have hundreds of clients, both in the data center and those that manage their own systems with their own database administrators (DBAs). The challenge is just to have a huge system like that running with tens of thousands of clinicians on the system.

We need to make sure that we have the right data in place in order to measure how systems are running and then be able to predict how those systems will run in the future. If things are happening that might be going negative, how can we take the massive amounts of data that are coming into our new analytical platform, correlate those parameters, predict what’s going to happen, and then take action before there is a negative?

Effect change

We want to be able to predict what’s happening, so that we can effect change before there is a negative impact on the system.

Gardner: How does big data and the ability to manage big data get you closer to the real-time and then, ultimately, proactive results your clients need?

Woicke: Since January we've begun to bring in what we call Response Time Measurement System (RTMS) records. For example, when a doctor or a nurse in our electronic medical record (EMR) system is signing an order, I can tell you how long it took to log into the system. I can tell you how long you were in the charting module.

All those transactions produce 10 billion timers, per month, across all of our clients. We bring those all into our HP Vertica Data Warehouse. Right now, it’s about a two-hour response time, but my goal, within the next 12 months, is to get it down to 10 minutes.

I can see in real time when trends are happening, either positive or negative, and be able to take action before there is an issue.

Gardner: Tell us more about Cerner -- what you do in IT.

Woicke: We run the largest EMR in the world. We have well over 400 domains to manage -- we call them domains -- which allows us to hook up multiple facilities to those domains. Once we have multiple facilities connecting into those domains, there are tens of thousands of clinicians on the system at any given time.

We have two data centers in Kansas City, Missouri, and we host more than half of our clients in those data centers. The trend is moving toward being remote-hosted and managed like that. We still have a couple of hundred clients that are managing their own Millennium domains. As I said before, we need to make sure that we provide the same quality of service to both those sets of clients.

Single database

Cerner Millennium is a suite of products or solutions. Millennium is a platform where the EMR is placed into a single database. Then, we have about 55 different solutions that go on top of that platform, starting with ambulatory solutions. This year was really neat. We were able to launch our first ambulatory iPad application.

There are about 55 different solutions, and it's growing all the time with surgery and lab that fit into the Cerner Millennium system. So we do have a cohesive set of data all within one database, which makes us unique.

Gardner: Where does the data come from primarily, and how much data we are talking about?

Woicke: We're talking about quite a bit of data, and that’s why we had to transform away from a traditional OLTP database into an MPP-type database, because of those systems that are now sending data to Cerner.

We have claims data and HL7 messages. We're going to get all our continuous care records from Millennium. We have other EMRs. So that’s pretty much the first time that we're bringing in other EMR records.

You’ll have that claim data that comes in from multiple sources, multiple EMRs, but the whole goal of population health is to get a population to manage their own health. That means that we need to give them the tools in their hands. And they need to be accurate, so that they can make the right decisions in the future. What that's going to do is bring the total cost of your healthcare down, which is really the goal.

We have health-plan enrollments, and then of course, within Millennium, we're going to drill down into outcomes, re-admissions, diagnosis, and allergies. That’s the data that we need to be able to predict what kind of care we are going to have in the future.

Gardner: So it seems to me that we talk about "Internet of things." We're also going to the "Internet of people." More information from them about their health comes back and benefits you and benefits the healthcare providers. But ultimately, they can also provide great insights to the patients themselves.

Do you see, in the not too distant future, applications where certain data -- well-protected and governed of course -- is made into services and insights that allow for a better proactive approach to health?

Proactive approach

Woicke: Without a doubt. We're actually endorsing this internally within the company by launching our own weight-loss challenges, where we're taking our medical records and putting them on the web, so that we have access to them from home.

I can go on the site right now and manage my own health. I can track the number of steps I'm doing. Those are the types of tools that we need to launch to the population, so that they endorse that good behavior, which will ultimately change their quality of life.

Right now, we're in production with the operations side that we talked a little bit about earlier. Then, we are in production with what we call Health Facts, a huge set of blinded data. We hire a team of analysts and scientists to go through this data and look for trends.

It’s something we haven’t been able to do until recently, until we got HP Vertica. I am going to give you a good example. We had analysts log a SQL query to do an exploratory type of analysis on the data. They would log that at 5 p.m., then issue it, and hopefully, by the time they came back at 8 a.m. the next day, that query would be done.

In Vertica, we've timed those queries at between two and five seconds. So you can see what that’s going to do for the speed of the amount of analysis we could do on the same amount of data. It’s game changing.

There were a lot of competitors that would have worked out, but we had a set of criteria that we drilled down on. We were trying to make it as scientific as possible and very, very thorough. So we built a score sheet, and each of us from the operation side and Health Facts side graded and weighted each of those categories that we were going to judge during the proof of concept (POC). We ended up doing six POCs.
We got down to two, and it was a hard choice. But with the throughput that we got from Vertica, their performance, and the number of simultaneous users on the system at a given period of time, it was the right choice for us.
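
As a rough illustration of the kind of graded and weighted score sheet described above, here is a minimal Python sketch. The category names, weights, grades, and candidate labels are all hypothetical; the interview does not say which categories Cerner used or how they were weighted.

```python
# Hypothetical categories and weights; none of these values come from Cerner.
WEIGHTS = {"throughput": 0.4, "concurrency": 0.3, "cost": 0.3}


def weighted_score(grades, weights):
    """Return the weight-normalized average of one grader's per-category grades."""
    total_weight = sum(weights.values())
    return sum(grades[cat] * w for cat, w in weights.items()) / total_weight


def rank_candidates(score_sheets, weights):
    """Average each candidate's weighted score across graders, best first."""
    averages = {}
    for candidate, sheets in score_sheets.items():
        scores = [weighted_score(grades, weights) for grades in sheets]
        averages[candidate] = sum(scores) / len(scores)
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Two hypothetical POC candidates, each graded by two people on a 1-10 scale.
score_sheets = {
    "candidate_a": [
        {"throughput": 9, "concurrency": 8, "cost": 7},
        {"throughput": 8, "concurrency": 9, "cost": 8},
    ],
    "candidate_b": [
        {"throughput": 7, "concurrency": 7, "cost": 9},
        {"throughput": 6, "concurrency": 8, "cost": 9},
    ],
}

ranking = rank_candidates(score_sheets, WEIGHTS)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Dividing by the total weight means the weights do not have to sum to exactly one, which makes it easier for several graders to agree on relative importance first and normalize later.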

Gardner: And because we're talking about healthcare, costs are super important. Was there a return on investment (ROI) or cost benefit involved as well?

Extremely competitive

Woicke: Absolutely. You could imagine that this would be one of the top one or two categories weighted on our score sheet, and certainly HP Vertica is extremely competitive, compared to some of the others that we looked at.

Gardner: Dan, looking to the future, what do you expect your requirements to be, say, two years from now? Is there a trajectory that you need to take as an organization, and how does that compare to where you see Vertica going?

Woicke: Having Vertica as a partner, we navigate that together. They invited me here to Boston to sit on the user board. It was really neat to sit right there with [HP Vertica General Manager] Colin Mahony at the same table and be able to say, "This is what we need. These are our needs coming around the corner," and have him listen and be able to take action on that. That was pretty impressive.

To answer your question though, it’s more and more data. I was describing the operations side, where we bring in 10 billion RTMS records. There are going to be another 10 billion or so records coming in from other sources: CPU, memory, disk I/O. Everything can be measured.

We want to bring it into Vertica, because I'm going to be able to do some correlation against something we were talking about. If I know that the RTMS records show a negative performance that's going to happen within the next 10-15 minutes, I can figure out which one of those operational parameters is most affecting that outcome of that performance, and then can send the analyst directly in to mitigate that problem.
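
One simple way to picture that correlation step is a Pearson correlation between each operational metric and the response-time series, then picking the metric with the strongest relationship. The Python sketch below is only an illustration under that assumption: the metric names and sample values are invented, and the interview does not describe the statistical method Cerner actually uses.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def strongest_driver(response_times, metrics):
    """Return (metric name, r) for the metric most correlated with response time."""
    correlations = {
        name: pearson(values, response_times) for name, values in metrics.items()
    }
    name = max(correlations, key=lambda key: abs(correlations[key]))
    return name, correlations[name]


# Hypothetical samples taken over the same six time intervals.
response_ms = [120, 135, 160, 210, 280, 390]
metrics = {
    "cpu_pct": [35, 36, 34, 37, 35, 36],
    "memory_pct": [60, 61, 60, 62, 61, 62],
    "disk_io_wait_ms": [4, 5, 7, 11, 16, 24],
}

driver, r = strongest_driver(response_ms, metrics)
print(f"most correlated metric: {driver} (r = {r:.3f})")
```

A high correlation only flags where an analyst should look first; it does not by itself show that the metric is the cause of the slowdown.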

On the EMR side, it’s more data as well. On the operations side, we're going to apply this to other enterprises to bring in more data to connect to the experts. So there is always somebody out there who is the expert. What we're going to do is connect the provider with the payers and the patient to complete that triangle in population health. That’s where we're going in the next few months.

Gardner: I certainly think that managing data effectively is a huge component of our healthcare challenge here in the United States, and of course, you're operating in about 19 countries. So this is something that will be a benefit to almost any market where efficiency, productivity, quality of care come to bear.

Woicke: At Cerner Corp., we're really big on transparency. We have a system right now called the Lights On Network, where we are taking these parameters and bringing them into a website. We show everything to the client, how they're performing and how the system is doing. By bringing in more and more data and being able to correlate it, we're going to show all the clients, as well as the providers, how their system is doing.

Monday, November 4, 2013

Different paths to cloud and SaaS enablement yield similar major benefits for Press Ganey and Planview

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: VMware.

The next VMworld innovator panel discussion focuses on how two companies are using aggressive cloud-computing strategies to deliver applications better to their end users.

We'll hear how healthcare patient-experience improvement provider Press Ganey and project and portfolio management provider Planview are both exploiting cloud efficiencies and agility. Their paths to the efficiency of cloud have been different, but the outcomes speak volumes for how cloud transforms businesses.

To understand how, we sat down with Greg Ericson, Senior Vice President and Chief Innovation Officer at Press Ganey Associates in South Bend, Indiana, and Patrick Tickle, Executive Vice President of Products at Planview Inc. in Austin, Texas.

The discussion, which took place at the recent 2013 VMworld Conference in San Francisco, is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions. [Disclosure: VMware is a sponsor of BriefingsDirect podcasts.]

Here are some excerpts:
Gardner: We heard a lot about cloud computing at VMworld, and you're both going at it a little differently. Greg, tell us a bit about the type of cloud approach you’re taking at Press Ganey.

Ericson: Press Ganey is the leader in patient-experience analytics. We focus on providing deep insight into the patient experience in healthcare settings. We have more than 10,000 customers within the healthcare environment that look to us and partner with us around patient-experience improvement within the healthcare setting.

We started this cloud journey in July of 2012 and we set out to achieve multiple goals. Number one, we wanted to position Press Ganey's software as solution products of the next generation and have a platform that was able to support them.

We went through a journey of consolidating multiple data centers. We consolidated 14 different storage arrays in our process and, most importantly, we were able to position our analytic solutions to be able to take on exponentially more data and provide that to our clients.

Gardner: Patrick, how has cloud helped you at Planview? You were, at one time, fully a non-cloud organization. Tell us about your journey.

Tickle: Planview has been an enterprise software vendor, a classic best-of-breed focused enterprise software vendor, in this project and portfolio and resource management space for over 20 years.

We have a big global customer base of on-premise customers that built up over the last 23 years. Obviously, in the world of software these days, there's a fairly seismic big shift about being in software as a service (SaaS) and how you get to the cloud, the business models, and all those kinds of things.

Conventional wisdom, for a lot of people, was that you can't get there unless you start from scratch. Obviously, because this is the only thing we do, it was pretty imperative that we figure out a way to get there.

So two or three years ago, we started trying to make the transition. There were a lot of things we had to go through, not just from an infrastructure standpoint, but from a business model and delivery standpoint, etc.

The essence was this: We didn’t have time to rewrite a code base in which we've invested 10-plus years and hundreds of thousands of hours of customer experience to be a market-leading product in our space. It could take five years to rewrite it. Compared to where we were 10 years ago, when you and I first met, there are a lot more tools in the bag for people to get to the cloud than there were then.

So we really went after VMware and did the research sweep much more aggressively. We started out with our own kind of infrastructure that we bolted together and moved to a FlexPod in our second generation.

We have vCloud Hybrid Services now, and leveraging our existing code base, and then the whole suite of VMware products and services, we have transformed the company into a cloud provider. Today, 90 percent of all our new Planview customers are SaaS customers. It's been a big transition for us, but the technology from VMware has been right in the center of making it happen.

Business challenges

Gardner: Greg, tell us a little bit about some of the business challenges that are driving your IT requirements that, in turn, make the cloud model attractive. Is this a growth issue? Is this a complexity issue? What are your business imperatives that make your IT requirements?

Ericson: That’s a great question. Press Ganey is a 25-year-old organization. We pioneered the concept of patient experience and the analytics, and insight into the patient experience, within the healthcare setting. We have an organization that's steeped in history, and so there are multiple things that we're looking at.

Number one, we have one of the largest protected health information (PHI) databases in the United States. So we felt that we had to have a very secure and robust solution to provide to our clients, because they trust us with their data.

Number two, with the healthcare reform, the focus on patient experience is somewhat mandatory, whereas before, it was somewhat voluntary. Now, it's regulated or it's part of the healthcare reform. When you look at organizations, some were actually coming to us and saying, "We want to get however many patient surveys out that we need to satisfy our threshold."

Our philosophy is why would you want to do that? We believe that if you can understand and leverage the different media to be able to fill that out, you can survey your entire population of patients that are coming into not only your institution but, in the accountable care organization, the entire ecosystem that you’re serving. That gives you tremendous insight into what's going on with those patients.

Our scientists are also finding a correlation between the patient experience results and clinical and quality outcomes. So, as we can tie those data sets together in those episodic events, we're finding very interesting kinds of new thought, leading thought, out there for our clients to look at.

So for us, going from minimally surveying your population to doing census survey, which is your entire population, represents an exponential growth. The last thing is that, for our future, in terms of going after some of those new analytics, some of the new insight that we want to provide our clients, we want to position the technology to be able to take us there.

We believe that the VMware vCloud Suite represents a completeness of vision. It represents a complete, single pane of glass into managing the enterprise and, longer-term, as we become more sophisticated in identifying our data and as the industry matures, we think that a public cloud, a hybrid cloud, is in the future for us, and we're preparing for that.

Gardner: And this must be a challenge for you, not only in terms of supporting the applications, but also those data sets. You're getting some larger data sets and they could be distributed. So the cloud model suits your data needs over time as well?

Deeper insights

Ericson: Absolutely. It gives us the opportunity to be able to apply technology in the most cost-value proposition for the solutions that we’re serving up for our customers.

Our current environment is around 600 server instances. We have about 300 terabytes (TB) running in 20 SaaS applications, and we're growing exponentially each month, as we continue to provide that deeper insight for our customers.

Gardner: Patrick, for your organization what are some of the business drivers that then translate into IT requirements?

Tickle: From an IT perspective, it changed the culture of the company, moving from being an on-premise perpetual kind of "ship the software and have a customer care organization that focuses on bug and break-fix" to a service-delivery model. There were a lot of things that rippled through that whole thing.

At the end of the day, we had to move from an IT culture to an operations culture and all the things that go along with that, performance and up-time. Our customer base is global. So it was being able to provide that around the globe. All those things were pretty significant shifts from an IT perspective.

We went from a company that had a corporate IT group to a company that has a hosting and DevOps and Ops team that has a little bit of spend in corporate IT.

Out of the gate, the first step at Planview was moving to colo. SunGard has been a great partner for us over the last couple of years as our ping, power, and pipe. Then, in our first generation, we bolted together some of our storage and computer infrastructure because it wasn’t quite all the way there. Then, in our most recent incarnation of the infrastructure we’re using FlexPods at SunGard in Austin, Texas and London.

OPEX spend

We're always having to evaluate future footprints. But ultimately, like many companies, we would like to convert that infrastructure investment from a capital spend into an OPEX spend. And that’s what’s compelling with vCloud Hybrid Service.

What we've been excited about hearing from VMware is not just providing the performance and the scalability, but the compatibility and the economic model that says we’re building this for people who want to just move virtual machines (VMs). We understand how big the opportunity is, and that’s going to open up more of a public cloud opportunity for us to evaluate for a wide variety of use cases going forward.

Gardner: How big a deal is it when we can, with just a click of a mouse, move workloads to any support environment we want?

Tickle: It's a huge deal. Whether it’s a production environment or disaster recovery (DR) environment, at the end of the day it's a big deal for both of us. For a SaaS company the only thing that matters is renewals. It’s happy customers that renew. That's the transition from perpetual-plus-maintenance to a renewal model, where you're on the customer service watch at another level, every minute of every day.

Everything that we can do to make the customer experience, not just from our UI and our software, but obviously the delivery of the service, as compelling as possible, allows us to run our business. That can be a disaster scenario or just great performance across our geography where we have customers and then to do that in a cost effective way that operates inside our business model, our profit and loss.

So our shareholders are equally pleased with their return. We can't afford to have half of the company’s OPEX go into IT, while we’re trying to make customers as successful as they possibly can be. We continue to be encouraged that we’re on a great path with the stack that we're seeing to get there.

Gardner: I think it's fair to say that cloud is not just repaving old cow paths, that cloud is really transforming your entire business. Do you agree, Greg?

Rejuvenate legacy

Ericson: I agree. It allows us, especially an organization that’s 25 years steeped in history, to be able to rejuvenate our legacy applications and be able to deliver those with maximum speed, maximizing our resources, and delivering them in a secure environment. But it also allows us to be able to grow, to flex, and to be able to rejuvenate and organically transform the organization. It's pretty exciting for us and it adds a lot of value to our clients indirectly.

Gardner: Greg, what are some of the more measurable pay-offs when you go to cloud? Are these soft payoffs of productivity and automation or are there hard numbers about return on investment (ROI) or moving more to an operations cost versus capital cost? What do you get when you do cloud right?

Ericson: We justify the investment based on consolidation of our data centers, consolidation and retirement of our storage arrays, and so on. That’s from a hard-savings perspective. From a soft-savings perspective, clearly in an environment that was not virtualized, virtualizing the environment represented a significant cost avoidance.

Our focus is on a complete solution that allows us to really focus in on what's important for us, what's important for our clients.

Longer-term, we're looking at how to position the organization with a robust, virtual secured infrastructure that runs with a minimum amount of technical resources, so that we can focus most of our efforts on delivering innovative applications to our clients.

The biggest opportunity for us is to focus there. As you look at the size of the data set and the growth of those data sets, positioning infrastructure to be able to stay with you is exciting for us and it’s a value proposition for our clients.

Entire environment

With a minimum amount of staff, we were able to move in nine months and virtualize our entire environment. When you talk about 600 servers and 300 TB of data, that's a pretty sizable enterprise and we're fully leveraging the vCloud Suite.

Our network is virtualized, our storage is virtualized, and our servers are virtualized. The release of vCloud Suite 5.5 and some of the additional network functionality and storage functionality that’s coming out with that is rather exciting. I think it's going to continue to add more value to our proposition.

Gardner: Some people say that a single point of management, when you have that comprehensive suite approach, comes in pretty handy, too.

Ericson: It does, because it gives you the capability of managing through a single pane of glass across your environments. I was going to accentuate that we’re about 50 percent complete in building on our catalog.

For our next steps, number one is that we’re looking at building upon the excellence of Press Ganey and building our next-generation enterprise data warehouse. We’re looking at leveraging from a DevOps perspective the VMware vCloud Suite, and we already have some pilots that are up and running. We'll continue to build that out.

As we deploy, not only are we maximizing our assets in delivering a secure environment for our clients, but we're also really working toward what I call engineering to zero. We’re completely automating and virtualizing those deployments and we're able to move those deployments, as we go from dev to test, and test to user acceptance testing, and then into a production environment.

Tickle: As we all know, there are a lot of hypervisors out there. We can all get that technology from a wide variety of sources. But to your question about the value with the stack, that’s what we look at. What's important now is not just the product stack, but the services stack.

We look at a company like VMware and say, "Site Recovery Manager in conjunction with vCloud Hybrid Services brings a DR solution to me as a SaaS vendor and that fits with my architecture and brings that service stack plus."

There's no comparing another hypervisor vendor to build out that stack of service. Again, we could probably talk about numerous examples, but that’s what stands out when I listen to the things that go on at the event and get to spend time with the people at VMware. That whole value stack that VMware is investing in is what looks so much more compelling than just picking pieces of technology.

Gardner: Looking to the future, Greg, based on what you've heard at VMworld about the general availability of vCloud Hybrid Services and the upgrade to the suite of private cloud support, what has you most excited? Was there something that surprised you? What is in the future road map for you?

A step further

Ericson: A couple of different things. The next release of NSX is exciting for us. It allows us to be able to take the virtualization of our network a step further. Also to be able to connect hypervisors into a hybrid-cloud situation is something that, as we evolve our maturity in terms of managing our data, is going to be exciting for us.
One of the areas that we're still teasing out and want to explore is how to tie in that accelerator for a big-data application into that. Probably, in 2014, what we're looking at is how to take this environment and really move from a DR kind of environment to a high-availability environment. I believe that we’re architected for that and because of the virtualization we can do that with a minimum amount of investment.

Wednesday, October 30, 2013

Learn how Visible Measures tracks an expanding universe of video and viewer use big data

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: HP.

The next edition of the HP Discover Performance Podcast Series examines how video advertising solutions provider Visible Measures delivers impactful metrics on video use and patterns.

Visible Measures measures via a massive analytics capability an ocean of video at some of the highest scales I've ever heard of. By creating very deep census data of everything that's happened in the video space, Visible Measures uses unique statistical processes to figure out exactly what patterns emerge within video usage at high speed and massive scale and granularity.
 
To learn more about how Visible Measures measures, please welcome Chris Meisl, Chief Technology Officer at Visible Measures Corp., based in Boston.

The discussion, which took place at the recent HP Vertica Big Data Conference in Boston, is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Here are some excerpts:
Gardner: Tell us a little bit about video metrics. It seems that this is pretty straightforward, isn't it? You just measure the number of downloads and you know how many people are watching a video -- or is there more to it?

Meisl: You'd think it would be that straightforward. Video is probably the fastest growing component of the Internet right now. Video consumption is accelerating unbelievably. When you measure a video, you're looking not only at whether someone viewed the video, but at how far they got into it. Did they rewind it, stop it, or replay certain parts? What happened at the end? Did they share it?

There are all kinds of events that can happen around a video. It's not like in the display advertising business, where you have an impression and you have a click. With video, you have all kinds of interactions that happen.

You can really measure engagement in terms of how much people have actually watched the video, and how they've interacted with a video while it's playing.
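
To make "how much people have actually watched" concrete, here is a small Python sketch of one possible engagement measure: the fraction of a video's running time that a viewer actually played, with rewinds and replays counted only once. The `(start, end)` segment format and the function name are assumptions for illustration, not Visible Measures' actual event schema or metric.

```python
def watched_fraction(segments, duration):
    """Fraction of the video covered by played segments, counting overlaps once.

    segments: iterable of (start_sec, end_sec) ranges the viewer played.
    duration: total length of the video in seconds (must be positive).
    """
    if duration <= 0:
        raise ValueError("duration must be positive")

    merged = []
    for start, end in sorted(segments):
        # Clamp each segment to the bounds of the video.
        start = max(0.0, start)
        end = min(float(duration), end)
        if end <= start:
            continue  # nothing was actually played in this segment
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    covered = sum(end - start for start, end in merged)
    return covered / duration


# Hypothetical session: watch 0-30s, rewind and watch 20-45s, skip to 100-120s.
segments = [(0, 30), (20, 45), (100, 120)]
fraction = watched_fraction(segments, 180)
print(f"watched {fraction:.1%} of the video")
```

Merging the ranges first is what separates coverage from raw play time: a viewer who replays the same ten seconds many times has interacted a lot but has still seen only ten seconds of the story.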

Gardner: This is an additional level of insight beyond what happened traditionally with television, where you need a Nielsen box or some other crude, if I could use that term, way of measuring. This is much more granular and precise.

Census based

Meisl: Exactly. The cable industry tried to do this on various occasions with various set-top boxes that would "phone home" with various information. But for the most part, like Nielsen, it's panel-based. On the Internet, you can be more census-based. You can measure every single video, which we do. So we now know about over half a billion videos and we've measured over three trillion video events.

Because you have this very deep census data of everything that's happened, you can use standard and interesting statistical processes to figure out exactly what's happening in that space, without having to extend a relatively small panel. You know what everyone is doing.

Gardner: And of course, this extends not only to programming or entertainment level of video, but also to the advertising videos that would be embedded or precede or follow from those. Right?

Meisl: Exactly. Advertising and video are interesting, because it's not just standard television-style advertising. In standard television advertising, there are 30-second spots that are translated into the Internet space as pre-roll, post-roll, mid-roll, or what have you. You're watching the content that you really want to watch, and then you get interrupted by these ads. This is something that we at Visible Measures didn't like very much.

We're promoting this idea of content marketing through video, and content marketing is a very well-established area. We're trying to encourage brands to use those kinds of techniques using the video medium.

That means that brands will tell more extensive stories in maybe three- to five-minute video segments -- that might be episodic -- and we then deliver that across thousands of publishers, measure the engagement, measure the brand-lift, and measure how well those kinds of video-storytelling features really help the brand to build up the trust that they want with their customers in order to get the premium pricing that that brand has over something much more generic.

Gardner: Of course, the key word there was "measures." In order to measure, you have to capture, store, and analyze. Tell us a little bit about the challenges that you faced in doing that at this scale with this level of requirements. It sounds as if even the real-time elements of being able to feed back that information to the ad servers is important, too.

Meisl: Right. The first part that you have to do is have a really comprehensive understanding of what's going on in the video space.

Visible Measures started with measuring all video that’s out there. Everywhere we can, we work with publishers to instrument their video players so that we get signals while people are watching videos on their site.

For the publishers that don't want to allow us to instrument their players, we can use more traditional Google spidering techniques to capture information on the view count, comment count, and things like that. We do that on a regular basis, a few times a day or at least once a day, and then we can build up metrics on how the video is growing on those sites.
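
As a rough illustration of building growth metrics from periodic polling, the sketch below takes view-count snapshots and derives views gained per hour between polls. The function name and the `(hours, view_count)` tuple layout are assumptions made for the example; the interview does not describe the actual data model.

```python
def views_per_hour(snapshots):
    """Return the view growth rate between each pair of consecutive polls.

    `snapshots` is a list of (hours_since_first_poll, view_count) tuples,
    ordered by time.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(snapshots, snapshots[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            raise ValueError("snapshots must be in strictly increasing time order")
        rates.append((v1 - v0) / elapsed)
    return rates


# Hypothetical polls of one video: at 0 hours, 8 hours, and 24 hours.
polls = [(0, 1000), (8, 1800), (24, 5000)]
rates = views_per_hour(polls)
```

Here the video gains 100 views per hour over the first interval and 200 per hour over the second, which is the kind of acceleration a once-or-twice-daily crawl can still reveal.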

Massive database

So we ended up building this massive database of video -- and we would provide information, or rather insight, based on that data, to advertisers on how well their campaigns were performing.

Eventually, advertisers started to ask us to just deliver the campaign itself, instead of giving just the insight that they would then have to try to convince various other ad platforms to use in order to get a more effective campaign. So we started to shift a couple of years ago into actual campaign delivery.

Now, we have to do more of a real-time analysis, because as you mentioned, you want to, in real time, figure out the best ways to target the best sites to send that video to, and the best way to tune that campaign in order to get the best performance for the brand.

Gardner: And so faced with these requirements, I assume you did some proofs of concept (POCs). You looked around the marketplace for what’s available and you’ve come up with some infrastructure that is so far meeting your needs.

Meisl: Yes. We started with Hadoop, because we had to build this massive database of video, and we would then aggregate the information in Hadoop and pour that into MySQL.

We quickly got to the point where it would take us so long to load all that information into MySQL that we were just running out of hours in the day. It took us 11 hours to load MySQL. It was a sharded MySQL cluster, and we couldn’t actually use it while it was being loaded. So you’d have to have two banks of it.

You only have a 12-hour window. Otherwise, you’ve blown your day. That's when we started looking around for alternate solutions for storing this information and making it available to our customers. We elected to use HP Vertica -- this was about four years ago -- because that same 11-hour load took two hours in Vertica. And we're not going to run out of money buying hard drives, because they compress it. They have impressive compression.

Now, as we move more into the campaign delivery for the brands that we represent, we have to do our measurement in real-time. We use Storm, which is a real-time stream processing platform and that writes to Vertica as the events happen.

So we can ask questions of Vertica as the events happen. That allows our ad service, for example, to have much more intelligence about what's going on with campaigns that are in-flight. It allows us to do much more sophisticated fraud detection. There are all kinds of possibilities that you can only do if you have access to the data as soon as it is generated.

Gardner: Clearly if a load takes 11 hours, you're well into the definition of big data. But I'm curious, for you, what constitutes big data? Where does big data begin from medium or non-big data?

Several dimensions

Meisl: There are several dimensions to big data. Obviously, there's the size of it. We process what we receive, maybe half a billion events per day, and we might peak at near a million events a minute. There is quite a bit of lunchtime video viewing in America, but typically in the evening, there is a lot more.
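
The two figures quoted here, roughly half a billion events a day and a peak near a million a minute, can be put side by side with a little arithmetic. The sketch below uses only those two approximate numbers; the variable names are illustrative.

```python
# Approximate figures quoted in the interview.
EVENTS_PER_DAY = 500_000_000
PEAK_EVENTS_PER_MINUTE = 1_000_000

MINUTES_PER_DAY = 24 * 60

# Average rate if the daily volume were spread evenly across the day.
average_per_minute = EVENTS_PER_DAY / MINUTES_PER_DAY

# How much higher the quoted peak is than that even-spread average.
peak_to_average = PEAK_EVENTS_PER_MINUTE / average_per_minute

# The same peak expressed per second.
peak_per_second = PEAK_EVENTS_PER_MINUTE / 60
```

On these numbers the average works out to about 347,000 events a minute, so the peak is a little under three times the average, or roughly 16,700 events a second that the ingest path has to absorb.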

The other aspect of big data is the nature of what's in that data, the unstructured nature, the complexity of it, the unexpectedness of the data. You don't know exactly what you're going to get ahead of time.

For information that’s coming from our instrumented players, we know what that’s going to be, because we wrote the code to make that. But we receive feeds from all kinds of social networks. We know about every video that's ever mentioned on Twitter, videos that are mentioned on Facebook, and other social arenas.

All of that's coming in via all kinds of different formats. It would be very expensive for us to have to fully understand those formats, build schemas for them, and structure it just right.

So we have an open-ended system that goes into Hadoop and can process that in an open-ended way. So to me, big data is really its volume plus the very open-ended, unknown payloads in that data.

Gardner: How do you know you're succeeding here? Clearly, going from 11 hours to two hours is one metric. Are there other metrics of success that you look to -- they could be economic, performance, or concurrent query volumes?

Tell me what you define as a successful analytics platform.

Meisl: At the highest level, it's going to be about revenue and margin. But in order to achieve the revenue and margin goals that we have, obviously we need to have very efficient processes for doing the campaign delivery and the measurement that we do.

As a measurement company, we measure ourselves and watch how long it takes to generate the reports that we need, or for how responsive we are to our customers for any kind of ad-hoc queries that they want or special custom reports that they want.

We're continuously looking at how well we optimize delivery of campaigns and we're continuously improving that. We have corporate goals to improve our optimization quarter-over-quarter.

In order to do that, you have to keep coming up with new things to measure and new ways to interpret the data, so you can figure out exactly which video you want to deliver to the right person, at the right time, in the right context.

Looking down the road

Gardner: Chris, we're here at the Big Data Conference for HP Vertica and its community. Looking down the road a bit, what sort of requirements do you think you are going to need later? Are there milestones or is there a road map that you would like to see Vertica and HP follow in order to make sure that you don't run out of runway again sometime?

Meisl: Obviously, we want HP and Vertica to continue to scale up, so that it is still a cost-effective solution as the volume of data will inexorably rise. It's just going to get bigger and bigger and bigger. There's no going back there.

In order to be able to do the kind of processing that we need to do without having to spend a fortune on server farms, we want Vertica, in particular, to be very efficient at the kinds of queries that it needs to do and proficient at loading the data and at accommodating the questions we ask of it.

In addition to that, what's particularly interesting about Vertica is its analytic functions. It has a very interesting suite of analytic functions that extends beyond the normal standard SQL analytic functions based on time series and pattern matching. This is very important to us, because we do fraud detection, for example. So you want to do pattern matching on that. We do pacing for campaigns, so you want to do time series analysis for that.
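
As a small illustration of what campaign pacing involves, the sketch below compares views delivered so far against an even-delivery target. It is plain Python rather than Vertica's own time-series functions, and the even-delivery assumption, the function name, and the parameters are all hypothetical.

```python
def pacing_ratio(goal_views, hours_total, hours_elapsed, views_delivered):
    """Return delivered views divided by the views expected so far.

    Assumes the campaign should deliver evenly over its whole flight.
    A value below 1.0 means the campaign is behind pace; above 1.0, ahead.
    """
    if goal_views <= 0 or hours_total <= 0:
        raise ValueError("goal_views and hours_total must be positive")
    if not 0 < hours_elapsed <= hours_total:
        raise ValueError("hours_elapsed must be within the campaign flight")
    expected_so_far = goal_views * hours_elapsed / hours_total
    return views_delivered / expected_so_far


# Hypothetical campaign: 100,000 views over 100 hours, 20,000 delivered after 25 hours.
ratio = pacing_ratio(100_000, 100, 25, 20_000)
```

A ratio of 0.8 here says the campaign has delivered 80 percent of what an even schedule would expect by this point, which is the sort of signal a delivery system could use to adjust targeting while the campaign is in flight.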

We look forward to HP and Vertica really pushing forward on new analytic capabilities that can be applied to real-time data as it flows into the Vertica platform.

Tuesday, October 22, 2013

Complex carrier network performance data on HP Vertica yields performance and customer metrics boon for Empirix

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: HP.

The next edition of the HP Discover Performance Podcast Series explores how network testing, monitoring, and analytics provider Empirix developed unique and powerful data processing capabilities.

Empirix uses an advanced analytics engine to continuously and proactively evaluate carrier network performance and customer experience metrics -- amid massive data flows -- to automatically identify issues as they emerge.

To learn more about how a combination of large-scale, real-time performance and pervasive data access made the HP Vertica analytics platform stand out to support such demands for Empirix, join Navdeep Alam, Director of Engineering, Analytics and Prediction at Empirix, based in Billerica, Mass.

The discussion, which took place at the recent HP Vertica Big Data Conference in Boston, is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Here are some excerpts:
Gardner: Why do you have such demanding requirements for data processing and analysis?

Alam: What we do is actively and passively monitor networks. When you're in a network as a service provider, you have the opportunity to see the packets within that network, both on the control plane and on the user plane. That just means you're looking at signaling data and also user plane data -- what's going on with the behavior; what's going at the data layer. That’s a vast amount of data, especially with mobile, and most people doing stuff on their devices with data.

When you're in that network and you're tapping that data, there is a tremendous amount of data -- and there's a tremendous amount of insights about not only what's going on in the network, but what's going on with the subscribers and users of that network.

Empirix is able to collect this data from our probes in the network, as well as being able to look at other data points that might help augment the analysis. Through our analytics platform we're able to analyze that data, correlate it, mediate it, and drive metrics out of that data.

That’s a service for our customers, increasing value from that data, so that they can turn around a return on investment (ROI) and understand how they can leverage their networks better to increase operations and so forth. They can understand their customers better and begin to analyze, slice and dice, and visualize data of this complex network.

They can use our platform as well to do proactive and predictive analysis, so that we can create even better ROI for our customers by telling them what potentially might go wrong and what might be the solution to get around that to avoid a catastrophe.

New opportunities

Gardner: It’s interesting that not only is this data being used for understanding the performance on the network itself, but it's giving people business development and marketing information about how people are using it and where the new opportunities might be.

Is that something fairly new? Were you able to do that with data before, or is it the scale and ability to get in there and create analysis in near-real-time that’s allowed for such a broad-based multilevel approach to data and analysis?

Alam: This is something we've gotten into. We definitely tried to do it before with success, but we knew that in order to really tackle mobile and the increasing demands of data, we really had to up the ante.

Our investment with HP Vertica and how we've introduced that in our new analytics platform, Empirix IntelliSight 1.0, that recently came out, is about leveraging that platform -- not only for scalability and our ability to ingest and process data, but to look at data in its more natural format, both as discrete data, and also as aggregate data. We allow our customers to view that data ad hoc and analyze that data.

It positioned us very well. Now that we have a central point from which all this data is being processed and analyzed, we now run analytics directly on this data, increasing our data locality and decreasing the data latency. This definitely ups our ante to do things much faster, in near real time.

Gardner: Obviously, the sensors, probes, agents, and the ability to pull in the information from the network needs to reside or be at close proximity to the network, but how are you actually deployed? Where does the infrastructure for doing the data analysis reside? Is it in the networks themselves, or is there a remote site? Maybe you could just lay out the architecture of how this is set up.

Alam: We get installed on site. Obviously, the future could change, but right now we're an on-premise solution. We're right where the data is being generated, where it’s flowing, and because of that we're able to gain access to the data in real-time.

One of the things we learned is that this is a tremendous amount of data. It doesn't make sense for us to just hold it and assume that we will do something interesting with it afterward.

The way we've approached our customers is to say, "What kind of value do you see in this data? What kind of metrics or key performance indicators (KPIs), or what do you think is valuable in this data?" We then build a framework that defines the value that they can gain from data -- what are the metrics and what kind of structure they want to apply to this data. We're not just calculating metrics, but we're also applying some sort of model that gives this data some structure.

The data goes through what we call the Empirix Intelligent Data Mediation and Correlation (IDMC) system, which is really an analytics calculator. It puts our data into the Vertica system, so that at that point we have meaningful, actionable data that can be used to trigger alarms, to showcase thresholds, and to give customers great insight into what's going on in their network.
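
To show what threshold-driven alarms on KPIs might look like in the simplest case, here is a minimal Python sketch. The `Threshold` type, the KPI names, and the limits are invented for illustration and are not taken from the Empirix product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A limit on one KPI; `higher_is_worse` says which side of the limit is a breach."""
    kpi: str
    limit: float
    higher_is_worse: bool = True


def breached(thresholds, measurements):
    """Return the names of KPIs whose current measurement crosses its threshold."""
    alarms = []
    for threshold in thresholds:
        value = measurements.get(threshold.kpi)
        if value is None:
            continue  # No measurement for this KPI in the current interval.
        if threshold.higher_is_worse:
            is_breach = value > threshold.limit
        else:
            is_breach = value < threshold.limit
        if is_breach:
            alarms.append(threshold.kpi)
    return alarms


# Hypothetical KPIs and limits for one measurement interval.
thresholds = [
    Threshold("dropped_call_rate", 0.02),
    Threshold("call_setup_success_rate", 0.98, higher_is_worse=False),
]
measurements = {"dropped_call_rate": 0.035, "call_setup_success_rate": 0.991}
alarms = breached(thresholds, measurements)
```

Only `dropped_call_rate` raises an alarm in this example. Defining the KPIs and limits up front, as described above, is what lets a check this simple run against data as it arrives rather than against a raw archive after the fact.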

Growing the business

From that, they can do various things, such as solve problems proactively, reach out to the customers to deal with those issues, or to make better investments with their technology in order to grow their business.

Gardner: How long have you been using Vertica and how did that come to be the choice that you made? Perhaps you could also tell us a little bit about where you see things going in terms of other capabilities that you might need or a roadmap for you?

Alam: We've been using Vertica for a few years, at least three or four, even before I came on-board. And we're using Vertica primarily for its ability to input and read data very quickly. We knew that, given our solutions, we needed to load a lot of data into the system and then read a lot of data out of it fast and to do it at the same time.

At that time, the database systems we used just couldn't meet the demands for the ever-growing data. So we leveraged Vertica there, and it was used more as an operational data store. When I came on board about a year-and-a-half ago, we wanted to evolve our use of Vertica to be not just for data warehousing, but a hybrid, because we knew that in supporting a lot of different types of data, it was very hard for us to structure all of those types of data.

We wanted to create a framework from which we can define measures and metrics and KPIs and store it in a more flat system from which we can apply various models to make sense of that data.

That really presented us a lot of challenges, not only in scalability, but our ability to work and play with data in various ways. Ultimately, we wanted to allow customers to play with this data at will and to get response in seconds, not hours or minutes.

It required us to look at how we could leverage Vertica as an intelligent data-storage system from which we could process data, store it, and then get answers out of that data very, very quickly. Again, we were looking for responses in a second or so.

Now that we've put all of our data in the data basket, so to speak, with Vertica, we wanted to take it to the next level. We have all this data, covering the whole data value chain from discrete data to aggregate data, all in one place, with conforming dimensions, where the one truth of that data exists in one system.

We want to take it to the next step. Can we increase our analytical capabilities with the data? Can we find that signal from the noise now that we have all this data? Can we proactively find the patterns in the data, what's contributing to that problem, surface that to our customers, and reduce the noise that they are presented with?

Solving problems

Instead of showing them that 50 things are wrong, can I show them that 50 things are wrong, but that these one or two issues are actually impacting your network or your subscribers the most? Can we proactively tell them what might be the cause or the reason for that and how to solve it?

The faster we can load this data, the faster we can retrieve the value out of this data and find that needle in the haystack. That’s where the future resides for us.

Gardner: Clearly, you're creating value and selling insight to the network to your customers, but I know other organizations have also looked at data as a source of revenue in itself. The analysis could be something that you could market. Is there an opportunity with the insight you have in various networks -- maybe in some aggregate fashion -- to create analysis of behavior, network use, or patterns that would then become a revenue source for you, something that people would subscribe to perhaps?

Alam: That's a possibility. Right now, our business has been all about empowering our customers and giving them the ability to leverage that data for their end use. You can imagine, as a service provider, having great insight into their customers and the over-the-top applications that are being leveraged on their network.

Could they then use our analytics and the metadata that we're generating about their network to empower their business systems and their operations to make smarter decisions? Can they change their marketing strategy or even their APIs about how they service customers on their network to take advantage of the data that we are providing them?

The opportunity to grow other business opportunities from this data is tremendous, and it's going to be exciting to see what our customers end up doing with their data.

Gardner: Are there any metrics of success that are particularly important for you? You've mentioned, of course, scale and volume, but things like concurrency, the ability to do queries from different places by different people at the same time, are important. Help me understand what some of the other important elements of a good, strong data-analysis platform would be for you.

Alam: Concurrency is definitely important. For us it's about predictability or linear scalability. We know that when we do reach those types of scenarios to support, let’s say, 10 concurrent users or 100 concurrent users, or to support a greater segmentation of data, because we have gone from 10 terabytes to 30 terabytes, we don't have to change a line of code. We don't have to change how or what we are doing with our data. Linear scalability, especially on commodity hardware, gives us the ability to take our solution and expand it at will, in order to deal with any type of bottlenecks.

Obviously, over time, we'll tune it so that we get better performance out of the hardware or virtual hardware that we use. But we know that when we do hit these bottlenecks, and we will, there is a way around that and it doesn't require us to recompile or rebuild something. We just have to add more nodes, whether it’s virtual or hardware.
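
Under an idealized reading of linear scalability, capacity planning reduces to proportional arithmetic. The sketch below assumes that capacity grows in direct proportion to node count, which real clusters only approximate; the function name and the three-node starting point are hypothetical.

```python
import math


def nodes_needed(current_nodes, current_load, target_load):
    """Estimate the nodes required for `target_load`, assuming strictly linear scaling.

    Load can be any consistent unit, such as terabytes stored or concurrent users.
    """
    if current_nodes <= 0 or current_load <= 0 or target_load <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil(current_nodes * target_load / current_load)


# Hypothetical cluster: 3 nodes handle 10 terabytes today; the data grows to 30 terabytes.
nodes_for_30_tb = nodes_needed(3, 10, 30)
```

Tripling the data from 10 to 30 terabytes triples the estimate from 3 nodes to 9, with no change to the application itself, which is the property being described here.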

Open FAIR certification launched

This guest post comes courtesy of Jim Hietala, The Open Group Chief of Security.

By Jim Hietala

The Open Group today announced the new Open FAIR Certification Program aimed at risk analysts, bringing a much-needed professional certification to the market that is focused on the practice of risk analysis. Both the Risk Taxonomy and Risk Analysis standards, standards of The Open Group, constitute the body of knowledge for the certification program, and they advance the risk analysis profession by defining a standard taxonomy for risk, and by describing the process aspects of a rigorous risk analysis.

We believe that this new risk analyst certification program will bring significant value to risk analysts, and to organizations seeking to hire qualified risk analysts. Adoption of these two risk standards from The Open Group will help produce more effective and useful risk analysis. This program clearly represents the growing need in our industry for professionals who understand risk analysis fundamentals. Furthermore, the mature processes and due diligence The Open Group applies to our standards and certification programs will help make organizations comfortable with the groundbreaking concepts and methods underlying FAIR. This will also help professionals looking to differentiate themselves by demonstrating the ability to take a “business perspective” on risk.

In order to become certified, risk analysts must pass an Open FAIR certification exam. All certification exams are administered through Prometric, Inc. Exam candidates can start the registration process by visiting Prometric’s Open Group Test Sponsor Site at www.prometric.com/opengroup. With 4,000 testing centers in its IT channel, Prometric brings Open FAIR Certification to security professionals worldwide. For more details on the exam requirements, visit http://www.opengroup.org/certifications/exams.

Available November 1

Training courses will be delivered through an Open Group accredited channel. The accreditation of Open FAIR training courses will be available from November 1, 2013.

Our thanks to all of the members of the risk certification working group who worked tirelessly over the past 15 months to bring this certification program, along with a new risk analysis standard and a revised risk taxonomy standard, to the market. Our thanks also to the sponsors of the program, whose support is important to building this program. The Open FAIR program sponsors are Architecting the Enterprise, CXOWARE, SNA, and The Unit.

Lastly, if you are involved in risk analysis, we encourage you to consider becoming Open FAIR certified, and to get involved in the risk analysis program at The Open Group. We have plans to develop an advanced level of Open FAIR certification, and we also see a great deal of best practices guidance that is needed by the industry.

For more information on the Open FAIR certification program visit http://www.opengroup.org/certifications/openfair

You may also wish to attend a webcast scheduled for 7th November, 4pm BST that will provide an overview of the Open FAIR certification program, as well as an overview of the two risk standards. You can register here.
