Friday, November 14, 2014

HP Analytics blazes new trails in examining business trends from myriad data

The next BriefingsDirect deep-dive big data thought leadership interview examines how HP analyzes its own vast data warehouses to derive new insights for its global operations, extensive supply chain, sales organization, global marketing groups, and customers.

We'll explore how the Analytics Group at HP, based in India, sifts through myriad internal data sources, as well as joins with other public data sets, to deliver entirely new intelligence value that helps make business more responsive and efficient.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy.

To learn how, BriefingsDirect sat down with Pramod Singh, Director of Digital and Big Data Analytics at HP Analytics in Bangalore, India, at the recent HP Big Data 2014 Conference in Boston. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Tell us a little bit about the Analytics Group at HP, what you do, and what’s the charter of your organization.

Singh: We have a big analytics organization in HP called Global Analytics, and it serves the analytics needs of most of HP. About 80 to 90 percent of the analytics happening inside HP comes out of this ecosystem. We do analytics across the entire food chain at HP, which includes supply chain, marketing, and sales.

What I personally lead is an organization called Digital Analytics, and we are responsible for doing analytics across all digital properties for HP. That includes eCommerce, social media, search, and campaign analytics. Additionally, we have a Center of Excellence for Big Data Analytics, where we're using HP's big-data technologies, the framework called HAVEn, to help develop big-data solutions for HP customers as well as for internal HP.
Gardner: Obviously, HP is a very large global company. What sort of datasets are we talking about here? What’s the volume that you're working with?

Data explosion

Singh: As you know, a data explosion is happening. On one end, HP has done a very good job over the last six to seven years of getting most of its enterprise data into something called an enterprise data warehouse. We're talking about close to two petabytes of data, all of it structured.

The great part of this journey is that we have consolidated data from 700-800 different data marts into one enterprise data warehouse over the last three to four years. A lot of data that is not part of the enterprise is also becoming an important part of making business decisions.

A lot of the data I personally deal with in the digital space is what we call human-generated data, the social-media data that no enterprise owns. It's open for anybody to use. What I've started to see is that, on one hand, we've done a really good job of getting data into the enterprise and getting value out of it.

We've also started to analyze and harvest the data that is out in the open space. It could be blogs, Twitter feeds, or Facebook data. Combining that is what’s bringing real business value.

The Global Analytics organization is more than 1,000 people spread across different parts of the world. A big chunk of that is in Bangalore, India, but we have folks in the US and the UK. We have a center in Guadalajara, Mexico, and a couple of other locations in India. My particular organization is close to 100 people.

I have a PhD in pure mathematics, and before that I earned an MBA in marketing. It's a little bit of an awkward mix, and I got into the analytics space in the mid-'90s working for Walmart.
I built out Walmart's Assortment Planning System in the late '90s and then came to HP in 2000, leading an advanced data-mining center in Austin, Texas. From there, I evolved into doing e-business analytics for a few years and then moved to customer knowledge management. I spent five years in IT developing analytics platforms.

About a year-and-a-half ago, I got an opportunity to lead the big-data practice for this organization called Global Analytics. In five years, it had gone from five people to more than 1,000, and that intrigued me a lot. I took the opportunity and moved to India to lead that team.

More insights

Gardner: Pramod, when we look back into this data, do you gain more insights knowing what you're looking for, or not knowing what you're looking for? What kind of insights were the unexpected consequences of your putting together this type of data infrastructure and then applying big-data analytics to it?

Singh: We deal with that day-in and day-out. I'll give you a couple of examples there. This is something that happened about three or four years ago at HP. We were looking at a classic problem in marketing to US small and medium-sized businesses (SMBs). We had a fixed budget for marketing, and across the US there are more than 20 million SMBs. The classic definition of an SMB is any business with 100-500 employees.

HP had an install base covering a small part of that. We realized that particular segment of SMBs is squeezed between the classic consumer, where you can do mass marketing such as TV advertising, and the enterprise, where you can actually assign people who have a relationship with the account. SMBs sit in between those two extremes.

On one hand, you can't reach out to every single one of them. It's just way too expensive to do that. On the other hand, if you simply market to all of them broadly, you don't get the best out of it.

We were starting to work on something like that. I was approached by a vice president in marketing who said revenues were declining and they had a limited marketing budget. They didn't know what to do.

This is where one of those unexpected things came in. I said, "Let's see whether, in that install base, there are different segments of customers that are behaving differently." That led us on a journey where we said, "How do we start to do that right? Let's figure out the different attributes of data that we can capture."

On one hand, if you look at SMBs, you can capture who they are, what industry segment they're in, how many employees they have, where they are based, and who the CEO is. That's what we call firmographics.

On the other hand, you have classes of data involving their interactions with HP. It could be things like how many PCs or servers they bought, how long ago they bought them, and how much money they spent, the whole transactional aspect of it.

Then there are derived attributes. You may be able to derive that in the last year they came to us four times. What interactions did we have on the website? For example, did they come to us through a web channel? If they did, how many email offers were sent to them? How many of those were clicked? How many of those converted? Those are the classes of data that we could capture.
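To make those three classes of attributes concrete, here is a minimal, hypothetical Python sketch of how firmographic, transactional, and derived web-interaction attributes might be joined into one modeling table per SMB account. The column names and sample values are illustrative assumptions, not HP's actual schema.

```python
import pandas as pd

# Hypothetical firmographics: who the business is.
firmographics = pd.DataFrame({
    "account_id": [1, 2, 3],
    "industry": ["retail", "healthcare", "manufacturing"],
    "employees": [120, 340, 210],
    "state": ["TX", "CA", "OH"],
})

# Hypothetical transactions: what they bought from HP and when.
transactions = pd.DataFrame({
    "account_id": [1, 1, 2, 3, 3, 3],
    "order_date": pd.to_datetime(
        ["2013-11-02", "2014-03-15", "2014-01-20",
         "2013-12-01", "2014-02-10", "2014-06-30"]),
    "amount_usd": [5200, 1800, 12500, 900, 2400, 7600],
})

# Hypothetical web/email interactions used to derive behavioral attributes.
interactions = pd.DataFrame({
    "account_id": [1, 1, 2, 2, 2, 3],
    "emails_sent": [4, 2, 6, 3, 1, 5],
    "emails_clicked": [1, 0, 3, 1, 0, 2],
    "web_visits": [2, 1, 4, 2, 1, 0],
})

# Derived attributes: recency, frequency, spend, and engagement rates.
snapshot = pd.Timestamp("2014-09-01")
tx_features = transactions.groupby("account_id").agg(
    orders_last_year=("order_date", "size"),
    total_spend=("amount_usd", "sum"),
    days_since_last_order=("order_date", lambda d: (snapshot - d.max()).days),
)
web_features = interactions.groupby("account_id").agg(
    emails_sent=("emails_sent", "sum"),
    emails_clicked=("emails_clicked", "sum"),
    web_visits=("web_visits", "sum"),
)
web_features["click_rate"] = (
    web_features["emails_clicked"] / web_features["emails_sent"].clip(lower=1)
)

# One row per account: firmographics + transactional + derived behavior.
model_table = (firmographics.set_index("account_id")
               .join(tx_features)
               .join(web_features))
print(model_table)
```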

The question then became what do we do with that? Again, when you do data mining and analytics, you may not know where this will lead you.

Mathematical modeling

We thought that maybe there were different classes of customers. We pulled our data together and started to do mathematical modeling, using analytical techniques such as clustering and K-Means. We started to get some results and to analyze them. In this type of situation, you have to be careful, because some things may look mathematically correct but may not have real business value behind them.

Once we started to look at those things, we went through multiple iterations. We realized that we were not getting segments or clusters that were very distinct. One day, I was driving home in Austin, and I said, "You know what? I don't control who they are, but we have a reasonably good understanding of what they're doing with HP."

So we started to do clustering based only on those behavioral attributes, and that's where the "aha" moment came. We started to find these clusters, which we call segments, and eventually found one segment of 7 to 8 percent of the population that brought in 45 percent of revenue.
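The pattern Singh describes maps onto standard clustering tooling. Below is a hedged sketch using scikit-learn's K-Means on behavior-only attributes; the feature names, the synthetic data, the choice of five clusters, and the revenue roll-up are all illustrative assumptions rather than HP's actual model.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical behavioral attributes for an install base of SMB accounts.
accounts = pd.DataFrame({
    "orders_last_year": rng.poisson(3, 5000),
    "total_spend": rng.gamma(2.0, 4000.0, 5000),
    "days_since_last_order": rng.integers(1, 720, 5000),
    "click_rate": rng.beta(2, 8, 5000),
    "web_visits": rng.poisson(5, 5000),
})

# Cluster on behavior only (no firmographics), after standardizing scales.
X = StandardScaler().fit_transform(accounts)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
accounts["segment"] = kmeans.fit_predict(X)

# Check how revenue concentrates across segments: the interesting finding is
# a small segment contributing a disproportionate share of total spend.
summary = accounts.groupby("segment").agg(
    accounts=("total_spend", "size"),
    revenue=("total_spend", "sum"),
)
summary["accounts_pct"] = 100 * summary["accounts"] / summary["accounts"].sum()
summary["revenue_pct"] = 100 * summary["revenue"] / accounts["total_spend"].sum()
print(summary.sort_values("revenue_pct", ascending=False))
```

The interesting output is not the cluster labels themselves but the revenue concentration check at the end, which is where a small, high-value segment would show up.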

The marketers started to say that this was a gold mine. That's something we never expected to happen. We put together a structure. Once we figured out these four or five clusters, we tried to figure out why they were clustered together. What was common?
We built out a primary research effort, where we took a random sample out of each one of those clusters, interviewed those people, and were able to build a very good profile of what these segments were.

There are 20 million SMBs in the US, and we were able to build a model to predict which of those prospects were similar to the clusters we had. That's how we were able to find prospects that looked like our most profitable customers, which we ended up calling Vanguards. That resulted in a tremendous dollar increment for HP. It's a good example of what you talked about: finding unexpected things.
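Scoring the broader prospect universe for similarity to the high-value segment can be framed as a look-alike classification problem. The following is a sketch under assumptions: it uses synthetic firmographic features (the kind observable for both customers and prospects), synthetic "Vanguard" labels, and a random forest chosen purely for illustration, not whatever model HP actually deployed.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
n = 10000

# Hypothetical firmographic features known for customers AND prospects.
def make_firmographics(rows):
    return pd.DataFrame({
        "employees": rng.integers(100, 500, rows),
        "industry_code": rng.integers(0, 12, rows),
        "years_in_business": rng.integers(1, 40, rows),
        "metro_area": rng.integers(0, 50, rows),
    })

customers = make_firmographics(n)
# Synthetic label: 1 if the existing customer fell into the high-value
# "Vanguard" cluster found earlier. The rule below is invented so the
# sketch has a learnable signal; real labels come from the clustering step.
signal = 0.004 * (customers["employees"] - 100) + 0.8 * (customers["industry_code"] < 3)
customers["is_vanguard"] = (signal + rng.normal(0, 1, n) > 2.2).astype(int)

X = customers.drop(columns="is_vanguard")
y = customers["is_vanguard"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("holdout AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Score the prospect universe and rank it for targeted marketing.
prospects = make_firmographics(20000)
prospects["vanguard_score"] = model.predict_proba(prospects)[:, 1]
target_list = prospects.sort_values("vanguard_score", ascending=False).head(1000)
print(target_list.head())
```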

We just wanted to analyze data. It led us on a journey, and we ended up finding a customer group we weren't even aware of. Then we could build a marketing strategy to actually target them and get some value out of it.

Gardner: At the Big Data Conference, I've spoken to other organizations that are creating an analytics capability and then exposing it to as many of their employees as possible, hoping for this very sort of unexpected positive benefit. Is there a way that you're taking your analytics, either through visualization or tools, and then allowing a larger population within HP to experiment with it?

Singh: We're trying to democratize analytics as much as we can. One thing we're realizing is that to get the full value, you don't want data to stay in silos. So there are a couple of things you have to do. In terms of building out an ecosystem where you have a good set of motivated people and where you can give them a career path, we have created this organization called Global Analytics. You get a critical mass of people who challenge each other, learn from each other, and do a lot of analytics.

But it's also very important that, on the consumption side, you have people who are analysts, understand analytics, and get the best value out of it. So we try to create that ecosystem. We have seen both ends of it.

Good career path

If you assign just one data miner or analytics person to a team, sometimes that person does not find an ecosystem in which to challenge himself or herself. We're trying to do it on both sides of the fence, so that we can provide people with a good career path.

Hiring these folks is not easy. Once you've hired them, retaining them is not easy. You want to make sure to create an ecosystem where it’s challenging enough for these people to work. It also has to be an ecosystem where you continually challenge them and keep training them.

The analytical techniques are evolving. When I started doing this, things were stable for years. Now, newer classes of data are coming in, newer techniques are coming in, and newer classes of business problems are coming in. It's very important that we keep the ecosystem going. So we try to do it on both sides.

Gardner: Very interesting. HP, of course, has its own line of products for big-data analysis. You're such a large global enterprise that you're doing lots of analysis, as any good business should, but you're also being asked to show how this works. Are there some specific use cases that demonstrate for other enterprises what you've learned yourselves?

Singh: There are several that we can talk about. One is in the social-media space. I briefly talked about that. My career evolved around doing analytics on what I call "data inside the enterprise." But over the last couple of years, we started to look at data outside the enterprise.

Recently we looked at a bank. We were able to harvest data from the Internet, publicly available data like Glassdoor, for example. Glassdoor is a website where employees of a company can post their feedback, talk about the company, and rate things.

We were presenting to the executives of this particular bank, and we were able to gather all the data and tell them about overall employee morale. We figured out that the work-life balance for the employees wasn't very good.

The main things the employees weren't happy about were the leave policy and the vacation policy. We drilled down and figured out that the bankers seemed to be fairly happy, but the IT staff and analysts weren't very happy. Again, this is one example where we didn't ask for a single line of data from the customer. This data is publicly available. You and I, or anybody else, can go get it. I can do the same analysis for HP or any other company.
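As a rough illustration of that kind of analysis on public review text, here is a small Python sketch. The reviews, department tags, and topic keywords are invented, and the scorer is NLTK's off-the-shelf VADER analyzer rather than whatever tooling HP used; the pattern is the same, though: score each review's sentiment, then slice by employee group and by topic.

```python
# Assumes: pip install nltk, then nltk.download("vader_lexicon") once.
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import pandas as pd

# Invented reviews standing in for harvested Glassdoor-style text.
reviews = pd.DataFrame({
    "department": ["IT", "Banking", "Analytics", "IT", "Banking"],
    "text": [
        "Great pay but the vacation policy is terrible and leave is hard to get.",
        "Strong bonuses, good clients, I enjoy the work.",
        "Interesting projects but work-life balance is poor and hours are long.",
        "On-call rotation never ends, no time off approved.",
        "Supportive managers and fair leave policy.",
    ],
})

# Hypothetical topic keywords used to tag each review.
topics = {
    "work_life_balance": ["work-life", "hours", "on-call", "balance"],
    "leave_policy": ["vacation", "leave", "time off"],
    "compensation": ["pay", "bonus", "bonuses", "salary"],
}

sia = SentimentIntensityAnalyzer()
# Compound score runs from -1 (very negative) to +1 (very positive).
reviews["sentiment"] = reviews["text"].apply(
    lambda t: sia.polarity_scores(t)["compound"])

for topic, keywords in topics.items():
    reviews[topic] = reviews["text"].str.lower().apply(
        lambda t: any(k in t for k in keywords))

# Sentiment by employee group, and by topic: the kind of drill-down that
# showed bankers happier than IT staff, with leave policy as the sore spot.
print(reviews.groupby("department")["sentiment"].mean())
for topic in topics:
    print(topic, reviews.loc[reviews[topic], "sentiment"].mean())
```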

That's where I believe the classes of analytics we're doing are changing. A lot of the time, your competitive differentiator is the ability to do things with that data. Data is a corporate asset and will remain one, but this class of what we call user-generated data is changing analytics as a whole. The ability to harvest it and, more importantly, to get value out of it will be the competitive differentiator.

Gardner: Any other use cases that demonstrate the power of a particular type of platform, let’s say Vertica in HAVEn, where you've got the power of a columnar architecture and you've got the ability to bring in unstructured data from Autonomy? Maybe there are a couple of use cases that demonstrate the unique attributes of HAVEn when it comes to inclusivity and the comprehensive nature of information today?

Game changer

Singh: Let me talk about a couple of things that have happened in the HAVEn ecosystem. One of the main workhorses in HAVEn is our massively parallel database, Vertica. In addition to being a database that can ingest large volumes of data very quickly and deliver query performance, the game-changer for me as an analytics practitioner has been the ability to do analytics in-database.

If I look at my career over the last 20-22 years, most of the time what happens in the analytics space is that you have data residing in a database or an enterprise data warehouse. When you want to build a model, you take the data out and use an analytics platform like SAS, R, or SPSS. You do something there, and then you either bring the data back into the environment or run the models and publish them out.

What Vertica has done that's unique is give us a framework. Through its user-defined function (UDF) framework, we could build a data-mining model, run it directly on the database engine, and take the output out.

An example we took to HP Discover a couple of months ago was trying to predict a failure of a machine before the actual failure happens. HP has these big machines and big printers, which are very expensive.

Like a lot of high-end devices these days, they send out a lot of data. They send out data about when you're using the machine. The sensors send out a lot of information: the pressure of the valves, the temperature they're operating at, the throughput they're giving you, the number of pages you've printed.

They also give you data on the events when the machine was not performing optimally or actually failed. We were able to ingest all that data, put it into the Vertica platform, and build predictive models using the open-source R language. We built a model that can predict the failure of a machine.

Looking at each component of failure, we could predict with a certain probability when a machine will fail, so our service reps can actually be proactive and not wait for the machine to fail. That's one example of doing in-database data mining using Vertica.
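The modeling step of that use case can be sketched offline in Python; in production, the transcript describes pushing the equivalent logic into Vertica through its R-language UDF support so scoring runs next to the data. The sensor features, failure labels, and gradient-boosting model below are assumptions for illustration, not HP's actual implementation.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 20000

# Hypothetical windowed sensor readings per machine, plus a label saying
# whether the machine failed within the following 7 days.
telemetry = pd.DataFrame({
    "valve_pressure_mean": rng.normal(50, 5, n),
    "temperature_max": rng.normal(70, 8, n),
    "pages_printed": rng.integers(0, 50000, n),
    "error_events_last_7d": rng.poisson(1.5, n),
})
# Synthetic label rule: failures are more likely for hot machines that have
# logged many error events (invented purely so the sketch has a signal).
risk = (0.04 * (telemetry["temperature_max"] - 70)
        + 0.6 * telemetry["error_events_last_7d"])
telemetry["fails_within_7d"] = (risk + rng.normal(0, 1, n) > 2.5).astype(int)

X = telemetry.drop(columns="fails_within_7d")
y = telemetry["fails_within_7d"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

# Failure probability per machine window, so service reps can act before
# the machine actually goes down.
failure_prob = model.predict_proba(X_test)[:, 1]
print("machines flagged for proactive service:", int((failure_prob > 0.5).sum()))
```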

Another example used more components around the social-media space. One of the problems in the social-media space, and I think you guys are probably familiar with this, is finding influencers.

I gave a talk yesterday about how you figure that out. There are classical ways, if you go by uni-dimensional measures such as the number of followers or retweets someone has. By that measure, Barack Obama or Lady Gaga would be big influencers, but Barack Obama, for cloud computing for HP, may not be a very big influencer.

So you build those classes of algorithms. My team has built three patented algorithms to identify influencers in this space. We've also built out a framework where we can source that data from the social-media space and drop it into a Hadoop environment.
We use Autonomy to enrich it and attach sentiment, and then drop the data into the Vertica environment. In that Vertica environment, you run the algorithms and get an output. Then you can score and predict who the influencers are for the topic you're looking at.

Influencers

I gave the example of Barack Obama. In general he's a big influencer, but he is not an influencer for all topics. Maybe in politics or the US government he's a big influencer, but not for cloud computing. Influence is also a function of time. Somebody like Diego Maradona probably was a big influencer in soccer in the '90s, but in 2014, not so much.

You have to make sure that you can incorporate those as part of the logic of your algorithm. We've been able to use the multiple components of HAVEn and build out a complete framework where we can tell numerically who the main influencers are and how influential they are. For example, if you get a score of 93 and I get a score of 22, you are almost four times as influential as I am.
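HP's influencer algorithms are patented and not described in the transcript, so the following Python sketch is only a toy illustration of the two properties Singh calls out: influence is topic-specific and it decays over time. The keyword list, half-life, engagement weighting, and 0-100 normalization are all assumptions.

```python
import math
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Post:
    author: str
    text: str
    retweets: int
    replies: int
    when: datetime

TOPIC_KEYWORDS = {"cloud", "computing", "haven", "vertica"}  # illustrative topic
HALF_LIFE_DAYS = 90.0  # assumption: engagement loses half its weight every 90 days

def influence_scores(posts, now, topic=TOPIC_KEYWORDS):
    """Toy topic- and time-aware influence score, normalized to 0-100."""
    raw = {}
    for p in posts:
        words = set(p.text.lower().split())
        topical = len(words & topic) / max(len(topic), 1)   # topic relevance 0..1
        if topical == 0:
            continue  # off-topic posts contribute nothing, however popular
        age_days = (now - p.when).days
        decay = 0.5 ** (age_days / HALF_LIFE_DAYS)           # recency weight
        engagement = math.log1p(p.retweets + 2 * p.replies)  # diminishing returns
        raw[p.author] = raw.get(p.author, 0.0) + topical * decay * engagement
    top = max(raw.values(), default=1.0)
    return {a: round(100 * s / top, 1) for a, s in raw.items()}

posts = [
    Post("analyst_a", "Great deep dive on cloud computing and Vertica", 40, 10,
         datetime(2014, 9, 1)),
    Post("celebrity_b", "Good morning world!", 90000, 5000, datetime(2014, 9, 1)),
    Post("engineer_c", "Benchmarking HAVEn and Vertica for cloud workloads", 12, 6,
         datetime(2014, 3, 1)),
]
print(influence_scores(posts, now=datetime(2014, 11, 14)))
```

With these sample posts, the off-topic celebrity account scores nothing despite huge engagement, which is exactly the Obama-for-cloud-computing point Singh makes above.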

Gardner: For other organizations that are interested in learning more about how HP Analytics is operating and maybe learning from your example, are there any resources or websites we can go to, where you are providing more information about HP Analytics?

Singh: Definitely. There are multiple ways you can approach us. You can talk to the Vertica sales team, and they can connect you to us. We have our own website as well. As I said, we do analytics for all of HP and for select customers. We don't have a direct sales arm; we work through our partners in Enterprise Services, as well as with the software team.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: HP.

