Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy.
To learn more about how Axeda produces streams of massive data to multiple consumer dashboards that analyze business issues in near-real-time, we're joined by Kevin Holbrook, Senior Director of Advanced Development at Axeda. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.
Here are some excerpts:
Gardner: We have the whole Internet of Things (IoT) phenomenon. People are accepting more and more devices, end points, sensors, even things within the human body, delivering data out to applications and data pools. What do you do in terms of helping organizations start to come to grips with this M2M and IoT data demand?
Holbrook: The initial drivers to have a handle on those things are basic questions, such as, "Is this device on?" There are multi-million dollar machines that are currently deployed in the world where that question can’t be answered without a phone call.
Initial driver
That was the initial driver, the seed, if you will. We entered into that space from the remote-service angle. We deployed small-agent software to the edge to get the first measurements from those systems and get them pushed up to the cloud, so that users can interact with it.
From there, we started aggregating data. We have about 1.5 million assets connected to our cloud now globally, and there are all kinds of data coming in. Some of it's very, very basic from a resource standpoint, looking at CPU consumption, disk space, available memory, things of that nature.
It goes all the way through to usage and diagnostics, so that you can get a very granular impression of how this machine is operating. As you begin to aggregate this data, all sorts of challenges come out of it. HP has proven to be a great partner for starting to extract value.
We can certainly get to the data, we can connect the device, and we can aggregate that data to our partners or to the customer directly. Getting value from that data is a completely different proposition. Data for data’s sake is not high value.
Gardner: What is it that you're using Vertica for to do that? Are we creating applications, are we giving analysis as a service? How is this going to market for you?
Holbrook: From our perspective, Vertica represents an endpoint. We've carried the data, cared for the data, and made sure that the device was online, generating the right information and getting it into Vertica.
When we approach customers, we're approaching it from a joint-sale perspective. We're the connectivity layer, the instrumentation, the business automation layer there, and we're getting it into Vertica, so that can be the seed for applications for business intelligence (BI) and for analytics.
So, we are the lowest component in the stack when we walk into one of these engagements with Vertica. Then, it's up to them, on a customer-by-customer basis, to determine what applications to bring to the table. A lot of that is defined by the group within the organization that actually manages connectivity.
We find that there's a big difference between a service organization, which is focused primarily on keeping things up and running, versus a business unit that’s driving utilization metrics, trying to determine not only how things are used, but how it can influence their billing.
Business use
We've found that that's a place where Vertica has actually been quite a pop for us in talking to customers. They want to know not just the simple metrics of the machines' operation, but how that reflects the business use of it.
The entire market has shifted and continues to shift. I was somewhat taken aback only a couple of weeks ago, when I found out that you can no longer buy a jet engine. I thought this was a piece of hardware you purchased, as opposed to something that you may have rented and paid per use. And so [the model changes to leasing] as the machines get bigger and bigger. We have GE and the Bureau of Engraving and Printing as customers.
We certainly have some very large machines connected to our cloud and we're finding that these folks are shifting away from the notion that one owns a machine and consumes it until it breaks or dies. Instead, one engages in an ongoing service model, in which you're paying for the use of that machine.
While we can generate that data and provide some degree of visibility and insight into that data, it takes a massive analytics platform to really get the granular patterns that would drive business decisions.
Gardner: It sounds like many of your customers have used this for some basic blocking and tackling about inventory and access and control, then moved up to a business metrics of how is it being used, how we're billing, audit trails, and that sort of thing. Now, we're starting to look at a whole new type of economy. It's a services economy, based on cloud interactivity, where we can give granular insights, and they can manage their business very, very tightly.
Any thoughts about what's going to be required of your organization to maintain scale? The more use cases and the more success, of course, the more demand for larger data and even better analytics. How do you make sure that you don't run out of runway on this?
Holbrook: There are a couple of strategies we've taken, but before I dive into that, I'll say that the problem is further complicated by the issue of data homing. There's not only a ton of data being generated, but there are also regulatory and compliance requirements that dictate where you can even leave that data at rest. Just moving it around is one problem, and where it sits on a disk is a totally different problem. So we're trying to tackle all of these.
The first way to address the scale for us from an architectural perspective was to try to distribute the connectivity. In order for you to know that something's running, you need to hear from it. You might be able to reach out, what we call contactability, to say, "Tell me if you're still running." But, by and large, you know of a machine's existence and its operation by virtue of it telling you something. So even if a message is nothing more than "Hello, I'm here," you need to hear from this device.
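The idea that you know a machine is running because it tells you so can be sketched as a simple heartbeat tracker. This is a minimal illustration, not Axeda's implementation: the `HeartbeatMonitor` class, the device names, and the 60-second timeout are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HeartbeatMonitor:
    """Tracks when each device was last heard from (hypothetical helper)."""

    timeout_s: float  # assumed silence threshold, not a figure from the interview
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, device_id: str, now: float) -> None:
        # Any message, even a bare "Hello, I'm here," counts as a sign of life.
        self.last_seen[device_id] = now

    def is_online(self, device_id: str, now: float) -> bool:
        # A device is considered online only if it has reported recently.
        seen = self.last_seen.get(device_id)
        return seen is not None and (now - seen) <= self.timeout_s


monitor = HeartbeatMonitor(timeout_s=60.0)
monitor.record("pump-17", now=1000.0)
print(monitor.is_online("pump-17", now=1030.0))  # True: heard from 30 s ago
print(monitor.is_online("pump-17", now=1100.0))  # False: silent for 100 s
print(monitor.is_online("pump-99", now=1100.0))  # False: never heard from
```

Passing the clock in as `now` keeps the sketch deterministic; a real service would likely read a monotonic clock instead.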
From the connectivity standpoint, our goal is not to try to funnel all of this into a single pipe, but rather to find where to get a point of presence that is closest and that is reasonable. We’ve been doing this on our remote-access technology for years, trying to find the appropriate geographically distributed location to route data through, to provide as easy and seamless an experience as possible.
So that’s the first strategy: rather than just ruthlessly federating all incoming data, we distribute the connectivity infrastructure and try to get that data routed to its end consumer as quickly as possible.
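One way to picture choosing the closest point of presence is a nearest-neighbor lookup by great-circle distance. The sketch below is only an illustration: the `POPS` table, its names, and its coordinates are hypothetical, and geographic distance stands in for whatever "closest and reasonable" means in a real deployment, which might also weigh network conditions or data-residency rules.

```python
import math

# Hypothetical points of presence: name -> (latitude, longitude) in degrees.
POPS = {
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "ap-southeast": (1.35, 103.8),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_pop(device_location):
    """Return the name of the point of presence closest to the device."""
    return min(POPS, key=lambda name: haversine_km(device_location, POPS[name]))


print(nearest_pop((48.85, 2.35)))   # a device near Paris
print(nearest_pop((40.7, -74.0)))   # a device near New York
```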
We break down data from our perspective into three basic temporal categories. There's the current data, which is the value you would see reading a dial on the machine. There's recent data, which would tell you whether something is trending in a negative direction, say pressure going up. Then, there is the longer-term historical data. While we focus on the first two, we deliberately, to handle the scale problem, don't focus on the long-term historical data.
Recent data
I'll treat recent data as being anywhere from 7 to 120 days and beyond, depending on the data aggregation rates. We focus primarily on that. When you start to scale beyond that, where the real long tail of this is, we try to make sure that we have our partner in place to receive the data.
We don't want to be diving into two years of data to determine seasonal trending when we're attempting to collect data from 1.5 million assets and acting as quickly as possible to respond to error conditions at the edge.
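The three temporal categories described above can be sketched as a small classifier. This is a hedged illustration only: the function name is made up, and the 120-day cutoff is an assumption taken from the upper end of the "7 to 120 days and beyond" range mentioned in the interview, which in practice depends on the aggregation rate.

```python
from datetime import datetime, timedelta

# Assumed cutoff; the interview describes recent data as 7 to 120 days and beyond.
RECENT_WINDOW = timedelta(days=120)


def categorize(reading_time, latest_time, now, recent_window=RECENT_WINDOW):
    """Classify a reading as 'current', 'recent', or 'historical'."""
    if reading_time == latest_time:
        # The newest value: what you would see reading the dial on the machine.
        return "current"
    if now - reading_time <= recent_window:
        # Kept close at hand for trend detection, such as pressure creeping up.
        return "recent"
    # The long tail, handed off to an analytics partner in the model described.
    return "historical"


now = datetime(2014, 6, 1, 12, 0, 0)
latest = now - timedelta(seconds=5)
print(categorize(latest, latest, now))                     # current
print(categorize(now - timedelta(days=30), latest, now))   # recent
print(categorize(now - timedelta(days=400), latest, now))  # historical
```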
Gardner: Kevin, what about the issue of latency? I imagine some of your customers have a very dire need to get analysis very rapidly on an ongoing streamed basis. Others might be more willing to wait and do it in a batch approach in terms of their analytics. How do you manage that, and what are some of the speeds and feeds about the best latency outcomes?
Holbrook: That’s a fantastic question. Everybody comes in and says we need a zero-latency solution. Of course, it took them about two-and-a-half seconds to say that.
There are two components to it. One is accepting that near-real-time, which is effectively the transport latency, is the smallest amount of time it can take to physically go from point A to point B, absent having a dedicated fiber line from one location to the other. We can assume that on the Internet that's domestically somewhere in the one- to two-second range. Internationally, it's in the two- to three-second or beyond range, depending on the connectivity of the destination.
What we provide is an ability to produce real-time streams of data outbound. You could take from one asset, break up the information it generates, and stream it to multiple consumers in near-real-time in order to get the dashboard in the control center to properly reflect the state of the business. Or you can push it to a data warehouse in the back end, where it then can be chunked and ETLd into some other analytics tool.
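Breaking up one asset's readings and streaming them to several consumers is essentially a fan-out. The sketch below shows the pattern in memory under stated assumptions: the `FanOut` class, the metric names, and the list-backed "dashboard" and "warehouse" consumers are hypothetical stand-ins, not an Axeda or Vertica API.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A consumer receives (asset_id, metric, value); purely illustrative.
Consumer = Callable[[str, str, float], None]


class FanOut:
    """Routes each metric from an asset to every consumer subscribed to it."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Consumer]] = defaultdict(list)

    def subscribe(self, metric: str, consumer: Consumer) -> None:
        self._subscribers[metric].append(consumer)

    def publish(self, asset_id: str, readings: Dict[str, float]) -> int:
        """Deliver each reading to its subscribers; return the delivery count."""
        delivered = 0
        for metric, value in readings.items():
            # Metrics with no subscribers are simply not forwarded.
            for consumer in self._subscribers.get(metric, ()):
                consumer(asset_id, metric, value)
                delivered += 1
        return delivered


dashboard, warehouse = [], []
stream = FanOut()
stream.subscribe("pressure", lambda a, m, v: dashboard.append((a, m, v)))
stream.subscribe("pressure", lambda a, m, v: warehouse.append((a, m, v)))
stream.subscribe("temperature", lambda a, m, v: warehouse.append((a, m, v)))

delivered = stream.publish(
    "turbine-3", {"pressure": 101.2, "temperature": 64.0, "rpm": 3000.0}
)
print(delivered, len(dashboard), len(warehouse))
```

Here the dashboard sees only the pressure reading, while the warehouse consumer receives both pressure and temperature; the unsubscribed `rpm` value goes nowhere.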
For us, we try not to do the batch ETLing. We'd rather make sure that we handle what we're good at. We're fantastic at remote service, at automating responses, at connectivity and at expanding what we do. But we're never going to be a massive ETL, transforming and converting into somebody’s data model or trying to get deep analytics as a result of that.
Gardner: Was it part of this need for latency, familiarity, and agility that led into Vertica? What were some of the decisions that led to picking Vertica as a partner?
Several reasons
Holbrook: There were a few reasons. That was one of them. Also the fact that there's a massive set of offerings already on top of it. A lot of the other people when we considered this -- and I won't mention competitors that we looked at -- were more just a piece of the stack, as opposed to a place where solutions grew out of.
It wasn't just Vertica, but the ecosystem built on top of Vertica. Some of the vendors we looked at are currently in the partner zone, because they're now building their solutions on top of Vertica.
We looked at it as an entry point into an ecosystem and certainly the in-memory component, the fact that you're getting no disk reads for massive datasets was very attractive for us. We don’t want to go through that process. We've dealt with the struggles internally of trying to have a relational data model scale. That’s something that Vertica has absolutely solved.
Gardner: Now your platform includes application services, integration framework, and data management. Let’s hone in on the application services. How are developers interested in getting access to this? What are their demands in terms of being able to use analysis outcomes, outputs, and then bring that into an application environment that they need to fulfill their requirements to their users?
Holbrook: It breaks them down into two basic categories. The first is the aggregation and the collection of data, and the second is physical interaction with the device. So we focus on both about equally. When we look at what developers are doing, almost always it’s transforming the data coming in and reaching out to things like a customer relationship management (CRM) system. It's opening a ticket when a device has thrown a certain error code or integrating with a backend drop-ship distribution system in the event that some consumable has begun to run low.
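The kind of integration described here, opening a ticket on a certain error code or reordering a consumable that is running low, amounts to a few event rules. The following is a minimal sketch: the `DeviceEvent` fields, the error code `E42`, the 15 percent threshold, and the action strings are all hypothetical, and a real integration would call the CRM or distribution system rather than return strings.

```python
from dataclasses import dataclass
from typing import List, Optional

TICKET_CODES = {"E42"}  # hypothetical error codes that warrant a service ticket
REORDER_BELOW = 0.15    # hypothetical threshold: fraction of consumable remaining


@dataclass
class DeviceEvent:
    device_id: str
    error_code: Optional[str] = None
    consumable_level: Optional[float] = None  # 0.0 (empty) to 1.0 (full)


def actions_for(event: DeviceEvent) -> List[str]:
    """Return the follow-up actions an incoming device event should trigger."""
    actions = []
    if event.error_code in TICKET_CODES:
        # Stand-in for opening a ticket in a CRM system.
        actions.append(f"open_ticket:{event.device_id}:{event.error_code}")
    if event.consumable_level is not None and event.consumable_level < REORDER_BELOW:
        # Stand-in for notifying a drop-ship distribution system.
        actions.append(f"reorder_consumable:{event.device_id}")
    return actions


print(actions_for(DeviceEvent("printer-8", error_code="E42")))
print(actions_for(DeviceEvent("printer-8", consumable_level=0.10)))
print(actions_for(DeviceEvent("printer-8", consumable_level=0.80)))
```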
In terms of interaction, it's been significant. On the data side, we primarily see that they're extracting subsets of data for deeper analysis. Sometimes, this comes up in discrete data points. Frequently, this comes up in the transfer of files. So there is a certain granularity that you can survive. Coming down the fire-hose are discrete data points that you can react to, and there's a whole other order of magnitude of data that you can handle when it's shipped up in a bulk chunk.
A good example is one of the use cases we have with GE in their oil and gas division where they have a certain flow of data that's always ongoing and giving key performance indicators (KPIs). But this is nowhere near the level of data that they're actually collecting. They have database servers that are co-resident with these massive gas pipeline generators.
So we provide them the vehicle for that granular data. Then, when a problem is detected automatically, they can say, "Give me far more granular data for the problem area." It could be five minutes before or five minutes since. This is then uploaded, and we hand off to somewhere else.
So when we find developers doing integration around the data in particular, it's usually when they're diving in more deeply based on some sort of threshold or trigger that has been encountered in the field.
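Pulling detailed data only around a trigger can be sketched as a time-window filter over locally buffered samples. This is an illustration under assumptions: the function name and the one-sample-per-minute buffer are invented for the example, and the five-minute margins simply echo the figure mentioned in the interview.

```python
from datetime import datetime, timedelta


def window_around(samples, trigger_time,
                  before=timedelta(minutes=5), after=timedelta(minutes=5)):
    """Return the (timestamp, value) samples within a window around a trigger."""
    start, end = trigger_time - before, trigger_time + after
    return [(t, v) for t, v in samples if start <= t <= end]


# Hypothetical local buffer: one sample per minute for half an hour.
t0 = datetime(2014, 6, 1, 12, 0, 0)
buffer = [(t0 + timedelta(minutes=i), float(i)) for i in range(30)]

# A threshold trips at minute 15; only the surrounding slice is uploaded.
detail = window_around(buffer, t0 + timedelta(minutes=15))
print(len(detail), detail[0][1], detail[-1][1])
```

The design choice is that the bulk of the granular data stays at the edge, and only the slice relevant to a detected problem is shipped upstream.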
All about strategy
Holbrook: It’s all going to be about the data-collection strategy. You're going to walk into a customer or potential customer, and their default response is going to be, "Collect everything." That’s not inherently valuable. Just because you've collected it, doesn’t mean that you are going to get value from it. We find that, oftentimes, 90-95 percent of the data collected in the initial deployment is not used in any constructive way.
I would say focus on the data collection strategy. Scale of bad data is scale for scale’s sake. It doesn’t drive business value. Make sure that the folks who are actually going to be doing the analytics are in the room when you are doing your data collection strategy definition, when you're talking to the folks who are going to wire up sensors, and when you're talking to the folks who are building the device.
Unfortunately, within a larger business in particular, these are frequently completely different groups of people that might report to completely different vice presidents. So you go to one group, and they have the connectivity guys. You talk about it and you wire everything up.
Then, six to eight months later, you walk into another room. They’ll say "What the heck is this? I can’t do anything with this. All I ever needed to know was the following metric." It wasn’t collected because the two hadn't stayed in touch. The success of deployed solutions and the reaction to scale challenges is going to be driven directly by that data-collection strategy. Invest the time upfront and then you'll have a much better experience in the back.
Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: HP.