Monday, August 10, 2015

How eCommerce sites harvest big data across multiple clouds

The next BriefingsDirect big data innovation thought leadership interview highlights how a consultant helps large eCommerce organizations better manage their big data architectures across cloud environments.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy.

To learn more about how big data is best architected for the largest web applications, BriefingsDirect sat down with Jimmy Mohsin, Principal Software Architect at Norjimm LLC, a consultancy based in Princeton, New Jersey. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: How are large web applications deciding on the right big data architecture? 

Mohsin: There's a lot of interest in dealing with large data volumes, and not only large volumes, but also data that changes rapidly. Many companies have very large datasets, some in terabytes, some in petabytes, and on top of that they're getting live feeds.

The data is there and it’s changing rapidly. The traditional databases sometimes can’t handle that problem, especially if you're using that database as a warehouse and you're reporting against it.

Basically, we have kind of a moving-target situation. With HP Vertica, what we've seen is the ability to solve that problem in at least some of the cases that I've come across, and I can talk about specific use cases in that regard.

Input/output issues

Gardner: Before we get into a specific use case, I'm particularly interested in some of these input/output issues. People are trying to decide how to move the data around. They're toying with cloud. They're trying to bring in data from more types of traditional repositories. And, as you say, they're facing new types of data problems with streaming and real-time feeds.

How do you see them beginning this process when they have to handle so many variables? Is it something that’s an IT architecture, or enterprise architecture, or data architecture? Who's responsible for this, given that it’s now a rather holistic problem?
Mohsin: In my present project, we ran into that. The problem is that many companies don't even have a well-defined data-architecture team. Some of them do. You'll find a lot of companies with an enterprise-architect role, and you'll find some companies with a haphazard definition of an architectural group.
Net-net, at least at this point, unless companies are more structured, it becomes a management issue in the sense that someone at the leadership level needs to know who has what domain knowledge and then form the appropriate team to skin this cat.

I know of a recent situation where we had to build a team of four people, and only one was an architect. But we built a virtual team of four people who were able to assemble and collate all the repositories that spanned 15 years and four different technology flavors, and then come up with an approach that resulted in a single repository in HP Vertica.

So there are no easy answers yet, because organizations just aren't uniformly structured.

Gardner: Well, I imagine they'll be adapting, just like we all are, to the new realities. In the meantime, tell me about a specific use case that demonstrates the intensity of scale and velocity, and how at least one architecture has been deployed to manage that?

Mohsin: One of my present projects deals with one of the world's largest retailers. It's eCommerce, online selling. One of the things they do, in addition to their transactions of buying and selling, is email campaign management. That means staying in touch with the customer on the basis of their purchases, their interests, and their profiles.

One of the things we do is see what a certain customer’s buying preferences have been over the past 90 days. Knowing that and the customer’s profile, we can try to predict what their buying patterns will be. So we send them a very tailored message in that regard. In this project, we're dealing with about 150 to 160 million emails a day. So this is definitely big data.

Here we have online information coming into one warehouse as to what's happening in the world of buying and selling. Then, behind the scenes, while that information is being sent to the warehouse, we're trying to do these email campaigns.

This is where the problem becomes fairly complicated. We tried traditional relational database management systems (RDBMS), and they kind of worked, but we ran into a slew of speed and performance issues. That's where the big-data world was really beneficial. We were able to address that problem in a project that ran about seven months.
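Mohsin does not describe how the 90-day buying-preference analysis is implemented. As a minimal sketch, assuming purchases arrive as hypothetical (customer_id, purchase_date, category) records, a trailing-window aggregation that could feed a tailored campaign message might look like this; the function name and record shape are illustrative, and any predictive model built on top of it is out of scope.

```python
from collections import Counter
from datetime import date, timedelta


def top_categories(purchases, today, window_days=90, k=3):
    """Return each customer's k most-purchased categories in the trailing window.

    purchases: iterable of (customer_id, purchase_date, category) tuples
    (a hypothetical record shape). Returns {customer_id: [category, ...]},
    most frequent first.
    """
    cutoff = today - timedelta(days=window_days)
    counts = {}
    for customer, when, category in purchases:
        if cutoff <= when <= today:  # keep only purchases inside the window
            counts.setdefault(customer, Counter())[category] += 1
    return {c: [cat for cat, _ in ctr.most_common(k)] for c, ctr in counts.items()}


if __name__ == "__main__":
    # Hypothetical sample data; the January purchase falls outside the window.
    sample = [
        ("alice", date(2015, 8, 1), "shoes"),
        ("alice", date(2015, 1, 5), "garden"),
    ]
    print(top_categories(sample, today=date(2015, 8, 10)))
```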

Gardner: And this was using HP Vertica?

Large organization

Mohsin: We did an evaluation. We looked at a few databases, and the corporate choice was Vertica. We saw that there are a whole bunch of big-data vendors. The issue is that many of those vendors don't have a large organization behind them, and Vertica does. Company management felt that although this was a new big-data database, HP was behind it, and the fact that they also use HP hardware helped a lot.

They chose Vertica. The team I was managing did a proof of concept (POC), and we were able to demonstrate that Vertica could handle the reporting tied to email campaign management. We ran a 90-day POC, and the results were so positive that there was interest in going live. We went live about 90 days after the POC.

Gardner: I understand that Vertica is quite versatile. I've heard of a number of ways in which it's used technically. But this email campaign problem almost sounds like a transactional issue, a complex event processing issue, or a transfer agent scaling issue. How do big data, Vertica, and analytics come to bear on this particular problem?

Mohsin: It's exactly what you say it is. As we are reporting and pushing out the campaigns, new information is coming in every half hour, sometimes even more frequently. There's a live feed that's updating the warehouse. While the warehouse is being updated, we want to report against it in real time and keep our campaigns going.
The key point is that we can't really stop any of these processes. The customers who are managing the campaigns want to see information very frequently. We can’t even predict when they would want their information. At the same time, the transactional systems are sending us live feeds.

The problem we ran into with the traditional RDBMS is that the reporting didn't function when the live feeds were underway. We couldn't run our back-end email campaign reports when new data was coming in.

One of the benefits of Vertica, due to its basic architecture and its columnar design, is that it's better positioned to do that. Nobody was going to take our word for it, though, so that's what we demonstrated in the live POC.

The end user said, "Take a few of our largest clients. Take some of our clients that have a lot of transactions. Prove that the reports will work for those clients." That's what we did in 30 days. Then we extended it, and within 90 days we demonstrated the whole thing end to end. The go-live followed that.
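The interview credits Vertica's architecture and columnar design for letting reports run while live feeds load, but does not explain the internals. The toy sketch below illustrates the general idea of snapshot reads over an append-only columnar layout, not Vertica's actual implementation: each load builds a new immutable version, so a report that captured a snapshot keeps a consistent view while new rows arrive. The class names and columns are hypothetical.

```python
class ColumnarTable:
    """Toy append-only columnar table with snapshot reads (illustrative only)."""

    def __init__(self, columns):
        # A version is an immutable tuple holding one tuple of values per column.
        self._version = tuple(() for _ in columns)
        self._index = {name: i for i, name in enumerate(columns)}

    def load_batch(self, rows):
        """Append rows by building a new version; existing snapshots are untouched."""
        cols = [list(col) for col in self._version]
        for row in rows:
            for i, value in enumerate(row):
                cols[i].append(value)
        self._version = tuple(tuple(col) for col in cols)

    def snapshot(self):
        """Capture the current version; later loads do not change what it sees."""
        return Snapshot(self._version, self._index)


class Snapshot:
    def __init__(self, version, index):
        self._version = version
        self._index = index

    def column(self, name):
        # A report touches only the columns it needs, the core columnar benefit.
        return self._version[self._index[name]]


if __name__ == "__main__":
    table = ColumnarTable(["client", "emails_sent"])
    table.load_batch([("acme", 100), ("globex", 250)])
    report_view = table.snapshot()
    table.load_batch([("acme", 50)])  # a live feed arrives mid-report
    print(sum(report_view.column("emails_sent")))  # the report still sees 350
```

Copying every column on each load keeps the example short; a real columnar engine avoids rewriting existing data on every load.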

Gardner: You had to solve that problem of the live feeds, the rapidity of information. Rather than going through a stop, batch process, analyze, repeat cycle, you've found a solution to your problem.

But at the same time, it seems like you're getting data into an environment where you can analyze it and perhaps extract other forms of analysis, in addition to solving your email and eCommerce trajectory issues. It seems to me that you're now going to have the opportunity to add a new dimension of analysis to what's going on, and perhaps turn these transactions toward customer-inference benefits.

More than a database

Mohsin: One of the things I like to say internally is that Vertica isn't just a big database; it's more than just a database. It's really a platform, because you have Distributed R and other tools available with it. When we adopted it and went live with this technology, we first solved the feeds-and-speeds problem, but now we're very well positioned to use some of the other capabilities that exist in Vertica.

Distributed R is one of them, and inference analysis is another, so that we can build intelligent reports. To date, we've been building those outside the RDBMS, because the RDBMS has no role in that. That's why I call Vertica more of a data platform. So we definitely will go there, but that will be our second phase.

As the system starts to function and deliver on the key use cases, the next stage would be to build more sophisticated reports. We definitely have the requirements and now we have the ability to deliver.

Gardner: Perhaps you could add visualization capabilities to that. You could make a data pool available to more constituents within this organization so that they could innovate and do experiments. That’s very powerful stuff indeed.

Is there anything else you can tell other organizations that might be facing similar issues around real-time feeds and the need to analyze and react, now that you have been through this on this particular project? Are there any lessons learned for others?

If people are facing transactional issues and haven't thought about a big-data platform as part of the solution, what would you offer them to light a light bulb in their minds about looking for alternatives to traditional middleware?

Mohsin: Like so many people try to do, we tried to see if anyone else had done this. One of the issues in big data at least today is that you can’t find a whole slew of clients who have already gone live and who are in production.

There are lots of people in development, and some are live, but in our space, we couldn't find anyone who was live. We solved that issue via a quick-hit POC. The big lesson there was that we scoped the POC right. We didn’t want to do too much and we didn’t want to do too little. So that was a good lesson learned.

The other big thing is the data-migration question. Maybe, to some extent, this problem will never be solved. It's not so easy to pull data out of legacy database systems. Very few of them will give you good tools to migrate away from them. They all want you to stay. So we had to write our own tooling. We scoured the market for it, but we couldn’t find too many options out there.

Understand your data

So a huge lesson learned was, if you really want to do this, if you want to move to big data, get a handle on understanding your data. Make sure you have the domain experts in-house. Make sure you have the tooling in place, however rudimentary it might be, to be able to pull the data out of your existing database. Once you have it in the file system, Vertica can take it in minutes. That’s not the problem. The problem is getting it out.
We continue to grapple with that and we have made product enhancement recommendations. But in fairness to Vertica, this is really not something that Vertica can do much about, because this is more in the legacy database space.
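Mohsin's team wrote its own extraction tooling, which the interview does not show. A minimal sketch of that kind of rudimentary tool, using Python's built-in sqlite3 module as a stand-in for a legacy database, streams a table to a delimited file; the table and column names are hypothetical. A file like this could then be bulk-loaded with Vertica's COPY statement.

```python
import csv
import io
import sqlite3


def export_table(conn, table, out):
    """Stream all rows of `table` to the text stream `out` as CSV with a header."""
    # The table name is interpolated directly, so it must come from a trusted list.
    cur = conn.execute(f"SELECT * FROM {table}")
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])  # column names
    for row in cur:  # iterate the cursor instead of fetchall() to bound memory
        writer.writerow(row)


if __name__ == "__main__":
    # sqlite3 stands in for the legacy system; the schema is hypothetical.
    legacy = sqlite3.connect(":memory:")
    legacy.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    legacy.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bo, Jr.")])
    buf = io.StringIO()
    export_table(legacy, "customers", buf)
    print(buf.getvalue())
```

When writing to a real file, open it with `newline=""`, as the csv module documentation recommends, so line endings are not translated twice.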

Gardner: I've heard quite a few people say that, given how quickly organizations are moving to the cloud, data extraction isn't part of their problem, because the data is already in the cloud. It's in the standardized architecture that the cloud is built around, and if there is a platform-as-a-service (PaaS) capability, getting at the data isn't so much of a problem. Or am I not reading that correctly?

Mohsin: No, you're reading that correctly. The problem we have is that a lot of companies are still not in the cloud. There is still a lingering fear of the cloud. People will tell you that the cloud is not secure. If you have customer information, if you have personalized data, many organizations don't want to put it in the cloud.

Slowly, they are moving in that direction. If we were all there, I would completely agree with you, but since we still have so many on-premise deployments, we're still in a hybrid mode -- some is on-prem, some is in the cloud.

Gardner: I just bring it up because it gives yet another reason to seriously consider cloud. It’s a benefit that is actually quite powerful -- the data access and ability to do joins and bring datasets together because they're all in the same cloud.

Mohsin: I fundamentally agree with you. I fundamentally believe in the cloud and that it really should be the way to go. Going through our very recent go-live, there was no way we could have the same elasticity in an on-prem deployment that we can have in a cloud. I can pick up the phone, call a cloud provider, and have another machine the next day. I can't do that if it’s on-premise.

Still, simply moving all the assets into the cloud will, at least in some organizations, take several months, if not years.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy. Sponsor: HP Enterprise.

You may also be interested in:

Wednesday, August 5, 2015

How Localytics uses big data to improve mobile app development and marketing

The next BriefingsDirect big data innovation case study interview investigates how Localytics uses data and associated analytics to help providers of mobile applications improve their applications -- and also allow them to better understand the uses for their apps and dynamic customer demands.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy.
To learn more about how big data helps mobile application developers better their products and services, please join Andrew Rollins, Founder and Chief Software Architect at Localytics, based in Boston. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Tell us about your organization. You founded it to do what?

Rollins: Two other guys and I founded it in 2008. We set out initially to make mobile apps. If you remember, 2008 is when the iPhone App Store launched, so there was a lot of excitement around mobile apps at that time.

We initially started looking at different concepts for apps, but then, over a couple of months, discovered that there really weren't a whole lot of services out there for mobile apps. It was basically a very bare ecosystem, kind of like the Wild, Wild West. [Register for the upcoming HP Big Data Conference in Boston on Aug. 10-13.]

We ended up focusing on whether there was a services play in this industry, and we settled on analytics; we called the company Localytics. The analogy we like to use is that, at the time, it was a little bit of a gold rush, and we wanted to sell the pickaxes. So that’s what we did.

Gardner: That makes a great deal of sense, and it has certainly turned into a gold rush. For those folks who do the mining, creating applications, what is it that they need to know?

Analytics and marketing

Rollins: That’s a good question. Here's a little back story on what we do. We do analytics, but we also do marketing. We're a full-service solution, where you can measure how your application is performing out in the wild. You can see what your users are doing. You can do anything from funnel analysis to engagement analysis, things like that.
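Localytics' own funnel definitions are not described here. As an illustration, a strict in-order funnel over hypothetical (user_id, timestamp, event_name) records counts how many users reach each step; the event names below are assumptions.

```python
def funnel(events, steps):
    """Count users who reach each step of an ordered funnel.

    events: iterable of (user_id, timestamp, event_name) tuples.
    steps: ordered event names, e.g. ["view", "add_to_cart", "purchase"].
    Returns a list of (step, number_of_users_reaching_it).
    """
    progress = {}  # user_id -> how many steps the user has completed, in order
    for user, _, name in sorted(events, key=lambda e: (e[0], e[1])):
        done = progress.get(user, 0)
        if done < len(steps) and name == steps[done]:
            progress[user] = done + 1
    return [(step, sum(1 for p in progress.values() if p > i))
            for i, step in enumerate(steps)]


if __name__ == "__main__":
    sample = [("u1", 1, "view"), ("u1", 2, "add_to_cart"), ("u2", 1, "view")]
    for step, users in funnel(sample, ["view", "add_to_cart", "purchase"]):
        print(step, users)
```

Requiring steps in order means a user who adds to cart without a recorded view is not counted; a looser definition is equally valid and changes the numbers.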

From there, we also transition into the marketing side of things, where you can manage your push notifications and your in-app messaging.

For people who are making mobile apps, often they want to look at key metrics and then how to drive those metrics. That means a lot of A/B testing, funnel analysis, and engagement analysis.
It means not only analyzing these things, but making meaningful interactions, reaching out to customers via push notifications, getting them back in the app when they are not using the app, identifying points of drop-off, and messaging them at the right time to get them back in.
An example would be an e-commerce app. You've abandoned the shopping cart. Let’s get you back in the application via some sort of messaging. Doing all of that, measuring the return on investment (ROI) on that, measuring your acquisition channels, measuring what your users are doing, and creating that feedback loop is what we advocate mobile app developers do.
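The abandoned-cart scenario can be sketched as a simple rule: a user's most recent add-to-cart has no later purchase and is older than some idle threshold. The event names and the 24-hour default below are hypothetical choices, not Localytics' product logic; a messaging system would then target the returned users.

```python
from datetime import datetime, timedelta


def abandoned_carts(events, now, idle=timedelta(hours=24)):
    """Return users whose latest add_to_cart is older than `idle` with no later purchase.

    events: iterable of (user_id, datetime, event_name) tuples.
    """
    last_cart, last_purchase = {}, {}
    for user, when, name in events:
        if name == "add_to_cart":
            last_cart[user] = max(when, last_cart.get(user, when))
        elif name == "purchase":
            last_purchase[user] = max(when, last_purchase.get(user, when))
    return sorted(
        user for user, cart_time in last_cart.items()
        if now - cart_time >= idle  # cart has sat idle long enough
        and last_purchase.get(user, datetime.min) < cart_time  # no purchase since
    )


if __name__ == "__main__":
    now = datetime(2015, 8, 5, 12)
    sample = [("ann", datetime(2015, 8, 3, 9), "add_to_cart")]
    print(abandoned_carts(sample, now))
```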

Gardner: You're able to do data-driven marketing in a way that may not have been very accessible before, because everything that’s done with the app is digital and measurable. There are logs, servers -- and so somewhere there's going to be a trail. It’s not so much marketing as it is science. We've always thought of marketing as perhaps an art and less of a science. How do you see this changing the very nature of marketing?

Rollins: Everything ultimately that you are doing really does need to be data-driven. It's very hard to work off of just intuition alone. So that's the art and science. You come out with your initial hypothesis, and that’s a little bit more on the craft or art side, where you're using your intuition to guide you on where to start.

From there, you have to use the data to iterate. I'm going to try this, this, and this, and then see which works out. That would be like a typical multivariate kind of testing.

Determine what works out of all the concepts you're trying, and then iterate on that. Measuring anything you do, any kind of interaction you have with your users, and then using that as feedback to inform the next interaction, is what you have to be doing.
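Rollins mentions multivariate testing without naming a statistical method. For the simplest two-variant case, one standard option (an assumption here, not necessarily what Localytics uses) is a two-proportion z-test on conversion counts, sketched below with only the standard library.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between variants A and B.

    Uses the pooled-proportion standard error. Returns (z, p_value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all or no users converted; the test is undefined")
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    print(two_proportion_z_test(100, 1000, 130, 1000))
```

With 100 conversions out of 1,000 users for A and 130 out of 1,000 for B, z is about 2.1 and the two-sided p-value is about 0.035. Testing more than two variants at once needs a correction for multiple comparisons or a different test.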

Gardner: And this is also a bit revolutionary when it comes to software development. It wasn't that long ago that the waterfall approach to development might leave years between iterations. Now, we're thinking about constantly updating, iterating, getting a feedback loop, and condensing the latency of that feedback loop so that we really can react as close to real-time as possible.

What is it about mobile apps that's allowed for a whole different approach to this notion of connectedness and feedback loops to an app audience?

Mobile apps are different

Rollins: This brings up a good point. A lot of people ask why we started a mobile app analytics company. Why did we do that? Why isn't typical web analytics good enough? It speaks to what you're talking about. Mobile apps are a little bit different from the regular web, in the sense that you have a release cycle for pushing apps out.

You release to, let’s say, the iPhone App Store. It might take a couple of weeks before your app goes out there. So you have to be really careful about what you're publishing, because your turnaround time is not that of the web.

However, there are certain interactions you can have, like on the messaging side, where you have an ability to instantly go back and forth. Mobile apps are a different kind of market, and they require a slightly different understanding than the traditional approach.

... We consume the data in a real-time pipeline. We're not doing background batch processing that you might see in something like Hadoop. We're doing a lot of real-time pipeline stuff, such that you can see results within a minute or two of it being uploaded from a device. That's largely where HP Vertica comes in, and why we ended up using Vertica, because of its real-time nature. It’s about the scale.
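Rollins describes a real-time pipeline in which results appear within a minute or two of upload, without detailing its stages. A toy sketch of one plausible stage, assuming events carry epoch-second timestamps, counts events in fixed one-minute windows as they arrive and emits a window once a watermark passes its end. The class and event names are hypothetical, and a production pipeline would also need a policy for late and out-of-order events, which this sketch would simply count into a fresh window.

```python
from collections import Counter


class TumblingWindowCounter:
    """Incrementally count events per fixed, non-overlapping time window."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.open_windows = {}  # window start (epoch seconds) -> Counter of event names

    def add(self, ts, name):
        start = ts - ts % self.window_seconds  # floor the timestamp to its window
        self.open_windows.setdefault(start, Counter())[name] += 1

    def flush_before(self, watermark):
        """Emit and drop every window that ends at or before `watermark`."""
        closed = {s: dict(c) for s, c in self.open_windows.items()
                  if s + self.window_seconds <= watermark}
        for s in closed:
            del self.open_windows[s]
        return dict(sorted(closed.items()))


if __name__ == "__main__":
    counter = TumblingWindowCounter(60)
    for ts, name in [(0, "session_start"), (30, "purchase"), (61, "session_start")]:
        counter.add(ts, name)
    print(counter.flush_before(60))  # only the window starting at 0 is complete
```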
Gardner: If I understand correctly, you have access to the data from all these devices, you are crunching that, and you're offering reports and services back to your customers. Do they look to you as also a platform provider or just a data-service provider? How do the actual hosting and support services for these marketing capabilities come about?

Rollins: We tend to cater more toward the high end. A lot of our customers are large app publishers that have an ongoing application, let’s say a shopping application or news application.

In that sense, when we bring people on board, oftentimes they tend to be larger companies that aren’t necessarily technically savvy yet about mobile, because it's still new for some people. We do offer a lot of onboarding services to make sure they integrate their application correctly, measure it correctly, and are looking at the right metrics for their industry, as compared to other apps in that industry.

Then we keep that relationship open as they go along and as they see data. We iterate on that with them. Because of the newness of the industry, it does require education.

Gardner: And where is HP Vertica running for you? Do you run it in your own data center? Are you using cloud? Is there a hybrid? Do you have some other model?

Running in the cloud

Rollins: We run it in the cloud. We are running on Amazon Web Services (AWS). We've thought a lot about whether we should run it in a separate data center, so that we can dictate the hardware, but presently we are running it in AWS.

Gardner: Let’s talk about what you can do when you do this correctly. Because you have a capacity to handle scale, you've developed speed, and you understand the requirements in the market, what are your customers getting from the ability to do all this?

Rollins: It really depends on the customer. Something like an e-commerce app is going to look heavily at things like where users are dropping off and what's preventing them from making that purchase.

Another application, like news, which I mentioned, will look at something different, usually something more along the lines of engagement. How long are they reading an article for? That matters to them, so that they can give those numbers to advertisers.

So the answer to that largely depends on who you are and what your app is.
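For a news app, the engagement question of how long people read an article could be answered from paired open and close events. The sketch below assumes hypothetical `article_open` and `article_close` events keyed by user and article, and ignores opens that never close.

```python
def average_read_seconds(events):
    """Average seconds between an article_open and the next article_close, per article.

    events: iterable of (user_id, article_id, epoch_seconds, event_name) tuples.
    """
    opened = {}     # (user_id, article_id) -> open timestamp still awaiting a close
    durations = {}  # article_id -> list of completed read durations
    for user, article, ts, name in sorted(events, key=lambda e: e[2]):
        key = (user, article)
        if name == "article_open":
            opened[key] = ts
        elif name == "article_close" and key in opened:
            durations.setdefault(article, []).append(ts - opened.pop(key))
    return {a: sum(d) / len(d) for a, d in durations.items()}


if __name__ == "__main__":
    sample = [("u1", "a1", 0, "article_open"), ("u1", "a1", 120, "article_close")]
    print(average_read_seconds(sample))
```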

Gardner: I suppose another benefit of developing these insights, as specific and germane as they might be to each client, is the ability to draw different types of data in. Clearly, there's the data from the App Store and from the app itself, but if we could join that data with other external datasets, we might be able to determine more about why users drop off, or why they spend more money or more time doing certain things.

So is there an opportunity, and do you have any examples of where you've been able to go after more datasets and then be able to scale to that?

Rollins: This is something that's come up a lot recently. Within the past year, we've been launching our own products in this space, and the idea of integrating different data types is really big right now.

You have all these different silos -- mobile, web, and even your internal server infrastructure. If you're a retail company that has a mobile app, you might even have physical stores. So you're trying to get all this data in some collective view of your customer.
You want to know that Sally came to your store and purchased a particular kind of item. Then, you want to be able to know that in your mobile app. Maybe you have a loyalty card that you can tie across the media and then use that to engage with her meaningfully about stuff that might interest her in the mobile app as well.
"We noticed that you bought this a month ago. Maybe you need another one. Here is a coupon for it."
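The loyalty-card scenario amounts to a join between in-store purchases and app users on a shared card ID, followed by a time-based rule. A minimal sketch, assuming a hypothetical mapping from loyalty ID to app user and a 30-day default threshold:

```python
from datetime import date, timedelta


def reorder_reminders(store_purchases, app_users, today, after=timedelta(days=30)):
    """Match in-store purchases to app users through a shared loyalty card ID.

    store_purchases: iterable of (loyalty_id, item, purchase_date) tuples.
    app_users: dict mapping loyalty_id -> app user ID (hypothetical linkage).
    Returns (app_user, message) pairs for purchases at least `after` old.
    """
    reminders = []
    for loyalty_id, item, bought in store_purchases:
        user = app_users.get(loyalty_id)  # None if the card isn't linked to the app
        age = today - bought
        if user is not None and age >= after:
            reminders.append((user, f"We noticed that you bought {item} {age.days} days ago. "
                                    "Here is a coupon for another one."))
    return reminders


if __name__ == "__main__":
    purchases = [("card-1", "coffee filters", date(2015, 7, 1))]
    print(reorder_reminders(purchases, {"card-1": "sally"}, today=date(2015, 8, 5)))
```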

Other datasets

That's a big thing, and we're looking at a lot of different ways of doing that by bringing in other datasets that might not be from just a mobile app itself.

We're not even focused on mobile apps any more. We're really just an app analytics company, and that includes web and desktop. We ship on Windows, for example, and we deal with a lot of Microsoft applications. Tying all of that together is kind of the future.

Gardner: For those organizations that are embarking on more of a data-driven business model, and that are looking at analytics platforms and requirements, is there anything you could offer in hindsight, having traveled this path and worked with HP Vertica? What should they keep in mind when they're looking to move into such a capability, whether it's on-prem or in the cloud? What advice could you offer them?

Rollins: We went through this journey with various platforms. At the end of the day, be aware of what the vendor of the big-data platform is pitching, versus the reality of it.

A lot of times, prototyping is very easy, but actually going to large scale is fairly difficult. At scale, you have to know what each technology is good at, and how you bring together multiple technologies to accomplish what you want.

That means a lot of prototyping, a lot of stress testing and benchmarking. You really don’t know until you try it with a lot of these things. There are a lot of promises, but the reality might be different.

Gardner: Any thoughts about Vertica’s track record, given your length of experience?

Rollins: They're really good. I'm impressed both with its speed compared to other things we have looked at and with the features they release. Vertica 7 has a bunch of great stuff in it. Vertica 6, when it came out, had a bunch of great stuff in it. I'm pretty happy with it.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy. Sponsor: HP.


Tuesday, August 4, 2015

HP hyper-converged appliance delivers speedy VDI and apps deployment and a direct onramp to hybrid cloud

HP today announced the new HP ConvergedSystem 250-HC StoreVirtual (CS 250), a hyper-converged infrastructure appliance (HCIA) based on HP's new ProLiant Apollo 2000 server and HP StoreVirtual software-defined storage (SDS) technology.

Built on up-to-date HP, Intel, and VMware technologies, the CS 250 combines a virtual server and storage infrastructure that HP says is configurable in minutes for nearly half the price of competitive systems. It is designed for virtual desktops and remote office productivity, as well as to provide a flexible path to hybrid cloud. [Disclosure: HP is a sponsor of BriefingsDirect.]

Designed to attract customers on a tight budget, the HP CS 250 includes a new three-node configuration that is up to 49 percent more cost effective than comparable configurations from Nutanix, SimpliVity and other competitors, says HP. Because HP's StoreVirtual runs in VMware, Microsoft Hyper-V and KVM virtual environments, the appliance may soon come to support all those hypervisors.

HP recently discontinued the EVO:RAIL version of its HCIA, which was based on the EVO:RAIL software from OEM partner VMware.

Increasingly, even small IT shops want to modernize and simplify how they support existing applications. They want virtualization benefits to extend to storage, backup and recovery, and be ready to implement and consume some cloud services. They want the benefits of software-defined data centers (SDDC), but they don’t want to invest huge amounts of time, money, and risk in a horizontal, pan-IT modernization approach.

That's why, according to IDC, businesses are looking for flexible infrastructure solutions that will allow them to quickly deploy and run new applications. This trend has resulted in a 116 percent year-over-year increase in hyper-converged systems sales and 60 percent compound annual growth rate (CAGR) anticipated through 2019.

The building-blocks approach to IT infrastructure is growing rapidly. IDC estimates that in 2015, $10.2 billion will be spent on converged systems, representing 11.4 percent of total IT infrastructure spending. This number will grow to $14.3 billion by 2018, representing 14.9 percent of total IT infrastructure spending, says IDC. Similarly, Technology Business Research, Inc. in Hampton, NH, estimates a $10.6 billion U.S. addressable market over the next 12 months, through mid-2016.
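As a rough check on the IDC figures above (simple arithmetic on the quoted numbers, not a calculation published by IDC), the implied annual growth of converged-systems spending over the three years from 2015 to 2018, and the total IT infrastructure spending implied by each share, work out as follows:

```latex
\[
\text{CAGR}_{2015\to2018} = \left(\frac{14.3}{10.2}\right)^{1/3} - 1 \approx 1.402^{1/3} - 1 \approx 0.119
\]
\[
\text{Total}_{2015} \approx \frac{10.2}{0.114} \approx 89.5\ \text{billion USD},
\qquad
\text{Total}_{2018} \approx \frac{14.3}{0.149} \approx 96.0\ \text{billion USD}
\]
```

That roughly 12 percent annual rate covers converged systems overall; the 60 percent CAGR cited earlier refers to hyper-converged systems specifically, so the two figures measure different categories.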

With HCIAs specifically, enterprises can begin making what amounts to mini-clouds based on their required workloads and use cases. IT can quickly deliver the benefits of modern IT architectures without biting off the whole cloud model. Virtual desktops are a great place to begin, especially as Windows 10 is emerging on the scene.

Indeed, VDI deployments that support as many as 250 desktops on a single appliance at a remote office or agency, for example, allow for ease in administration and deployment on a small footprint while keeping costs clear and predictable. And, if the enterprise wants to scale up and out to hybrid cloud, they can do so with ease and low risk.

Multi-site continuity

The inclusion of three 4TB StoreVirtual Virtual Storage Appliance (VSA) licenses also allows the new HP CS 250 system to replicate data to any other HP StoreVirtual-based solution. This means that customers can leverage their existing infrastructure as a replication target at no additional cost, says HP. The CS 250 also allows customers to tailor the system with a choice of up to 96 processing cores, a mix of SSD and SAS disk drives, and up to 2TB of memory per 4-node appliance -- double that of previous generations.
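Assuming the three 4TB licenses add up, and that the quoted maximums are spread evenly across the four nodes (the announcement gives only appliance-level figures), the capacity arithmetic is:

```latex
\[
3 \times 4\,\text{TB} = 12\,\text{TB},
\qquad
\frac{96\ \text{cores}}{4\ \text{nodes}} = 24\ \text{cores per node},
\qquad
\frac{2\,\text{TB}}{4\ \text{nodes}} = 0.5\,\text{TB of memory per node}
\]
```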

The CS 250 arrives pre-configured for VMware's vSphere 5.5 or 6.0 and HP OneView InstantOn to enable customers to be production-ready with only 5 minutes of keyboard time and a total of 15 minutes deployment time, with daily management from VMware vCenter via the HP OneView for VMware vCenter plug-in, says HP.

HP sees the CS 250 as a path to bigger things. For midsize and enterprise customers seeking an efficient and cost-effective cloud entry point, for example, the new HP Helion CloudSystem 9.0 built on the CS 250 provides a direct path to the hybrid cloud. This hyper-converged cloud solution leverages the clustered compute and storage resources of the CS 250 for on-premise workloads but adds self-service portal provisioning and public cloud bursting features for those moving beyond server virtualization.
HP announced that it is enhancing its “Nitro” partner program and opening it up to distributors worldwide, starting with Arrow Electronics in the US.

HP is also introducing new Software-Defined Storage Design and Integration services to help customers deploy highly scalable, elastic cloud storage services, the company announced today. The integration service provides customers with detailed configuration and implementation guidance tailored to their specific needs to accelerate time to value, said HP.

The 4-node CS 250-HC StoreVirtual is available on August 17, while 3-node configurations are available on September 28. A sample solution price inclusive of the 3-node CS 250 with Foundation Carepack and VMware vSphere Enterprise starts at a list price of $121,483, said HP.


Thursday, July 30, 2015

Full 360 takes big data analysis cloud services to new business heights

The latest BriefingsDirect cloud innovation case study interview highlights how Full 360 uses big data and analytics to improve their applications support services for the financial industry -- and beyond.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy.

To learn how Full 360 uses HP Vertica in the Amazon cloud to provide data warehouse and BI applications and services to its customers from Wall Street to the local airport, BriefingsDirect sat down with Eric Valenzuela, Director of Business Development at Full 360, based in New York. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Tell us about Full 360.

Valenzuela: Full 360 is a consulting and services firm, and we purely focus on data warehousing, business intelligence (BI), and hosted solutions. We build and consult, and then we do managed services for hosting those complex, sophisticated solutions in the cloud, in the Amazon cloud specifically.

Gardner: And why is cloud a big differentiator for this type of service in the financial sector?

Valenzuela: It’s not necessarily just for finance. It seems to be beneficial for any company that has a large initiative around data warehousing and BI. For us, specifically, the cloud is a platform around which we can develop our scripts and processes. That way, we can guarantee that we're providing exactly the same service to all of our customers.

We have quite a bit of intellectual property (IP) that’s wrapped up inside our scripts and processes. The cloud platform itself is a good starting point for a lot of people, but it also has elasticity for those companies that continue to grow and add to their data warehousing and BI solutions. [Register for the upcoming HP Big Data Conference in Boston on Aug. 10-13.]

Gardner: Eric, it sounds as if you've built your own platform as a service (PaaS) for your specific activities and development and analytics on top of a public cloud infrastructure. Is that fair to say?

Valenzuela: That’s a fair assumption.

Primary requirements

Gardner: So as you are doing this cloud-based analytic service, what is it that your customers are demanding of you? What are the primary requirements you fulfill for them with this technology and approach?

Valenzuela: With data warehousing, and Vertica specifically, being rather new, there is a lack of knowledge out there about how to manage it, keep it up and running, tune it, and analyze queries to make sure that they're returning information efficiently, that kind of thing. What we try to do is supplement that lack of expertise.

Gardner: Leave the driving to us, more or less. You're the plumbers and you let them deal with the proper running water and other application-level intelligence?

Valenzuela: We're like an insurance policy. We do all the heavy lifting, the maintenance, and the management. We ensure that your solution is going to run the way that you expect it to run. We take the mundane out, and then give the companies the time to focus on building intelligent applications, as opposed to worrying about how to keep the thing up and running, tuned, and efficient.

Gardner: Given that Wall Street has been crunching numbers for an awfully long time, and I know that they have, in many ways, almost unlimited resources to go at things like BI -- what’s different now than say 5 or 10 years ago? Is there more of a benefit to speed and agility versus just raw power? How have the economics or dynamics of Wall Street analytics changed over the past few years?

Valenzuela: First, it’s definitely the level of data. Just 5 or 10 years ago, either you had disparate pieces of data or you didn’t have a whole lot of data. Now it seems like we are just managing massive amounts of data from different feeds, different sources. As that grows, there has to be a vehicle to carry all of that, where it’s limitless in a sense.

Early on, there simply wasn't the volume of data that we have today. In addition, 8 or 10 years ago BI was still rather new in terms of what it could actually do for a company in making agile, informed decisions, decisions with intent.

So fast forward, and it’s widely accepted and adopted now. It’s like the cloud. When cloud first came out, everybody was concerned about security. How are we going to get the data in there? How are we going to stand this thing up? How are we going to manage it? Those questions come up a lot less now than they did even two years ago.

Gardner: While you may have cut your teeth on Wall Street, you seem to be branching out into other verticals -- gaming, travel, logistics. What are some of the other areas now to which you're taking your services, your data warehouse, and your BI tools?

Following the trends

Valenzuela: It seems like we're following the trends. Recently it's been gaming. We have quite a few gaming customers that are just producing massive amounts of data.

There's also the airline industry. The customers that we have in airlines, now that they have a way to -- I hate this term -- slice and dice their data, are building really informed, intelligent applications to service their customers, customer appreciation. It’s built for that kind of thing. Airlines are now starting to see what their competition is doing. So they're getting on board and starting to build similar applications so they are not left behind.

Banking was pretty much the first to go full force and adopt BI as a basis for their practice. Finance has always been there. They've been doing it for quite a long time.

Gardner: So as the director of business development, I imagine you're out there saying, "We can do things that couldn’t have been done before at prices that weren’t available before." That must give you almost an unlimited addressable market. How do you know where to go next to sell this?

Valenzuela: It’s kind of an open field. From my perspective, I look at the different companies out there that come to me. At first, we were doing a lot of education. Now, it’s just, "Yes, we can do this," because these things are proven. We're not proving any concepts anymore. Everything has already been done, and we know that we can do it.

It is an open field, but we focus purely on the cloud. We expect all of our customers will be in the Amazon cloud. It seems that now I am teaching people a little bit more: just because it’s cloud doesn’t mean it’s magic. You still have to do a lot of work. It’s still an infrastructure.

But we come from that approach and we make sure that the customer is properly aligned with the vision that this is not just a one- or two-month type commitment. We're not just going to build a solution, put it in our pocket, and walk away. We want to know that they're fully committed for 6-12 months.

Otherwise, you're not going to get the benefits of it. You're just going to spend the money and the effort, and you're not really going to get any benefits out of it if you're not going to be committed for the longer period of time. There still are some challenges with the sales and business development.

Gardner: Given this emphasis on selling the cloud model as much as the BI value, you needed to choose an analytics platform that was cloud-friendly and that was also Amazon AWS cloud-friendly. Tell me how Vertica and Amazon -- and your requirements -- came together.

Good timing

Valenzuela: I think it was purely a timing thing. Our CTO, Rohit Amarnath, attended a session at MIT, where Vertica was first announced. So he developed a relationship there.

This was right around the time when Amazon announced that it was offering its public cloud platform, EC2. So it made a lot of sense to look at the cloud as being a vision, looking at the cloud as a platform, looking at column databases as a future way of managing BI and analytics, and then putting the two together.

It was more or less a timing thing. Amazon was there. It was new technology, and we saw the future in that. Analytics was newly adopted, and now we had a column database that we could leverage as well. So we blended the two together and started building a platform that hadn’t been built yet.
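
For readers unfamiliar with column databases, the following Python toy shows the core idea: each attribute is stored as its own array, so an aggregate over one attribute reads only that column rather than every field of every row. It is a conceptual sketch with made-up flight data, not how Vertica is implemented; production column stores add compression, sorting, and distribution on top of this layout.

```python
# Row layout: each record kept together, as in a traditional row store.
rows = [
    {"flight": "A100", "origin": "EWR", "passengers": 180},
    {"flight": "B200", "origin": "JFK", "passengers": 150},
    {"flight": "C300", "origin": "EWR", "passengers": 210},
]

def to_columns(records):
    """Pivot a list of row dicts into a dict of per-column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# Summing one attribute touches only that column's list.
total_passengers = sum(columns["passengers"])
print(total_passengers)  # 540
```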

Gardner: What about lessons learned along the way? Are there some areas to avoid or places that you think are more valuable that people might appreciate? If someone were to begin a journey toward a combination of BI, cloud, and vertical industry tool function, what might you tell them to be wary of, or to double-down on?

Valenzuela: We forged our own way. We couldn’t learn from our competitors’ mistakes because we were the ones that were creating the mistakes. We had to clear those up and learn from our own mistakes as we moved forward.

Gardner: So perhaps a lesson is to be bold and not to be confined by the old models of IT?

Valenzuela: Definitely that. Think outside the box and see what the cloud can do: forget about old IT and look at the cloud as a new form of IT. Understand what it cannot do as a baseline, but really open your mind to what it can actually do, from an elasticity perspective.

There are a lot of Vertica customers out there that are going to reach a limitation. That may require procuring more hardware, more IT staff. The cloud aspect removes all of that.

Gardner: I suppose it allows you as a director of business development to go downstream. You can find smaller companies, medium-sized enterprises, and say, "Listen, you don’t have to build a data warehouse at your own expense. You can start doing BI based on a warehouse-as-a-service model, pay as you go, grow as you learn, and so forth."

Money concept

Valenzuela: Exactly. Small or large, those IT departments are spending that money anyway. They're spending it on servers. If they are on-premises, the cost of that server in the cloud should be the same or less. That’s the concept.

If you're already spending the money, why not just migrate it and then partner with a firm like us that knows how to operate that. Then, we become your augmented experts, or that insurance policy, to make sure that those things are going to be running the way you want them to, as if it were your own IT department.

Gardner: What are the types of applications that people have been building and that you've been helping them with at Full 360? We're talking about not just financial, but enterprise performance management. What are the other kinds of BI apps? What are some of the killer apps that people have been using your services to do?

Valenzuela: Specifically, with one of our large airlines, it's customer appreciation. The level of detail on their customers that they're able to bring to the plane, to the flight attendants, in a handheld device is powerful. It’s powerful to the point where you remember that treatment that you got on the plane. So that’s one thing.

That’s something that you don’t get if you fly a lot, if you fly other airlines. That’s just kind of some detail and some treatment that you just don’t get. I don’t know how that could be driven if it weren’t for analytics and if it weren’t for technology like Vertica to be able to provide that information.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy. Sponsor: HP.


Tuesday, July 28, 2015

How big data technologies Hadoop and Vertica drive business results at Snagajob

The next BriefingsDirect analytics innovation case study interview explores how Snagajob in Richmond, Virginia – one of the largest hourly employment networks for job seekers and employers – uses big data to finally understand their systems' performance in action. The result is vast improvement in how they provide rapid and richer services to their customers.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy.

Snagajob recently delivered 4 million new job applications in a single month through their systems. To learn how they're managing such impressive scale, BriefingsDirect sat down with Robert Fehrmann, Data Architect at Snagajob in Richmond, Virginia. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Tell us about your job-matching organization. You’ve been doing this successfully since 2000. Let's understand the role you play in the employment market.

Fehrmann: Snagajob, as you mentioned, is America's largest hourly network for employees and employers. The hourly market means we have, relatively speaking, high turnover.

Another aspect, in comparison to some of our competitors, is that we provide an inexpensive service. So our subscriptions are on the low end, compared to our competitors.

Gardner: Tell us how you use big data to improve your operations. I believe that among the first ways that you’ve done that is to try to better analyze your performance metrics. What were you facing as a problem when it came to performance? [Register for the upcoming HP Big Data Conference in Boston on Aug. 10-13.]

Signs of stress

Fehrmann: A couple of years ago, we started looking at our environment, and it became obvious that our traditional technology was showing some signs of stress. As you mentioned, we really have data at scale here. We have 20,000 to 25,000 postings per day, and we have about 700,000 unique visitors on a daily basis. So data is coming in very, very quickly.

We also realized that we were sitting on a gold mine and were able to ingest data pretty well. But we had problems getting information and innovation out of our big data lake.

Gardner: And of course, near real time is important. You want to catch degradation in any fashion from your systems right away. How do you then go about getting this in real time? How do you do the analysis?

Fehrmann: We started using Hadoop. I'll use a lot of technical terms here. From our website, we're getting events. Events are routed via Flume directly into Hadoop. We're collecting about 600 million key-value pairs on a daily basis. It's a massive amount of data, 25 gigabytes on a daily basis.
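
The Flume-to-Hadoop volumes quoted above translate into a modest average ingest rate. This back-of-the-envelope Python sketch assumes decimal gigabytes and traffic spread evenly over 24 hours; real website traffic is burstier, so peak rates would be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

pairs_per_day = 600_000_000
bytes_per_day = 25 * 10**9  # 25 GB, assuming decimal gigabytes

pairs_per_second = pairs_per_day / SECONDS_PER_DAY
bytes_per_pair = bytes_per_day / pairs_per_day
mb_per_second = bytes_per_day / SECONDS_PER_DAY / 10**6

print(f"{pairs_per_second:,.0f} pairs/s")  # 6,944 pairs/s
print(f"{bytes_per_pair:.1f} bytes/pair")  # 41.7 bytes/pair
print(f"{mb_per_second:.2f} MB/s")         # 0.29 MB/s
```

In other words, each key-value pair averages roughly 42 bytes, and the sustained rate is under a third of a megabyte per second; the challenge described here is less raw ingest than querying the accumulated data quickly.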

The second piece in this journey to big data was analyzing these events, and that’s where we're using HP Vertica. Our original use case was to analyze a funnel. A funnel is where people come to our site. They're searching for jobs, maybe by keyword, maybe by zip code. A subset of those people shows interest in a job and clicks on a posting. A subset of that group applies for the job via an application. A subset is interested in an employer, and so on. We had never been able to analyze this funnel.
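
A funnel like this can be computed by counting, for each stage, the distinct users who also reached every earlier stage. The Python sketch below assumes events have already been parsed into (user_id, stage) pairs; the stage names and sample data are hypothetical, and the sketch ignores event timestamps, which a production funnel query would normally use to enforce ordering.

```python
from collections import defaultdict

# Hypothetical stage names for the funnel described above.
FUNNEL = ["search", "view_posting", "apply", "employer_interest"]

def funnel_counts(events, stages=FUNNEL):
    """Count distinct users reaching each stage, requiring all earlier stages."""
    reached = defaultdict(set)
    for user_id, stage in events:
        reached[user_id].add(stage)

    counts = []
    for depth in range(1, len(stages) + 1):
        required = set(stages[:depth])
        counts.append(sum(1 for seen in reached.values() if required <= seen))
    return list(zip(stages, counts))

events = [
    ("u1", "search"), ("u1", "view_posting"), ("u1", "apply"),
    ("u2", "search"), ("u2", "view_posting"),
    ("u3", "search"),
    ("u4", "search"), ("u4", "view_posting"), ("u4", "apply"), ("u4", "employer_interest"),
]
for stage, n in funnel_counts(events):
    print(stage, n)  # search 4, view_posting 3, apply 2, employer_interest 1
```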

The dataset is about 300 to 400 million rows, and 30 to 40 gigabytes. We wanted to make this data available, not just to our internal users, but all external users. Therefore, we set ourselves a goal of a five-second response time. No query on this dataset should run for more than five seconds -- and Vertica and Hadoop gave us a solution for this.
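
One way to keep a five-second goal honest is to time every query against it. This minimal Python harness uses a hypothetical `run_with_budget` helper and a stand-in computation in place of a real Vertica query. It only measures and flags slow calls; actually cancelling a runaway query would need a timeout enforced by the database or its client driver.

```python
import time

RESPONSE_BUDGET_SECONDS = 5.0  # the five-second goal stated above

def run_with_budget(query_fn, budget=RESPONSE_BUDGET_SECONDS):
    """Run a query callable; return (result, elapsed_seconds, within_budget)."""
    start = time.perf_counter()
    result = query_fn()
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= budget

# Stand-in for a real query against the funnel dataset (hypothetical).
result, elapsed, ok = run_with_budget(lambda: sum(range(1_000_000)))
print(result, f"{elapsed:.3f}s", "OK" if ok else "over budget")
```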

Gardner: How have you been able to increase your performance and reach your key performance indicators (KPIs) and service-level agreements (SLAs)? How has this benefited you?

Fehrmann: Another application that we were able to implement is a recommendation engine. A recommendation engine addresses the fact that job seekers who apply for a specific job may not know about all the other jobs that are very similar to it, or that other people have applied to.

We started analyzing the search results that we were getting and implemented a recommendation engine. Sometimes it’s very difficult to make a real comparison between before and after. Here, we could: by implementing this recommendation engine, we saw an immediate 11 percent increase in application flow, one of our key metrics. Application flow is how many applications a customer is getting from us.
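
The interview doesn't describe how Snagajob's engine works internally. As an illustration only, here is one common technique for the "other people have applied to" case, item-to-item co-occurrence counting, in Python. The job names, seekers, and helper functions are hypothetical.

```python
from collections import Counter, defaultdict

def build_co_applications(applications):
    """Map each job to a Counter of other jobs applied to by the same seekers.

    applications: iterable of (seeker_id, job_id) pairs.
    """
    jobs_by_seeker = defaultdict(set)
    for seeker_id, job_id in applications:
        jobs_by_seeker[seeker_id].add(job_id)

    co_counts = defaultdict(Counter)
    for jobs in jobs_by_seeker.values():
        for job in jobs:
            for other in jobs:
                if other != job:
                    co_counts[job][other] += 1
    return co_counts

def recommend(co_counts, job_id, k=3):
    """Return up to k jobs most often co-applied with job_id."""
    return [job for job, _ in co_counts.get(job_id, Counter()).most_common(k)]

applications = [
    ("s1", "cashier"), ("s1", "barista"),
    ("s2", "cashier"), ("s2", "barista"), ("s2", "host"),
    ("s3", "cashier"), ("s3", "stocker"),
]
co = build_co_applications(applications)
print(recommend(co, "cashier"))  # "barista" ranks first (2 shared applicants)
```

At the volumes described here, the pairwise counting would more plausibly run as a distributed job or a SQL self-join than in memory; the sketch only shows the logic.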

Gardner: So you took the success from your big-data implementation and analysis capabilities from this performance task to some other areas. Are there other business areas, search yield, for example, where you can apply this to get other benefits?

Brand-new applications

Fehrmann: When we started, we had the idea that we were looking for a solution for migrating our existing environment to a better-performing new environment. But what we've seen is that most of the applications we've developed so far are brand-new applications that we hadn't been able to do before.

You mentioned search yield. Search yield is a very interesting aspect. It’s a massive dataset. It's about 2.5 billion rows and about 100 gigabytes of data as of right now and it's continuously increasing. So for all of the applications, as well as all of the search requests that we have collected since we have started this environment, we're able to analyze the search yield.

For example, search yield tells us how many applications we get for a specific search keyword, in real time. By real time, I mean that somebody can run a query against this massive dataset and get results in a couple of seconds. We can analyze specific jobs in specific areas, specific keywords that are searched in a specific time period or in a specific location of the country.
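
As described, search yield is essentially applications grouped by search keyword, filtered by time period and, optionally, location. This Python sketch expresses that aggregation over hypothetical pre-aggregated (day, keyword, location, applications) records; in a column store, the same logic would typically be a SQL `GROUP BY` over the stored rows.

```python
from collections import defaultdict
from datetime import date

def search_yield(records, start, end, location=None):
    """Applications per search keyword within [start, end], optionally for one location.

    records: iterable of (day, keyword, location, applications) tuples.
    """
    totals = defaultdict(int)
    for day, keyword, loc, applications in records:
        if start <= day <= end and (location is None or loc == location):
            totals[keyword] += applications
    return dict(totals)

records = [
    (date(2015, 7, 1), "cashier", "VA", 40),
    (date(2015, 7, 2), "cashier", "VA", 35),
    (date(2015, 7, 2), "barista", "NY", 20),
    (date(2015, 8, 1), "cashier", "VA", 50),
]
print(search_yield(records, date(2015, 7, 1), date(2015, 7, 31), location="VA"))
# {'cashier': 75}
```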

Gardner: And once again, now that you've been able to do something you couldn't do before, what have been the results? How has that changed your business?

Fehrmann: It really allows our salespeople to provide great information during the prospecting phase. If we're prospecting with a new client, we can tell them very specifically that if they're in this industry, in this area, they can expect an application flow, depending on how big the company is, of, let’s say, a hundred applications per day.
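
The sales estimate described here amounts to looking up comparable clients. A minimal Python sketch, assuming hypothetical client records tagged with industry, area, and a company-size band, averages their daily application flow and returns `None` when no comparable clients exist. The field names and figures are illustrative, not Snagajob's data.

```python
from statistics import mean

def expected_daily_flow(clients, industry, area, size_band):
    """Average daily applications of existing clients matching a prospect's profile."""
    comparable = [
        c["daily_applications"]
        for c in clients
        if c["industry"] == industry and c["area"] == area and c["size_band"] == size_band
    ]
    return mean(comparable) if comparable else None

clients = [
    {"industry": "retail", "area": "Richmond", "size_band": "large", "daily_applications": 90},
    {"industry": "retail", "area": "Richmond", "size_band": "large", "daily_applications": 110},
    {"industry": "food", "area": "Richmond", "size_band": "small", "daily_applications": 15},
]
print(expected_daily_flow(clients, "retail", "Richmond", "large"))
```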

Gardner: How has this been a benefit to your end users, those people seeking jobs and those people seeking to fill jobs?

Fehrmann: There are certainly some jobs that people are more interested in than others. On the flip side, if a particular job gets 100 or 500 applications, it's just a fact that only a small number of applicants are going to get that particular job. Now, if you apply for a job that isn't as interesting, you have a much, much higher probability of getting the job.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy. Sponsor: HP.
