Tuesday, November 3, 2015

Big data generates new insights into what’s happening in the world's tropical ecosystems

The next BriefingsDirect big-data innovation case study interview explores how large-scale monitoring of rainforest biodiversity and climate has been enabled and accelerated by cutting-edge big-data capture, retrieval, and analysis.

We'll learn how quantitative analysis and modeling are generating new insights into what’s happening in tropical ecosystems worldwide, and we'll hear how such insights are leading to better ways to attain and verify sustainable development and preservation methods and techniques.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy.

To learn more about data science, and how hosting that data science in the cloud helps the study of biodiversity, we're pleased to welcome Eric Fegraus, Senior Director of Technology of the TEAM Network at Conservation International, and Jorge Ahumada, Executive Director of the TEAM Network, also at Conservation International in Arlington, Virginia. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Knowing what’s going on in tropical environments helps us understand what to do, and what not to do, to preserve them. How has that changed? We spoke about a year ago, Eric. Are there any trends or driving influences that have made this data gathering more important than ever?

Fegraus: Over this last year, we’ve been able to roll out our analytic systems across the TEAM Network. We're seeing more and more uptake among protected-area managers using the system, and we have some good examples of the results being put to use.

For example, in Uganda, we noticed that a particular cat species was trending downward. The folks there were really curious why this was happening. At first, they had been excited just to find this cat species, which wasn't previously known to be there.

This particular forest is a gorilla reserve, and one of the main economic drivers around the reserve is ecotourism: people paying to go see the gorillas. Once they saw that the cats were declining, they started asking what could be causing it. Our system told them that the way they were bringing in ecotourists to see the gorillas had shifted, and that this was potentially affecting where the cats were. That allowed them to rethink how they bring tourists in to see the gorillas.

Information at work

Gardner: Information at work.

Fegraus: Information at work at the protected-area level.

Gardner: Just to be clear for our audience, TEAM stands for Tropical Ecology Assessment and Monitoring. Jorge, tell us a little bit about how the TEAM Network came about and what it encompasses worldwide.
Ahumada: The TEAM Network is a program that started about 12 years ago to fill a void in the information we have from tropical forests. Tropical forests cover a little less than 10 percent of the world's land area, but they hold more than 50 percent of the biodiversity.

So from that point of view they're critical places to conserve, yet we didn’t have any information about what was happening in them. That’s how the TEAM Network was born. The model was to use standardized data-collection methods, replicated across a number of sites, with systems that store and analyze that data and make it useful. That was the main motivation.

Gardner: Of course, it’s super-important to be able to collect and retrieve and put that data into a place where it can be analyzed. It’s also, of course, important then to be able to share that analysis. Eric, tell us what's been happening lately that has led to the ability for all of those parts of a data lifecycle to really come to fruition?

Fegraus: Earlier this year, we completed our end-to-end system. We're able to take the data from the field, from the camera traps, from the climate stations, and bring it into our central repository. We then push the data into Vertica, which is used for the analytics. Then, we developed a really nice front-end dashboard that shows the results of species populations in all the protected areas where we work.

The analytical process also starts to identify what could be impacting the trends that we're seeing at a per-species level. This dashboard also lets the user look at the data in a lot of different ways. They can aggregate it and they can slice and dice it in different ways to look at different trends.
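The dashboard itself isn't described in detail here. As a minimal sketch, assuming each detection record carries site, species, year, and detection-count fields (hypothetical field names and made-up values), slicing and dicing the same data amounts to grouping by different keys:

```python
from collections import defaultdict

def aggregate(records, keys):
    """Sum detection counts, grouping by the fields named in `keys`.

    Each record is a dict with hypothetical fields "site", "species",
    "year", and "detections"; any field not in `keys` is summed over.
    """
    totals = defaultdict(int)
    for rec in records:
        group = tuple(rec[k] for k in keys)
        totals[group] += rec["detections"]
    return dict(totals)

# Made-up records for illustration only.
records = [
    {"site": "site_A", "species": "species_1", "year": 2013, "detections": 12},
    {"site": "site_A", "species": "species_1", "year": 2014, "detections": 7},
    {"site": "site_B", "species": "species_1", "year": 2013, "detections": 5},
    {"site": "site_B", "species": "species_2", "year": 2014, "detections": 9},
]

by_species_year = aggregate(records, ("species", "year"))
by_site = aggregate(records, ("site",))
print(by_species_year)  # {('species_1', 2013): 17, ('species_1', 2014): 7, ('species_2', 2014): 9}
print(by_site)          # {('site_A',): 19, ('site_B',): 14}
```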

Gardner: Jorge, what sort of technologies are they using for that slicing and dicing? Are you seeing certain tools like Distributed R, or visualization software and business-intelligence (BI) packages? Is there a common thread, or does it vary greatly?

Ahumada: It depends on the analysis, but we're really at the forefront of analytics in terms of big data. As Michael Stonebraker and other big data thinkers have said, the big-data analytics infrastructure has concentrated on the storage of big data, but not so much on the analytics. We break that mold because we're doing very, very sophisticated Bayesian analytics with this data.

One of the problems of working with camera-trap data is that you have to separate the detection process from the actual trend you're seeing, because detection itself has error: a camera can miss an animal that is actually there.
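To make the detection problem concrete: TEAM uses Bayesian hierarchical models, and those are not shown here. The sketch below swaps in a simpler, non-Bayesian single-season occupancy model fit by maximum likelihood. It still separates two quantities. The first is the probability that a site is occupied (psi). The second is the probability that a survey detects the species when it is present (p). The detection histories are made up for illustration.

```python
import math

def site_log_lik(history, psi, p):
    """Log-likelihood of one site's detection history (list of 0/1 surveys)."""
    k = len(history)
    d = sum(history)
    if d > 0:
        # Detected at least once: the site must be occupied.
        return math.log(psi) + d * math.log(p) + (k - d) * math.log(1 - p)
    # Never detected: either occupied but missed every survey, or unoccupied.
    return math.log(psi * (1 - p) ** k + (1 - psi))

def fit_occupancy(histories, steps=99):
    """Grid-search maximum-likelihood estimates of (psi, p) over 0.01..0.99."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    best = None
    for psi in grid:
        for p in grid:
            ll = sum(site_log_lik(h, psi, p) for h in histories)
            if best is None or ll > best[0]:
                best = (ll, psi, p)
    return best[1], best[2]

# Hypothetical histories: six camera sites, three surveys each.
histories = [[1, 0, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
psi_hat, p_hat = fit_occupancy(histories)
naive = sum(1 for h in histories if any(h)) / len(histories)  # 0.5
print(f"naive occupancy {naive:.2f}, psi_hat {psi_hat:.2f}, p_hat {p_hat:.2f}")
```

A site with no detections contributes both an "occupied but missed" term and a "truly absent" term. That is why the occupancy estimate can exceed the naive fraction of sites with detections. The grid search keeps the sketch dependency-free. A real analysis would use a proper optimizer or sampler.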

Hierarchical models

We do that with hierarchical models, and it's a fairly complicated model. With that kind of model, a normal computer would take days or even months. With the power of Vertica and its processing, we’ve been able to shrink that to a few hours. We can run 500 or 600 species from 13 sites all over the world in five hours. So it’s a really good use of that processing power.
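One reason this kind of workload can benefit from more processing power is that, in the simplest setup, each species' model can be fit independently. That makes the job embarrassingly parallel. Below is a minimal sketch of the fan-out pattern. `fit_species` is a hypothetical placeholder for the real hierarchical model: it just returns the fraction of surveys with a detection. The data are made up.

```python
from concurrent.futures import ThreadPoolExecutor

def fit_species(item):
    """Hypothetical stand-in for one species' model fit.

    A real fit would run the hierarchical model; this returns the fraction
    of surveys with a detection so the sketch stays self-contained.
    """
    species, histories = item
    surveys = sum(len(h) for h in histories)
    detections = sum(sum(h) for h in histories)
    return species, detections / surveys

def fit_all(data, workers=4):
    """Fit every species independently, spreading the work across workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fit_species, data.items()))

data = {
    "species_1": [[1, 0, 1], [0, 0, 0]],
    "species_2": [[0, 0, 0], [0, 1, 0]],
}
results = fit_all(data)
print(results)
```

Threads keep the sketch portable. Because of Python's global interpreter lock, CPU-bound model fits would normally go to a `ProcessPoolExecutor` or a cluster scheduler instead.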

We’ve also more recently been working with Distributed R, a new package written by HP folks at Vertica, to analyze satellite images, because we're also interested in what’s happening at these sites in terms of forest loss. Satellite images are really complicated, because you have millions of pixels and you don’t really know what each pixel is. Is it forest, agricultural land, or a house? Running that on normal R is kind of a problem.
Distributed R is a package that takes some of those functions, like random forests and regression trees, and takes full advantage of Vertica's processing power. We’ve seen a 10-fold increase in performance with that, and it allows us to get much more information out of those images.
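Distributed R's own API is not reproduced here. Below is a rough, dependency-free sketch of the partition-and-classify idea. It splits pixels into chunks that could be sent to separate workers. It labels each pixel with a plain NDVI threshold, (NIR - Red) / (NIR + Red), in place of the random forests mentioned above. The 0.5 threshold and the reflectance values are hypothetical.

```python
def ndvi(red, nir):
    """Normalized difference vegetation index for one pixel."""
    total = red + nir
    return 0.0 if total == 0 else (nir - red) / total

def classify_partition(pixels, threshold=0.5):
    """Label each (red, nir) pixel 'forest' or 'other' by a hypothetical NDVI cutoff."""
    return ["forest" if ndvi(r, n) > threshold else "other" for r, n in pixels]

def partition(pixels, n_parts):
    """Split the pixel list into roughly equal contiguous chunks."""
    size = -(-len(pixels) // n_parts)  # ceiling division
    return [pixels[i:i + size] for i in range(0, len(pixels), size)]

def forest_fraction(pixels, n_parts=4):
    """Classify chunk by chunk (each chunk could run on a different worker)."""
    labels = []
    for part in partition(pixels, n_parts):
        labels.extend(classify_partition(part))
    return labels.count("forest") / len(labels)

# Made-up (red, near-infrared) reflectance pairs.
pixels = [(0.05, 0.45), (0.30, 0.35), (0.04, 0.50), (0.25, 0.20)]
print(forest_fraction(pixels))  # 0.5
```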

Gardner: Not only are you on the cutting-edge for the analytics, you've also moved to the bleeding edge on infrastructure and distribution mechanisms. Eric, tell us a little bit about your use of cloud and hybrid cloud?

Fegraus: To back up a little bit, we ended up building a system that uses Vertica. It’s an on-premises solution, and that's what we're using in the TEAM Network. We've since realized that the solution we built for the TEAM Network can also readily scale to other organizations, government agencies, and anyone else who wants to manage camera-trap data and run the analytics.

So now we’ve essentially been doing software development, producing software that scales. If an organization wants to replicate what we’re doing, we have a solution that we can spin up in the cloud. It has all of the data management, the analytics, the data transformations and processing, the collection, and the data-quality controls built into a single software instance.

Gardner: And when you say “in the cloud,” are you talking about a specific public cloud, in a specific country or all the above, some of the above?

Fegraus: All of the above. We'll be using Vertica, specifically Vertica OnDemand. We're actually going to transition our existing on-premises solution into Vertica OnDemand. The solution we’re developing uses mostly open-source software. It can be replicated in the Amazon cloud or in other clouds that have the right environments for us to get things up and running.

Gardner: Jorge, how important is it to have that global choice of cloud deployment, both to attract users and to keep your costs down?

Ahumada: It’s really key, because in many of these countries it's very difficult for governments to expand their old solutions on the ground. Cloud solutions offer a very good, effective way to manage data. As Eric was saying, the big limitation here is which cloud solutions are available in each country. Right now we have Vertica OnDemand here, but some of the countries might not have the same infrastructure, so we may have to contract with different vendors.

But it's a way to keep costs down, deliver the information really quickly, and store the data safely and securely.

What's next?

Gardner: Eric, now that we have this ability to retrieve, gather, analyze, and now distribute, what comes next in terms of having these organizations work together? Do we have any indicators of what the results might be in the field? How can we measure effectiveness at the endpoint, that is, in these environments, based on what you have been able to accomplish technically?

Fegraus: One of the nice things about the software we built, which can run in various cloud environments, is that the instances can also be connected. For example, suppose we start putting these solutions on a particular continent and neighboring countries are each doing this. Those deployments won't be silos. They'll be able to share aggregated data with each other, so that we can get a holistic picture of what's happening.
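The interchange format between instances is not described. Assume, hypothetically, that each instance shares per-species detection totals rather than camera-level records. In that case, combining neighboring countries into a regional view can be as simple as summing counters:

```python
from collections import Counter

def export_summary(raw_detections):
    """Reduce one instance's raw records to per-species detection counts.

    raw_detections: list of (species, camera_id) tuples (hypothetical schema).
    Only the aggregate leaves the instance; camera-level rows stay local.
    """
    return Counter(species for species, _camera in raw_detections)

def merge_summaries(summaries):
    """Combine per-instance summaries into one regional picture."""
    total = Counter()
    for summary in summaries:
        total.update(summary)  # adds counts rather than replacing them
    return total

country_a = export_summary([("species_1", "cam1"), ("species_1", "cam2"), ("species_2", "cam1")])
country_b = export_summary([("species_1", "cam7"), ("species_3", "cam9")])
regional = merge_summaries([country_a, country_b])
print(regional)  # Counter({'species_1': 3, 'species_2': 1, 'species_3': 1})
```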

So that was very important when we started going down this process, because one of the big inhibitors for growth within the environmental sciences is that there are these traditional silos of data that people in organizations keep and sit on and essentially don't share. That was a very important driver for us as we were going down this path of building software.

Gardner: Jorge, what comes next in terms of technology? Is scale a hurdle you still need to get over? Are there analytics issues? What's the next phase of requirements you'd like to work through technically to make this even more impactful?

Ahumada: As we scale up in size and start having more granularity in the countries where we work, the challenge is going to be keeping these systems responsive and the information flowing. Right now, one of the big limitations is the analytics. We do have analytics running at top speed, but once we start talking about whole countries, we're going to have many more species and many more protected areas to monitor.

This is something the industry is starting to move forward on: incorporating more of the power of the hardware into the analytics, rather than just into storing and managing data. We're looking forward to continuing to work with our technology partners, and in particular HP, to help guide this process. As a case study, we're very well-positioned for that, because we already have that challenge.

Gardner: Also it appears to me that you are a harbinger, a bellwether, for the Internet of Things (IoT). Much of your data is coming from monitoring, sensors, devices, and cameras. It's in the form of images and raw data. Any thoughts about what others who are thinking about the impact of the IoT should consider, now that you have been there?

Fegraus: When we talk about big data, we're talking about data collected from phones, cars, and human devices. Humans are delivering the data. But here we have a different problem. We're talking about nature delivering the data and we don't have that infrastructure in places like Uganda, Zimbabwe, or Brazil.
So we have to start by building that infrastructure, and the camera traps are one example of that. We need to be able to deploy much larger-scale infrastructure to collect data and to diversify the sensors we currently have. Then we can gather sound, image, temperature, and other environmental data on a much larger scale.

Satellites can only take us some part of the way, because we're always going to have problems with resolution. So it's really deployment on the ground which is going to be a big limitation, and it's a big field that is developing now.

Gardner: Drones?

Fegraus: Drones, for example, have that capacity, especially small drones that are proving intelligent enough to collect a lot of information autonomously. This is at the cutting edge of technological development right now, and we're excited about it.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: Hewlett Packard Enterprise.


Thursday, October 29, 2015

DevOps and security, a match made in heaven

This next BriefingsDirect DevOps thought leadership discussion explores the impact of improved development on security and how those investing in DevOps models specifically can expect to improve their security, compliance, and risk-mitigation outcomes.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy.

To help better understand the relationship between DevOps and security, we're joined by two panelists: Gene Kim, DevOps researcher and author focused on IT operations, information security and transformation (his most recent book, The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win, will soon be followed by The DevOps Cookbook), and Ashish Kuthiala, Senior Director of Marketing and Strategy for Hewlett Packard Enterprise (HP) DevOps. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Coordinating and fostering increased collaboration among developers, testers, and IT operations has a lot of benefits. We've talked about those in a number of these discussions, but what about security specifically? How is DevOps engendering safer code and improved security?

Kuthiala: Dana, I look at security as no different than any other testing you do on your code. Any vulnerability that you catch and close early in the process is much simpler, much easier, and much cheaper to fix than one you find once the end product is in the hands of users.
At that point, it could be in the hands of thousands of users, deployed in thousands of environments, and fixing it there is really very expensive. And if trouble happens, if there is a security breach, you're not just dealing with the code vulnerability. You are also dealing with loss of brand, loss of revenue, and loss of reputation in the marketplace. Gene has done a lot of study on security and DevOps. I would love to hear his point of view on that.

Promise is phenomenal

Kim: You're so right. The promise of DevOps for advancing information security objectives is phenomenal. Unfortunately, most information security practitioners react to DevOps with moral outrage and fear. The fear being verbalized is that Dev and Ops are deploying more quickly than ever, and the outcomes haven't been so great. If you're doing one release a year now, what will happen when they're doing 10 deploys a day? [See a recent interview with Gene from the DevOps Enterprise Summit.]

We can understand why they might be terrified of this. Yet what Ashish described is that DevOps represents the ideal integration of testing into the daily work of Dev and Ops. Testing happens all the time. Developers own the responsibility of building and running the tests. It’s happening after every code commit, and these are exactly the same behaviors and cultural norms that we want in information security. After all, security is just another aspect of quality.

We're seeing many, many examples of organizations creating what some people are calling DevOps(Sec), that is, DevOps plus security. One of my favorite examples is Capital One, which calls DevOps in its organization DevOps(Sec). Basically, information security is being integrated into every stage of the software development lifecycle. This is actually what every information security practitioner has wanted for the last two decades.

Kuthiala: Gene, that brings up an interesting thought. As Dev and Ops teams come together, even without security in the picture, we increasingly talk about how people need broader skills across the spectrum. Developers need to understand production systems and be able to support their code in production. But given what you just described, does that mean developers and planners start to become security specialists, or at least to think like them? What have you seen?

Kim: Let's talk about the numbers for a second. I love this ratio of 100 to 10 to 1: for every 100 developers, you have 10 operations people and one security person. So there's no way you're going to get adequate coverage, right? There are not enough security people around. If we can't embed Ops people into these project or service teams, then we have to train developers to care, and to know when to seek help from the Ops experts.

We have a similar challenge in information security: how we train, whether it's about secure coding, regulatory compliance, or how we create evidence that controls exist and are effective. It is not going to be security doing the work. Instead, security needs to be training Dev and Ops on how to do things securely.

Kuthiala: Are there patterns they should be looking at in security? Are there any known patterns out there, or are some being developed? What have you seen with the customers you work with?

Kim: In the deployment pipeline, instead of running just unit tests after every code commit, you also run static code analysis tools. That way you know the code is functionally correct, developers get fast feedback, and they end up writing code that is potentially more secure than it would otherwise have been.

And then alongside that in production, there are the monitoring tools. You're running things like the dynamic security testing. Now, you can actually see how it’s behaving in the production environment. In my mind, that's the ideal embodiment of how information security work should be integrated into the daily work of dev, test, and operations.
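Below is a minimal sketch of such a commit gate, built on a toy rule set. The "static analysis" here is a few lines of Python `ast` inspection that flags direct calls to `eval` or `exec`. It stands in for a real scanner. The unit test is a trivial placeholder, and names like `commit_gate` are hypothetical.

```python
import ast

# Illustrative rule set only; real scanners ship far richer checks.
RISKY_CALLS = {"eval", "exec"}

def static_findings(source):
    """Return sorted (line, name) pairs for direct calls to risky builtins."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in RISKY_CALLS):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)

def commit_gate(source, unit_tests):
    """Run unit tests, then static analysis; the commit passes only if both are clean."""
    failed = [test.__name__ for test in unit_tests if not test()]
    findings = static_findings(source)
    return {"passed": not failed and not findings,
            "failed_tests": failed,
            "findings": findings}

# Hypothetical commit: functionally fine, but it evaluates untrusted input.
committed_source = "def load(text):\n    return eval(text)\n"

def test_placeholder():
    """Stand-in for the project's real unit tests."""
    return True

report = commit_gate(committed_source, [test_placeholder])
print(report)  # {'passed': False, 'failed_tests': [], 'findings': [(2, 'eval')]}
```

The developer gets this feedback within one commit cycle instead of waiting on a separate security review. That is the fast-feedback loop described above.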

Seems contradictory

Kuthiala: It seems a little contradictory. I know DevOps is all about going faster, but you’re actually adding more functionality right up front and slowing things down. Is it a classic case of going slower to go faster? Crawl before you walk, walk before you run? From my point of view, it slows you down here, but toward the end, you speed up more. Are you able to do this?

Kim: I would claim the opposite. We're getting the best of all worlds, because the security testing is now automated. It’s done on demand by the developers, as opposed to opening a ticket, "Gene, can you scan my application?" and hearing back, "I'll get back to you in about six weeks."
That’s being done automatically as part of my daily work. My claim would be that not only is it faster, but we'll get better coverage than we had before. The fearful infosec person would ask how we can do this in highly regulated environments, where there are a lot of compliance regimes in place.

If you were to count the number of controls that are continuously operating, you'd find not only orders of magnitude more controls, but controls that operate all the time, as opposed to being tested once a year.

Kuthiala: From what I've observed with my customers, I have two separate questions here. First, look at some of the highly regulated industries, for example, the pharmaceutical industry. It's not just internal compliance and regulations. Security is part of it, but they often have to go to outside agencies for what are almost physical, paperwork-based regulatory compliance checks.

As they try to move toward DevOps and speed this up, they ask, "How do we handle that portion of the compliance and security checks, given that they're manual checks, not automated? How do we work with external agencies and incorporate them?" What have you seen work really well?

Kim: Last year, at the DevOps Enterprise Summit, we had one bank, and it was a smaller bank. This year, we have five including some of the most well-known banks in the industry. We had manufacturing. I think we had coverage of almost every major industry vertical, the majority of which are heavily regulated. They are all able to demonstrate that not only can you be compliant with all the relevant laws, contractual obligations, and regulations, but you can significantly decrease the amount of work.

One of my favorite examples came from Salesforce. Selling to the federal government, they had to comply with FedRAMP. One thing they got agreement on from the security, compliance, and change-management groups was that all infrastructure changes made through the automation tools could be considered standard changes.

In other words, those changes wouldn’t require review and approval, but all changes made manually would still require approvals, which would often take weeks. This really shows that we can create this fast path not just for the people doing the work; it also makes the work significantly easier for security and compliance.
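The policy described for Salesforce can be expressed as a simple rule. Here is a sketch under stated assumptions. The `Change` record, its fields, and the audit-log format are hypothetical, not Salesforce's tooling. Changes made through automation are classified as standard. Manual ones are queued for approval. Every decision is logged, so the control leaves evidence each time it runs.

```python
from dataclasses import dataclass

@dataclass
class Change:
    description: str
    via_automation: bool  # made through the approved automation tooling?

def classify(change, audit_log):
    """Automated changes count as standard; manual ones need review and approval."""
    decision = "standard" if change.via_automation else "requires_approval"
    audit_log.append((change.description, decision))  # evidence the control ran
    return decision

audit_log = []
changes = [
    Change("resize web tier through automation", True),
    Change("hand-edit firewall rule on one host", False),
]
approval_queue = [c.description for c in changes
                  if classify(c, audit_log) == "requires_approval"]
print(approval_queue)  # ['hand-edit firewall rule on one host']
print(len(audit_log))  # 2
```

Because the rule runs on every change, it also illustrates the earlier point about controls that operate continuously rather than being checked once a year.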

Human error

Kuthiala: And you're also taking the possibility of human error out of there. People can be on vacation, slowing things down. People can be sick. People may not be in their jobs anymore. Automation is a key answer to this, as you said. [More insights from HP from the DevOps Enterprise Summit.]

Gardner: One of the things we've been grappling with in the industry is how to get DevOps into cultures and organizations faster. What about security as the point of the arrow here? If we recognize that security can benefit from DevOps, and we want to instantiate DevOps models faster, wouldn’t the security people make good DevOps evangelists?

Kim: That’s a great observation, Dana. In fact, I think part of the method behind the madness is that the goal of the DevOps Enterprise Summit is to prove points. We have 50 speakers, all from large, complex organizations. The goal is to get coverage of the industry verticals.
I also helped co-host a one-day DevOps Security Conference at the RSA Conference, and this was very much from a security perspective. It was amazing to find those champions in the security community who are driving DevOps objectives. They have to figure out how security fits into the DevOps ecosystem, because we need them to show that the water is not only safe, but great.

Kuthiala: This brings up a question, Gene. When a new project kicks off, say at a new company, you can define the architecture from scratch. That lets you put in place a lot of the practices you need, such as independent and faster deliverables, all acting independently of each other.

But for bigger companies, and the enterprise software they're releasing (we've discussed this in our past talks), you need to look at the underlying architecture and see how to modernize it to support this.
Just as marketing is too important to leave to the marketing people, and quality is too important to leave to the QA people -- so too security is too important to leave just to the security people.

So when you start to address security, how do you approach it, knowing that you're dealing with a large, very monolithic code base? It can take thousands of people to release something to customers. Now you're trying to incorporate security into it with every new feature and function you add.

I can see how you can start to incorporate security expertise and scanning right from the development cycle. But how do you deal with the big component of the architecture that’s already there? Any best practices?

Kim: One of the people who has best articulated the philosophy is Gary Gruver. He said something that, for me, was very memorable. If you don’t have automated testing (and I think his context was unit testing and automated regression testing), you have a fundamentally broken cost model. You get to a point where it becomes too expensive to add features.

That’s not even counting security testing. You get to a point where not only is it too expensive, but it becomes too risky to change code.

We have to fully empower developers to get feedback on their work and make them fully responsible not just for the features, but for the non-functional requirements: testability, deployability, manageability, and security.

A better way

Gardner: Assume that those listening and reading here are completely swayed by our view of things, and they do want DevOps with security ingrained. Aren't there also concurrent developments around big data and analytics that give them a better way to do this, once they've decided to do it?

It seems to me that there is an awful lot of data available within systems, whether in log files or configuration databases. Harnessing that affordably, and then applying it back to those automation capabilities, is going to deliver very powerful synergistic value. How does it work when we apply big data to DevOps and security, Ashish?

Kuthiala: Good question, Dana. You're absolutely right: it's now easy to bring data sources together into one repository, and at an affordable cost. We're starting to build analytics on top of that, and this is being applied in a number of areas.

The best example I can talk about is how HP has been creating IP in the area of testing using big-data analytics. If we have to go faster and release software every hour or two, rather than every six to eight months, we need to test just as fast. You can no longer afford to run all 20,000 of your tests for a one-line code change.

You have to be able to figure out which modules are affected, which ones are not, and which ones are likely to break. We're starting to do some intelligent testing inside our labs. We're finding that we're about 80 to 85 percent accurate in predicting what to test and what not to test, and which features are affected or not.
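The HP approach described here is predictive and data-driven. The sketch below is a simpler, deterministic stand-in known as dependency-based test selection. It walks the module dependency graph from the changed modules to everything that depends on them, then runs only the tests that touch those modules. The module graph and test map are hypothetical.

```python
from collections import deque

# Hypothetical dependency graph: module -> modules it imports.
DEPENDS_ON = {
    "checkout": ["payments", "cart"],
    "cart": ["catalog"],
    "payments": [],
    "catalog": [],
    "search": ["catalog"],
}
# Hypothetical mapping: test -> modules it exercises directly.
TESTS = {
    "test_checkout_flow": ["checkout"],
    "test_cart_totals": ["cart"],
    "test_search_ranking": ["search"],
    "test_payment_refund": ["payments"],
}

def affected_modules(changed):
    """Changed modules plus everything that transitively depends on them."""
    dependents = {m: set() for m in DEPENDS_ON}
    for mod, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents[dep].add(mod)
    seen, queue = set(changed), deque(changed)
    while queue:  # breadth-first walk up the "is depended on by" edges
        for parent in dependents.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def select_tests(changed):
    """Only the tests that exercise an affected module."""
    hit = affected_modules(changed)
    return sorted(t for t, mods in TESTS.items() if hit.intersection(mods))

print(select_tests(["catalog"]))   # ['test_cart_totals', 'test_checkout_flow', 'test_search_ranking']
print(select_tests(["payments"]))  # ['test_checkout_flow', 'test_payment_refund']
```

A purely structural selection like this is conservative: it never skips a test that touches a changed path. A predictive model can prune further by learning which of those tests actually tend to break.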

Similarly, using big-data analytics and the security expertise Gene talked about, you need to dig through and analyze security in exactly the same way as any other test. Which security vulnerabilities do you want to test for, and in which functions of the code? It’s simply a best practice going forward to incorporate big-data analytics into your security testing.

Kim: You were implying something that I just want to make explicit. One of the most provocative notions that Ashish and I talked about was to think about all the telemetry and all the data that the build mechanisms create. You start putting in all the results of testing, and suddenly we have a much better basis of where we apply our testing effort.
If we actually need to deploy faster, it may still take days even if we completely automate our tests, parallelize them, and run them across thousands of servers. In that case, we may be able to use data to tell us where to surgically apply testing, so we can make an informed decision on whether to deploy or not. That's an awesome potential.
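Here is one way that telemetry might be used, sketched under assumptions. Each test gets a historical failure rate (made-up numbers below) and a runtime. The code then greedily picks tests by failure rate per second until a time budget is spent. This greedy heuristic is not guaranteed to be optimal, and `prioritize` is a hypothetical name.

```python
def prioritize(tests, budget_seconds):
    """Greedy pick: highest historical failure rate per second of runtime first.

    tests: list of (name, failure_rate, duration_seconds), where failure_rate
    would come from build and test telemetry (hypothetical numbers below).
    """
    ranked = sorted(tests, key=lambda t: t[1] / t[2], reverse=True)
    chosen, used = [], 0.0
    for name, _rate, duration in ranked:
        if used + duration <= budget_seconds:
            chosen.append(name)
            used += duration
    return chosen

history = [
    ("test_login", 0.20, 10.0),
    ("test_report_export", 0.02, 300.0),
    ("test_checkout", 0.15, 30.0),
    ("test_profile", 0.01, 5.0),
]
picked = prioritize(history, budget_seconds=60)
print(picked)  # ['test_login', 'test_checkout', 'test_profile']
```

Tests left out of the budget, like the slow export test here, could still run later in a slower pipeline stage rather than being dropped entirely.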

Gardner: Speaking of awesome potentials, when we compress the feedback loops using this data -- when development and operations are collaborating and communicating very well -- it seems to me that we're also moving from a reactive stance to security issues to a proactive stance.

One of the notions about security is that you can’t prevent people from getting in, but you can limit the damage they can do once they're in. It seems to me that if you close the loop among development, operations, and test, you can get the right remediation into operations and production much more quickly. Therefore you can almost behave as we've seen with anti-malware software. There, the cycle from the inception of a problem to the creation of the patch and then deployment of that patch was very, very short.

Is that vision pie in the sky or is that something we could get to when DevOps and security come together, Gene?

Key to prevention

Kim: You're right on. The way an auditor would talk about it, there are things we can do to prevent problems: code review, and automated code testing and scanning.

Making libraries available so that developers choose components and deploy them in a secure state is another preventive control. Having the best situational awareness we can of the production environment is what allows quicker detection and recovery.
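Below is a minimal sketch of one preventive control of this kind. It assumes a hypothetical catalog of vetted library versions, with placeholder names and versions. A pipeline step can reject any dependency pin that is not in the catalog:

```python
# Hypothetical catalog of vetted library versions (placeholder names).
APPROVED = {
    "http_client": {"2.4.1"},
    "web_framework": {"3.0.0", "3.1.2"},
}

def unapproved(pins):
    """Return (name, version) pins that are not in the approved catalog."""
    return [(name, version) for name, version in pins
            if version not in APPROVED.get(name, set())]

pins = [("http_client", "2.4.1"), ("web_framework", "1.0.0"), ("unvetted_lib", "0.1")]
blocked = unapproved(pins)
print(blocked)  # [('web_framework', '1.0.0'), ('unvetted_lib', '0.1')]
# A pipeline would fail the build whenever `blocked` is non-empty.
```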

The better we are at those, the more effectively we mitigate risk.

Kuthiala: Gene, as you were talking, I was thinking. We have this notion of rolling back code when something breaks in production, and that’s a very common procedure. You go back into the lab, fix what didn’t work, and then redeploy it to production. If it works, fine. Otherwise, you roll it back and do it over again.

But with the advent of DevOps, among those doing it successfully, there are no rollbacks. They roll forward. You just go forward, because with the discipline of DevOps, done well, you can put a patch into production within hours, rather than days, weeks, or months.

It's similar with security, as you were saying: once a vulnerability is out there, you want to go fix it and issue the patch. DevOps and security have a lot of similarities there.

Gardner: Before we close out, is there anything more for the future? We've heard a lot about the Internet of Things (IoT): a lot more devices, device types, networks, extended networks, and variable networks. Is there a benefit to DevOps and security as a tag team as we head into an era of increased complexity, with IoT sensors and a plethora of disparate networks? Ashish?

Kuthiala: The more you talk about IoT, the more holes open up for hackers to get in. I'll give you a classic example. I've been looking forward to the day when my phone is all I carry. I won't need keys to open my car, and I can pay for things with it. We have been getting toward that vision, but a lot of my friends in high tech are actually skeptical.
What happens if you lose your phone and somebody has access to it? You know the counterargument: you can switch off your phone and wipe the data, etc. But I think as IoT grows, more holes open up. So it becomes even more important to incorporate your security planning cycles right into the planning and software development cycles.

Gardner: Particularly if you're in an industry where you expect to have an Internet of Things ramp-up, getting automation in place, thinking about DevOps, thinking about security as an integral part of DevOps -- it all certainly makes a great deal of sense to me.

Kim: Absolutely, you said it better than I ever could. Yes.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: Hewlett Packard Enterprise.


Thursday, October 15, 2015

How Sprint employs orchestration and automation to bring IT into DevOps readiness

The next BriefingsDirect DevOps innovation case study explores how telecommunications giant Sprint places an emphasis on orchestration and automation to bring IT culture and infrastructure into readiness for cloud, software-defined data center (SDDC) and DevOps.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy.

Learn how Sprint has made IT infrastructure orchestration and automation a pillar of its strategic IT architecture future from Chris Saunderson, Program Manager and Lead Architect for Data Center Automation at Sprint in Kansas City, Missouri. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: I'm intrigued by your emphasis on getting IT infrastructure to more automation at a strategic level. Tell us why you think automation and orchestration are of strategic benefit to IT.
Saunderson: We've been doing automation since 2011, but it came out of an appreciation that the velocity of change inside the data center is just going to increase over time.

In 2009, my boss and I sat down and said, "Look, this is going nowhere. We're not going to make a significant enough impact on the way that the IT division works ... if we just keep doing the same thing."

That’s when we sat down and saw the orchestrated data center coming. I encapsulated it as the "data center of things." When you look at the journey that most enterprises go through, right around 2009 is when the data center of things emerged. You then began to lose track of where things are, what they are, who uses them, how long they’ve been there, and what their state is.

When we looked at automation and orchestration in 2009, it was very solidly focused on IT operational efficiency, but we had the longer-term view that it was going to be foundational for the way we do things going forward -- if for nothing else than to handle the data center of things. We could also see changes coming in the way that our IT organization was going to have to respond to changes in our business, let alone the technology change.

Gardner: So that orchestration has allowed you to not only solve the problems of the day, but put a foundation in place for the new problems, rapid change, cloud, mobile, all of these things that are happening. Before we go back to the foundational aspects, tell us a little bit about Sprint itself, and why your business is changing.

Provider of choice

Saunderson: The Sprint of today ... We're number three, with aspirations to be bigger, better, and faster, and the provider of choice in wireless, voice, and data. We're a tier 1 network service provider, with a global IP network along with private networks, an MPLS backbone, all that kind of stuff. We're a leader in TRS -- Telecommunications Relay Services for the deaf.

The Sprint of old is turning into the Sprint of new, where we look at mobile and we say mobile is it, mobile is everything -- that Internet of Things (IoT). That's where we want to foster growth. I see an exciting company coming, in terms of connecting people not only to each other, but to their partners, the people who supply services to them, their entertainment, and their business. That's what we do.

Gardner: When you started that journey for automation -- getting out of those manual processes and managing complexity, but repeatedly getting the manual labor out of the way -- what have you learned that you might relate to other people? What are some of the first things people should keep in mind as they embark on this journey?

Saunderson: It's really a two-part answer. Orchestration comes after automation, because orchestration is there to consume the new automation services. So let's take automation first. The big thing to remember is that change is hard for people. Not technology change. People are very good at technology change, but unwiring people's brains is a problem, and you have to acknowledge that up-front. You're going to have a significant amount of resistance from people to changing the way they're used to doing things.

Now addressing that is also a human problem, but in a certain sense, the technology helps because you're able to say things like, "Let's just look at the result and let's compare what it takes to get to the result. Was it the humans doing it, and what does it take to get to the result with the machines doing it?" Let’s just call it what it is. It’s machines doing things. If the result is the same, then it doesn't require the humans. That’s challenge number one, unwiring people’s minds.

The second is making sure that you are articulating the relevance of what you’re doing. We had an inbuilt advantage, at least in the automation space, of having some external forces that were driving us to do this.

It's really regulatory compliance, right? Sarbanes-Oxley (SOX) is what it is. PCI is what it is -- SAS70, FISMA, those things. We had to recognize the excessive amount of labor we were expending to try to keep up with regulatory change.

PCI changes every year or 18 months. So it's just going through every rule set and saying, "Yes, this doesn't apply to me; I'm more restricted." That takes six people. We were able to turn that around and address the requirement to continue doing compliance more effectively and more efficiently. Don't lose that upward communication, the relevancy thing, which is that not only are we doing this more efficiently, but we are also better at it.

When you get to orchestration, now you’re really talking about some interesting stuff because this is where you begin to talk about being able to do continuous compliance, for example. That says, "Okay, we used to treat this activity as once a quarter or maybe once a month. Let's just do it all the time, but don’t even have a human involved in it." Anybody who has talked to me about this will hear this over and over again. I want smart people working on smart things. I do not want smart people working on dumb things. Truth be told, 99 percent of the things that IT people do are dumb things.

Orchestration benefits

The problem with them is that they're dumb because they force a human to look at the thing and make a decision. Orchestration allows you to take that one level out, look at the thing, and figure out how to make that decision without a human having to make it. Then, tie that to your policy, report on policy compliance, and you're done.

The moment you do that, you’re freeing people up to go have the harder discussions. This is where we start to talk about DevOps and this is where we start to talk about some of the bigger blocks that grind against each other in the IT world.
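To make the "continuous compliance" idea concrete, here is a minimal sketch in Python. It is not Sprint's implementation: a compliance rule is written as a machine-checkable policy, every item in an inventory is evaluated against it with no human in the loop, and the result is summarized for reporting. The `Server` record, the two rules in `POLICY`, and all function names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical inventory record; field names are illustrative only.
    name: str
    open_ports: set[int] = field(default_factory=set)
    patched: bool = True


@dataclass
class Finding:
    server: str
    rule: str
    compliant: bool


# Hypothetical policy: each rule is a name plus a predicate over a server.
POLICY = {
    "no-telnet": lambda s: 23 not in s.open_ports,
    "patched": lambda s: s.patched,
}


def evaluate(inventory: list[Server]) -> list[Finding]:
    """Apply every rule to every server; the decision is made by code, not a person."""
    return [
        Finding(s.name, rule, check(s))
        for s in inventory
        for rule, check in POLICY.items()
    ]


def report(findings: list[Finding]) -> dict[str, int]:
    """Summarize policy compliance so it can be handed to auditors."""
    passed = sum(f.compliant for f in findings)
    return {"checks": len(findings), "passed": passed, "failed": len(findings) - passed}


if __name__ == "__main__":
    inventory = [
        Server("web01", open_ports={80, 443}),
        Server("legacy02", open_ports={23, 80}, patched=False),
    ]
    findings = evaluate(inventory)
    for f in findings:
        if not f.compliant:
            print(f"{f.server}: violates {f.rule}")
    print(report(findings))  # {'checks': 4, 'passed': 2, 'failed': 2}
```

Run on a schedule or on every change, a check like this stands in for the periodic manual review; keeping rules as data means a new regulatory requirement becomes a new policy entry rather than a new manual procedure.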

Gardner: "Continuous" is very interesting. You used the PCI compliance issue as an example, but it's also very important when it comes to applications: software development, test, and deploy. Is there anything you can explain for us about the orchestration and automation that lends itself to continuous delivery of applications? People might not put the two together, but I'm pretty sure there's a connection here.
Saunderson: There is. DevOps is a philosophy. There was a fantastic discussion from Adobe where it was very clear that DevOps is a philosophy, an organizational discussion. It’s not necessarily a technology discussion. The thing that I would say, though, is that you can apply continuous everywhere.

The success that we're having in that orchestration layer is that it's a much easier discussion to go in and say, "You know how we do this over here? Well, what if it was release-candidate code?" The real trick there, when you go back to the things that I want people to think about, is that DevOps is a philosophy, because it requires development and operations to work together, not one handing off to the other, and not one superior to the other; it's together.

If they're not willing to walk down the same path together, then you have an organizational problem, but you may also have a toolset problem. We're an Application Lifecycle Management (ALM) shop. We have it there. Does it cover all of our applications? No. Are we getting all of the value out of it that we could? No.

But that’s because we're spending time in getting ready to do things like connect ALM into the running environment. The bigger problem, Dana, is that the organization has to be ready for it, because your philosophical changes are way more difficult than technical changes. Continuous means everything else has to be continuous along with it.

If you're in the ITIL model, you’re still going to need configuration items (CIs). How do CIs translate to Docker containers? Do they need to be described in the same way? If the operations team isn't necessarily as involved in the management of continuously deployed applications, who do I route a ticket to and how do they fix it?

This is where I look at it and say that this is the opportunity for orchestration to sit underneath all of that. It not only has the capability to enable people to deploy continuously, whether it's into test, production, disaster recovery, or any other environment; it also has to equip them to run the continuous operation that comes after continuous integration, development, and deployment.

That has to be welded on, because you're going to create dis-synergy if you don't address operations at the same time as you do integration and deployment.

Gardner: Let’s look at some other values that you can derive from better orchestration and automation. I'm thinking about managing complexity, managing scale, but also more of the software-defined variety. We are seeing a lot of software-defined storage (SDS), software-defined networking (SDN), ultimately software-defined data center (SDDC), all of which is abstracted and managed. How do you find the path to SDDC, vis-à-vis better orchestration and automation?

At the core

Saunderson: Orchestration is going to have to be at the core of that. If you look at the product offerings just across the space, you’re starting to see orchestration pop up in every last one of them -- simply because there's no other way to do it.

RESTful APIs are nice, but they're not enough because, at that point, you're asking customers to start bolting things together themselves, as opposed to saying, "I'm going to give you a nice orchestrated interface, where I have a predefined set of actions that are going to be executed when you call that orchestration to make it work, and then apply that across the gamut."
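The contrast between raw RESTful APIs and an orchestrated interface can be sketched as follows. This is a minimal, hypothetical illustration, not any HPE product's API: the caller invokes one named workflow, and the orchestrator runs a predefined set of actions in order. The step functions are stand-ins for real compute, network, and CMDB calls, and every name here is assumed.

```python
from typing import Callable

Context = dict[str, str]
Action = Callable[[Context], None]


def allocate_vm(ctx: Context) -> None:
    ctx["vm"] = f"vm-{ctx['app']}"  # stand-in for a compute API call


def configure_network(ctx: Context) -> None:
    ctx["vlan"] = "app-tier"  # stand-in for a network API call


def register_cmdb(ctx: Context) -> None:
    ctx["ci"] = f"ci-{ctx['vm']}"  # record the result so it can be found later


# The predefined set of actions behind one orchestration entry point.
WORKFLOWS: dict[str, list[Action]] = {
    "provision_server": [allocate_vm, configure_network, register_cmdb],
}


def run(workflow: str, **params: str) -> Context:
    """Execute every step of a named workflow in order, sharing one context."""
    ctx: Context = dict(params)
    for step in WORKFLOWS[workflow]:
        step(ctx)
    return ctx


if __name__ == "__main__":
    print(run("provision_server", app="billing"))
    # {'app': 'billing', 'vm': 'vm-billing', 'vlan': 'app-tier', 'ci': 'ci-vm-billing'}
```

The caller sees one call; the ordering of the underlying API calls, and the final registration step that makes the deployed thing findable later, live in the orchestrator instead of in each customer's glue code.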

SDS is coming after SDN. Don't misunderstand me. We're not even at the point of deploying software-defined networks, but we look at it and we say, "I have to have that, if for no other reason than I need to take human hands out of the delivery chain for things that touch the network."

I go back to the data center of things. The moment you go to 10Gbit, where you're using virtual contexts or anything else in the current lexicon of new networking, as opposed to VLANs, switchboards, and all that stuff, you're absolutely losing visibility.

Without orchestration, and, behind that, without the analytics to look at what's happening in the orchestration that’s touching the elements in your data center, you’re going to be blind. Now, we’re starting to talk about going back to the dark ages. I think we're smarter than that.

By looking at orchestration as the enabler for all of that, you start to get better capability to deliver that visibility that you’re after, as well as the efficiency. We should never lose sight of the fact that the whole reason to do this is to say, "Deploy the thing."

That's fine, but how do I run it, how do I assure it, how do I find it? This keeps coming up over and over. Eventually, you're going to have to do something to that thing, whether it's redeploying it, handling some kind of security event attached to it, or the business just deciding not to do it anymore. Then, I have to find it and do something to it.

Gardner: Given your deep knowledge and understanding of orchestration and automation, what would you like to see done better for the tools that are provided to you to do this?

Is there a top-three list of things you'd like to see that would help you extend the value of your orchestration and automation, do things like software-defined, treat DevOps as a philosophy, and ultimately have more of a data-driven, strategic IT operation?

Development shop

Saunderson: I'm not sure I have a top three. I can certainly talk about generic principles, which is: I want open. That's what I really want. Just to go off on a sideline for a second, it's fascinating. It's just absolutely fascinating. IT operations is starting to become a software development shop now.

I'm not resistant to that in the least because, just in this conversation, we've been talking about RESTful APIs and we were talking about orchestration. None of this is IT operations stuff. This isn't electrons flowing through copper anymore. It's business process translated into a set of actions, open and interoperable.

Then, just give me rich data about those things, very rich data. We're getting to the point, just by the sheer evolution of big data, that it doesn't matter anymore. Just give it all to me, and I will filter it down to what I'm looking for.

Gardner: The thing that is interesting with Hewlett Packard Enterprise (HPE) is that they do have a big-data capability, as well as a leading operations capability and they're starting to put it all together.

Saunderson: In the same way, orchestration is starting to pop up everywhere. If you look at the HPE product portfolio and you look at network coordination, it's going to have an operations orchestration interface into it. Server automation is welded into operations orchestration, and it's going to appear everywhere else. Big data is coming with it.

I'm not hesitant on it. It's just that it introduces complexity for me. The fact that the reporting engine is starting to turn big data is good. I'm happy for that. It just has to get more. It’s not enough to just be giving me job results that are easy to find and easy to search. Now, I want to get some really rich metadata out of things.

Software-defined networking is a good example. The whole OpenFlow activity by itself looks like network management until it goes into a big-data thing, and then suddenly I have a data source that I can start correlating events to, which turn into actions inside the controller that turn into changes on the network.

Let's extend that concept. Let's put that into orchestration, into service management, or into automation. Give me that, and it doesn't have to be a single platform. Give me a way to anticipate HPE's product roadmap. The challenge for HPE is delivery.

Gardner: Before we sign off, one of the important things about IT investment is getting the buy-in and support from your superiors or the other aspects of your enterprise. Are there some tangible metrics of success, returns on investment (ROIs), improvements and productivity that you can point to from your orchestration, not just helping smart people do smart things, but benefiting the business at large? 

Business case

Saunderson: So organizations often only do the things that the boss checks. The obvious priorities for us are straight around our business case.

If you want to look at real tangibles, our virtual server provisioning, even though it's the heavyweight process it is today, is turning from days into hours. That's serious change, serious organizational and cultural change, but it's not enough. It has to be minutes, not hours, right?

Then there's compliance. I keep coming back to it because it's a foundational thing. We're able to continue to pass SOX and PCI every time, and we do it efficiently. That's a cultural change as well, but it's something that CIOs and above do care about, because it's kind of important.

One gets your CFO in trouble, and the other one stops you from taking payments. That gets people's attention really quickly. The moment you can delve into those, you can demonstrate not only that you're meeting those regulatory requirements, but that you're able to report on all of them and have auditors look at it and say, "Yes, we agree, you are doing all the things that you should be doing."
Then, you can flip that into the next area which is that we do have to go look at our applications for compliance. We have rich metadata over here that was able to articulate things. So let’s apply it there.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: Hewlett Packard Enterprise.


Monday, October 5, 2015

How fast analytics changes the game and expands the market for big data value

The next BriefingsDirect big-data thought leadership discussion highlights how fast analytics -- getting to big-data analysis value in far less time than before -- expands the market for advanced data infrastructure to gain business insights.

We'll learn how bringing analytics to a cloud services model also allows smaller and less data-architecture-experienced firms to benefit from the latest in big-data capabilities. And we'll explore how Dasher Technologies is helping to usher in this democratization of big data value to more players in less time.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy.

To share how a fast ramp-up for big data as a service has evolved, we're joined by Justin Harrigan, Data Architecture Strategist at Dasher Technologies, as well as Chris Saso, Senior Vice President of Technology at Dasher Technologies in Campbell, California. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Justin, how have big-data practices changed over the past five years to set the stage for rapid leveraging of big-data capabilities?

Harrigan: Back in 2010, we saw big data become mainstream. Hadoop became a household name in the IT industry, doing scale-out architectures. Linux databases were becoming common practice. Moving away from traditional legacy, smaller, slower databases allowed this whole new world of analytics to open up to previously untapped resources within companies. So data that people had just been sitting on could now be used for actionable insights.

Fast forward to 2015, and we've seen big data become more approachable. Five years ago, only the largest organizations or companies that were specifically designed to leverage big-data architectures could do so. The smaller guys had maybe a couple of hundred or even tens of terabytes, and it required too much expertise or too much time and investment to get a big-data infrastructure up and running.

Today, we have approachable analytics, analytics as a service, hardened architectures that are almost turnkey with back-end hardware, database support, and applications -- all integrating seamlessly. As a result, the user on the front end, who is actually interacting with the data and making insights, is able to do so with very little overhead, very little upkeep, and is able to turn that data into business-impact data, where they can make decisions for the company.

Gardner: Justin, how big of an impact has this had? How many more types of companies or verticals have been enabled to start exploring advanced, cutting-edge, big-data capabilities? Is this a 20 percent increase? Perhaps almost any organization that wants to can start doing this.

Tipping point

Harrigan: The tipping point is when you outgrow your current solutions for data analytics. Data analytics is nothing new. We've been doing it for more than 50 years with databases. It’s just a matter of how big you can get, how much data you can put in one spot, and then run some sort of query against it and get a timely report that doesn’t take a week to come back or that doesn't time out on a traditional database.

Almost every company nowadays is growing so rapidly with the type of data they have. It doesn't matter if you're an architecture firm, a marketing company, or a large enterprise getting information from all your smaller remote sites; everyone is compiling data to make better business decisions or to create a system that makes their products run faster.

For people dipping their toes in the water for their first larger dataset analytics, there's a whole host of avenues available to them. They can go to some online providers, scale up a database in a couple of minutes, and be running.

They can download free trials. HP Vertica has a Community Edition, for example, and they can load it on a single server, up to terabytes, and start running there. And it's significantly faster than traditional SQL databases.

It’s much more approachable. There are many different flavors and formats to start with, and people are realizing that. I wouldn’t even use the term big data anymore; big data is almost the norm.

Gardner: I suppose maybe the better term is any data, anytime.

Harrigan: Any data, anytime, anywhere, for anybody.

Gardner: I suppose another change over the past several years has been a shift away from batch processing, where you might do things on an infrequent or occasional basis, toward a concept that's more applicable to a cloud or as-a-service model, where it's streaming and continuous, and you start reducing the latency to get close to real time.

Are we starting to see more and more companies being able to compress their feedback, and start to use data more rapidly as a result of this shift over the past five years or so?

Harrigan: It’s important to address the term big data. It’s almost like an umbrella, almost like the way people use cloud. With big data, you think large datasets, but you mentioned speed and agility. The ability to have real-time analytics is something that's becoming more prevalent and the ability to not just run a batch process for 18 hours on petabytes of data, but having a chart or a graph or some sort of report in real time. Interacting with it and making decisions on the spot is becoming mainstream.

We did a blog post on this not long ago, talking about how instead of big data, we should talk about the data pipe. That’s data ingest or fast data, typically OLTP data, that needs to run in memory or on hardware that's extremely fast to create a data stream that can ingest all the different points, sensors, or machine data that’s coming in.

Smarter analysis

Then we've talked about smarter analytic data, which requires some sort of number-crunching on data that is relevant: not real-time, but still fairly new, call it seven days or older and up to a year. And then there's the data lake, which is essentially your repository for historical data crunching.

Those are three areas you need to address when you talk about big data. The ability to consume that data as a service is now being made available by a whole host of companies in very different niches.

It doesn’t matter if it’s log data or sensor data, there's probably a service you can enable to start having data come in, ingest it, and make real-time decisions without having to stand up your own infrastructure.
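The three areas Harrigan describes (fast data, smarter analytic data, and the data lake) can be sketched as a simple routing rule based on data age. This is a minimal sketch under assumptions: the seven-day and one-year cutoffs are read loosely from his "call it seven days or older and up to a year" remark, and the tier names and the `tier_for` function are hypothetical rather than any product's configuration.

```python
from datetime import datetime, timedelta, timezone

FAST_DATA_WINDOW = timedelta(days=7)   # assumed: the newest data stays in the fast tier
ANALYTIC_WINDOW = timedelta(days=365)  # assumed: up to a year for analytic crunching


def tier_for(event_time: datetime, now: datetime) -> str:
    """Route a record to one of three tiers by its age."""
    age = now - event_time
    if age < FAST_DATA_WINDOW:
        return "fast"      # in-memory ingest for real-time decisions
    if age <= ANALYTIC_WINDOW:
        return "analytic"  # number-crunching on recent, relevant data
    return "lake"          # historical repository


if __name__ == "__main__":
    now = datetime(2015, 10, 5, tzinfo=timezone.utc)
    for days in (0, 30, 400):
        print(days, tier_for(now - timedelta(days=days), now))
    # 0 fast
    # 30 analytic
    # 400 lake
```

In practice each tier would be a different store with different hardware economics; the point of the sketch is only that the boundaries are a policy decision made on data age.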

Gardner: Of course, when organizations try to do more of these advanced things that can be so beneficial to their business, they have to take into consideration the technology, their skills, their culture -- people, process and technology, right?

Chris, tell us a bit about Dasher Technologies and how you're helping organizations do more with big-data capabilities, how you address this holistically, and this whole approach of people, process and technology.

Saso: Dasher was founded in 1999 by Laurie Dasher. To give you an idea of who we are, we're a little over 65 employees now, and the size of our business is somewhere around $100 million.

We started by specializing in solving major data-center infrastructure challenges by actually applying the people, process and technology mantra. We started in the data center, addressing people's scale-out, server, storage, and networking problems. Over the past five or six years, we've been spending our energy, strategy, and time on the big areas around mobility, security, and, of course, big data.

As a matter of fact, Justin and I were recently working on a project with a client around combining mobility information and big data. It's a retail client. They want to be able to send information to a customer who might be walking through a store, maybe send a coupon or things like that. So, as Justin was just talking about, you need fast information, and you need to make actionable things happen with that data quickly. You're combining mobility with big data.

Dasher has built up our team to be able to have a set of solutions that can help people solve these kinds of problems.

Gardner: Justin, let’s flesh that out a little bit around mobility. When people are using a mobile device, they're creating data that, through apps, can be shared back to a carrier, as well as application hosts and the application writers. So we have streams of data now about user experience and activities.

We can also deliver data and insights out to people in the other direction in that real-time fashion, a closed loop, regardless of where they are. They don't have to be at their desk, and they don't have to be looking at a specific business-intelligence (BI) application, for example. So how has mobility changed the game in the past five years?

Capturing data

Harrigan: Dana, it's funny you brought up the two different ways to capture data. Devices can be used both as a sensor point and as a way to interact with data. I remember seeing a podcast you did with HP Vertica and GUESS regarding how they interacted with their database on iPads.

In regard to interacting with data, it has become useful not only to data analysts and data scientists; we can now push it down into a format for lower-level folks who aren't so technical. With a fancy application in front of them, they can use the data as well to make decisions and actually benefit the company.

You give that data to someone in a store, at GUESS for example, who can benefit by understanding where in the store to put jeans to impact sales. That’s huge. Rather than giving them a quarterly report and stuff that's outdated for the season, they can do it that same day and see what other sites are doing.

On the flip side, mobile devices are now sensors. A mobile device is constantly pinging access points over Wi-Fi. We can capture that data and, using the MAC address as a unique identifier, follow someone as they move through a store or throughout a city. Then, when they return, that person's data is captured into a database and becomes historical. They can track them through their device.
It opens up a whole new world of opportunities in terms of how retailers decide where to place merchandise, how they staff stores to make sure they have the proper number of people at a given time, and what impact weather has on the store.

Lastly, as Chris mentioned, how do we interact with people on devices by pushing them data that's relevant as they move throughout their day?

The next generation of big data is not just capturing data and using it in reports, but taking that data in real time and possibly pushing it back out to the person who needs it most. In the retail scenario, that's the end users, possibly giving them a coupon as they're standing in front of something on a shelf that is relevant and something they will use.
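The Wi-Fi tracking described above, with the MAC address as a unique identifier, amounts to grouping sightings per device into visits and noticing when a device comes back. A minimal sketch follows; the 30-minute gap that ends a visit, the sample MAC strings, and the function names are all assumptions for illustration, not anything Dasher or its clients are confirmed to use.

```python
from collections import defaultdict
from datetime import datetime, timedelta

Visit = tuple[datetime, datetime]  # (first_seen, last_seen)

VISIT_GAP = timedelta(minutes=30)  # assumed: a longer silence starts a new visit


def visits_by_device(sightings: list[tuple[str, datetime]]) -> dict[str, list[Visit]]:
    """Group (mac, timestamp) sightings into visit windows per MAC address."""
    visits: defaultdict[str, list[Visit]] = defaultdict(list)
    # Sorting by (mac, timestamp) makes each device's sightings chronological.
    for mac, ts in sorted(sightings):
        windows = visits[mac]
        if windows and ts - windows[-1][1] <= VISIT_GAP:
            windows[-1] = (windows[-1][0], ts)  # extend the current visit
        else:
            windows.append((ts, ts))  # start a new visit
    return dict(visits)


def returning_devices(visits: dict[str, list[Visit]]) -> set[str]:
    """MAC addresses seen on more than one visit, i.e., returning visitors."""
    return {mac for mac, windows in visits.items() if len(windows) > 1}


if __name__ == "__main__":
    t0 = datetime(2015, 10, 5, 10, 0)
    sightings = [
        ("aa:bb:cc:00:00:01", t0),
        ("aa:bb:cc:00:00:02", t0 + timedelta(minutes=5)),
        ("aa:bb:cc:00:00:01", t0 + timedelta(minutes=10)),
        ("aa:bb:cc:00:00:01", t0 + timedelta(days=2)),  # the same device returns
    ]
    visits = visits_by_device(sightings)
    print(returning_devices(visits))  # {'aa:bb:cc:00:00:01'}
```

Once visits exist as records, the historical uses Harrigan lists (merchandise placement, staffing by time of day, weather correlation) become ordinary aggregate queries over them.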

Gardner: So we're not just talking about democratization of analytics in terms of the types of organizations, but now we're even talking about the types of individuals within those organizations.

Do you have any examples of Dasher's clients that have been able to exploit these advances with mobile and cloud working in tandem, and how that's produced some sort of business benefit?

Business impact

Harrigan: A good example of a client who leveraged a large dataset is One Kings Lane. They were having difficulty updating the website their users were interacting with because it’s a flash shopping website, where the information changes daily, and you have to be able to update it very quickly. Traditional technologies were causing a business impact and slowing things down.

They were able to leverage a really fast columnar database to make these changes, grow the inventory, grow the site, and have updates happen in almost real time, so that there was no impact or downtime when they needed to make changes. That's a real-world example of big data having a direct impact on the business line.

Gardner: Chris, tell us a little bit about how Dasher works with Hewlett Packard Enterprise technologies, and perhaps even some other HP partners like GoodData, when it comes to providing analytics as a service?

Saso: HP has been a longtime partner from the very beginning, actually when we started the company. We were a partner of Vertica before HP purchased them back in 2011.

We started working with Vertica around big data, and Justin was one of our leads in that area at the time. We've grown that business, and our business with other units within HP, to combine solutions: Vertica, big data, and hardware, as Justin was just talking about. You brought up the applications that are analyzing this big data. We're partners in the ecosystem that helps people analyze the data.

Once HP Vertica, or what have you, has done the analysis, you have to report on that and make it in a nice human-readable form or human-consumable form. We’ve built out our ecosystem at Dasher to have not only the analytics piece, but also the reporting piece.

Gardner: And on the as-a-service side, do you work with GoodData at all, or are you familiar with them?

Saso: Justin, maybe you can talk a little bit about that. I think you've worked with them more on their projects.

Optimizing the environment

Harrigan: GoodData is a large consumer of Vertica and they actually leverage it for their back-end analytics platform for the service that they offer. Dasher has been working with GoodData over the past year to optimize the environment that they run on.

Vertica has different deployment scenarios, and you can actually deploy it in a virtual-machine (VM) environment or on bare metal. We did an analysis to see if there was a return on investment (ROI) in moving from a virtualized environment running on OpenStack to a bare-metal environment. For a six-month proof of concept (POC), we leveraged HP Labs in Houston. We had a four-node system set up with multiple terabytes of data.

We saw a 4:1 increase in performance moving from a VM to a bare-metal machine with the same resources. That’s going to have a significant impact on the way they move data in their environment in the future and how they adjust to customers with larger datasets.

Gardner: When we think about optimizing the architecture and environment for big data, are there any other surprises or perhaps counter-intuitive things that have come up, maybe even converged infrastructure for smaller organizations that want to get in fast and don’t want to be too concerned with the architecture underlying the analytics applications?

Harrigan: There's a tendency now with so many free solutions out there to pick a free solution, something that gets the job done now, something that grows the business rapidly, but to forget about what businesses will need three years down the road, if it's going to grow, if it’s going to survive.

There are a lot of startups out there that are able to build a big-data infrastructure, scale it to 5,000 nodes, and then reach a limit. There are network limits on how fast the switch can move data between nodes, and they're constantly pushing the limits of 10-gigabit, 40-gigabit, and soon 100-gigabit networks to keep those infrastructures up.

Depending on what architecture you choose, you may be limited in the number of nodes you can go to. So there are solutions out there that can process a million transactions per second with 100 nodes, and then there are solutions that can process a million transactions per second with 20 nodes, but may cost slightly more.
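The node-count trade-off Harrigan describes can be made concrete with a little arithmetic. The Python sketch below is a back-of-the-envelope illustration, not a sizing tool: the function names are hypothetical, it assumes load spreads evenly across nodes and that throughput scales linearly as nodes are added (real clusters rarely do, especially once the network becomes the bottleneck), and the 5x growth target is invented for the example. Only the figures of 1 million transactions per second on 100 nodes versus 20 nodes come from the interview.

```python
import math


def per_node_throughput(total_tps: float, nodes: int) -> float:
    """Transactions per second each node handles, assuming an even load spread."""
    return total_tps / nodes


def nodes_needed(target_tps: float, tps_per_node: float) -> int:
    """Fewest whole nodes that reach target_tps, assuming linear scaling."""
    return math.ceil(target_tps / tps_per_node)


# Both designs from the interview handle 1 million transactions per second.
wide_design = per_node_throughput(1_000_000, 100)   # 10,000 tps per node
dense_design = per_node_throughput(1_000_000, 20)   # 50,000 tps per node

# Hypothetical growth scenario: load rises to 5 million tps.
target_tps = 5_000_000
print("wide design needs", nodes_needed(target_tps, wide_design), "nodes")
print("dense design needs", nodes_needed(target_tps, dense_design), "nodes")
```

Under these assumptions the 100-node design grows to 500 nodes while the 20-node design reaches only 100, which is one way a per-node cost premium can pay for itself once a cluster starts approaching switch and network limits.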

If you think long-term, and you start in the cloud, you want to be able to move out of the cloud. If you start with an open ecosystem, you want to make sure that your hardware refresh is not going to cost so much that the company can’t afford it three years down the road. One of the areas we help consult on, when picking different architectures, is thinking long-term. Don't think six weeks down the road about how you're going to get your service up and running. Think: okay, we have a significant client install base; how are we going to grow the business over the next three to five years, and five to 10 years?

Gardner: Given that you have quite a few different types of clients, and the idea of optimizing architecture for the long-term seems to be important, I know with smaller companies there’s that temptation to just run with whatever you get going quickly.

What other lessons can we learn from that long-term view when it comes to skills, security, something more than the speeds and feeds aspects of thinking long term about big data?

Numerous regulations

Harrigan: Think about where your data is going to reside and the requirements and regulations you may run into. There are a million different regulations to deal with now, such as HIPAA, ITAR, and rules on money-transaction processing. So if you foresee that need, make sure you're in an ecosystem that supports it. The temptation for smaller companies is just to go to the cloud, but who owns that data if you go under, or who owns that data when you get audited?

Another issue is encryption. Once you have a proven technology or a proven service and start gaining larger customers, they're going to want to make sure you're compliant with all of their regulations, not just the ones your own company enforces.

There's logging they're required to have, and there will be encryption, protocols, and the ability to audit anyone who accesses the data.

Gardner: On this topic of optimizing, when you do it right, when you think about the long term, how do you know you have that right? Are there some metrics of success? Are there some key performance indicators (KPIs) or ROIs that one should look to so they know that they're not erring on the side of going too commercial or too open source or thinking short term only? Maybe some examples of what one should be looking for and how to measure that.

Harrigan: That’s going to be largely subjective to each business. Obviously if you're just going to use a rule of thumb, it shouldn't cost you more money than it makes you. If you implement a system and it costs you $10 million to run and your ROI is $5 million, you've made a bad decision.

There are two factors. The first is the value to the business. If you're a large enterprise and you implement big data, and it gives you the ability to make decisions and quantify those decisions, then you can put a number to that and see how much value the big-data system is creating. For example, it might be a new marketing campaign, or something you're doing with your remote sites or retail branches, that's quantifiable and is having an impact on the business.

The other way to judge it is the impact on the business itself. For ad-serving companies, the way they make money is ad impressions, and the more ad impressions they can serve at the least cost in their environment, the higher the return they're going to make. The return is the delta between the infrastructure costs and the top line they report to their investors.

If they can do 56 billion ad impressions in a day, and they can double that by switching architectures, that’s probably a good investment. But if they can improve it by only 10 percent by switching architectures, it’s probably not worth the work.
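Harrigan's two rules of thumb, that a system should return more than it costs and that an architecture switch should deliver a large gain rather than a marginal one, can be expressed as a short Python sketch. The function names and the 50 percent minimum-gain threshold are hypothetical assumptions chosen for illustration; in practice the threshold would depend on what a migration actually costs. The dollar figures and the 56-billion-impression example come from the interview.

```python
def net_value(annual_return: float, annual_cost: float) -> float:
    """Return minus running cost; negative means the system loses money."""
    return annual_return - annual_cost


def relative_gain(current_rate: float, new_rate: float) -> float:
    """Fractional improvement of new_rate over current_rate (1.0 means doubled)."""
    return (new_rate - current_rate) / current_rate


def worth_switching(current_rate: float, new_rate: float, min_gain: float = 0.5) -> bool:
    """Hypothetical rule: switch only if the gain clears the min_gain threshold."""
    return relative_gain(current_rate, new_rate) >= min_gain


# $10 million to run against a $5 million return: a bad decision.
print(net_value(5_000_000, 10_000_000))  # -5000000

impressions_per_day = 56e9
doubled = 2 * impressions_per_day        # switch doubles capacity
ten_percent = 1.1 * impressions_per_day  # switch adds only 10 percent
print(worth_switching(impressions_per_day, doubled))      # True
print(worth_switching(impressions_per_day, ten_percent))  # False
```

Expressing the gain as a fraction of current capacity keeps the rule independent of scale, so the same threshold applies whether a company serves millions or billions of impressions a day.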
Gardner: One last area on this optimization idea. We've seen, of course, organizations subjectively make decisions about whether to do this on-premises, either virtualized or on bare metal. They will do their cost-benefit analysis. Others are looking at cloud and as-a-service models.

Over time, we expect to have a hybrid capability. As you mentioned, if you think ahead, whether you start in the cloud and want to move private, or start private and want to be able to move to the cloud, we're seeing more likelihood of being able to move back and forth.

Thinking about that, do you expect that companies will be able to do that? Where does that make the most sense when it comes to data? Is there a type of analysis you might want to do primarily in a cloud environment, and other types you might do privately? How should we think about where, on the spectrum of hybrid-cloud options, different types of big-data activity belong?

Either-or decision

Harrigan: In the large data analytics world, it’s almost an either-or decision at this time. I don’t know what it will look like in the future.

Workloads that lend themselves extremely well to the cloud are inconsistent, maybe seasonal, where 90 percent of your business happens in December. Seasonal workloads like that lend themselves extremely well to the cloud.

Or your business may be just starting out, and you don't know if you're going to need a full 400-node cluster to run whatever analytics platform you choose, so the hardware might sit idle 50 percent of the time, or you don’t get full utilization. Those companies need a cloud architecture, because they can scale up and scale down based on need.

Companies that benefit from on-premises are ones that can see significant savings by not using the cloud and paying someone else to run their environment. Those companies typically pin the CPU usage meter at 100 percent, as much as they can, and then add nodes to add more capacity.
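The cloud-versus-on-premises reasoning above comes down to utilization: owned hardware costs roughly the same whether it is busy or idle, while elastic cloud capacity is billed only while it runs. The Python sketch below illustrates that break-even under simplifying assumptions. The hourly rates are invented placeholders, not real prices; the model ignores storage, networking, staffing, and data-transfer costs; and the function names are hypothetical. The 400-node cluster size echoes the example in the interview.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 hours / 12)


def on_prem_monthly(nodes: int, owned_rate: float) -> float:
    """Owned hardware: amortized cost accrues every hour, busy or idle."""
    return nodes * owned_rate * HOURS_PER_MONTH


def cloud_monthly(nodes: int, cloud_rate: float, utilization: float) -> float:
    """Elastic capacity: pay only for the fraction of hours the nodes run (0..1)."""
    return nodes * cloud_rate * HOURS_PER_MONTH * utilization


def break_even_utilization(owned_rate: float, cloud_rate: float) -> float:
    """Average utilization above which owning the hardware is cheaper."""
    return owned_rate / cloud_rate


# Hypothetical per-node-hour rates, for illustration only.
OWNED_RATE = 1.00
CLOUD_RATE = 2.50
NODES = 400

print("break-even utilization:", break_even_utilization(OWNED_RATE, CLOUD_RATE))
for label, utilization in [("seasonal", 0.2), ("pinned at 100%", 1.0)]:
    cloud = cloud_monthly(NODES, CLOUD_RATE, utilization)
    owned = on_prem_monthly(NODES, OWNED_RATE)
    cheaper = "cloud" if cloud < owned else "on-premises"
    print(f"{label}: cloud ${cloud:,.0f} vs on-premises ${owned:,.0f} -> {cheaper}")
```

With these placeholder rates the break-even sits at 40 percent average utilization: a seasonal workload averaging 20 percent comes out cheaper in the cloud, while a cluster pinned at 100 percent comes out cheaper to own, which matches the split Harrigan draws.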

The best advice I could give is, whether you start in the cloud or on bare metal, make sure you have agility and you're able to move workloads around. If you choose an architecture that only works in the cloud, and as you scale up you have to do a rip-and-replace just to get out of the cloud and move on-premises, that’s going to have a significant business impact.

One of the reasons I like HP Vertica is that it has a cloud instance that can run on a public cloud. That same instance, that same architecture runs just as well on bare metal, only faster.

Gardner: Chris, last word to you. For those organizations out there struggling with big data, trying to figure out the best path, trying to think long term, and from an architectural and strategic point of view, what should they consider when coming to an organization like Dasher? Where is your sweet spot in terms of working with these organizations? How should they best consider how to take advantage of what you have to offer?

Saso: Every organization is different, and this is one area where that's especially true. When people are just looking for servers, those are pretty much all the same. But when you're actually trying to figure out your strategy for how you're going to use big-data analytics, every company, big or small, probably has a slightly different problem it's trying to solve.

That's where we would sit down with the client and really listen and understand: Are they trying to solve a speed issue with their data, or are they trying to sift through massive amounts of data to find the needle in the haystack, the golden nugget? Each of those approaches has a different answer.
So coming to us with your business problem, and also what you would like to see as a result -- for example, an x-percent increase in your customer satisfaction score or in revenue -- helps us define the metric that we can then help design toward.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: Hewlett Packard Enterprise.
