Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: Dell Software.
Debunking myths around
big data should be a first step to making better business decisions for improving data analysis and data management capabilities in your company.
As the volume and purpose of data and
business intelligence (BI) have dramatically shifted, older notions and misconceptions -- what amount to myths about data infrastructure -- need to be updated and corrected, too.
So
we're here to pose some better questions about data, and provide up-to-date answers for running data-driven businesses that can efficiently and repeatedly predict dynamic market trends and customer wants in real time.
As
the volume and types of data that are brought to bear on business
analytics advance, the means to manage and exploit that sea of data need to be neither too costly nor too complex for
mid-size companies to master. There are better ways than traditional data architectures.
To
help identify what works best around modern big data management, BriefingsDirect interviews
Darin Bartik, Executive Director of Products in the Information Management Group at Dell Software. The discussion is conducted by
Dana Gardner, Principal Analyst at
Interarbor Solutions. [Disclosure:
Dell is a sponsor of
BriefingsDirect podcasts.]
Here are some excerpts:
Gardner: Are people
losing sight of the business value by getting lost in speeds and feeds
and technical jargon around big data? Is there some sort of a disconnect between the
providers and consumers of big data?
Bartik: You
hit the nail on the head with the first question. We are experiencing a
disconnect between the technical side of big data and the business
value of big data, and that’s happening because we’re digging too deeply
into the technology.
With a term like big data, or any one of the trends
that the information technology industry talks about so much, we tend to
think about the technical side of it. But with analytics, with the
whole conversation around big data -- what we've been stressing with many
of our customers -- is that it starts with a business discussion. It starts
with the questions that you're trying to answer about the business; not
the technology, the tools, or the architecture of solving those
problems. It has to start with the business discussion.
That’s a pretty big flip. The traditional approach to BI
and reporting has been one of technology frameworks, and a lot of things
that were owned more by the IT group. This is part of the reason why a
lot of the BI projects of the past struggled, because there was a
disconnect between the business goals and the IT methods.
So
you're right. There has been a disconnect, and that’s what I've been
trying to talk a lot about with customers -- how to refocus on the
business issues you need to think about, especially in the mid-market, where you maybe don’t have as many resources at hand. It can be pretty confusing.
I've been a part of Dell Software since the acquisition of Quest Software.
I was a part of that organization for close to 10 years. I've been in
technology coming up on 20 years now. I spent a lot of time in enterprise resource planning (ERP), supply chain, and monitoring, performance management, and infrastructure management, especially on the Microsoft side of the world.
Most
recently, as part of Quest, I was running the database management area
-- a business very well-known for its products around Oracle, especially Toad, as well as our SQL Server management capabilities. We leveraged that expertise when we started to evolve into BI and analytics.
I started working with Hadoop
back in 2008-2009, when it was still very foreign to most people. When
Dell acquired Quest, I came in and had the opportunity to take over the
Products Group in the ever-expanding world of information management.
We're part of the Dell Software Group, which is a big piece of the
strategy for Dell overall, and I'm excited to be here.
Part of the hype cycle
Without
disparaging the vendors like us, or anyone else, the current confusion is part of the
problem of any hype cycle. Many people jumped on the bandwagon of big
data. Just like everyone was talking cloud. Everyone was talking
virtualization, bring your own device (BYOD), and so forth.
Everyone
jumps on these big trends. So it's very confusing for customers,
because there are many different ways to come at the problem. This is
why I keep bringing people back to staying focused on what the real
opportunity is. It’s a business opportunity, not a technical problem or a
technical challenge that we start with.
Gardner: Even the name "big data" stirs up myths right from the get-go,
with "big" being a very relative term. Should we only be concerned about
this when we have more data than we can manage? What is the relative
position of big data and what are some of the myths around the size
issue?
Bartik: That’s the perfect one to start
with. The first word in the definition is actually part of the problem.
"Big." What does big mean? Is there a certain threshold of petabytes that you have to get to? Or, if you're dealing with petabytes, is it not a problem until you get to exabytes?
It’s
not a size issue. When I think about big data, it's really a trend that
has happened as a result of digitizing so much more of the information
that we all have already and that we all produce. Machine data, sensor
data, all the social media activities, and mobile devices are all contributing to the proliferation of data.
It's
added a lot more data to our universe, but the real opportunity is to
look for small elements of small datasets and look for combinations and
patterns within the data that help answer those business questions that I
was referencing earlier.
It's not necessarily a scale
issue. What is a scale issue is when you get into some of the more
complicated analytical processes and you need a certain data volume to
make it statistically relevant. But what customers first want to think
about is the business problems that they have. Then, they have to think
about the datasets that they need in order to address those problems.
Big-data challenge
That
may not be huge data volumes. You mentioned mid-market earlier. When we
think about some organizations moving from gigabytes to terabytes, or
doubling data volumes, that’s a big data challenge in and of itself.
Analyzing
big data won't necessarily contribute to your solving your business
problems if you're not starting with the right questions. If you're just
trying to store more data, that’s not really the problem that we have
at hand. That’s something that we can all do quite well with current
storage architectures and the evolving landscape of hardware that we
have.
We all know that we have growing data, but the exact size, the exact threshold that we may cross, that’s not the relevant issue.
Gardner:
I suppose this requires prioritization, which has to come from the
business side of the house. As you point out, some statistically
relevant data might be enough. If you can extrapolate and you have
enough to do that, fine, but there might be other areas where you
actually want to get every little bit of possible data or information
relevant, because you don't know what you're looking for. They are the
unknown unknowns. Perhaps there's some mythology about all data. It
seems to me that what’s important is the right data to accomplish what
it is the business wants.
Bartik: Absolutely. If
your business challenge is an operational efficiency or a cost problem,
where you have too much cost in the business and you're trying to pull
out operational expense and not spend as much on capital expense, you
can look at your operational data.
Maybe
manufacturers are able to do that and analyze all of the sensor,
machine, manufacturing line, and operational data. That's a very
different type of data and a very different type of approach than
looking at it in terms of sales and marketing.
If
you're a retailer looking for a new set of customers or new markets to
enter in terms of geographies, you're going to want to look at maybe
census data and buying-behavior data of the different geographies. Maybe
you want datasets that are outside your organization entirely. You may
not have the data in your hands today. You may have to pull it in from
outside resources. So there's a lot of variability and prioritization
that all starts with that business issue that you're trying to address.
Gardner:
Perhaps it's better for the business to identify the important data,
rather than the IT people saying it’s too big or that big means we need
to do something different. It seems like a business term rather than a
tech term at this point.
Bartik: I agree with
you. The more we can focus on bringing business and IT to the table
together to tackle this challenge, the better. And it does start with
the executive management in the organization trying to think about
things from that business perspective, rather than starting with the IT
infrastructure management team.
Gardner: What’s our second myth?
Bartik: I'd think about the idea of people and the skills needed to address this concept of big data. There is the term "data scientist"
that has been thrown out all over the place lately. There’s a lot of
discussion about how you need a data scientist to tackle big data. But
“big data” isn't necessarily the way you should think about what you’re
trying to accomplish. Instead, think about things in terms of being more
data driven, and in terms of getting the data you need to address the
business challenges that you have. That’s not always going to require
the skills of a data scientist.
Data scientists rare
I
suspect that a lot of organizations would be happy to hear something
like that, because data scientists are very rare today, and they're very
expensive, because they are rare. Only certain geographies and certain
industries have groomed the true data scientist. That's a unique blend
between a data engineer and someone like an applied scientist, who can
think quite differently than just a traditional BI developer or BI
programmer.
Don’t get stuck on thinking that, in order
to take on a data-driven approach, you have to go out and hire a data
scientist. There are other ways to tackle it. That’s where you're going
to combine people who can do the programming around your information,
around the data management principles, and the people who can ask and
answer the open-minded business questions. It doesn’t all have to be
encapsulated into that one magical person that’s known now as the data
scientist.
There are varying degrees of tackling this problem. You can get into very sophisticated algorithms
and computations for which a data scientist may be the one to do that
heavy lifting. But for many organizations and customers that we talk to
every day, it’s something where they're taking on their first project and
they are just starting to figure out how to address this opportunity.
For
that, you can use a lot of the people that you have inside your
organization, as well as potentially consultants that can just help you
break through some of the old barriers, such as thinking about
intelligence, based strictly on a report and a structured dashboard
format.
That’s
not the type of approach we want to take nowadays. So often a
combination of programming and some open-minded thinking, done with a
team-oriented approach, rather than that single keyhole person, is more
than enough to accomplish your objectives.
Gardner:
It seems also that you're identifying confusion on the part of some to
equate big data with BI and BI with big data. The data is a resource
that the BI can use to offer certain values, but big data can be applied
to doing a variety of other things. Perhaps we need to have a
sub-debunking within this myth, and that is that big data and BI are
different. How would you define them and separate them?
Bartik:
That's a common myth. If you think about BI in its traditional, generic
sense, it’s about gaining more intelligence about the business, which
is still the primary benefit of the opportunity this trend of big data
presents to us. Today, I think they're distinct, but over time, they
will come together and become synonymous.
I equate it
back to one of the more recent trends that came right before big data,
cloud. In the beginning, most people thought cloud was the public-cloud concept. What’s turned out to be true is that it’s more of a private cloud or a hybrid cloud,
where not everything moved from an on-premise traditional model, to a
highly scalable, highly elastic public cloud. It’s very much a mix.
They've kind of come together. So while cloud and traditional data centers
are the new infrastructure, it’s all still infrastructure. The same is
true for big data and BI, where BI, in the general sense of how can we
gain intelligence and make smarter decisions about our business, will
include the concept of big data.
Better decisions
So
while we'll be using new technologies, which would include Hadoop,
predictive analytics, and other things that have been driven so much
faster by the trend of big data, we’ll still be working back to that
general purpose of making better decisions.
One of the
reasons they're still different today is because we’re still breaking
some of the traditional mythology and beliefs around BI -- that BI is
all about standard reports and standard dashboards, driven by IT. But
over time, as people think about business questions first, instead of
thinking about standard reports and standard dashboards first, you’ll
see that convergence.
Gardner: We probably need
to start thinking about BI in terms of a wider audience, because all the
studies I've seen don't show all that much confidence and satisfaction
in the way BI delivers the analytics or the insights that people are
looking for. So I suppose it's a work in progress when it comes to BI as
well.
Bartik: Two points on that. There has
been a lot of disappointment around BI projects in the past. They've
taken too long, for one. They've never really been finished, which of
course, is a problem. And for many of the business users who depend on
the output of BI -- their reports, their dashboard, their access to data
-- it hasn’t answered the questions in the way that they may want it
to.
One of the things in front of us today is a way of
thinking about it differently. Not only is there so much data, and so
much opportunity now to look at that data in different ways, but there
is also a requirement to look at it faster and to make decisions faster.
So it really does break the old way of thinking.
Slowness
is unacceptable. Standard reports don't come close to addressing the
opportunity in front of us, which is to ask a business question and answer
it with the new way of thinking supported by pulling together different
datasets. That’s fundamentally different from the way we used to do it.
People
are trying to make decisions about moving the business forward, and
they're being forced to do it faster. Historical reporting just doesn't
cut it. It’s not enough. They need something that’s much closer to real
time. It’s more important to think about open-ended questions, rather
than just say, "What revenue did I make last month, and what products
made that up?" There are new opportunities to go beyond that.
Gardner: When it comes to
these technology issues, do you also find, Darin, that there is a lack
of creativity as to where the data and information resides or exists and
thinking not so much about being able to run it, but rather acquire it?
Is there a dissonance between the data I have and the data I need? How
are people addressing that?
Bartik: There is and
there isn’t. When we look at the data that we have, that’s oftentimes a
great way to start a project like this, because you can get going
faster and it’s data that you understand. But if you think that you have
to get data from outside the organization, or you have to get new
datasets in order to answer the question that’s in front of us, then,
again, you're going in with a predisposition to a myth.
You
can start with data that you already have. You just may not have been
looking at the data that you already have in the way that’s required to
answer the question in front of you. Or you may not have been looking at
it at all. You may have just been storing it, but not doing anything with
it.
Storing
data doesn’t help you answer questions. Analyzing it does. It seems
kind of simple, but so many people think that big data is a storage
problem. I would argue it's not about the storage. It’s like backup and recovery. Backing up data is not that important, until you need to recover it. Recovery is really the game changing thing.
Gardner:
It’s interesting that with these myths, people have tended, over the
years, without having the resources at hand, to shoot from the hip and
second-guess. People who are good at that and businesses that have been
successful have depended on some luck and intuition. In order to take
advantage of big data, which should lead you to not having to make
educated guesses, but to have really clear evidence, you can apply the
same principle. It's more how you get big data in place, than how you
would use the fruits of big data.
It seems like a
cultural shift we have to make. Let’s not jump to conclusions. Let’s get
the right information and find out where the data takes us.
Bartik:
You've hit on one of the biggest things that’s in front of us over the
next three to five years -- the cultural shift that the big data concept
introduces.
We looked at traditional BI as more of an
IT function, where we were reporting back to the business. The business
told us exactly what they wanted, and we tried to give that to them
from the IT side of the fence.
Data-driven organization
But
being successful today is less about intuition and more about being a
data-driven organization, and, for that to happen, I can't stress this
one enough, you need executives who are ready to make decisions based on
data, even if the data may be counterintuitive to what their gut says
and what their 25 years of experience have told them.
They're
in a position of being an executive primarily because they have a lot
of experience and have had a lot of success. But many of our markets are
changing so frequently and so fast, because of new customer patterns
and behaviors, because of new ways of customers interacting with us via
different devices. Just think of the different ways that the markets are
changing. So much of that historical precedence no longer really
matters. You have to look at the data that’s in front of us.
Because
things are moving so much faster now, new markets are being penetrated
and new regions are open to us. We're so much more of a global economy.
Things move so much faster than they used to. If you're depending on gut
feeling, you'll be wrong more often than you'll be right. You do have
to depend on as much of a data-driven decision as you can. The only way
to do that is to rethink the way you're using data.
Historical
reports that tell you what happened 30 days ago don't help you make a
decision about what's coming out next month, given that your competition
just introduced a new product today. It's just a different mindset. So
that cultural shift of being data-driven and going out and using data to
answer questions, rather than using data to support your gut feeling,
is a very big shift that many organizations are going to have to adapt
to.
Executives who get that and drive it down into the
organization, those are the executives and the teams that will succeed
with big data initiatives, as opposed to those that have to do it from
the bottom up.
Gardner: Listening to you, Darin, I
can tell one thing that isn’t a product of hype is just how important
this all is. Getting big data right, doing that cultural shift,
recognizing trends based on the evidence and in real-time as much as
possible is really fundamental to how well many businesses will succeed
or not.
So it's not hype to say that big data is going
to be a part of your future and it's important. Let's move towards how
you would start to implement or change or rethink things, so that you
can not fall prey to these myths, but actually take advantage of the
technologies, the reduction in costs for many of the infrastructures,
and perhaps extend and exploit BI and big data problems.
Bartik:
It's fair to say that big data is not just a trend; it's a reality. And
it's an opportunity for most organizations that want to take advantage
of it. It will be a part of your future. It's either going to be part of
your future, or it's going to be a part of your competition’s future,
and you're going to be struggling as a result of not taking advantage of
it.
The first step that I would recommend -- I've said
it a few times already, but I don't think it can be said too often --
is pick a project that's going to address a business issue that you've
been unable to address in the past.
"What are the
questions that you need to ask and answer about your business that will
really move you forward?" Not just, "What data do we want to look at?"
That's not the question.
What business issue?
The
question is what business issue do we have in front of us that will
take us forward the fastest? Is it reducing costs? Is it penetrating a
new regional market? Is it penetrating a new vertical industry, or
evolving into a new customer set?
These are the kind
of questions we need to ask and the dialogue that we need to have. Then
let's take the next step, which is getting data and thinking about the
team to analyze it and the technologies to deploy. But that's the first
step – deciding what we want to do as a business.
That
sets you up for that cultural shift as well. If you start at the
technology layer, if you start at the level of let's deploy Hadoop or
some type of new technology that may be relevant to the equation, you're
starting backwards. Many people do it, because it's easier to do that
than it is to start an executive conversation and to start down the path
of changing some cultural behavior. But it doesn’t necessarily set you
up for success.
Gardner: It sounds as if you
know you're going on a road trip and you get yourself a Ferrari, but you
haven't really decided where you're going to go yet, so you didn’t know
that you actually needed a Ferrari.
Bartik:
Yeah. And it's not easy to get a tent inside a Ferrari. So you have to
decide where you're going first. It's a very good analogy.
Gardner:
What are some of the other ways when it comes to the landscape out
there? There are vendors who claim to have it all, everything you need
for this sort of thing. It strikes me that this is more of an early
period and that you would want to look at a best-of-breed approach or an
ecosystem approach.
So are there any words of wisdom
in terms of how to think about the assets, tools, approaches, platforms,
what have you, or not to limit yourself in a certain way?
Bartik:
There are countless vendors that are talking about big data and
offering different technology approaches today. Based on the type of
questions that you're trying to answer, whether it's more of an
operational issue, a sales market issue, HR, or something else, there
are going to be different directions that you can go in, in terms of the
approaches and the technologies used.
I encourage the
executives, both on the line-of-business side as well as the IT side,
to go to some of the events that are the "un-conferences," where we talk
about the big-data approach and the technologies. Go to the other
events in your industry where they're talking about this and learn what
your peers are doing. Learn from some of the mistakes that they've been
making or some of the successes that they've been having.
There's
a lot of success happening around this trend. Some people certainly are
falling into the pitfalls, but get smart by going to your peers and
going to your industry influencer groups and learning more about how to
approach this.
Technical approaches
There
are technical approaches that you can take. There are different ways of
storing your data. There are different ways of computing and processing
your data. Then, of course, there are different analytical approaches
that get more to the open-ended investigation of data. There are many
tools and many products out there that can help you do that.
Dell
has certainly gone down this road and is investing quite heavily in
this area, with both structured and unstructured data analysis, as well
as the storage of that data. We're happy to engage in those
conversations as well, but there are a lot of resources out there that
really help companies understand and figure out how to attack this
problem.
Gardner: In the past, with many of the
technology shifts, we've seen a tension and a need for decision around
best-of-breed versus black box, or open versus entirely turnkey, and I'm
sure that's going to continue for some time.
But one
of the easier ways or best ways to understand how to approach some of
those issues is through some examples. Do we have any use cases or
examples that you're aware of, of actual organizations that have had
some of these problems? What have they put in place, and what has worked
for them?
Bartik:
I'll give you a couple of examples from two very different types of
organizations, neither of which are huge organizations. The first one is
a retail organization, Guess Jeans.
The business issue they were tackling was, “How do we get more sales in
our retail stores? How do we get each individual that's coming into our
store to purchase more?”
We sat down and started
thinking about the problem. We asked what data would we need to
understand what’s happening? We needed data that helps us understand the
buyer’s behavior once they come into the store. We don't need data
about what they are doing outside the store necessarily, so let's look
specifically at behaviors that take place once they get into the store.
We
helped them capture and analyze video monitoring information. Basically
it followed each of the people in the store and geospatial locations
inside the store, based on their behavior. We tracked that data and then
we compared against questions like did they buy, what did they buy, and
how much did they buy. We were able to help them determine that if you
get the customer into a dressing room, you're going to be about 50
percent more likely to close transactions with them.
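The dressing-room comparison described here can be sketched in a few lines of Python. This is only an illustration of the kind of calculation involved, not the analysis that was actually run: the visit records, the `conversion_rates` and `relative_lift` helper names, and every number in the sample data are hypothetical and were chosen so the arithmetic is easy to follow.

```python
from collections import defaultdict

# Hypothetical visit records: (entered_dressing_room, made_purchase).
# These values are invented for illustration; they are not retailer data.
visits = [
    (True, True), (True, True), (True, True), (True, False),
    (False, True), (False, True), (False, False), (False, False),
]


def conversion_rates(records):
    """Return the purchase rate for each group, keyed by the dressing-room flag."""
    totals = defaultdict(int)
    purchases = defaultdict(int)
    for entered, bought in records:
        totals[entered] += 1
        if bought:
            purchases[entered] += 1
    return {group: purchases[group] / totals[group] for group in totals}


def relative_lift(rates):
    """Relative increase in purchase rate for dressing-room visitors.

    Assumes both groups are present and the baseline rate is non-zero.
    """
    return rates[True] / rates[False] - 1.0


rates = conversion_rates(visits)
lift = relative_lift(rates)
print(f"dressing room: {rates[True]:.0%}, "
      f"no dressing room: {rates[False]:.0%}, lift: {lift:.0%}")
```

With real data, the same grouping would be applied to whatever in-store behavior is being tested, and the sample would need to be large enough for the difference to be statistically meaningful.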
So
rather than trying to give incentives to come into the store or give
discounts once they get into the store, they moved towards helping the
store clerks, the people who ran the store and interacted with the
customers, focus on getting those customers into a dressing room. That
itself is a very different answer than what they might have thought of
at first. It seems easy after you think about it, but it really did make
a significant business impact for them in rather short order.
Now,
they're also thinking about other business challenges that they have
and other ways of analyzing data and other datasets, based on different
business challenges, but that’s one example.
Another
example is on the higher education side. In universities, one of the
biggest challenges is having students drop out or reduce their class
load. The fewer classes they take, or if they drop out entirely, it
obviously goes right to the top and bottom line of the organization,
because it reduces tuition, as well as the other extraneous expenses
that students incur at the university.
Finding indicators
The University of Kentucky
went on an effort to reduce students dropping out of classes or
dropping entirely out of school. They looked at a series of datasets,
such as demographic data, class data, the grades that they were
receiving, what their attendance rates were, and so forth. They analyzed
many different data points to determine the indicators of a future
dropout.
Now, just raising the student retention rate by
one percent would in turn mean about $1 million of top-line revenue to
the university. So this was pretty important. And in the end, they were
able to narrow it down to a couple of variables that strongly indicated
which students were at risk, such that they could then proactively
intervene with those students to help them succeed.
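As a rough sketch of what acting on "a couple of variables" might look like, the Python below flags students for outreach using two indicators. The interview does not name the variables the university settled on, so attendance rate and grade point average are assumptions drawn from the datasets mentioned, and the `Student` class, the cutoff values, and the sample roster are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Student:
    student_id: str
    attendance_rate: float  # fraction of classes attended, 0.0 to 1.0
    gpa: float              # grade point average on a 4.0 scale


# Hypothetical cutoffs chosen only for illustration; a real project would
# derive them from historical data rather than fix them by hand.
MIN_ATTENDANCE = 0.80
MIN_GPA = 2.0


def at_risk(student):
    """A student is flagged if either indicator falls below its cutoff."""
    return student.attendance_rate < MIN_ATTENDANCE or student.gpa < MIN_GPA


def flag_for_outreach(students):
    """Return the IDs of students who should be contacted proactively."""
    return [s.student_id for s in students if at_risk(s)]


# Invented sample roster.
roster = [
    Student("s1", 0.95, 3.4),
    Student("s2", 0.70, 3.1),
    Student("s3", 0.90, 1.8),
]
flagged = flag_for_outreach(roster)
print(flagged)
```

A simple rule like this is easy to explain to advisors, which matters when the goal is intervention rather than prediction for its own sake.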
The
key is that they started with a very specific problem. They started it
from the university's core mission: to make sure that the students
stayed in school and got the best education, and that's what they are
trying to do with their initiative. It turned out well for them.
These
were very different organizations or business types, in two very
different verticals, and again, neither is a huge organization that has
seas of data. But what they did provides much more manageable and much more
tangible examples that many of us can apply to our own businesses.
Gardner: Those really demonstrate how asking the right questions is so important.
Darin,
we're almost out of time, but I did want to see if we could develop a
little bit more insight into the Dell Software road map. Are there some
directions that you can discuss that would indicate how organizations
can better approach these problems and develop some of these innovative
insights in business?
Bartik: A couple of
things. We've been in the business of data management, database
management, and managing the infrastructure around data for well over a
decade. Dell has assembled a group of companies, as well as a lot of
organic development, based on their expertise in the data center for
years. What we have today is a set of capabilities that help customers
take more of a data-type agnostic view and a vendor agnostic view to the
way they're approaching data and managing data.
You
may have 15 tools around BI. You may have tools to look at your Oracle
data, maybe new sets of unstructured data, and so forth. And you have
different infrastructure environments set up to house that data and
manage it. But the problem is that it's not helping you bring the data
together and cross boundaries across data types and vendor toolset
types, and that's the challenge that we're trying to help address.
We've
introduced tools to help bring data together from any database,
regardless of where it may be sitting, whether it's a data warehouse, a
traditional database, a new type of database such as Hadoop, or some
other type of unstructured data store.
We want to
bring that data together and then analyze it. Whether you're looking at
more of a traditional structured-data approach and you're exploring data
and visualizing datasets that many people may be working with, or doing
some of the more advanced things around unstructured data and looking
for patterns, we’re focused on giving you the ability to pull data from
anywhere.
Using new technologies
We're
investing very heavily, Dana, into the Hadoop framework to help
customers do a couple of key things. One is helping the people that own
data today, the database administrators, data analysts, the people that
are the stewards of data inside of IT, advance their skills to start
using some of these new technologies, including Hadoop.
It's
been something that we have done for a very long time, making your C
players B players, and your B players A players. We want to continue to
do that, leverage their existing experience with structured data, and
move them over into the unstructured data world as well.
The
other thing is that we're helping customers manage data in a much more
pragmatic way. So if they are starting to use data that is in the cloud,
via Salesforce.com or Taleo,
but they also have data on-prem sitting in traditional data stores, how
do we integrate that data without completely changing their
infrastructure requirements? With capabilities that Dell Software has
today, we can help integrate data no matter where it sits and then
analyze it based on that business problem.
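To make the idea of integrating cloud and on-premises data concrete, here is a minimal Python sketch that joins two small record sets on a shared key and then answers one business question. It does not represent any Dell Software product or API: the record layouts, the field names, and the `revenue_by_region` helper are hypothetical, and the data is invented.

```python
# Hypothetical records exported from a cloud CRM.
cloud_accounts = [
    {"account_id": 1, "name": "Acme", "region": "East"},
    {"account_id": 2, "name": "Globex", "region": "West"},
]

# Hypothetical rows from an on-premises orders table.
onprem_orders = [
    {"account_id": 1, "amount": 1200.0},
    {"account_id": 1, "amount": 300.0},
    {"account_id": 2, "amount": 450.0},
    {"account_id": 3, "amount": 99.0},  # no matching cloud account
]


def revenue_by_region(accounts, orders):
    """Join orders to accounts on account_id and total the amounts per region.

    Orders whose account is missing from the cloud data are grouped under
    "unknown" instead of being silently dropped.
    """
    region_of = {a["account_id"]: a["region"] for a in accounts}
    totals = {}
    for order in orders:
        region = region_of.get(order["account_id"], "unknown")
        totals[region] = totals.get(region, 0.0) + order["amount"]
    return totals


totals = revenue_by_region(cloud_accounts, onprem_orders)
print(totals)
```

Keeping unmatched rows visible is a deliberate choice here: gaps between systems are often the first thing an integration effort needs to surface.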
We help
customers approach it more from a pragmatic view, where you're taking a
stepwise approach. We don't expect customers to pull out their entire
BI and data-management infrastructure and rewrite it from scratch on day
one. That's not practical. It's not something we would recommend. Take a
stepwise approach. Maybe change the way you're integrating data. Change
the way you're storing data. Change, in some perspective, the way
you're analyzing data between IT and the business, and have those teams
collaborate.
But
you don't have to do it all at one time. Take that stepwise approach.
Tackle it from the business problems that you're trying to address, not
just the new technologies we have in front of us.
There's
much more to come from Dell in the information management space. It
will be very interesting for us and for our customers to tackle this
problem together. We're excited to make it happen.