The next BriefingsDirect business trends panel discussion explores how the role of the data scientist in the enterprise is expanding in both importance and influence.
Data scientists are now among
the most
highly sought-after professionals, and they are being called on to work
more closely than ever with enterprise strategists to predict emerging trends,
optimize outcomes, and create entirely new kinds of business value.
Listen
to the podcast. Find it on iTunes. Read a full transcript or download a copy.
To learn more about modern data
scientists, how they operate, and why a new level of business analysis professional
certification has been created by The Open
Group, we are joined by Martin Fleming, Vice President,
Chief Analytics Officer, and Chief Economist at IBM; Maureen
Norton, IBM Global Data Scientist Professional Lead, Distinguished Market
Intelligence Professional, and author of Analytics
Across the Enterprise, and George Stark,
Distinguished Engineer for IT Operations Analytics at IBM. The panel is moderated by Dana
Gardner, Principal Analyst at Interarbor
Solutions.
Here are some excerpts:
Gardner: We
are now characterizing the data scientist as a profession. Why have we elevated
the role to this level, Martin?
Fleming: The
benefits we have from the technology that’s now available allow us to bring
together the more traditional skills in the space of mathematics and statistics
with computer science and data engineering. The technology wasn't as useful
just 18 months ago. It’s all about the very rapid pace of change in technology.
Gardner: Data
scientists used to be behind-the-scenes people; sneakers, beards, white lab
coats, if you will. What's changed to now make them more prominent?
Norton: Today’s
data scientists are consulting with the major leaders in each corporation and
enterprise. They are consultants to them. So they are not in the back room, mulling
around in the data anymore. They're taking the insights they're able to glean and
support with facts and using them to provide recommendations and to provide
insights into the business.
Gardner: Most
companies now recognize that being data-driven is an imperative. They can’t
succeed in today's world without being data-driven. But many have a
hard time getting there. It's easier said than done. How can the data scientist
as a professional close that gap?
Stark: The biggest
drawback in integration of data sources is having disparate data systems. The financial
system is always separate from the operational system, which is separate from
the human resources (HR) system. And you need to combine those and make sure
they're all in the same units, in the same timeframe, and all combined in a way
that can answer two questions. You have to answer, “So what?” And you have to answer,
“What if?” And that’s really the challenge of data science.
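As a rough illustration of the integration step Stark describes, the following Python sketch brings three separate sources onto one timeframe (calendar month) and one unit (plain US dollars) before asking a simple "So what?" question. All records, field names, and the `combine` and `cost_per_incident` helpers are hypothetical and invented for this example; none of them come from IBM or the panel.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records from three separate systems (all values invented).
finance = [  # monthly cost, reported in thousands of USD
    {"month": "2019-01", "dept": "ops", "cost_kusd": 120.0},
    {"month": "2019-02", "dept": "ops", "cost_kusd": 135.0},
]
operations = [  # incident counts, reported per day
    {"day": date(2019, 1, 3), "dept": "ops", "incidents": 4},
    {"day": date(2019, 1, 17), "dept": "ops", "incidents": 6},
    {"day": date(2019, 2, 9), "dept": "ops", "incidents": 5},
]
hr = [  # headcount, reported per month
    {"month": "2019-01", "dept": "ops", "headcount": 10},
    {"month": "2019-02", "dept": "ops", "headcount": 12},
]


def combine(finance, operations, hr):
    """Join the three sources on (month, dept), with cost in plain USD."""
    merged = defaultdict(
        lambda: {"cost_usd": 0.0, "incidents": 0, "headcount": 0}
    )
    for row in finance:
        # Same units: convert thousands of USD to USD.
        merged[(row["month"], row["dept"])]["cost_usd"] += row["cost_kusd"] * 1000
    for row in operations:
        # Same timeframe: roll daily records up to the calendar month.
        month = row["day"].strftime("%Y-%m")
        merged[(month, row["dept"])]["incidents"] += row["incidents"]
    for row in hr:
        merged[(row["month"], row["dept"])]["headcount"] = row["headcount"]
    return dict(merged)


def cost_per_incident(record):
    """A 'So what?' figure that only exists once the sources are combined."""
    return record["cost_usd"] / record["incidents"]


combined = combine(finance, operations, hr)
for key in sorted(combined):
    print(key, combined[key], cost_per_incident(combined[key]))
```

A "What if?" question would go one step further and need a model on top of the combined table, which this sketch does not attempt.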
Gardner: An awful
lot still has to go on behind the scenes before you get to the point where the “a-ha”
moments and the strategic inputs take place.
Martin, how will the nature of
work change now that the data scientist as a profession is arriving – and probably
just at the right time?
Fleming: The
insights that data scientists provide allow organizations to understand where the
opportunities are to improve productivity, and how they can help make workers
more effective and productive and create more value. This enhances the role of
the individual employees. And it’s that value creation, the integration of the
data that George talked about, and the use of analytic tools that's driving
fundamental changes across many organizations.
Captain of the data team
Gardner: Is
there any standardization as to how the data scientist is being organized
within companies? Do they typically report to a certain C-suite executive or
another? Has that settled out yet? Or are we still in a period of churn as to
where the data scientist, as a professional, fits in?
Norton: We're
still seeing a fair amount of churn. Different organizing approaches have been tried.
For example, the centralized center of excellence that supports other business
units across a company has a lot of believers and followers.
The economies of scale in that
approach help. It’s difficult to find one person with all of the skills you
might need. I’m describing the role of consultant to the presidents of
companies. Sometimes you can’t find all of that in one individual -- but you
can build teams that have complementary skills. We like to say that data science
is a team sport.
Gardner:
George, are we focusing the new data
scientist certification on the group or the individual? Have we progressed
from the individual to the group yet?
Stark: I
don’t believe we are there yet. We’re still certifying at the individual level.
But as Maureen said, and as Martin alluded to, the group approach has a large
effect on how you get certified and what kinds of solutions you come up with.
Gardner: Does
the certification help define the managerial side of this group, with the data
scientist certified to organize that group or office in a methodical, proven
way?
Fleming: The certification
we
are announcing focuses not only on the technical skills of a data scientist,
but also on project management and project leadership. So as data scientists
progress through their careers, the more senior folks are certainly in a position
to take on significant leadership and management roles.
And we are seeing over time,
as George referenced, a structure beginning to appear. First in the technology
industry, and over time, we’ll see it in other industries. But the technology
firms whose names we are all familiar with are the ones who have really taken
the lead in putting the structure together.
Gardner: How has
the “day in the life” of the typical data scientist changed in the last 10
years?
Stark: It’s
scary to say, but I have been a data
scientist for 30 years. I began writing my own Fortran 77 code to integrate
datasets to do eigenvalues
and eigenvectors and build models that would discriminate among key objects
and allow us to predict what something was.
The difference today is that I
can do that in an afternoon. We have the tools, datasets, and all the
capabilities with visualization tools, SPSS, IBM Watson, and Tableau. Things that used to take me months
now take a day and a half. It’s incredible, the change.
Gardner: Do you
as a modern data scientist find yourself interpreting what the data science can
do for the business people? Or are you interpreting what the business people
need, and bringing that back to the data scientists? Or perhaps both?
Collaboration is key
Stark: It’s
absolutely both. I was recently with a client, and we told them, “Here are some
things we can do today.” And they said, “Well, what I really need is something
that does this.” And I said, “Oh, well, we can do that. Here’s how we would do
it.” And we showed them the roadmap. So it’s both. I will take that information
back to my team and say, “Hey, we now need to build this.”
Gardner: Is
there still a language, culture, or organizational divide? It seems to me that
you’re talking apples and oranges when it comes to business requirements and
what the data and technology can produce. How can we create a Rosetta Stone effect here?
Norton: In
the certification, we emphasize that data scientists have to
understand the business problems. Everything begins from that.
Knowing how to ask the right questions, to scope the problem, and be
able to then translate is essential.
Gardner: I
have been around long enough to remember when the notion of a chief information
officer (CIO) was new and fresh. There are some similarities to what I remember
from those conversations in what I’m hearing now. Should we think about the
data scientist as a “chief” something, at the same level as a chief technology officer
(CTO) or a CIO?
Chief Data Officer defined
Fleming: There
are certainly a number of organizations that have roles such as mine, where
we've combined economics and analytics. Amazon has done it on
a larger scale, given the nature of their business, with supply chains, pricing,
and recommendation engines. But other firms in the technology industry have as
well.
We have found that there are
still three separate needs, if you will. There is an infrastructure need that
CIO teams are focused on. There are significant data governance and management
needs that typically chief data officers (CDOs) are focused on. And there are
substantial analytics capabilities that typically chief analytics officers (CAOs)
are focused on.
It's certainly possible in
many organizations to combine those roles. But in an organization the size of
IBM, and other large entities, it's very difficult because of the complexity
and requirements across those three different functional areas to have that all
embodied in a single individual.
Gardner: In
that spectrum you just laid out -- analytics, data, and systems -- where does The Open
Group process for a certified data scientist fit in?
Fleming: It's really on the analytics side. A lot of what CDOs do is data engineering, creating data platforms. At IBM, we use the term Watson Data Platform because it's built on a certain technology that's in the public cloud. But that's an entirely separate challenge from being able to create the analytics tools and deliver the business insights and business value that Maureen and George referred to.
Gardner: I
should think this is also going to be of pertinent interest to government
agencies, to nonprofits, to quasi-public-private organizations, alliances, and
so forth.
Given that this has societal-level
impacts, what should we think about in improving the data scientists’ career
path? Do we have the means of delivering the individuals needed from our
current educational tracks? How do education and certification relate to each
other?
Academic avenues to certification
Fleming: A
number of universities have over the past three or four years launched programs
for a master’s
degree in data science. We are now seeing the first graduates of those
programs, and we are recruiting and hiring.
I think this will be the first
year that we bring in folks who have completed a master’s in data science
program. As we all know, universities change very slowly. It's the early days,
but demand will continue to grow. We have barely scratched the surface in terms
of the kinds of positions and roles across different industries.
That growth in demand will cause
many university programs to grow and expand to feed that career track. It takes
15 years to create a profession, so we are in the early days of this.
Norton: With the new certification, we are doing outreach to universities because several of them have master’s in data analytics programs. They do significant capstone-type projects, with real clients and real data, to solve real problems.
We want to provide a path for
them into certification so that students can earn, for example, their first
project profile, or experience profile, while they are still in school.
Gardner:
George, on the organic side -- inside of companies where people find a variety
of tracks to data scientist -- where do the prospects come from? How does
organic development of a data scientist professional happen inside of
companies?
Stark: At IBM,
in our group, Global Services, in
particular, we've developed a training program with a set of badges. They get
rewarded for achievement in various levels of education. But you still need to
have projects you've done with the techniques you've learned through
education to get to certification.
Having education is not enough.
You have to apply it to get certified.
Gardner: This
is a great
career path, and there is tremendous demand in the market. It also strikes
me as a very fulfilling and rewarding career path. What sorts of impacts can these
individuals have?
Fleming: Businesses
have traditionally been managed through a profit-and-loss statement, an income
statement, for the most part. There are, of course, other data sources -- but
they’re largely independent of each other. These include sales opportunity
information in a CRM
system, supply chain information in ERP
systems, and financial information portrayed in an income statement. These get
the most rigorous attention, shall we say.
We're now in a position to create
much richer views of the activity businesses are engaged in. We can integrate
across more datasets now, including human resources data. In addition, the
nature of machine
learning (ML) and artificial
intelligence (AI) is predictive. We are in a position to be able to not
only bring the data together, we can provide a richer view of what's
transpiring at any point in time, and also generate a better view of where
businesses are moving to.
It may be about defining a sought-after
destination, or there may be a need to close gaps. But understanding where the business
is headed in the next 3, 6, 9, and 12 months is a significant value-creation
opportunity.
Gardner: Are
we then thinking about a data scientist as someone who can help define what the
new, best business initiatives should be? Rather than finding those through
intuition, or gut instinct, or the highest
paid person's opinion, can we use the systems to tell us where our next
product should come from?
Pioneers of insight
Norton: That's
certainly the direction we are headed. We will have systems that augment that
kind of decision-making. I view data scientists as pioneers. They're able to go
into big data, dark data, and a lot of
different places and push the boundaries to come out with insights that can
inform in ways that were not possible before.
It’s a very rewarding career
path because there is so much value and promise that a data scientist can bring.
They will solve problems that hadn't been addressed before.
It's a very exciting career
path. We’re excited to be launching the certification program to help data
scientists gain a clear path and to make sure they can demonstrate the right skills.
Stark: I
think so. If we can get more people to do data science and understand its
value, I'd be really happy. It's been fun for 30 years for me. I have had a
great time.
Gardner: What
comes next on the technology side that will empower the data scientists of
tomorrow? We hear about things like quantum computing, distributed ledger,
and other new capabilities on the horizon.
Future forecast: clouds
Fleming: In
the immediate future, new benefits are largely coming because we have both
public cloud and private cloud in a hybrid structure, which brings the data,
compute, and the APIs together in one place. And that allows for the kind of
tools and capabilities that are necessary to significantly improve the performance
and productivity of organizations.
Blockchain is making
enormous progress and very quickly. It's essentially a data management and storage
improvement, but then that opens up the opportunity for further ML and AI applications
to be built on top of it. That’s moving very quickly.
Quantum computing is further
down the road. But it will change the nature of computing. It's going to take
some time to get there, but it nonetheless is very important and is part of
what we are looking at over the horizon.
Gardner:
Maureen, what do you see on the technology side as most interesting in terms of
where things could lead in the next few years for data science?
Norton: The
continued evolution of AI is pushing boundaries. One of the really interesting
areas is the emphasis on transparency and ethics, to make sure that the systems
are not introducing or perpetuating a bias. There is some really exciting work
going on in that area that will be fun to watch going forward.
Gardner: The data
scientist needs to consider not just what can be done, but what
should be done. Is that governance angle brought into the certification
process now, or is it something that will come later?
Stark: It's
brought into the certification now when we ask how things were validated
and how the modules were implemented in the environment. That’s one of the
things that data scientists need to answer as part of the certification. We
also believe that in the future we are going to need some sort of code of
ethics, and some sort of methods for bias detection, analysis, and measurement.
Those things don't exist today, and they will have to.
Gardner: Do
you have any examples of data scientists doing work that's new, novel, and
exciting?
Rock star potential
Fleming: We
have a team led by a very intelligent and aggressive young woman who has put
together a significant product recommendation tool for IBM. Folks familiar with
IBM know it has a large number of products and offerings. In any given client
situation the seller wants to be able to recommend to the client the offering
that's most useful to the client’s situation.
And our recommendation engines
can now make those recommendations to the sellers. It really hasn't existed in
the past and is now creating enormous value -- not only for the clients but for
IBM as well.
Gardner:
Maureen, do any examples jump to mind that illustrate the potential for the data
scientist?
Norton: We wrote
a book, Analytics
Across the Enterprise, to explain examples across nine different
business units. There have been some great examples in terms of finance, sales,
marketing, and supply chain.
Gardner: Any
use-case scenario come to mind where the certification may have been useful?
Norton: Certification
would have been useful to an individual in the past because it helps map out
how to become the best practitioner you can be. We have three different levels
of certification going up to the thought leader. It's designed to help that
professional grow within it.
Stark: A young
man who works for me in Brazil built a model for one of our manufacturing
clients that identifies problematic infrastructure components and recommends
actions to take on those components. And when the client implemented the model,
they saw a 60 percent reduction in certain incidents and a 40,000-hour-a-month
increase in availability for their supply chain. And we didn't have a
certification for him then -- but we will have now.
Gardner: So
really big improvement. It shows that being a data scientist means you're impactful
and it puts you in the limelight.
Gardner: For
those folks who might be more intrigued with a career path toward certification
as a data scientist, where might they go for more information? What are the next
steps when it comes to the process through The Open Group, with IBM, and the
industry at large?
Where to begin
Norton: The Open
Group officially
launched this in January, so anyone can go to The Open Group website and check under certifications. They will
be able to read the information about how to apply. Some companies are
accredited, and others can get accredited for running a version of the
certification themselves.
IBM recently went through the
certification process. We have built an internal process that matches with The Open
Group. People can apply either directly to The Open Group or, if they happen to
be within IBM or one of the other companies who will certify, they can apply
that way and get the equivalent of it being from The Open Group.
Sponsor: The Open Group.
You may also be
interested in:
- The Open Group panel explores ways to help smart cities initiatives overcome public sector obstacles
- The Open Group digital practitioner effort eases the people path to digital business transformation
- How The Open Group Healthcare Forum and Health Enterprise Reference Architecture cures process and IT ills
- Why government agencies could lead the way in demanding inter-public cloud interoperability and standardization
- Panel explores how the IT4IT Reference Architecture acts as a digital business enabler
- The UNIX evolution: A history of innovation reaches an unprecedented 20-year milestone
- The Open Group president, Steve Nunn, on the inaugural TOGAF User Group and new role of EA in business transformation
- A Tale of Two IT Departments, or How Cloud Governance is Essential in the Bimodal IT Era
- Securing Business Operations and Critical Infrastructure: Trusted Technology, Procurement Paradigms, Cyber Insurance