The next BriefingsDirect
big-data use case leadership discussion explores how retail luxury goods market analysis provider
Sky I.T.
Group has upped its game to provide more buyer behavior analysis faster
-- and with more user depth.
Learn how Sky I.T. changed its
data analysis
platform infrastructure to Hewlett Packard Enterprise (HPE)
Vertica -- and why that has helped solve its challenges
around data variety, velocity, and volume and make better insights
available across the luxury retail marketplace.
To share how retail intelligence just got a whole lot smarter, we welcome
Jay Hakami, President;
Dane Adcock, Vice President of Business Development; and
Stephen Czetty, Vice President and Chief Technology Officer, all at Sky I.T. Group in New York. The discussion is moderated by me,
Dana Gardner, Principal Analyst at
Interarbor Solutions.
Here are some excerpts:
Gardner:
What's driving the need for greater and better
big-data analysis for luxury retailers? Why do they need to know more, better,
faster?
Adcock: Well, customers have more
choices. As a result, businesses need to be more agile and responsive
and fill the customer's needs more completely or lose the business.
That's driving the entire industry into practices that mean shorter
times from design to shelf in order to be more responsive.
It
has created a great deal of gross margin pressure, because there's
simply more competition and more selections that a consumer can make
with their dollar today.
Gardner: Is there
anything specific to the retail process around luxury goods that is even
more pressing when it comes to this additional speed?
Adcock:
Yes. The downside to making mistakes in terms of designing a product
and allocating it in the right amounts to locations at the store level
carries a much greater penalty, because it has to be liquidated. There's
not a chance to simply cut back on the
supply chain side, and so margins are more at risk in terms of making the mistake.
Ten
years ago, from a fashion perspective, it was about optimizing the
return and focusing on winners. Today, you also have to plan to manage
and optimize the margins on your losers as well. So, it's a total
package.
Gardner: So, clearly, the more you know
about what those users are doing or what they have done is going to be
essential. It seems to me, though, that we're talking about a
market-wide look rather than just one store, one retailer, or one brand.
How does that work, Jay? How do we get to the point
where we've been able to gather information at a fairly comprehensive
level, rather than cherry-picking or maybe getting a non-representative
look based on only one organization’s view into the market?
Hakami: With
SKYPAD,
what we're doing is collecting data from the supplier, from the
wholesaler, as well as from their retail stores, their wholesale
business, and their dot-com, meaning the whole omnichannel. When we collect that data, we cleanse it to make sure it's meaningful to the user.
Now, we're dealing with a connected world where the
retailer, wholesalers, and suppliers have to talk to one another and
plan together for the buying season. So the partnerships and the insight that they get into product performance are extremely important, as
Dane mentioned, in terms of the gross margin and in terms of the sell-through information. SKYPAD basically provides that intelligence, that
insight, into this retail/wholesale world.
Gardner:
Isn’t this also a case where people are
opening up their information and making it available for the benefit of a
community or recognizing that the more data and the more analysis
that’s available, the better it is for all the participants, even if
there's an element of competition at some point?
Hakami: That's correct. The retail business likes to share the
information with their suppliers, but they're not sharing it across all
the suppliers. They're sharing it with each individual supplier. Then,
you have the market research companies who come in and give you
aggregation of trends and so on. But the retailers are interested in
sell-through. They're interested in telling X supplier, "This is how
your products are performing in my stores."
If they're
not performing, then there's going to be a mark down. There's going to
be less of a margin for you and for us. So, there's a very strong
interest between the retailer and a specific supplier to improve the
performance of the product and the sell-through of those products on the
floor.
Gardner: Before we learn more about the
data science and dealing with the technology and business case issues,
tell us a little bit more about Sky I.T. Group, how you came about, and
what you're doing with SKYPAD to solve some of these issues across this
entire supply chain and retail market spot.
Complex history
Hakami:
I'll take the beginning. I'll give you a little bit of the history,
Dana, and then maybe Dane and Stephen can jump in and tell you what we
are doing today, which is extremely complex and interesting at the same
time.
We started with SKYPAD about eight years ago. We
found a pain point within our customers where they were dealing with so
many retailers, as well as their own retail stores, and not getting the
information that they needed to make sound business decisions on a
timely basis.
We started with one customer, which was
Theory. We came to them and we said, "We can give you a solution where
we're going to take some data from your retailers, from your retail
stores, from your dot-com, and bring it all into one dashboard, so you
can actually see what’s selling and what’s not selling."
Fast forward, we've been able to take not only
EDI transactions, but also retail portals. We're taking information from any format you can imagine -- from
Excel,
PDF,
merchant spreadsheets -- bringing that wealth of data into our data
warehouse, cleansing it, and then populating the dashboard.
So
today, SKYPAD is giving a wealth of information to the users by the
sheer fact that they don’t have to go out by retailer and get the
information. That’s what we do, and we give them, on a Monday morning,
the information they need to make decisions.
Dane, can you elaborate more on this as well?
Adcock:
This process has evolved from a time when EDI was easy, because it was
structured, but it was also limited in the number of metrics that were
provided by the mainstream. As these
business intelligence (BI)
tools have become more popular, the distribution of data coming from
the retailers has gotten more ubiquitous and broader in terms of the
metrics.
But the challenge has moved from reporting to
identification of all these data sources and communication
methodologies and different formats. These can change from week to week,
because they're being launched by individuals, rather than systems, in
terms of Excel spreadsheets and PDF files. Sometimes, they come from
multiple sources from the same retailer.
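Sky I.T.'s own parsers are proprietary, but the routing problem described here -- many formats, launched by individuals and changing week to week -- reduces to dispatching each incoming file to a format-specific parser before cleansing. A minimal sketch, with the parser names and formats invented for illustration:

```python
# Minimal sketch of the intake dispatch step: route each incoming retailer
# file (EDI, Excel, PDF, portal export) to a format-specific parser before
# cleansing and loading. Parser names here are illustrative placeholders.
from pathlib import Path

def parse_edi(path): ...      # the real parsers are proprietary
def parse_excel(path): ...
def parse_pdf(path): ...

PARSERS = {
    ".edi": parse_edi,
    ".xlsx": parse_excel,
    ".xls": parse_excel,
    ".pdf": parse_pdf,
}

def ingest(path: str):
    """Route one incoming file to the parser registered for its format."""
    suffix = Path(path).suffix.lower()
    parser = PARSERS.get(suffix)
    if parser is None:
        raise ValueError(f"Unsupported feed format: {suffix}")
    return parser(path)   # parsed rows then go on to cleansing and loading
```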
One of our
accounts would like to see all of their data together, so they can see
trends across categories and different geographies and markets. The
challenge is to bring all those data sources together and align them to
their own item master file, rather than the retailer’s item master file,
and then be able to understand trends, which accounts are generating
the most profits, and what strategies are the most profitable.
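The alignment step just described -- mapping every retailer's feed onto the client's own item master rather than the retailer's -- is, at its simplest, a keyed lookup with an audit trail for the misses. A simplified sketch; the field names are assumptions, not SKYPAD's actual schema:

```python
# Simplified illustration of aligning retailer sales rows to a client's own
# item master via UPC. Field names are assumptions for illustration only.

def align_to_item_master(sales_rows, item_master):
    """Map each retailer-reported row to the client's internal item id.

    sales_rows:  iterable of dicts with a retailer-reported 'upc' key
    item_master: dict mapping UPC -> internal item id
    Returns (aligned, unmatched) so unmatched rows can be audited.
    """
    aligned, unmatched = [], []
    for row in sales_rows:
        internal_id = item_master.get(row.get("upc"))
        if internal_id is None:
            unmatched.append(row)          # flagged for the data-audit step
        else:
            aligned.append({**row, "item_id": internal_id})
    return aligned, unmatched
```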
It's
been a shifting model from the challenge of reporting all this data
together, to data collection. And there's a lot more of it today,
because more retailers report at the
UPC
level, size level, and the store level. They're broadcasting some of
this data by day. The data pours in, and the quicker they can make a
decision, the more money they can make. So, there's a lot of pressure to
turn it around.
Gardner: When you're putting out those reports on Monday morning, do you
get queries back? Is this a sort of a conversation, if you will, where
not only are you presenting your findings, but people have specific
questions about specific things? Do you allow for them to do that, and
is the data therefore something that’s subject to query?
Subject to queries
Adcock:
It’s subject to queries in the sense that they're able to do their own
discovery within the data. In other words, we put it in a BI tool, it’s
on the web, and they're doing their own analysis. They're probing to see
what their best styles are. They're trying to understand how colors are
moving, and they're looking to see where they're low on stock, where
they may be able to backfill in the marketplace, and trying to
understand what attributes are really driving sales.
But
of course, they always have questions about completeness of the data.
When things don’t look correct, they have questions about it. That
drives us to be able to do analysis on the fly, on-demand, and deliver
some responses, "All your stores are there, all of your locations,
everything looks normal." Or perhaps there seems to be some flaws or
things in the data that don’t actually look correct.
Not
only do we need to organize it and provide it to them so that they can
do their own broad, flexible analysis, but they're coming back to us
with questions about how their data was audited. And they're looking for
us to do the analysis on the spot and provide them with satisfactory
answers.
Gardner: Stephen Czetty, we've heard
about the use case, the business case, and how this data challenge has
grown in terms of variety as well as volume. What do you need to bring
to the table from the data architecture to sustain this
growth and provide for the agility that these market decision-makers
are demanding?
Czetty: We started out with an
abacus, in a sense, but today we collect information from thousands of
sources literally every single week. Close to 9,000 files will come
across to us and we'll process them correctly and sort them out --
what client they belong to and so forth, but the challenge is forever
growing.
We
needed to go from older technology to newer technology, because our
volumes of data are increasing and the window of time we have to take that data in is static.
So we're quite aware that we have a time limit. We found HPE
Vertica
as a platform for us to be able to collect the data into a coherent
structure very rapidly, as opposed to our legacy systems.
It
allows us to treat the data in a truly vertical way, although that has
nothing to do with the application or the database itself. In the past
we had to deal with each client separately. Now we can deal with each
retailer separately and just collect their data for every single client
that we have. That makes our processes much more pipelined and far
faster in performance.
The secret sauce behind that is
the ability in our Vertica environment to rapidly sort out the data --
where it belongs, who it belongs to -- calculate it out correctly, put
it into the database tables that we need to, and then serve it back to
the front end that we're using to represent it.
That's why we've shifted from a traditional database model to a Vertica-type model. It's 100 percent
SQL
for us, so it looks the same for everybody who is querying it, but
under the covers we get tremendous performance and compression and lots
of cost savings.
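The interview doesn't detail the load mechanics, but in a Vertica environment a bulk load of this kind is typically a SQL COPY into a staging table, which can be driven from Python with the vertica-python client. A hedged sketch; the connection details, table, and file names are assumptions, not Sky I.T.'s actual configuration:

```python
# Illustrative only: bulk-loading a parsed feed into a Vertica staging table
# with the vertica-python client. All names and paths are assumed.
import vertica_python

conn_info = {
    "host": "vertica-node1.example.com",
    "port": 5433,
    "user": "skypad_etl",
    "password": "********",
    "database": "skypad",
}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    # Stream the parsed CSV into a staging table; Vertica compresses the
    # columnar data as it loads.
    with open("/data/inbound/weekly_feed.csv", "rb") as feed:
        cur.copy("COPY staging.retailer_sales FROM STDIN DELIMITER ','", feed)
    cur.execute("SELECT COUNT(*) FROM staging.retailer_sales")
    print("rows staged:", cur.fetchone()[0])
```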
Gardner: For some organizations
that are dealing with the different sources and different types of
data, cleansing is one problem. Then, the ability to warehouse that and
make it available for queries is a separate problem. You've been able to
tackle those both at the same time with the same platform. Is that
right?
Proprietary parsers
Czetty: That's correct. We get the data, and we have proprietary
parsers
for every single data type that we get. There are a couple of hundred
of them at this point. But all of that data, after parsing, goes into
Vertica. From there, we can very rapidly figure out what is going where
and what is not going anywhere, because it’s incomplete or it’s not
ours, which happens, or it’s not relevant to our processes, which
happens.
We can sort out what we've collected very
rapidly and then integrate it with the information we already have or
insert new information if it's brand-new. Prior to this, we'd been doing
this largely by hand, and that's not effective any longer with
our number of clients growing.
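The triage described here -- promoting complete rows that belong to a known client and parking anything incomplete or unrecognized -- can be expressed as plain SQL once the parsed data sits in a staging table. A sketch with assumed table and column names:

```python
# Sketch of the post-parse triage as two SQL statements over an assumed
# staging table. Table and column names are illustrative, not actual schema.

TRIAGE_SQL = {
    # Rows that match a known client/retailer pair and are complete move on.
    "promote": """
        INSERT INTO warehouse.sales
        SELECT s.*
        FROM staging.retailer_sales s
        JOIN ref.client_retailer cr
          ON s.retailer_id = cr.retailer_id AND s.client_id = cr.client_id
        WHERE s.upc IS NOT NULL AND s.units_sold IS NOT NULL
    """,
    # Incomplete or unrecognized rows are parked for review instead of loaded.
    "park": """
        INSERT INTO audit.rejected_rows
        SELECT s.*
        FROM staging.retailer_sales s
        WHERE s.upc IS NULL OR s.units_sold IS NULL
    """,
}
```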
Gardner: I'd
like to hear more about what your actual deployment is, but before we do
that, let’s go back to the business case. Dane and Jay, when HPE Vertica
came online, when Steve was able to give you some of these more
pronounced capabilities, how did that translate into a benefit for your
business? How did you bring that out to the market, and what's been the
response?
Hakami: I think the first response was
"wow." And I think the second response was, "Wow, how can we do this
fast and move quickly to this platform?"
Let me give you some examples. When Steve did the
proof of concept (POC)
with the folks from HPE, we were very impressed with the statistics we
had seen. In other words, going from a processing time of eight or nine
hours to minutes was a huge advantage that we saw from the business
side, showing our customers that we can load data much faster.
The
ability to use less hardware and infrastructure as a result of the
architecture of Vertica allowed us to reduce, and to continue to reduce,
the cost of infrastructure. These two are the major benefits that I've
seen in the evolution of us moving from our legacy to Vertica.
From
the business perspective, if we're able to deliver faster and more
reliably to the customer, we accomplished one of the major goals that we
set for ourselves with SKYPAD.
Adcock: Let me
add something there. Jay is exactly right. The real impact, as it
translates into the business, is that we have to stop collecting data at a certain point in the morning and start processing it in order for us to make our
service-level agreements (SLAs)
on reporting for our clients, because they start their analysis. The
retail data comes in staggered over the morning and it may not all be in
by the time that we need to shut that processing off.
One
of the things that moving to Vertica has allowed us to do is to cut
that time off later, and when we cut it off later, we have more data, as
a rule, for a customer earlier in the morning to do their analysis.
They don’t have to wait until the afternoon. That’s a big benefit. They
get a much better view of their business.
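The later cut-off is simple arithmetic: reports are due at a fixed SLA time, so a shorter load window lets collection run later into the morning. A toy illustration, with the times assumed rather than taken from Sky I.T.'s actual schedule:

```python
# Toy illustration of the cut-off trade-off: reports are due at a fixed SLA
# time, so a faster load lets data collection run later. All times assumed.
from datetime import datetime, timedelta

def latest_cutoff(sla_ready_by: datetime, load_duration: timedelta) -> datetime:
    """Latest moment incoming retailer files can arrive and still make the report."""
    return sla_ready_by - load_duration

sla = datetime(2016, 5, 2, 8, 0)                  # assume reports due 8:00 AM Monday
print(latest_cutoff(sla, timedelta(hours=8)))      # legacy multi-hour load -> midnight cutoff
print(latest_cutoff(sla, timedelta(minutes=30)))   # minutes-scale load -> 7:30 AM cutoff
```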
Driving more metrics
The
other thing that it has enabled us to do is drive more metrics into the
database and do some processing in the database, rather than in the
user tool, which makes the user tool faster and provides more value.
For
example, maybe for age on the floor, we can do the calculation in the
background, in the database, and it doesn't impede the response in the
front-end engine. We get more metrics in the database calculated rather
than in our user tool, and it becomes more flexible and more valuable.
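An "age on the floor" metric -- how long an item has been selling at a location -- is a good example of a calculation that can be pushed into the database rather than the BI front end. A hedged sketch as a query string; the table and column names are assumptions:

```python
# Sketch of an "age on the floor" calculation done in the database rather than
# in the BI tool. Table and column names are assumptions for illustration.

AGE_ON_FLOOR_SQL = """
    SELECT
        item_id,
        store_id,
        DATEDIFF('day', MIN(first_receipt_date), CURRENT_DATE) AS age_on_floor_days,
        SUM(units_sold) AS units_to_date
    FROM warehouse.sales
    GROUP BY item_id, store_id
"""
```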
Gardner:
So not only are you doing what you used to do faster, better, cheaper,
but you're able to now do things you couldn't have done before in terms
of your quality of data and analysis. Is there anything else that is of a
business nature that you're able to do vis-à-vis analytics that just
wasn't possible before, and might, in fact, be equivalent of a new
product line or a new service for you?
Czetty:
In the old model, when we got a new client we had to essentially
recreate the processes that we'd built for other clients to match that
new client, because we were collecting that data just for that client, at that moment.
So
99 percent of it is the same as any other client, but one percent is
always different, and it had to be built out. On-boarding a client, as
we call it, took us a considerable amount of time -- we are talking
weeks.
In the current model, where we're centered on
retailers, the only thing that will take us a long time to do in this
particular situation is if there's a new retailer that we've never
collected data from. We have to understand their methodology of
delivery, how it comes, how complex it is and so forth, and then create
the logic to load that into the database correctly to match up with what
we are collecting for others.
In this scenario, since
we’ve got so many clients, very few new stores or new retailers show up,
and typically it’s just our clients’ own retail chains. Therefore, our on-boarding is simplified, because if we are getting
Nordstrom’s data from client A, we're getting the same exact data for client B, C, D, E, and F.
Now,
it comes through a single funnel and it's the Nordstrom funnel. It’s
just a lot easier to deal with, and on-boarding comes naturally.
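The retailer-centric model described here -- parse a retailer's feed once, then split the rows across every client whose products appear in it -- is what removes most of the per-client onboarding work. A simplified sketch; the structures are assumptions:

```python
# Simplified sketch of the retailer-centric fan-out: one parsed retailer feed
# is split across all clients whose brands appear in it, rather than running a
# separate pipeline per client. Structures are illustrative assumptions.
from collections import defaultdict

def fan_out_by_client(parsed_rows, brand_to_client):
    """Group one retailer's parsed rows by the client (brand owner) they belong to.

    parsed_rows:     iterable of dicts with a 'brand' key from the retailer feed
    brand_to_client: dict mapping brand name -> client id
    Rows for brands we don't represent are simply dropped.
    """
    by_client = defaultdict(list)
    for row in parsed_rows:
        client = brand_to_client.get(row.get("brand"))
        if client is not None:
            by_client[client].append(row)
    return by_client
```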
Hakami:
In addition to that, since we're adding more significant clients, the
ability to increase variety, velocity, and volume is very important to
us. We couldn't scale without having Vertica as a foundation for us.
We'd be standing still, rather than moving forward and being innovative,
if we stayed where we were. So this is a monumental change and a very
instrumental change for us going forward.
Gardner:
Steve, tell us about your actual deployment. Is this a
single tenant environment? Are you on a single database? What’s your
server or
data center
environment? What's been the impact of that on your storage and
compression and costs associated with some of the ancillary issues?
Multi-tenant environment
Czetty: To begin with, we're coming from a multi-tenant environment. Every client had its own private database in the past, because in IBM
DB2,
we couldn't add all these clients into one database and get the job
done. There was not enough horsepower to do the queries and the loads.
We ran a number of databases on a farm of servers, on
Rackspace
as our hosting system. When we brought in Vertica, we put up a minimal
configuration with three nodes, and we're still living with that minimal
configuration with three nodes.
We haven't exhausted
our capacity on the license by any means whatsoever in loading up this
data. The compression is obscenely high for us, because at the end of
the day, our data absolutely lends itself to being compressed.
Everything
repeats over and over again every single week. In the world of Vertica,
that means it only appears once in wherever it lives in the database,
and the rest of it is magic. Not to get into the technology underneath
it at this point, from our perspective, it's just very effective in that
scenario.
Also in our IBM DB2 world, we're using quite costly large
SAN
configurations with lots of spindles, so that we can have the data
distributed all across the spindles for performance on DB2, and that
does improve the performance of that product.
However,
in HPE Vertica, we have 600 GB drives and we can just pop more in if we
need to expand our capacity. With the three nodes, we've had zero
problems with performance. It hasn't been an issue at all. We're just
looking back and saying that we wish we had this a little sooner.
Vertica
came in and did the install for us initially. Then, we ended up taking
those servers down and reinstalling it ourselves. With a little
information from the guide, we were able to do it. We wanted to learn it
for ourselves. That took us probably a day and a half to two days, as
opposed to Vertica doing it in two hours. But other than that,
everything is just fine. We’ve had a little training, we’ve gone to the
Vertica event to learn how other people are dealing with things, and
it's been quite a bit of fun.
Now there is a lot of
work we have to do at the back end to transform our processes to this
new methodology. There are some restrictions on how we can do things,
updates and so forth. So, we had to reengineer that into this new
technology, but other than that, no changes. The biggest change is that
we went vertical on the retail silos. That's just a big win for us.
Gardner: As you know, HPE Vertica is
cloud-ready. Is there any benefit to that further down the road where maybe
it’s around issues of a spike demand in holiday season, for example, or
for backup recovery or business continuity? Any thoughts about where you
might leverage that cloud readiness in the future?
Dedicated servers
Czetty:
We're already sort of in the cloud with the use of dedicated servers,
but in our business, the volume increase in the stores around holidays is not a doubling of the volume. It’s adding 10 percent, 15 percent, maybe 20
percent of the volume for the holiday season. It hasn’t been that big a
problem in DB2. So, it’s certainly not going to be a problem in
Vertica.
We've looked at
virtualization
in the cloud, but with the size of the hardware that we actually want
to run, we want to take advantage of the speed and the memory and
everything else. We put up pretty robust servers ourselves, and it turns
out that in secure cloud environments like we're using right now at
Rackspace, it's simply less expensive to do it as dedicated equipment.
Spinning up a machine, like another node for us at Rackspace, would take about the same time it would take to set up and configure a virtual system -- a day or so. They can give us another node just like this on our rack.
We
looked at the cloud financially every single time that somebody came
around and said there was a better cloud deal, but so far, owning it
seems to be a better financial approach.
Gardner:
Before we close out, looking to the future, I suppose the retailers are
only going to face more competition. They're going to be getting more
demand from their end users or customers for better experiences and information.
We're
going to see more mobile devices that will be used in a dot-com world or
even a retail world. We are going to start to see geolocation data
brought to bear. We're going to expect the
Internet of Things (IoT) to kick in at some point where there might be more sensors involved either in a retail environment or across the supply chain.
Clearly,
there's going to be more demand for more data doing more things faster.
Do you feel like you're in a good position to do that? Where do you see
your next challenges from the data-architecture perspective?
Czetty:
Not to disparage the luxury industry too much, but at this point, they're not on the bleeding edge on the data collection and analysis side, whereas they are on the bleeding edge on social media and so forth. We've
anticipated that. We've got some clients who were collecting information
about their web activities and we have done analysis for identifying
customers who are presenting different personas through their different
methods as they contact the company.
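Linking the "different personas" a shopper presents across contact channels is, at its core, identity resolution over shared identifiers. A rough sketch using a tiny union-find over e-mail and phone; the fields and the exact-match rule are assumptions, and real matching would be fuzzier:

```python
# Rough sketch of clustering persona records that share an e-mail address or
# phone number. Fields and the exact-match rule are assumptions.

def link_personas(personas):
    """Cluster persona records (dicts with optional 'email'/'phone') by shared ids."""
    parent = {}  # tiny union-find over identifier strings

    def find(x):
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # First pass: union identifiers that co-occur on a single record.
    for record in personas:
        keys = [k for k in (record.get("email"), record.get("phone")) if k]
        for a, b in zip(keys, keys[1:]):
            union(a, b)

    # Second pass: bucket records by the root of their first identifier.
    clusters = {}
    for record in personas:
        keys = [k for k in (record.get("email"), record.get("phone")) if k]
        root = find(keys[0]) if keys else id(record)  # anonymous records stand alone
        clusters.setdefault(root, []).append(record)
    return list(clusters.values())
```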
We're dabbling in that area, and that’s going to grow as the interfaces become more tablet- and phone-oriented. A lot of sales are potentially
going to go through social media and not just the official websites in
the future.
We'll be capturing that information as
well. We’ve got some experience with that kind of data that we’ve done
in the past. So, this is something I'm looking forward to getting more
of, but as of today, we’re only doing it for a few clients.
Well positioned
Hakami:
In terms of planning, we're very well-positioned as a hub between the
wholesaler and the retailer, the wholesaler and their own retail stores,
as well as the wholesaler and their dot-coms. One of the things that we
are looking into, and this is going to probably get more oxygen next
year, is also taking a look at the relationships and the data between
the retailer and the consumer.
As you mentioned, this
is a growing area, and the retailers are looking to capture more of the
consumer information so they can target-market to them, not based on
segment but based on individual preferences. This is again a huge amount
of data that needs to be cleansed, populated, and then presented to the
CMOs of companies to be able to sell more, market more, and be in front
of their customers much more than ever before.
Gardner:
That’s a big trend that we are seeing in many different sectors of the
economy -- that drive for personalization -- and it's really these data technologies that allow it to happen.
Any other thoughts about where the intersection of
computer science capabilities and market intelligence demands are coming
together in new and interesting ways?
Adcock:
I'm excited about the whole approach to leveraging some predictive
capabilities alongside the great inventory of data that we've put
together for our clients. It's not just about creating better forecasts
of demand, but optimizing different metrics, using this data to
understand when product should be marked down, what types of attributes
of products seem to be favored by different locations of stores that are
obviously alike in terms of their shopper profiles, and bringing
together better allocations and quantities in breadth and depth of
products to individual locations to drive better, higher percentage of
full-price selling and fewer markdowns for our clients.
So it’s a predictive side, rather than discovery using a BI tool.
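None of the predictive tooling is described in the interview, but the markdown question raised here -- when is a style tracking too far below plan to recover at full price -- can be framed as a simple weekly check before any real modeling is layered on. A toy sketch, with thresholds and field names assumed:

```python
# Toy sketch of a markdown-timing check: flag styles whose cumulative
# sell-through is tracking well below their pro-rata seasonal plan.
# Thresholds and field names are assumptions for illustration.

def flag_markdown_candidates(styles, weeks_elapsed, season_weeks=12, tolerance=0.75):
    """Return style ids selling below `tolerance` of their pro-rata plan.

    styles: iterable of dicts with 'style_id', 'units_sold', 'planned_units'.
    """
    expected_share = weeks_elapsed / season_weeks
    flagged = []
    for s in styles:
        planned_to_date = s["planned_units"] * expected_share
        if planned_to_date > 0 and s["units_sold"] < tolerance * planned_to_date:
            flagged.append(s["style_id"])
    return flagged

# Example: six weeks into a twelve-week season, a style planned at 400 units
# that has sold only 90 (vs. 200 expected) would be flagged for review.
```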
Czetty:
Just to add to that, there's the margin. When we talked to CEOs and
CFOs five or six years ago and told them we could improve business by
two, three, or four percent, they were laughing at us, saying it was
meaningless to them. Now, three, four, or five percent, even in the
luxury market, is a huge improvement to business. The companies like
Michael Kors, Tory Burch, Marc Jacobs, Giorgio Armani, and Prada are all
looking for those margins.
So, how do we become
more efficient with a product assortment, how do we become more
efficient with distribution and all of these products to different sales
channels, and then how do we increase our margins? How do we not
over-manufacture and not create those blue shirts in Florida, where they
are not selling, and create them for Detroit, where they're selling
like hotcakes?
These are the things that customers are looking at, and they must have the tool or tools in place to manage their merchandising and, by doing so, become a lot more agile and a lot more profitable.
Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: Hewlett Packard Enterprise.