Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy.
Snagajob recently delivered 4 million new job applications in a single month through its systems. To learn how they're managing such impressive scale, BriefingsDirect sat down with Robert Fehrmann, Data Architect at Snagajob in Richmond, Virginia. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.
Here are some excerpts:
Gardner: Tell us about your jobs matching organization. You’ve been doing this successfully since 2000. Let's understand the role you play in the employment market.
Fehrmann: Snagajob, as you mentioned, is America's largest network for hourly employees and employers. The hourly market means we have, relatively speaking, high turnover.
Gardner: Tell us how you use big data to improve your operations. I believe that among the first ways that you’ve done that is to try to better analyze your performance metrics. What were you facing as a problem when it came to performance? [Register for the upcoming HP Big Data Conference in Boston on Aug. 10-13.]
Signs of stress
Fehrmann: A couple of years ago, we started looking at our environment, and it became obvious that our traditional technology was showing some signs of stress. As you mentioned, we really have data at scale here. We have 20,000 to 25,000 postings per day, and we have about 700,000 unique visitors on a daily basis. So data is coming in very, very quickly.
Gardner: And of course, near real time is important. You want to catch degradation in any fashion from your systems right away. How do you then go about getting this in real time? How do you do the analysis?
Fehrmann: We started using Hadoop. I'll use a lot of technical terms here. From our website, we're getting events. Events are routed via Flume directly into Hadoop. We're collecting about 600 million key-value pairs on a daily basis. It's a massive amount of data, 25 gigabytes on a daily basis.
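As a rough illustration of this pipeline, here is a minimal sketch of how a web tier could hand a click event to a Flume HTTP source for delivery into Hadoop; the endpoint, headers, and field names are hypothetical, since the transcript doesn't describe Snagajob's actual Flume topology.

```python
import json
import time

import requests  # assumes a Flume HTTPSource with the default JSONHandler

FLUME_URL = "http://flume-agent.example.com:44444"  # hypothetical agent endpoint

def send_event(event_type: str, payload: dict) -> None:
    """POST one key-value event to Flume, which then sinks it into HDFS."""
    event = [{
        "headers": {"type": event_type, "ts": str(int(time.time() * 1000))},
        "body": json.dumps(payload),
    }]
    requests.post(FLUME_URL, json=event, timeout=2).raise_for_status()

# Example: a jobseeker clicked on a posting.
send_event("posting_click", {"visitor_id": "v123", "posting_id": "p456"})
```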
The second piece in this journey to big data was analyzing these events, and that's where we're using HP Vertica. Our original use case was to analyze a funnel. A funnel is where people come to our site. They're searching for jobs, maybe by keyword, maybe by zip code. A subset of them show interest in a job and click on a posting. A subset of those apply for the job via an application. A subset shows interest in an employer, and so on. We had never been able to analyze this funnel.
The dataset is about 300 to 400 million rows, and 30 to 40 gigabytes. We wanted to make this data available not just to our internal users, but to external users as well. Therefore, we set ourselves a goal of a five-second response time: no query on this dataset should run for more than five seconds -- and Vertica and Hadoop gave us a solution for this.
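To make the funnel concrete, here is a minimal sketch of the kind of stage-by-stage query this setup enables, using the vertica-python client; the connection details, table, and column names are assumptions for illustration, not Snagajob's actual schema.

```python
import vertica_python

# Hypothetical connection details.
conn_info = {"host": "vertica.example.com", "port": 5433,
             "user": "analyst", "password": "...", "database": "events"}

# Count distinct visitors surviving each funnel stage: search -> click -> apply.
FUNNEL_SQL = """
SELECT COUNT(DISTINCT CASE WHEN event_type = 'search' THEN visitor_id END) AS searched,
       COUNT(DISTINCT CASE WHEN event_type = 'click'  THEN visitor_id END) AS clicked,
       COUNT(DISTINCT CASE WHEN event_type = 'apply'  THEN visitor_id END) AS applied
FROM   web_events
WHERE  event_date >= CURRENT_DATE - 30;
"""

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    cur.execute(FUNNEL_SQL)
    searched, clicked, applied = cur.fetchone()
    print(f"search -> click: {clicked / searched:.1%}")
    print(f"click  -> apply: {applied / clicked:.1%}")
```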
Gardner: How have you been able to increase your performance and meet your key performance indicators (KPIs) and service-level agreements (SLAs)? How has this benefited you?
Fehrmann: Another application that we were able to implement is a recommendation engine. A recommendation engine addresses the case where jobseekers applying for a specific job may not know about all the other jobs that are very similar to it, or that other people have also applied to.
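One common way to build this kind of "people who applied to this job also applied to" feature is item-to-item co-occurrence counting. The sketch below illustrates that general technique on toy data; it is an assumption about the approach, not Snagajob's actual implementation.

```python
from collections import Counter, defaultdict

# (jobseeker_id, job_id) application pairs; in production these would come
# from the applications data in Vertica rather than an in-memory list.
applications = [
    ("alice", "cashier_101"), ("alice", "barista_202"),
    ("bob",   "cashier_101"), ("bob",   "stocker_303"),
    ("carol", "cashier_101"), ("carol", "barista_202"),
]

# Group the jobs each seeker applied to.
jobs_by_seeker = defaultdict(set)
for seeker, job in applications:
    jobs_by_seeker[seeker].add(job)

# Count how often each pair of jobs was applied to by the same person.
co_counts = defaultdict(Counter)
for jobs in jobs_by_seeker.values():
    for job in jobs:
        for other in jobs - {job}:
            co_counts[job][other] += 1

def recommend(job_id: str, k: int = 3) -> list[str]:
    """Return the jobs most often co-applied-to with job_id."""
    return [job for job, _ in co_counts[job_id].most_common(k)]

print(recommend("cashier_101"))  # -> ['barista_202', 'stocker_303']
```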
Gardner: So you took the success from your big-data implementation and analysis capabilities from this performance task to some other areas. Are there other business areas, search yield, for example, where you can apply this to get other benefits?
Brand-new applications
Fehrmann: When we started, we had the idea that we were looking for a solution for migrating our existing environment to a better-performing new environment. But what we've seen is that most of the applications we've developed so far are brand-new applications that we hadn't been able to do before.
You mentioned search yield. Search yield is a very interesting aspect. It's a massive dataset: about 2.5 billion rows and about 100 gigabytes of data as of right now, and it's continuously increasing. For all of the applications, as well as all of the search requests that we have collected since we started this environment, we're able to analyze the search yield.
For example, that's how many applications we get for a specific search keyword, in real time. By real time, I mean that somebody can run a query against this massive dataset and get results in a couple of seconds. We can analyze specific jobs in specific areas, or specific keywords that are searched in a specific time period or in a specific location of the country.
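Reading search yield as applications per search for a given keyword, here is a minimal sketch of such a query against the same hypothetical schema as the funnel example above; the metric definition and all names are assumptions, since the transcript doesn't spell them out.

```python
import vertica_python

# Same hypothetical connection details as in the funnel sketch.
conn_info = {"host": "vertica.example.com", "port": 5433,
             "user": "analyst", "password": "...", "database": "events"}

# Applications per search, by keyword, over the last 30 days.
YIELD_SQL = """
SELECT   s.keyword,
         COUNT(DISTINCT s.search_id) AS searches,
         COUNT(a.application_id)     AS applications,
         COUNT(a.application_id) / NULLIF(COUNT(DISTINCT s.search_id), 0)
             AS applications_per_search
FROM     searches s
LEFT JOIN applications a ON a.search_id = s.search_id
WHERE    s.search_ts >= CURRENT_DATE - 30
GROUP BY s.keyword
ORDER BY applications_per_search DESC
LIMIT    20;
"""

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    cur.execute(YIELD_SQL)
    for keyword, searches, applied, per_search in cur.fetchall():
        print(f"{keyword}: {applied} applications / {searches} searches = {per_search}")
```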
Gardner: And once again, now that you've been able to do something you couldn't do before, what have been the results? How has that changed your business? [Register for the upcoming HP Big Data Conference in Boston on Aug. 10-13.]
Fehrmann: It really allows our salespeople to provide great information during the prospecting phase. If we're prospecting with a new client, we can tell them very specifically that if they're in this industry, in this area, they can expect an application flow of, let's say, a hundred applications per day, depending on how big the company is.
Gardner: How has this been a benefit to your end users, those people seeking jobs and those people seeking to fill jobs?
Fehrmann: There are certainly some jobs that people are more interested in than others. On the flip side, if a particular job gets 100 or 500 applications, it's just a fact that only a small number of applicants are going to get that particular job. Now if you apply for a job that isn't as interesting, you have a much, much higher probability of getting the job.
Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy. Sponsor: HP.