Ask DT: Are we in a data scientist bubble?
12 points by roycoding 3289 days ago | 17 comments
(People love to talk about bubbles, so I thought we could talk about one closer to home.)

I'm a data scientist. Like most data scientists you meet, I didn't call myself a data scientist just a few years ago. In fact, I used to be a physicist who happened to love working with computers and programming. A few years ago I started hearing more about the idea of a data scientist and decided to transition into the world of "tech". I was not alone in wanting to make that transition it seems, and I regularly field questions from prospective data scientists asking how to make that transition.

Having met many, many new data scientists, I have started to wonder whether we are reaching some saturation level. Many articles cite the 2011 study from McKinsey about the pending shortage that will leave us approximately 20 bajillion data scientists below the market need. I am unaware of an update to that study and have heard many stories of people having very long job hunts. Additionally, my relatively unknown startup has recently started the process to hire another data scientist and we have been inundated with applications. I'm sure this inundation is not unusual when anyone with an internet connection can apply, but so far we have had very few applicants who were obviously unqualified. Nearly all had education and/or work experience befitting the title of data scientist.

An article recently submitted to DataTau,

http://firstround.com/review/how-to-consistently-hire-remarkable-data-scientists/

claims that they need/get 150 applicants for every (great) hire. The article's point seems to be that you need a robust system in place to filter out the 149 to get the one shiny, brilliant diamond in the rough. Another takeaway could be that the market of potential data science hires is very crowded.

So the question is: are we now in a data scientist bubble, with more so-called talent than jobs to fill?

I am interested in any thoughts people have on this.



10 points by apor 3289 days ago | link

IMO this isn't a bubble. What people are seeing is an immature and ill-defined field that wants very technically mature people with skills that 95% match specific jobs. I think a lot of organizations don't even understand what value "Data Science" adds, so they don't know how to staff their org with Data Scientists.

I work on a data science team at a software company, and it's astonishing to me that a) our technical skills vary greatly from person to person and b) we have no culture of mentorship or training. Management sees no room for fresh grads even though all the technical people do.

I visit customers a lot and talk to startups (including applying for jobs). It quickly becomes apparent that most places are like where I work. Since "Data Science" is so new they want people who can immediately come in and show leadership in a vertical/horizontal, subject area (e.g. click-through analysis, fraud analysis) with very particular personality traits and technical skills.

Many positions also have contradictions - we want very experienced people who can work 60+ hours/week and travel 50%. Except very experienced people have other commitments and are going to be hesitant to commit to these types of positions. IMO a fresh college grad attached to a mentor is a great fit for this.

A lot of Data Science jobs are in startups or small companies - so again - little room for mentorship, training, and development since there is no time and people to do it.

Compare this to software (e.g. Java) development. It's been around for decades so there is lots of room for Juniors, Seniors, and a culture of what it means to go up the experience and skillset ladder. The value they add to an organization is also understood.

My takeaway from the article you linked to is "We want somebody who can start the job at 100%. But somebody else should have taken the risk and paid for the training, guidance, and mistakes. So we only take risk in the interview process."

I think in 5 - 10 years both the technology and the field will mature enough that many organizations (big and small) will have a data science team delivering defined and understood value. By that point the "Data Science" hype will have also died down and there'll be a more stable culture of what to do with juniors, seniors, hiring, and training.

-----

2 points by gata 3286 days ago | link

Out of curiosity...what are the types of data science jobs that require you to travel so much?

-----

2 points by apor 3286 days ago | link

I'll provide some examples and the travel requirements I've seen. Then I'll add a caveat due to my own sampling bias.

1. A lot of companies have a small or no data science team. So a lot of work is contracted. Contractors can be at 75%+ travel if they are not local.

2. Companies that sell software with "advanced analytic" (e.g. R) capabilities that is customized for customers. There are also companies that sell prebuilt solutions for specific use cases (e.g. oil production analysis) where the core R/Python scripts are already built, but they have to integrate with customers' IT infrastructure and validate output. Closer to 50% travel (this includes sales and services).

3. Some large (e.g. 30,000+ employees) companies have data science teams that are tasked with helping the rest of the company run better. They don't focus on one area. Rather, they visit different divisions and evangelize and create data science for better operations. They operate almost like internal contractors when conveying their value to upper management. Travel up to 25% since different divisions can be spread across a country or even the world.

Now my caveat. Most of what I've described are positions for data science generalists, not experts in one specific area (e.g. looking at price elasticity in consumer goods). From what I've seen on job sites, a lot of the more focused data science positions are in advertising (or related) and deep data mining of people. I typically don't focus on these jobs so I'm missing data on a lot of 0-5% travel positions.

Also keep in mind that many companies want guidance. So even if you are an expert in a highly focused subject area in a role that isn't customer facing, there is value in getting you out there to talk to people about how to do things and best practices.

-----

1 point by patwater 3286 days ago | link

Great points about the lack of definition. I'd add: http://simplystatistics.org/2015/03/17/data-science-done-wel...

-----

6 points by dragibus420 3288 days ago | link

Being a data scientist today is like being a webmaster 15 years ago, a sort of jack of all trades in a brand new blooming job domain. IMO the term "data scientist" is not meant to stay forever. Just as there are no more webmasters today but rather front-end devs, sysadmins, designers, and people dealing with SEO, online advertising, etc., there will be dedicated branches of data science specialization: data scraping/munging, machine learning, database management, dataviz...

-----

4 points by apor 3288 days ago | link

Another interesting aspect here is the "Data Science" programs in colleges/universities. My experience looking at new grads is that they are all focused on statistical and computational modeling/algos, albeit with a lot of variability in the depth of their education (e.g. apply a linear model in SAS vs specify a new covariance structure for the errors and derive the respective equations or code a linear model with this).

This has created a situation where lots of people want to get into a specific Data niche and they see Data Science as one thing. But how many Data Modeling jobs are there and how many can there be? Especially when you consider how quickly statistical and machine learning is being automated and is built into software.

Any company working in Data does lots of stuff outside of this. This includes BI, Data Integration, DB Modeling, Custom Visualizations (e.g. D3), Security, Customization, and sitting in front of clients to scope projects and explain how to get value from their Data.

We have a person who is amazing at getting customer authentication integrated into our Data product. So when a customer asks "We use a token and blah blah blah for our data security" this person figures out how to get our software to authenticate through their system. This person is a great R programmer, but they know the bare minimum about modeling in R.

If our statistical model is half-baked that might jeopardize a deal. But if we can't use their security and authentication infrastructure the deal is dead - no compromise there. So this person is really critical in our data team, but they aren't what a lot of people see as critical to Data Science.

The growth of Data jobs is going to be huge if we do not only include the jobs which require R modeling (or related) skills. A lot of people are going to need to let go of "I learned statistics so the only job I want to do is statistical modeling" to get jobs in this industry.

-----

2 points by 1_over_n 3288 days ago | link

I think this is a very important point - and it basically refutes the hypothesis that we are in a data scientist bubble.

Right now the term 'data scientist' is far too generic for me and covers so many bases. I graduated university with a strong stats background; however, it has been difficult for me to transition into solid data science roles due to my lack of programming (which I'm thankfully improving now), whereas I think there is a big risk of some bad data science being done by people who can start messing with data without a solid grounding in what it means for data to be skewed, checking for kurtosis, etc. For me personally, I always knew what I wanted to do with the data, but getting it and cleaning it up was another battle completely. It might be that small, discrete data science teams can work together just like any scientific lab would, with a technician, professor, etc., in a bit more of an academic fashion but running under lean methodology.

The problems will come from (I anticipate) data scientists who are employed by big corporates because 'we need a data scientist' and HR basically doesn't understand what they are hiring for or why.

It's more or less on the data science community to self-police this - which will likely be the case.

-----

4 points by data_sam 3283 days ago | link

Quite the irony that no one here is trying to use data to answer that question.

-----

4 points by larrydag 3289 days ago | link

This reminds me of the STEM shortage debate. http://spectrum.ieee.org/static/the-stem-crisis-is-a-myth-an...

I'm not sure myself if there is a bubble. I'm get job requests of at least 1 to 2 a week from recruiters.

-----

2 points by lackadaisically 3283 days ago | link

> I'm get job requests of at least 1 to 2 a week from recruiters.

This doesn't mean anything.

Their job is to throw as many potential employees at the employer as possible because these companies do not have the brand recognition for programmers to line up to apply to them.

It's game theory, a shotgun approach. The incentive is to throw as many programmers at the employer as possible, regardless of whether they fit or not; it increases the chances of the company hiring one of them and the recruiter getting the commission.

When people state something along the lines of:

"I'm get job requests of at least 1 to 2 a week from recruiters."

It's a telltale sign that they themselves are very new to how employment works in real life within the tech industry.

The reason why I'm pointing this out is not to put OP down but to make sure whoever is reading this understands that this statement is very misleading and often spoken but isn't necessarily true.

I get well over 3-5 job offer emails from recruiters, aka head hunters, daily and get at least 2-3 calls a week. It doesn't mean I qualify for them. I also get spam with job positions that aren't even close to my skill set, but they'll do it anyway just because there's a chance I might get hired and they get the commission.

Also do not use this as a gauge of your skill set or value. Your value is based on how fast/easily you can get a job.

It's usually based on your experience and how able you are to pass interviews (regardless of whether it's bullshit or not, technical, non technical).

-----

1 point by larrydag 3275 days ago | link

Great points. I upvoted. Just because I get recruiters pinging me doesn't mean there is a bubble. I definitely agree it's a game theory approach with recruiters in a "new" field of Data Science that they don't quite have a good read on.

-----

3 points by neilvyas 3286 days ago | link

I just wrote a couple posts that kind of address this sentiment. You can read them here:

http://neilvyas.github.io/2015/03/29/on-hiring-ds.html

and

http://neilvyas.github.io/2015/03/27/YC-Austin.html

I don't think we're in a bubble; I think we're just seeing a surge of people trying to enter the field as a consequence of its popularization in tech and academic circles as a "sexy job."

-----

3 points by breucopter 3288 days ago | link

I recently became a data scientist and was lucky enough to find a role where my company is willing to train & mentor me for the role. The large volume of applicants may be more due to the ambiguity of what is required to be good at the role. Companies put out a wide variety of messages around what they want: PhD converts, experienced developers, advanced business analysts. I think the ambiguity coupled with the hype drives up applicants that are misaligned with the company's goal (if they even have a tangible goal).

-----

2 points by 1_over_n 3288 days ago | link

Again - strongly agree with this. It's absolutely right that the company should put time and effort into training and mentoring you because it's very unrealistic to expect someone who has the potential to be a strong data scientist to have all the skills for that particular company on day 0.

-----

3 points by apor 3288 days ago | link

One valuable aspect of mentoring is the transference of institutional knowledge to new people. Technical skills are important, but learning about internal projects, past history/lessons, philosophy, etc are fairly critical for an organization to stay alive and survive through tough times.

Places I've seen that have zero mentoring also have zero transfer of institutional knowledge outside of osmosis and hubris (we train our people during our once a year 3 day meeting!). The end result is a few events can have a major effect. One senior person leaves and suddenly entire projects and systems are derailed and backlogged because no amount of technical skill can replace the knowledge of what actually went on in those products/projects in the first place.

-----

2 points by nofreehunch 3287 days ago | link

I feel data science has already boomed. I hear more about machine learning these days. May be a bias though.

I think there is little room for junior data scientists. People want the unicorns with PhDs. I also feel there is a disconnect between PhDs and practical data science: people stuck with a certain algo, because that is what their supervisor used to specialize in.

Finally, the hype certainly is causing programmers and developers to try their hand at machine learning. But running random forests and taking a Coursera course is far from data science. It is familiarity. Like with SEO: everyone claimed to be able to get you good ranking results, with no way for the customer to separate the wheat from the chaff.

-----

1 point by matchagaucho 3287 days ago | link

I'm not a Computer Scientist, but I use computer science every day.

I'm not a Data Scientist, but I aspire to use data science every day.

-----



