Aren’t data scientists software developers?

Juan Pablo Escobedo
4 min read · Sep 20, 2020

Let’s get our concepts together

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from many structured and unstructured data.

Wikipedia

Therefore, a data scientist is a person who uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data.

A computer programmer, sometimes called a software developer, a programmer or more recently a coder (especially in more informal contexts), is a person who creates computer software.

Wikipedia

We can conclude that a data scientist who writes computer algorithms is a software developer.

Stack Overflow’s 2017 Annual Developer Survey separates data scientists who are software developers from those who are not. Since this data comes from a survey, we can conclude that the second group doesn’t consider itself to be developers.

There are 3045 developer data scientists and 1100 non-developer data scientists in the survey. That means about 27% of data scientists (1100 of 4145) don’t consider themselves software developers.

From here on, we will call the two groups of data scientists devs and non-devs.
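As a quick check on the arithmetic, the share can be worked out directly from the two counts quoted above. This is only a minimal sketch; the variable names are illustrative and not taken from the original analysis.

```python
# Counts quoted in the article for the 2017 survey.
dev_count = 3045
non_dev_count = 1100

# Total number of data scientists across both groups.
total = dev_count + non_dev_count

# Share of data scientists who do not identify as developers.
non_dev_share = non_dev_count / total

print(f"{total} data scientists in total")
print(f"{non_dev_share:.1%} do not consider themselves developers")
```

With these counts the share comes out a little above a quarter of all data scientists in the survey.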

Developer data scientists and non developer data scientists

What characteristics differentiate developer data scientists from non-developer data scientists?

More than 50% of dev respondents graduated with a computer science degree. Most non-devs are spread across other science degrees, although some devs fall into those groups too.

Undergraduate major

There is something interesting in the last plot: some computer science graduates don’t consider themselves developers. Is it because they studied it but don’t work in it?

Profession

We can conclude that respondents decided whether to identify as devs based on their current profession.

Still, there are other questions in the survey that can help tell the two groups apart. I will list a few that I found interesting.

Type of company
Years programming
Check in code on source control software

Some algorithms also showed us that there are characteristics that greatly differentiate the two kinds of data scientists. These characteristics describe the habits of the two groups, such as how often they check in code (above) or the programming languages they have used.

Using the characteristics from the survey, could we predict who is a dev and who is a non-dev?

Let’s say we know the habits of a group of data scientists, and we want to offer them some books to help them in their jobs. If we asked them outright whether they are developers, we could get rejected before showing any proposal.

Instead, we can use a machine learning algorithm to predict in advance whether they are developers, and offer them the book that could help them in their career.

Using a quick algorithm, I could predict with 70% accuracy what kind of data scientist answered the survey. Improving this algorithm further could raise the accuracy considerably.
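To put an accuracy figure in context, it helps to compare it with what a trivial "always guess the biggest group" rule would score. The sketch below is not the model from this analysis, whose training and evaluation setup isn't described here (it may, for example, have used balanced classes, in which case this baseline would not apply directly). It only assumes the group counts quoted earlier in the article; the label lists are made-up toy values, and `accuracy` and `majority_baseline` are hypothetical helper names.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one label")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def majority_baseline(class_counts):
    """Accuracy of always guessing the most common class."""
    return max(class_counts.values()) / sum(class_counts.values())


# Group sizes quoted earlier in the article.
counts = {"dev": 3045, "non_dev": 1100}
baseline = majority_baseline(counts)
print(f"Majority-class baseline: {baseline:.1%}")

# Toy labels (assumption, not survey data), only to show how accuracy() is called.
actual = ["dev", "dev", "non_dev", "dev", "non_dev"]
predicted = ["dev", "non_dev", "non_dev", "dev", "dev"]
print(f"Toy accuracy: {accuracy(predicted, actual):.0%}")
```

A classifier is only adding information when it beats this kind of baseline on the same data, which is one concrete target for the improvements mentioned above.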

But what if we want to sell all kinds of developer books using the same characteristics?

We can use another algorithm to separate all kinds of developers (data scientists, web, mobile, etc.) from all kinds of non-developers (C-suite, data scientists, etc.).

Respondent type

Some data scientists do not consider themselves developers because their job title doesn’t say so, or because their job goes beyond just writing code.

We could say all of them are developers because they write code, even if it is not a fully structured computer program.

You can read a more detailed and technical analysis here.
