So yes, this post might look a bit like clickbait, but I promise you it is not exactly that (well, somewhat).
I recently got a question on Quora asking something along the lines of: what exact skills do companies look for when they are recruiting a Data Scientist, and is there a definition of the Data Scientist profile? As is pretty obvious, there is no one profile, as every company is solving its own set of problems. But I tried to make a few generic job profiles that can roughly fit the JDs of different companies. I think there is far more variety than this, but I had to narrow it down to a set of profiles, so here is the list:
- The R-using number-cruncher. Can run quick Group Bys and Counts on numbers in R/Python. This profile is the coding version of the Data Analyst from earlier days. It is most commonly found doing automated report generation in a more analyst-y organization.
Tools Used: R (dataframes), SQL
- The Modeller. A deeply mathematical mind who can apply Bayesian/Frequentist inference or hierarchical models. I am probably lumping too many people into a single group here: people analyzing drug trials, scientists modelling complex phenomena and people running autoregressive models on stocks all end up together. The common theme is that Mathematics forms the base of the work.
Tools Used: R (very popular), Fortran, C++ and sometimes functional languages.
- The Data Engineer who is also an occasional Data Scientist. Take a library from here, take some code from there and make something good enough while you manage the data pipeline. A very common profile. Data Science tasks include writing programs to automate report generation in Pandas, trying out simple Machine Learning models and (nowadays) running a pretrained Neural Network on the data.
Tools: Python toolchain, Pandas, nltk, Keras.
- The tabular ML’er (or the XGBoost specialist). An ardent Kaggler who can train multiple algorithms, stack models and optimize the heck out of them. These guys have deep expertise in running and optimizing standard algorithms like XGBoost, Ridge Regression and (nowadays) Keras models.
Tools: Python or R; uses XGBoost and Keras a lot.
- The old-style ML’er. Close to profile 4 (the tabular ML’er), but not limited to categorical models only. Very good at feature engineering. This was the only Machine Learning expertise until the newer Deep Learning profile came up.
Tools: C++ / Python with Scikit Learn.
- Deep Learning Guy. Needs a GPU system and a well-tagged dataset. Will spend a lot of time trying out architectures and minimal time on feature engineering, but the accuracy will be insane.
Tools: Python, Theano, TensorFlow and high-level libraries like Keras.
- The domain specialist. Knows a lot about the domain and something about linear models. Codes up the domain information and trains a linear algorithm on top. Includes mechanical engineers, analysts at different firms and scientists in pure/applied sciences.
Tools: Different specializations use different things. MATLAB among engineers, C++/Fortran and sometimes R/Python.
- The newbie. The intern. Will evolve into whichever of the 7 categories above his/her mentor belongs to.
At ParallelDots, we have people of types 2, 3, 4, 5 and 6 (and 8, if you want to join us full-time).