Data science is one of the coolest-sounding jobs around and probably one of the most sought-after skills in the world of computer science. But not many people know what it is, and even some of those who do know cannot do it.
What is even more problematic is that there are people who say they are doing it but cannot, or who can do it but only very poorly.
Here I wanted to share 5 steps to data science. Even if it’s not your thing it is worth a read so you can spot some of the traits in others.
You tend to see this diagram a lot when looking for data science skills. It does a good job of giving you an insight into what you need. The overlap I think matters most is data extraction (hacking), machine learning and statistics.
1. Get educated
Most data science jobs require a minimum of a master’s degree (MSc), but most seem to prefer a doctorate (PhD). I have to admit, I agree with these entry requirements. It’s not about being elitist, it’s about the amount of time a person has spent looking at these algorithms and techniques. I think back to how I felt when I had completed my MSc in Intelligent Systems and Robotics: I got a distinction in every module and worked on a cutting-edge deep reinforcement project. I even had a paper published in an IEEE conference for some of the work I did in a module. But it wasn’t until about a year into my PhD that I realised I had gaps in my knowledge. The time spent doing the PhD allowed me to fill these gaps and look into the data, the models and everything else associated with them.
But don’t stop there, you need to learn the other side too. I also took 2 postgraduate statistics courses. My opinion is that you get two perspectives on data science:
- Statisticians learning machine learning
- Computer science people who can run some machine learning code and don’t care much for the statistics
To truly appreciate the role of a data scientist you need to appreciate both aspects (ML and Stats).
Alternative to university education?
I’ve taken some good Udemy and Coursera courses. Anything by Andrew Ng is going to be good, and I would highly recommend his courses. The problem I feel with these types of courses is that they are short, and short means you don’t have enough time to assimilate the information or explore the work in enough detail. Yes, you are learning some cutting-edge stuff quickly, and yes, you might pass a test, but did you fully understand it? I’ve known people who have done some of these courses and have skipped through everything to get to the practical.
2. Learn to code
Most of the machine learning/data science work is now done on open source platforms. When looking at the job vacancies this means you’ll see a lot requiring either R or Python.
Most computer science people will be aware of these two but haven’t spent much time with them. And I would assume more people are aware of Python than are aware of R.
I started with R, which is great for statistical programming but can also do machine learning. I then moved over to Python for 2 reasons: first, because people want R or Python (so I had both), and secondly, I feel that the machine learning packages in Python are better. Python is used by the bigger companies too, and it is the language behind frameworks like TensorFlow and Theano.
For Python, I’d recommend installing the Anaconda Distribution because it has everything you need to get started. You should also have a look at using Jupyter Notebook as an IDE, as it forces you to think logically about processing your data.
Other things to have a look at…
- Scikit Learn
These are the fundamentals.
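To give a feel for what working with Scikit Learn looks like, here is a minimal sketch. The dataset (the built-in iris data), the choice of logistic regression and the 75/25 split are all my own assumptions for illustration, not a recommendation for any real project. It assumes scikit-learn is installed, which it is if you went with Anaconda.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in dataset: 150 flowers, 4 numeric features each.
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the rows so the model is scored on data it has not seen.
# random_state fixes the shuffle; stratify keeps the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a simple baseline classifier (an illustrative choice, not the only option).
# max_iter is raised above the default so the solver has room to converge.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Score on the held-out rows only.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

The pattern of load, split, fit and score is the same for most Scikit Learn models, so swapping in a different estimator is usually a one-line change.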
Every now and again I see SAS advertised in a job vacancy. I tend to stay away from companies that use this software. Nothing against the software, I know it is good. I just don’t like being bound by a commercial product, especially when you can do more in R and Python. I am probably wrong but it screams “we bought it because someone said it would be good, but now we don’t know”, maybe eh?
3. Learn about Databases and data in general
I guess the ‘learn data in general’ part comes from experience. You need to have had a go at building some models. Even the most basic models or tutorials will give you a feel for what the data should look like.
As you get into bigger projects you’ll need to learn to access data directly, so that will mean learning some SQL. Typically you’d expect a small to medium business to be storing data in either Microsoft SQL Server or MySQL, but you’ll be surprised how many people are still using Access.
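If you have never written any SQL, a small sketch like the one below is enough to get going. It uses SQLite through Python’s built-in sqlite3 module purely as a stand-in so that it runs anywhere; a real business would more likely be on one of the servers mentioned above, and the `orders` table, its columns and the figures in it are made up for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for a real server here.
conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per customer order.
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 12.5), ("carol", 50.0), ("bob", 4.5)],
)

# Aggregate per customer and keep only those who spent more than a threshold.
# The ? placeholder passes the threshold as a parameter rather than pasting it
# into the SQL string.
query = """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > ?
    ORDER BY total DESC
"""
rows = conn.execute(query, (35.0,)).fetchall()

for customer, n_orders, total in rows:
    print(f"{customer}: {n_orders} orders, {total:.2f} total")

conn.close()
```

SELECT, GROUP BY, HAVING and ORDER BY carry over to Microsoft SQL Server and MySQL with little or no change, although the connection code and some data types will differ.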
4. Get a pet project
I’ve always told my students, colleagues and friends – “You’ve got to have a side project”. I remember when I was at university doing my undergraduate degree my fellow students and I would attend the same class, learning the same things but then I’d go home and play with something that was new to me, like Linux or web design, or anything. And I’m still the same now, I’ve always had a project on the go. It doesn’t matter if you don’t finish it, finishing it is never the goal, the goal is learning something new. Something to talk about, get excited about.
Think, what does everyone else you know do? They do their 9 to 5 job, go home and spend the night watching reality TV or playing some video game. Use that time to learn something new.
Places to start when you’re starting out…
5. Engage with the community
There is a big data science community out there and you need to engage. Here is my big tip – “don’t be a bullshitter”, I mean it. Anything data science, machine learning or AI attracts imposters. It’s like the quote…
[Big data/Machine Learning/AI/Data Science] is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.
The problem is that those who know how to do it will spot you a mile away.
But here is the other side, those who do know how to do it will probably help you or point you in the right direction.
Places to start are…
- LinkedIn – follow some groups and connect with some big players, see what is going on
- Quora – you get everything on here, see what the job market is like, how people are doing cool stuff, you just have to look
- Medium – some interesting stuff on here, I should read more on here too