This was the question my mentee asked me when we had our first introductory meeting. He’s a Computer Science graduate from China doing an MSc in Data Science and Business at Exeter University and is trying to understand the world of data science and how best to break into it as a graduate.
Data Science is such a broad topic that it's worth dissecting it a little more before trying to answer the question directly. You can find lots of definitions online, but a simplistic view would be that it's about trying to derive business value from large datasets, commonly called Big Data.
Big Data itself can take many forms and is often defined in terms of the "Vs":
- Volume: The main characteristic here is the size of the data. Think every Twitter message, every page browse on Amazon, every Google search phrase.
- Variety: Commonly you'll hear this described as the combination of structured and unstructured data and the challenge of joining the two together. For structured, think of a well-defined schema with associations between data elements; for unstructured, think free-form text like you might find on a web page or Instagram post.
- Veracity: Is the data trustworthy? Has it been validated? Does it need to be cleansed?
- Velocity: This is about the frequency at which new data is being created. I recently spoke to an engineer working at Facebook who was processing 1TB of log file data an hour, with the ability to scale further horizontally.
So, with this in mind, you'll often see firms make a distinction between Data Engineers, who tend to focus on preparing the data, and Data Scientists, who mine the data looking for insights. The skills required for the two roles generally overlap to a degree, though Data Engineers typically need experience of fast data, streaming, ETL, NoSQL, Hadoop and so on, while Data Scientists need more experience with the various flavours of Machine Learning and Deep Learning (such as RNNs) and with languages like R and Python, alongside experience of using their associated ML libraries.
Next, you need to differentiate yourself. A quick search of active candidates who reference "Big Data" in their CV on a single online job site throws up over 1,800 people, so you need to find a way to stand out.
My advice is:
Participate in the communities
There are so many virtual and physical forums for like-minded individuals to hang out and share, and they're a great way of making connections, often with people who are already working in the industry and can give you a deeper insight into what it's really like to work in that domain. Also, contribute to the forums: the more you put in, the more you'll get out, and you never know, your future employer may even be part of the forum.
Personal projects are a great way of advancing your skills by practising in your spare time, or you could even contribute to existing open source projects. When it comes to writing your CV, include a link to your source code repository and use those projects to demonstrate your genuine interest in the subject.
Online courses are widely available these days, so if there is an area of learning that isn't covered by your degree, take a course in your spare time and again make sure you reference it on your CV, as it's another great demonstration of your initiative and genuine interest in the subject.
Once you’ve done all this, all you need to do now is find that perfect job and apply!
Gary Rawlings is our in-house tech expert and a key member of our senior leadership team.