The number one skill a data scientist should have: problem solving

The title sounds a bit obvious, right? Of course you have to be able to solve problems. But please bear with me; I mean something quite specific here. (Maybe there is a better term for this skill, but I have not been able to come up with one.) So let me state this very clearly:

The number one skill you need to be a good data scientist is problem solving.

Now, what do I mean by this? Let's say you have some data, and you want to build a machine learning model to predict something. This takes a lot of steps, and there is no way you will know exactly which steps those are at the start. Along the way, you have to figure out what the next step is and how to do it. And if something does not work out as expected, you have to find out why and adjust your approach. This means solving a string of very small problems in order to reach your goal: a model that predicts your target.

In a real-world setting you might also run into challenges of other kinds. Often, as I have experienced in my work as a consultant, you have to deal with tools or languages that are unknown to you, because that is what the client has or wants to use. This means you have to think not only about what steps to take, but also about how to implement them in the unfamiliar tool or language. That implementation may then give you errors or unexpected results, which you in turn have to figure out how to resolve.

Another place where this skill appears is in coding and working with the data itself. For example, you notice that a lot of the fields in the data have missing values. Or there are values, but they make no sense to you. Another example is when your code runs without errors and produces output, but that output is completely different from what you expected.

The skill I am trying to describe here is the ability to look at the little problem in front of you, come up with ideas on why it exists, where you want to go and how you might get there, and then test these ideas and see what works. And to do this all the time, with big and small problems, in every area of your work. The best general description of it I have seen so far is in this post by Scott Young.

The people who seem to me to be most successful in data science are those who have this skill. Some people have it naturally, or at least it seems that way, but I also strongly believe that anyone can strengthen it. A lot more can be said about this, but here are some suggestions:

  • For coding: in your chosen language, learn to read error messages, and google every message you get. See what people on forums have to say and what the documentation describes. Try to reproduce the problem on a very small scale, or print the output of every little step. By gradually learning more and more, you see the ways you can go wrong in the language, and you learn the common types of mistakes and their solutions.
  • For modelling and working with data: what has worked best for me so far is talking problems and possible solutions through with other people. That way, you get to see how they think about the same things, and what worked for them in previous situations. This adds to your own store of ideas and experience. At the same time, explaining the problem and your ideas for fixing it forces you to clarify your thinking, which can lead to new ideas as well.
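To make the coding suggestion a bit more concrete, here is a small sketch in Python of what "reproduce the problem on a very small scale and print every step" can look like. The scenario is hypothetical: an average age that came out suspiciously low. The tiny dataset, the empty string used for missing values, the -1 placeholder and the function names are all made up for illustration.

```python
# A tiny, hand-made dataset (hypothetical) that reproduces a suspicious result:
# the average age came out much lower than expected.
rows = [
    {"name": "a", "age": "34"},
    {"name": "b", "age": ""},    # missing value stored as an empty string
    {"name": "c", "age": "-1"},  # placeholder value that makes no sense as an age
    {"name": "d", "age": "41"},
]


def parse_ages(rows):
    """Step 1: convert the raw strings to integers, using None for missing values."""
    return [int(r["age"]) if r["age"] != "" else None for r in rows]


def clean_ages(ages):
    """Step 2: drop missing values and impossible (negative) ages."""
    return [a for a in ages if a is not None and a >= 0]


def mean(values):
    """Step 3: plain arithmetic mean."""
    return sum(values) / len(values)


ages = parse_ages(rows)
print("parsed:", ages)  # the None and the -1 are visible right away

# The "naive" version only skips missing values, so the -1 drags the mean down.
naive = mean([a for a in ages if a is not None])
print(f"naive mean: {naive:.2f}")

cleaned = clean_ages(ages)
print("cleaned:", cleaned)

result = mean(cleaned)
print("mean age:", result)
```

Printing the intermediate lists shows at a glance where the unexpected number comes from, which is usually much faster than staring at the final result of the full pipeline.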

In my personal experience, this is the skill that most strongly determines your success in data science, because it enables you to learn almost any other skill and apply it effectively. If someone can convince me that they are good at this, I would love to work with them. The rest will take care of itself.

HELLO, WORLD!

Let's start this blog in the traditional way, with this classic post title. This blog is the result of months of thinking, wavering, and finally deciding to just go for it and start blogging. I hope that you enjoy reading here, whoever you are. Welcome!

This blog will be my little place on the internet where I plan to share my thoughts, ideas and findings on becoming and being a data scientist. And hopefully a decent one. My interest lies mostly in the theoretical side of things, like the details of different algorithms. But the application of models, thinking about ethics, and finding your way in the real working world are also interesting topics that I would like to discuss here.

At the moment I do not expect to post very often, but let's see how it goes. First up is a little bit about myself and my journey to becoming a data scientist. See you then.