How to cross the “My very first project” bridge

Monika Nowakowska
5 min read · Dec 17, 2020

Everybody will tell you that the golden rule for learning data science is: “Find your own problem and solve it”. Maybe you found yourself thinking: “Mhmm… yeah… just give me a sec, and I will pick one off the top of my head” (this is the part where you press the search button in your brain). Strangely enough, you can’t find anything.

Photo by Fabio Comparelli on Unsplash

You’re just a beginner. How on earth can you find a problem of your own when you don’t even know the general rules well enough to solve one yourself?

What’s wrong with this clever, reputable tip that sounds like a mantra in every data science tutorial?

Preface

I’ve been there (in fact I’m still there), in the skin of a person who tries hard to find her “own problem to solve”. I’ve gone through quite a lot of online tutorials, which, as we already know, don’t teach you independence. I even took part in an online course with a Kaggle challenge at the end. The idea was to put what we had learned during the course into practice. Although I tried hard to take part in it, I gave up fairly quickly.

Why is it so difficult to get started, and why do we instinctively postpone it for as long as possible? To be clear, there’s no workaround if you really want to learn how to analyze data: this step just CANNOT be skipped.

Let’s dwell on it for a while…

Problem

You don’t know how to start. You’re not sure what you don’t know, or what you’re supposed to know. It sounds bizarre, but this is how it feels. Every dataset is different, so you can’t apply the same steps you saw in the online course (the one where you tricked yourself into thinking you could code but just followed the instructor). Leading your very first project is way more difficult. Nobody can really give you a template for how to proceed.

Solution

It is a good idea to start your analysis with a captivating dataset. You can certainly work with the well-known Iris or Titanic dataset, but what about finding a problem that you REALLY care about? Look at it this way: we are all different. There’s only a small chance you will get captivated by the Football dataset (i.e., soccer) as quickly as your male colleagues. But if you are a big fan of movies, you should definitely explore a movies dataset.

Among the many things in this world, I believe we all have our own area of expertise: something we can spend hours doing without ever getting bored. I bet your area of choice is somehow connected with machine learning. It is up to you to find out how.

All areas of our lives (sport, music, fashion, art, health, and many more) create a huge, connected, virtual world map. The common denominator is DATA.

Being passionate about your first dataset will help you get the ball rolling. In my humble opinion, your choice of dataset can either help you reach higher or knock you out in the first round.

Photo by Artem Beliaikin on Unsplash

The dataset is ready. What’s next?

Ask yourself: Do you really understand this dataset? What are the potential problems out there? How can you address them? Do you like your dataset well enough to commit to accomplishing your task? Ask questions, think about them out loud, discuss them with others, write them down. Understand your data.
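If you are not sure what “understanding your data” looks like in code, here is a minimal sketch of a first look at a dataset in Python. It is only one possible starting point, not a recipe. The tiny CSV sample and the `first_look` function are made up for illustration; swap in your own file and your own questions.

```python
import csv
import io

# Hypothetical sample standing in for your own dataset (the rows are made up).
SAMPLE_CSV = """title,speaker,views
Talk A,Speaker One,1200
Talk B,Speaker Two,
Talk C,,450
Talk D,Speaker One,980
"""


def first_look(csv_text):
    """Answer three basic questions about a CSV:
    how many rows, which columns, and how many values are missing per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])

    # Count empty or absent cells for every column.
    missing = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if not (row[col] or "").strip():
                missing[col] += 1

    return {"n_rows": len(rows), "columns": columns, "missing": missing}


summary = first_look(SAMPLE_CSV)
print("Rows:", summary["n_rows"])
print("Columns:", summary["columns"])
print("Missing values per column:", summary["missing"])
```

Even a small summary like this gives you something to ask questions about: Why are some values missing? Which columns do you actually need for your problem?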

In the real world, we probably won’t have the luxury of choosing what we like and don’t like, and the ability to learn new concepts is what will make the difference. Take advantage of the fact that today you can choose the topic yourself. Don’t torture yourself with things that don’t interest you at all.

Working on a problem you don’t understand increases the likelihood that, if you get stuck at the first obstacle, you’ll stay stuck forever.

Personally, I found myself wearing the “I don’t know my problem” shoes for way too long. I struggled with it for quite a while until I stumbled upon the “TED Talks” dataset (ted-talks) and the “Friends” dataset (friends). Around the same time, I finished reading the book TED Talks: The Official TED Guide to Public Speaking by Chris J. Anderson and was somewhere around Friends Season 2 (watching it for the first time at age 29). Without hesitation, I started my very first data analysis in Python.

The rewarding feeling was amazing! Finally, I felt encouraged and interested. Digging into the dataset was fun rather than a sad duty. Even when I got stuck (many times, actually), I always pushed myself to find the answer. More importantly, I finally took my first step and gained hands-on experience with things I would probably never have encountered in an online course. And even if I had, I would not have remembered them anyway.

If you have to put in the effort and climb to the top yourself, you will remember it for a very long time.

Conclusion

I cannot argue with the idea of getting your hands dirty and starting your own project to learn things faster. But do it wisely. Don’t pick just any dataset. At the beginning of the journey in particular, your dataset must be fascinating, compelling, and captivating FOR YOU.

“It’s not the destination, it’s the journey.” (commonly attributed to Ralph Waldo Emerson)

Photo by averie woodard on Unsplash

This first experience will help you venture into open waters alone, without holding the online instructor’s hand. After the first experience (even if it is not very successful) will come the second, the third, and the fourth, until you feel comfortable enough to pursue your career in this direction.

Don’t ever compare yourself with others. Go at your own pace. Eventually, your solid motivation will get you there (wherever you are heading).

I wish you luck on your data science journey! It would be great to hear your story :)

Thank you for reading!

If you enjoyed this article, follow me on Medium

If you want to say “Hello”, connect with me on LinkedIn
