I’ve been wanting to create a portfolio of my work as a data analyst for a while. I tried a few things and, after some trial and error, ended up building a very simple website. It contained my CV, a picture of me smiling, a list of hard skills and a GitHub link to an old R project (about three years old at the time of this post).
The project was a small R package containing some data sets and functions for what is called software defect prediction: essentially, you look at a piece of code and try to predict, without actually running it, whether it contains bugs and, if so, how many and where.
To be honest, that project (which was part of my master’s thesis) didn’t prove very useful to the field of software defect prediction. It introduced a couple of relatively unknown clustering algorithms, but in the end they performed about the same as, or worse than, other unsupervised methods and certain common supervised ones, such as Naïve Bayes or Random Forest.
Even though the results weren’t great, the thesis was turned into an article and published. It was the first time something I had worked on was made public, and being able to share the result of a somewhat long and difficult project felt really good. Everything I have worked on since may have had an impact within a company and among colleagues, but it has always remained behind office walls.
The website, and the posts that come with it, are an attempt to share what I’m learning while working and studying new topics related to data analysis. I’m constantly updating the website and adding things. I like receiving feedback and collaborating with other people, so if you want to get in touch, I always answer comments and emails. Hopefully, you will find something interesting here.