While following Prof. Ng’s “Machine Learning” course on Coursera, I wanted to try building my own machine learning algorithm. From scratch. With Python.
A little bit of context
Even though the course dates from 2011, the algorithms it presents are really powerful and impressive. Machine learning is a trending subject, and most of the best-known architectures were created after 2011 (well covered in Adit’s article The 9 Deep Learning Papers You Need To Know About). The new deeplearning.ai specialization (also from Prof. Ng) brings state-of-the-art techniques to beginners.
In the “Machine Learning” course, the coding assignments are in Octave (or Matlab). Here I wanted to code in Python, purely to test what I had learned of the ML concepts rather than copy/paste Octave code and say, “I know Machine Learning”. I see this as a sort of final project for the course.
The idea here is to build a skin detector using the UCI skin color dataset, whose colors are extracted from the FERET face database.
This dataset is interesting because the problem is not trivial, yet the input features are simple (three integers) and the prediction is binary: skin color or not skin color. I chose it because it is one of the simplest datasets in the UCI repository.
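To make the "three integers plus a binary label" shape concrete, here is a minimal loading sketch. It assumes a whitespace-separated text file with four columns per row (three color channels in B, G, R order, then a label where 1 means skin and 2 means non-skin); that layout is my reading of the UCI description, so check it against your copy of the file. The function name and the sample values are made up for illustration.

```python
import io
import numpy as np

def load_skin_dataset(source):
    """Load rows of `B G R label` into features X (n, 3) and labels y (n,)."""
    raw = np.loadtxt(source, dtype=np.int64, ndmin=2)
    X = raw[:, :3].astype(np.float64)         # three color channels, 0..255
    y = (raw[:, 3] == 1).astype(np.float64)   # assumed: 1 = skin, 2 = non-skin
    return X, y

# Tiny in-memory sample standing in for the real file (values are made up).
sample = io.StringIO("74 85 123 1\n73 84 122 1\n198 198 158 2\n")
X, y = load_skin_dataset(sample)
print(X.shape, y)
```

Passing a file path instead of the `io.StringIO` object works the same way, since `np.loadtxt` accepts both.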
I will try to keep it short and entertaining: I will explain the work done along with some of the pitfalls I fell into.
First of all, I find it reassuring to plot some data. It is always pleasing to visualize what the code is doing, and it also confirms that skin and non-skin colors are well separated. So, here is some of the data:
These are the colors represented in a 3D space (the axes are red, blue and green). Blue triangles are samples labeled as skin color and red dots are samples labeled as non-skin color in the ground truth table.
As you can see on the axes of the previous figure, the primary colors (red, blue, green) are scaled between 0 and 1 instead of 0 and 255. Prof. Ng explains in his course that input features should have the same range, which is already the case for raw color data (between 0 and 255). But I found that the sigmoid function implemented with NumPy saturates: it effectively outputs 1 for inputs greater than about 30. By bringing the input features between 0 and 1, the sigmoid function stays in its responsive range, which gave better results.
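The saturation effect is easy to reproduce. The sketch below uses a made-up pixel and made-up weights (neither comes from my model) to compare the sigmoid of a weighted sum on raw 0..255 inputs with the same sum on inputs scaled to 0..1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

raw_pixel = np.array([200.0, 180.0, 150.0])   # made-up color, range 0..255
weights = np.array([0.1, 0.1, 0.1])           # made-up weights

z_raw = raw_pixel @ weights                   # around 53
z_scaled = (raw_pixel / 255.0) @ weights      # around 0.2

saturated = sigmoid(z_raw)
responsive = sigmoid(z_scaled)

# The sigmoid derivative s * (1 - s) drives the weight updates.
grad_saturated = saturated * (1.0 - saturated)
grad_responsive = responsive * (1.0 - responsive)

print(saturated, grad_saturated)
print(responsive, grad_responsive)
```

With the raw inputs the output is pinned at 1 and the derivative collapses to 0, so gradient descent has nothing to work with. With scaled inputs the derivative stays close to its maximum of 0.25.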
Building the model
The first idea is to build the simplest possible model in order to validate the basic machine learning functions: cost function, back-propagation, gradient checking and prediction.
Those functions are surprisingly straightforward after Prof. Ng’s course. As seen in the course, understanding the dimensions of each matrix is fundamental to coding a vectorized version of all of them.
I had trouble with some Python and NumPy mechanisms.
At first, my gradient check did not validate my backward propagation implementation. After some hours of swearing, I found that the problem was in my numerical gradient function, which always returned 0! This was due to the way Python handles NumPy arrays on assignment. Writing something like:
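As an illustration of how the matrix dimensions guide a vectorized implementation, here is a minimal sketch of a cost, gradient and prediction function. It assumes the "simplest model" is plain logistic regression with no hidden layer, which is an assumption on my part for this example; the function names and the four sample colors are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    """Prepend a column of ones: (m, 3) -> (m, 4)."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def cost_and_gradient(theta, X, y):
    """Cross-entropy cost and its gradient for theta (4,), X (m, 4), y (m,)."""
    m = X.shape[0]
    h = sigmoid(X @ theta)                      # (m, 4) @ (4,) -> (m,)
    tiny = 1e-12                                # keeps log() away from 0
    cost = -np.mean(y * np.log(h + tiny) + (1 - y) * np.log(1 - h + tiny))
    grad = X.T @ (h - y) / m                    # (4, m) @ (m,) -> (4,)
    return cost, grad

def predict(theta, X):
    """Return 1 (skin) where the predicted probability is at least 0.5."""
    return (sigmoid(X @ theta) >= 0.5).astype(int)

# Made-up colors already scaled to 0..1, with made-up labels.
X = add_bias(np.array([[0.8, 0.5, 0.4],
                       [0.1, 0.2, 0.9],
                       [0.7, 0.4, 0.3],
                       [0.2, 0.3, 0.8]]))
y = np.array([1.0, 0.0, 1.0, 0.0])
theta = np.zeros(4)
cost, grad = cost_and_gradient(theta, X, y)
print(cost, grad.shape)
```

Writing the expected shape next to each product is the habit that saved me the most time: if a shape comment and the actual result disagree, the bug is on that line.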
theta_plus = theta
theta_minus = theta
theta_plus[index] = theta_plus[index] + epsilon
theta_minus[index] = theta_minus[index] - epsilon
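will not work as expected. Assignment does not create copies but references: theta_plus and theta_minus are both names for the same array as theta, so adding and then subtracting epsilon cancels out. Both cost evaluations then see the same parameters, and the numerical gradient is always zero.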
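A minimal sketch of a numerical gradient that avoids this trap uses explicit copies via `ndarray.copy()`. The function name, the `epsilon` default and the toy quadratic cost are assumptions for the example, not my original code; `theta` is assumed to be a 1-D float array.

```python
import numpy as np

def numerical_gradient(cost_fn, theta, epsilon=1e-4):
    """Central-difference estimate of d(cost)/d(theta) for a 1-D float theta."""
    grad = np.zeros_like(theta, dtype=float)
    for index in range(theta.size):
        theta_plus = theta.copy()     # independent copy, not a second name
        theta_minus = theta.copy()
        theta_plus[index] += epsilon
        theta_minus[index] -= epsilon
        grad[index] = (cost_fn(theta_plus) - cost_fn(theta_minus)) / (2 * epsilon)
    return grad

def toy_cost(theta):
    """Toy cost with a known gradient of 2 * theta."""
    return np.sum(theta ** 2)

theta = np.array([1.0, -2.0, 3.0])
approx = numerical_gradient(toy_cost, theta)
print(approx)           # should be close to 2 * theta
print(theta)            # the original parameters are left untouched
```

A quick way to spot the aliasing problem in your own code is to check `theta_plus is theta`: it is `True` after a plain assignment and `False` after `.copy()`.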
Running the model on the test data gave me a nice figure. It uses the same axes as the earlier figure: blue dots and green triangles are correctly classified samples (skin or non-skin color), red dots are samples falsely classified as non-skin and black triangles are samples falsely classified as skin color. This is what I got:
I was really surprised by the performance. The model seems to fit the training data nicely given how simple it is. Still, looking at the figure, it seems to fail on what appear to be the darkest and the brightest skin colors.
The best way to check this hypothesis is to run the algorithm on a real image!
I tried the model on some portraits with a good variety of skin tones I found on the Internet:
By applying a mask where the algorithm finds a skin color (a black pixel wherever skin color is detected), I got:
Indeed, the model fails to recognize skin color on dark and bright skin regions. It “underfits” the data: it cannot fit the data well and leaves some misclassifications. The model is too simple for the complexity of the data.
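The masking step described above can be sketched as follows. The image is assumed to be a NumPy array of shape (height, width, 3) with values in 0..255, and `toy_predict` is a hypothetical stand-in for the trained model, not the real classifier. The channel order of the image must match the order used for training.

```python
import numpy as np

def apply_skin_mask(image, predict_fn):
    """Return a copy of `image` with pixels predicted as skin painted black."""
    height, width, _ = image.shape
    pixels = image.reshape(-1, 3).astype(np.float64) / 255.0   # (h * w, 3)
    is_skin = predict_fn(pixels).astype(bool).reshape(height, width)
    masked = image.copy()
    masked[is_skin] = 0           # black pixel where skin is detected
    return masked

def toy_predict(pixels):
    """Hypothetical rule: 'skin' when the first channel beats the third."""
    return (pixels[:, 0] > pixels[:, 2]).astype(int)

# A made-up 1 x 2 image: one reddish pixel and one bluish pixel.
image = np.array([[[200, 120, 100], [10, 20, 200]]], dtype=np.uint8)
masked = apply_skin_mask(image, toy_predict)
print(masked)
```

Flattening the image to an (h * w, 3) matrix lets the model classify every pixel in one vectorized call instead of looping over rows and columns.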
I tried several other models to get better results.
At first, I added one hidden layer with three neurons. My thinking was that a layer with as many nodes as there are input features would improve performance.
Again, after Prof. Ng’s course, adding a layer is really straightforward. I found that understanding the dimensions of each matrix is a great way to debug those implementations. I also lost some time on small code issues that I could have resolved in a minute if I had plotted or printed the data (using the debugger on each function’s output).
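To show what "understanding each matrix dimension" looks like with a hidden layer, here is a minimal forward-pass sketch with the shapes checked explicitly. The parameter names (`W1`, `b1`, `W2`, `b2`), the random initialization and the sigmoid activations are assumptions for this example, not necessarily what my code used.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in=3, n_hidden=3, seed=0):
    """Small random weights and zero biases for a 3 -> n_hidden -> 1 network."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
    b1 = np.zeros((n_hidden, 1))
    W2 = rng.normal(scale=0.1, size=(1, n_hidden))
    b2 = np.zeros((1, 1))
    return W1, b1, W2, b2

def forward(X, params):
    """Forward pass for X of shape (m, n_in); returns probabilities (m,)."""
    W1, b1, W2, b2 = params
    A0 = X.T                                  # (n_in, m)
    A1 = sigmoid(W1 @ A0 + b1)                # (n_hidden, m)
    A2 = sigmoid(W2 @ A1 + b2)                # (1, m)
    # Shape checks: cheap to keep while debugging a new layer.
    assert A1.shape == (W1.shape[0], X.shape[0])
    assert A2.shape == (1, X.shape[0])
    return A2.ravel()

X = np.random.default_rng(1).random((5, 3))   # five made-up scaled colors
probabilities = forward(X, init_params())
print(probabilities.shape)
```

Changing `n_hidden` to 1 gives the smaller network with a single hidden node without touching the rest of the code.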
After some training, I plotted the result on a fraction of the test set:
Here, the prediction seems perfect! But we can observe two misclassifications close to each other and also really close to skin colors. The data is shuffled, and at each run (training step included) those misclassifications are plotted at a different position. With the previous model, misclassifications were at the same position at each run; here they move around the figure. The model seems to fit the data too well. The cost function was also really low.
To get an intuition of what was happening, I tried the model on the test image. Here is the result:
Predictions seem to be improved, but some false negatives are present almost everywhere on the faces, at random positions. This model failed to generalize skin and non-skin colors. It overfits the training data.
To overcome this, I simplified the model by setting the hidden layer to only one node. This is the result:
Here, the result is way better than the first model and generalizes better than the last one. But it still fails on darker colors and some hair colors.
One good practice I noted is:
Visualize your data!
Visualizing data makes it easier to understand what is happening and to debug. When doing so, use a debugger or a smart plot, a mask on an image, etc., not a sequence of prints.
Another idea I noted, which is also well presented in Prof. Ng’s course, is:
Test ideas and have intuitions
Even if the course presents techniques that point to directions worth following for new ideas, testing is a good way to get better models. The thing is that one cannot get the best model in one go.
This leads to another important idea behind machine learning:
Have good computation power
On this training set, training the model with one hidden layer of three nodes took approximately one hour and a half on my computer (Intel i5-6500 3.2GHz) with a fully vectorized NumPy implementation! Computing on the GPU is an idea, using machine learning focused APIs like TensorFlow.
To finish, Prof. Ng’s course on Coursera is an amazing introduction to Machine Learning. So amazing that I finished it in five weeks instead of the eleven recommended. I discovered a fascinating field that I would like to follow and apply in my work. That is why I signed up for the new deeplearning.ai specialization on Coursera.
I might write other posts on ideas or tests I will implement.
Thank you for reading. Do not hesitate to comment with what you think of this little test, or if you have ideas to improve the algorithm!