I built my first machine learning model

I've started studying deep learning using the Practical Deep Learning for Coders course over at fast.ai. I have no background in ML or math, but their example-first approach lets you get demos up and running extremely quickly.
The course teaches you DL using their fastai library.
Thanks to these resources, I was able to build an impressive demo app in only 2 days.

What I built

I created an image classification model that can recognize 150 different Pokémon (the whole first generation).
I also made a UI to interact with the model and uploaded everything to a free Space on Hugging Face.


You can also play around with it on Hugging Face.

The tools

The backbone of the project is the fastai library, which provides a high-level API on top of PyTorch.
The other major tool I used is the course mentioned above, which teaches you how to use the fastai library.
Additionally, I used the Hugging Face dataset hub to get my data and their Spaces tool to host the model and its UI.
Speaking of the UI, it's built with Gradio.

The building process

Building the project comes down to three main components:

  1. Collecting and validating the training data
  2. Creating and training the deep learning model
  3. Creating the UI and publishing everything for other people to use

Thanks to the tools above, solving these problems is simple and accessible.

1. Sourcing and validating the training data

Let's define our training data: we need a sizable number of images for each Pokémon, and we need to know the correct name for each image.
Additionally, the data should reflect the type of images that we'll feed to the model when using it after training.

A great starting point is this pokemon classification dataset on Hugging Face. It features more than 4000 images, already validated and correctly labeled.

However, while testing I noticed that these images didn't closely match the kind of pictures I was feeding the model in production.
To fix this, the following notebook augments the base dataset with new images from the web.
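
As a rough sketch, the data preparation can look like the snippet below: it pulls the base dataset from the Hugging Face Hub and writes it out as one folder per Pokémon, which is the layout fastai expects later on. The dataset id and column names are placeholders, not the exact ones from my notebook.

```python
# Sketch: download the base dataset and lay it out as one folder per class.
# The dataset id and column names are placeholders -- adjust them to whatever
# Pokémon classification dataset you actually use.
from pathlib import Path
from datasets import load_dataset

ds = load_dataset("some-user/pokemon-classification", split="train")  # hypothetical id
out_dir = Path("data/pokemon")

for i, example in enumerate(ds):
    label = example["label_name"]                        # assumed: column holding the Pokémon name
    (out_dir / label).mkdir(parents=True, exist_ok=True)
    example["image"].save(out_dir / label / f"{i}.png")  # assumed: PIL image column
```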

2. Training the model

Ironically, this is the easiest part thanks to the fastai library. Instead of starting from scratch, we can take a pre-trained neural network and fine-tune it: its final layers are swapped for new ones suited to our task and retrained on our images.

The pre-trained model used is a general-purpose convolutional neural network, pre-trained on a wide variety of shapes and objects. It's a ResNet, named resnetN, with N being the number of layers in the network: more layers generally means better accuracy, but training takes more time.
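
Assuming the folder-per-Pokémon layout from step 1, building the learner with fastai looks roughly like this (resnet34 is just one possible depth):

```python
# Sketch: load the images and attach a pre-trained ResNet, assuming the
# data/pokemon folder layout from step 1. resnet34 is an example depth;
# resnet18 trains faster, resnet50 is usually more accurate but slower.
from fastai.vision.all import *

path = Path("data/pokemon")
dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, seed=42,
                                   item_tfms=Resize(224))
learn = vision_learner(dls, resnet34, metrics=accuracy)
```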

The training principle is simple: we ask the model to predict the label for each image in the training set and compare the prediction against the true label, nudging the model's weights where it's wrong. After each pass over the training set, we check the model's accuracy on our validation set.
We repeat this process for several epochs until we reach an acceptable accuracy.
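
Continuing the sketch above, fine-tuning and inspecting the results is only a couple of lines (the epoch count here is arbitrary):

```python
# Sketch: fine-tune for a few epochs, then inspect where the model struggles.
# fine_tune(5) trains the new head for one epoch, then the whole network for 5.
learn.fine_tune(5)
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(9)   # show the images the model got most wrong
```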

Once the training is done, we export the model as a .pkl file (a pickled snapshot of the trained model's state).
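
Exporting and re-loading the model is a one-liner each; the file and image names below are just examples:

```python
# Sketch: save the trained model, then load it back later (e.g. in the Gradio app).
learn.export("pokemon_model.pkl")

from fastai.vision.all import load_learner
learn_inf = load_learner("pokemon_model.pkl")
pred, pred_idx, probs = learn_inf.predict("some_pokemon.jpg")  # example input path
```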

Note: this step can take a while, so we don't run it here. You can run it yourself in Colab (make sure to also include the previous notebook, otherwise you'll get errors from missing references).

3. Creating the UI and hosting it

Creating the UI is extremely easy. We can use widgets provided by Gradio to interact with the model, and we can host the final product on Hugging Face.

The basic process is simple: you create a repo in Hugging Face Spaces and put all your assets there (the model file, demo images, etc.) along with a script that runs your app.

The following notebook generates that Python script, acting as a small build process that pieces the app together.
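
For reference, a minimal app.py for the Space could look roughly like this; the model file name and example images are placeholders, not the exact assets from my repo:

```python
# Sketch of a minimal Gradio app for a Hugging Face Space, assuming the
# exported pokemon_model.pkl sits in the same repo. Example image names
# are placeholders.
import gradio as gr
from fastai.vision.all import load_learner, PILImage

learn = load_learner("pokemon_model.pkl")
labels = learn.dls.vocab

def classify(img):
    # Gradio hands us a NumPy array; wrap it so fastai can run its transforms
    pred, pred_idx, probs = learn.predict(PILImage.create(img))
    return {labels[i]: float(probs[i]) for i in range(len(labels))}

demo = gr.Interface(
    fn=classify,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=3),
    examples=["pikachu.jpg", "charmander.jpg"],  # placeholder demo images
)
demo.launch()
```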

And that's it. The model is trained and integrated into an app for everyone to use.
It's amazing how far machine learning and deep learning resources have come.