I used to work at www.comet.ml, where they are building a really amazing tool for machine learning engineers. The short of it is they help track experiments using a single line of code that automagically saves everything to make your model reproducible. You can get great experiment logging and history without being tied to a single platform.

In my own time, I decided to put @cometml to the test while training a model for logo detection, using RetinaNet.

https://twitter.com/rememberlenny/status/983897644094447617

RetinaNet is a high-quality object detector based on the method from the paper “Focal Loss for Dense Object Detection” (by Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollár). A Keras implementation of this paper can be found here: https://github.com/fizyr/keras-retinanet

Using this repository, I augmented the train.py script and added the www.comet.ml training example code. With a single line of code, I was able to get a live view of the model’s training process, access to the Keras RetinaNet code, a snapshot of the hyperparameters I used when running the program, and the results.


I am obviously biased because I work on this product, but I honestly have to say that I was impressed. When I ran training processes in the past, I repeatedly got stuck or was underwhelmed by the output. With Comet.ml, I have a live interface into the training process that extends beyond the bash terminal. I do most of my development locally, but train models on a remote development machine. I can now train the model and monitor the overall process using Comet.ml, rather than needing to keep a session open to watch for changes.

Below, I walk through the process I followed for anyone else who wants to try.


Set up my environment

I started with setting up my remote environment and getting the code I would be using for the RetinaNet training process. I used a dataset of logos from various companies, very similar to something that can be found on Kaggle.

I had to install the RetinaNet library and various dependencies on my remote machine. Because my machine has a GPU, I installed tensorflow-gpu version 1.4. I also updated the train.py script that RetinaNet uses to run and added the single line of code from Comet.ml to kick off the training process.

Side note: We make it really easy to connect your training process to your GitHub repo. This way, once you figure out how to get the best training result, you can create a pull request that takes a snapshot of your code and hyperparameters.

Comet.ml gives you a code snippet that you can copy into any machine learning program. Just make sure the comet_ml import comes at the top of the file.

Install Comet.ml

All I had to do was copy the initialization script into the Keras RetinaNet train.py file and run the code. That’s it.
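For readers who want a rough picture of where the snippet lands, here is a minimal sketch of the initialization. `Experiment`, `api_key` and `project_name` come from the comet_ml package; the `start_experiment` helper, the explicit `COMET_API_KEY` lookup and the project name are assumptions made for illustration, not the exact snippet Comet.ml hands you.

```python
import os


def start_experiment(project_name="retina-net"):
    """Create a Comet experiment, or return None when no API key is set.

    Hypothetical helper: the snippet from Comet.ml itself is just the
    import plus the Experiment(...) call.
    """
    api_key = os.environ.get("COMET_API_KEY")
    if not api_key:
        # No key configured: skip tracking so training can still run.
        return None
    # Import comet_ml before Keras/TensorFlow so it can hook into training.
    from comet_ml import Experiment
    return Experiment(api_key=api_key, project_name=project_name)
```

Calling `experiment = start_experiment()` at the very top of train.py, before the Keras imports, keeps the ordering requirement intact while letting the script run untracked on machines without a key.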


Track your experiment

Once Comet.ml is installed, the experiment code will pull in all the hyperparameters you define at runtime. What’s also very cool is that, depending on whether you have set up your Comet.ml project to be public or private, you will get a web URL to monitor the experiment’s training in real time.

This is the terminal after I ran the train.py file with hyperparameters. Notice the experiment URL is generated at the top.
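To make the hyperparameter capture a little more concrete, here is a small sketch of how command-line arguments can be turned into a name/value mapping. The flag names and defaults below are illustrative assumptions, not a copy of the real train.py options. Comet picked up the arguments on its own in my run, so the explicit `log_parameters` call in the comment is only needed if you want to log a dictionary yourself.

```python
import argparse


def parse_args(argv):
    """Parse a few illustrative training flags (hypothetical names and defaults)."""
    parser = argparse.ArgumentParser(description="Illustrative training flags")
    parser.add_argument("--batch-size", type=int, default=1)
    parser.add_argument("--epochs", type=int, default=50)
    parser.add_argument("--steps", type=int, default=10000)
    return parser.parse_args(argv)


def collect_hyperparameters(args):
    """Turn the parsed namespace into a plain dict of name -> value."""
    return dict(vars(args))


args = parse_args(["--batch-size", "2", "--epochs", "10"])
params = collect_hyperparameters(args)

# With a real Comet experiment object, the dict could be logged explicitly:
# experiment.log_parameters(params)
```

Note that argparse converts dashes to underscores, so `--batch-size` shows up as `batch_size` in the resulting dictionary.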

Monitor the experiment

In my case, the training process was estimated to take 10 hours, so I was able to detach from my active session and monitor Comet.ml. The dashboard for the experiment shows a live chart of the loss and accuracy metrics. It also provides a clear picture of the code and hyperparameters used to get the recorded result.
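In my run the Keras metrics showed up on the dashboard without extra code. If you ever need to send values yourself, a sketch along these lines should work, assuming the experiment object exposes `log_metric(name, value, step=...)` as the comet_ml `Experiment` does. The `RecordingExperiment` class and the `log_epoch_metrics` helper are hypothetical stand-ins so the example runs offline, and the metric values are made up.

```python
class RecordingExperiment:
    """Offline stand-in for a Comet Experiment (hypothetical, for illustration)."""

    def __init__(self):
        self.metrics = []

    def log_metric(self, name, value, step=None):
        # Record the call instead of sending it to Comet.ml.
        self.metrics.append((name, value, step))


def log_epoch_metrics(experiment, logs, epoch):
    """Send each metric from one epoch's logs dict to the experiment."""
    for name, value in sorted(logs.items()):
        experiment.log_metric(name, float(value), step=epoch)


experiment = RecordingExperiment()
# Made-up values standing in for one epoch of training output.
log_epoch_metrics(experiment, {"loss": 1.25, "regression_loss": 0.75}, epoch=0)
```

Swapping `RecordingExperiment()` for a real experiment object is the only change needed to have the values appear on the live chart.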

Example of the live loss and accuracy metrics being charted. https://www.comet.ml/lenny/retina-net/d83a54add91b4a10869977ac4d440d81

Example of the code being saved. https://www.comet.ml/lenny/retina-net/d83a54add91b4a10869977ac4d440d81

Example of the hyperparameters being logged. Notice how Comet pulls out the arguments used as well as their values. https://www.comet.ml/lenny/retina-net/d83a54add91b4a10869977ac4d440d81

Finally, Comet.ml also logs the terminal output. This is really useful because you can actually just have the Comet.ml website open rather than needing an active SSH session to the server running your experiment. You also don’t have to worry about accidentally disconnecting and losing your progress.

The “Output” tab on experiment pages shows a live view and historical record of the terminal output while training. https://www.comet.ml/lenny/retina-net/d83a54add91b4a10869977ac4d440d81

That’s it!

Is it useful?

If you struggle with managing your experiment history and reproducing results, you should definitely check out Comet.ml.

And let me know what you think!