A summary of a few of my recent personal projects
Pocket Polyglot Mzansi is a small (50M-parameter) machine translation model for South African languages. It is part of an ongoing research project that aims to develop a small (<50M parameters) machine translation model that matches or exceeds the accuracy of NLLB-200-600M on South African languages. The current version is more than 90% smaller than NLLB-200-600M but gives up only 6.3% in accuracy as measured by chrF++.
I have been taking part in machine learning competitions on Zindi since July 2024. One thing that is really cool about Zindi is that the challenges focus on solving real-world problems in developing countries, and in Africa in particular. After only eight completed challenges, I'm currently ranked 14th on the global leaderboard (highest rank: #11).
My top results include:
All my results can be viewed here. My open-sourced solutions can be found in this GitHub repo.
BigTabular is a machine learning library that replicates the tabular data application of the fastai library so that it works with larger-than-memory datasets. Pandas, which fastai.tabular uses for data transformations, is replaced with Dask DataFrames.
Most of the Dask implementations were written as they were needed for a personal project, but then refactored to match the fastai API more closely. The flow of the Jupyter notebooks follows those from fastai.tabular closely, and most of the examples and tests were replicated.
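To give a feel for what "larger-than-memory" changes, here is a minimal sketch of a normalization step computed one partition at a time. It is plain Python rather than BigTabular or Dask code. The function names (`chunk_stats`, `merge_stats`, `fit_normalizer`, `normalize`) and the use of the population standard deviation are assumptions made for illustration only. The point is that statistics are computed per partition and then merged, so the full column never has to sit in memory at once.

```python
import math
from functools import reduce


def chunk_stats(chunk):
    """Return (count, mean, sum of squared deviations) for one partition."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2


def merge_stats(a, b):
    """Combine the statistics of two partitions (pairwise variance update)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2


def fit_normalizer(partitions):
    """Compute the global mean and population std without concatenating partitions."""
    n, mean, m2 = reduce(merge_stats, (chunk_stats(p) for p in partitions if p))
    return mean, math.sqrt(m2 / n)


def normalize(partitions, mean, std):
    """Lazily yield normalized partitions, one at a time."""
    for p in partitions:
        yield [(x - mean) / std for x in p]


# Hypothetical column split into three partitions of unequal size.
partitions = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]
mean, std = fit_normalizer(partitions)
normalized = list(normalize(partitions, mean, std))
```

A partitioned DataFrame library applies the same idea at scale: each partition is processed independently and only small summaries are combined.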
I still plan on incorporating the functionality into the fastai library as described in this post in the fastai forums.
A web app to visually compare the various learning rate schedulers in PyTorch.
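As a rough illustration of the kind of comparison such an app makes, the sketch below tabulates three common schedules side by side. It does not use PyTorch and is not the app's code. The functions are plain-Python closed forms of the step, exponential, and cosine-annealing decay rules, and the hyperparameters (`step_size`, `gamma`, `t_max`, `eta_min`) are illustrative assumptions.

```python
import math


def step_lr(base_lr, epoch, step_size=10, gamma=0.1):
    """Multiply the learning rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def exponential_lr(base_lr, epoch, gamma=0.9):
    """Multiply the learning rate by gamma every epoch."""
    return base_lr * gamma ** epoch


def cosine_annealing_lr(base_lr, epoch, t_max=30, eta_min=0.0):
    """Anneal from base_lr down to eta_min along half a cosine wave."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2


SCHEDULES = {
    "step": step_lr,
    "exponential": exponential_lr,
    "cosine": cosine_annealing_lr,
}


def compare_schedules(base_lr=0.1, epochs=31):
    """Return {schedule name: [learning rate at each epoch]}."""
    return {
        name: [fn(base_lr, epoch) for epoch in range(epochs)]
        for name, fn in SCHEDULES.items()
    }


if __name__ == "__main__":
    table = compare_schedules()
    print("epoch  " + "  ".join(f"{name:>12}" for name in table))
    for epoch in range(0, 31, 5):
        row = "  ".join(f"{lrs[epoch]:12.5f}" for lrs in table.values())
        print(f"{epoch:>5}  {row}")
```

Plotting each list against the epoch index gives the curves that a visual comparison would show.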