Pocket Polyglot Mzansi: My talk at Deep Learning IndabaX South Africa

A few reflections

The main question I tried to answer in my talk on small-scale machine translation models at Deep Learning IndabaX South Africa last week was whether it is possible to train small models - small enough to run easily on a smartphone - that match the accuracy of the best open-source models for South African languages. I showed that a 50M parameter model trained for only 10 hours on a single consumer GPU is 94% as accurate as NLLB-200-600M at translating four South African languages, while being only 8% of its size. This is only a start - I believe we can train much better small models.

When I first trained small transformer language models from scratch in 2019, BERT-LARGE with 340M parameters was considered huge; today, it is considered tiny. What we consider small has shifted so much that when we set out to train a ‘small’ model today, we almost inevitably begin with something of a few hundred million parameters. What I’m exploring is how much we can achieve with models that are up to an order of magnitude smaller than what we consider small language models today. We can spin it any way we like, but the simple fact is that large GPU clusters are devastatingly power-hungry. In my opinion, we should always try to build the smallest, most efficient models that are actually useful to people.

In my talk, I mentioned that the choice of four languages was somewhat arbitrary and only intended as a baseline. In the Q&A, I was also asked about the decision to select languages by number of speakers. That choice left out the Sotho-Tswana languages entirely, so the sample was not representative of the major South African languages. I have now extended Pocket Polyglot Mzansi with two more languages, Setswana and Sepedi, and it looks like a 50M parameter model can translate six languages just as well as four.

I was also asked whether the model is available on Hugging Face - it is now. I’ll make the training code available soon.
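If you want to try the model, here is a minimal sketch of how it could be loaded with the Hugging Face transformers library, assuming the checkpoint is exported in a standard seq2seq format. The repository ID and language codes below are placeholders, not the actual ones - check the model card for the real identifiers.

```python
from transformers import pipeline

# Placeholder repository ID - see the model card on Hugging Face for the real one.
translator = pipeline("translation", model="example-user/pocket-polyglot-mzansi")

# Language codes are also placeholders; they depend on how the tokenizer was set up.
result = translator("Good morning, how are you?", src_lang="eng_Latn", tgt_lang="zul_Latn")
print(result[0]["translation_text"])
```

See the following links for more information about the work so far: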