Small-footprint keyword spotting for low-resource languages with the Nicla Voice
July 6th, 2023

Speech recognition is everywhere these days, yet some languages, such as Shakhizat Nurgaliyev and Askat Kuzdeuov’s native Kazakh, lack sufficiently large public datasets for training keyword spotting models. To make up for this disparity, the duo explored generating synthetic datasets using a neural text-to-speech system called Piper, and then extracting speech commands from the audio with the Vosk Speech Recognition Toolkit.
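The extraction step can be pictured with a short sketch. It is not the authors' code: it assumes the recognizer's output is a JSON string with a `result` list of per-word entries carrying `word`, `start`, and `end` (in seconds), which is the shape Vosk produces when word-level output is enabled. The function name `keyword_windows`, the one-second clip length, and the sample words are all hypothetical choices made for illustration.

```python
import json


def keyword_windows(result_json, keywords, clip_seconds=1.0, total_seconds=None):
    """Return (word, start, end) windows of fixed length centred on each keyword.

    result_json is assumed to be a Vosk-style JSON string whose "result" list
    holds entries with "word", "start" and "end" fields (times in seconds).
    """
    parsed = json.loads(result_json)
    windows = []
    for entry in parsed.get("result", []):
        word = entry["word"]
        if word not in keywords:
            continue  # skip words that are not target commands
        centre = (entry["start"] + entry["end"]) / 2.0
        start = max(0.0, centre - clip_seconds / 2.0)
        end = start + clip_seconds
        # If the audio length is known, keep the window inside the recording.
        if total_seconds is not None and end > total_seconds:
            end = total_seconds
            start = max(0.0, end - clip_seconds)
        windows.append((word, round(start, 3), round(end, 3)))
    return windows


if __name__ == "__main__":
    # Hypothetical recognizer output for a synthetic utterance.
    sample = json.dumps({
        "result": [
            {"word": "one", "start": 0.2, "end": 0.6, "conf": 1.0},
            {"word": "filler", "start": 0.7, "end": 1.1, "conf": 1.0},
            {"word": "two", "start": 1.5, "end": 1.9, "conf": 0.9},
        ],
        "text": "one filler two",
    })
    for window in keyword_windows(sample, {"one", "two"}):
        print(window)
```

Each returned window could then be used to slice the synthesized audio into a labeled clip. Centring a fixed-length window on the word is only one possible design; padding the word's own boundaries would work as well.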
Beyond simply building a model to recognize keywords from audio samples, Nurgaliyev and Kuzdeuov’s primary goal was to also deploy it onto an embedded target, such as a single-board computer or microcontroller. Ultimately, they went with the Arduino Nicla Voice development board, since it contains not just an nRF52832 SoC, a microphone, and an IMU, but also an NDP120 from Syntiant. This specialized Neural Decision Processor greatly speeds up inference thanks to dedicated hardware accelerators while simultaneously reducing power consumption.
With the hardware selected, the team began to train their model with a total of 20.25 hours of generated speech data spanning 28 distinct output classes. After 100 training epochs, it achieved an accuracy of 95.5% and consumed only about 540 KB of memory on the NDP120, making it quite efficient.
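To put the dataset size in perspective, here is a small back-of-the-envelope sketch. The 20.25 hours and 28 classes come from the figures above. The even split across classes and the one-second clip length are assumptions made purely for illustration and are not stated in the write-up. The helper name `dataset_budget` is likewise hypothetical.

```python
def dataset_budget(total_hours, num_classes, clip_seconds):
    """Rough per-class audio budget, assuming an even split across classes."""
    total_seconds = total_hours * 3600.0
    return {
        "total_seconds": total_seconds,
        # Assumption: every class receives the same amount of audio.
        "minutes_per_class": total_seconds / num_classes / 60.0,
        # Assumption: the audio is cut into fixed-length clips.
        "clips_total": int(total_seconds // clip_seconds),
    }


if __name__ == "__main__":
    budget = dataset_budget(total_hours=20.25, num_classes=28, clip_seconds=1.0)
    print(budget)
```

Under those assumptions, 20.25 hours is 72,900 seconds of audio, or roughly 43 minutes per class.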
To read more about Nurgaliyev and Kuzdeuov’s project and how they deployed an embedded ML model that was trained solely on generated speech data, check out their write-up on Hackster.io.
