Abstract: |
A silent speech interface (SSI) allows people to interact with machines without audible speech. It addresses the use of machines or devices in noisy environments and their use by people who cannot produce audible speech. In this study, an SSI was developed by classifying spectrogram images of surface electromyography (sEMG) signals recorded at the corners of the mouths of seven participants. The signals were filtered and amplified as part of preprocessing, then digitized by an Arduino board connected to a PC via Bluetooth Low Energy. The short-time Fourier transform was applied to convert the signals to the time-frequency domain, from which the spectrogram images were created. The images were then preprocessed using ResNetV2 preprocessing. Classification was performed by four ensembles of four deep learning base models: ResNet50V2, VGG16, CNN-LSTM, and MLP. The accuracy scores of the base models were 80.00%, 70.00%, 81.67%, and 71.67%, respectively. The soft-voting ensemble, neural network stacked ensemble, linear SVC stacked ensemble, and SVC stacked ensemble achieved accuracy scores of 90.00%, 81.67%, 91.67%, and 93.33%, respectively. The results indicated that the proposed silent speech interface is viable and can be improved through the recommended actions. |