
Hot Word / Voice Recognition using an ESP8266

20 Jun 2020 | CPOL | 5 min read
Add hot word recognition to your smart home the ESP8266 way using MicroPython
I was curious whether speech recognition (or hot word detection) could be implemented on an ESP8266 in MicroPython and integrated into my home automation system.

Introduction

Most of my home automation is controlled by sending HTTP requests to devices on my network
(for example: switch the light on, turn on the radio, control the heating system).
This is easy to do with an ESP8266. I have one of those controllers combined with a touch sensor and use it to control light and music from my bed.

Adding voice control, as offered by Amazon Echo or Google Home, would be a nice feature.

Background

Because of the limited capabilities of the ESP8266, the hard task of hot word detection is handed to an add-on module, the EasyVR. I recently noticed an article in Make magazine that covered this topic on an Arduino basis with different voice recognition boards, and I wondered whether the functionality could be ported to the ESP using MicroPython.

This is not a library that you can add to your own product; it is just some code that shows how to implement hot word training and hot word recognition.

I am using the Wemos D1 mini because I had some of them on my desk. The wiring is pretty simple because the EasyVR accepts 5 V as well as 3.3 V at one of its power pins. Besides power, only RX/TX for serial communication are needed.

The Wemos needs to be flashed with MicroPython (1.12 branch).

I suggest putting both modules on a breadboard and connecting USB power to the Wemos.

Image 1

If you prefer soldering, you can go this route:

Image 2

For programming and testing, I enabled the REPL and connected to it using WebREPL on 192.168.4.1 after joining the hotspot provided by the Wemos.

The communication protocol is extremely well documented in the EasyVR manual, which is available online.

The EasyVR accepts two kinds of voice commands: speaker-independent (SI) commands, which can only be created with a special tool (available from the manufacturer's website, not free of charge), and speaker-dependent (SD) commands, which can be recorded using the microphone of the EasyVR. Voice commands are organized in groups, where group 0 should hold the hot words and the other groups should contain the voice commands. Group 16 is reserved for passwords. See the documentation for technical details.
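To make the group and argument handling more concrete, here is a minimal sketch of how a command frame could be built. It assumes, based on my reading of the protocol description and to be verified against the manual, that commands are single lowercase letters and that arguments from -1 to 31 are sent as the characters '@' to '`'. The helper names (`encode_arg`, `decode_arg`, `build_command`) and the 'c' command letter are illustrative assumptions, not taken from the article's source files.

```python
# Sketch only: all byte values are assumptions to check against the manual.
ARG_MIN = 0x40    # assumed: '@' encodes -1
ARG_ZERO = 0x41   # assumed: 'A' encodes 0
ARG_MAX = 0x60    # assumed: '`' encodes 31

HOTWORD_GROUP = 0     # group 0 holds the hot words (see text)
PASSWORD_GROUP = 16   # group 16 is reserved for passwords (see text)


def encode_arg(value):
    """Encode an integer argument (-1..31) as one protocol byte."""
    code = ARG_ZERO + value
    if not ARG_MIN <= code <= ARG_MAX:
        raise ValueError("argument out of range: %d" % value)
    return bytes([code])


def decode_arg(byte):
    """Decode one protocol byte back into an integer (-1..31)."""
    code = byte[0]
    if not ARG_MIN <= code <= ARG_MAX:
        raise ValueError("not an argument byte: %r" % byte)
    return code - ARG_ZERO


def build_command(letter, *args):
    """Build a frame: one lowercase command letter plus encoded arguments."""
    return letter.encode() + b"".join([encode_arg(a) for a in args])


# Hypothetical usage: ask how many commands group 0 contains,
# assuming 'c' is the "count commands" letter.
frame = build_command("c", HOTWORD_GROUP)
print(frame)
```

Keeping the encoding in one place means the rest of the code can work with plain group and command numbers.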

Using the Code

I included three Python code files (train.py, recognize.py and manage.py), which you can copy to the ESP and run by importing them on the MicroPython REPL:

Python
import train
import recognize
import manage

On some ESP8266 boards, you will get a memory error when you have imported one module and then try to import another one. This is caused by the small amount of RAM on some ESP8266 boards. To avoid it, you could use an ESP32, or just press Ctrl-D, reconnect the WebREPL session and then import the next module. Ctrl-D performs a soft reset, which frees up most of the used memory.
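As an alternative to a soft reset, a module can sometimes be unloaded by hand. The following is a minimal sketch, not part of the article's code: `unload` and `free_memory` are hypothetical helper names, and `gc.mem_free()` only exists on MicroPython, so the sketch falls back to `None` elsewhere. This usually frees less memory than a soft reset.

```python
import gc
import sys


def unload(module_name):
    """Drop a previously imported module and run the garbage collector.

    Returns True if the module was loaded and has been removed.
    """
    was_loaded = module_name in sys.modules
    if was_loaded:
        del sys.modules[module_name]
    gc.collect()
    return was_loaded


def free_memory():
    """Return the free heap in bytes on MicroPython, None where unsupported."""
    mem_free = getattr(gc, "mem_free", None)
    return mem_free() if mem_free else None
```

On the REPL, you would also need to delete your own reference (for example `del train`) after calling `unload("train")`, otherwise the module object stays alive.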

The functions in detail:

Most of the functions in the modules are self-explanatory.

Importing the train module:

Image 3

  1. Shows how many commands are defined in each group
  2. Shows the details for every command in the groups (number, training count, flags, conflicting commands, command label)

    Image 4

  3. Set the language to English (important for built-in words)
  4. Set the language to German (important for built-in words)
  5. Set the microphone sensitivity distance to really short (directly in front of your mouth)
  6. Set the microphone sensitivity distance to about 1-2 meters
  7. Set the microphone sensitivity distance to more than 2 meters (full room)
    1. Add (insert) a new command in a group and define its label. All existing commands from that position on will be moved one step up.
    2. Train the voice command for that specific group and command number. (This should be done twice to increase the hit rate.)
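The menu entries above boil down to short serial frames. The following is a rough sketch of what such helpers could look like, not the actual contents of train.py. All command letters, status bytes, language indexes and distance codes are assumptions based on my reading of the EasyVR protocol description and must be checked against the manual. The `uart` parameter is any object with `write()` and `read()` methods, such as a `machine.UART`.

```python
import time

# All byte values below are assumptions to verify against the EasyVR manual.
ARG_ZERO = 0x41            # 'A' encodes 0, so '@' encodes -1
CMD_LANGUAGE = b"l"        # set the language for built-in words
CMD_MIC_DIST = b"k"        # set the microphone distance
CMD_GROUP_SD = b"g"        # insert a new command into a group
CMD_TRAIN_SD = b"t"        # train a command
STS_SUCCESS = b"o"         # "OK" reply

LANG_ENGLISH, LANG_GERMAN = 0, 3                  # assumed language indexes
MIC_HEADSET, MIC_ARMS_LENGTH, MIC_FAR = 1, 2, 3   # assumed distance codes


def arg(value):
    """Encode a small integer (-1..31) as one protocol byte."""
    return bytes([ARG_ZERO + value])


def read_status(uart, retries=100, delay=0.01):
    """Poll a UART-like object until one byte arrives; None on timeout."""
    for _ in range(retries):
        data = uart.read(1)
        if data:
            return data
        time.sleep(delay)
    return None


def send(uart, frame, retries=100):
    """Write a frame and report whether the module answered with success."""
    uart.write(frame)
    return read_status(uart, retries) == STS_SUCCESS


def set_language(uart, lang):
    return send(uart, CMD_LANGUAGE + arg(lang))


def set_mic_distance(uart, distance):
    # Assumed: a first argument of -1 selects the distance setting.
    return send(uart, CMD_MIC_DIST + arg(-1) + arg(distance))


def add_command(uart, group, position):
    return send(uart, CMD_GROUP_SD + arg(group) + arg(position))


def train_command(uart, group, position):
    # Training waits for speech, so allow many more polls.
    return send(uart, CMD_TRAIN_SD + arg(group) + arg(position), retries=1500)
```

Passing the UART in as a parameter keeps the helpers testable on a PC with a fake object before they are copied to the ESP.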

The recognize menu has these functions:

Image 5

  1. Set the language to English (important for built-in words)
  2. Set the language to German (important for built-in words)
  3. Set the microphone sensitivity distance to really short (directly in front of your mouth)
  4. Set the microphone sensitivity distance to about 1-2 meters
  5. Set the microphone sensitivity distance to more than 2 meters (full room)
  6. Start the voice recognition for a specific group. One attempt is made.
    The result is the recognized command number or the error number.
  7. Start the voice recognition for a specific group. Ten recognition attempts are made.
    The result is the recognized command number or the error number.
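A single recognition attempt and the ten-attempt variant could be sketched as follows. This is not the code from the recognize module; the function names are hypothetical, and the command letter, status bytes and the "acknowledge, then read argument" handshake are assumptions based on my reading of the EasyVR protocol description. The sketch simply collects the outcome of every attempt.

```python
import time

# All byte values below are assumptions to verify against the EasyVR manual.
ARG_ZERO = 0x41       # 'A' encodes 0
ARG_ACK = b" "        # sent to request the next argument of a reply
CMD_RECOG_SD = b"d"   # start SD recognition in a group
STS_RESULT = b"r"     # a command was recognized, its number follows
STS_TIMEOUT = b"t"    # nothing was heard in time
STS_ERROR = b"e"      # recognition error, a two-part code follows


def read_byte(uart, retries=1000, delay=0.01):
    """Poll a UART-like object until one byte arrives; None on timeout."""
    for _ in range(retries):
        data = uart.read(1)
        if data:
            return data
        time.sleep(delay)
    return None


def read_arg(uart):
    """Acknowledge the status byte and decode the argument that follows."""
    uart.write(ARG_ACK)
    data = read_byte(uart)
    if data is None:
        raise OSError("no argument received")
    return data[0] - ARG_ZERO


def recognize_once(uart, group):
    """Return ("command", n), ("timeout", None), ("error", code) or ("unknown", raw)."""
    uart.write(CMD_RECOG_SD + bytes([ARG_ZERO + group]))
    status = read_byte(uart)
    if status == STS_RESULT:
        return ("command", read_arg(uart))
    if status == STS_TIMEOUT:
        return ("timeout", None)
    if status == STS_ERROR:
        high = read_arg(uart)
        low = read_arg(uart)
        return ("error", high * 16 + low)
    return ("unknown", status)


def recognize_many(uart, group, tries=10):
    """Run several attempts in a row and return the list of outcomes."""
    return [recognize_once(uart, group) for _ in range(tries)]
```

Returning tagged tuples keeps command numbers and error numbers apart, so a command 3 cannot be confused with an error 3.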

The manage menu offers these functions:

Image 6

  1. Shows the details for every command in the groups (number, training count, flags, conflicting commands, command label):

    Image 7

  2. Removes a defined command from a specific group (the following commands move down to fill the gap).
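The renumbering that happens on insert and remove behaves like a plain Python list. The following small model (the labels are made up for illustration) shows why any code that reacts to command numbers has to be checked after a group was edited:

```python
def insert_command(group, position, label):
    """Insert a label at `position`; later entries move one step up."""
    group.insert(position, label)


def remove_command(group, position):
    """Remove the entry at `position`; later entries move one step down."""
    return group.pop(position)


# Hypothetical labels for group 1
group1 = ["LIGHT_ON", "LIGHT_OFF", "RADIO"]

remove_command(group1, 1)
# "RADIO" was command number 2 and is now command number 1
print(group1.index("RADIO"))

insert_command(group1, 0, "HEATING")
# every existing command has moved one step up
print(group1)
```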

Points of Interest

One point of interest is that the basic ESP8266 boards have only one UART, and it is also used by the REPL, which you use to communicate with the device (over USB or WiFi). In a runtime environment, this might be no problem, because you normally would not use the (Web)REPL very frequently. But during development and debugging, the (Web)REPL is essential for monitoring what is going on in the device, so I looked around for a solution. Somewhere on the Internet, I found a forum post that pointed me in the right direction. The trick is to detach the REPL from the UART while you communicate with the EasyVR (or any other serial device). But you have to make sure that you attach it again afterwards, even in error situations, or you will lose the connection to the device.

I implemented it this way:

Python
import uos
from machine import UART

try:
    uos.dupterm(None, 1)  # detach the REPL from UART0
    uart = UART(0, 9600)
    uart.init(9600, bits=8, parity=None, stop=1)

    # ... do your serial communication stuff here ...

except Exception as e:
    print("Error occurred:", e)
finally:
    uos.dupterm(UART(0, 115200), 1)  # re-attach the REPL to UART0
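
If several functions need the UART, the detach and re-attach steps can be wrapped in one helper. This is a sketch of one possible design, not the article's code: `with_repl_detached` is a hypothetical name, and `dupterm` and `make_uart` are passed in as parameters so that the helper can also be tried out on a PC with fake objects. Unlike the snippet above, it lets exceptions propagate once the REPL has been restored.

```python
def with_repl_detached(action, dupterm, make_uart,
                       device_baud=9600, repl_baud=115200):
    """Run action(uart) with the REPL detached from UART0, then restore it.

    dupterm   -- callable like uos.dupterm(stream, index)
    make_uart -- callable returning a UART object for a given baud rate
    """
    dupterm(None, 1)                      # detach the REPL from UART0
    try:
        return action(make_uart(device_baud))
    finally:
        dupterm(make_uart(repl_baud), 1)  # always re-attach the REPL


# Intended use on the device (not executed here):
#   import uos
#   from machine import UART
#   reply = with_repl_detached(talk_to_easyvr, uos.dupterm,
#                              lambda baud: UART(0, baud))
# where talk_to_easyvr(uart) is your own function.
```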

Conclusion

It is easy to integrate hot word recognition into an ESP8266 to control other devices, for example in your smart home or even in a business environment.
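As a rough illustration of that last step, a recognized command number could be mapped to an HTTP request like this. The IP address, paths and function names are made up for the example, and the request is built by hand on top of the `socket` module, which is available both in MicroPython and on a PC.

```python
import socket

# Hypothetical mapping of recognized command numbers to devices on the LAN.
ACTIONS = {
    0: ("192.168.1.50", 80, "/light/on"),
    1: ("192.168.1.50", 80, "/light/off"),
}


def build_get(host, path):
    """Build a minimal HTTP/1.0 GET request as bytes."""
    return ("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode()


def send_get(host, port, path):
    """Send the request and return the first bytes of the reply."""
    addr = socket.getaddrinfo(host, port)[0][-1]
    s = socket.socket()
    try:
        s.connect(addr)
        s.send(build_get(host, path))
        return s.recv(64)
    finally:
        s.close()


def handle_command(number, send=send_get):
    """Look up the action for a command number and trigger it."""
    action = ACTIONS.get(number)
    if action is None:
        return None   # unknown command number: do nothing
    return send(*action)
```

The `send` parameter only exists so that the dispatch logic can be tested without a network.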

The EasyVR module has several additional capabilities, e.g., switching I/O pins, playing voice prompts, etc. You can check the documentation for details. The communication always works in the same way, so you can use this code as a kind of blueprint.

History

  • 20th June, 2020: Version 1.00 released

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)