Rodney - A Long Time Coming Autonomous Robot (Part 3)

Phil Hopley

5.00/5 (11 votes)

17 Jan 2019CPOL15 min read

20.9K

545

Third part in a series on a ROS (Robot Operating System) House Bot

Introduction

The Rodney Robot project is a hobbyist robotic project to design and build an autonomous house-bot using ROS (Robot Operating System). This article is the third in the series describing the project.

Background

In Part 1 to help define the requirements for our robot, we selected our first mission and split it down into a number of Design Goals to make it more manageable.

The mission was taken from the article Let's build a robot! and was: Take a message to... - Since the robot will [have] the ability to recognize family members, how about the ability to make it the 'message taker and reminder'. I could say 'Robot, remind (PersonName) to pick me up from the station at 6pm'. Then, even if that household member had their phone turned on silent, or were listening to loud music or (insert reason to NOT pick me up at the station), the robot could wander through the house, find the person, and give them the message.

The design goals for this mission were:

Design Goal 1: To be able to look around using the camera, search for faces, attempt to identify any people seen and display a message for any identified
Design Goal 2: Facial expressions and speech synthesis. Rodney will need to be able to deliver the message
Design Goal 3: Locomotion controlled by a remote keyboard and/or joystick
Design Goal 4: Addition of a laser ranger finder or similar ranging sensor used to aid navigation
Design Goal 5: Autonomous locomotion
Design Goal 6: Task assignment and completion notification

In the second part of the article, we completed Design Goal 1. In this part, I'll give Rodney facial expressions and speech to complete Design Goal 2.

Mission 1, Design Goal 2

Facial Expression

The hardware making up the robot head includes the 7" Touchscreen Display which is ideal for giving the robot a face, but utilising it for facial expressions seems like a big task. However, in the first article, I explained that there are many ROS packages available from the ROS community leaving us free to concentrate on the robot application and that's exactly what we are going to make use of here.

We will make use of the homer_robot_face package from the University of Koblenz. This package includes two different selectable faces but it is also possible to model your own character. This package also includes speech synthesis using the Mary TTS (Text to Speech) generator. As this consumes a lot of memory, it is not suitable for single board computers, but we will write our own TTS node suitable for the Raspberry Pi later in this article.

This video from the University of Koblenz shows the range of facial expressions available with the package.

To install the robot face package for ROS Kinetic, run the following command in a terminal.

$ sudo apt-get install ros-kinetic-homer-robot-face

The configuration of the face is done by editing the config.cfg file. It would have been more user friendly if you could pass the path of a configuration file to the node, but the location of the file appears to be hard code. It is therefore necessary to edit the file within the package folder. The folder /opt/ros/kinetic/share/homer_robot_face/config contains the config.cfg file and a number of example files. The package comes with two sets of mesh files, 'Lisa' represents a female face and 'GiGo' a male face. For the Rodney project, I edit the config.cfg file to contain the following:

Mesh Filename : GiGo
Head Color : 1.0, 1.0, 1.0
Iris Color : 0.0, 1.0, 1.0
Outline Color : 0.0, 0.0, 0.0
Voice : male
Window Width : 600
Window Height : 600
Window Rotation : 0

Since we will be using our own speech synthesis node, the Voice parameter is not actually used.

If you wish to get creative and design your own character, there are some notes on modeling a face at http://wiki.ros.org/robot_face.

We can test the installation and the configuration with the following.

In a terminal, start a ROS master node with the following command:

$ roscore

In a second terminal, start the robot face node with the following command:

$ rosrun homer_robot_face RobotFace

Running on my Linux PC, I got the following neutral facial expression.

In another terminal, enter the following command:

$ rqt_graph

From the graph, we can see that the node subscribes to the following topic:

/robot_face/talking_finshed - Once you have finished generating the speech, you are required to send a message on this topic
/robot_face_expected_input - With this topic, you can display a status message which appears below the face
/robot_face_image_display - Not used in the Rodney project
/robot_face_ImageFileDisplay - Not used in the Rodney project
/robot_face/text_out - Text in this topic is used to animate the mouth to match the voice and the text appears below the face. By embedding smileys in the text, you can change the facial expression
/recognized/speech - Not used in the Rodney project

We can use rostopic to see this in action. In a terminal, type the following commands:

$ rostopic pub -1 /robot_face/expected_input std_msgs/String "Battery Low"

You should see the status message displayed below the face.

$ rostopic pub -1 /robot_face/text_out std_msgs/String "Hello my name is Rodney:)"

You should see the text below the face (minus the smiley), the face should animate the speech and change from the neutral expression to an happy expression.

Now send the following command to indicate the speech is complete.

$ rostopic pub -1 /robot_face/talking_finshed st_msgs/String "q"

It doesn't matter what this string contains but until you send this message, the face will not respond to another /robot_face/text_out message.

In the /robot_face/text_out message above, we changed the expression by use of a smiley ":)", the list below gives the smileys available.

"." Neutral
":)" Happy
":(" Sad
">:" Angry
":!" Disgusted
":&" Frightened
":O" or ":o" Surprised

We will come back to the robot face node when we integrate it into our system.

Giving Rodney a Voice

Now the installation of the robot face package will also have installed The MARY TTS System, however we are going to write a ROS node that uses the much simpler pico2wav TTS system. Our node will use pico2wav to generate a temporary wav file which will then be played back. We will also add functionality to play existing short wav files.

Our ROS package for the node is called speech and is available in the speech folder. The package contains all the usual ROS files and folders.

The cfg folder contains the file speech.cfg. This file is used by the dynamic reconfiguration server so that we can adjust some of the wav playback parameters on the fly. We used the dynamic reconfiguration server in part 1 of the article to trim the servos. This file contains the follow Python code.

Python

#!/usr/bin/env python
PACKAGE = "speech"

from dynamic_reconfigure.parameter_generator_catkin import *

gen = ParameterGenerator()

gen.add("pitch",  int_t,    0, "Playback Pitch",  -300,  -1000, 1000)
gen.add("vol",    double_t, 0, "Playback volume", 0.75, 0,   1)
gen.add("bass",   int_t,    0, "Bass",            0, -10, 10)
gen.add("treble", int_t,   0, "Treble",          0, -10, 10)
gen.add("norm",   bool_t,   0, "Normalise audio", True)

lang_enum = gen.enum([ gen.const("en_US", str_t, "en-US", "English US"),
                       gen.const("en_GB", str_t, "en-GB", "English GB"),
                       gen.const("fr_FR", str_t, "fr-FR", "French"),
                       gen.const("es_ES", str_t, "es-ES", "Spanish"),
                       gen.const("de_DE", str_t, "de-DE", "German"),
                       gen.const("it_IT", str_t, "it-IT", "Italian")],
                     "An enum to set the language")

gen.add("lang", str_t, 0, "Voice language", "en-GB", edit_method=lang_enum)

exit(gen.generate(PACKAGE, "speechnode", "Speech"))

For a complete understanding of the dynamic reconfiguration server, refer to the ROS Wiki section dynamic reconfiguration. For now in our file, you can see that we add six parameters to the dynamic configuration server.

The msg folder contains a definition file for a user defined message. The file is named voice.msg and contains the following:

string text # Text to speak
string wav  # Path to file to play

The message contains two elements, text will contain the text to turn into speech and wav will contain a path and filename of a wav file to play. Our code will first check to see if wav contains a path and if so, it will play the wav file, if wav is an empty string, then text will be used to create a wav file.

The include/speech and src folders contain the C++ code for the package. For this package we have one C++ class, SpeechNode and a main routine contained within the speech_node.cpp file.

The main routine informs ROS of our node, creates an instance of our class which contains the code for the node, passes a callback function to the dynamic reconfiguration server and creates a ros::Rate variable which will be used to time our control loop to 10Hz. When inside the loop we call r.sleep, this Rate instance will attempt to keep the loop at 10Hz by accounting for the time used to complete the work in the loop.

Our loop will continue while the call to ros::ok returns true, it will return false when the node has finished shutting down, e.g., when you press Ctrl-c on the keyboard.

In the loop, we will call speakingFinshed which is described later in the article.

C++

int main(int argc, char **argv)
{
	ros::init(argc, argv, "speech_node");
			
	SpeechNode *speech_node = new SpeechNode();
	
	dynamic_reconfigure::Server<speech::SpeechConfig> server;
    dynamic_reconfigure::Server<speech::SpeechConfig>::CallbackType f;
  	
  	f = boost::bind(&SpeechNode::reconfCallback, speech_node, _1, _2);
    server.setCallback(f);
  
    std::string node_name = ros::this_node::getName();
	ROS_INFO("%s started", node_name.c_str());
	
	// We want a delay from when a speech finishes to when 
    // /robot_face/talking_finished is published
	
	ros::Rate r(speech_node->LOOP_FREQUENCY_);
	
    while(ros::ok())
    {
        // See if /robot_face/talking_finished should be published 
        speech_node->speakingFinished();
        
        ros::spinOnce();
        r.sleep();
    }
    
	return 0;
}

The constructor for our class subscribes to the topic /speech/to_speak in order to receive the text to speak or the location of the wav file to play. It also advertises that it will publish the topic /robot_face/talking_finshed. As we know, this topic informs the face that the talking has finished.

C++

// Constructor 
SpeechNode::SpeechNode()
{
    voice_sub_ = n_.subscribe("/speech/to_speak", 5, &SpeechNode::voiceCallback, this);

    talking_finished_pub_ = n_.advertise<std_msgs::String>("/robot_face/talking_finished", 5);
    
    finshed_speaking_ = false;
}

I'll now briefly describe the functions that make up the class.

The function reconfCallback is called by the dynamic reconfiguration server if any of the parameters are changed. The function simply stores the new values for use in the next playback of the temporary speech wav file.

C++

// This callback is for when the dynamic configuration parameters change
void SpeechNode::reconfCallback(speech::SpeechConfig &config, uint32_t level)
{
    language_ = config.lang;
    vol_ = config.vol;
    pitch_ = config.pitch;
    bass_ = config.bass;
    treble_ = config.treble;
    norm_ = config.norm;
}

The voiceCallback function is called when a message on the /speech/to_speak topic is received. If the wav element of the message is not empty, then the supplied wav filename is played using sox (Sound eXchange). Note that since we want to playback an already existing wav file and not to speak in our robot's voice, none of the dynamic reconfiguration parameters are used.

If the wav element is empty, then string in the text element is to be spoken. We start by constructing a string for the call to pico2wav, this string includes our temporary filename, and the language parameter. The call to pico2wav should result in the creation of a wav file without text converted to speech. A string is then constructed to be used in making a system call to sox, this time we use the dynamic reconfiguration parameters so that we can control the sound of the robot's voice. For example, pico2wav only contains a female voice but by changing the pitch, we can give the robot a male voice (which we want since ours is called Rodney).

The function finishes by setting a flag to indicate that we need to send a message on the /robot_face/talking_finshed topic. We also set a countdown counter value which is used to time 20 executions of the control loop before the /robot_face/talking_finshed message is sent.

C++

// This callback is for when a voice message received
void SpeechNode::voiceCallback(const speech::voice& voice)
{               
    // Check to see if we have a path to a stock wav file
    if(voice.wav != "")
    {       
        // Play stock wav file
        // Use play in sox (Sound eXchange) to play the wav file, 
        // --norm stops clipping and -q quite mode (no console output)
        std::string str = "play " +  voice.wav + " --norm -q";        
                                             
        ROS_DEBUG("%s", str.c_str());               
        
        if(system(str.c_str()) != 0)
        {
            ROS_DEBUG("SpeechNode: Error on wav playback");            
        }
    }
    else
    {                        
        std::string filename = "/tmp/robot_speach.wav";                         
        
        std::string str;
        
        // create wav file using pico2wav from adjusted text.
        str = "pico2wave --wave=" + filename + " --lang=" + 
               language_ + " \"" + voice.text + "\"";
        
        ROS_DEBUG("%s", str.c_str());
        
        if(system(str.c_str()) != 0)
        {
            ROS_DEBUG("SpeechNode: Error on wav creation");            
        }
        else
        {                
            // Play created wav file using sox play but use parameters bass, 
            // treble, pitch and vol 
            std::string bass = " bass " + std::to_string(bass_);
            std::string treble = " treble " + std::to_string(treble_);
            std::string pitch = " pitch " + std::to_string(pitch_);        
        
            if(norm_ == true)
            {            
                str = "play " +  filename + " --norm -q" + pitch + bass + treble;          
            }
            else
            {
                std::string volume = " vol " + std::to_string(vol_);
                str = "play " +  filename + " -q" + pitch + bass + treble + volume;           
            }
        
            ROS_DEBUG("%s", str.c_str());               
        
            if(system(str.c_str()) != 0)
            {
                ROS_DEBUG("SpeechNode: Error on wav playback");            
            }
        }
    }
    
    // Set up to send talking finished
    finshed_speaking_ = true;
    loop_count_down_ = (int)(LOOP_FREQUENCY_ * 2);
}

The function speakingFinished is called by the control loop in main. If we have kicked off the playback of either a wav file that already exists or our temporary wav file of text to speak, the function will count down each time it is called. When the counter reaches zero, the talking finished message is published. This gives the robot face node 2 seconds to animate the face before the finished speaking message is sent. You can increase this time if you find your robot has a lot to say, but bear in the mind that the pico2wav is intended for use with a limited number of characters for text to speech conversion.

C++

// If finshed speaking delay until the /robot_face/talking_finished topic is published
void SpeechNode::speakingFinished()
{
    if(finshed_speaking_ == true)
    {
        loop_count_down_--;
        
        if(loop_count_down_ <= 0)
        {
            finshed_speaking_ = false;

            // Send talking finished 
            std_msgs::String msg;   
            msg.data = "";
            talking_finished_pub_.publish(msg);        
        }    
    }
}

Face and Voice Integration

In the next article, we will bring the nodes from Goal 1 and Goal 2 together along with a state machine package that will be used to control the robot missions. For now, it is worth testing the robot face with our speech node.

Our ROS package for the test node is called rodney_voice_test and is available in the rodney_voice_test folder.

The include/rodney_voice_test and src folders contain the C++ code for the package. For this package, we have one C++ class, RodneyVoiceTestNode and a main routine contained within the rodney_voice_test_node.cpp file.

The main routine informs ROS of our node, creates an instance of the class for the node and passes it the node handle, logs that the node has started and hands control to ROS with the call to ros::spin.

C++

int main(int argc, char **argv)
{
    ros::init(argc, argv, "rodney_voice_test");
    ros::NodeHandle n;    
    RodneyVoiceTestNode rodney_test_node(n);   
    std::string node_name = ros::this_node::getName();
	ROS_INFO("%s started", node_name.c_str());
    ros::spin();
    return 0;
}

We are going to use a keyboard node which is available from https://github.com/lrse/ros-keyboard, to interact with the system. In the constructor, we subscribe to the topic keyboard/keydown and call the function keyboardCallBack when a message is received on that topic.

The constructor also advertises that the node will publish the topics for the speech and robot face node.

C++

RodneyVoiceTestNode::RodneyVoiceTestNode(ros::NodeHandle n)
{
    nh_ = n;
    
    // Subscribe to receive keyboard input
    key_sub_ = nh_.subscribe("keyboard/keydown", 100, 
               &RodneyVoiceTestNode::keyboardCallBack, this);

    // Advertise the topics we publish
    speech_pub_ = nh_.advertise<speech::voice>("/speech/to_speak", 5);
    face_status_pub_ = nh_.advertise<std_msgs::String>("/robot_face/expected_input", 5);
    text_out_pub_ = nh_.advertise<std_msgs::String>("/robot_face/text_out", 5);
}

The function keyboardCallBack checks the received message for one of three keys. If the lower case 's' is pressed, we test the status display functionality by creating a message and publish it on the /robot_face/expected_input topic.

If the lower case 't' is pressed, we test the speech and speech animation by creating two messages, one that contains the text to speak and the other contains the text to animate the robot face. Note how we add the ':)' smiley to the greeting variable after we have used it to create the text to speak message, we don't want pico2wav trying to speak this as part of the text. We then publish the two messages, one to the face and the other to the speech node.

If the lower case 'w' is pressed, we test the wav file playback and speech animation again by creating two messages. This time, the message going to the speech node contains the path to a wav file instead of the text to speak. Notice however the message to the robot face still contains text to match the contents of the wav file so that the face is still animated during playback.

C++

void RodneyVoiceTestNode::keyboardCallBack(const keyboard::Key::ConstPtr& msg)
{  
    // Check no modifiers apart from num lock is excepted
    if((msg->modifiers & ~keyboard::Key::MODIFIER_NUM) == 0)
    {
        // Lower case
        if(msg->code == keyboard::Key::KEY_s)
        {            
            // Test status display
            std_msgs::String status_msg;            
            status_msg.data = "Rodney on line";
            face_status_pub_.publish(status_msg);                                       
        }
        else if(msg->code == keyboard::Key::KEY_t)
        {
            // Test speech and animation
                      
            // String to send to robot face
            std_msgs::String greeting;
            greeting.data = "Hello my name is Rodney";
            
            // Voice message
            speech:: voice voice_msg;
            voice_msg.text = greeting.data;
            voice_msg.wav = "";
            
            // Add the smiley
            greeting.data += ":)";
            
            // Publish topics for speech and robot face animation
            text_out_pub_.publish(greeting);
            speech_pub_.publish(voice_msg);
        }
        else if(msg->code == keyboard::Key::KEY_w)
        {
            // Test wav playback and animation
            // String to send to robot face
            std_msgs::String greeting;
            greeting.data = "Danger Will Robinson danger:&";
            
            speech:: voice voice_msg;            
            std::string path = ros::package::getPath("rodney_voice_test");
            voice_msg.text = "";
            voice_msg.wav = path + "/sounds/lost_in_space_danger.wav";            
        
            // Publish topics for sound and robot face animation
            text_out_pub_.publish(greeting);
            speech_pub_.publish(voice_msg);        
        }
        else
        {
            ;
        }
    }
}

The folder launch contains the file test.launch. This file will be used to launch the two nodes under test and the two test nodes from one terminal.

XML

<?xml version="1.0" ?>
<launch>  
  <node pkg="homer_robot_face" type="RobotFace" name="RobotFace" output="screen"/>
  <node pkg="speech" type="speech_node" name="speech_node" output="screen"/>
  <node pkg="rodney_voice_test" type="rodney_voice_test_node" 
        name="rodney_voice_test_node" output="screen" />
  <node pkg="keyboard" type="keyboard" name="keyboard" output="screen" />
</launch>

Using the Code

You can test the code on either a Linux PC or on the robot hardware, in my case, a Raspberry Pi.

Robot Hardware

Now if you are testing the code on a PC, you probably already have a speaker and amplifier built in, but as our robot is built around a Raspberry Pi, we need some hardware to hear the voice playback. I have added an Adafruit Mono 2.5W Class D Audio Amplifier PAM8302 and an 8 Ohn speaker to the hardware. I have simply connected this to the Pi audio jack, the speaker and Pi's 5V supply.

The audio amp is on a small Veroboard mounted on the back of the tilt arm and the speaker is mounted on the front of the neck, just below the pan servo.

Whist on the subject of hardware. When I first started running the robot face node, the UK was in the middle of a heatwave and I found that the Raspberry Pi was sometimes over heating. I have therefore added a small heat sink and fan to the processor board. Within the 3D print zip folder, there is a bracket to attach the fan to the Raspberry Pi.

System Applications

If not already installed, install the pico2wav and SoundeXchange applications.

$ sudo apt-get install libttspico-utils

$ sudo apt-get install sox libsox-fmt-all

Building the ROS Packages

Create a workspace with the following commands:

$ mkdir -p ~/test_ws/src 
$ cd ~/test_ws/ 
$ catkin_make

Copy the packages rodney_voice_test, speech and ros-keyboard (from https://github.com/lrse/ros-keyboard) into the ~/test_ws/src folder and then build the code with the following commands:

$ cd ~/test_ws/ 
$ catkin_make

Check that the build completes without any errors.

Running the Code

Now we are ready to run our code. Use the launch file to start the nodes with the following commands. If no master node is running in a system, the launch command will also launch the master node, roscore.

$ cd ~/test_ws/
$ source devel/setup.bash
$ roslaunch rodney_voice_test test.launch

In the terminal, you should see:

a list of parameters now in the parameter server
a list of our nodes
the address of the master
log information from our code

Two other windows will open, one with robot face and a second that when given the focus will input keyboard strokes.

In a second terminal, run the following command to start rqt_graph.

$ rqt_graph

From the graph, you should see the nodes under test and the test nodes running. You should also see the nodes linked by the topics. Any broken links is an indication of misspelt topics in the code.

The robot face will currently have a neutral expression. Run the following tests:

Make sure the keyboard window has the focus and press the 's' key.
The status message will appear below the face.
Make sure the keyboard window has the focus and press the 't' key.
The robot voice will be heard and the robot face mouth will be animated. It will end with a happy expression.
Make sure the keyboard window has the focus and press the 'w' key.
The wav file playback will be heard and the robot face mouth will be animated. It will end with a frightened expression.

Next, you can adjust the playback parameters of the voice. In a terminal, start rqt_reconfigure with the following command:

rosrun rqt_reconfigure rqt_reconfigure

This will bring up a user interface like the one shown below. Adjust the parameters, give the keyboard window the focus and press 't' to hear the difference.

Once you are happy with the values, you can edit the speech.cfg to include the values as the defaults. Then the next time the speech node starts, these values will be used. Note although the speech.cfg file is Python, you must re-make the package for the changes to take effect.

To terminate the nodes, hit Ctrl-c in the terminal.

If you have run the robot face node on the Raspberry Pi, you may have noticed that the face is not central to the screen. The flavour of Ubuntu (Lubuntu) I'm running on Rodney uses openbox for controlling the GUI. By editing .config/openbox/lubuntu-rc.xml and adding the following to the file, the Robot Face is displayed in the centre of the screen when it launches.

XML

        <application name="RobotFace">
            <position force="yes">
                <x>200</x>
                <y>0</y>
            </position>
        </application>

    </applications>
</openbox_config>

Points of Interest

In this part of the article, we have given Rodney facial expression and speech to complete our Design Goal 2.

In the next article, I'll introduce the state machine package that will be used to control the robot missions. We will also add code to give instructions to move the robot around manually and test the code on both the robot hardware and a robot simulation.

History

9^th October, 2018: Initial release
11^th January, 2019: Version 2: Code changes from Part 1 and 2 added to the zip file

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)