Introduction
I recently needed to be able to generate a Morse code audio file based on input text. After a few quick searches, I wasn't able to find anything that suited my needs, so I decided to write a generator myself. The result, CodeGen, is presented here.
For my purposes, I needed to access the Morse code audio over the web, so I decided to use PHP as the main programming language. The above screenshot shows a web page that starts the Morse code generation. The zip-file contains the web page to submit the text and the PHP source code file that actually generates and presents the audio file. If you want to test the PHP code, the web page and the associated PHP file will need to be copied onto a PHP-enabled web server.
Morse code is familiar to many people simply as a sequence of dot and dash characters or a bunch of beeps that appear in a few old movies. Unfortunately, those descriptions don't help very much if we're trying to write computer code to generate Morse code. This article will describe the parameters that define Morse code, explain how a WAVE formatted sound file is generated, and present the PHP code to implement the Morse code translation and generate the WAVE file.
Morse Code
Morse code is a text encoding method that has the advantage of being easy to encode and can be decoded using the human ear. Essentially, Morse code is generated by turning on and off an audio (or RF) source and forming short and long pulses of sound, referred to colloquially as dots and dashes, or in radio communications jargon as dits and dahs. In modern digital communications parlance, Morse code would be described as a form of amplitude shift keying (ASK).
In Morse code, characters (letters, numbers, punctuation, and special symbols) are encoded as a sequence of dits and dahs, so to convert text to Morse code, we first need to determine how to represent the symbols. An obvious choice would be to represent a dit as a 0 bit and a dah as a 1 bit, or vice versa. Unfortunately, Morse code uses a variable length encoding scheme, so it's also necessary to use a variable length sequence or find a way to pack the data into a fixed bit-size commonly used in computer memory. In addition, it is important to note that Morse code does not differentiate between upper and lower case letters, and also lacks encodings for many special symbols and characters. In this implementation, all characters and symbols not defined in the encoding table are ignored.
In this project, saving memory was not a real issue, so a simple encoding scheme was devised using an associative array of strings representing each dit with a '0' and each dah with a '1'. The PHP code defining the Morse code encoding table is as follows:
$CWCODE = array ('A'=>'01','B'=>'1000','C'=>'1010','D'=>'100','E'=>'0',
'F'=>'0010','G'=>'110','H'=>'0000','I'=>'00','J'=>'0111',
'K'=>'101','L'=>'0100','M'=>'11','N'=>'10', 'O'=>'111',
'P'=>'0110','Q'=>'1101','R'=>'010','S'=>'000','T'=>'1',
'U'=>'001','V'=>'0001','W'=>'011','X'=>'1001','Y'=>'1011',
'Z'=>'1100', '0'=>'11111','1'=>'01111','2'=>'00111',
'3'=>'00011','4'=>'00001','5'=>'00000','6'=>'10000',
'7'=>'11000','8'=>'11100','9'=>'11110','.'=>'010101',
','=>'110011','/'=>'10010','-'=>'10001','~'=>'01010',
'?'=>'001100','@'=>'00101');
Note that if saving memory is an issue, the above encoding can be interpreted as bits. Adding a start bit to each code would yield a bit pattern that can be stored in a single byte for each character. When using the resulting encoding, the byte would be bit shifted left until the start bit is found to determine the variable length code.
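As a rough illustration of the start-bit idea, here is a minimal sketch. The helper names packMorse() and unpackMorse() are hypothetical and are not part of the CodeGen source; the sketch also uses PHP's binary string conversions rather than explicit bit shifts, which gives the same result because the leading 1 of the binary representation is the start bit.

```php
<?php
// Hypothetical helpers (not from the original source): store a '0'/'1'
// dit/dah pattern in a single byte behind a leading start bit.

function packMorse(string $pattern): int
{
    // Prepend the start bit, then read the whole string as a binary number.
    // The patterns in the table are at most 6 symbols long, so the result
    // always fits in 7 bits.
    return bindec('1' . $pattern);
}

function unpackMorse(int $byte): string
{
    // decbin() drops leading zeros, so the first character of its result
    // is the start bit; everything after it is the dit/dah pattern.
    return substr(decbin($byte), 1);
}

$packed = packMorse('1000');   // 'B' = dah dit dit dit
printf("0x%02X -> %s\n", $packed, unpackMorse($packed));
```

The start bit is what makes the variable length recoverable: without it, 'E' ('0') and 'H' ('0000') would both pack to the value zero.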
Although most people don't realize it, Morse code is mainly defined by timing parameters, so representing those correctly is of primary importance in generating Morse code. The first thing we need to do is define the timing inherent in Morse code characters. By convention, a unit time, dt, is defined as the length of a single dit sound, and the space between dit/dah symbols is the same length as a dit. The length of a dah and the length of time between letters are both equal to 3 times the length of a dit. The space between words is usually equal to 7 times the length of a dit, so the following timing table can be defined:
Item | Duration |
dit | dt |
inter-symbol space | dt |
dah | 3*dt |
inter-character space | 3*dt |
inter-word space | 7*dt |
In Morse code, the transmission speed is usually expressed in words per minute (WPM). Since English words have different lengths, and characters have different numbers of dits and dahs, converting WPM to digital sample timing is not straightforward. One definition adopted by international convention uses 5 characters as an average word length, with numbers and punctuation marks counted as 2 characters. That results in an average of 50 time units per word, including the spaces. As a result, if you specify the speed in WPM, the timing is 50*WPM time units/minute, and the length of time for a single dit is dt = 60/(50*WPM) = 1.2/WPM seconds. Given the length of a dit, all of the other timing parameters can be easily determined.
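The timing table and the dt = 1.2/WPM relation can be combined in a few lines. This is only a sketch; the function name morseTiming() and the array keys are assumptions made for illustration and do not come from the CodeGen source.

```php
<?php
// Hypothetical helper: standard Morse element durations, in seconds,
// for a given speed in words per minute.
function morseTiming(float $wpm): array
{
    $dt = 1.2 / $wpm;   // 60 seconds / (50 time units per word * WPM)
    return [
        'dit'         => $dt,
        'symbolSpace' => $dt,       // gap between dits/dahs within a character
        'dah'         => 3 * $dt,
        'charSpace'   => 3 * $dt,   // gap between characters
        'wordSpace'   => 7 * $dt,   // gap between words
    ];
}

// At 20 WPM a dit lasts 1.2/20 = 0.06 seconds.
print_r(morseTiming(20));
```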
You might have noticed on the web page illustrated above that below 15 WPM, Farnsworth spacing is used. What the heck is "Farnsworth spacing"?
When one is learning to decipher (copy) Morse code by ear, it has been recognized for a long time that as the speed changes, the apparent rhythm of the characters also seems to change. Below about 10 WPM, it is possible for a person to count dits and dahs and then decide what character was sent, but above about 10 WPM, that just isn't ordinarily possible, and code is recognized more by the rhythm of the characters than by the actual number of dits and dahs. People who learn Morse code at a slow speed often have trouble progressing to higher speeds because they either subconsciously count the symbols or because the rhythm seems to change.
In an effort to ease the transition from learning Morse code at a slow speed to copying at higher speeds, Farnsworth spacing was developed. Essentially, the symbols and letters are sent at a higher speed, often around 15 WPM, but the slower overall speed is maintained by inserting more space between characters and words. Thus, one can hear the sound and rhythm of the characters at a reasonable speed, and once all of the letters are learned, increasing the overall speed only requires shortening the extra spacing. Essentially, the Farnsworth spacing technique removes the change in rhythm to permit (hopefully) faster learning.
In this system, for slower speeds, the Morse code timing is calculated so that characters are sent at 15 WPM, corresponding to a dit length of 0.08 seconds, but the inter-character and inter-word timing, instead of being 3 or 7 dits long, is adjusted so the correct overall speed is obtained.
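One common way to work out the stretched spacing is sketched below. It is an assumption about the method, not a copy of the CodeGen code: it relies on the 50-unit standard word splitting into 31 units of character elements (dits, dahs, and the gaps inside characters) and 19 units of spacing (four inter-character gaps of 3 units plus one inter-word gap of 7 units). The function name farnsworthTiming() and its parameters are hypothetical.

```php
<?php
// Hypothetical helper: Farnsworth timing in seconds. Characters are sent
// at $charWpm while the gaps are stretched to give $overallWpm overall.
function farnsworthTiming(float $overallWpm, float $charWpm = 15.0): array
{
    // Never send characters slower than the requested overall speed;
    // in that case the result reduces to standard timing.
    $charWpm = max($charWpm, $overallWpm);

    $dt = 1.2 / $charWpm;   // dit length at the character speed

    // Time available for one standard word, minus the 31 units spent
    // on the characters themselves, leaves the time for all spacing.
    $spacingTotal = 60.0 / $overallWpm - 31 * $dt;

    // Spread that time over the 19 spacing units of the standard word.
    $unit = $spacingTotal / 19;

    return [
        'dit'       => $dt,
        'charSpace' => 3 * $unit,
        'wordSpace' => 7 * $unit,
    ];
}

// 5 WPM overall with 15 WPM characters: dits stay at 0.08 seconds,
// only the gaps between characters and words grow.
print_r(farnsworthTiming(5));
```

When the overall speed equals the character speed, the stretched unit equals dt and the gaps fall back to the usual 3 and 7 dit lengths.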
Sound Generation
In the PHP code, strings of characters corresponding to the common Morse audio elements (dit, dah, and space) are pre-computed. These audio samples are then concatenated as needed to form the sound sequences and finally written to a file with the header information needed to define the WAVE format.
The code to generate the sounds is fairly straightforward, and can be found in the PHP file of the project. I found it convenient to define a "numerical oscillator," Osc(), that returns timed samples from a sine wave each time it is called. Using the sound sampling and sound frequency specifications, generating the audio waveform is easy enough. The generated sine wave varying from -1 to +1 is shifted and adjusted so that the sound byte data varies between 0 and 255 and a value of 128 represents zero amplitude.
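For readers who want to experiment without the project file, a numerical oscillator of this kind might look like the following sketch. It is a stand-in, not the article's Osc(): the names makeOscillator() and toByte(), the closure-based design, and the 700 Hz tone are all assumptions. The scale factor of 120 matches the one used in the article's sound code.

```php
<?php
// Hypothetical stand-in for Osc(): returns a closure that yields the next
// sine-wave sample (between -1 and +1) each time it is called.
function makeOscillator(float $frequency, int $sampleRate): callable
{
    $n = 0;   // sample counter, preserved between calls
    return function () use (&$n, $frequency, $sampleRate): float {
        $value = sin(2.0 * M_PI * $frequency * $n / $sampleRate);
        $n++;
        return $value;
    };
}

// Map a sample in [-1, +1] to an unsigned 8-bit value centred on 128.
function toByte(float $x): string
{
    return chr((int) floor(120 * $x + 128));
}

$osc = makeOscillator(700.0, 11050);   // 700 Hz tone is an assumed value
$samples = '';
for ($i = 0; $i < 5; $i++) {
    $samples .= toByte($osc());
}
echo bin2hex($samples), "\n";
```

Scaling by 120 rather than 127 keeps the byte values between 8 and 248, leaving a little headroom below the 0 to 255 limits.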
There is one other consideration in the sound generation, however. Normally, Morse code is described as being generated by an on-off switch, corresponding to a square wave. If you actually try to do that, you find that the generated signal has an enormous bandwidth and a sound best described as "clicks". For that reason, in radio equipment, the wave is always shaped to give it a "softer" sound and use much less bandwidth.
In our case, we need to do the same thing, but numerically. Since we know the length of time of our smallest sound sample, the dit, it can be shown that the minimum bandwidth occurs when the sound amplitude rises in the shape of a sine wave with a half period equal to the length of a dit. The same effect could be obtained by passing the generated signal through a low pass filter, but since we already know all of the signal characteristics, it's just simpler to generate the filtered signal directly.
The PHP code to generate a dit, dah, and space is as follows:
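// $DitTime (dit length in seconds), $sampleDT (time per sample), and Osc()
// are defined elsewhere in the PHP file.
$ditstr = '';
$dahstr = '';
$spcstr = '';
$dt = 0;
// First dit-length segment: the complete dit, the rising edge of the dah,
// and one dit length of silence.
while ($dt < $DitTime) {
    $x = Osc();
    if ($dt < (0.5*$DitTime)) {
        // Rising edge, shared by the dit and the dah
        $x = $x*sin((M_PI/2.0)*$dt/(0.5*$DitTime));
        $ditstr .= chr(floor(120*$x+128));
        $dahstr .= chr(floor(120*$x+128));
    }
    else if ($dt > (0.5*$DitTime)) {
        // The dah continues at full amplitude; the dit starts to fall
        $dahstr .= chr(floor(120*$x+128));
        $x = $x*sin((M_PI/2.0)*($DitTime-$dt)/(0.5*$DitTime));
        $ditstr .= chr(floor(120*$x+128));
    }
    else {
        // Exactly at the midpoint: full amplitude for both
        $ditstr .= chr(floor(120*$x+128));
        $dahstr .= chr(floor(120*$x+128));
    }
    $spcstr .= chr(128);
    $dt += $sampleDT;
}
// Second dit-length segment: the middle of the dah at full amplitude
$dt = 0;
while ($dt < $DitTime) {
    $x = Osc();
    $dahstr .= chr(floor(120*$x+128));
    $dt += $sampleDT;
}
// Third dit-length segment: the falling edge of the dah
$dt = 0;
while ($dt < $DitTime) {
    $x = Osc();
    if ($dt > (0.5*$DitTime)) {
        $x = $x*sin((M_PI/2.0)*($DitTime-$dt)/(0.5*$DitTime));
        $dahstr .= chr(floor(120*$x+128));
    }
    else {
        $dahstr .= chr(floor(120*$x+128));
    }
    $dt += $sampleDT;
}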
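Once the dit, dah, and space strings exist, building the audio for a message is just string concatenation. The sketch below shows one way to do it with standard (non-Farnsworth) spacing. The function names characterSound() and textSound() are hypothetical, and short placeholder strings stand in for the real audio so the structure is easy to see.

```php
<?php
// Hypothetical helper: audio for one character, given its '0'/'1' pattern
// and the pre-computed element strings. $space is one dit length of silence.
function characterSound(string $pattern, string $dit, string $dah, string $space): string
{
    $parts = [];
    foreach (str_split($pattern) as $symbol) {
        $parts[] = ($symbol === '0') ? $dit : $dah;
    }
    return implode($space, $parts);   // one dit of silence between symbols
}

// Hypothetical helper: audio for a whole message using an encoding table
// such as $CWCODE.
function textSound(string $text, array $code, string $dit, string $dah, string $space): string
{
    $words = [];
    foreach (preg_split('/\s+/', strtoupper(trim($text))) as $word) {
        $chars = [];
        foreach (str_split($word) as $ch) {
            if (isset($code[$ch])) {   // undefined characters are ignored
                $chars[] = characterSound($code[$ch], $dit, $dah, $space);
            }
        }
        if ($chars) {
            // 3 dit lengths of silence between characters
            $words[] = implode(str_repeat($space, 3), $chars);
        }
    }
    // 7 dit lengths of silence between words
    return implode(str_repeat($space, 7), $words);
}

// Placeholders: 'd' = dit, 'D' = dah, '_' = one dit of silence.
echo characterSound('01', 'd', 'D', '_'), "\n";   // prints "d_D"
```

In real use, the three placeholder arguments would be the $ditstr, $dahstr, and $spcstr strings built above.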
WAVE File Format
The WAVE file format is a commonly used audio format. In its simplest form, the file simply contains a sequence of integer numbers representing sound amplitude at a specified sample rate preceded by a header. Complete details of the WAVE file specification can be found on the Audio File Format Specifications website. For purposes of generating Morse code, we don't need to use all of the options available in the WAVE format. Only a single 8-bit sound channel is needed, so the format is particularly easy to generate. Note that multiple byte data is represented in little-endian byte order. The WAVE file uses a format known as RIFF, and consists of a series of records called "chunks".
The WAVE file itself starts with the ASCII identifier RIFF, followed by 4 bytes giving the size in bytes of the rest of the file (the total file size minus 8), the ASCII characters WAVE, and then the chunks that define the format and hold the sound data.
The first chunk, in our case, is the format specifier. It begins with the four ASCII characters "fmt " (the fourth is a space), followed by a 4-byte chunk size that is equal to 16, 18, or 40, depending on the sound encoding format used. In this application, I use plain vanilla PCM format, so the chunk size is always 16 bytes and the required data is the format code (1 for PCM), the number of channels, sound samples/second, average bytes/second, a block align indicator, and the number of bits/sound sample. For this application, there is no need for high quality stereo sound, so the PHP code generates the format chunk as a string of characters assuming single channel (mono), 8-bit sound generated at a rate of 11050 samples/second. Note that standard CD quality audio is 44100 samples/second, so the rate used here is slightly above the more common quarter-CD rate of 11025.
Finally, the actual sound data is contained in the next and final chunk, consisting of the ASCII characters data, the 4-byte chunk size, and then the actual sound data as a sequence of bytes (since we specified 8 bits/sample).
The program generates the sound as a sequence of 8-bit audio amplitude numbers represented in the variable $soundstr. Once the sound itself has been generated, the chunk sizes can be determined and the entire file can be put together and written to disk. The brute force PHP code that generates the headers and the sound chunk is shown below. Note that $riffstr represents the RIFF header, $fmtstr represents the format chunk, and $soundstr contains the sound data chunk.
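// RIFF header. $NSizeStr, whose construction is not shown here, holds the
// 4-byte little-endian size field.
$riffstr = 'RIFF'.$NSizeStr.'WAVE';
// Convert the sample rate to a 4-byte little-endian string
$x = SAMPLERATE;
$SampRateStr = '';
for ($i=0; $i<4; $i++) {
    $SampRateStr .= chr($x % 256);
    $x = floor($x/256);
}
// Format chunk: chunk size 16, PCM format code 1, 1 channel, sample rate,
// bytes/second (equal to the sample rate for 8-bit mono), block align 1,
// and 8 bits/sample
$fmtstr = 'fmt '.chr(16).chr(0).chr(0).chr(0).chr(1).chr(0).chr(1).chr(0)
    .$SampRateStr.$SampRateStr.chr(1).chr(0).chr(8).chr(0);
// Convert the number of sound bytes, $n, to a 4-byte little-endian string
$x = $n;
$NSampStr = '';
for ($i=0; $i<4; $i++) {
    $NSampStr .= chr($x % 256);
    $x = floor($x/256);
}
// Data chunk: identifier, size, then the sound samples
$soundstr = 'data'.$NSampStr.$soundstr;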
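The same header layout can also be produced with PHP's built-in pack() function, which handles the little-endian conversion directly ('V' is a 32-bit and 'v' a 16-bit unsigned little-endian integer). The following is an alternative sketch rather than the CodeGen code; the function name buildWave() is hypothetical, and the default sample rate simply repeats the value quoted above.

```php
<?php
// Hypothetical helper: wrap 8-bit mono PCM samples in a minimal WAVE file.
function buildWave(string $samples, int $sampleRate = 11050): string
{
    $fmt = 'fmt ' . pack(
        'VvvVVvv',
        16,            // format chunk size for plain PCM
        1,             // format code: 1 = PCM
        1,             // number of channels (mono)
        $sampleRate,   // samples per second
        $sampleRate,   // average bytes per second (1 byte per sample, mono)
        1,             // block align: bytes per sample frame
        8              // bits per sample
    );

    $data = 'data' . pack('V', strlen($samples)) . $samples;

    // The RIFF size counts everything after the size field itself:
    // the 'WAVE' identifier plus both chunks.
    $riff = 'RIFF' . pack('V', 4 + strlen($fmt) + strlen($data)) . 'WAVE';

    return $riff . $fmt . $data;
}

// 100 samples of silence: 44 header bytes + 100 data bytes.
$wav = buildWave(str_repeat(chr(128), 100));
echo strlen($wav), "\n";   // prints 144
```

The result can be written to disk with file_put_contents() or sent straight to the browser. The sketch does not add the pad byte that the RIFF format expects after a chunk with an odd number of data bytes.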
Conclusion and Comments
The Morse code generation software presented here seems to work well enough for converting text to audio Morse code. Of course, there are many modifications and improvements that could be made, including the use of other character sets, reading text directly from a file, generating compressed audio, etc. Since the object of this effort was to make a conversion program available for use over the web, this simple solution seems to have served its purpose.
Of course, as always, any suggestions for improving the brute force code are appreciated. I owe the background information on Morse code to many people who taught me over the years, but I'm sure that any errors or omissions must be my own fault.
Revision History
- 2 June 2010: Initial submission.