Introduction
Have you visited http://www.oddcast.com/? With this library you can create your own Flash agents with lip sync and dynamically generated (SAPI TTS) audio in any language that has a SAPI voice.
You need some know-how in JavaScript, ActionScript (very similar to JavaScript) and Macromedia Flash.
You need to install the following on your test server:
- the latest Microsoft Speech SDK (with the Microsoft English Recognizer set as the default ASR engine);
- MingX, a free ActiveX component to generate Macromedia SWF files;
- a free utility from James Anderson to extract the phonemes from a wave file;
- Lame.exe, a free utility to generate MP3 files on the fly;
- some good SAPI 5.1 compliant voices for good results (try the Cepstral voices).
The process to use the library is very simple:
- a (very simple) ASP.NET page receives as POST data the text to speak, the voice to use, and the name of the SWF file to generate (without extension);
- calling the library functions in this order (Genera.Wave, Genera.Mp3, Genera.Asr, Genera.FlashSwf) and passing the appropriate data generates a SWF file with the TTS audio embedded in MP3 format and, for each Flash frame, the appropriate mouth position for the lip sync (based on the 13 classic Disney visemes).
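To make the "mouth position for each Flash frame" idea concrete, here is a minimal sketch in JavaScript. It is not the library's code: the timing format (`phoneme`, `startMs`, `endMs`), the function name `mouthPerFrame` and the small phoneme-to-mouth table are all assumptions made for illustration, and the real 13-viseme table used by the library is not shown in this article.

```javascript
// Hypothetical phoneme-to-mouth table (illustration only, not the library's 13 visemes).
const PHONEME_TO_MOUTH = {
  sil: "rest",
  AA: "open",
  IY: "wide",
  UW: "round",
  M: "closed",
  B: "closed",
  P: "closed",
  F: "teeth-on-lip",
  V: "teeth-on-lip",
};

// timings: [{ phoneme, startMs, endMs }], sorted by startMs (assumed format).
// Returns one mouth name per frame at the given frame rate.
function mouthPerFrame(timings, fps) {
  if (timings.length === 0) return [];
  const totalMs = timings[timings.length - 1].endMs;
  const frameCount = Math.ceil((totalMs * fps) / 1000);
  const frames = [];
  let i = 0;
  for (let frame = 0; frame < frameCount; frame++) {
    const tMs = (frame * 1000) / fps;
    // Skip phonemes that ended before this frame starts.
    while (i < timings.length - 1 && tMs >= timings[i].endMs) i++;
    // Unknown phonemes fall back to the resting mouth.
    frames.push(PHONEME_TO_MOUTH[timings[i].phoneme] || "rest");
  }
  return frames;
}

const example = mouthPerFrame(
  [
    { phoneme: "M", startMs: 0, endMs: 100 },
    { phoneme: "AA", startMs: 100, endMs: 300 },
  ],
  10
);
```

With two phonemes covering 300 ms at 10 frames per second, the sketch yields three frames, one mouth name each.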
A brief description of the functions (remember to pass the file name without extension):
- Genera.Wave(Text, Filename, Voice) generates a wave file using a SAPI 5.1 compatible voice;
- Genera.Mp3(Filename) generates an MP3 file to embed in the Macromedia SWF;
- Genera.Asr(FileName) uses the James Anderson utility to generate a text file with the phonemes and their timings, used later for the lip sync;
- Genera.FlashSwf(FileName) generates the SWF file with the embedded audio and the mouth position for every Flash frame;
- LeggiVociInstallate() returns a list of the voices installed on the server.
Every function returns "OK" if it finishes without errors; otherwise it returns an error.
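Because every function returns "OK" or an error, the calling sequence can be written as a loop that stops at the first failure. The following JavaScript sketch only illustrates that control flow: `runPipeline` and the stand-in steps are hypothetical, and the error string is invented for the example rather than taken from the library.

```javascript
// Runs the steps in order and stops at the first one that does not return "OK".
function runPipeline(steps) {
  for (const step of steps) {
    const result = step.run();
    if (result !== "OK") {
      // Later steps depend on files produced by earlier ones, so do not continue.
      return { ok: false, failedStep: step.name, error: result };
    }
  }
  return { ok: true, failedStep: null, error: null };
}

// Stand-ins for Genera.Wave, Genera.Mp3, Genera.Asr and Genera.FlashSwf.
// The "Lame.exe not found" message is a made-up error for this example.
const calls = [];
const fakeSteps = [
  { name: "Wave", run: () => { calls.push("Wave"); return "OK"; } },
  { name: "Mp3", run: () => { calls.push("Mp3"); return "Lame.exe not found"; } },
  { name: "Asr", run: () => { calls.push("Asr"); return "OK"; } },
  { name: "FlashSwf", run: () => { calls.push("FlashSwf"); return "OK"; } },
];
const outcome = runPipeline(fakeSteps);
```

Here the Mp3 stand-in fails, so Asr and FlashSwf are never called and the caller learns which step failed and why.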
To test on Windows 2003, the application pool used in IIS MUST be set to run as LocalSystem (so that it can work with SAPI).
I have tested the library with Italian, German and English voices, and the lip sync works well.
Here is an example of an ASP.NET page that posts the data:
<%@ Page Language="VB" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" >
<head runat="server">
<title>Untitled Page</title>
</head>
<body>
<div>
<form method="post" action="http://otherserver/tts.aspx">
<input type="text" name="Testo" value="prova testo" /><br />
<input type="text" name="Voce" value="5" /><br />
<input type="text" name="FileName" value="3" /><br />
<input type="submit" value="Vai"/>
</form> </div>
</body>
</html>
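The same three fields can also be sent from script instead of a form. As a minimal sketch (the helper name `buildTtsBody` is hypothetical), this JavaScript builds the URL-encoded body that the form above would post, using the standard `URLSearchParams` class:

```javascript
// Field names match the form above; the values are the article's sample values.
function buildTtsBody(testo, voce, fileName) {
  const params = new URLSearchParams();
  params.append("Testo", testo);       // text to speak
  params.append("Voce", voce);         // voice to use
  params.append("FileName", fileName); // SWF name, without extension
  return params.toString();
}

const body = buildTtsBody("prova testo", "5", "3");
```

The resulting string is what goes in the body of a POST request with the content type application/x-www-form-urlencoded.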
Here is an example of the page that receives the POST data (tts.aspx), followed by its code:
<%@ Page Language="VB" AutoEventWireup="false" CodeFile="Tts.aspx.vb" Inherits="Tts" %>
Imports TtsSwf

Partial Class Tts
    Inherits System.Web.UI.Page

    Public GenerateError As Boolean = False

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        Dim Genera As TtsSwf.Genera = New TtsSwf.Genera()
        'Dim Testo As String = Request.QueryString("Text")
        'Dim FileName As String = Request.QueryString("FileName")
        'Dim Voce As String = Request.QueryString("Voce")

        Dim Testo As String = Request.Form("Testo")
        Dim FileName As String = Request.Form("FileName")
        Dim Voce As String = Request.Form("Voce")

        If Testo <> "" And FileName <> "" And Voce <> "" Then

            If Genera.Wave(Testo, FileName, Voce) <> "OK" Then
                SetError()
            End If

            If GenerateError = False Then
                If Genera.Mp3(FileName) <> "OK" Then
                    SetError()
                End If
            End If

            If GenerateError = False Then
                If Genera.Asr(FileName) <> "OK" Then
                    SetError()
                End If
            End If

            If GenerateError = False Then
                If Genera.FlashSwf(FileName) <> "OK" Then
                    SetError()
                End If
            End If

            Genera.Fine(FileName)

        End If
        Response.Write("Testo:" + Testo + " voce:" + Voce + " filename:" + FileName)

    End Sub

    Sub SetError()
        GenerateError = True
    End Sub
End Class
You can also use a main Flash movie (containing the mouth positions) to call the page that generates the SWF and then load the generated SWF.
A brief example:
Create a SWF movie and put this ActionScript code in a frame:
sendinfo = new LoadVars();
sendinfo.Testo = testo;
sendinfo.FileName = filename;
sendinfo.Voce = voce;
sendinfo.sendAndLoad("http://otherserver/tts.aspx", _root, "POST");
loadMovie(("http://otherserver/ttscache/" + filename) + ".swf", "_root.codice");
delete testo;
delete filename;
delete voce;
par = _root.codice.getBytesLoaded();
tot = _root.codice.getBytesTotal();
if (par == tot) {
    caricamento = "Ok";
}
Label this frame as "sayText".
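A note on the load check in that frame script: comparing loaded and total bytes with `par == tot` can be true too early if both counters still report zero when the check runs. The sketch below shows a stricter comparison. It is written in JavaScript only to illustrate the logic (it would need porting to ActionScript), and the function name `isFullyLoaded` is hypothetical.

```javascript
// True only when the clip reports a non-zero total size and all of it has arrived.
function isFullyLoaded(bytesLoaded, bytesTotal) {
  return bytesTotal > 0 && bytesLoaded >= bytesTotal;
}

const beforeLoadStarts = isFullyLoaded(0, 0);    // nothing reported yet
const halfway = isFullyLoaded(512, 1024);        // still downloading
const finished = isFullyLoaded(1024, 1024);      // everything arrived
```

In a real movie the check would have to be repeated (for example on each frame) until it succeeds, since the generated SWF is not available immediately.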
Create a simple HTML page that loads the main movie and contains this JavaScript code:
<script language="javascript">
function sayText(Testo, FileName, Voce)
{
    var movie = window.document.test;

    movie.SetVariable("testo", Testo);
    movie.SetVariable("filename", FileName);
    movie.SetVariable("voce", Voce);
    movie.TCallLabel("/", "sayText");
}

function Prova(){
    sayText('spero di aver finito il test.','2','5');
}
</script>
Create a link like this in the page: <a href="javascript:Prova();">Prova testo.</a>
The link calls the ActionScript code (label sayText) in the main Flash movie.
The Flash movie calls the page, posting the data.
The page creates the SWF file with the audio and the mouth positions.
Once the generated file is loaded, the main movie starts it.
Warning: if the main movie is hosted on one domain (e.g. www.cisco.com) and the loaded movie is generated on another server (e.g. www.microsoft.com), this does not work, for security reasons.
To allow it, put this file on the generation server (naming it crossdomain.xml):
<?xml version="1.0"?>
<!DOCTYPE cross-domain-policy
SYSTEM "http:
To do:
- Use the Annosoft SAPI lipsync to generate the phonemes and the timings;
- Improve the generation of the SWF files.
I think this library can be useful to every webmaster.
VB.NET, C# Developer.
Skills: VB.Net, ASP.Net, SQL 2000/2005.