Friday, April 2, 2010

Sphinx4 Demo Help

Sphinx4 provides a good number of demos, which I used in my program. I had to write an application that records the user's speech on the client side and sends it to the server as a WAV file. On the server side, I had to recognize this WAV file and return the result with a confidence score indicating how well the speech was recognized.
Sounds pretty simple?

I decided to use a Java applet like the one on VoxForge: display a list of sentences and ask the user to record them. I was partly successful. I developed an applet that used the Java Sound API for recording and playback. I ran into security issues, since applets are not supposed to save files on the client machine or access the file system. After some digging, I got around this by signing my applet JAR with jarsigner. So my front end was ready. This applet sends the WAV file to the server.
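For anyone trying the recording part, here is a minimal sketch of capturing audio with the Java Sound API and packing it into WAV bytes in memory. It is not my original applet code. The class name `WavPacker`, the method names, and the 16 kHz, 16-bit mono format are assumptions chosen for illustration; check what your acoustic model expects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

// Hypothetical helper, not part of Sphinx4 or of the original applet.
class WavPacker {

    // Assumed format: 16 kHz, 16-bit, mono, signed, little-endian PCM.
    static AudioFormat speechFormat() {
        return new AudioFormat(16000f, 16, 1, true, false);
    }

    // Records roughly `seconds` of raw PCM from the default microphone.
    // Needs real capture hardware (and, in an applet, permission to record).
    static byte[] record(AudioFormat format, int seconds) throws LineUnavailableException {
        int total = (int) (format.getFrameRate() * format.getFrameSize() * seconds);
        byte[] pcm = new byte[total];
        TargetDataLine line = AudioSystem.getTargetDataLine(format);
        try {
            line.open(format);
            line.start();
            int offset = 0;
            while (offset < total) {
                int n = line.read(pcm, offset, Math.min(4096, total - offset));
                if (n <= 0) {
                    break; // line was stopped or closed early
                }
                offset += n;
            }
            line.stop();
        } finally {
            line.close();
        }
        return pcm;
    }

    // Wraps raw PCM bytes in a WAV container without touching the file system.
    static byte[] toWav(byte[] pcm, AudioFormat format) throws IOException {
        long frames = pcm.length / format.getFrameSize();
        try (AudioInputStream stream =
                 new AudioInputStream(new ByteArrayInputStream(pcm), format, frames)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            AudioSystem.write(stream, AudioFileFormat.Type.WAVE, out);
            return out.toByteArray();
        }
    }
}
```

Something like `WavPacker.toWav(WavPacker.record(WavPacker.speechFormat(), 5), WavPacker.speechFormat())` would then give the bytes to send. Building the WAV in memory also avoids writing a file on the client machine.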


Next, the server side. For the demo I used sockets to receive the input and send back the results. Sphinx4 has a sample program that shows how to pass an input audio file to Sphinx for recognition. That was it; my task was done. Later I created a new program based on the demo to recognize more words, using my own language model. This was my first application using Sphinx. I wanted to let users download the application and test it, but one problem with Sphinx4 is that it is Java-based, and the acoustic model and dictionary make the program too heavy for me to upload.
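The socket exchange can be sketched roughly as below. This is an illustration, not the code I actually ran: the class `WavSocketDemo`, the length-prefixed wire format, and the `recognizer` function are all assumptions. The `recognizer` parameter is only a placeholder for wherever the Sphinx4 call would go; no Sphinx4 API is shown here.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.function.Function;

// Hypothetical demo class; the wire format (4-byte length, WAV bytes,
// then a UTF string reply) is an assumption made for this sketch.
class WavSocketDemo {

    // Server side: accept one client, read the WAV bytes, reply with the result.
    // `recognizer` stands in for the real speech recognition step.
    static void serveOnce(ServerSocket server, Function<byte[], String> recognizer)
            throws IOException {
        try (Socket socket = server.accept();
             DataInputStream in = new DataInputStream(socket.getInputStream());
             DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
            int length = in.readInt();
            if (length < 0) {
                throw new IOException("negative length: " + length);
            }
            byte[] wav = new byte[length];
            in.readFully(wav); // blocks until the whole file has arrived
            out.writeUTF(recognizer.apply(wav));
            out.flush();
        }
    }

    // Client side: send the WAV bytes and wait for the recognition result.
    static String sendWav(String host, int port, byte[] wav) throws IOException {
        try (Socket socket = new Socket(host, port);
             DataOutputStream out = new DataOutputStream(socket.getOutputStream());
             DataInputStream in = new DataInputStream(socket.getInputStream())) {
            out.writeInt(wav.length);
            out.write(wav);
            out.flush();
            return in.readUTF();
        }
    }
}
```

Sending the length first lets the server know when the file is complete without the client having to close its side of the connection, so the same socket can carry the reply.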

As for accuracy, I was not very satisfied. Various factors determine the accuracy of a speech recognition system (SRS), such as pronunciation, microphone quality, and surrounding noise. I got good results when recognizing digits, but with random words accuracy dropped below 50%. I looked through the forums for a solution but found no proper one.
The focus is still on improving the results. Changing a few parameters did increase accuracy, but not enough to convince me to use it in production. I have had to leave this work stalled for now.

Edit: This is one of the initial samples I had developed. Download
