Microsoft Speech SDK
10/18/2000
Welcome to Microsoft® Speech SDK. This file describes system requirements, installation notes, and known issues. This SDK provides the tools, information, and samples you need to incorporate speech technologies into your Windows® applications.
Before installing the SDK, read through this document to become familiar with installation and performance issues. This file accompanies Speech SDK 5.0 and is released under the License Agreement in the license.chm file on the CD or install point.
Supported operating systems are:
To save disk space, you can load a minimal configuration. This includes enabling only the following two options:
Component | Minimum RAM | Recommended RAM |
---|---|---|
TTS Engine | 14.5 MB | 32 MB |
SR Command and Control | 16 MB | 32 MB |
SR Dictation | 25.5 MB | 128 MB |
SR Both | 26.5 MB | 128 MB |
File Name | Approximate File Size | Setup Merge Names |
---|---|---|
Sapi.dll & Sapisvr.exe | 0.5 MB | Sp5.msm |
Sapi.cpl | 36 KB | Sp5Intl.msm |
SR Engine | 1.7 MB | Sp5Sr.msm |
Command and Control Datafiles | 13.4 MB | Sp5CCInt.msm |
Dictation Datafiles | 33 MB | Sp5DCInt.msm |
TTS Engine and Voices | 7.8 MB | Sp5TTInt.msm |
SAPI and another speech recognition system are unlikely to run simultaneously because both would contend for the microphone. Stop any other speech applications before running a SAPI 5 application. Multiple SAPI 5 applications that use the shared recognizer can run simultaneously.
On Windows 98 First Edition or non-English Second Edition, if the computer did not previously have Windows Installer, you need to reboot after installing the Speech SDK.
Before installing SAPI 5.0, you need to uninstall or delete previous versions of the SAPI 5 SDK. If the Microsoft Speech SDK needs to be removed, uninstall it from the same CD or install point used originally. This ensures all files are removed correctly.
The SAPI 5.0 release can coexist on your computer with SAPI 4.0. However, applications using different versions are not necessarily compatible and should not be run simultaneously.
You also need to uninstall any applications that use older SAPI 5 builds, including Microsoft Office® 10 beta.
You need to load SAPI 5.0 through the Windows Installer provided on the CD. Windows Installer support is built into newer versions of the operating system. If Windows Installer is not already present, it loads automatically from the CD or install point during setup.
You need administrator privileges on the computer to install the Speech SDK 5.0 properly.
If you select the silent installation or the "Only for me" option from the Windows Installer, other users will still see Speech properties in the Control Panel window but will not be able to modify them. In that case, other users need to install SAPI 5.0 as well.
None of the SAPI 5.0 components or compliance tests were tested with power-managed (OnNow) computers. As long as the system determines that there is application activity, it will not put the system or any devices into the sleeping state. However, if you encounter unexpected performance issues while using power management, OnNow should be disabled.
Occasionally, it can be difficult to uninstall a previous release of the Microsoft Speech SDK 5.0 and, consequently, to install the SDK itself. Here are two options:
(i) Run the application Regedit.exe. Delete all entries under HKEY_CURRENT_USER\Software\Microsoft\Speech\RecoProfiles\Tokens. Deleting the contents of this registry key removes the speech recognition profiles. Next, install the Speech SDK 5.0.
(ii) If your problem continues, delete the HKEY_CURRENT_USER\Software\Microsoft\Speech and the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech keys. Then try installing the Speech SDK 5.0.
You will need to reinstall any recognizers/voices that you have previously installed on the computer.
The sapi.dll file installs into the directory \Program Files\Common Files\Microsoft Shared\Speech. However, the debugging support files sapi.pdb and sapi.sym install into a different directory (specifically, \windows\system on Windows 98 and \winnt\system32 on Windows NT systems). To debug into SAPI, move or copy those two files (sapi.pdb and sapi.sym) into the same directory as sapi.dll.
There may be additional situations or conditions where SAPI 5.0 performs differently than you expect. Please refer to this list of known issues first. If anomalies persist, you are encouraged to contact sapi5@microsoft.com.
Compliance tests for some languages may not compile properly after you install a new SAPI 5.0 on Windows 2000. Load the language support for Japanese and Simplified Chinese that came with the Windows 2000 CD-ROM. For US Windows NT 4.0 or Windows 98, download Global IME for Japanese and Simplified Chinese Language support from http://www.microsoft.com/msdownload/iebuild/ime5_win32/en/ime5_win32.htm. You may also need to install Language packs for Japanese and Chinese from http://windowsupdate.microsoft.com. Click Product Updates, select both Chinese (Traditional) Menus and Dialogs for Internet Explorer, and Japanese Menus and Dialogs for Internet Explorer. Click Download and install. Note that Global IME 5.01 does not work with Internet Explorer 5.5.
Language packs need to be installed before SAPI 5.0. If the Microsoft Speech SDK needs to be removed, uninstall it from the same CD or install point used originally. Afterward, the language pack may be installed. SAPI 5.0 may then be loaded with this CD or install point.
By default, the Speech SDK installs English, Japanese, and Chinese engines. On Windows 98, Windows Me, and Windows NT 4.0, you need to either a) pick the engine that corresponds to the language support on your computer, or b) install the required language pack.
Japanese or Simplified Chinese SR engines display text improperly or not at all when you use the Microphone Wizard if you have not installed a Japanese or Simplified Chinese font. If you need this capability, you need to install the Global IME. If you use Windows 2000, you can install the respective Language Support from your Windows 2000 CD-ROM.
Microsoft Japanese speech recognition engines may display inconsistent context-free grammar (CFG) results on Windows NT 4.0 Japanese with Microsoft IME95 or Microsoft IME97. Install MS-IME98 or a later version to correct the issue.
If you install the Japanese or Chinese version of the SDK and none of the text appears in the Speech properties in the Control Panel, this is probably because it was installed on an operating system that does not support the language. The correct language needs to be enabled in the Regional Options properties in the Control Panel.
If you change the active engine and receive the error "Speech Recognition failed to initialize," please ensure that the correct language pack is installed.
Do not use spaces in text encoded in double byte character set (DBCS).
If a Japanese grammar is written without pronunciation, the Microsoft Japanese SR engine will not properly recognize the CFG. To avoid this, write the grammar using the SAPI 5.0 word format "/display_format/lexical_format/pronunciation;" where "/" is an element separator and ";" is a word terminator. For Japanese, the "display format" is what you will see. A word may display as Kanji, Kana, an alphanumeric symbol, or any combination of the three. The "lexical format" is how the word is typed in Hiragana. Pronunciation is indicated using the symbols (Katakana) in the SAPI 5.0 Japanese phonetic list and is similar to the JEIDA TTS Kana list in Katakana. Please refer to the SAPI 5.0 documentation for more detail.
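The word format described above can be illustrated with a small parser. This is only a sketch and not part of SAPI: the names WordParts and ParseWord are hypothetical, the text is assumed to be held in a byte string where "/" and ";" appear only as delimiters, and only the "/D/L/P;" and "/D/L;" forms mentioned in this file are accepted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical holder for one word written as "/display/lexical/pronunciation;".
struct WordParts {
    std::string display;        // what the user sees (Kanji, Kana, alphanumerics)
    std::string lexical;        // how the word is typed (Hiragana for Japanese)
    std::string pronunciation;  // phonetic symbols; empty for the "/D/L;" form
};

// Splits one word into its fields. Returns false if the text does not start
// with '/', does not end with ';', or does not contain two or three fields.
bool ParseWord(const std::string& text, WordParts* out) {
    assert(out != nullptr);
    if (text.size() < 2 || text[0] != '/' || text[text.size() - 1] != ';') {
        return false;
    }
    std::vector<std::string> fields(1);
    // Walk the characters between the leading '/' and the trailing ';'.
    for (std::size_t i = 1; i + 1 < text.size(); ++i) {
        if (text[i] == '/') {
            fields.push_back(std::string());  // element separator: next field
        } else {
            fields.back() += text[i];
        }
    }
    if (fields.size() < 2 || fields.size() > 3) {
        return false;
    }
    out->display = fields[0];
    out->lexical = fields[1];
    out->pronunciation = fields.size() == 3 ? fields[2] : std::string();
    return true;
}
```

Calling ParseWord with "/D/L/P;" fills all three fields, while "/D/L;" leaves the pronunciation empty, which is the case where the engine has to generate a pronunciation itself.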
The Coffee tutorials contain only English grammars and will work only when an English SR engine is active.
When a Japanese XML grammar specifies word units as either a) Kanji, Kana, and pronunciation Katakana (display, lexical, and pronunciation as /D/L/P;), or b) Kanji and Kana (/D/L;), SAPI returns all three attributes correctly. If only one of the three forms is specified, it should be the lexical form (Hiragana). If the XML grammar has only plain Kanji word units, SAPI returns the original Kanji phrases in both the display form and lexical form attributes. The engine may not be able to generate the correct pronunciations in this case. Authors are discouraged from using Kanji as the default lexical form.
Roaming profiles sometimes yield less optimal recognitions on different systems. You may need to perform additional training on each system you use if the recognition quality is unacceptable.
In the SDK sample Reco.exe, if the shared recognizer is using the audio device and you attempt to activate a grammar with an in-process (InProc) engine, an error occurs because the audio device is busy and inaccessible. Reco.exe then clears all the boxes. However, it does not re-enable the InProc/shared selection drop-down boxes, so you cannot switch from InProc to shared or select a different engine. Selecting and then clearing Create Recognition Context returns Reco.exe to a consistent state for further use.
If Microphone Wizard fails with error, "Microphone wizard failed to initialize," you need to verify that both a default recognizer and TTS voice exist. A TTS voice is required for the Microphone Wizard to work.
If you modify the SR compliance tests, copy the newly compiled srcomp.dll to the Microsoft Speech SDK5.0\tools\comp\bin folder so that the new version is used.
The Skip button for TTSApp.exe does not function when used to skip backward from the last word of a sentence.
XML tags in the text window of TTSApp.exe are automatically applied. When the SpeakXML option is selected, the XML tags are explicitly spoken. For example "<spell>hello</spell>" is spoken as "less than spell greater than..."
A speech application using the InProc engine will fail to load if Speech properties in the Control Panel is open, as the latter uses the shared engine. Exit all sample applications to start Speech properties in the Control Panel.
For Reco.exe to display properly, you need to make sure the system default locale is set correctly in the Regional Options property of the Control Panel. Otherwise, either no readable characters will appear, or they will display as a series of question marks.
When running recognition in non-English languages, some SDK samples, like Reco.exe, require that you set the system locale to something other than English in order to display non-English characters, even if the sample is compiled as Unicode.
Changes made to rate or volume in TTSApp.exe during synthesis will result in a loss of word highlighting.
Reco.exe fails to load a grammar whose language differs from that of the engine. The engine language has to match the grammar LangID for the load to succeed.
From the command line, Gramcomp.exe cannot open files that contain spaces in the name. Rename the file so that it does not contain spaces.
MkVoice.exe fails to register a voice when run on Windows 98. The executable was built as a Unicode application and therefore does not work on Windows 98. To work around this, take the source files for MkVoice.exe (which ship in the SDK) and edit them to use char or TCHAR instead of WCHAR, changing the corresponding function calls to match (for example, wmain to main or _tmain, and wcscat to strcat or _tcscat). Then rebuild MkVoice.exe as an ANSI application.
SR compliance tests use the LoadStringW() function, which depends on Unicode data. Because Windows 98 and Windows Me do not support Unicode, these tests will neither compile nor run on these platforms.
The following procedures for registering the sample TTS voice replace the procedure documented in the API reference guide.
From the command line:
Copy Microsoft Speech SDK5.0\bin\mkvoice.exe to the Microsoft Speech SDK5.0\Samples\CPP\Engines\TTS\MkVoice directory. Then run:
mkvoice wordlist.txt SampleVoice.vce SampleVoice
From the compiler:
After compiling the TTS engine, load Microsoft Speech SDK5.0\Samples\CPP\Engines\TTS\MkVoice\MkVoice.dsw and rebuild it. The voice will automatically register by a post-build command.
With regard to multiple or custom pronunciations, the search order for custom pronunciation words should be as follows: user lexicon, application lexicon, vendor lexicon, and LTS. If a custom pronunciation is specified in a grammar, it should take precedence over other pronunciations.
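The search order above can be sketched as a simple lookup chain. This is an illustration only, not the engine's implementation: the Lexicon alias, the PronunciationSources struct, and both functions are hypothetical, each lexicon is modeled as a plain word-to-pronunciation map, and LetterToSound is a placeholder for the LTS (letter-to-sound) fallback.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model: each lexicon maps a word to one pronunciation string.
typedef std::map<std::string, std::string> Lexicon;

struct PronunciationSources {
    Lexicon grammar;      // custom pronunciations specified in the grammar
    Lexicon user;         // user lexicon
    Lexicon application;  // application lexicon
    Lexicon vendor;       // vendor lexicon
};

// Placeholder for the letter-to-sound rules; a real engine would derive a
// pronunciation from the spelling here.
std::string LetterToSound(const std::string& word) {
    return "lts:" + word;
}

// Returns the first pronunciation found, checking the grammar first and then
// the user, application, and vendor lexicons, before falling back to LTS.
std::string ResolvePronunciation(const PronunciationSources& sources,
                                 const std::string& word) {
    assert(!word.empty());
    const Lexicon* order[] = {&sources.grammar, &sources.user,
                              &sources.application, &sources.vendor};
    for (const Lexicon* lexicon : order) {
        Lexicon::const_iterator it = lexicon->find(word);
        if (it != lexicon->end()) {
            return it->second;
        }
    }
    return LetterToSound(word);
}
```

With this ordering, an entry in the user lexicon hides entries for the same word in the application and vendor lexicons, and a pronunciation given in the grammar hides all of them.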
Many grammar operations are asynchronous for efficiency, so the application cannot detect errors from them unless the engine is in the stopped or paused state. If the application needs to test for errors when loading a grammar or when setting a CFG or dictation rule state, it should pause the engine first, perform the operation, and then resume the engine. This is recommended mainly for debugging a speech application.
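The pause, operate, and resume sequence can be sketched as a small helper. The helper RunWhilePaused and the FakeContext type are hypothetical and exist only so the sketch compiles without the SAPI headers. The context type is assumed to expose Pause and Resume methods that take a reserved argument and return a COM-style result code (negative on failure), in the manner of ISpRecoContext. The operation stands for whatever grammar call is being checked.

```cpp
#include <cassert>

// Hypothetical stand-in for a recognition context, used only for illustration.
struct FakeContext {
    bool paused = false;
    long Pause(unsigned long /*reserved*/) {
        paused = true;
        return 0;
    }
    long Resume(unsigned long /*reserved*/) {
        assert(paused);  // resuming a context that was never paused is a bug
        paused = false;
        return 0;
    }
};

// Pauses the context, runs the operation so that its result code reflects any
// error, and then resumes. Returns the first failing result code, if any.
template <typename Context, typename Operation>
long RunWhilePaused(Context& context, Operation operation) {
    const long pauseResult = context.Pause(0);
    if (pauseResult < 0) {
        return pauseResult;  // could not pause, so there is nothing to resume
    }
    const long operationResult = operation();  // e.g. load a grammar
    const long resumeResult = context.Resume(0);
    return operationResult < 0 ? operationResult : resumeResult;
}
```

Because the helper always resumes after a successful pause, a failed grammar operation does not leave recognition stopped. As noted above, this pattern is mainly useful while debugging.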
If you attempt to write audio with TTS to a non-standard device you may receive this error: "The selected voice could not be initialized. You may not be able to play this voice." This error may indicate that the output device is not valid. For example, it would be inappropriate if the line out on the modem were selected as a TTS device.
This is a list of items that were added to or changed in sapi.idl since the SAPI 5.0 Beta release:
struct SPSERIALIZEDEVENT64
enum SPCONTEXTSTATE
enum SPENDSRSTREAMFLAGS
interface ISpObjectToken : ISpDataKey
interface ISpMMSysAudio : ISpAudio
interface ISpTranscript : IUnknown
struct SPPHRASERULE
struct SPPHRASE
enum SPRECOEVENTFLAGS
interface ISpRecoGrammar : ISpGrammarBuilder
interface ISpRecoContext : ISpEventSource
interface ISpObjectToken : ISpDataKey
enum SPPARTOFSPEECH
ISpRecoGrammar : ISpGrammarBuilder
ISpRecoContext : ISpEventSource
ISpRecoGrammar::IsPronounceable
WCHAR SPCAT_NLPS
interface ISpDSoundAudio
struct SPVSENTITEM
ISpRecoResult, ISpPhrase: GetGrammarId
enum SPCFGRULEATTRIBUTES: SPRAF_Select
This is a list of items that were added to or changed in sapiddk.idl since the SAPI 5.0 Beta release:
The ISpCFGEngineClient interface has been removed and its methods incorporated into ISpSREngine.
ISpSREngine interface:
_ISpPrivateEngineCall:
coclass SpDSoundAudioEnum
coclass SpDSoundAudioIn
coclass SpDSoundAudioOut
coclass SpWaveFilesAudioIn
coclass SpLTSLexicon
(c) 2000 Microsoft Corporation. All rights reserved.