Microsoft Speech SDK

Release notes

10/18/2000

Introduction

Welcome to Microsoft® Speech SDK. This file describes system requirements, installation notes, and known issues. This SDK provides the tools, information, and samples you need to incorporate speech technologies into your Windows® applications.

Before installing the SDK, read through this document to become familiar with installation and performance issues. This file accompanies Speech SDK 5.0 and is released under the License Agreement in the license.chm file on the CD or install point.

The following topics are available:

System Requirements
Installation Notes
Known Issues
API Changes since SAPI 5.0 Beta release

System Requirements

Operating Systems

Supported operating systems are:

Microsoft Windows 2000 Professional or Server, English edition, or Windows 2000 Professional with Japanese or Simplified Chinese language support.
Microsoft Windows NT® Workstation 4.0, Service Pack 6a, English, Japanese, or Simplified Chinese edition.
Microsoft Windows Millennium edition.
Microsoft Windows 98 (Windows 95 is not supported).

Software Requirements

Microsoft Visual C++ 6.0, service pack 3 or later version.
Platform SDK (PSDK), April 2000 or later edition. Compiling SDK projects requires components of the PSDK. Within Microsoft Visual C++ 6.0, the PSDK include directories must be listed before the Visual C++ ones. To change the order, open the Tools->Options menu, select the Directories tab, and move the PSDK directories above all Visual C++ directories, if needed.
To save disk space, you can install a minimal PSDK configuration by enabling only the following two options:

Configuration Options
Build Environment

http://msdn.microsoft.com/downloads/sdks/platform/platform.asp

Microsoft Internet Explorer 5.0 or later version. Users of Windows NT 4.0, with any service pack, require Microsoft Internet Explorer 5.5 or later. Internet Explorer is needed to read the online documentation and to run Microsoft XML. You can download the latest version of Microsoft Internet Explorer from the Web at http://www.microsoft.com/ie.

Hardware Requirements

A Pentium II or Pentium II-equivalent or later processor at 233 MHz with 128 MB of RAM is recommended.
SAPI 5.0 can now take advantage of a computer and operating system that supports multiple processors, including those mentioned above. Additionally, you can use SAPI 5.0 in a distributed application environment.
Full-duplex close-talk microphones and sound cards on the PC 99 list are recommended. Full duplex is defined as "host supports full-duplex operation with simultaneous independent record and play sample rates." Most high-quality microphones and sound cards that are currently being shipped meet this requirement. Other sound cards and microphones are also likely to work well with SAPI 5.0. However, not all sound cards or sound devices are supported by SAPI 5.0, even if the operating system supports them otherwise.
The following table outlines the RAM usage:

Component	Minimum RAM	Recommended RAM
TTS Engine	14.5 MB	32 MB
SR Command and Control	16 MB	32 MB
SR Dictation	25.5 MB	128 MB
SR Both	26.5 MB	128 MB

The following table outlines the disk usage:

File Name	Approximate File Size	Setup Merge Module Name
Sapi.dll & Sapisvr.exe	0.5 MB	Sp5.msm
Sapi.cpl	36 KB	Sp5Intl.msm
SR Engine	1.7 MB	Sp5Sr.msm
Command and Control Datafiles	13.4 MB	Sp5CCInt.msm
Dictation Datafiles	33 MB	Sp5DCInt.msm
TTS Engine and Voices	7.8 MB	Sp5TTInt.msm

Installation Notes

SAPI and another speech recognition system are unlikely to run simultaneously because the two systems would contend for the microphone. Stop any other speech applications before running a SAPI 5.0 application. Multiple SAPI 5.0 applications that use the shared recognizer can run simultaneously.

On Windows 98 First Edition or non-English Second Edition, if the computer did not previously have Windows Installer, you need to reboot after installing the Speech SDK.

Before installing SAPI 5.0, you need to uninstall or delete previous versions of the SAPI 5 SDK. If the Microsoft Speech SDK needs to be removed, uninstall it from the same CD or install point used originally. This ensures all files are removed correctly.

The SAPI 5.0 release can coexist on your computer with SAPI 4.0. However, applications using different versions are not necessarily compatible and should not be run simultaneously.

You also need to uninstall any applications that use older SAPI 5 builds, including Microsoft Office® 10 beta.

You need to install SAPI 5.0 through the Windows Installer provided on the CD. Windows Installer support is built into newer versions of the operating system. However, should the Windows Installer not already be present, it will automatically load from the CD or install point during setup.

You need administrator privileges on the computer to install the Speech SDK 5.0 properly.

If you select the silent installation or the "Only for me" option from the Windows Installer, other users will still see Speech properties in the Control Panel window but will not be able to modify them. In that case, other users need to install SAPI 5.0 as well.

None of the SAPI 5.0 components or compliance tests were tested with power-managed (OnNow) computers. As long as the system determines that there is application activity, it will not put the system or any devices into the sleeping state. However, if you encounter unexpected performance issues while using power management, OnNow should be disabled.

Occasionally, a previous release of the Microsoft Speech SDK 5.0 can be difficult to uninstall, which in turn prevents the new SDK from installing. Here are two options:

(i) Run the application Regedit.exe. Delete all entries under HKEY_CURRENT_USER\Software\Microsoft\Speech\RecoProfiles\Tokens. Deleting the contents of this registry key removes the speech recognition profiles. Next, install the Speech SDK 5.0.

(ii) If the problem continues, delete the HKEY_CURRENT_USER\Software\Microsoft\Speech and the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech keys. Then try installing the Speech SDK 5.0.

You will need to reinstall any recognizers/voices that you have previously installed on the computer.

The sapi.dll file installs into the directory \Program Files\Common Files\Microsoft Shared\Speech. However, the debugging support files sapi.pdb and sapi.sym install into a different directory (specifically, \windows\system on Windows 98 and \winnt\system32 on Windows NT systems). To debug into SAPI, copy or move those two files (sapi.pdb and sapi.sym) into the same directory as sapi.dll.

Known Issues

There may be additional situations or conditions where SAPI 5.0 performs differently than you expect. Please refer to this list of known issues first. If anomalies persist, you are encouraged to contact sapi5@microsoft.com.

Language Issues

Compliance tests for some languages may not compile properly after you install a new SAPI 5.0 on Windows 2000. Load the language support for Japanese and Simplified Chinese that came with the Windows 2000 CD-ROM. For US Windows NT 4.0 or Windows 98, download Global IME for Japanese and Simplified Chinese Language support from http://www.microsoft.com/msdownload/iebuild/ime5_win32/en/ime5_win32.htm. You may also need to install Language packs for Japanese and Chinese from http://windowsupdate.microsoft.com. Click Product Updates, select both Chinese (Traditional) Menus and Dialogs for Internet Explorer, and Japanese Menus and Dialogs for Internet Explorer. Click Download and install. Note that Global IME 5.01 does not work with Internet Explorer 5.5.

Language packs need to be installed before SAPI 5.0. If SAPI 5.0 is already installed, uninstall the Microsoft Speech SDK from the same CD or install point used originally, install the language pack, and then reinstall SAPI 5.0 from that CD or install point.

By default, the Speech SDK installs English, Japanese, and Chinese engines. On Windows 98, Windows Me, and Windows NT 4.0, you need to either (a) pick the engine corresponding to the language support on your computer, or (b) install the required language pack.

Japanese or Simplified Chinese SR engines display text improperly or not at all when you use the Microphone Wizard if you have not installed a Japanese or Simplified Chinese font. If you need this capability, you need to install the Global IME. If you use Windows 2000, you can install the respective Language Support from your Windows 2000 CD-ROM.

Microsoft Japanese speech recognition engines may display inconsistent context-free grammar (CFG) results on Windows NT 4.0 Japanese with Microsoft IME95 or Microsoft IME97. Install MS-IME98 or a later version to correct the issue.

If you install the Japanese or Chinese version of the SDK and none of the text appears in the Speech properties in the Control Panel, this is probably because it was installed on an operating system that does not support the language. The correct language needs to be enabled in the Regional Options properties in the Control Panel.

If you change the active engine and receive the error "Speech Recognition failed to initialize," please ensure that the correct language pack is installed.

Do not use spaces in text encoded in a double-byte character set (DBCS).

If a Japanese grammar is written without pronunciation, the Microsoft Japanese SR engine will not properly recognize the CFG. To avoid this, you can write a grammar based on the SAPI 5.0 word format "/display_format/lexical_format/pronunciation;", where "/" is an element separator and ";" is a word terminator. For Japanese, the display format is what you will see: a word may display as Kanji, Kana, alphanumeric symbols, or any combination of the three. The lexical format is how the word is typed in Hiragana. Pronunciation is indicated using the Katakana symbols in the SAPI 5.0 Japanese phonetic list, which is similar to the JEIDA TTS Kana list in Katakana. Please refer to the SAPI 5.0 documentation for more detail.
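As a rough illustration of the word format described above, the following C++ sketch splits one word unit into its display, lexical, and pronunciation fields. WordUnit and ParseWordUnit are hypothetical names, not part of SAPI, and the sketch assumes ASCII or UTF-8 text in which "/" and ";" never appear inside a field. It is not the parser that SAPI itself uses.

```cpp
#include <cstddef>
#include <string>

// Hypothetical holder for the three fields of one word unit.
struct WordUnit {
    std::string display;        // what the user sees (Kanji, Kana, alphanumeric)
    std::string lexical;        // how the word is typed (Hiragana)
    std::string pronunciation;  // Katakana phonetic symbols; empty if omitted
};

// Parses "/display/lexical/pronunciation;" or "/display/lexical;".
// Returns false, leaving 'out' untouched, if the text does not have that shape.
bool ParseWordUnit(const std::string& text, WordUnit& out) {
    if (text.size() < 2 || text[0] != '/' || text[text.size() - 1] != ';') {
        return false;
    }
    // Strip the leading element separator and the trailing word terminator.
    const std::string body = text.substr(1, text.size() - 2);

    std::string fields[3];
    int count = 0;
    std::size_t start = 0;
    for (;;) {
        if (count == 3) return false;  // more than three fields
        const std::size_t slash = body.find('/', start);
        if (slash == std::string::npos) {
            fields[count++] = body.substr(start);
            break;
        }
        fields[count++] = body.substr(start, slash - start);
        start = slash + 1;
    }

    // Display and lexical forms are required; pronunciation is optional.
    if (count < 2 || fields[0].empty() || fields[1].empty()) return false;

    out.display = fields[0];
    out.lexical = fields[1];
    out.pronunciation = fields[2];
    return true;
}
```

With placeholder text, ParseWordUnit("/D/L/P;", unit) fills all three fields, while ParseWordUnit("/D/L;", unit) leaves the pronunciation empty.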

The Coffee tutorials contain only English grammars and will work only when an English SR engine is active.

When a Japanese XML grammar specifies word units as either (a) Kanji, Kana, and Katakana pronunciation (display, lexical, and pronunciation as /D/L/P;) or (b) Kanji and Kana (/D/L;), SAPI returns all three attributes correctly. If only one of the three forms is specified, it should be the lexical form (Hiragana). If the XML grammar has only plain Kanji word units, SAPI returns the original Kanji phrases in both the display form and lexical form attributes, and the engine may not be able to generate the correct pronunciations. Authors are discouraged from using Kanji as the default lexical form.

Speech Recognition Issues

Roaming profiles sometimes yield less accurate recognition on different systems. You may need to perform additional training on each system you use if the recognition quality is unacceptable.

Audio Issues

In the SDK sample Reco.exe, if the shared recognizer is using the audio device and you attempt to activate a grammar with an in-process (InProc) engine, an error occurs because the audio device is busy and inaccessible. Reco.exe then clears all the boxes. However, it does not re-enable the InProc/shared selection drop-down boxes, so you cannot switch from InProc to shared or select a different engine. Selecting and then clearing Create Recognition Context returns Reco.exe to a consistent state for further use.

If the Microphone Wizard fails with the error "Microphone wizard failed to initialize," verify that both a default recognizer and a TTS voice exist. A TTS voice is required for the Microphone Wizard to work.

SDK Sample Issues

If you modify the SR compliance tests, copy the newly compiled srcomp.dll to the Microsoft Speech SDK5.0\tools\comp\bin folder so that the new version is used.

The Skip button for TTSApp.exe does not function when used to skip backward from the last word of a sentence.

XML tags in the text window of TTSApp.exe are automatically applied. When the SpeakXML option is selected, the XML tags are explicitly spoken. For example "<spell>hello</spell>" is spoken as "less than spell greater than..."

A speech application using the InProc engine will fail to load if Speech properties in the Control Panel is open, because the latter uses the shared engine. Exit all sample applications before starting Speech properties in the Control Panel.

For Reco.exe to display properly, you need to make sure the system default locale is set correctly in the Regional Options property of the Control Panel. Otherwise, either no readable characters will appear, or they will display as a series of question marks.

When running recognition in non-English languages, some SDK samples, like Reco.exe, require that you set the system locale to something other than English in order to display non-English characters, even if the sample is compiled as Unicode.

Changes made to rate or volume in TTSApp.exe during synthesis will result in a loss of word highlighting.

Reco.exe fails to load a grammar whose language differs from that of the engine. The engine language has to match the grammar LangID for the load to succeed.

From the command line, Gramcomp.exe cannot open files that contain spaces in the name. Rename the file so that it does not contain spaces.
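One way to pick a replacement name is sketched below in C++. ReplaceSpacesInFileName is a hypothetical helper, not an SDK tool, and the choice of an underscore as the substitute character is an assumption. It changes only the file name component and leaves the directory part of the path untouched.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Returns 'path' with every space in the file name component replaced by an
// underscore. Both '\' and '/' are treated as directory separators.
std::string ReplaceSpacesInFileName(const std::string& path) {
    const std::size_t separator = path.find_last_of("\\/");
    const std::size_t nameStart =
        (separator == std::string::npos) ? 0 : separator + 1;

    std::string result = path;
    std::replace(result.begin() + nameStart, result.end(), ' ', '_');
    return result;
}
```

The returned name can then be passed to the standard std::rename function, or the file can simply be renamed by hand, before running Gramcomp.exe.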

MkVoice.exe fails to register a voice when run on Windows 98. The MkVoice.exe executable was built as a Unicode application and therefore will not work on Windows 98. To work around this, edit the MkVoice.exe source files (which ship in the SDK) so that they use char or TCHAR instead of WCHAR, and change the corresponding function calls to match (for example, wmain -> main or _tmain; wcscat -> strcat or _tcscat). Then rebuild MkVoice.exe as an ANSI application.
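The kind of change involved can be sketched as follows. BuildVoiceFileName is a hypothetical helper written for illustration and is not taken from the MkVoice sources; the comments show the wide-character call that each ANSI call would replace.

```cpp
#include <cstddef>
#include <cstring>

// ANSI (char-based) version of a hypothetical helper. In a Unicode build the
// parameters would be WCHAR pointers and the calls would be wcslen, wcscpy,
// and wcscat. The entry point changes in the same way:
//   int wmain(int argc, WCHAR* argv[])   becomes   int main(int argc, char* argv[])
bool BuildVoiceFileName(char* dest, std::size_t destSize,
                        const char* baseName, const char* extension) {
    // Leave room for the terminating null character.   (was: wcslen)
    if (std::strlen(baseName) + std::strlen(extension) + 1 > destSize) {
        return false;
    }
    std::strcpy(dest, baseName);   // was: wcscpy
    std::strcat(dest, extension);  // was: wcscat
    return true;
}
```

Using TCHAR with the _tcs routines and _tmain instead would let the same source build as either ANSI or Unicode, at the cost of including tchar.h.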

Miscellaneous Issues (Control Panel, Compliance, Lexicon, SAPI Core)

SR compliance tests use the LoadStringW() function, which depends on Unicode data. Because Windows 98 and Windows Me do not support Unicode, these tests will neither compile nor run on these platforms.

The following procedures for registering the sample TTS voice replace the one documented in the API reference guide.

From the command line:
Copy Microsoft Speech SDK5.0\bin\mkvoice.exe to the Microsoft Speech SDK5.0\Samples\CPP\Engines\TTS\MkVoice directory. Then run: mkvoice wordlist.txt SampleVoice.vce SampleVoice.
From the compiler:
After compiling the TTS engine, load Microsoft Speech SDK5.0\Samples\CPP\Engines\TTS\MkVoice\MkVoice.dsw and rebuild it. The voice will automatically register by a post-build command.

With regard to multiple or custom pronunciations, the search order for a word's pronunciation should be as follows: user lexicon, application lexicon, vendor lexicon, and LTS (letter-to-sound rules). If a custom pronunciation is specified in a grammar, it should take precedence over other pronunciations.
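The precedence described above can be sketched in C++ as a simple ordered lookup. Everything in this block is hypothetical and for illustration only: the Lexicon typedef models each lexicon as a word-to-pronunciation map, GuessFromSpelling is a trivial stand-in for letter-to-sound rules, and none of these names come from the SAPI headers.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> Lexicon;  // word -> pronunciation

enum PronunciationSource {
    FromGrammar,
    FromUserLexicon,
    FromApplicationLexicon,
    FromVendorLexicon,
    FromLts
};

struct Pronunciation {
    PronunciationSource source;
    std::string phones;
};

// Stand-in for letter-to-sound rules: simply echoes the spelling.
std::string GuessFromSpelling(const std::string& word) { return word; }

// Resolves a pronunciation: a grammar-specified pronunciation wins, then the
// user, application, and vendor lexicons are searched in that order, and the
// letter-to-sound fallback is used last.
Pronunciation ResolvePronunciation(const std::string& word,
                                   const std::string& grammarPronunciation,
                                   const Lexicon& user,
                                   const Lexicon& application,
                                   const Lexicon& vendor) {
    if (!grammarPronunciation.empty()) {
        return Pronunciation{FromGrammar, grammarPronunciation};
    }
    const Lexicon* lexicons[] = {&user, &application, &vendor};
    const PronunciationSource sources[] = {
        FromUserLexicon, FromApplicationLexicon, FromVendorLexicon};
    for (int i = 0; i < 3; ++i) {
        Lexicon::const_iterator hit = lexicons[i]->find(word);
        if (hit != lexicons[i]->end()) {
            return Pronunciation{sources[i], hit->second};
        }
    }
    return Pronunciation{FromLts, GuessFromSpelling(word)};
}
```

An empty grammarPronunciation string means the grammar did not specify one, so the lexicons are consulted.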

Many grammar operations are asynchronous for efficiency, so the application cannot detect errors unless the engine is in the stopped or paused state. Hence, if the application needs to test for errors when loading a grammar or setting a CFG or dictation rule state, it should pause the engine first, perform the operation, and then resume the engine. This is recommended mainly for debugging a speech application.
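The pause, operate, resume pattern can be sketched in C++ as below. To keep the example compilable without the SAPI headers, FakeRecoContext is a hypothetical stand-in for a recognition context, and RunWhilePaused is an illustrative helper rather than an SDK function. The sketch assumes the real context exposes Pause and Resume methods that take a reserved value of 0 and return an HRESULT-style code in which negative values mean failure; check the SAPI 5.0 reference before relying on those signatures.

```cpp
// Hypothetical stand-in for a recognition context, used only for illustration.
struct FakeRecoContext {
    bool paused = false;
    int pauseCalls = 0;
    int resumeCalls = 0;

    long Pause(unsigned long /*reserved*/) {
        paused = true;
        ++pauseCalls;
        return 0;
    }
    long Resume(unsigned long /*reserved*/) {
        paused = false;
        ++resumeCalls;
        return 0;
    }
};

// Pauses the engine, runs 'operation' (for example, loading a grammar or
// setting a rule state), and resumes the engine even if the operation failed.
// Returns the operation's error code if it failed, otherwise the result of
// resuming. Negative values indicate failure.
template <typename Context, typename Operation>
long RunWhilePaused(Context& context, Operation operation) {
    const long pauseResult = context.Pause(0);
    if (pauseResult < 0) return pauseResult;  // could not pause; do nothing else

    const long operationResult = operation();
    const long resumeResult = context.Resume(0);
    return (operationResult < 0) ? operationResult : resumeResult;
}
```

Because the operation runs while the engine is paused, its return code reflects the actual outcome, which is the point of the debugging advice above.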

If you attempt to write audio with TTS to a non-standard device you may receive this error: "The selected voice could not be initialized. You may not be able to play this voice." This error may indicate that the output device is not valid. For example, it would be inappropriate if the line out on the modem were selected as a TTS device.

API Changes

API Changes since SAPI 5.0 Beta release

This is a list of items that were added to, changed in, or removed from sapi.idl since the SAPI 5.0 Beta release:

Items added:

struct SPSERIALIZEDEVENT64

enum SPCONTEXTSTATE

enum SPENDSRSTREAMFLAGS

Items added to:

interface ISpObjectToken : ISpDataKey

interface ISpMMSysAudio : ISpAudio

interface ISpTranscript : IUnknown

struct SPPHRASERULE

struct SPPHRASE

enum SPRECOEVENTFLAGS

interface ISpRecoGrammar : ISpGrammarBuilder

interface ISpRecoContext : ISpEventSource

Items changed:

interface ISpObjectToken : ISpDataKey

enum SPPARTOFSPEECH

interface ISpRecoGrammar : ISpGrammarBuilder

interface ISpRecoContext : ISpEventSource

ISpRecoGrammar::IsPronounceable

Items removed:

WCHAR SPCAT_NLPS

interface ISpDSoundAudio

struct SPVSENTITEM

ISpRecoResult, ISpPhrase: GetGrammarId

enum SPCFGRULEATTRIBUTES: SPRAF_Select

SAPI DDK items changed:

This is a list of items that were added to, changed in, or removed from sapiddk.idl since the SAPI 5.0 Beta release:

The ISpCFGEngineClient interface has been removed and its methods incorporated into ISpSREngine.

ISpSREngine interface:

IsPronounceable has been changed
PrivateCallEx added
SetContextState added

_ISpPrivateEngineCall:

CallEngineEx added

Coclasses removed:

coclass SpDSoundAudioEnum

coclass SpDSoundAudioIn

coclass SpDSoundAudioOut

coclass SpWaveFilesAudioIn

coclass SpLTSLexicon