Developments in Speech Synthesis

  • ID: 2175241
  • Book
  • February 2005
  • 356 Pages
  • John Wiley and Sons Ltd
Contemporary speech synthesis is perceived as inadequate for general adoption for user interaction, largely because it rests on an inadequate model of human speech production and perception. This book reviews the underlying model, brings out areas of inadequacy and suggests how improvements might be made. It is argued that a greater understanding of the fine detail of speech will enable new research and application initiatives. The authors draw on their extensive experience in both theoretical and applied research to bring forward proposals for producing more natural-sounding synthetic speech.

- Provides an overview of current work in speech synthesis, with a critical review of markup systems (including XML and SSML) embedded in interactive applications.
- Argues that naturalness in synthesis will benefit from enhancements to underlying models of prosody which more accurately account for the properties of human speech, yet can also be productively transferred to speech synthesis.
- Emphasises the importance of an explicit and extensible architecture as the basis for future developments, stressing particularly the importance of close modelling of expressive and emotive content, key features of naturalness.
- Focuses on the dynamic nature of prosody, as opposed to the more usual static treatment, especially as an adaptive model compliant with pragmatic and environmental constraints.

Developments in Speech Synthesis provides the basis for a comprehensive approach that will appeal to speech synthesis and language technology engineers specialising in building dialogue systems. It will also be an invaluable resource for computer science and engineering students at both advanced undergraduate and postgraduate levels, as well as researchers in the general field of speech synthesis.

Part I: Current Work.

1. High-Level and Low-Level Synthesis.

2. Low-Level Synthesisers: Current Status.

3. Text-to-Speech.

4. Different Low-Level Synthesisers: What Can Be Expected?

5. Low-Level Synthesis Potential.

Part II: A New Direction for Speech Synthesis.

6. A View of Naturalness.

7. Physical Parameters and Abstract Information Channels.

8. Variability and System Integrity.

9. Automatic Speech Recognition.

Part III: High-Level Control.

10. The Need for High-Level Control.

11. The Input to High-Level Control.

12. Problems for Automatic Text Markup.

Part IV: Areas for Improvement.

13. Filling Gaps.

14. Using Different Units.

15. Waveform Concatenation Systems: Naturalness and Large Databases.

16. Unit Selection Systems.

Part V: Markup.

17. VoiceXML.

18. Speech Synthesis Markup Language (SSML).

19. SABLE.

20. The Need for Prosodic Markup.

Part VI: Strengthening the High-Level Model.

21. Speech.

22. Basic Concepts.

23. Underlying Basic Disciplines: Expression Studies.

24. Labelling Expressive/Emotive Content.

25. The Proposed Model.

26. Types of Model.

Part VII: Expanded Static and Dynamic Modelling.

27. The Underlying Linguistics System.

28. Planes for Synthesis.

Part VIII: The Prosodic Framework, Coding and Intonation.

29. The Phonological Prosodic Framework.

30. Sample Code.

31. XML Coding.

32. Prosody: General.

33. Phonological and Phonetic Models of Intonation.

Part IX: Approaches to Natural-Sounding Synthesis.

34. The General Approach.

35. The Expression Wrapper in XML.

36. Advantages of XML in Wrapping.

37. Considerations in Characterising Expression/Emotion.

38. Summary.

Part X: Concluding Overview.

Author Index.


Mark Tatham
Katherine Morton
