AI Glossary
The complete dictionary of Artificial Intelligence
Text-to-Speech (TTS)
Computer system that converts written text into synthetic human speech through natural language processing and speech synthesis algorithms.
Speech synthesis
Artificial production of human speech by computer systems using linguistic and acoustic models to generate audio signals.
Phoneme
Smallest unit of sound that distinguishes one word from another in a language (for example, /p/ and /b/ in "pat" and "bat"). TTS systems commonly convert text into a phoneme sequence before generating audio.
Prosody
Set of suprasegmental features of speech, including intonation (melody), rhythm, and stress, that span more than one sound and are essential for natural-sounding synthesis.
Concatenative synthesis
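TTS approach that assembles pre-recorded audio segments (such as diphones or syllables) into continuous speech. Because each segment is real recorded speech, sound quality is high within a segment, and the main difficulty is making the joins between segments inaudible.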
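As a rough illustration of the assembly step, the sketch below joins segments with a short linear crossfade at each seam. It assumes segments are plain lists of float samples; the function name `crossfade_concat` and the `overlap` parameter are hypothetical, and real systems use more careful smoothing than this.

```python
def crossfade_concat(segments, overlap):
    """Join sample lists, linearly crossfading up to `overlap` samples per seam."""
    if not segments:
        return []
    out = list(segments[0])
    for seg in segments[1:]:
        # Never crossfade over more samples than either side actually has.
        n = min(overlap, len(out), len(seg))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming segment, rising toward 1
            out[start + i] = (1 - w) * out[start + i] + w * seg[i]
        # The first n samples of seg were blended in; append the remainder.
        out.extend(seg[n:])
    return out
```

With `overlap=0` the function reduces to plain concatenation; a larger overlap trades a shorter result for a smoother transition.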
Parametric synthesis
Method that generates speech from parametric mathematical models representing the acoustic characteristics of the vocal signal.
Neural TTS
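Speech synthesis systems that use deep neural networks to generate speech from text, either by predicting an intermediate acoustic representation (such as a mel spectrogram) that a vocoder turns into audio, or by generating the waveform directly.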
WaveNet
Autoregressive neural network architecture developed by DeepMind that generates audio waveforms sample by sample, with each sample conditioned on the samples before it, producing highly natural-sounding speech.
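The sample-by-sample loop can be sketched as follows. This is not WaveNet itself: the `predict_next` callable is a hypothetical stand-in for the neural network, and the names `generate_autoregressive` and `context` are illustrative. The toy predictor used here is a two-tap linear recurrence that happens to produce a sine wave.

```python
import math

def generate_autoregressive(predict_next, seed, n_samples, context):
    """Append n_samples new samples, each predicted from the last `context` ones."""
    samples = list(seed)
    for _ in range(n_samples):
        # Each new sample depends only on previously generated samples.
        samples.append(predict_next(samples[-context:]))
    return samples

# Toy predictor (assumption): x[n] = 2*cos(w)*x[n-1] - x[n-2] continues sin(w*n).
w = 0.1
def toy_predictor(ctx):
    return 2.0 * math.cos(w) * ctx[-1] - ctx[-2]

wave = generate_autoregressive(toy_predictor, [0.0, math.sin(w)], 50, context=2)
```

Because every sample waits on the previous one, generation in this style is inherently sequential, which is why it tends to be slow at audio sample rates.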
Tacotron
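End-to-end, attention-based sequence-to-sequence TTS architecture that converts text into spectrogram frames (mel spectrograms in Tacotron 2) with natural prosody; a separate vocoder then converts the frames into audio.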
Vocoder
Algorithm or system that analyzes and re-synthesizes human speech. In TTS, the vocoder is the stage that converts an acoustic representation (such as a mel spectrogram) into an audible waveform.
Mel spectrogram
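Time-frequency representation of an audio signal whose frequency axis is warped to the mel scale, which approximates how human hearing perceives pitch. It is a common intermediate target in TTS, sitting between the text and the final waveform.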
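To make the mel scale concrete, the sketch below uses one widely used conversion formula, mel = 2595 * log10(1 + f / 700); other variants exist, so treat the constants as an assumption. The helper `mel_center_frequencies` is a hypothetical name that places filter-bank centers at equal mel intervals.

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (one common formula, not the only one)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_bands, f_min, f_max):
    """Center frequencies (Hz) of n_bands filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]
```

Equal steps in mels correspond to progressively wider steps in Hz, so the bands are narrow at low frequencies and broad at high frequencies.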
MFCC
Mel-frequency cepstral coefficients, acoustic features widely used in speech recognition and synthesis to represent speech signals.
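The final step of a typical MFCC computation can be sketched as a discrete cosine transform (DCT-II) of the log mel-band energies. The sketch assumes the mel-band energies for one frame have already been computed; the function name is hypothetical, and scaling conventions differ between toolkits.

```python
import math

def mfcc_from_mel_energies(mel_energies, n_coeffs):
    """Unnormalized DCT-II of log mel-band energies for a single frame."""
    # Floor the energies so the logarithm is defined for silent bands.
    log_e = [math.log(max(e, 1e-10)) for e in mel_energies]
    n = len(log_e)
    return [
        sum(log_e[m] * math.cos(math.pi * k * (m + 0.5) / n) for m in range(n))
        for k in range(n_coeffs)
    ]
```

Coefficient 0 tracks the overall log energy of the frame, while the higher coefficients describe the shape of the spectral envelope.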
Voice cloning
TTS technique that synthesizes speech imitating the unique characteristics of a specific speaker's voice, often from only a limited amount of recorded audio.
Unit selection synthesis
Advanced concatenative method that dynamically selects optimal speech units from a large database to maximize naturalness.
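The selection step is often framed as a search that minimizes a target cost (how well a unit fits its position) plus a join cost (how smoothly adjacent units connect). The sketch below is a minimal dynamic-programming version under that assumption; `select_units`, `target_cost`, and `join_cost` are hypothetical names, and `candidates` is assumed to be non-empty with at least one unit per position.

```python
def select_units(candidates, target_cost, join_cost):
    """Pick one unit per position, minimizing total target plus join cost.

    candidates: list with one entry per position, each a list of units.
    target_cost(i, unit) and join_cost(prev_unit, unit) return numbers.
    """
    # best[i][j] = (cheapest cumulative cost ending in candidate j, backpointer)
    best = [[(target_cost(0, u), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            prev_costs = [
                best[i - 1][k][0] + join_cost(p, u)
                for k, p in enumerate(candidates[i - 1])
            ]
            k_best = min(range(len(prev_costs)), key=prev_costs.__getitem__)
            row.append((prev_costs[k_best] + target_cost(i, u), k_best))
        best.append(row)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]
```

For example, if units are `(name, pitch)` tuples and the join cost is the absolute pitch difference, the search prefers sequences whose pitch changes least at each seam.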
HMM synthesis
Parametric approach using hidden Markov models to statistically model acoustic sequences and generate speech.
Articulatory synthesis
TTS method that simulates the physical process of human speech production by modeling the movements of vocal articulators.
FastSpeech
Non-autoregressive TTS architecture generating mel spectrograms in parallel for fast and high-quality speech synthesis.
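Parallel generation requires knowing in advance how many spectrogram frames each phoneme should occupy. In the published FastSpeech design this is handled by a duration predictor and a length regulator; the sketch below shows only the length-regulation idea, assuming durations are already given as integer frame counts. The function name and the use of plain Python values in place of hidden-state vectors are illustrative.

```python
def length_regulate(phoneme_states, durations):
    """Repeat each phoneme-level state for its duration to get frame-level states."""
    if len(phoneme_states) != len(durations):
        raise ValueError("need exactly one duration per phoneme state")
    frames = []
    for state, n_frames in zip(phoneme_states, durations):
        # A duration of 0 simply drops the phoneme from the output.
        frames.extend([state] * n_frames)
    return frames
```

Once the sequence has been expanded to frame length, all frames can be decoded at the same time rather than one after another.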
Text normalization
Linguistic preprocessing converting symbols, numbers, and abbreviations into pronounceable textual form before speech synthesis.
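A minimal sketch of this preprocessing step is shown below. The abbreviation table, the handling of "%", and the number range are all assumptions chosen for illustration; production systems need context to resolve ambiguous cases (for instance, "Dr." can also mean "Drive").

```python
import re

# Hypothetical, deliberately tiny abbreviation table.
ABBREVIATIONS = {"Dr.": "Doctor", "vs.": "versus"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_to_words(n):
    """Spell out a non-negative integer; 1000 and above fall back to digits."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return ONES[hundreds] + " hundred" + tail
    return " ".join(ONES[int(d)] for d in str(n))

def normalize(text):
    """Expand abbreviations, percent signs, and digit runs into words."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    text = text.replace("%", " percent")
    return re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)
```

For example, `normalize("Dr. Smith paid 42% of 120")` returns `"Doctor Smith paid forty-two percent of one hundred twenty"`.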