AI Glossary
The Complete Dictionary of Artificial Intelligence
Neural Vocoding
Reconstruction of an audio waveform from an intermediate acoustic representation, such as a mel-spectrogram, using a neural network that generates realistic audio.
Zero-shot TTS
Speech synthesis approach that generates speech in a voice not seen during training, using only a short audio sample of the target speaker as reference.
Audio Diffusion Models
Generative models based on the diffusion process: noise is progressively added to audio during training, and the model learns to reverse that corruption, generating high-quality samples by iteratively denoising from random noise.
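The noising half of this process has a simple closed form in the common DDPM-style formulation, which the sketch below illustrates on a short list of samples. The linear noise schedule, its default values, and all function names are assumptions chosen for illustration; real audio diffusion models differ in schedule and parameterization, and the learned denoising network is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    # Assumed schedule: per-step noise variances rising linearly.
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    # alpha_bar_t = product over s <= t of (1 - beta_s): fraction of signal kept.
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(x0, t, alpha_bars, rng):
    # Jump straight to step t:
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]
    return xt, noise
```

During training, a network would be asked to predict `noise` from `xt` and `t`; generation then runs the process in reverse, starting from pure noise.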
Mel-spectrogram
Spectrogram whose frequency axis is mapped onto the Mel scale, which approximates human pitch perception more closely than a linear frequency axis; used as the intermediate acoustic representation in many TTS models.
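To make the Mel scale concrete, the sketch below converts between hertz and mels and places filter center frequencies at equal mel spacing. It assumes the widely used HTK-style formula (2595 and 700 are that formula's constants); other variants exist, and the function names are illustrative.

```python
import math


def hz_to_mel(f_hz):
    # HTK-style mapping: roughly linear below ~1 kHz, logarithmic above.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    # Exact inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_mels, f_min, f_max):
    # Centers are equally spaced in mels, so their spacing in Hz grows
    # with frequency, mirroring the ear's coarser resolution at high pitch.
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_mels)]
```

A mel-spectrogram is then obtained by pooling the bins of a linear-frequency spectrogram with filters centered at these frequencies, usually triangular ones.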
Griffin-Lim Algorithm
Iterative algorithm to reconstruct an audio waveform from a magnitude spectrogram by estimating the missing phase through successive projections.
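The successive projections can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not a production implementation: it uses a naive DFT (far too slow for real audio), a periodic Hann window, random initial phase, and a least-squares overlap-add inverse; all function names and parameters are chosen for this example.

```python
import cmath
import math
import random


def hann(n):
    # Periodic Hann window.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft(signal, n_fft, hop):
    # Windowed frames followed by a naive O(n^2) DFT per frame.
    win = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = [signal[start + i] * win[i] for i in range(n_fft)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft)
        ])
    return frames


def istft(frames, n_fft, hop, length):
    # Least-squares inverse: windowed overlap-add divided by the summed squared window.
    win = hann(n_fft)
    out, norm = [0.0] * length, [0.0] * length
    for idx, spec in enumerate(frames):
        start = idx * hop
        for n in range(n_fft):
            val = sum(spec[k] * cmath.exp(2j * math.pi * k * n / n_fft)
                      for k in range(n_fft)).real / n_fft
            out[start + n] += val * win[n]
            norm[start + n] += win[n] ** 2
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


def griffin_lim(magnitude, n_fft, hop, length, n_iter=50, seed=0):
    rng = random.Random(seed)
    # Start from the known magnitudes with random phase.
    spec = [[cmath.rect(m, rng.uniform(-math.pi, math.pi)) for m in frame]
            for frame in magnitude]
    for _ in range(n_iter):
        # Projection 1: onto spectrograms that correspond to a real signal.
        reprojected = stft(istft(spec, n_fft, hop, length), n_fft, hop)
        # Projection 2: keep the estimated phase, restore the known magnitudes.
        spec = [[cmath.rect(m, cmath.phase(c)) for m, c in zip(mf, cf)]
                for mf, cf in zip(magnitude, reprojected)]
    return istft(spec, n_fft, hop, length)


def spectral_error(signal, magnitude, n_fft, hop):
    # Relative distance between the signal's STFT magnitudes and the target.
    est = stft(signal, n_fft, hop)
    num = sum((abs(c) - m) ** 2
              for ef, mf in zip(est, magnitude) for c, m in zip(ef, mf))
    den = sum(m ** 2 for mf in magnitude for m in mf)
    return math.sqrt(num / den)
```

With this least-squares inverse, `spectral_error` does not increase from one iteration to the next, although the result is generally a local optimum rather than the original waveform.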
Neural Audio Codec
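Deep-learning-based audio compression-decompression system in which a learned encoder produces a compact representation and a learned decoder reconstructs the audio, aiming for better quality than traditional codecs at comparable bitrates.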
Audio Style Transfer
Technique that applies the stylistic characteristics of a source audio signal to a target signal while preserving the target's original semantic content.
Voice Conversion
Technique that transforms the vocal characteristics of a source speaker to those of a target speaker while preserving the linguistic content of the message.
Music Generation
Process of automatically creating original musical compositions using AI models like Transformers or GANs to generate melodies and harmonies.
Sound Effect Synthesis
Procedural generation of realistic sound effects to enrich training datasets or create audio content for interactive media.
Neural Source Separation
AI technique that isolates the individual sound sources mixed in an audio recording, for example separating vocals from music or splitting a track into its individual instruments.
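One common way to perform the isolation is spectrogram masking: each time-frequency bin of the mixture is scaled by a per-source weight between 0 and 1. The sketch below computes such ratio masks from known source magnitudes, which is an oracle setting assumed purely for illustration; a neural separator would have to predict the masks from the mixture alone. The function names and the simplification that magnitudes add up are likewise assumptions.

```python
def ratio_masks(source_mags, eps=1e-8):
    # source_mags: one list of spectrogram magnitudes per source, same length.
    # Each mask value is that source's share of the total energy in the bin.
    totals = [sum(vals) + eps for vals in zip(*source_mags)]
    return [[m / t for m, t in zip(src, totals)] for src in source_mags]


def apply_mask(mixture_mags, mask):
    # Scale every mixture bin by the source's mask value.
    return [x * w for x, w in zip(mixture_mags, mask)]


# Toy example with three frequency bins.
voice = [4.0, 0.0, 1.0]
music = [0.0, 2.0, 1.0]
mixture = [v + m for v, m in zip(voice, music)]
masks = ratio_masks([voice, music])
voice_estimate = apply_mask(mixture, masks[0])
```

In this toy case `voice_estimate` recovers `voice` up to the small `eps` term, because the magnitudes were constructed to add exactly; with real recordings, phase interactions make that only approximately true.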
Audio Super-resolution
Process of increasing the sampling rate, and with it the frequency bandwidth, of an existing audio signal by predicting the missing high-frequency content, in order to restore or enhance its perceived quality.
Adversarial Audio Generation
Use of generative adversarial networks (GANs) to create realistic audio samples through competition between a generator and a discriminator.
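The competition can be written as the minimax objective of the original GAN formulation, shown here as a reference point. The symbols are assumptions for this sketch: G is the generator, D the discriminator, x a real audio sample, and z a random latent vector. Audio GANs in practice often replace this with variants such as hinge or least-squares losses.

```latex
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator maximizes the expression by scoring real samples near 1 and generated samples near 0, while the generator minimizes it by producing audio that the discriminator scores as real.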