AI Glossary
The complete glossary of AI
Audio Diffusion Model
Generative neural network architecture that applies a forward diffusion and progressive denoising process to synthesize high-fidelity audio waveforms from initial random noise.
Conditional Spectrogram
Time-frequency representation of the audio signal used as input or condition in diffusion models, where the iterative denoising process is guided to reconstruct a coherent spectral structure.
Neural Vocoder
Neural network that converts an intermediate acoustic representation, such as a mel spectrogram, into a continuous audio waveform, often integrated at the end of an audio diffusion pipeline.
Speech Diffusion
Specialized application of diffusion models for generating speech signals, aiming to capture phonetic, prosodic, and timbral nuances for natural speech synthesis.
Music Diffusion
Sub-domain of audio diffusion focused on generating musical content, including harmony, rhythm, melody, and timbre, often conditioned by structural information such as scores or chords.
Classifier-Free Guidance
Inference technique that strengthens the diffusion model's adherence to a condition (text, melody, etc.) by combining the model's conditional and unconditional predictions, weighted by a guidance scale, thereby improving generation fidelity and control.
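The combination above is a simple weighted update of the two noise predictions. A minimal numpy sketch (the function name and toy values are illustrative, not from any specific library):

```python
import numpy as np

def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale):
    """Combine conditional and unconditional noise predictions.

    guidance_scale = 1.0 reproduces the conditional prediction;
    larger values push the sample further toward the condition.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Hypothetical noise predictions for one latent frame.
eps_c = np.array([0.2, -0.1, 0.4])   # conditional prediction
eps_u = np.array([0.0,  0.1, 0.1])   # unconditional prediction
eps_guided = classifier_free_guidance(eps_c, eps_u, guidance_scale=3.0)
```

In practice the two predictions come from the same network, run once with the condition and once with it dropped (e.g. an empty text prompt).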
Diffusion Timestep
Discrete variable representing the stage of the noise addition or denoising process, ranging from 0 (pure signal) to T (pure noise), which conditions the neural network to predict the noise to be removed at each iteration.
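The forward (noising) direction of this process has a closed form: the signal at timestep t is a weighted mix of the clean signal and Gaussian noise. A small sketch, assuming a hypothetical linear beta schedule:

```python
import numpy as np

def add_noise(x0, t, alphas_cumprod, noise):
    """Forward diffusion to timestep t:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise
    """
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise

# Illustrative linear beta schedule over T = 1000 steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_cumprod = np.cumprod(1.0 - betas)

x0 = np.sin(np.linspace(0, 2 * np.pi, 64))        # toy clean "waveform"
noise = np.random.default_rng(0).standard_normal(64)
x_early = add_noise(x0, 10, alphas_cumprod, noise)     # still mostly signal
x_late = add_noise(x0, T - 1, alphas_cumprod, noise)   # almost pure noise
```

The network is trained to invert this process one step at a time, conditioned on t.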
Audio Latent Space
Compressed and abstract representation of audio data, obtained via an encoder, in which the diffusion process is applied to reduce computational complexity while preserving semantic information.
Audio Inpainting
Manipulation task involving regenerating or completing a missing or corrupted section of an audio signal using a diffusion model, based on the surrounding audio context.
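One common way to enforce the surrounding context during sampling (used by RePaint-style methods) is to overwrite the known region at each step with an appropriately noised copy of the original audio, while the model's own sample fills the gap. A minimal sketch with illustrative names and toy data:

```python
import numpy as np

def blend_known_region(x_t_generated, x0_known, mask, t, alphas_cumprod, noise):
    """Keep the known audio (mask == 1) consistent with the diffusion state
    at timestep t; the model's sample fills the gap (mask == 0)."""
    a_bar = alphas_cumprod[t]
    x_t_known = np.sqrt(a_bar) * x0_known + np.sqrt(1.0 - a_bar) * noise
    return mask * x_t_known + (1.0 - mask) * x_t_generated

# Toy setup: a 16-sample signal with a gap in the middle.
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas_cumprod = np.cumprod(1.0 - betas)
rng = np.random.default_rng(0)

x0_known = np.cos(np.linspace(0, np.pi, 16))
mask = np.ones(16)
mask[6:10] = 0.0                      # corrupted region to regenerate
x_t_gen = rng.standard_normal(16)     # stand-in for the model's current sample
x_t = blend_known_region(x_t_gen, x0_known, mask, t=50,
                         alphas_cumprod=alphas_cumprod,
                         noise=rng.standard_normal(16))
```

Repeating this blend at every denoising step keeps the generated gap coherent with the intact audio on both sides.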
Audio Super-Resolution
Process by which a diffusion model increases the quality or sampling rate of a low-resolution audio signal, adding plausible and consistent high-frequency details.
Continuous Audio Encoding
Representation method that transforms a discrete waveform into a set of continuous vectors in a latent space, serving as the basis for the diffusion process in generative audio models.
Text-Audio Conditioning
Technique where an audio diffusion model is guided by a textual description to generate a corresponding sound, requiring a multimodal architecture capable of aligning textual and auditory modalities.
Denoising Score Matching
Fundamental training objective for diffusion models, which teaches the neural network to predict the score (the gradient of the log-density of the noised data) given the noisy input, thus enabling iterative denoising.
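In the common epsilon-prediction parameterization, this objective reduces to a mean-squared error between the injected noise and the network's prediction of it (equivalent to denoising score matching up to a timestep-dependent weighting). A minimal sketch with an illustrative stand-in for the model output:

```python
import numpy as np

def dsm_loss(eps_true, eps_pred):
    """Simplified denoising objective: MSE between the Gaussian noise
    injected in the forward process and the network's prediction of it."""
    return np.mean((eps_true - eps_pred) ** 2)

rng = np.random.default_rng(0)
eps = rng.standard_normal(128)                   # noise injected into the signal
eps_hat = eps + 0.1 * rng.standard_normal(128)   # imperfect "model" prediction
loss = dsm_loss(eps, eps_hat)
```

A perfect predictor drives this loss to zero, at which point the implied score estimate is exact.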
Stochastic Sampling
Inference method for diffusion models where denoising at each step includes a random component, promoting generation diversity but potentially introducing artifacts.
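The random component is visible in the ancestral (DDPM-style) update: the posterior mean plus fresh Gaussian noise at every step except the last. A sketch under toy assumptions (zero inputs, illustrative schedule):

```python
import numpy as np

def ddpm_step(x_t, eps_pred, t, betas, alphas_cumprod, rng):
    """One stochastic (ancestral) denoising step: posterior mean plus fresh
    Gaussian noise, so each run from the same state yields a different sample."""
    beta_t = betas[t]
    a_bar_t = alphas_cumprod[t]
    mean = (x_t - beta_t / np.sqrt(1.0 - a_bar_t) * eps_pred) / np.sqrt(1.0 - beta_t)
    if t == 0:
        return mean  # final step: no noise is added
    return mean + np.sqrt(beta_t) * rng.standard_normal(x_t.shape)

T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas_cumprod = np.cumprod(1.0 - betas)
x_t = np.zeros(8)
eps_pred = np.zeros(8)
sample_a = ddpm_step(x_t, eps_pred, 50, betas, alphas_cumprod, np.random.default_rng(1))
sample_b = ddpm_step(x_t, eps_pred, 50, betas, alphas_cumprod, np.random.default_rng(2))
```

Two runs with different random seeds diverge, which is exactly the diversity (and occasional artifact risk) the definition describes.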
Deterministic Sampling (DDIM)
Inference strategy that accelerates the generation process by performing fewer denoising steps in a deterministic manner, reducing stochasticity for more reproducible results.
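The DDIM update with eta = 0 injects no fresh noise: it predicts the clean signal from the current state, then jumps directly to the previous timestep, which also allows skipping many steps. A minimal sketch (illustrative names and values):

```python
import numpy as np

def ddim_step(x_t, eps_pred, a_bar_t, a_bar_prev):
    """One deterministic DDIM update (eta = 0): predict x0, then move to the
    previous timestep without injecting fresh noise."""
    x0_pred = (x_t - np.sqrt(1.0 - a_bar_t) * eps_pred) / np.sqrt(a_bar_t)
    return np.sqrt(a_bar_prev) * x0_pred + np.sqrt(1.0 - a_bar_prev) * eps_pred

# Sanity check: if eps_pred is the exact injected noise, one step lands
# exactly on the earlier diffusion state.
rng = np.random.default_rng(0)
x0 = np.sin(np.linspace(0, 2 * np.pi, 32))
eps = rng.standard_normal(32)
x_t = np.sqrt(0.5) * x0 + np.sqrt(0.5) * eps      # state at alpha_bar = 0.5
x_prev = ddim_step(x_t, eps, a_bar_t=0.5, a_bar_prev=0.9)
```

Because no randomness enters the update, the same starting noise always produces the same sample, and a_bar_prev can correspond to a much earlier timestep than the adjacent one.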
Latent Diffusion Model
Variant of diffusion model that operates in a lower-dimensional latent space, learned by an autoencoder, to make training and inference more efficient for high-resolution data such as audio.
Convolutional Transformers for Audio
Hybrid architecture combining convolutional layers to capture local patterns and attention mechanisms for long-term dependencies, often used as backbone in audio diffusion U-Nets.
Audio Generation Pipeline
Complete sequence of operations, from encoding a condition (text, melody) to diffusion in latent space and finally decoding by a vocoder, to produce a final audio signal.
Noise Rescaling
Technique for adjusting the variance of noise added at each step of the diffusion process, used to stabilize training and improve the quality of generated samples in audio models.
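One published variant of this idea rescales the cumulative schedule so the final timestep carries zero signal-to-noise ratio, i.e. is pure noise, which fixes a train/inference mismatch in some schedules. A sketch of that rescaling, assuming an illustrative linear beta schedule:

```python
import numpy as np

def rescale_zero_terminal_snr(alphas_cumprod):
    """Rescale a noise schedule so the final timestep is pure noise
    (zero terminal SNR), keeping the first timestep unchanged."""
    sqrt_ab = np.sqrt(alphas_cumprod)
    first, last = sqrt_ab[0], sqrt_ab[-1]
    sqrt_ab = (sqrt_ab - last) * first / (first - last)
    return sqrt_ab ** 2

# Illustrative schedule: the standard linear one leaves residual signal at T.
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)
rescaled = rescale_zero_terminal_snr(alphas_cumprod)
```

After rescaling, sampling can start from genuinely pure noise, matching what the model saw at the last training timestep.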