AI Glossary
A complete dictionary of Artificial Intelligence
Automatic Speech Recognition (ASR)
Artificial intelligence system that automatically converts speech into written text, using acoustic and language models to transcribe audio signals.
Acoustic Model
Statistical or neural model that relates phonetic units to acoustic features extracted from the speech signal in order to identify the sounds spoken.
Phonetic Transcription
Symbolic representation of speech sounds using the International Phonetic Alphabet (IPA) or other phonetic notation systems for speech analysis and processing.
Hidden Markov Model (HMM)
Sequential statistical model used in traditional speech recognition to model the temporal sequence of phonemes and their relationship with acoustic observations.
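As a rough illustration of how an HMM links hidden phonemes to acoustic observations, the sketch below runs Viterbi decoding, which finds the most likely state sequence for a series of observations. The function name, the two-state toy model, and all probabilities are made up for the example, and the code assumes a non-empty observation list.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for the observations."""
    # Score of the best path ending in each state at the first time step.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for reaching s at this time step.
            prev = max(states, key=lambda p: best[-1][p] * trans_p[p][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy model (hypothetical values): two phoneme states, two acoustic symbols.
STATES = ["a", "b"]
START_P = {"a": 0.9, "b": 0.1}
TRANS_P = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.1, "b": 0.9}}
EMIT_P = {"a": {"x": 0.9, "y": 0.1}, "b": {"x": 0.2, "y": 0.8}}

decoded = viterbi(["x", "x", "y", "y"], STATES, START_P, TRANS_P, EMIT_P)
```

Real recognizers work with log-probabilities to avoid numerical underflow on long utterances; plain products are kept here only for readability.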
Mel-Frequency Cepstral Coefficients (MFCC)
Set of acoustic features extracted from short frames of the audio signal that compactly represent the shape of the speech spectrum on the Mel scale, a frequency scale that approximates human pitch perception and is therefore well suited to speech recognition.
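The Mel scale itself can be sketched in a few lines. The example below uses one widely used conversion formula, mel = 2595 * log10(1 + f / 700); other variants exist, and the function names are chosen for illustration only. It shows just the frequency warping and the placement of filter centers, not the full MFCC pipeline (framing, filterbank energies, logarithm, and discrete cosine transform).

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to Mel (one common formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centers(n_filters, f_min, f_max):
    """Center frequencies (Hz) of filters spaced evenly on the Mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * i) for i in range(1, n_filters + 1)]


centers = mel_filter_centers(10, 0.0, 8000.0)
```

Because the filters are evenly spaced in Mel, their centers in Hz sit close together at low frequencies and spread out at high frequencies, mirroring the finer resolution of human hearing at low pitch.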
Voice Activity Detection (VAD)
Algorithmic technique that automatically identifies and segments the portions of an audio signal that contain speech, as opposed to silence or background noise.
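One of the simplest ways to approximate VAD is to threshold the short-term energy of each frame. The sketch below assumes samples normalized to the range [-1, 1]; the frame length, threshold, and function names are illustrative choices, and practical detectors usually add smoothing or a trained classifier on top.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def energy_vad(samples, frame_len=160, threshold=0.01):
    """Flag each frame as speech (True) when its energy exceeds the threshold."""
    return [e > threshold for e in frame_energies(samples, frame_len)]


def speech_segments(flags):
    """Merge consecutive speech frames into (start_frame, end_frame) spans."""
    segments = []
    start = None
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(flags)))
    return segments


# Synthetic signal: silence, then a loud stretch, then silence again.
signal = [0.0] * 320 + [0.5] * 320 + [0.0] * 160
flags = energy_vad(signal)
segments = speech_segments(flags)
```

The end index of each span is exclusive, so a span of (2, 4) covers frames 2 and 3.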
End-to-End Speech Recognition
Modern approach that uses a single neural model to map audio input directly to character or subword sequences, replacing the separate acoustic, pronunciation, and language models of traditional pipelines.
Word Error Rate (WER)
Standard evaluation metric in speech recognition: the number of word substitutions, insertions, and deletions needed to turn the system output into the reference transcript, divided by the number of words in the reference.
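WER can be computed as a word-level edit distance divided by the reference length. The following is a minimal sketch that splits on whitespace and applies no text normalization (casing, punctuation), which real evaluations usually handle first; the function name is chosen for the example.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + insertions + deletions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Since insertions count as errors but the denominator is the reference length, WER can exceed 1.0 when the output is much longer than the reference.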
Connectionist Temporal Classification (CTC)
Training criterion that allows neural networks to learn mappings between sequences of different lengths without requiring a prior frame-level alignment between audio and text, by summing over all valid alignments that use a special blank symbol.
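CTC works with frame-level label paths that may contain repeated symbols and a special blank. A path is turned into an output sequence by merging consecutive repeats and then dropping blanks, so many different paths correspond to the same text. The sketch below shows only this collapsing rule, not the loss computation; the choice of "_" as the blank and the function name are assumptions for the example.

```python
BLANK = "_"  # assumed blank symbol for this example


def ctc_collapse(path):
    """Merge consecutive repeated symbols, then remove blanks."""
    output = []
    previous = None
    for symbol in path:
        if symbol != previous and symbol != BLANK:
            output.append(symbol)
        previous = symbol
    return "".join(output)


text = ctc_collapse("hh_e_ll_lo")
```

The blank between the two groups of "l" is what allows a genuine double letter to survive the merge; without it, "ll" would collapse to a single "l".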
Speaker Diarization
Process of automatically segmenting an audio stream into speaker-homogeneous segments and grouping these segments by speaker, answering the question of who spoke when in the recording.
Speech Enhancement
Set of signal processing techniques aimed at improving speech quality and intelligibility by reducing background noise and acoustic interference.
Pronunciation Lexicon
Database containing phonetic transcriptions of vocabulary words, essential for mapping recognized phoneme sequences to corresponding orthographic words.
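At its simplest, a pronunciation lexicon can be modeled as a dictionary from words to one or more phoneme sequences, plus an inverse index for going from phonemes back to words. The toy entries below use ARPAbet-style symbols without stress marks and are illustrative only; the names LEXICON, build_inverse, and words_for are hypothetical.

```python
from collections import defaultdict

# Toy lexicon: each word maps to one or more pronunciations.
LEXICON = {
    "cat": [("K", "AE", "T")],
    "read": [("R", "IY", "D"), ("R", "EH", "D")],
    "red": [("R", "EH", "D")],
}


def build_inverse(lexicon):
    """Index the lexicon by phoneme sequence for phoneme-to-word lookup."""
    inverse = defaultdict(set)
    for word, pronunciations in lexicon.items():
        for phonemes in pronunciations:
            inverse[phonemes].add(word)
    return inverse


def words_for(phonemes, inverse):
    """Return all words whose pronunciation matches the phoneme sequence."""
    return sorted(inverse.get(tuple(phonemes), ()))


INVERSE = build_inverse(LEXICON)
candidates = words_for(["R", "EH", "D"], INVERSE)
```

A single phoneme sequence can match several words (homophones), which is one reason a recognizer also needs a language model to choose among the candidates.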