AI Glossary
The Complete Dictionary of Artificial Intelligence
METEOR
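Evaluation metric combining unigram precision and recall (weighted toward recall) with stem and synonym matching, plus a penalty for fragmented word order. Originally designed for machine translation, where it correlates better with human judgments than BLEU at the sentence level; also applied to dialogue, where word-overlap metrics are generally less reliable.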
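The scoring scheme can be sketched as follows. This is a simplified illustration, not the reference implementation: it assumes exact token matching only (no stemming or synonyms), a greedy left-to-right alignment, and the parameters of the original 2005 formulation (F-mean weighted 9:1 toward recall, penalty of 0.5 times the cubed fragmentation ratio). The function name `simple_meteor` is hypothetical.

```python
def simple_meteor(hypothesis: str, reference: str) -> float:
    """METEOR-style score using exact unigram matches only (simplified sketch)."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not hyp or not ref:
        return 0.0

    # Greedy alignment: each hypothesis token takes the first unused
    # identical reference token. The full metric searches for the
    # alignment with the fewest crossings instead.
    used = [False] * len(ref)
    alignment = []  # pairs of (hypothesis index, reference index)
    for i, token in enumerate(hyp):
        for j, ref_token in enumerate(ref):
            if not used[j] and token == ref_token:
                used[j] = True
                alignment.append((i, j))
                break

    matches = len(alignment)
    if matches == 0:
        return 0.0

    precision = matches / len(hyp)
    recall = matches / len(ref)
    # Harmonic mean weighted toward recall.
    f_mean = 10 * precision * recall / (recall + 9 * precision)

    # A chunk is a run of matches that is contiguous in both sentences.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(alignment, alignment[1:]):
        if not (i2 == i1 + 1 and j2 == j1 + 1):
            chunks += 1

    penalty = 0.5 * (chunks / matches) ** 3
    return f_mean * (1 - penalty)
```

Because of the fragmentation penalty, even an identical hypothesis and reference score slightly below 1.0 under these parameters.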
Coherence Score
Metric evaluating the logical and thematic coherence of a response relative to the previous conversational context. Measures the system's ability to maintain a consistent narrative thread throughout the dialogue.
Engagement Rate
Indicator quantifying a conversational system's ability to maintain user interest and participation. Typically estimated from conversation duration and the number of turns exchanged.
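As a rough sketch of the two proxies named above, the following averages duration and user turn count over a set of logged conversations. The `Conversation` record and its field names are hypothetical; real logs will have their own schema, and a production metric may weight or normalize these values differently.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Conversation:
    # Hypothetical log record; field names are assumptions.
    duration_seconds: float
    user_turns: int


def engagement_summary(conversations: list[Conversation]) -> dict[str, float]:
    """Average duration and user turn count across conversations."""
    if not conversations:
        return {"mean_duration_seconds": 0.0, "mean_user_turns": 0.0}
    return {
        "mean_duration_seconds": mean(c.duration_seconds for c in conversations),
        "mean_user_turns": mean(c.user_turns for c in conversations),
    }
```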
Task Success Rate
Metric measuring the percentage of dialogues where the user's objective was successfully achieved. Essential for evaluating the effectiveness of task-oriented conversational agents.
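A minimal sketch of the computation, assuming each dialogue has already been labeled as successful or not (by an annotator or an automatic goal check). The function name and the boolean-list input are illustrative assumptions.

```python
def task_success_rate(outcomes: list[bool]) -> float:
    """Percentage of dialogues in which the user's goal was achieved."""
    if not outcomes:
        return 0.0
    return 100.0 * sum(outcomes) / len(outcomes)
```

The hard part in practice is the labeling step, which decides what counts as an achieved objective; the arithmetic itself is a simple proportion.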
F1 Score Dialogue
Harmonic mean of precision and recall, adapted to dialogue contexts to evaluate response relevance. Particularly useful for response retrieval systems.
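One common way to apply this to a generated response is token-level F1 against a reference response. The sketch below assumes whitespace tokenization and lowercasing; the function name `token_f1` is hypothetical, and retrieval settings would compute precision and recall over retrieved candidates rather than tokens.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference response."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0

    # Multiset intersection counts each shared token at most as often
    # as it appears in both responses.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```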
Dialogue Act Classification
Process of automatically identifying the communicative intention behind each utterance in a dialogue. Crucial for evaluating the relevance and contextual appropriateness of system responses.
Response Diversity
Metric measuring the variety and originality of responses generated by a conversational system. Helps detect repetitive or generic responses, which tend to erode user interest over the long term.
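One widely used way to quantify this is the distinct-n ratio: the number of unique n-grams divided by the total number of n-grams across a set of responses. The sketch below is one possible implementation, assuming whitespace tokenization; the source does not prescribe a specific formula, so treat the choice of distinct-n as an assumption.

```python
def distinct_n(responses: list[str], n: int) -> float:
    """Ratio of unique n-grams to total n-grams over all responses."""
    total = 0
    unique = set()
    for response in responses:
        tokens = response.lower().split()
        # Sliding window of length n over the token list.
        ngrams = list(zip(*(tokens[i:] for i in range(n))))
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0
```

Values close to 1.0 indicate little repetition, while values near 0.0 suggest the system keeps producing the same phrases.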
Error Recovery Rate
Indicator evaluating the system's ability to recover from errors or misunderstandings in the dialogue. Measures the robustness and resilience of the conversational system in the face of unexpected events.
User Satisfaction Score
Subjective metric collected from users to evaluate their overall satisfaction after a conversational interaction. Often gathered using Likert scales or explicit ratings.
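A small sketch of how such ratings might be summarized, assuming a 1 to 5 Likert scale and treating ratings of 4 or above as "satisfied". Both the scale and the threshold are assumptions chosen for illustration, as is the function name.

```python
from statistics import mean


def satisfaction_summary(ratings: list[int], satisfied_threshold: int = 4) -> dict[str, float]:
    """Mean rating and share of ratings at or above the threshold."""
    if not ratings:
        raise ValueError("at least one rating is required")
    satisfied = sum(1 for r in ratings if r >= satisfied_threshold)
    return {
        "mean_rating": mean(ratings),
        "satisfied_share": satisfied / len(ratings),
    }
```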
Contextual Consistency
Measure of the temporal and factual consistency of information provided throughout a conversation. Helps detect contradictions and supports the reliability of exchanges over time.
Turn-level Evaluation
Evaluation approach analyzing the quality of each individual exchange in a dialogue independently of others. Allows precise identification of system strengths and weaknesses.
Dialogue-level Evaluation
Evaluation method considering the conversation as a whole to judge the overall quality of the interaction. Takes into account narrative consistency and natural dialogue progression.
Automatic Evaluation Metrics
Set of algorithmic indicators allowing objective evaluation of dialogue quality without direct human intervention. Complementary to subjective evaluations for comprehensive analysis.
Human Evaluation Protocols
Standardized methodologies for subjective evaluation of conversational systems by human judges. Include predefined criteria, rating scales, and quality control procedures.
NDCG (Normalized Discounted Cumulative Gain)
Metric evaluating the quality of candidate response ranking by considering their position and relative relevance. Particularly useful for systems generating multiple response options.
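The computation can be sketched as follows, using the common formulation in which each item's graded relevance is divided by the base-2 logarithm of its rank plus one, and the total is normalized by the score of the ideal ordering. The input is assumed to be the list of relevance grades in the order the system ranked the candidates; function names are illustrative.

```python
import math


def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain for relevances in ranked order."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances: list[float]) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A ranking that already places candidates in descending order of relevance scores 1.0; placing a highly relevant candidate lower in the list reduces the score.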