Dialogue Evaluation and Metrics

METEOR

Evaluation metric combining unigram precision and recall, with matching extended to stems and synonyms and a penalty for fragmented word order. Typically correlates better with human judgments than BLEU, although the correlation remains limited for open-ended dialogue.
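
As a rough illustration, the sketch below follows the scoring formula of the original METEOR formulation (recall-weighted harmonic mean multiplied by a fragmentation penalty), under two simplifying assumptions: only exact word matches are counted (no stemming or synonym lookup), and the alignment is built greedily from left to right rather than chosen to minimize crossings. The function name `meteor_exact` is hypothetical.

```python
def meteor_exact(candidate: str, reference: str) -> float:
    """Simplified METEOR: exact unigram matches only, greedy alignment."""
    cand = candidate.lower().split()
    ref = reference.lower().split()

    # Greedy one-to-one alignment: each candidate token takes the first
    # unused reference token with the same surface form.
    used = [False] * len(ref)
    alignment = []  # (candidate_index, reference_index) pairs
    for i, tok in enumerate(cand):
        for j, ref_tok in enumerate(ref):
            if not used[j] and tok == ref_tok:
                used[j] = True
                alignment.append((i, j))
                break

    matches = len(alignment)
    if matches == 0:
        return 0.0

    precision = matches / len(cand)
    recall = matches / len(ref)
    # Harmonic mean weighted 9:1 in favour of recall.
    f_mean = 10 * precision * recall / (recall + 9 * precision)

    # A chunk is a run of matches that is contiguous and in the same
    # order in both the candidate and the reference.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(alignment, alignment[1:]):
        if i2 != i1 + 1 or j2 != j1 + 1:
            chunks += 1

    penalty = 0.5 * (chunks / matches) ** 3
    return f_mean * (1 - penalty)
```

A response that reuses the reference words in scrambled order keeps full precision and recall but is split into many chunks, so the penalty lowers its score. For real evaluations, an established implementation with stemming and synonym support is preferable to this sketch.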

Coherence Score

Metric evaluating the logical and thematic coherence of a response relative to the preceding conversational context. Measures the system's ability to maintain a consistent narrative thread throughout the dialogue.

Engagement Rate

Indicator quantifying a conversational system's ability to maintain user interest and participation. Typically estimated from conversation duration and the number of turns exchanged.
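
There is no single standard formula for engagement. The sketch below shows one plausible way to summarize the two signals mentioned above from conversation logs. The `Conversation` record and its field names are assumptions made for illustration, not a standard log schema.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Conversation:
    user_turns: int          # number of user messages in the session
    duration_seconds: float  # elapsed time from first to last message


def engagement_summary(conversations: list[Conversation]) -> dict[str, float]:
    """Average turn count and duration over a set of conversations."""
    if not conversations:
        raise ValueError("need at least one conversation")
    return {
        "mean_user_turns": mean(c.user_turns for c in conversations),
        "mean_duration_seconds": mean(c.duration_seconds for c in conversations),
    }
```

Longer conversations are not always better: in a task-oriented system, many turns can signal confusion rather than interest, so these averages are best read alongside a success or satisfaction measure.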

Task Success Rate

Metric measuring the percentage of dialogues where the user's objective was successfully achieved. Essential for evaluating the effectiveness of task-oriented conversational agents.
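
A minimal sketch of the computation, assuming each dialogue has already been labeled as successful or not (by a human annotator or an automatic goal check). The function name is hypothetical.

```python
def task_success_rate(outcomes: list[bool]) -> float:
    """Percentage of dialogues in which the user's goal was achieved."""
    if not outcomes:
        raise ValueError("no dialogues to evaluate")
    # True counts as 1 and False as 0 when summed.
    return 100.0 * sum(outcomes) / len(outcomes)
```

The hard part in practice is the labeling itself: deciding what counts as "objective achieved" has to be fixed before the rate means anything.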

F1 Score (Dialogue)

Harmonic mean of precision and recall, adapted to dialogue contexts to evaluate response relevance. Particularly useful for response retrieval systems.
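
One common adaptation is word-overlap F1 between a system response and a reference response. The sketch below assumes whitespace tokenization and lowercasing, which is a simplification, and the function name `token_f1` is hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Word-overlap F1 between a predicted and a reference response."""
    pred = prediction.lower().split()
    ref = reference.lower().split()

    # Multiset intersection: each shared word counts as often as it
    # appears in both responses.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For a retrieval system, the same harmonic mean can instead be computed over sets of retrieved and relevant candidates rather than over words.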

Dialogue Act Classification

Process of automatically identifying the communicative intention behind each utterance in a dialogue. Crucial for evaluating the relevance and contextual appropriateness of system responses.

Response Diversity

Metric measuring the variety and originality of responses generated by a conversational system. Helps detect repetitive or generic responses, which tend to erode user interest over time.
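
A widely used way to quantify diversity is distinct-n: the ratio of unique n-grams to total n-grams across a set of generated responses. The sketch below assumes whitespace tokenization, and the function name is chosen for illustration.

```python
def distinct_n(responses: list[str], n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams over all responses."""
    ngrams = []
    for response in responses:
        tokens = response.lower().split()
        # Slide a window of length n over the tokens of this response.
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)
```

A system that answers "I do not know" to everything scores close to 0, while a system that never repeats an n-gram scores 1. The value depends on the size of the response set, so comparisons are only meaningful between sets of similar length.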

Error Recovery Rate

Indicator evaluating the system's ability to recover from errors or misunderstandings in the dialogue. Measures the robustness and resilience of the conversational system in the face of unexpected events.

User Satisfaction Score

Subjective metric collected from users to evaluate their overall satisfaction after a conversational interaction. Usually gathered through Likert scales or explicit ratings.

Contextual Consistency

Measure of the temporal and factual consistency of information provided throughout a conversation. Flags contradictions between turns, which undermine the reliability of the exchange over time.

Turn-level Evaluation

Evaluation approach analyzing the quality of each individual exchange in a dialogue independently of others. Allows precise identification of system strengths and weaknesses.

Dialogue-level Evaluation

Evaluation method considering the conversation as a whole to judge the overall quality of the interaction. Takes into account narrative consistency and natural dialogue progression.

Automatic Evaluation Metrics

Set of algorithmic indicators allowing objective evaluation of dialogue quality without direct human intervention. Complementary to subjective evaluations for comprehensive analysis.

Human Evaluation Protocols

Standardized methodologies for subjective evaluation of conversational systems by human judges. Include predefined criteria, rating scales, and quality control procedures.
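
A typical quality control step is checking how far two judges agree beyond what chance would produce. The sketch below computes Cohen's kappa for two annotators who rated the same items. The function name and the integer-label encoding are assumptions for illustration.

```python
from collections import Counter


def cohen_kappa(ratings_a: list[int], ratings_b: list[int]) -> float:
    """Chance-corrected agreement between two annotators."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both annotators must rate the same non-empty item list")
    n = len(ratings_a)

    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Expected agreement if each annotator labeled at random according
    # to their own label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)
```

Kappa treats all disagreements as equal. For ordinal rating scales, where a 4 versus a 5 is a smaller disagreement than a 1 versus a 5, a weighted variant or a correlation measure may be a better fit.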

NDCG (Normalized Discounted Cumulative Gain)

Metric evaluating the quality of candidate response ranking by considering their position and relative relevance. Particularly useful for systems generating multiple response options.
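
A minimal sketch of the computation, assuming graded relevance labels are available for each candidate in the order the system ranked them. It uses the linear-gain form of DCG; another common variant uses 2^rel - 1 as the gain. Function names are chosen for illustration.

```python
import math
from typing import Optional


def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain: relevance discounted by log2 of rank + 1."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances: list[float], k: Optional[int] = None) -> float:
    """DCG of the system ranking divided by the DCG of the ideal ranking."""
    ranked = relevances[:k] if k is not None else relevances
    # Ideal ranking: the same candidates sorted by decreasing relevance.
    ideal = sorted(relevances, reverse=True)[: len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A ranking that places the most relevant candidates first scores 1, and the score drops as relevant candidates are pushed further down the list.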
