AI Glossary

The complete dictionary of Artificial Intelligence

162 categories, 2,032 subcategories, 23,060 terms

METEOR

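Evaluation metric combining unigram precision and recall, with matching extended to stems and synonyms and a penalty for fragmented word order. Originally designed for machine translation, it generally correlates better with human judgments than BLEU at the sentence level and is also applied to dialogue evaluation.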
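As a rough illustration of how such a score is assembled, the sketch below implements only the exact-match stage, using the recall-weighted mean and fragmentation penalty of the original 2005 METEOR formulation. The function name `meteor_exact` and the greedy left-to-right alignment are assumptions made for brevity: the full metric also matches stems and synonyms and picks the alignment with the fewest crossings.

```python
def meteor_exact(hypothesis: str, reference: str) -> float:
    """Simplified METEOR: exact unigram matches only (no stems or synonyms)."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()

    # Greedy alignment (a simplification): each hypothesis token is paired
    # with the first unused reference token that is identical to it.
    used = [False] * len(ref)
    alignment = []  # list of (hypothesis_index, reference_index)
    for i, token in enumerate(hyp):
        for j, ref_token in enumerate(ref):
            if not used[j] and ref_token == token:
                used[j] = True
                alignment.append((i, j))
                break

    matches = len(alignment)
    if matches == 0:
        return 0.0

    precision = matches / len(hyp)
    recall = matches / len(ref)
    # Harmonic mean weighted 9:1 toward recall.
    f_mean = 10 * precision * recall / (recall + 9 * precision)

    # A chunk is a run of matches adjacent in both sentences.
    chunks = 1
    for (i0, j0), (i1, j1) in zip(alignment, alignment[1:]):
        if not (i1 == i0 + 1 and j1 == j0 + 1):
            chunks += 1

    penalty = 0.5 * (chunks / matches) ** 3
    return f_mean * (1 - penalty)
```

Even a perfect copy of the reference scores slightly below 1 here, because a single chunk still incurs a small penalty.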

Coherence Score

Metric evaluating the logical and thematic coherence of a response relative to the previous conversational context. Measures the system's ability to maintain a consistent narrative thread throughout the dialogue.

Engagement Rate

Indicator quantifying a conversational system's ability to maintain user interest and participation. Typically calculated from conversation duration and the number of turns exchanged.

Task Success Rate

Metric measuring the percentage of dialogues where the user's objective was successfully achieved. Essential for evaluating the effectiveness of task-oriented conversational agents.
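The percentage can be computed directly once each dialogue has been labelled as successful or not. A minimal sketch, assuming the labels are already available as booleans (the function name `task_success_rate` is hypothetical, and how success is judged is outside its scope):

```python
def task_success_rate(outcomes: list[bool]) -> float:
    """Percentage of dialogues in which the user's objective was achieved."""
    if not outcomes:
        raise ValueError("at least one dialogue outcome is required")
    successes = sum(1 for achieved in outcomes if achieved)
    return 100.0 * successes / len(outcomes)


# Three of four dialogues reached the user's goal.
rate = task_success_rate([True, False, True, True])
```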

F1 Score Dialogue

Harmonic mean of precision and recall, adapted to dialogue contexts to evaluate response relevance. Particularly useful for response retrieval systems.
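One common way to adapt F1 to dialogue is to compare a response with a reference at the token level. The sketch below assumes whitespace tokenization and lowercasing, and the name `token_f1` is illustrative; other adaptations (for example, over retrieved candidate sets) are equally valid.

```python
from collections import Counter


def token_f1(response: str, reference: str) -> float:
    """Token-level F1 between a system response and a reference response."""
    response_tokens = response.lower().split()
    reference_tokens = reference.lower().split()

    # Multiset intersection counts each shared token at most as often
    # as it appears in both sequences.
    overlap = sum((Counter(response_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(response_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)
```

For "book a table for two" against the reference "book a table", precision is 3/5 and recall is 1, giving an F1 of 0.75.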

Dialogue Act Classification

Process of automatically identifying the communicative intention behind each utterance in a dialogue. Crucial for evaluating the relevance and contextual appropriateness of system responses.

Response Diversity

Metric measuring the variety and originality of responses generated by a conversational system. A high score indicates that the system avoids repetitive responses, which helps maintain user interest over the long term.
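A widely used proxy for diversity is the Distinct-n ratio: the number of unique n-grams divided by the total number of n-grams across a set of responses. The sketch below is one possible implementation, assuming whitespace tokenization; the function name `distinct_n` is illustrative and the glossary entry does not prescribe this particular measure.

```python
def distinct_n(responses: list[str], n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams across all responses."""
    ngrams = []
    for response in responses:
        tokens = response.lower().split()
        # Collect every contiguous window of n tokens.
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)
```

A system that answers "I do not know" twice yields six bigrams of which three are unique, so its Distinct-2 is 0.5; values closer to 1 indicate less repetition.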

Error Recovery Rate

Indicator evaluating the system's ability to recover from errors or misunderstandings in the dialogue. Measures the robustness and resilience of the conversational system in the face of unexpected events.

User Satisfaction Score

Subjective metric collected from users to evaluate their overall satisfaction after a conversational interaction. Often gathered using Likert scales or explicit ratings.

Contextual Consistency

Measure of the temporal and factual consistency of information provided throughout a conversation. Helps detect contradictions and supports the reliability of exchanges over time.

Turn-level Evaluation

Evaluation approach analyzing the quality of each individual exchange in a dialogue independently of others. Allows precise identification of system strengths and weaknesses.

Dialogue-level Evaluation

Evaluation method considering the conversation as a whole to judge the overall quality of the interaction. Takes into account narrative consistency and natural dialogue progression.

Automatic Evaluation Metrics

Set of algorithmic indicators allowing objective evaluation of dialogue quality without direct human intervention. Complementary to subjective evaluations for comprehensive analysis.

Human Evaluation Protocols

Standardized methodologies for subjective evaluation of conversational systems by human judges. Include predefined criteria, rating scales, and quality control procedures.

NDCG (Normalized Discounted Cumulative Gain)

Metric evaluating the quality of candidate response ranking by considering their position and relative relevance. Particularly useful for systems generating multiple response options.
