🏠 Trang chủ
Benchmark
📊 Tất cả benchmark 🦖 Khủng long v1 🦖 Khủng long v2 ✅ Ứng dụng To-Do List 🎨 Trang tự do sáng tạo 🎯 FSACB - Trình diễn cuối cùng 🌍 Benchmark dịch thuật
Mô hình
🏆 Top 10 mô hình 🆓 Mô hình miễn phí 📋 Tất cả mô hình ⚙️ Kilo Code
Tài nguyên
💬 Thư viện prompt 📖 Thuật ngữ AI 🔗 Liên kết hữu ích

Thuật ngữ AI

Từ điển đầy đủ về Trí tuệ nhân tạo

162
danh mục
2.032
danh mục con
23.060
thuật ngữ
📖
thuật ngữ

Constitutional AI

Alignment methodology where models follow a predefined set of principles or constitution, allowing them to self-evaluate and correct their responses according to these ethical rules.

📖
thuật ngữ

Red Teaming

Systematic process of evaluating model vulnerabilities by experts actively seeking to provoke undesirable or dangerous behaviors to identify and correct weaknesses.

📖
thuật ngữ

Safety Alignment

Set of techniques aimed at ensuring language models avoid generating harmful, dangerous, or inappropriate content while maintaining their overall performance.

📖
thuật ngữ

Value Alignment

Process aimed at aligning the objectives and behaviors of AI systems with fundamental human values, requiring a nuanced understanding of human preferences and ethics.

📖
thuật ngữ

Model Jailbreaking

Attack techniques designed to bypass model safety and alignment mechanisms, forcing them to generate normally restricted or prohibited content.

📖
thuật ngữ

Reward Modeling

Approach where a reward model learns to predict human preferences, serving as a guide for reinforcement learning of main language models.

📖
thuật ngữ

Constitutional Principles

Set of explicitly defined fundamental rules and principles that guide AI model behavior, ensuring consistency and alignment with desired values.

📖
thuật ngữ

Preference Learning

Machine learning domain where models learn from comparisons between different options to capture human preferences and align with them.

📖
thuật ngữ

Harmlessness Training

Specific training process aimed at teaching models to avoid generating potentially harmful, dangerous, or prejudicial content for users.

📖
thuật ngữ

Truthfulness Alignment

Alignment objective aimed at ensuring models provide factually correct information and avoid hallucinations or unverified claims.

📖
thuật ngữ

Bias Mitigation

Set of techniques to identify, quantify, and reduce systemic biases in language models, ensuring fair and non-discriminatory representation.

📖
thuật ngữ

Guardrails

Safety mechanisms implemented in AI systems to monitor and filter inputs/outputs, preventing dangerous or inappropriate interactions in real-time.

📖
thuật ngữ

Constitutional Supervision

Supervision method where models are guided by an explicit constitution, allowing them to self-criticize and improve their responses according to these guiding principles.

📖
thuật ngữ

Human Preference Data

Dataset collected from comparative human evaluations between different model responses, serving as a basis for alignment training and optimization.

📖
thuật ngữ

Safety Fine-tuning

Specific refinement phase after initial pre-training, aimed at finely adjusting model behaviors to comply with safety and ethical constraints.

📖
thuật ngữ

Alignment Taxonomy

Structured classification of different types and dimensions of alignment in AI, including value alignment, safety, robustness, and model interpretability.

🔍

Không tìm thấy kết quả