AI-woordenlijst
Het complete woordenboek van kunstmatige intelligentie
Constitutional AI
Alignment methodology where models follow a predefined set of principles or constitution, allowing them to self-evaluate and correct their responses according to these ethical rules.
Red Teaming
Systematic process of evaluating model vulnerabilities by experts actively seeking to provoke undesirable or dangerous behaviors to identify and correct weaknesses.
Safety Alignment
Set of techniques aimed at ensuring language models avoid generating harmful, dangerous, or inappropriate content while maintaining their overall performance.
Value Alignment
Process aimed at aligning the objectives and behaviors of AI systems with fundamental human values, requiring a nuanced understanding of human preferences and ethics.
Model Jailbreaking
Attack techniques designed to bypass model safety and alignment mechanisms, forcing them to generate normally restricted or prohibited content.
Reward Modeling
Approach where a reward model learns to predict human preferences, serving as a guide for reinforcement learning of main language models.
Constitutional Principles
Set of explicitly defined fundamental rules and principles that guide AI model behavior, ensuring consistency and alignment with desired values.
Preference Learning
Machine learning domain where models learn from comparisons between different options to capture human preferences and align with them.
Harmlessness Training
Specific training process aimed at teaching models to avoid generating potentially harmful, dangerous, or prejudicial content for users.
Truthfulness Alignment
Alignment objective aimed at ensuring models provide factually correct information and avoid hallucinations or unverified claims.
Bias Mitigation
Set of techniques to identify, quantify, and reduce systemic biases in language models, ensuring fair and non-discriminatory representation.
Guardrails
Safety mechanisms implemented in AI systems to monitor and filter inputs/outputs, preventing dangerous or inappropriate interactions in real-time.
Constitutional Supervision
Supervision method where models are guided by an explicit constitution, allowing them to self-criticize and improve their responses according to these guiding principles.
Human Preference Data
Dataset collected from comparative human evaluations between different model responses, serving as a basis for alignment training and optimization.
Safety Fine-tuning
Specific refinement phase after initial pre-training, aimed at finely adjusting model behaviors to comply with safety and ethical constraints.
Alignment Taxonomy
Structured classification of different types and dimensions of alignment in AI, including value alignment, safety, robustness, and model interpretability.