Alignment and Safety
Red Teaming
A systematic process in which experts probe a model for vulnerabilities, actively trying to provoke undesirable or dangerous behaviors so that weaknesses can be identified and corrected.
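As an illustration, the core loop of a red-teaming exercise can be sketched as probing the model with adversarial prompts and recording any unsafe responses for later analysis. This is a minimal sketch, not a specific toolkit: `query_model`, `is_unsafe`, and the prompt list are hypothetical placeholders standing in for a real model API, a safety classifier or human reviewer, and an expert-curated prompt set.

```python
from typing import Callable

def red_team(
    query_model: Callable[[str], str],   # hypothetical model API wrapper
    is_unsafe: Callable[[str], bool],    # hypothetical safety check (classifier or human review)
    adversarial_prompts: list[str],      # expert-curated probing prompts
) -> list[tuple[str, str]]:
    """Probe the model with adversarial prompts and collect failing cases."""
    failures = []
    for prompt in adversarial_prompts:
        response = query_model(prompt)
        if is_unsafe(response):
            # Record the prompt/response pair so the weakness can be
            # analyzed and corrected (e.g., via fine-tuning or filtering).
            failures.append((prompt, response))
    return failures
```

In practice, the collected failures would be fed back into the development process, which is the "identify and correct weaknesses" half of the definition above.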