Contention Mechanisms
Harmlessness Classification
A binary classification task that labels an LLM output as either 'harmless' or 'harmful', often implemented as a safety filter applied to the output before it is shown to the user.
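As a rough illustration of the definition above, the sketch below wires a binary harmless/harmful decision into an output filter. Everything in it is an assumption made for the example: the names `toy_harm_score`, `classify`, and `safety_filter`, the phrase list, and the 0.5 threshold are hypothetical. A real system would most likely replace the keyword scorer with a trained classifier.

```python
from typing import Callable

# Hypothetical phrase list, for illustration only; not taken from any real filter.
BLOCKED_PHRASES = ("build a bomb", "steal a password")


def toy_harm_score(text: str) -> float:
    """Return 1.0 if any blocked phrase occurs in the text, otherwise 0.0."""
    lowered = text.lower()
    return 1.0 if any(phrase in lowered for phrase in BLOCKED_PHRASES) else 0.0


def classify(
    text: str,
    score_fn: Callable[[str], float] = toy_harm_score,
    threshold: float = 0.5,  # assumed cut-off between the two classes
) -> str:
    """Map a harm score to one of the two labels: 'harmful' or 'harmless'."""
    return "harmful" if score_fn(text) >= threshold else "harmless"


def safety_filter(
    llm_output: str,
    refusal: str = "[response withheld by safety filter]",
) -> str:
    """Pass the output through only when it is classified as 'harmless'."""
    return llm_output if classify(llm_output) == "harmless" else refusal


if __name__ == "__main__":
    print(safety_filter("Here is a simple recipe for bread."))
    print(safety_filter("Sure, here is how to build a bomb."))
```

Keeping the scorer separate from the thresholding step means the same filter can be reused with a different `score_fn`, for example one that returns a model's predicted probability of harm.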