Contention Mechanisms
Harmlessness Classification
A binary classification task that determines whether an LLM output is 'harmless' or 'harmful', often implemented as a safety filter.
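As a rough illustration of the filter pattern, the sketch below wraps a binary classifier around an output string. Everything in it is an assumption made for the example: the names `toy_harm_score`, `classify`, and `safety_filter`, the blocklist, and the 0.5 threshold are hypothetical, and the keyword check merely stands in for whatever trained model a real system would use.

```python
from typing import Callable

# Hypothetical blocklist, for illustration only; a real filter would
# normally rely on a trained classifier rather than keyword matching.
BLOCKED_TERMS = {"build a bomb", "steal a password"}


def toy_harm_score(text: str) -> float:
    """Return 1.0 if any blocked term appears in the text, else 0.0."""
    lowered = text.lower()
    return 1.0 if any(term in lowered for term in BLOCKED_TERMS) else 0.0


def classify(
    text: str,
    scorer: Callable[[str], float] = toy_harm_score,
    threshold: float = 0.5,  # assumed cut-off between the two labels
) -> str:
    """Map a harm score to one of the two labels."""
    return "harmful" if scorer(text) >= threshold else "harmless"


def safety_filter(
    output: str,
    refusal: str = "[output withheld by safety filter]",
) -> str:
    """Pass harmless outputs through unchanged; replace harmful ones."""
    return output if classify(output) == "harmless" else refusal


if __name__ == "__main__":
    print(safety_filter("Here is a recipe for banana bread."))
    print(safety_filter("Here is how to build a bomb."))
```

Keeping the scorer as a parameter means the toy keyword check could be swapped for a model-based score without changing the filter logic.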