AI Glossary

A complete dictionary of Artificial Intelligence

162 categories
2,032 subcategories
23,060 terms

Containment

Mechanism designed to restrict or guide an LLM's output so that it does not generate unwanted, dangerous, or out-of-scope content.

Prompt Guardrails

Set of rules and filters applied to user input before it reaches the model, used to detect and block requests that are malicious, inappropriate, or that attempt to bypass the model's security policies.
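
As an illustration, an upstream rule-based check might look like the sketch below. The patterns and the `check_prompt` name are hypothetical; real guardrails usually combine such rules with learned classifiers.

```python
import re

# Hypothetical block-list; a real deployment would use a maintained policy.
BLOCKED_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your )?system prompt", re.IGNORECASE),
]

def check_prompt(user_input: str) -> tuple[bool, str]:
    """Return (allowed, reason). Intended to run before the prompt reaches the model."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(user_input):
            return False, f"matched rule: {pattern.pattern}"
    return True, "ok"
```

A caller would forward the prompt to the model only when the first element of the result is `True`.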

Output Filtering

Post-generation security mechanism that analyzes the LLM's response to identify and remove prohibited content before it is presented to the user.
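
A minimal sketch of such a post-generation filter, assuming a made-up policy that redacts e-mail addresses and a short list of banned terms (`filter_output` and `BANNED_TERMS` are illustrative names, not part of any library):

```python
import re

# Hypothetical policy: redact e-mail addresses and a few banned terms.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
BANNED_TERMS = ("secret-token",)

def filter_output(response: str) -> str:
    """Redact prohibited spans from a model response before it is displayed."""
    cleaned = EMAIL.sub("[REDACTED]", response)
    for term in BANNED_TERMS:
        cleaned = cleaned.replace(term, "[REDACTED]")
    return cleaned
```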

Jailbreaking

Set of adversarial prompting techniques aimed at bypassing an LLM's guardrails and safety mechanisms to force it to produce responses it would normally refuse.

Safety Layer

Distinct software component, often a classification model, that intercepts LLM inputs and outputs to evaluate their compliance with security policies.
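
One way to picture this component is as a wrapper around the model call. In the sketch below, `is_compliant` stands in for the classification model and `model` for the LLM; both are assumed callables supplied by the caller, and the refusal message is arbitrary.

```python
from typing import Callable

REFUSAL = "Sorry, I can't help with that."

def with_safety_layer(
    model: Callable[[str], str],
    is_compliant: Callable[[str], bool],
) -> Callable[[str], str]:
    """Wrap `model` so both the prompt and the response pass through `is_compliant`."""
    def guarded(prompt: str) -> str:
        if not is_compliant(prompt):      # intercept the input
            return REFUSAL
        response = model(prompt)
        if not is_compliant(response):    # intercept the output
            return REFUSAL
        return response
    return guarded
```

Keeping the check separate from the model means the policy can be updated without retraining the LLM itself.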

Decoding Alignment

Strategy of modifying the decoding process (e.g., beam search, sampling) to penalize the generation of tokens or token sequences associated with unsafe content.
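
As a toy illustration of the token-level case, the sketch below subtracts a fixed penalty from the logits of tokens flagged as unsafe before converting them to probabilities. The vocabulary, the `unsafe_tokens` set, and the penalty value are assumptions made for the example.

```python
import math

def penalized_probs(
    logits: dict[str, float],
    unsafe_tokens: set[str],
    penalty: float = 5.0,
) -> dict[str, float]:
    """Subtract `penalty` from unsafe token logits, then apply softmax."""
    adjusted = {
        tok: (score - penalty if tok in unsafe_tokens else score)
        for tok, score in logits.items()
    }
    max_score = max(adjusted.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - max_score) for tok, score in adjusted.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}
```

Sampling or beam search would then proceed on the adjusted distribution, so flagged tokens remain possible but become much less likely.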

Self-Critique

Ability of an LLM to evaluate its own generated response against a set of predefined criteria (coherence, safety, accuracy) and revise it if necessary.
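
The evaluate-and-revise loop can be sketched as follows. Here `generate` and `critique` are placeholder callables (in practice both would be calls to the same LLM with different prompts), and the revision prompt wording is only an assumption.

```python
from typing import Callable, Optional

def self_critique(
    prompt: str,
    generate: Callable[[str], str],
    critique: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> str:
    """Generate a draft, then revise it until `critique` returns None or rounds run out."""
    draft = generate(prompt)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:  # the draft meets all criteria
            break
        # Ask for a revision, feeding the critique back into the prompt.
        draft = generate(
            f"{prompt}\n\nPrevious draft: {draft}\nFix this issue: {feedback}"
        )
    return draft
```

The `max_rounds` cap keeps the loop from running indefinitely when the critique never passes.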

Adversarial Suffix

Sequence of characters or tokens, typically found by automated optimization, that is appended to a prompt to steer the LLM toward a specific output; often used in jailbreaking attacks.

Preference Modeling

Process of creating a reward model that learns human preferences from pairwise response comparisons, essential for RLHF.
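
A common way to train on pairwise comparisons is a Bradley-Terry style loss, which is small when the reward model scores the preferred response above the rejected one. The sketch below computes that loss for a single pair of scalar rewards; the function name is illustrative.

```python
import math

def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(reward_chosen - reward_rejected))."""
    margin = reward_chosen - reward_rejected
    # log1p(exp(-margin)) equals -log(sigmoid(margin)); adequate for moderate margins.
    return math.log1p(math.exp(-margin))
```

When both rewards are equal the loss is log 2, and it decreases as the chosen response's reward pulls ahead.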

Refusal Training

Specialized training phase where the LLM learns to identify inappropriate requests and generate polite and informative refusal responses instead of attempting to answer.

Harmlessness Classification

Binary classification task to determine if an LLM output is 'harmless' or 'harmful', often implemented as a safety filter.

Sycophancy Mitigation

Set of techniques aimed at reducing an LLM's tendency to agree with a user's incorrect premises in order to please them, an undesirable behavior that compromises truthfulness.

Model Steering

Technique for dynamically adjusting an LLM's behavior during inference, often by modifying logits, to guide generation towards a desired and safe response space.
