BenchVibe AI Ecosystem

AI Glossary

The Complete Dictionary of Artificial Intelligence

162 categories · 2,032 subcategories · 23,060 terms

Asynchronous Advantage Actor-Critic (A3C)

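Asynchronous actor-critic method in which multiple workers run in parallel, each on its own copy of the environment, and asynchronously update a shared set of parameters. The parallel workers sample largely uncorrelated trajectories, which stabilizes learning and accelerates convergence.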
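The parallel-worker pattern can be sketched with threads. This is a toy under stated assumptions, not an A3C implementation: it shows only the asynchronous updates to shared parameters, with a noisy quadratic gradient standing in for the actor-critic losses. The names `run_workers`, `target`, and `shared` are hypothetical, and a lock is used for clarity where the published method applies updates without locking.

```python
import random
import threading


def run_workers(num_workers=4, steps=200, lr=0.05, target=3.0):
    """Toy asynchronous training: workers push noisy gradients to one shared parameter."""
    shared = {"theta": 0.0}      # parameters shared by all workers
    lock = threading.Lock()      # simplification; see the note above

    def worker(seed):
        # Each worker owns its "environment copy" (here: an independent noise source),
        # so the samples seen by different workers are uncorrelated.
        rng = random.Random(seed)
        for _ in range(steps):
            theta = shared["theta"]                         # possibly stale read
            grad = (theta - target) + rng.gauss(0.0, 0.1)   # stand-in for a policy/value gradient
            with lock:
                shared["theta"] -= lr * grad                # asynchronous update

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["theta"]
```

Calling `run_workers()` should return a value close to `target`, since every worker pulls the shared parameter toward the same optimum.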

Soft Actor-Critic (SAC)

Off-policy actor-critic algorithm that maximizes a weighted sum of expected return and policy entropy, which promotes exploration and makes training more robust to hyperparameter choices.
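The entropy-regularized objective can be illustrated with a small sketch. It assumes a discrete action set for simplicity (SAC is most often used with continuous actions), and the function names and the temperature parameter `alpha` are illustrative choices rather than part of any particular library.

```python
import math


def soft_state_value(q_values, probs, alpha):
    """Entropy-regularized value: sum_a pi(a) * (Q(a) - alpha * log pi(a))."""
    return sum(p * (q - alpha * math.log(p))
               for q, p in zip(q_values, probs) if p > 0.0)


def soft_td_target(reward, done, gamma, next_q_values, next_probs, alpha):
    """Critic target: reward plus the discounted soft value of the next state."""
    next_v = soft_state_value(next_q_values, next_probs, alpha)
    return reward + gamma * (1.0 - done) * next_v
```

With `alpha = 0` the target reduces to an ordinary expected-value backup; larger `alpha` rewards policies that keep their action distribution spread out.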

Deep Deterministic Policy Gradient (DDPG)

Off-policy actor-critic algorithm for continuous action spaces that combines ideas from DQN (experience replay and target networks) with a deterministic policy gradient.
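A minimal sketch of the two ingredients named above, target networks and a deterministic policy, might look as follows. Parameters are plain lists of floats and the networks are passed in as ordinary callables; `soft_update`, `ddpg_target`, and the mixing rate `tau` are illustrative names, not a library API.

```python
def soft_update(target_params, online_params, tau):
    """Polyak averaging: move each target parameter a fraction tau toward the online one."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


def ddpg_target(reward, done, gamma, next_state, target_actor, target_critic):
    """Critic target: the target actor picks the next action deterministically."""
    next_action = target_actor(next_state)
    return reward + gamma * (1.0 - done) * target_critic(next_state, next_action)
```

A small `tau` keeps the target networks changing slowly, which is what makes the bootstrapped critic target stable.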

Twin Delayed DDPG (TD3)

Extension of DDPG that uses two critic networks, taking the smaller of their estimates to reduce overestimation bias, and delays actor updates relative to critic updates to increase stability.
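The twin-critic target and the delayed actor schedule can be sketched as below. The sketch assumes scalar actions and also includes the clipped target-action noise that TD3 uses alongside the two mechanisms in the definition. The function names and default values are illustrative assumptions.

```python
import random


def td3_target(reward, done, gamma, next_state,
               target_actor, target_critic_1, target_critic_2,
               noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    """Clipped double-Q target with smoothing noise on the target action."""
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    next_action = max(action_low, min(action_high, target_actor(next_state) + noise))
    # Taking the minimum of the two critics counteracts overestimation.
    q_min = min(target_critic_1(next_state, next_action),
                target_critic_2(next_state, next_action))
    return reward + gamma * (1.0 - done) * q_min


def should_update_actor(step, policy_delay=2):
    """The actor (and target networks) are updated only every policy_delay critic steps."""
    return step % policy_delay == 0
```

In a training loop, the critics would be fitted to `td3_target` on every step, while the actor update would run only when `should_update_actor(step)` is true.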

Munchausen-RL

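Modification of temporal-difference algorithms such as DQN that adds the scaled log-probability of the chosen action under the current policy to the reward used in the Q update. The name refers to Baron Munchausen, who claimed to pull himself out of a swamp by his own hair, an image of bootstrapping. The extra term acts as an implicit regularizer, improving exploration and stability.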
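A sketch of the modified target for a DQN-style agent with a softmax policy over Q-values is shown below. The function names are hypothetical, and the defaults for `tau`, `alpha`, and `log_clip` should be treated as assumptions to tune rather than prescribed values.

```python
import math


def log_softmax(q_values, tau):
    """Log-probabilities of the softmax policy pi = softmax(q / tau), computed stably."""
    m = max(q_values)
    scaled = [(q - m) / tau for q in q_values]
    log_z = math.log(sum(math.exp(s) for s in scaled))
    return [s - log_z for s in scaled]


def munchausen_target(reward, done, gamma, q_state, action, q_next,
                      tau=0.03, alpha=0.9, log_clip=-1.0):
    """Reward augmented with the scaled, clipped log-policy, plus a soft bootstrap."""
    log_pi = log_softmax(q_state, tau)
    log_pi_next = log_softmax(q_next, tau)
    # Munchausen bonus: the scaled log-probability of the action actually taken.
    bonus = alpha * max(log_clip, tau * log_pi[action])
    # Soft (entropy-regularized) value of the next state.
    soft_next = sum(math.exp(lp) * (q - tau * lp)
                    for lp, q in zip(log_pi_next, q_next))
    return reward + bonus + gamma * (1.0 - done) * soft_next
```

Because the log-probability is never positive, the bonus penalizes actions the current policy considers unlikely; clipping keeps that penalty bounded.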
