
AI Glossary

Complete Artificial Intelligence Dictionary

162 categories · 2,032 subcategories · 23,060 terms

Actor-Critic

Reinforcement learning architecture combining an actor network that learns a stochastic policy with a critic network that estimates the value function, reducing the variance of the policy gradient.
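The interplay between the two networks can be sketched in tabular form. Below is a minimal, illustrative actor-critic loop on a made-up two-state MDP (the environment, step sizes, and episode length are all assumptions for demonstration, not from any specific paper):

```python
import numpy as np

# Minimal tabular actor-critic on a hypothetical two-state MDP.
rng = np.random.default_rng(0)
n_actions, gamma, lr = 2, 0.9, 0.1
theta = np.zeros((2, n_actions))  # actor: policy logits per state
V = np.zeros(2)                   # critic: state-value estimates

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def step(s, a):
    # state 0, action 1 -> state 1 with reward 1; state 1 always returns to state 0
    if s == 0:
        return (1, 1.0) if a == 1 else (0, 0.0)
    return 0, 0.0

s = 0
for _ in range(2000):
    probs = softmax(theta[s])
    a = rng.choice(n_actions, p=probs)
    s_next, r = step(s, a)
    td_error = r + gamma * V[s_next] - V[s]  # critic's TD error
    V[s] += lr * td_error                    # critic update toward the TD target
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0                    # grad of log pi(a|s) for a softmax policy
    theta[s] += lr * td_error * grad_log_pi  # actor update, scaled by the TD error
    s = s_next
```

After training, the policy in state 0 should come to prefer the rewarding action 1; scaling the policy-gradient step by the critic's TD error rather than the raw return is what reduces its variance.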

Value Function

Mathematical function estimating the expected cumulative return from a state or state-action pair, serving as the learning signal for the critic in the Actor-Critic architecture.
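The "expected cumulative return" being estimated is the discounted return. A small sketch (the reward sequence and γ = 0.9 are illustrative values):

```python
# The quantity a value function estimates: the discounted return
# G_t = r_t + γ·r_{t+1} + γ²·r_{t+2} + ...
def discounted_return(rewards, gamma=0.9):
    g = 0.0
    for r in reversed(rewards):  # accumulate backwards so each step applies one factor of γ
        g = r + gamma * g
    return g

discounted_return([1.0, 0.0, 2.0])  # 1 + 0.9*0 + 0.9**2 * 2 = 2.62
```

A state-value function V(s) is the expectation of this quantity over trajectories starting from s; the critic learns to approximate it.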

Asynchronous Advantage Actor-Critic (A3C)

Distributed actor-critic architecture in which multiple workers train in parallel on independent copies of the environment, asynchronously applying their gradients to a shared set of parameters to accelerate learning.

Deep Deterministic Policy Gradient (DDPG)

Actor-Critic algorithm for continuous action spaces that combines a deterministic policy, deep neural networks, and a replay buffer for stable off-policy learning.

Twin Delayed Deep Deterministic Policy Gradient (TD3)

Improvement over DDPG that uses twin critics to reduce value overestimation, and delays updates of the actor and target networks for better stability.
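The twin-critic idea reduces to taking the minimum of the two critics' estimates when forming the bootstrap target. A one-line sketch (the reward, discount, and Q-values are made-up numbers):

```python
# TD3's clipped double-Q target: bootstrap from the minimum of the two
# critics' next-state estimates to curb overestimation bias.
gamma = 0.99
r = 1.0
q1_next, q2_next = 5.0, 4.2   # twin critics' estimates Q1(s', a'), Q2(s', a')
target = r + gamma * min(q1_next, q2_next)
```

Because the max-based targets of Q-learning-style methods tend to inflate values, the pessimistic `min` pulls the target back toward realistic estimates.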

Soft Actor-Critic (SAC)

Off-policy Actor-Critic algorithm maximizing an entropy-augmented objective that trades off expected return against policy entropy to encourage exploration, using stable and sample-efficient updates.
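The entropy augmentation amounts to adding α·H(π(·|s)) to the reward. A small numeric sketch for a discrete policy (the temperature α and the probabilities are illustrative values, not tuned hyperparameters):

```python
import numpy as np

# SAC-style entropy bonus: the objective rewards return plus alpha * policy entropy.
alpha = 0.2
probs = np.array([0.7, 0.2, 0.1])          # pi(.|s) over three discrete actions
entropy = -np.sum(probs * np.log(probs))   # H(pi(.|s)), about 0.80 nats
soft_reward = 1.0 + alpha * entropy        # r + alpha * H
```

A near-deterministic policy earns almost no bonus, while a more uniform one does, so the agent is pushed to keep exploring unless the extra return justifies committing.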

Advantage Actor-Critic (A2C)

Synchronous variant of A3C that uses advantage estimation to reduce policy-gradient variance, with batched updates for better stability and GPU efficiency.
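The one-step advantage estimate used to scale the policy gradient is A(s, a) ≈ r + γ·V(s′) − V(s). A vectorized sketch over a small batch (the critic values below are made-up stand-ins for a trained network's output):

```python
import numpy as np

# One-step advantage estimates over a batch of three transitions.
gamma = 0.99
rewards  = np.array([1.0, 0.0, 0.5])
v_s      = np.array([2.0, 1.5, 1.0])   # critic's V(s) for each transition
v_s_next = np.array([1.5, 1.0, 0.0])   # critic's V(s') for each transition
advantages = rewards + gamma * v_s_next - v_s   # A = r + gamma*V(s') - V(s)
```

Positive advantages mark actions that did better than the critic expected (their probability is increased), negative ones the opposite; this centering around the baseline V(s) is what cuts the gradient's variance.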

Critic Network

Neural network estimating the value function V(s) or Q(s, a) that provides the temporal-difference (TD) learning signal to the actor, using its own TD prediction error to drive optimization.
