BenchVibe AI Ecosystem

AI Glossary

The complete glossary of AI

162 categories, 2,032 subcategories, 23,060 terms

Asynchronous Advantage Actor-Critic (A3C)

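Actor-critic method in which multiple worker agents run in parallel on separate copies of the environment and asynchronously update a shared set of parameters. The parallel workers collect largely uncorrelated trajectories, which stabilizes learning and accelerates convergence.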
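
The asynchronous update pattern in this entry can be illustrated with a toy sketch. It is not the full A3C algorithm: there is no actor, critic, or environment here. Each thread plays the role of a worker, its private random generator stands in for its own environment copy, and the quadratic loss, `TARGET`, `shared_theta`, `worker`, and `train` are all hypothetical names chosen for illustration.

```python
import random
import threading

TARGET = 3.0          # hypothetical optimum of the toy loss (theta - TARGET)**2
shared_theta = [0.0]  # global parameter shared by every worker


def worker(seed: int, steps: int, lr: float) -> None:
    """One asynchronous worker; its private RNG stands in for an environment copy."""
    rng = random.Random(seed)
    for _ in range(steps):
        theta = shared_theta[0]  # read the current global parameter
        # Noisy local gradient of the toy loss, computed from this worker's own samples.
        grad = 2.0 * (theta - TARGET) + rng.gauss(0.0, 0.1)
        # Apply the update to the shared parameter without waiting for other workers.
        shared_theta[0] -= lr * grad


def train(num_workers: int = 4, steps: int = 200, lr: float = 0.05) -> float:
    """Run all workers in parallel threads and return the final shared parameter."""
    shared_theta[0] = 0.0
    threads = [
        threading.Thread(target=worker, args=(seed, steps, lr))
        for seed in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_theta[0]


final_theta = train()
```

Because each worker draws its own noise, the updates reaching the shared parameter are decorrelated, which is the property the entry attributes to parallel environment copies. In this sketch `final_theta` should end up close to `TARGET`.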

Soft Actor-Critic (SAC)

Off-policy actor-critic algorithm that maximizes a weighted sum of expected return and policy entropy. The entropy term promotes exploration and makes the algorithm more robust to hyperparameter choices.
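
To make the entropy-regularized objective concrete, the following minimal sketch computes the "soft" state value, that is, expected Q-value plus an entropy bonus weighted by a temperature `alpha`. It assumes a small discrete action set for readability, whereas SAC is usually applied to continuous actions; the function name `soft_state_value` is hypothetical.

```python
import math


def soft_state_value(q_values, action_probs, alpha):
    """Return sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s)) for discrete actions."""
    value = 0.0
    for q, p in zip(q_values, action_probs):
        if p > 0.0:  # actions with zero probability contribute nothing
            value += p * (q - alpha * math.log(p))
    return value


# A uniform policy earns an entropy bonus; a deterministic policy earns none.
uniform = soft_state_value([1.0, 2.0], [0.5, 0.5], alpha=0.2)
greedy = soft_state_value([1.0, 2.0], [0.0, 1.0], alpha=0.2)
```

Raising `alpha` increases the weight of the entropy bonus relative to the Q-values, so the objective favors more stochastic policies.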

Deep Deterministic Policy Gradient (DDPG)

Off-policy actor-critic algorithm for continuous action spaces. It combines techniques from DQN (replay buffer, target networks) with a deterministic policy trained by the deterministic policy gradient.
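
The two target-network ingredients mentioned above can be sketched in a few lines, assuming parameters are stored as flat lists of floats. The helper names `soft_update` and `ddpg_target` are hypothetical, and `next_q_target` is assumed to be the target critic evaluated at the next state and the target actor's action.

```python
def soft_update(target_params, online_params, tau):
    """Polyak averaging: move each target parameter a fraction tau toward the online one."""
    return [
        tau * w_online + (1.0 - tau) * w_target
        for w_target, w_online in zip(target_params, online_params)
    ]


def ddpg_target(reward, gamma, next_q_target, done):
    """Critic regression target: r + gamma * Q'(s', mu'(s')), with no bootstrap at episode end."""
    return reward + (0.0 if done else gamma * next_q_target)


new_target = soft_update([0.0, 0.0], [1.0, 2.0], tau=0.1)
y = ddpg_target(reward=1.0, gamma=0.99, next_q_target=10.0, done=False)
```

A small `tau` keeps the target networks changing slowly, which is what makes the regression target stable enough to learn from.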

Twin Delayed DDPG (TD3)

Improvement on DDPG that uses two critic networks, taking the smaller of their two estimates to reduce overestimation bias, and updates the actor less often than the critics to increase stability.
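
Both mechanisms in this entry fit in a short sketch. It assumes the two target-critic estimates for the next state are already available as floats; `td3_target`, `actor_update_due`, and the default `policy_delay=2` are illustrative choices, not taken from the glossary.

```python
def td3_target(reward, gamma, q1_next, q2_next, done):
    """Clipped double-Q target: bootstrap from the smaller of the two critic estimates."""
    bootstrap = 0.0 if done else min(q1_next, q2_next)
    return reward + gamma * bootstrap


def actor_update_due(step, policy_delay=2):
    """Delayed policy updates: the actor is updated only every policy_delay critic steps."""
    return step % policy_delay == 0


y = td3_target(reward=1.0, gamma=0.99, q1_next=5.0, q2_next=4.0, done=False)
actor_steps = [step for step in range(1, 7) if actor_update_due(step)]
```

Taking the minimum means an over-optimistic critic cannot inflate the target on its own, and the delay lets the critics settle before the actor follows them.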

Munchausen-RL

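Modification of temporal-difference learning that adds the scaled log-probability of the selected action under the current policy to the reward in the Q update. It is named after Baron Munchausen, who claimed to have pulled himself out of a swamp by his own hair: the agent bootstraps from its own policy. The added term acts as an implicit regularizer and improves stability.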
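
As a rough sketch of how the log-policy term enters the Q update, the following computes a Munchausen-style target for a DQN-like agent with discrete actions and a softmax policy over Q-values. The function names are hypothetical, the Q-values are assumed to come from a target network, and the defaults for `tau`, `alpha`, and `clip_min` should be treated as assumptions to be tuned.

```python
import math


def log_softmax(q_values, tau):
    """Log-probabilities of the softmax policy pi = softmax(q / tau), computed stably."""
    scaled = [q / tau for q in q_values]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def munchausen_target(reward, q_s, action, q_next, done,
                      gamma=0.99, tau=0.03, alpha=0.9, clip_min=-1.0):
    """Reward plus scaled, clipped log-policy bonus, plus discounted soft next-state value."""
    log_pi = log_softmax(q_s, tau)
    # Munchausen term: tau * log pi(a|s), clipped to [clip_min, 0], then scaled by alpha.
    bonus = alpha * max(clip_min, min(0.0, tau * log_pi[action]))
    if done:
        return reward + bonus
    log_pi_next = log_softmax(q_next, tau)
    # Soft value of the next state: sum_a' pi(a'|s') * (q(s', a') - tau * log pi(a'|s')).
    soft_v = sum(math.exp(lp) * (q - tau * lp) for q, lp in zip(q_next, log_pi_next))
    return reward + bonus + gamma * soft_v


target = munchausen_target(1.0, q_s=[0.0, 0.0], action=0, q_next=[0.0, 0.0], done=False)
```

The bonus is never positive, so actions the current policy considers unlikely are penalized in the target, while the clipping keeps the penalty bounded when a probability approaches zero.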
