AI Glossary
Complete Glossary of Artificial Intelligence
Policy
Mapping from each state to an action (deterministic policy) or to a probability distribution over actions (stochastic policy), defining the agent's behavior in a reinforcement learning process.
Multi-Armed Bandit Problem
Sequential decision problem where an agent repeatedly chooses among several options (arms) with unknown reward distributions to maximize cumulative reward over time, trading off exploration against exploitation.
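A minimal sketch of the problem, assuming Bernoulli arms (reward 1 or 0) and a simple ε-greedy agent that keeps a running average per arm. The function name `run_bandit` and its parameters are hypothetical, chosen only for illustration.

```python
import random

def run_bandit(success_probs, steps=1000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli bandit (illustrative only)."""
    rng = random.Random(seed)
    counts = [0] * len(success_probs)       # pulls per arm
    estimates = [0.0] * len(success_probs)  # running mean reward per arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(success_probs))  # explore
        else:
            arm = max(range(len(estimates)), key=lambda a: estimates[a])  # exploit
        # The true success probabilities are hidden from the agent's decision rule.
        reward = 1 if rng.random() < success_probs[arm] else 0
        counts[arm] += 1
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward

estimates, counts, total_reward = run_bandit([0.2, 0.5, 0.8])
```

The agent only ever sees the rewards it draws, so its estimates for rarely pulled arms stay noisy; that is the exploration cost the definition refers to.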
Cumulative Reward
Sum of the rewards an agent collects over time (also called the return). The agent seeks to maximize its expected value, which is often calculated with a discount factor γ between 0 and 1 so that distant rewards carry less weight.
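As a small illustration, the discounted sum for a finite reward sequence can be computed as below. This is a sketch; the helper name `discounted_return` and the default `gamma` value are assumptions, not part of the glossary.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum over t of gamma**t * rewards[t] for a finite reward sequence."""
    total = 0.0
    # Work backwards so each step needs only one multiplication by gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Three rewards of 1 with gamma = 0.5 give 1 + 0.5 + 0.25.
example = discounted_return([1, 1, 1], gamma=0.5)
```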
SARSA Algorithm
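On-policy reinforcement learning algorithm that updates Q-values from the State-Action-Reward-State-Action sequence, using the next action the agent actually takes. Q-learning, by contrast, is off-policy and uses the highest-valued next action regardless of what the agent does.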
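The difference between the two update rules can be sketched for a tabular Q-function stored as a list of lists, `q[state][action]`. The function names and the default `alpha` (learning rate) and `gamma` (discount factor) values are illustrative assumptions.

```python
def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.99):
    """On-policy: bootstrap from the next action actually chosen."""
    target = reward + gamma * q[next_state][next_action]
    q[state][action] += alpha * (target - q[state][action])

def q_learning_update(q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.99):
    """Off-policy: bootstrap from the best next action, whatever is taken."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
```

The two rules coincide whenever the next action chosen is also the greedy one; they differ on exploratory steps.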
Deep Q-Network
Deep neural network used to approximate the Q-function when the state space is too large or high-dimensional for a table (for example, raw images), combining deep learning and Q-learning.
Deep Reinforcement Learning
Approach integrating deep neural networks into reinforcement learning to handle high-dimensional state or action spaces.
Epsilon-Greedy Policy
Action selection strategy where with probability ε the agent explores (chooses a random action) and with probability 1-ε it exploits (chooses the best known action).
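A minimal sketch of this rule, assuming the action values are given as a plain list indexed by action. The function name `epsilon_greedy` and the optional `rng` argument are illustrative choices.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        # Explore: uniform over all actions (this may also pick the best one).
        return rng.randrange(len(q_values))
    # Exploit: index of the highest estimated value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])

greedy_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)  # always exploits
```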
Policy Optimization
Class of reinforcement learning methods that directly optimize a parameterized policy rather than deriving the policy from a learned value function, often using policy gradient techniques.
Policy Gradient Algorithm
Optimization method that directly adjusts policy parameters by gradient ascent on the expected cumulative reward with respect to these parameters.
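One concrete instance is a REINFORCE-style update for a softmax policy over a handful of actions. The sketch below assumes a single state with one preference parameter per action; `softmax`, `reinforce_step`, `ret` (the observed return), and `lr` are hypothetical names for illustration.

```python
import math

def softmax(prefs):
    """Convert action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_step(prefs, action, ret, lr=0.1):
    """One gradient-ascent step on ret * log pi(action).

    For a softmax policy, d log pi(action) / d prefs[k] is
    (1 if k == action else 0) - pi(k).
    """
    probs = softmax(prefs)
    return [
        p + lr * ret * ((1.0 if k == action else 0.0) - probs[k])
        for k, p in enumerate(prefs)
    ]

new_prefs = reinforce_step([0.0, 0.0], action=0, ret=1.0)
```

A positive return raises the probability of the action that was taken; a negative return lowers it.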
Multi-Agent Reinforcement Learning
Extension of reinforcement learning where multiple agents learn simultaneously, often in competition or cooperation, in a shared environment.
Experience Replay Memory
Data structure storing transitions (state, action, reward, next state) for resampling during training, which improves data efficiency and reduces the correlation between consecutive training samples.
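A minimal sketch of such a buffer, assuming a fixed capacity with oldest-first eviction and uniform random sampling. The class name `ReplayBuffer` and its methods are illustrative, not taken from any particular library.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest item when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

memory = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    memory.push(step, 0, 1.0, step + 1)
batch = memory.sample(2)
```

Practical implementations often also store an end-of-episode flag with each transition; that is omitted here to match the four-element tuple in the definition.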
Actor-Critic Algorithm
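Architecture combining an actor that selects actions according to a policy and a critic that estimates a value function to evaluate those actions. The critic's feedback lowers the variance of the actor's updates, which tends to make learning more stable and efficient.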
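A one-step tabular variant can be sketched as follows, assuming a softmax actor with per-state action preferences and a critic that stores one value per state. All names (`actor_critic_step`, `prefs`, `values`, the learning rates) are assumptions made for illustration.

```python
import math

def softmax(prefs):
    """Convert action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def actor_critic_step(prefs, values, state, action, reward, next_state, done,
                      actor_lr=0.1, critic_lr=0.1, gamma=0.99):
    """Update critic and actor in place from one transition; return the TD error."""
    bootstrap = 0.0 if done else values[next_state]
    # Critic: the TD error measures how much better or worse the step was than expected.
    td_error = reward + gamma * bootstrap - values[state]
    values[state] += critic_lr * td_error
    # Actor: move preferences along grad log pi(action | state), scaled by the TD error.
    probs = softmax(prefs[state])
    for k in range(len(prefs[state])):
        grad = (1.0 if k == action else 0.0) - probs[k]
        prefs[state][k] += actor_lr * td_error * grad
    return td_error
```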