Hands-On Intelligent Agents with OpenAI Gym

State-value function

A state-value function represents the agent's estimate of how good it is to be in a state $s$ at time step $t$. It is denoted by $V_\pi(s)$ and is usually just called the value function. It is the agent's prediction of the future reward it would receive if it were to end up in state $s$ at time step $t$. Mathematically, it can be represented as follows:

$$V_\pi(s) = \mathbb{E}_\pi\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s\right]$$

What this expression means is that the value of state $s$ under policy $\pi$ is the expected sum of the discounted future rewards, where $\gamma$ is the discount factor, a real number in the range [0, 1]. In practice, the discount factor is typically set in the range [0.95, 0.99]. The other new term is $\pi$, which is the policy of the agent.
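
To make the "expected sum of discounted future rewards" concrete, here is a minimal Python sketch. It assumes we already have a few reward sequences collected by following the policy from the same state. The function names `discounted_return` and `estimate_state_value` and the sample rewards are hypothetical, introduced only for illustration. Averaging the discounted returns over sampled episodes is one simple (Monte Carlo) way to approximate the expectation, not necessarily how an agent would compute it in practice.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum over k of gamma**k * rewards[k], starting at k = 0."""
    total = 0.0
    # Walk backwards so each step needs one multiply and one add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def estimate_state_value(episodes: Sequence[Sequence[float]], gamma: float) -> float:
    """Approximate the expectation by averaging returns over sampled episodes."""
    returns = [discounted_return(rewards, gamma) for rewards in episodes]
    return sum(returns) / len(returns)


# Hypothetical reward sequences observed after visiting the same state.
sampled_rewards = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0],
]

# gamma = 0.5 is used here only to keep the arithmetic easy to follow;
# values such as 0.95 to 0.99 are more typical in practice.
value_estimate = estimate_state_value(sampled_rewards, gamma=0.5)
print(value_estimate)  # 1.125
```

With a discount factor of 0.5, the first episode yields 1 + 0.5 × 0 + 0.25 × 2 = 1.5 and the second yields 0 + 0.5 × 1 + 0.25 × 1 = 0.75, so the estimate is their mean, 1.125. A discount factor closer to 1 gives later rewards more weight in the sum.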