State-value function
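A state-value function represents the agent's estimate of how good it is to be in a state $s$ at time step $t$. It is denoted by $V(s)$, or $V_{\pi}(s)$ when the dependence on the policy is made explicit, and is usually just called the value function. It is the agent's prediction of the future reward it would collect if it were to end up in state $s$ at time step $t$. Mathematically, it can be written as follows:

$$V_{\pi}(s) = \mathbb{E}_{\pi}\left[R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \cdots \mid S_t = s\right] = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s\right]$$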
This expression says that the value of state $s$ under policy $\pi$ is the expected sum of the discounted future rewards, where $\gamma$ is the discount factor, a real number in the range $[0, 1]$. Values of $\gamma$ closer to 1 weight distant rewards more heavily; in practice, the discount factor is typically set in the range $[0.95, 0.99]$. The other new term is $\pi$, the policy of the agent; the expectation is taken over the trajectories the agent generates by following it.
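To make the definition concrete, here is a minimal Monte Carlo sketch that estimates $V_{\pi}(s)$ by averaging the discounted returns of many sampled episodes that start in $s$ and follow $\pi$. Everything in it is an assumption for illustration: the five-state corridor environment, the helpers `step`, `discounted_return` and `mc_state_value`, and the two policies `always_right` and `uniform_random` are hypothetical and not taken from any library. Episodes are truncated after `max_steps`, so the infinite sum is only approximated.

```python
import random
from typing import Callable, Sequence, Tuple

# Hypothetical toy environment, assumed purely for illustration:
# a corridor of states 0..4; state 4 is terminal and entering it pays 1.0.
N_STATES = 5
TERMINAL = N_STATES - 1

# A policy maps (state, rng) to an action: -1 (left) or +1 (right).
Policy = Callable[[int, random.Random], int]


def step(state: int, action: int) -> Tuple[int, float]:
    """Apply an action; the left end of the corridor acts as a wall."""
    next_state = min(max(state + action, 0), TERMINAL)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return R_{t+1} + gamma*R_{t+2} + gamma**2*R_{t+3} + ... for a finite list."""
    g = 0.0
    # Accumulate backwards using G = R + gamma * G_next.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def mc_state_value(
    state: int,
    policy: Policy,
    gamma: float,
    episodes: int = 5000,
    max_steps: int = 100,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of V_pi(state): the average discounted return of
    episodes that start in `state` and follow `policy`. Episodes are cut off
    after `max_steps`, so this approximates the infinite sum."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        s, rewards = state, []
        for _ in range(max_steps):
            if s == TERMINAL:
                break
            s, r = step(s, policy(s, rng))
            rewards.append(r)
        total += discounted_return(rewards, gamma)
    return total / episodes


def always_right(state: int, rng: random.Random) -> int:
    """Deterministic policy: always move towards the goal."""
    return 1


def uniform_random(state: int, rng: random.Random) -> int:
    """Stochastic policy: move left or right with equal probability."""
    return rng.choice((-1, 1))


if __name__ == "__main__":
    gamma = 0.9
    print(mc_state_value(0, always_right, gamma))    # approximately 0.729, i.e. gamma**3
    print(mc_state_value(0, uniform_random, gamma))  # lower: the goal is reached later, if at all
```

Because `always_right` is deterministic, every episode from state 0 produces the rewards 0, 0, 0, 1, so its estimate equals $\gamma^{3} = 0.729$ for $\gamma = 0.9$ (up to floating-point rounding). Under `uniform_random` the goal is reached after more steps, or not at all within the step limit, so the same state has a lower value. The state is identical and only the policy differs, which is why the value function is always defined with respect to a policy $\pi$.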