MultiAgentDecisionProcess

AgentQLearner applies standard single-agent Q-learning in the joint action and state space.

#include <AgentQLearner.h>
Public Member Functions

Index Act (Index sI, Index joI, double reward)
    Returns the next action for state sI.
AgentQLearner (const PlanningUnitDecPOMDPDiscrete *pu, Index id, double initValue, double epsilon, double alpha, double gamma, ExplorationT expl=EXPL_EGREEDY, double temp=0.4)
    Constructor.
AgentQLearner (const AgentQLearner &a)
    Copy constructor.
Index getGreedyAction (Index sI) const
    Returns the greedy action, i.e. the action with the highest Q-value, for the given state.
double getMaxState (Index sI, std::list< Index > *actions=NULL) const
    Returns the highest Q-value in state sI.
Index getNonGreedyAction (Index sI) const
QTable GetQTable () const
    Returns the learned (infinite-horizon) Q-table.
bool isFirstAgent () const
void Learn (Index jaI, double r, Index sI, Index prevSI)
    Updates the internal Q-table.
virtual void ResetEpisode ()
    Called before an episode, to reinitialize the agent.
void SetFirstAgent (const AgentQLearner *firstAgent)
void setTemp (double temp)
void updateEpsilon (double fract)
~AgentQLearner ()
    Destructor.
Public Member Functions inherited from AgentFullyObservable

AgentFullyObservable (const PlanningUnitDecPOMDPDiscrete *pu, Index id)
    (default) Constructor.
AgentFullyObservable (const AgentFullyObservable &a)
    Copy constructor.
~AgentFullyObservable ()
    Destructor.

Public Member Functions inherited from AgentDecPOMDPDiscrete

AgentDecPOMDPDiscrete (const PlanningUnitDecPOMDPDiscrete *pu, Index id)
    (default) Constructor.
AgentDecPOMDPDiscrete (const AgentDecPOMDPDiscrete &a)
    Copy constructor.
const PlanningUnitDecPOMDPDiscrete * GetPU () const

Public Member Functions inherited from SimulationAgent

virtual Index GetIndex () const
    Retrieves the index of this agent.
virtual bool GetVerbose () const
    If true, the agent will report more.
void Print () const
    Prints out some information about this agent.
virtual void SetIndex (Index id)
    Sets the index of this agent.
virtual void SetVerbose (bool verbose)
    Sets whether this agent should be verbose.
SimulationAgent (Index id, bool verbose=false)
    (default) Constructor.
virtual std::string SoftPrint () const
    Returns some information about this agent.
virtual ~SimulationAgent ()
    Destructor.
Private Member Functions

Index GetLastActionChosen () const

Private Attributes

double _m_alpha
    Learning rate.
double _m_epsilon
    Probability parameter for ε-greedy exploration.
ExplorationT _m_exploration
    Exploration strategy.
const AgentQLearner * _m_firstAgent
    Agent with id 0, used for last-action lookup.
double _m_gamma
    Discount rate.
double _m_initValue
    Initial Q-value.
Index _m_prevSI
    The state for which Act was last called.
QTable _m_Q
    The tabular Q-function to be learned.
Index _m_selJaI
    The joint action selected in the previous Act call.
size_t _m_t
    The episode count.
double _m_temp
    Boltzmann temperature.
Detailed Description

AgentQLearner applies standard single-agent Q-learning in the joint action and state space.
Constructor & Destructor Documentation

AgentQLearner::AgentQLearner (const PlanningUnitDecPOMDPDiscrete *pu, Index id, double initValue, double epsilon, double alpha, double gamma, ExplorationT expl = EXPL_EGREEDY, double temp = 0.4)

Constructor.

References _m_alpha, _m_epsilon, _m_exploration, _m_gamma, _m_initValue, _m_prevSI, _m_selJaI, _m_t, and _m_temp.
AgentQLearner::AgentQLearner (const AgentQLearner &a)

Copy constructor.

AgentQLearner::~AgentQLearner ()

Destructor.

References _m_Q, QTable::GetNrActions(), and QTable::GetNrStates().
Member Function Documentation

Index AgentQLearner::Act (Index sI, Index joI, double reward)  [virtual]

Returns the next action for state sI.

Depending on the configured exploration strategy (Boltzmann or ε-greedy), either a greedy action or an exploration action is taken. When a greedy action is taken and multiple Q-values share the maximum, one of the corresponding actions is selected at random.

Parameters:
    sI      current state
    reward  reward received at the last iteration

Implements AgentFullyObservable.

References _m_epsilon, _m_exploration, _m_prevSI, _m_Q, _m_selJaI, _m_t, _m_temp, EXPL_BOLTZMANN, EXPL_EGREEDY, getGreedyAction(), SimulationAgent::GetIndex(), GetLastActionChosen(), getNonGreedyAction(), AgentDecPOMDPDiscrete::GetPU(), QTable::GetRow(), isFirstAgent(), PlanningUnitMADPDiscrete::JointToIndividualActionIndices(), and Learn().
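The two exploration strategies named above can be sketched as standalone code. This is not the MADP implementation: the function names are illustrative, a plain `std::vector<double>` stands in for a `QTable` row, and `std::rand` stands in for whatever random source the library uses.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Greedy choice: index of the highest Q-value in the row (first maximizer).
std::size_t greedyAction(const std::vector<double>& qRow)
{
    std::size_t best = 0;
    for (std::size_t a = 1; a < qRow.size(); ++a)
        if (qRow[a] > qRow[best])
            best = a;
    return best;
}

// Epsilon-greedy: with probability epsilon pick uniformly at random,
// otherwise pick the greedy action.
std::size_t egreedyAction(const std::vector<double>& qRow, double epsilon)
{
    double u = std::rand() / (RAND_MAX + 1.0);
    if (u < epsilon)
        return std::rand() % qRow.size();
    return greedyAction(qRow);
}

// Boltzmann: sample action a with probability proportional to
// exp(Q(s,a) / temp); lower temperatures approach greedy selection.
std::size_t boltzmannAction(const std::vector<double>& qRow, double temp)
{
    std::vector<double> w(qRow.size());
    double sum = 0.0;
    for (std::size_t a = 0; a < qRow.size(); ++a)
        sum += (w[a] = std::exp(qRow[a] / temp));
    double u = sum * (std::rand() / (RAND_MAX + 1.0));
    for (std::size_t a = 0; a < qRow.size(); ++a)
        if ((u -= w[a]) <= 0.0)
            return a;
    return qRow.size() - 1;  // guard against rounding
}
```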
Index AgentQLearner::getGreedyAction (Index sI) const

Returns the greedy action, i.e. the action with the highest Q-value, for the given state.

When multiple Q-values share the maximum, one of the corresponding actions is selected at random.

Parameters:
    sI  state for which the greedy action is to be determined

References getMaxState().

Referenced by Act().
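The interplay between `getMaxState` and the random tie-breaking in `getGreedyAction` can be sketched as follows. This is a hypothetical standalone version, not the library code: the names, the tolerance constant, and the use of `std::vector` in place of `QTable::GetRow()` are all illustrative.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <iterator>
#include <list>
#include <vector>

// Return the highest Q-value in the row and, optionally, collect every
// action attaining it (within a small tolerance), so a caller can break
// ties at random -- mirroring the getMaxState() behaviour described above.
double maxState(const std::vector<double>& qRow,
                std::list<std::size_t>* actions = nullptr)
{
    const double eps = 1e-9;  // tolerance for "equal" Q-values (illustrative)
    double best = qRow[0];
    for (double v : qRow)
        if (v > best) best = v;
    if (actions)
        for (std::size_t a = 0; a < qRow.size(); ++a)
            if (std::fabs(qRow[a] - best) < eps)
                actions->push_back(a);
    return best;
}

// Greedy action with random tie-breaking: pick uniformly among maximizers.
std::size_t greedyActionTieBreak(const std::vector<double>& qRow)
{
    std::list<std::size_t> ties;
    maxState(qRow, &ties);
    auto it = ties.begin();
    std::advance(it, std::rand() % ties.size());
    return *it;
}
```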
double AgentQLearner::getMaxState (Index sI, std::list< Index > *actions = NULL) const

Returns the highest Q-value in state sI.

This corresponds to the value of the getGreedyAction() action.

Parameters:
    sI  state whose maximum Q-value will be returned

References _m_Q, EPSILON, and QTable::GetRow().

Referenced by getGreedyAction(), and Learn().
Index AgentQLearner::getNonGreedyAction (Index sI) const

References QTable::GetNrActions().

Referenced by Act().
QTable AgentQLearner::GetQTable () const  [inline]

Returns the learned (infinite-horizon) Q-table.

References _m_Q.
bool AgentQLearner::isFirstAgent () const  [inline]

References _m_firstAgent.
void AgentQLearner::Learn (Index jaI, double r, Index sI, Index prevSI)

Updates the internal Q-table.

This method updates the Q-value Q(prevSI, jaI), given the successor state sI and the received reward r, using the standard Q-learning update derived from the Bellman optimality equation.

References _m_alpha, _m_epsilon, _m_gamma, _m_Q, _m_t, getMaxState(), and isFirstAgent().

Referenced by Act().
virtual void AgentQLearner::ResetEpisode ()  [virtual]

Called before an episode, to reinitialize the agent.

Implements SimulationAgent.

References _m_t.
void AgentQLearner::SetFirstAgent (const AgentQLearner *firstAgent)  [inline]

void AgentQLearner::setTemp (double temp)  [inline]

void AgentQLearner::updateEpsilon (double fract)  [inline]
Member Data Documentation

double AgentQLearner::_m_alpha  [private]

Learning rate.

Referenced by AgentQLearner(), and Learn().

double AgentQLearner::_m_epsilon  [private]

Probability parameter for ε-greedy exploration.

Referenced by Act(), AgentQLearner(), and Learn().

ExplorationT AgentQLearner::_m_exploration  [private]

Exploration strategy.

Referenced by Act(), and AgentQLearner().

const AgentQLearner * AgentQLearner::_m_firstAgent  [private]

Agent with id 0, used for last-action lookup.

Referenced by isFirstAgent().

double AgentQLearner::_m_gamma  [private]

Discount rate.

Referenced by AgentQLearner(), and Learn().

double AgentQLearner::_m_initValue  [private]

Initial Q-value.

Referenced by AgentQLearner().

Index AgentQLearner::_m_prevSI  [private]

The state for which Act was last called.

Referenced by Act(), and AgentQLearner().

QTable AgentQLearner::_m_Q  [private]

The tabular Q-function to be learned.

Referenced by Act(), getMaxState(), GetQTable(), Learn(), and ~AgentQLearner().

Index AgentQLearner::_m_selJaI  [private]

The joint action selected in the previous Act call.

Referenced by Act(), AgentQLearner(), and GetLastActionChosen().

size_t AgentQLearner::_m_t  [private]

The episode count.

Referenced by Act(), AgentQLearner(), Learn(), and ResetEpisode().

double AgentQLearner::_m_temp  [private]

Boltzmann temperature.

Referenced by Act(), and AgentQLearner().