MultiAgentDecisionProcess: AgentQLearner Class Reference

AgentQLearner applies standard single-agent Q-learning in the joint action and state space.

#include <AgentQLearner.h>
Public Member Functions

Index Act (Index sI, Index joI, double reward)
    This method returns the next action for state sI.
AgentQLearner (const PlanningUnitDecPOMDPDiscrete *pu, Index id, double initValue, double epsilon, double alpha, double gamma, ExplorationT expl=EXPL_EGREEDY, double temp=0.4)
    Constructor.
AgentQLearner (const AgentQLearner &a)
    Copy constructor.
Index getGreedyAction (Index sI) const
    This method returns the greedy action, i.e. the action with the highest Q-value, for the given state.
double getMaxState (Index sI, std::list< Index > *actions=NULL) const
    This method returns the highest Q-value in state sI.
Index getNonGreedyAction (Index sI) const
QTable GetQTable () const
    Return the learned (infinite-horizon) Q-table.
bool isFirstAgent () const
void Learn (Index jaI, double r, Index sI, Index prevSI)
    Update the internal Q-table.
virtual void ResetEpisode ()
    Will be called before an episode, to reinitialize the agent.
void SetFirstAgent (const AgentQLearner *firstAgent)
void setTemp (double temp)
void updateEpsilon (double fract)
~AgentQLearner ()
    Destructor.
Public Member Functions inherited from AgentFullyObservable

AgentFullyObservable (const PlanningUnitDecPOMDPDiscrete *pu, Index id)
    (default) Constructor.
AgentFullyObservable (const AgentFullyObservable &a)
    Copy constructor.
~AgentFullyObservable ()
    Destructor.

Public Member Functions inherited from AgentDecPOMDPDiscrete

AgentDecPOMDPDiscrete (const PlanningUnitDecPOMDPDiscrete *pu, Index id)
    (default) Constructor.
AgentDecPOMDPDiscrete (const AgentDecPOMDPDiscrete &a)
    Copy constructor.
const PlanningUnitDecPOMDPDiscrete * GetPU () const

Public Member Functions inherited from SimulationAgent

virtual Index GetIndex () const
    Retrieves the index of this agent.
virtual bool GetVerbose () const
    If true, the agent will report more.
void Print () const
    Print out some information about this agent.
virtual void SetIndex (Index id)
    Sets the index of this agent.
virtual void SetVerbose (bool verbose)
    Set whether this agent should be verbose.
SimulationAgent (Index id, bool verbose=false)
    (default) Constructor.
virtual std::string SoftPrint () const
    Return some information about this agent.
virtual ~SimulationAgent ()
    Destructor.
Private Member Functions

Index GetLastActionChosen () const

Private Attributes

double _m_alpha
    Learning rate.
double _m_epsilon
    Exploration probability for e-greedy action selection.
ExplorationT _m_exploration
    Exploration strategy.
const AgentQLearner * _m_firstAgent
    Agent with id 0, used for last-action lookup.
double _m_gamma
    Discount factor.
double _m_initValue
    Initial Q-value.
Index _m_prevSI
    The state for which Act was last called.
QTable _m_Q
    The tabular Q-function to be learned.
Index _m_selJaI
    The joint action selected in the previous Act call.
size_t _m_t
    The episode count.
double _m_temp
    Boltzmann temperature.
Detailed Description

AgentQLearner applies standard single-agent Q-learning in the joint action and state space.
AgentQLearner::AgentQLearner (const PlanningUnitDecPOMDPDiscrete *pu, Index id, double initValue, double epsilon, double alpha, double gamma, ExplorationT expl = EXPL_EGREEDY, double temp = 0.4)

Constructor.

References _m_alpha, _m_epsilon, _m_exploration, _m_gamma, _m_initValue, _m_prevSI, _m_selJaI, _m_t, and _m_temp.

AgentQLearner::AgentQLearner (const AgentQLearner &a)

Copy constructor.

AgentQLearner::~AgentQLearner ()

Destructor.

References _m_Q, QTable::GetNrActions(), and QTable::GetNrStates().
Index AgentQLearner::Act (Index sI, Index joI, double reward) [virtual]

This method returns the next action for state sI.

Depending on the member variables, either a greedy action or an exploration action is taken, according to the exploration strategy (either Boltzmann or e-greedy). When a greedy action is taken but multiple Q-values share the highest value, one of them is selected at random.

Parameters:
    sI      current state
    reward  reward received at the last iteration

Implements AgentFullyObservable.

References _m_epsilon, _m_exploration, _m_prevSI, _m_Q, _m_selJaI, _m_t, _m_temp, EXPL_BOLTZMANN, EXPL_EGREEDY, getGreedyAction(), SimulationAgent::GetIndex(), GetLastActionChosen(), getNonGreedyAction(), AgentDecPOMDPDiscrete::GetPU(), QTable::GetRow(), isFirstAgent(), PlanningUnitMADPDiscrete::JointToIndividualActionIndices(), and Learn().
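The two exploration strategies Act() chooses between can be sketched as follows. This is a minimal, self-contained illustration, not the MADP implementation: the function names, the plain `std::vector<double>` Q-row, and the use of `std::rand()` are all assumptions made for the sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// e-greedy: with probability epsilon pick a uniformly random action,
// otherwise pick the action with the highest Q-value in this row.
std::size_t SelectEpsilonGreedy(const std::vector<double>& qRow, double epsilon)
{
    double u = static_cast<double>(std::rand()) / RAND_MAX;
    if (u < epsilon)
        return std::rand() % qRow.size();   // explore: uniform random action
    std::size_t best = 0;                   // exploit: plain argmax
    for (std::size_t a = 1; a < qRow.size(); ++a)
        if (qRow[a] > qRow[best])
            best = a;
    return best;
}

// Boltzmann: sample an action with probability proportional to
// exp(Q(s,a) / temperature). Higher temperature -> more uniform.
std::size_t SelectBoltzmann(const std::vector<double>& qRow, double temp)
{
    std::vector<double> w(qRow.size());
    double total = 0.0;
    for (std::size_t a = 0; a < qRow.size(); ++a)
        total += (w[a] = std::exp(qRow[a] / temp));
    // Draw a point in [0, total) and find which action's weight it lands in.
    double u = total * static_cast<double>(std::rand()) / RAND_MAX;
    for (std::size_t a = 0; a < w.size(); ++a)
    {
        u -= w[a];
        if (u <= 0.0)
            return a;
    }
    return w.size() - 1;
}
```

With epsilon = 0, SelectEpsilonGreedy degenerates to pure exploitation; as temp approaches 0, SelectBoltzmann concentrates on the greedy action.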
Index AgentQLearner::getGreedyAction (Index sI) const

This method returns the greedy action, corresponding to the action with the highest Q-value, for the given state.

When multiple Q-values share the optimum value, one of these actions is selected at random.

Parameters:
    sI    state for which the greedy action has to be determined

References getMaxState().

Referenced by Act().
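The random tie-breaking described here can be sketched as below. This is an illustrative stand-alone function, not the MADP API; the name, the `std::vector<double>` Q-row, and the tolerance value are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// Greedy selection with random tie-breaking: collect every action whose
// Q-value equals the row maximum (up to a small tolerance), then pick
// one of those maximizers uniformly at random.
std::size_t GreedyWithRandomTieBreak(const std::vector<double>& qRow)
{
    const double tol = 1e-12;
    double maxQ = qRow[0];
    for (double q : qRow)
        if (q > maxQ) maxQ = q;

    std::vector<std::size_t> best;          // all actions attaining the max
    for (std::size_t a = 0; a < qRow.size(); ++a)
        if (std::fabs(qRow[a] - maxQ) < tol)
            best.push_back(a);

    return best[std::rand() % best.size()]; // break ties uniformly
}
```

Random tie-breaking matters early in learning, when many Q-values still equal the initial value and a deterministic argmax would always favor the lowest action index.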
double AgentQLearner::getMaxState (Index sI, std::list< Index > *actions = NULL) const

This method returns the highest Q-value in state sI.

This corresponds to the value associated with the getGreedyAction() action.

Parameters:
    sI    state whose maximum Q-value will be returned

References _m_Q, EPSILON, and QTable::GetRow().

Referenced by getGreedyAction(), and Learn().
Index AgentQLearner::getNonGreedyAction (Index sI) const

References QTable::GetNrActions().

Referenced by Act().
QTable AgentQLearner::GetQTable () const [inline]

Return the learned (infinite-horizon) Q-table.

References _m_Q.
bool AgentQLearner::isFirstAgent () const [inline]

References _m_firstAgent.
void AgentQLearner::Learn (Index jaI, double r, Index sI, Index prevSI)

Update the internal Q-table.

This method updates the Q-value of the pair (prevSI, jaI), given the next state sI and the received reward, using the standard one-step Q-learning (Bellman) update.

References _m_alpha, _m_epsilon, _m_gamma, _m_Q, _m_t, getMaxState(), and isFirstAgent().

Referenced by Act().
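The update Learn() performs is the standard tabular Q-learning rule, Q(prevS, a) += alpha * (r + gamma * max_a' Q(s, a') - Q(prevS, a)). A minimal sketch, using a plain matrix stand-in for the MADP QTable class (the type alias and function name are assumptions):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Plain matrix stand-in for a tabular Q-function: Q[state][action].
using SimpleQTable = std::vector<std::vector<double> >;

// One-step Q-learning update for the transition (prevS, a) -> s with reward r.
void QLearningUpdate(SimpleQTable& Q, std::size_t prevS, std::size_t a,
                     double r, std::size_t s, double alpha, double gamma)
{
    // max_a' Q(s, a'): the value of acting greedily in the next state.
    double maxNext = Q[s][0];
    for (double q : Q[s])
        if (q > maxNext) maxNext = q;

    // Move Q(prevS, a) a fraction alpha toward the Bellman target
    // r + gamma * max_a' Q(s, a').
    Q[prevS][a] += alpha * (r + gamma * maxNext - Q[prevS][a]);
}
```

For example, with Q(prevS, a) = 0, r = 1, max_a' Q(s, a') = 10, alpha = 0.5, and gamma = 0.9, the target is 1 + 0.9 * 10 = 10 and the updated value is 0.5 * 10 = 5.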
virtual void AgentQLearner::ResetEpisode () [virtual]

Will be called before an episode, to reinitialize the agent.

Implements SimulationAgent.

References _m_t.
void AgentQLearner::SetFirstAgent (const AgentQLearner *firstAgent) [inline]

void AgentQLearner::setTemp (double temp) [inline]

void AgentQLearner::updateEpsilon (double fract) [inline]
double AgentQLearner::_m_alpha [private]

Learning rate.

Referenced by AgentQLearner(), and Learn().

double AgentQLearner::_m_epsilon [private]

Exploration probability for e-greedy action selection.

Referenced by Act(), AgentQLearner(), and Learn().

ExplorationT AgentQLearner::_m_exploration [private]

Exploration strategy.

Referenced by Act(), and AgentQLearner().

const AgentQLearner * AgentQLearner::_m_firstAgent [private]

Agent with id 0, used for last-action lookup.

Referenced by isFirstAgent().

double AgentQLearner::_m_gamma [private]

Discount factor.

Referenced by AgentQLearner(), and Learn().

double AgentQLearner::_m_initValue [private]

Initial Q-value.

Referenced by AgentQLearner().

Index AgentQLearner::_m_prevSI [private]

The state for which Act was last called.

Referenced by Act(), and AgentQLearner().

QTable AgentQLearner::_m_Q [private]

The tabular Q-function to be learned.

Referenced by Act(), getMaxState(), GetQTable(), Learn(), and ~AgentQLearner().

Index AgentQLearner::_m_selJaI [private]

The joint action selected in the previous Act call.

Referenced by Act(), AgentQLearner(), and GetLastActionChosen().

size_t AgentQLearner::_m_t [private]

The episode count.

Referenced by Act(), AgentQLearner(), Learn(), and ResetEpisode().

double AgentQLearner::_m_temp [private]

Boltzmann temperature.

Referenced by Act(), and AgentQLearner().