MultiAgentDecisionProcess
AgentQLearner Class Reference

AgentQLearner applies standard single-agent Q-learning in the joint action and state space. More...

#include <AgentQLearner.h>

Inheritance diagram for AgentQLearner:
AgentQLearner → AgentFullyObservable → AgentDecPOMDPDiscrete → SimulationAgent

Public Member Functions

Index Act (Index sI, Index joI, double reward)
 This method returns the next action for state sI.
 
 AgentQLearner (const PlanningUnitDecPOMDPDiscrete *pu, Index id, double initValue, double epsilon, double alpha, double gamma, ExplorationT expl=EXPL_EGREEDY, double temp=0.4)
 Constructor.
 
 AgentQLearner (const AgentQLearner &a)
 Copy constructor.
 
Index getGreedyAction (Index sI) const
 This method returns the greedy action, corresponding to the action with the highest Q-value, for the given state.
 
double getMaxState (Index sI, std::list< Index > *actions=NULL) const
 This method returns the highest Q-value in state sI.
 
Index getNonGreedyAction (Index sI) const
 
QTable GetQTable () const
 Return learned (infinite horizon) Q-Table.
 
bool isFirstAgent () const
 
void Learn (Index jaI, double r, Index sI, Index prevSI)
 Update the internal Q table.
 
virtual void ResetEpisode ()
 Will be called before an episode, to reinitialize the agent.
 
void SetFirstAgent (const AgentQLearner *firstAgent)
 
void setTemp (double temp)
 
void updateEpsilon (double fract)
 
 ~AgentQLearner ()
 Destructor.
 
- Public Member Functions inherited from AgentFullyObservable
 AgentFullyObservable (const PlanningUnitDecPOMDPDiscrete *pu, Index id)
 (default) Constructor
 
 AgentFullyObservable (const AgentFullyObservable &a)
 Copy constructor.
 
 ~AgentFullyObservable ()
 Destructor.
 
- Public Member Functions inherited from AgentDecPOMDPDiscrete
 AgentDecPOMDPDiscrete (const PlanningUnitDecPOMDPDiscrete *pu, Index id)
 (default) Constructor
 
 AgentDecPOMDPDiscrete (const AgentDecPOMDPDiscrete &a)
 Copy constructor.
 
const PlanningUnitDecPOMDPDiscrete * GetPU () const
 
- Public Member Functions inherited from SimulationAgent
virtual Index GetIndex () const
 Retrieves the index of this agent.
 
virtual bool GetVerbose () const
 If true, the agent will report more.
 
void Print () const
 Print out some information about this agent.
 
virtual void SetIndex (Index id)
 Sets the index of this agent.
 
virtual void SetVerbose (bool verbose)
 Set whether this agent should be verbose.
 
 SimulationAgent (Index id, bool verbose=false)
 (default) Constructor
 
virtual std::string SoftPrint () const
 Return some information about this agent.
 
virtual ~SimulationAgent ()
 Destructor.

Private Member Functions

Index GetLastActionChosen () const
 

Private Attributes

double _m_alpha
 learning rate
 
double _m_epsilon
 greedy probability
 
ExplorationT _m_exploration
 exploration strategy
 
const AgentQLearner * _m_firstAgent
 agent with id 0 for last action lookup
 
double _m_gamma
 discount rate
 
double _m_initValue
 initial Q-value
 
Index _m_prevSI
 The state for which Act was last called.
 
QTable _m_Q
 The tabular Q function to be learned.
 
Index _m_selJaI
 The selected action in the previous Act call.
 
size_t _m_t
 The episode count.
 
double _m_temp
 Boltzmann temperature
 

Detailed Description

AgentQLearner applies standard single-agent Q-learning in the joint action and state space.
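
Conceptually, the class treats the multiagent problem as one big single-agent problem: a single Q-table is indexed by states and joint action indices. To execute a chosen joint action, each agent needs its own component of it; Act() (documented below) obtains this via PlanningUnitMADPDiscrete::JointToIndividualActionIndices(). The following stand-in sketch illustrates that decoding under the simplifying assumption that every agent has the same number of individual actions; it is not the library's implementation.

#include <cstddef>
#include <vector>

// Decode a joint action index into one individual action index per agent,
// assuming a uniform number of individual actions per agent (nrA). The
// library handles the general case via JointToIndividualActionIndices().
std::vector<std::size_t> jointToIndividual(std::size_t jaI,
                                           std::size_t nrAgents,
                                           std::size_t nrA)
{
    std::vector<std::size_t> actions(nrAgents);
    for (std::size_t i = nrAgents; i-- > 0; )
    {
        actions[i] = jaI % nrA;   // this agent's component (mixed-radix digit)
        jaI /= nrA;
    }
    return actions;
}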

Constructor & Destructor Documentation

AgentQLearner::AgentQLearner ( const PlanningUnitDecPOMDPDiscrete *  pu,
Index  id,
double  initValue,
double  epsilon,
double  alpha,
double  gamma,
ExplorationT  expl = EXPL_EGREEDY,
double  temp = 0.4 
)

Constructor.

AgentQLearner::AgentQLearner ( const AgentQLearner &  a)

Copy constructor.

AgentQLearner::~AgentQLearner ( )

Destructor.

References _m_Q, QTable::GetNrActions(), and QTable::GetNrStates().
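
A minimal usage sketch, assuming a valid PlanningUnitDecPOMDPDiscrete is already available (constructing one, and the environment interaction, are outside this class). Only the AgentQLearner calls match the documented interface; the parameter values are illustrative.

#include <AgentQLearner.h>
#include <cstddef>

void runEpisode(const PlanningUnitDecPOMDPDiscrete *pu)
{
    AgentQLearner agent(pu,
                        /* id        */ 0,
                        /* initValue */ 0.0,    // initial Q-value for all entries
                        /* epsilon   */ 0.1,    // e-greedy exploration parameter
                        /* alpha     */ 0.1,    // learning rate
                        /* gamma     */ 0.95);  // discount rate
                        // expl defaults to EXPL_EGREEDY, temp to 0.4

    agent.ResetEpisode();             // reinitialize before the episode starts

    Index sI = 0;                     // current state, provided by the simulation
    Index joI = 0;                    // second argument of Act(); see its docs below
    double reward = 0.0;
    for (std::size_t step = 0; step < 100; ++step)
    {
        // Returns the next action for sI and updates the Q-table internally.
        Index aI = agent.Act(sI, joI, reward);
        // A (hypothetical) environment step would produce the next sI, joI
        // and reward here; that part is not provided by this class.
        (void)aI;
    }

    QTable q = agent.GetQTable();     // inspect the learned Q-table
    (void)q;
}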

Member Function Documentation

Index AgentQLearner::Act ( Index  sI,
Index  joI,
double  r 
)
virtual

This method returns the next action for state sI.

Depending on the member variables, either a greedy or an exploratory action is taken, according to the configured exploration strategy (Boltzmann or e-greedy). When a greedy action is taken but multiple actions share the highest Q-value, one of them is selected randomly.

Parameters
sI  current state
r  reward received at last iteration
Returns
action for state sI

Implements AgentFullyObservable.

References _m_epsilon, _m_exploration, _m_prevSI, _m_Q, _m_selJaI, _m_t, _m_temp, EXPL_BOLTZMANN, EXPL_EGREEDY, getGreedyAction(), SimulationAgent::GetIndex(), GetLastActionChosen(), getNonGreedyAction(), AgentDecPOMDPDiscrete::GetPU(), QTable::GetRow(), isFirstAgent(), PlanningUnitMADPDiscrete::JointToIndividualActionIndices(), and Learn().
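
To make the two strategies concrete, here is an illustrative sketch (not the library's code; the helper names are hypothetical): e-greedy explores uniformly at random with probability epsilon and otherwise exploits, while Boltzmann exploration samples each action with probability proportional to exp(Q/temp).

#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Pick an action index from one row of Q-values using e-greedy exploration.
std::size_t selectEpsilonGreedy(const std::vector<double>& q, double epsilon)
{
    if (static_cast<double>(std::rand()) / RAND_MAX < epsilon)
        return std::rand() % q.size();              // explore: uniform random
    std::size_t best = 0;
    for (std::size_t a = 1; a < q.size(); ++a)      // exploit: argmax
        if (q[a] > q[best]) best = a;
    return best;
}

// Pick an action index using Boltzmann (softmax) exploration.
std::size_t selectBoltzmann(const std::vector<double>& q, double temp)
{
    std::vector<double> w(q.size());
    double sum = 0.0;
    for (std::size_t a = 0; a < q.size(); ++a)
        sum += w[a] = std::exp(q[a] / temp);        // unnormalized weights
    double r = sum * std::rand() / RAND_MAX;        // sample proportionally
    for (std::size_t a = 0; a < q.size(); ++a)
        if ((r -= w[a]) <= 0.0) return a;
    return q.size() - 1;                            // numerical fallback
}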

Index AgentQLearner::getGreedyAction ( Index  sI) const

This method returns the greedy action, corresponding to the action with the highest Q-value, for the given state.

When multiple Q-values have the same optimum value, one of these actions is selected randomly.

Parameters
sI  State for which the greedy action has to be determined.
Returns
action id

References getMaxState().

Referenced by Act().
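
The tie-breaking can be sketched as follows (an illustration, with a plain vector standing in for one QTable row; EPS plays the role of the EPSILON tolerance referenced by getMaxState()):

#include <cstddef>
#include <cstdlib>
#include <vector>

// Greedy selection with random tie-breaking among (near-)maximal Q-values.
std::size_t greedyWithTieBreak(const std::vector<double>& row)
{
    const double EPS = 1e-12;               // tolerance for "equal" Q-values
    double best = row[0];
    for (double v : row)
        if (v > best) best = v;

    std::vector<std::size_t> ties;          // all actions achieving the max
    for (std::size_t a = 0; a < row.size(); ++a)
        if (row[a] >= best - EPS) ties.push_back(a);

    return ties[std::rand() % ties.size()]; // break ties uniformly at random
}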

Index AgentQLearner::GetLastActionChosen ( ) const
inlineprivate

References _m_selJaI.

Referenced by Act().

double AgentQLearner::getMaxState ( Index  sI,
std::list< Index > *  actions = NULL 
) const

This method returns the highest Q-value in state sI.

This corresponds to the value of the action returned by getGreedyAction().

Parameters
sI  state of which the maximum Q-value will be returned
Returns
maximum Q-value in sI

References _m_Q, EPSILON, and QTable::GetRow().

Referenced by getGreedyAction(), and Learn().

Index AgentQLearner::getNonGreedyAction ( Index  sI) const
inline

References QTable::GetNrActions().

Referenced by Act().

QTable AgentQLearner::GetQTable ( ) const
inline

Return learned (infinite horizon) Q-Table.

References _m_Q.

bool AgentQLearner::isFirstAgent ( ) const
inline

References _m_firstAgent.

Referenced by Act(), and Learn().

void AgentQLearner::Learn ( Index  jaI,
double  r,
Index  sI,
Index  prevSI 
)

Update the internal Q table.

This method updates the Q-value of the pair (prevSI, jaI), given the next state sI and the received reward r, using the standard Q-learning update rule based on the Bellman equation.

References _m_alpha, _m_epsilon, _m_gamma, _m_Q, _m_t, getMaxState(), and isFirstAgent().

Referenced by Act().
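
For reference, this is the standard tabular Q-learning rule: Q(prevSI, jaI) += alpha * (r + gamma * max_a Q(sI, a) - Q(prevSI, jaI)). A self-contained sketch, with a plain 2-D array standing in for the QTable member _m_Q (not the library's source):

#include <cstddef>
#include <vector>

void learnSketch(std::vector<std::vector<double> >& Q,
                 std::size_t jaI, double r,
                 std::size_t sI, std::size_t prevSI,
                 double alpha, double gamma)
{
    // max_a Q(sI, a): highest Q-value in the next state (cf. getMaxState).
    double maxNext = Q[sI][0];
    for (double v : Q[sI])
        if (v > maxNext) maxNext = v;

    // Bellman-based update of the previously visited (state, action) pair.
    Q[prevSI][jaI] += alpha * (r + gamma * maxNext - Q[prevSI][jaI]);
}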

void AgentQLearner::ResetEpisode ( )
virtual

Will be called before an episode, to reinitialize the agent.

Implements SimulationAgent.

References _m_t.

void AgentQLearner::SetFirstAgent ( const AgentQLearner *  firstAgent)
inline

void AgentQLearner::setTemp ( double  temp)
inline

void AgentQLearner::updateEpsilon ( double  fract)
inline

Member Data Documentation

double AgentQLearner::_m_alpha
private

learning rate

Referenced by AgentQLearner(), and Learn().

double AgentQLearner::_m_epsilon
private

greedy probability (parameter of the e-greedy exploration strategy)

Referenced by Act(), AgentQLearner(), and Learn().

ExplorationT AgentQLearner::_m_exploration
private

exploration strategy

Referenced by Act(), and AgentQLearner().

const AgentQLearner* AgentQLearner::_m_firstAgent
private

pointer to the agent with id 0, used to look up the joint action that agent selected last

Referenced by isFirstAgent().

double AgentQLearner::_m_gamma
private

discount rate

Referenced by AgentQLearner(), and Learn().

double AgentQLearner::_m_initValue
private

initial Q-value

Referenced by AgentQLearner().

Index AgentQLearner::_m_prevSI
private

The state for which Act was last called.

Referenced by Act(), and AgentQLearner().

QTable AgentQLearner::_m_Q
private

The tabular Q function to be learned.

Referenced by Act(), getMaxState(), GetQTable(), Learn(), and ~AgentQLearner().

Index AgentQLearner::_m_selJaI
private

The selected action in the previous Act call.

Referenced by Act(), AgentQLearner(), and GetLastActionChosen().

size_t AgentQLearner::_m_t
private

The episode count.

Referenced by Act(), AgentQLearner(), Learn(), and ResetEpisode().

double AgentQLearner::_m_temp
private

Boltzmann temperature

Referenced by Act(), and AgentQLearner().