MultiAgentDecisionProcess: AgentQLearner Class Reference

AgentQLearner applies standard single-agent Q-learning in the joint action and state space.

#include <AgentQLearner.h>
Public Member Functions

Index Act (Index sI, Index joI, double reward)
    This method returns the next action for state sI.
AgentQLearner (const PlanningUnitDecPOMDPDiscrete *pu, Index id, double initValue, double epsilon, double alpha, double gamma, ExplorationT expl=EXPL_EGREEDY, double temp=0.4)
    Constructor.
AgentQLearner (const AgentQLearner &a)
    Copy constructor.
Index getGreedyAction (Index sI) const
    This method returns the greedy action, i.e. the action with the highest Q-value, for the given state.
double getMaxState (Index sI, std::list< Index > *actions=NULL) const
    This method returns the highest Q-value in state sI.
Index getNonGreedyAction (Index sI) const
QTable GetQTable () const
    Return the learned (infinite-horizon) Q-table.
bool isFirstAgent () const
void Learn (Index jaI, double r, Index sI, Index prevSI)
    Update the internal Q-table.
virtual void ResetEpisode ()
    Will be called before an episode, to reinitialize the agent.
void SetFirstAgent (const AgentQLearner *firstAgent)
void setTemp (double temp)
void updateEpsilon (double fract)
~AgentQLearner ()
    Destructor.
Public Member Functions inherited from AgentFullyObservable

AgentFullyObservable (const PlanningUnitDecPOMDPDiscrete *pu, Index id)
    (default) Constructor.
AgentFullyObservable (const AgentFullyObservable &a)
    Copy constructor.
~AgentFullyObservable ()
    Destructor.

Public Member Functions inherited from AgentDecPOMDPDiscrete

AgentDecPOMDPDiscrete (const PlanningUnitDecPOMDPDiscrete *pu, Index id)
    (default) Constructor.
AgentDecPOMDPDiscrete (const AgentDecPOMDPDiscrete &a)
    Copy constructor.
const PlanningUnitDecPOMDPDiscrete * GetPU () const

Public Member Functions inherited from SimulationAgent

virtual Index GetIndex () const
    Retrieves the index of this agent.
virtual bool GetVerbose () const
    If true, the agent will report more.
void Print () const
    Print out some information about this agent.
virtual void SetIndex (Index id)
    Sets the index of this agent.
virtual void SetVerbose (bool verbose)
    Set whether this agent should be verbose.
SimulationAgent (Index id, bool verbose=false)
    (default) Constructor.
virtual std::string SoftPrint () const
    Return some information about this agent.
virtual ~SimulationAgent ()
    Destructor.
Private Member Functions

Index GetLastActionChosen () const

Private Attributes

double _m_alpha
    Learning rate.
double _m_epsilon
    Exploration probability for e-greedy action selection.
ExplorationT _m_exploration
    Exploration strategy.
const AgentQLearner * _m_firstAgent
    Agent with id 0, used for last-action lookup.
double _m_gamma
    Discount factor.
double _m_initValue
    Initial Q-value.
Index _m_prevSI
    The state for which Act was last called.
QTable _m_Q
    The tabular Q-function to be learned.
Index _m_selJaI
    The joint action selected in the previous Act call.
size_t _m_t
    The episode count.
double _m_temp
    Boltzmann temperature.
Detailed Description

AgentQLearner applies standard single-agent Q-learning in the joint action and state space.
AgentQLearner::AgentQLearner (const PlanningUnitDecPOMDPDiscrete *pu, Index id, double initValue, double epsilon, double alpha, double gamma, ExplorationT expl = EXPL_EGREEDY, double temp = 0.4)

Constructor.

References _m_alpha, _m_epsilon, _m_exploration, _m_gamma, _m_initValue, _m_prevSI, _m_selJaI, _m_t, and _m_temp.

AgentQLearner::AgentQLearner (const AgentQLearner &a)

Copy constructor.

AgentQLearner::~AgentQLearner ()

Destructor.

References _m_Q, QTable::GetNrActions(), and QTable::GetNrStates().
Index AgentQLearner::Act (Index sI, Index joI, double reward) [virtual]

This method returns the next action for state sI.

Depending on the member variables, either a greedy action or an exploration action is taken, according to the exploration strategy (either Boltzmann or e-greedy). When a greedy action is taken but multiple Q-values share the highest value, one of them is selected at random.

Parameters:
    sI      current state
    reward  reward received at the last iteration

Implements AgentFullyObservable.

References _m_epsilon, _m_exploration, _m_prevSI, _m_Q, _m_selJaI, _m_t, _m_temp, EXPL_BOLTZMANN, EXPL_EGREEDY, getGreedyAction(), SimulationAgent::GetIndex(), GetLastActionChosen(), getNonGreedyAction(), AgentDecPOMDPDiscrete::GetPU(), QTable::GetRow(), isFirstAgent(), PlanningUnitMADPDiscrete::JointToIndividualActionIndices(), and Learn().
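The two exploration strategies Act() chooses between can be sketched as follows. This is a minimal, self-contained illustration, not the MADP implementation: the function names, the plain `std::vector<double>` Q-row, and the use of `std::rand()` are all assumptions made for the sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// e-greedy: with probability epsilon pick a uniformly random action,
// otherwise pick the action with the highest Q-value in this row.
std::size_t SelectEpsilonGreedy(const std::vector<double>& qRow, double epsilon)
{
    double u = static_cast<double>(std::rand()) / RAND_MAX;
    if (u < epsilon)
        return std::rand() % qRow.size();   // explore: uniform random action
    std::size_t best = 0;                   // exploit: plain argmax
    for (std::size_t a = 1; a < qRow.size(); ++a)
        if (qRow[a] > qRow[best])
            best = a;
    return best;
}

// Boltzmann: sample an action with probability proportional to
// exp(Q(s,a) / temperature). Higher temperature -> more uniform.
std::size_t SelectBoltzmann(const std::vector<double>& qRow, double temp)
{
    std::vector<double> w(qRow.size());
    double total = 0.0;
    for (std::size_t a = 0; a < qRow.size(); ++a)
        total += (w[a] = std::exp(qRow[a] / temp));
    // Draw a point in [0, total) and find which action's weight it lands in.
    double u = total * static_cast<double>(std::rand()) / RAND_MAX;
    for (std::size_t a = 0; a < w.size(); ++a)
    {
        u -= w[a];
        if (u <= 0.0)
            return a;
    }
    return w.size() - 1;
}
```

With epsilon = 0, SelectEpsilonGreedy degenerates to pure exploitation; as temp approaches 0, SelectBoltzmann concentrates on the greedy action.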
Index AgentQLearner::getGreedyAction (Index sI) const

This method returns the greedy action, corresponding to the action with the highest Q-value, for the given state.

When multiple Q-values share the optimum value, one of these actions is selected at random.

Parameters:
    sI    state for which the greedy action has to be determined

References getMaxState().

Referenced by Act().
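The random tie-breaking described here can be sketched as below. This is an illustrative stand-alone function, not the MADP API; the name, the `std::vector<double>` Q-row, and the tolerance value are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// Greedy selection with random tie-breaking: collect every action whose
// Q-value equals the row maximum (up to a small tolerance), then pick
// one of those maximizers uniformly at random.
std::size_t GreedyWithRandomTieBreak(const std::vector<double>& qRow)
{
    const double tol = 1e-12;
    double maxQ = qRow[0];
    for (double q : qRow)
        if (q > maxQ) maxQ = q;

    std::vector<std::size_t> best;          // all actions attaining the max
    for (std::size_t a = 0; a < qRow.size(); ++a)
        if (std::fabs(qRow[a] - maxQ) < tol)
            best.push_back(a);

    return best[std::rand() % best.size()]; // break ties uniformly
}
```

Random tie-breaking matters early in learning, when many Q-values still equal the initial value and a deterministic argmax would always favor the lowest action index.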
double AgentQLearner::getMaxState (Index sI, std::list< Index > *actions = NULL) const

This method returns the highest Q-value in state sI.

This corresponds to the value associated with the getGreedyAction() action.

Parameters:
    sI    state whose maximum Q-value will be returned

References _m_Q, EPSILON, and QTable::GetRow().

Referenced by getGreedyAction(), and Learn().
Index AgentQLearner::getNonGreedyAction (Index sI) const

References QTable::GetNrActions().

Referenced by Act().
QTable AgentQLearner::GetQTable () const [inline]

Return the learned (infinite-horizon) Q-table.

References _m_Q.
bool AgentQLearner::isFirstAgent () const [inline]

References _m_firstAgent.
void AgentQLearner::Learn (Index jaI, double r, Index sI, Index prevSI)

Update the internal Q-table.

This method updates the Q-value of the pair (prevSI, jaI), given the next state sI and the received reward, using the standard one-step Q-learning (Bellman) update.

References _m_alpha, _m_epsilon, _m_gamma, _m_Q, _m_t, getMaxState(), and isFirstAgent().

Referenced by Act().
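The update Learn() performs is the standard tabular Q-learning rule, Q(prevS, a) += alpha * (r + gamma * max_a' Q(s, a') - Q(prevS, a)). A minimal sketch, using a plain matrix stand-in for the MADP QTable class (the type alias and function name are assumptions):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Plain matrix stand-in for a tabular Q-function: Q[state][action].
using SimpleQTable = std::vector<std::vector<double> >;

// One-step Q-learning update for the transition (prevS, a) -> s with reward r.
void QLearningUpdate(SimpleQTable& Q, std::size_t prevS, std::size_t a,
                     double r, std::size_t s, double alpha, double gamma)
{
    // max_a' Q(s, a'): the value of acting greedily in the next state.
    double maxNext = Q[s][0];
    for (double q : Q[s])
        if (q > maxNext) maxNext = q;

    // Move Q(prevS, a) a fraction alpha toward the Bellman target
    // r + gamma * max_a' Q(s, a').
    Q[prevS][a] += alpha * (r + gamma * maxNext - Q[prevS][a]);
}
```

For example, with Q(prevS, a) = 0, r = 1, max_a' Q(s, a') = 10, alpha = 0.5, and gamma = 0.9, the target is 1 + 0.9 * 10 = 10 and the updated value is 0.5 * 10 = 5.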
virtual void AgentQLearner::ResetEpisode () [virtual]

Will be called before an episode, to reinitialize the agent.

Implements SimulationAgent.

References _m_t.
void AgentQLearner::SetFirstAgent (const AgentQLearner *firstAgent) [inline]

void AgentQLearner::setTemp (double temp) [inline]

void AgentQLearner::updateEpsilon (double fract) [inline]
double AgentQLearner::_m_alpha [private]

Learning rate.

Referenced by AgentQLearner(), and Learn().

double AgentQLearner::_m_epsilon [private]

Exploration probability for e-greedy action selection.

Referenced by Act(), AgentQLearner(), and Learn().

ExplorationT AgentQLearner::_m_exploration [private]

Exploration strategy.

Referenced by Act(), and AgentQLearner().

const AgentQLearner * AgentQLearner::_m_firstAgent [private]

Agent with id 0, used for last-action lookup.

Referenced by isFirstAgent().

double AgentQLearner::_m_gamma [private]

Discount factor.

Referenced by AgentQLearner(), and Learn().

double AgentQLearner::_m_initValue [private]

Initial Q-value.

Referenced by AgentQLearner().

Index AgentQLearner::_m_prevSI [private]

The state for which Act was last called.

Referenced by Act(), and AgentQLearner().

QTable AgentQLearner::_m_Q [private]

The tabular Q-function to be learned.

Referenced by Act(), getMaxState(), GetQTable(), Learn(), and ~AgentQLearner().

Index AgentQLearner::_m_selJaI [private]

The joint action selected in the previous Act call.

Referenced by Act(), AgentQLearner(), and GetLastActionChosen().

size_t AgentQLearner::_m_t [private]

The episode count.

Referenced by Act(), AgentQLearner(), Learn(), and ResetEpisode().

double AgentQLearner::_m_temp [private]

Boltzmann temperature.

Referenced by Act(), and AgentQLearner().