
Class sim.TDLambda

java.lang.Object
   |
   +----sim.Experiment
           |
           +----sim.TDLambda

public class TDLambda
extends Experiment
Performs temporal-difference learning, TD(lambda), with a given Markov decision process (MDP) or Markov chain and a function approximator. If the MDP is a Markov chain, the exploration factor can be set to 0 to perform standard TD(lambda) for predicting the values of the states. For a general MDP, the object implements TD(lambda) such that the trace is set to 0 whenever the system explores. The exploration rate has a decay factor, so that one can explore extensively in the early stages of learning and reduce the exploration rate in later stages. The derivative calculations with respect to the inputs have not been fully implemented here.
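
The update described above can be illustrated with a minimal sketch. It is not the sim.TDLambda source: the class TDLambdaSketch, the 7-state chain, the reward of 1 at the right end, the per-episode decay of the exploration rate, the step cap, and all parameter names are assumptions made for illustration. The function approximator is reduced to one weight per state. Clearing the whole trace before the exploratory transition is one plausible reading of "the trace is set to 0".

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of tabular TD(lambda) on a 7-state chain; not the sim.TDLambda source. */
final class TDLambdaSketch {
    static final int N = 7;                       // states 0..6; 0 and 6 are terminal

    /** Assumed reward: 1 for entering the right-hand terminal state, 0 otherwise. */
    static double reward(int next) { return next == N - 1 ? 1.0 : 0.0; }

    /** Returns the learned state values (one weight per state, i.e. one-hot features). */
    static double[] learn(int episodes, double alpha, double gamma, double lambda,
                          double explore, double exploreDecay, long seed) {
        Random rng = new Random(seed);
        double[] w = new double[N];
        for (int ep = 0; ep < episodes; ep++) {
            double[] trace = new double[N];
            int s = N / 2;
            // The step cap guards against a greedy policy that never reaches a terminal state.
            for (int t = 0; t < 1000 && s != 0 && s != N - 1; t++) {
                int next;
                if (rng.nextDouble() < explore) {
                    next = rng.nextBoolean() ? s + 1 : s - 1;   // exploratory move
                    Arrays.fill(trace, 0.0);                    // exploring resets the trace
                } else {
                    double right = reward(s + 1) + gamma * w[s + 1];
                    double left = reward(s - 1) + gamma * w[s - 1];
                    next = right >= left ? s + 1 : s - 1;       // greedy move, ties go right
                }
                double delta = reward(next) + gamma * w[next] - w[s];    // TD error
                for (int i = 0; i < N; i++) trace[i] *= gamma * lambda;  // decay old credit
                trace[s] += 1.0;                 // gradient of V(s) w.r.t. w is one-hot
                for (int i = 0; i < N; i++) w[i] += alpha * delta * trace[i];
                s = next;
            }
            explore *= exploreDecay;             // explore less as learning proceeds
        }
        return w;
    }

    public static void main(String[] args) {
        double[] v = learn(500, 0.1, 1.0, 0.8, 0.3, 0.99, 42L);
        System.out.println(Arrays.toString(v));
    }
}
```

With the exploration rate set to 0 the sketch never resets the trace and reduces to ordinary TD(lambda) prediction along the greedy path, which mirrors the Markov-chain case described above.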

This code is (c) 1996 Mance E. Harmon <harmonme@aa.wpafb.af.mil>, http://www.cs.cmu.edu/~baird/java
The source and object code may be redistributed freely. If the code is modified, please state so in the comments.

Version:
1.06 25 Aug 97
Author:
Mance E. Harmon, Leemon Baird

Constructor Index

 o TDLambda()

Method Index

 o evaluate()
Return the scalar output for the current dInput vector.
 o findGradient()
Update the fGradient vector based on the current fInput vector.
 o getGradient()
The gradient of f(x) with respect to x (a column vector)
 o getInput()
The input x sent to the function f(x) (a column vector)
 o getParameters(int)
Return a parameter array if BNF(), parse(), and unparse() are to be automated, null otherwise.
 o initialize(int)
Initialize, either partially or completely.
 o runOneStep()
This runs one step of the simulation.
 o setWatchManager(WatchManager, String)
Register all variables with this WatchManager.

Constructors

 o TDLambda
 public TDLambda()

Methods

 o getParameters
 public Object[][] getParameters(int lang)
Return a parameter array if BNF(), parse(), and unparse() are to be automated, null otherwise.

Overrides:
getParameters in class Experiment
See Also:
getParameters
 o initialize
 public void initialize(int level)
Initialize, either partially or completely.

Overrides:
initialize in class Experiment
See Also:
initialize
 o setWatchManager
 public void setWatchManager(WatchManager wm,
                             String name)
Register all variables with this WatchManager. This will be called after all parsing is done. Overriding implementations of setWatchManager should call the same method on all the other objects in the experiment.

Overrides:
setWatchManager in class Experiment
 o runOneStep
 public boolean runOneStep()
This runs one step of the simulation. The function returns true when the simulation is completely done. While the simulation is running, it should call the watchManager.update() function whenever variables change, so that all the display windows can be updated.

Overrides:
runOneStep in class Experiment
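
The return-value contract of runOneStep() suggests a simple driver loop. The sketch below is an assumption-laden illustration only: the Steppable interface, the Countdown class, and runToCompletion are hypothetical stand-ins and are not part of the sim package.

```java
/** Hypothetical stand-in for the runOneStep() contract; not the sim.Experiment class. */
final class StepLoopSketch {
    /** Assumed minimal interface: one step per call, true once the simulation is done. */
    interface Steppable { boolean runOneStep(); }

    /** Toy simulation that finishes after a fixed number of steps. */
    static final class Countdown implements Steppable {
        private int remaining;
        Countdown(int steps) { remaining = steps; }
        @Override public boolean runOneStep() {
            if (remaining > 0) remaining--;   // do one unit of work
            return remaining == 0;            // true means completely done
        }
    }

    /** Calls runOneStep() until it returns true; returns the number of calls made. */
    static int runToCompletion(Steppable exp) {
        int calls = 0;
        boolean done;
        do {
            done = exp.runOneStep();
            calls++;
        } while (!done);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(runToCompletion(new Countdown(3)));
    }
}
```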
 o getInput
 public Matrix getInput()
The input x sent to the function f(x) (a column vector)

 o getGradient
 public Matrix getGradient()
The gradient of f(x) with respect to x (a column vector)

 o evaluate
 public double evaluate()
Return the scalar output for the current dInput vector.

 o findGradient
 public void findGradient()
Update the fGradient vector based on the current fInput vector.

