Class sim.TDLambda
java.lang.Object
   |
   +----sim.Experiment
           |
           +----sim.TDLambda
- public class TDLambda
- extends Experiment
Performs temporal difference learning, TD(lambda), with a given Markov decision
process (MDP) or Markov chain and a function approximator. If the MDP is a Markov
chain, the exploration factor can be set to 0 to perform standard TD(lambda) for
predicting the values of the states. Given an MDP, the object implements TD(lambda)
such that whenever the system explores, the trace is set to 0. The exploration rate
has a decay factor, so one can explore extensively in the initial stages of learning
and reduce the exploration rate in later stages. The derivative calculations with
respect to the inputs have not been fully implemented here.
This code is (c) 1996 Mance E. Harmon
<harmonme@aa.wpafb.af.mil>,
http://www.cs.cmu.edu/~baird/java
The source and object code may be redistributed freely.
If the code is modified, please state so in the comments.
- Version:
- 1.06 25 Aug 97
- Author:
- Mance E. Harmon, Leemon Baird
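The class description above mentions three mechanisms: a TD(lambda) update, a trace that is set to 0 whenever the system explores, and a decaying exploration rate. The following is a minimal, self-contained sketch of those ideas on a toy seven-state chain. It is not the code of sim.TDLambda. The class name TDLambdaSketch, the method learn, the tabular value array, the reward scheme, and the choice to clear the trace before the update of an exploratory step are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative sketch only; every name here is hypothetical, not part of the sim package. */
public class TDLambdaSketch {
    static final int N = 7; // states 0..6; states 0 and 6 are terminal

    /** Reward received on entering a state: 1 at the right end, 0 elsewhere (assumed). */
    static double reward(int state) {
        return state == N - 1 ? 1.0 : 0.0;
    }

    /** Runs TD(lambda) for the given number of episodes and returns the learned values. */
    static double[] learn(int episodes, double alpha, double gamma, double lambda,
                          double epsilon, double epsilonDecay, long seed) {
        Random rng = new Random(seed);
        double[] w = new double[N];     // tabular approximator: V(s) = w[s]
        double[] trace = new double[N]; // eligibility trace, one entry per weight
        for (int ep = 0; ep < episodes; ep++) {
            Arrays.fill(trace, 0.0);
            int s = N / 2;              // every episode starts in the middle state
            while (s != 0 && s != N - 1) {
                int next;
                if (rng.nextDouble() < epsilon) {
                    // Exploratory move: pick a direction at random and set the trace to 0.
                    next = rng.nextBoolean() ? s + 1 : s - 1;
                    Arrays.fill(trace, 0.0);
                } else {
                    // Greedy move with respect to the current value estimates.
                    double qRight = reward(s + 1) + gamma * w[s + 1];
                    double qLeft = reward(s - 1) + gamma * w[s - 1];
                    next = qRight >= qLeft ? s + 1 : s - 1;
                }
                // Terminal weights are never updated, so w[next] is 0 at a terminal state.
                double delta = reward(next) + gamma * w[next] - w[s];
                for (int i = 0; i < N; i++) {
                    trace[i] *= gamma * lambda; // decay old credit
                }
                trace[s] += 1.0;                // gradient of V(s) w.r.t. w is one-hot
                for (int i = 0; i < N; i++) {
                    w[i] += alpha * delta * trace[i];
                }
                s = next;
            }
            epsilon *= epsilonDecay;            // explore less in later episodes
        }
        return w;
    }

    public static void main(String[] args) {
        // Explore heavily at first, then let the exploration rate decay.
        double[] values = learn(2000, 0.1, 0.9, 0.8, 0.5, 0.99, 1L);
        System.out.println(Arrays.toString(values));
    }
}
```

With the exploration rate set to 0, the greedy policy in this toy problem always moves right, so the sketch reduces to standard TD(lambda) prediction along a fixed path.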
-
TDLambda()
-
-
evaluate()
- Return the scalar output for the current dInput vector
-
findGradient()
- Update the fGradient vector based on the current fInput vector
-
getGradient()
- The gradient of f(x) with respect to x (a column vector)
-
getInput()
- The input x sent to the function f(x) (a column vector)
-
getParameters(int)
- Return a parameter array if BNF(), parse(), and unparse() are to be automated, null otherwise.
-
initialize(int)
- Initialize, either partially or completely.
-
runOneStep()
- This runs one step of the simulation.
-
setWatchManager(WatchManager, String)
- Register all variables with this WatchManager.
TDLambda
public TDLambda()
getParameters
public Object[][] getParameters(int lang)
- Return a parameter array if BNF(), parse(), and unparse() are to be automated, null otherwise.
- Overrides:
- getParameters in class Experiment
- See Also:
- getParameters
initialize
public void initialize(int level)
- Initialize, either partially or completely.
- Overrides:
- initialize in class Experiment
- See Also:
- initialize
setWatchManager
public void setWatchManager(WatchManager wm,
String name)
- Register all variables with this WatchManager.
This will be called after all parsing is done.
setWatchManager should be overridden and forced to
call the same method on all the other objects in the experiment.
- Overrides:
- setWatchManager in class Experiment
runOneStep
public boolean runOneStep()
- This runs one step of the simulation. The function returns true when the simulation
is completely done. While the simulation is running, it should call
watchManager.update() whenever variables change so that all the display
windows can be updated.
- Overrides:
- runOneStep in class Experiment
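The contract described for runOneStep (return true once the simulation is completely done) suggests a simple driver loop. The sketch below shows one way such a loop might look. The Steppable interface, the CountingExperiment class, and the maxSteps safety cap are hypothetical stand-ins, since the real sim.Experiment class is not reproduced here.

```java
/** Illustrative sketch only; these types are hypothetical stand-ins for sim.Experiment. */
public class RunLoopSketch {

    /** The single method of the experiment contract that the loop relies on. */
    interface Steppable {
        boolean runOneStep(); // true when the simulation is completely done
    }

    /** Toy experiment that reports completion on its last step. */
    static final class CountingExperiment implements Steppable {
        private final int totalSteps;
        private int stepsDone = 0;

        CountingExperiment(int totalSteps) {
            this.totalSteps = totalSteps;
        }

        @Override
        public boolean runOneStep() {
            stepsDone++;
            return stepsDone >= totalSteps;
        }
    }

    /** Calls runOneStep until it returns true or maxSteps is reached; returns steps taken. */
    static int runToCompletion(Steppable experiment, int maxSteps) {
        int steps = 0;
        while (steps < maxSteps) {
            steps++;
            if (experiment.runOneStep()) {
                break; // the experiment says it is finished
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        int steps = runToCompletion(new CountingExperiment(5), 100);
        System.out.println("steps taken: " + steps);
    }
}
```

The cap is a defensive choice for the sketch: it keeps a driver from spinning forever if an experiment never reports completion.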
getInput
public Matrix getInput()
- The input x sent to the function f(x) (a column vector)
getGradient
public Matrix getGradient()
- The gradient of f(x) with respect to x (a column vector)
evaluate
public double evaluate()
- Return the scalar output for the current dInput vector
findGradient
public void findGradient()
- Update the fGradient vector based on the current fInput vector
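The four methods getInput, getGradient, evaluate, and findGradient together describe a scalar function f(x) whose input and gradient are held as vectors. The page does not say what f is for TDLambda, so the sketch below substitutes a simple assumed function, f(x) = sum of x_i squared, and uses plain double arrays in place of the Matrix class. It shows how the four calls might fit together, with a finite-difference check of the gradient. All names other than the four method names are hypothetical.

```java
/** Illustrative sketch only; f(x) = sum of x_i^2 is an assumed example function. */
public class FunctionSketch {
    private final double[] input;    // the input x sent to f(x)
    private final double[] gradient; // the gradient of f(x) with respect to x

    FunctionSketch(int size) {
        input = new double[size];
        gradient = new double[size];
    }

    /** Returns the live input vector; callers write x into it before evaluating. */
    double[] getInput() {
        return input;
    }

    /** Returns the gradient vector as last computed by findGradient(). */
    double[] getGradient() {
        return gradient;
    }

    /** Returns the scalar output for the current input vector. */
    double evaluate() {
        double sum = 0.0;
        for (double x : input) {
            sum += x * x;
        }
        return sum;
    }

    /** Updates the gradient vector based on the current input vector. */
    void findGradient() {
        for (int i = 0; i < input.length; i++) {
            gradient[i] = 2.0 * input[i];
        }
    }

    /** Central-difference estimate of df/dx_i, useful for checking findGradient(). */
    static double finiteDifference(FunctionSketch f, int i, double h) {
        double[] x = f.getInput();
        double saved = x[i];
        x[i] = saved + h;
        double up = f.evaluate();
        x[i] = saved - h;
        double down = f.evaluate();
        x[i] = saved; // restore the original input
        return (up - down) / (2.0 * h);
    }

    public static void main(String[] args) {
        FunctionSketch f = new FunctionSketch(3);
        double[] x = f.getInput();
        x[0] = 1.0; x[1] = -2.0; x[2] = 3.0;
        f.findGradient();
        System.out.println("f(x) = " + f.evaluate());
        System.out.println("analytic df/dx_1 = " + f.getGradient()[1]);
        System.out.println("numeric  df/dx_1 = " + finiteDifference(f, 1, 1e-5));
    }
}
```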