Class sim.mdp.HCDemo
java.lang.Object
|
+----sim.mdp.MDP
|
+----sim.mdp.HCDemo
- public class HCDemo
- extends MDP
A Markov Decision Process that takes a state and action
and returns a new state and a reinforcement. This is a module used to demonstrate
the capabilities of the VRMLInterface module. This module was created by changing
the HC mdp module. The policy is hardcoded so that the missile moves directly toward
the plane and the plane moves at a right angle to the missile.
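The hardcoded policy described above can be pictured geometrically. The following is a minimal sketch, not the HCDemo source: it assumes 2-D positions, and the class and method names (PursuitPolicySketch, towardTarget, perpendicularTo) are hypothetical. It computes a unit heading for the missile that points straight at the plane, and a heading for the plane rotated 90 degrees from the missile's heading.

```java
/** Illustrative only: the names and the 2-D geometry are assumptions, not the HCDemo source. */
final class PursuitPolicySketch {

    /** Unit vector pointing from (fromX, fromY) toward (toX, toY). */
    static double[] towardTarget(double fromX, double fromY, double toX, double toY) {
        double dx = toX - fromX;
        double dy = toY - fromY;
        double len = Math.hypot(dx, dy);
        if (len == 0.0) {
            return new double[] {0.0, 0.0}; // positions coincide: no defined direction
        }
        return new double[] {dx / len, dy / len};
    }

    /** Rotate a 2-D vector by 90 degrees counterclockwise. */
    static double[] perpendicularTo(double[] v) {
        return new double[] {-v[1], v[0]};
    }

    public static void main(String[] args) {
        // Missile at the origin, plane at (3, 4).
        double[] missileHeading = towardTarget(0.0, 0.0, 3.0, 4.0);
        double[] planeHeading = perpendicularTo(missileHeading);
        System.out.printf("missile heading: (%.2f, %.2f)%n", missileHeading[0], missileHeading[1]);
        System.out.printf("plane heading:   (%.2f, %.2f)%n", planeHeading[0], planeHeading[1]);
    }
}
```

The dot product of the two headings is zero, which is what "at a right angle" means here. Whether the plane turns left or right of the missile's line of sight is a choice this sketch makes arbitrarily.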
This code is (c) 1996 Mance E. Harmon
<mharmon@acm.org>,
http://eureka1.aa.wpafb.af.mil
The source and object code may be redistributed freely provided
no fee is charged. If the code is modified, please state so
in the comments.
- Version:
- 1.01, 22 Aug 97
- Author:
- Mance Harmon
-
HCDemo()
-
-
actionSize()
- Return the number of elements in the action vector.
-
findValAct(Matrix, Matrix, FunApp, Matrix, PBoolean)
- Find the value and best action of this state.
-
findValue(Matrix, Matrix, PDouble, FunApp, PDouble, Matrix, PDouble, PBoolean, NumExp, Random)
- Find the max over actions of R + gamma*V(x'), where V(x') is the value of the successor state
given state x, R is the reinforcement, and gamma is the discount factor.
-
getAction(Matrix, Matrix, Random)
- Return the next possible action in a state given an action.
-
getParameters(int)
- Return a parameter array if BNF(), parse(), and unparse() are to be automated, null otherwise.
-
getState(Matrix, PDouble, Random)
- Return the next state when doing epoch-wise training.
-
initialAction(Matrix, Matrix, Random)
- Return an initial action possible in a given state.
-
initialState(Matrix, Random)
- Return a start state for epoch-wise training.
-
nextState(Matrix, Matrix, Matrix, PDouble, PBoolean, Random)
- Find a next state given a state and action,
and return the reinforcement received.
-
numActions(Matrix)
- Return the number of actions in each state.
-
numPairs(PDouble)
- Return the number of state/action pairs for a given dt.
-
numStates(PDouble)
- Return the number of states in this MDP.
-
randomAction(Matrix, Matrix, Random)
- Generates a random action from those possible: (missile,plane) {(-1,-1),(-1,1),(1,-1),(1,1)}
-
randomState(Matrix, Random)
- Generates a random state from those possible.
-
setWatchManager(WatchManager, String)
- Register all variables with this WatchManager.
-
stateSize()
- Return the number of elements in the state vector.
HCDemo
public HCDemo()
getParameters
public Object[][] getParameters(int lang)
- Return a parameter array if BNF(), parse(), and unparse() are to be automated, null otherwise.
- Overrides:
- getParameters in class MDP
- See Also:
- getParameters
setWatchManager
public void setWatchManager(WatchManager wm,
String name)
- Register all variables with this WatchManager.
- Overrides:
- setWatchManager in class MDP
numStates
public int numStates(PDouble dt)
- Return the number of states in this MDP. This is always epochSize, because the state space is continuous.
- Overrides:
- numStates in class MDP
stateSize
public int stateSize()
- Return the number of elements in the state vector.
- Overrides:
- stateSize in class MDP
initialState
public void initialState(Matrix state,
Random random) throws MatrixException
- Return a start state for epoch-wise training. This is actually NOT the state, but rather the difference
in the state variables of the two players.
- Throws: MatrixException
- Vector is wrong length.
- Overrides:
- initialState in class MDP
getState
public void getState(Matrix state,
PDouble dt,
Random random) throws MatrixException
- Return the next state when doing epoch-wise training.
Because this MDP is defined with continuous state space, this simply returns a random state.
- Throws: MatrixException
- Vector is wrong length.
- Overrides:
- getState in class MDP
actionSize
public int actionSize()
- Return the number of elements in the action vector.
- Overrides:
- actionSize in class MDP
numActions
public int numActions(Matrix state)
- Return the number of actions in each state.
- Overrides:
- numActions in class MDP
initialAction
public void initialAction(Matrix state,
Matrix action,
Random random) throws MatrixException
- Return an initial action possible in a given state.
- Throws: MatrixException
- Vector is wrong length.
- Overrides:
- initialAction in class MDP
getAction
public void getAction(Matrix state,
Matrix action,
Random random) throws MatrixException
- Return the next possible action in a state given an action.
- Throws: MatrixException
- Vector is wrong length.
- Overrides:
- getAction in class MDP
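Taken together, initialAction and getAction suggest an enumeration pattern: start from a first action, then repeatedly ask for the next one. The sketch below shows one way such an enumeration could work. It is an assumption-laden illustration, not the HCDemo implementation: plain double[] arrays stand in for Matrix, the class and method names are hypothetical, and both the ordering of the four (missile, plane) actions and the wrap-around after the last one are guesses.

```java
/** Illustrative only: ordering, wrap-around, and all names are assumptions. */
final class ActionEnumerationSketch {

    /** The four (missile, plane) actions, in an assumed order. */
    static final double[][] ACTIONS = {
        {-1.0, -1.0}, {-1.0, 1.0}, {1.0, -1.0}, {1.0, 1.0}
    };

    /** Overwrite the action vector with the first action in the enumeration. */
    static void initialAction(double[] action) {
        checkLength(action);
        action[0] = ACTIONS[0][0];
        action[1] = ACTIONS[0][1];
    }

    /** Overwrite the action vector with the action that follows it, wrapping at the end. */
    static void nextAction(double[] action) {
        checkLength(action);
        int next = (indexOf(action) + 1) % ACTIONS.length;
        action[0] = ACTIONS[next][0];
        action[1] = ACTIONS[next][1];
    }

    private static int indexOf(double[] action) {
        for (int i = 0; i < ACTIONS.length; i++) {
            if (ACTIONS[i][0] == action[0] && ACTIONS[i][1] == action[1]) {
                return i;
            }
        }
        throw new IllegalArgumentException("not one of the four possible actions");
    }

    private static void checkLength(double[] action) {
        // Stands in for the "Vector is wrong length" MatrixException in the real API.
        if (action.length != 2) {
            throw new IllegalArgumentException("action vector must have 2 elements");
        }
    }

    public static void main(String[] args) {
        double[] action = new double[2];
        initialAction(action);
        for (int i = 0; i < ACTIONS.length; i++) {
            System.out.println("(" + action[0] + ", " + action[1] + ")");
            nextAction(action);
        }
    }
}
```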
numPairs
public int numPairs(PDouble dt)
- Return the number of state/action pairs for a given dt.
Because the state space is continuous, this returns the number of actions available in a state (4)
times the pseudo-epoch size supplied as a parameter.
- Overrides:
- numPairs in class MDP
randomAction
public void randomAction(Matrix state,
Matrix action,
Random random) throws MatrixException
- Generates a random action from those possible: (missile,plane) {(-1,-1),(-1,1),(1,-1),(1,1)}
- Throws: MatrixException
- Vector is wrong length.
- Overrides:
- randomAction in class MDP
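As an illustration of drawing uniformly from the four (missile, plane) pairs listed for randomAction, the sketch below flips one fair coin per player. It is not the HCDemo source: a double[] stands in for Matrix, the state argument is omitted, IllegalArgumentException stands in for MatrixException, and the class name is hypothetical.

```java
import java.util.Random;

/** Illustrative only: double[] replaces Matrix and the class name is an assumption. */
final class RandomActionSketch {

    /**
     * Fill the two-element action vector with a uniformly random
     * (missile, plane) pair from {(-1,-1), (-1,1), (1,-1), (1,1)}.
     */
    static void randomAction(double[] action, Random random) {
        if (action.length != 2) {
            // Stands in for the "Vector is wrong length" MatrixException.
            throw new IllegalArgumentException("action vector must have 2 elements");
        }
        action[0] = random.nextBoolean() ? 1.0 : -1.0; // missile component
        action[1] = random.nextBoolean() ? 1.0 : -1.0; // plane component
    }

    public static void main(String[] args) {
        Random random = new Random();
        double[] action = new double[2];
        randomAction(action, random);
        System.out.println("(" + action[0] + ", " + action[1] + ")");
    }
}
```

Two independent fair coin flips give each of the four pairs probability 1/4, so no explicit table lookup is needed.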
randomState
public void randomState(Matrix state,
Random random) throws MatrixException
- Generates a random state from those possible.
- Throws: MatrixException
- Vector is wrong length.
- Overrides:
- randomState in class MDP
nextState
public double nextState(Matrix state,
Matrix action,
Matrix newState,
PDouble dt,
PBoolean valueKnown,
Random random) throws MatrixException
- Find a next state given a state and action,
and return the reinforcement received.
The state, action, and newState arguments should all be vectors (single-column matrices).
The duration of the time step, dt, is also returned. Most MDPs
make this a constant, given in the parsed string.
- Throws: MatrixException
- if sizes aren't right.
- Overrides:
- nextState in class MDP
findValAct
public double findValAct(Matrix state,
Matrix action,
FunApp f,
Matrix outputs,
PBoolean valueKnown) throws MatrixException
- Find the value and best action of this state. This overwrites the action passed in,
replacing it with the best action for the given state.
- Throws: MatrixException
- column vectors are wrong size or shape
- Overrides:
- findValAct in class MDP
findValue
public double findValue(Matrix state,
Matrix optAction,
PDouble gamma,
FunApp f,
PDouble dt,
Matrix outputs,
PDouble reinforcement,
PBoolean valueKnown,
NumExp explorationFactor,
Random random) throws MatrixException
- Find the max over actions of R + gamma*V(x'), where V(x') is the value of the successor state
given state x, R is the reinforcement, and gamma is the discount factor. This method is used in
the object ValIter (value iteration).
- Throws: MatrixException
- column vectors are wrong size or shape
- Overrides:
- findValue in class MDP
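The quantity findValue is described as computing, the maximum over actions of R + gamma*V(x'), is a one-step value-iteration backup. The sketch below shows that backup over the four (missile, plane) actions. It is an illustration under stated assumptions, not the HCDemo code: double[] replaces Matrix, a ToDoubleFunction replaces FunApp, the StepModel interface and the toy dynamics in main are hypothetical, and the dt, exploration, and valueKnown arguments of the real method are left out.

```java
import java.util.function.ToDoubleFunction;

/** Illustrative only: all names, the StepModel interface, and the toy model are assumptions. */
final class ValueBackupSketch {

    /** Hypothetical one-step model: fills newState and returns the reinforcement R. */
    interface StepModel {
        double step(double[] state, double[] action, double[] newState);
    }

    /** The four (missile, plane) actions. */
    static final double[][] ACTIONS = {
        {-1.0, -1.0}, {-1.0, 1.0}, {1.0, -1.0}, {1.0, 1.0}
    };

    /**
     * Return max over actions of R + gamma * V(x'), and overwrite optAction
     * with the action that attains the maximum.
     */
    static double findValue(double[] state, double[] optAction, double gamma,
                            ToDoubleFunction<double[]> valueFn, StepModel model) {
        double best = Double.NEGATIVE_INFINITY;
        double[] newState = new double[state.length];
        for (double[] action : ACTIONS) {
            double reinforcement = model.step(state, action, newState);
            double backedUp = reinforcement + gamma * valueFn.applyAsDouble(newState);
            if (backedUp > best) {
                best = backedUp;
                System.arraycopy(action, 0, optAction, 0, action.length);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy 1-D model: both action components push the state; reward penalizes distance from 0.
        StepModel model = (s, a, ns) -> {
            ns[0] = s[0] + 0.5 * (a[0] + a[1]);
            return -Math.abs(ns[0]);
        };
        ToDoubleFunction<double[]> valueFn = s -> -Math.abs(s[0]);

        double[] optAction = new double[2];
        double value = findValue(new double[] {2.0}, optAction, 0.9, valueFn, model);
        System.out.println("value = " + value
                + ", best action = (" + optAction[0] + ", " + optAction[1] + ")");
    }
}
```

Ties are broken in favor of the action encountered first, which is a choice of this sketch rather than documented behavior.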