Class sim.mdp.HC
java.lang.Object
   |
   +----sim.mdp.MDP
           |
           +----sim.mdp.HC
- public class HC
- extends MDP
A Markov Decision Process that takes a state and action
and returns a new state and a reinforcement. This MDP is deterministic.
This code is (c) 1996 Mance E. Harmon
<mharmon@acm.org>,
http://eureka1.aa.wpafb.af.mil
The source and object code may be redistributed freely provided
no fee is charged. If the code is modified, please state so
in the comments.
- Version:
- 1.03, 19 Sep 97
- Author:
- Mance Harmon
-
epochSize
- Size of the epoch.
-
HC()
-
-
actionSize()
- Return the number of elements in the action vector.
-
findValAct(Matrix, Matrix, FunApp, Matrix, PBoolean)
- Find the value and best action of this state.
-
findValue(Matrix, Matrix, PDouble, FunApp, PDouble, Matrix, PDouble, PBoolean, NumExp, Random)
- Find the max over actions of R + gamma*V(x'), where V(x') is the value of the successor state
given state x, R is the reinforcement, and gamma is the discount factor.
-
getAction(Matrix, Matrix, Random)
- Return the next possible action in a state, given the current action.
-
getParameters(int)
- Return a parameter array if BNF(), parse(), and unparse() are to be automated, null otherwise.
-
getState(Matrix, PDouble, Random)
- Return the next state when doing epoch-wise training.
-
initialAction(Matrix, Matrix, Random)
- Return an initial action possible in a given state.
-
initialState(Matrix, Random)
- Return a start state for epoch-wise training.
-
nextState(Matrix, Matrix, Matrix, PDouble, PBoolean, Random)
- Find a next state given a state and action,
and return the reinforcement received.
-
numActions(Matrix)
- Return the number of actions in each state.
-
numPairs(PDouble)
- Return the number of state/action pairs for a given dt.
-
numStates(PDouble)
- Return the number of states in this MDP.
-
randomAction(Matrix, Matrix, Random)
- Generates a random action from those possible: (missile,plane) {(-1,-1),(-1,1),(1,-1),(1,1)}
-
randomState(Matrix, Random)
- Generates a random state from those possible.
-
setWatchManager(WatchManager, String)
- Register all variables with this WatchManager.
-
stateSize()
- Return the number of elements in the state vector.
epochSize
protected IntExp epochSize
- Size of the epoch. Only needed when doing epoch-wise training on a continuous state space.
HC
public HC()
getParameters
public Object[][] getParameters(int lang)
- Return a parameter array if BNF(), parse(), and unparse() are to be automated, null otherwise.
- Overrides:
- getParameters in class MDP
- See Also:
- getParameters
setWatchManager
public void setWatchManager(WatchManager wm,
String name)
- Register all variables with this WatchManager.
- Overrides:
- setWatchManager in class MDP
numStates
public int numStates(PDouble dt)
- Return the number of states in this MDP. This will always be epochSize because the state space is continuous.
- Overrides:
- numStates in class MDP
stateSize
public int stateSize()
- Return the number of elements in the state vector.
- Overrides:
- stateSize in class MDP
initialState
public void initialState(Matrix state,
Random random) throws MatrixException
- Return a start state for epoch-wise training. This is actually NOT the state, but rather the difference
in the state variables of the two players.
- Throws: MatrixException
- Vector is wrong length.
- Overrides:
- initialState in class MDP
getState
public void getState(Matrix state,
PDouble dt,
Random random) throws MatrixException
- Return the next state when doing epoch-wise training.
Because this MDP is defined with continuous state space, this simply returns a random state.
- Throws: MatrixException
- Vector is wrong length.
- Overrides:
- getState in class MDP
actionSize
public int actionSize()
- Return the number of elements in the action vector.
- Overrides:
- actionSize in class MDP
numActions
public int numActions(Matrix state)
- Return the number of actions in each state.
- Overrides:
- numActions in class MDP
initialAction
public void initialAction(Matrix state,
Matrix action,
Random random) throws MatrixException
- Return an initial action possible in a given state.
- Throws: MatrixException
- Vector is wrong length.
- Overrides:
- initialAction in class MDP
getAction
public void getAction(Matrix state,
Matrix action,
Random random) throws MatrixException
- Return the next possible action in a state, given the current action.
- Throws: MatrixException
- Vector is wrong length.
- Overrides:
- getAction in class MDP
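The initialAction/getAction pair can be read as an enumeration over the possible actions. The following is a minimal sketch of that pattern, not the library's implementation: it uses a plain double[2] in place of Matrix, and the class name, the enumeration order, and the wrap-around after the last action are all assumptions made for illustration.

```java
// Hypothetical stand-in: not part of sim.mdp. A double[2] replaces Matrix.
class HcNextActionSketch {
    // Assumed enumeration order of the four (missile, plane) actions.
    static final double[][] ACTIONS = {{-1, -1}, {-1, 1}, {1, -1}, {1, 1}};

    // Writes the first action in the assumed order into action.
    static void initialAction(double[] action) {
        action[0] = ACTIONS[0][0];
        action[1] = ACTIONS[0][1];
    }

    // Replaces action with its successor in the assumed order,
    // wrapping around after the last one (also an assumption).
    static void getAction(double[] action) {
        for (int i = 0; i < ACTIONS.length; i++) {
            if (ACTIONS[i][0] == action[0] && ACTIONS[i][1] == action[1]) {
                double[] next = ACTIONS[(i + 1) % ACTIONS.length];
                action[0] = next[0];
                action[1] = next[1];
                return;
            }
        }
        throw new IllegalArgumentException("not one of the four possible actions");
    }

    public static void main(String[] args) {
        double[] action = new double[2];
        initialAction(action);
        // Visit every action once.
        for (int i = 0; i < ACTIONS.length; i++) {
            System.out.println("(" + action[0] + ", " + action[1] + ")");
            getAction(action);
        }
    }
}
```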
numPairs
public int numPairs(PDouble dt)
- Return the number of state/action pairs for a given dt.
Because the state space is continuous, this returns the number of actions in a given state (4)
times the pseudo-epoch size (epochSize) given to this object as a parameter.
- Overrides:
- numPairs in class MDP
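The arithmetic described for numPairs can be sketched as follows. The class and method names are hypothetical, and a plain int stands in for the epochSize parameter; the sketch only illustrates the product of actions per state and pseudo-epoch size.

```java
// Hypothetical stand-in: not part of sim.mdp.
class HcPairCountSketch {
    // Four (missile, plane) action pairs are possible in every state.
    static final int NUM_ACTIONS = 4;

    // Number of state/action pairs: actions per state times pseudo-epoch size.
    static int numPairs(int epochSize) {
        return NUM_ACTIONS * epochSize;
    }

    public static void main(String[] args) {
        System.out.println(numPairs(1000)); // 4 * 1000
    }
}
```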
randomAction
public void randomAction(Matrix state,
Matrix action,
Random random) throws MatrixException
- Generates a random action from those possible: (missile,plane) {(-1,-1),(-1,1),(1,-1),(1,1)}
- Throws: MatrixException
- Vector is wrong length.
- Overrides:
- randomAction in class MDP
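As an illustration of the action set listed above, the sketch below draws one of the four (missile, plane) pairs at random. It uses a plain double[2] in place of Matrix, and the class name is hypothetical. Uniform sampling and a two-element action vector are assumptions, since the documentation states neither.

```java
import java.util.Random;

// Hypothetical stand-in: not part of sim.mdp. A double[2] replaces Matrix.
class HcRandomActionSketch {
    // The four (missile, plane) actions listed in the documentation.
    static final double[][] ACTIONS = {{-1, -1}, {-1, 1}, {1, -1}, {1, 1}};

    // Overwrites action[0] (missile) and action[1] (plane) with a random pair.
    // Uniform sampling is an assumption.
    static void randomAction(double[] action, Random random) {
        if (action.length != 2) {
            throw new IllegalArgumentException("action vector must have 2 elements");
        }
        double[] pick = ACTIONS[random.nextInt(ACTIONS.length)];
        action[0] = pick[0];
        action[1] = pick[1];
    }

    public static void main(String[] args) {
        double[] action = new double[2];
        randomAction(action, new Random());
        System.out.println("(" + action[0] + ", " + action[1] + ")");
    }
}
```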
randomState
public void randomState(Matrix state,
Random random) throws MatrixException
- Generates a random state from those possible.
- Throws: MatrixException
- Vector is wrong length.
- Overrides:
- randomState in class MDP
nextState
public double nextState(Matrix state,
Matrix action,
Matrix newState,
PDouble dt,
PBoolean valueKnown,
Random random) throws MatrixException
- Find a next state given a state and action,
and return the reinforcement received.
The state, action, and newState arguments should all be vectors (single-column matrices).
The duration of the time step is also returned, in dt. Most MDPs
will generally make this a constant, given in the parsed string.
- Throws: MatrixException
- if sizes aren't right.
- Overrides:
- nextState in class MDP
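The calling convention described for nextState (successor written into newState, time step reported through dt, reinforcement as the return value) can be sketched as below. Everything here is a hypothetical stand-in: double[] replaces Matrix, DoubleRef replaces PDouble, and the dynamics are a toy example chosen for illustration, not the dynamics of HC.

```java
// Hypothetical stand-ins: none of these types are part of sim.mdp.
class NextStateSketch {
    // Minimal mutable holder, standing in for the library's PDouble.
    static class DoubleRef {
        double val;
    }

    // Toy deterministic dynamics (NOT the HC dynamics): x' = x + a * dt.
    // Writes the successor into newState, reports the step length in dt,
    // and returns the reinforcement (minus the distance of x' from the origin).
    static double nextState(double[] state, double[] action,
                            double[] newState, DoubleRef dt) {
        if (state.length != newState.length || state.length != action.length) {
            throw new IllegalArgumentException("vectors must have matching sizes");
        }
        dt.val = 0.1; // a constant time step, as most MDPs are said to use
        double squared = 0.0;
        for (int i = 0; i < state.length; i++) {
            newState[i] = state[i] + action[i] * dt.val;
            squared += newState[i] * newState[i];
        }
        return -Math.sqrt(squared);
    }

    public static void main(String[] args) {
        double[] newState = new double[2];
        DoubleRef dt = new DoubleRef();
        double r = nextState(new double[]{3, 4}, new double[]{1, -1}, newState, dt);
        System.out.println("reinforcement " + r + ", dt " + dt.val);
    }
}
```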
findValAct
public double findValAct(Matrix state,
Matrix action,
FunApp f,
Matrix outputs,
PBoolean valueKnown) throws MatrixException
- Find the value and best action of this state. This overwrites the action passed in,
replacing it with the best action for the given state.
- Throws: MatrixException
- column vectors are wrong size or shape
- Overrides:
- findValAct in class MDP
findValue
public double findValue(Matrix state,
Matrix optAction,
PDouble gamma,
FunApp f,
PDouble dt,
Matrix outputs,
PDouble reinforcement,
PBoolean valueKnown,
NumExp explorationFactor,
Random random) throws MatrixException
- Find the max over actions of R + gamma*V(x'), where V(x') is the value of the successor state
given state x, R is the reinforcement, and gamma is the discount factor. This method is used in
the object ValIter (value iteration).
- Throws: MatrixException
- column vectors are wrong size or shape
- Overrides:
- findValue in class MDP
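The backup described for findValue, the max over actions of R + gamma*V(x'), can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: double[] replaces Matrix, the Model interface and the ToDoubleFunction stand in for the MDP and the FunApp function approximator, and the dt, valueKnown, and explorationFactor arguments are omitted.

```java
import java.util.function.ToDoubleFunction;

// Hypothetical stand-in: not part of sim.mdp. States and actions are double[].
class ValueBackupSketch {
    // Hypothetical deterministic model: writes x' into newState and returns R.
    interface Model {
        double nextState(double[] state, double[] action, double[] newState);
    }

    // Returns max over actions of R + gamma * V(x') and copies the
    // maximizing action into optAction.
    static double findValue(double[] state, double[][] actions, double gamma,
                            Model model, ToDoubleFunction<double[]> v,
                            double[] optAction) {
        double best = Double.NEGATIVE_INFINITY;
        double[] newState = new double[state.length];
        for (double[] a : actions) {
            double r = model.nextState(state, a, newState);
            double q = r + gamma * v.applyAsDouble(newState);
            if (q > best) {
                best = q;
                System.arraycopy(a, 0, optAction, 0, a.length);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] actions = {{-1}, {1}};
        // Toy model: x' = x + a, with reinforcement -|x'|.
        Model toy = (s, a, ns) -> {
            ns[0] = s[0] + a[0];
            return -Math.abs(ns[0]);
        };
        double[] opt = new double[1];
        double value = findValue(new double[]{2.0}, actions, 0.9, toy, x -> 0.0, opt);
        System.out.println(value + " with action " + opt[0]);
    }
}
```

In the toy usage the value estimate V is fixed at zero, so the result depends only on the immediate reinforcement; a learned approximator would be passed in its place.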