
Class sim.mdp.XORmdp

java.lang.Object
   |
   +----sim.mdp.MDP
           |
           +----sim.mdp.XORmdp

public class XORmdp
extends MDP
A deterministic Markov Decision Process (MDP) that takes a state and an action and returns a new state and a reinforcement. The state space consists of 4 states: {[0,0],[0,1],[1,0],[1,1]}. In state [0,0] there are two possible actions: go left and transition to state [0,1], or go right and transition to state [1,0]. Each of these actions returns a reinforcement of -1. States [0,1] and [1,0] each have a single action that transitions to state [1,1] and returns a reinforcement of 1. State [1,1] is the terminal state and is defined to have a value of 0. This MDP is the analog of the XOR problem for supervised learning systems.
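
The transition structure described above can be sketched as a small standalone Java class. This is an illustration only, not the XORmdp source: the class name XorMdpSketch, the Step holder, the LEFT/RIGHT action encoding, and the choice to give the terminal state [1,1] zero actions are all assumptions made for the example. The real class works with Matrix, PDouble, and PBoolean arguments instead.

```java
import java.util.Arrays;

/** Standalone sketch of the XOR MDP transitions; not the sim.mdp.XORmdp source. */
final class XorMdpSketch {
    /** Hypothetical action encoding for state [0,0]: 0 = left, 1 = right. */
    static final int LEFT = 0;
    static final int RIGHT = 1;

    /** Result of one deterministic step: successor state and reinforcement. */
    static final class Step {
        final int[] nextState;
        final double reinforcement;

        Step(int[] nextState, double reinforcement) {
            this.nextState = nextState;
            this.reinforcement = reinforcement;
        }
    }

    /** Number of actions in a state (assumption: the terminal state [1,1] has none). */
    static int numActions(int[] s) {
        if (s[0] == 0 && s[1] == 0) return 2;
        if (s[0] == 1 && s[1] == 1) return 0;
        return 1;
    }

    /** Deterministic transition following the class description. */
    static Step step(int[] s, int action) {
        if (action < 0 || action >= numActions(s)) {
            throw new IllegalArgumentException("action not available in this state");
        }
        if (s[0] == 0 && s[1] == 0) {
            // Left leads to [0,1], right leads to [1,0]; both cost -1.
            int[] next = (action == LEFT) ? new int[] {0, 1} : new int[] {1, 0};
            return new Step(next, -1.0);
        }
        // [0,1] and [1,0] each have one action, leading to [1,1] with reinforcement 1.
        return new Step(new int[] {1, 1}, 1.0);
    }

    public static void main(String[] args) {
        Step first = step(new int[] {0, 0}, LEFT);
        Step second = step(first.nextState, 0);
        System.out.println(Arrays.toString(first.nextState) + " r=" + first.reinforcement);
        System.out.println(Arrays.toString(second.nextState) + " r=" + second.reinforcement);
    }
}
```

Either path from [0,0] to [1,1] collects -1 and then 1, so the undiscounted return from the start state is 0.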

This code is (c) 1996 Mance E. Harmon <harmonme@aa.wpafb.af.mil>, http://www.aa.wpafb.af.mil/~harmonme
The source and object code may be redistributed freely provided no fee is charged. If the code is modified, please state so in the comments.

Version:
1.02, 22 Aug 97
Author:
Mance Harmon

Constructor Index

 o XORmdp()

Method Index

 o actionSize()
Return the number of elements in the action vector.
 o findValAct(Matrix, Matrix, FunApp, Matrix, PBoolean)
Find the value and best action of this state.
 o findValue(Matrix, Matrix, PDouble, FunApp, PDouble, Matrix, PDouble, PBoolean, NumExp, Random)
Find the max over actions of R + gamma*V(x'), where V(x') is the value of the successor state x' given state x, R is the reinforcement, and gamma is the discount factor.
 o getAction(Matrix, Matrix, Random)
Return the next possible action in a state given an action.
 o getParameters(int)
Return a parameter array if BNF(), parse(), and unparse() are to be automated, null otherwise.
 o getState(Matrix, PDouble, Random)
Return the next state when doing epoch-wise training.
 o initialAction(Matrix, Matrix, Random)
Return an initial action possible in a given state.
 o initialState(Matrix, Random)
Return a start state for epoch-wise training.
 o nextState(Matrix, Matrix, Matrix, PDouble, PBoolean, Random)
Find a next state given a state and action, and return the reinforcement received.
 o numActions(Matrix)
Return the number of actions in each state.
 o numPairs(PDouble)
Return the number of state/action pairs for a given dt.
 o numStates(PDouble)
Return the number of states for this mdp.
 o randomAction(Matrix, Matrix, Random)
Generates a random action from those possible.
 o randomState(Matrix, Random)
Generates a random state from those possible.
 o stateSize()
Return the number of elements in the state vector.

Constructors

 o XORmdp
 public XORmdp()

Methods

 o getParameters
 public Object[][] getParameters(int lang)
Return a parameter array if BNF(), parse(), and unparse() are to be automated, null otherwise.

Overrides:
getParameters in class MDP
See Also:
getParameters
 o numStates
 public int numStates(PDouble dt)
Return the number of states for this mdp. This does not include the terminal state [1,1].

Overrides:
numStates in class MDP
 o stateSize
 public int stateSize()
Return the number of elements in the state vector.

Overrides:
stateSize in class MDP
 o initialState
 public void initialState(Matrix state,
                          Random random) throws MatrixException
Return a start state for epoch-wise training.

Throws: MatrixException
Vector is wrong length.
Overrides:
initialState in class MDP
 o getState
 public void getState(Matrix state,
                      PDouble dt,
                      Random random) throws MatrixException
Return the next state when doing epoch-wise training. If the state passed in is [1,1] then the next state is [0,0].

Throws: MatrixException
Vector is wrong length.
Overrides:
getState in class MDP
 o actionSize
 public int actionSize()
Return the number of elements in the action vector.

Overrides:
actionSize in class MDP
 o numActions
 public int numActions(Matrix state)
Return the number of actions in each state.

Overrides:
numActions in class MDP
 o initialAction
 public void initialAction(Matrix state,
                           Matrix action,
                           Random random) throws MatrixException
Return an initial action possible in a given state.

Throws: MatrixException
Vector is wrong length.
Overrides:
initialAction in class MDP
 o getAction
 public void getAction(Matrix state,
                       Matrix action,
                       Random random) throws MatrixException
Return the next possible action in a state given an action.

Throws: MatrixException
Vector is wrong length.
Overrides:
getAction in class MDP
 o numPairs
 public int numPairs(PDouble dt)
Return the number of state/action pairs for a given dt.

Overrides:
numPairs in class MDP
 o randomAction
 public void randomAction(Matrix state,
                          Matrix action,
                          Random random) throws MatrixException
Generates a random action from those possible.

Throws: MatrixException
Vector is wrong length.
Overrides:
randomAction in class MDP
 o randomState
 public void randomState(Matrix state,
                         Random random) throws MatrixException
Generates a random state from those possible.

Throws: MatrixException
Vector is wrong length.
Overrides:
randomState in class MDP
 o nextState
 public double nextState(Matrix state,
                         Matrix action,
                         Matrix newState,
                         PDouble dt,
                         PBoolean valueKnown,
                         Random random) throws MatrixException
Find a next state given a state and action, and return the reinforcement received. The state, action, and newState arguments should all be vectors (single-column matrices). The duration of the time step, dt, is also returned. Most MDPs make this a constant, given in the parsed string.

Throws: MatrixException
if sizes aren't right.
Overrides:
nextState in class MDP
 o findValAct
 public double findValAct(Matrix state,
                          Matrix action,
                          FunApp f,
                          Matrix outputs,
                          PBoolean valueKnown) throws MatrixException
Find the value and best action of this state. This overwrites the action passed in, returning in its place the best action for the given state.

Throws: MatrixException
column vectors are wrong size or shape
Overrides:
findValAct in class MDP
 o findValue
 public double findValue(Matrix state,
                         Matrix optAction,
                         PDouble gamma,
                         FunApp f,
                         PDouble dt,
                         Matrix outputs,
                         PDouble reinforcement,
                         PBoolean valueKnown,
                         NumExp explorationFactor,
                         Random random) throws MatrixException
Find the max over actions of R + gamma*V(x'), where V(x') is the value of the successor state x' given state x, R is the reinforcement, and gamma is the discount factor. This method is used in the object ValueIteration. The new state is returned in the state variable, the reinforcement is returned in the reinforcement parameter, the optimal action is returned in the optAction parameter, and the max value is returned as the method's result.

Throws: MatrixException
column vectors are wrong size or shape
Overrides:
findValue in class MDP
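
The backup that findValue describes, a max over actions of R + gamma*V(x'), can be illustrated on this MDP with a plain lookup table. The sketch below is not the library's implementation: it replaces the FunApp function approximator with a double[] table, ignores dt and the exploration factor, and the names XorBackupSketch, NEXT, REWARD, backup, and sweep are hypothetical. States are indexed 0 = [0,0], 1 = [0,1], 2 = [1,0], 3 = [1,1] for the example.

```java
import java.util.Arrays;

/** Tabular sketch of max over actions of R + gamma*V(x') on the XOR MDP. */
final class XorBackupSketch {
    // NEXT[s][a] is the successor of state s under action a (hypothetical indexing).
    static final int[][] NEXT = { {1, 2}, {3}, {3}, {} };
    // REWARD[s][a] is the reinforcement for taking action a in state s.
    static final double[][] REWARD = { {-1.0, -1.0}, {1.0}, {1.0}, {} };

    /** One backup for state s: the max over actions of R + gamma*V(x'). */
    static double backup(int s, double[] v, double gamma) {
        if (NEXT[s].length == 0) {
            return 0.0; // the terminal state [1,1] is defined to have value 0
        }
        double best = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < NEXT[s].length; a++) {
            double q = REWARD[s][a] + gamma * v[NEXT[s][a]];
            best = Math.max(best, q);
        }
        return best;
    }

    /** Repeated in-place sweeps over all four states, starting from V = 0. */
    static double[] sweep(double gamma, int sweeps) {
        double[] v = new double[4];
        for (int i = 0; i < sweeps; i++) {
            for (int s = 0; s < v.length; s++) {
                v[s] = backup(s, v, gamma);
            }
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sweep(1.0, 3)));
    }
}
```

With gamma = 1 the values settle after two sweeps at 0 for [0,0], 1 for [0,1] and [1,0], and 0 for [1,1]. Because both actions in [0,0] have the same backed-up value, either one is optimal.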
