Coerces a state to a history
Equations
- MDPs.instCoeHist = { coe := fun (s : σ) => MDPs.Hist.init s }
The length of the history corresponds to the zero-based index of the decision step
Equations
- (MDPs.Hist.init a).length = 0
- (h.foll a a_1).length = 1 + h.length
Instances For
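A hedged sketch of how the two length equations unfold (the `variable` binders and the parameterization of `MDP` and `Hist` are assumptions, not taken from the rendered signatures; adjust to the actual ones):

```lean
open MDPs
variable {σ α : Type} {m : MDP}  -- parameterization assumed

-- A one-step history has length 1: the `foll` equation contributes
-- `1 + (Hist.init s₀).length = 1 + 0`.
example (s₀ s₁ : σ) (a : α) :
    ((Hist.init s₀ : Hist m).foll a s₁).length = 1 := by
  simp [Hist.length]
```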
Nonempty histories
Equations
- MDPs.HistNE m = { h : MDPs.Hist m // h.length ≥ 1 }
Instances For
Returns the last state of the history
Equations
- (MDPs.Hist.init a).last = a
- (h.foll a a_1).last = a_1
Instances For
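As a sketch (binders assumed as above), `last` simply returns the second argument of the most recent `foll`, falling back to the initial state:

```lean
open MDPs
variable {σ α : Type} {m : MDP}  -- parameterization assumed

-- The last state of a one-step history is the successor state `s₁`.
example (s₀ s₁ : σ) (a : α) :
    ((Hist.init s₀ : Hist m).foll a s₁).last = s₁ := by
  simp [Hist.last]
```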
Equations
- MDPs.Hist.one s₀ a s = (MDPs.Hist.init s₀).foll a s
Instances For
Returns the prefix of the history of length k
Equations
- MDPs.Hist.prefix k (MDPs.Hist.init a) = MDPs.Hist.init a
- MDPs.Hist.prefix k (h_1.foll a a_1) = if h_1.length + 1 ≤ k then h_1.foll a a_1 else MDPs.Hist.prefix k h_1
Instances For
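A hedged example of the second equation's else-branch (binders assumed): with k = 0 the guard h.length + 1 ≤ k fails, so `prefix` peels off the last transition and recurses down to the initial state.

```lean
open MDPs
variable {σ α : Type} {m : MDPs.MDP}  -- parameterization assumed

-- `prefix 0` of a one-step history drops the transition entirely.
example (s₀ s₁ : σ) (a : α) :
    Hist.prefix 0 ((Hist.init s₀ : Hist m).foll a s₁) = Hist.init s₀ := by
  simp [Hist.prefix]
```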
Equations
- MDPs.tuple2hist (h, as) = ⟨h.append as, ⋯⟩
Instances For
Equations
- MDPs.hist2tuple ⟨h.foll a s, property⟩ = (h, a, s)
Instances For
Equations
- MDPs.emb_tuple2hist_l1 = { toFun := MDPs.tuple2hist, inj' := ⋯ }
Instances For
Equations
- MDPs.emb_tuple2hist = { toFun := fun (x : MDPs.Hist M × α × σ) => ↑(MDPs.tuple2hist x), inj' := ⋯ }
Instances For
Equations
Instances For
Equations
- MDPs.hist2state (MDPs.Hist.init a) = a
- MDPs.hist2state (h.foll a a_1) = a_1
Instances For
Equations
- MDPs.state2hist_emb = { toFun := MDPs.state2hist, inj' := ⋯ }
Instances For
Checks whether the first history is a prefix of the second history.
Equations
- One or more equations did not get rendered due to their size.
- MDPs.isprefix (MDPs.Hist.init s₁) (MDPs.Hist.init s₂) = (s₁ = s₂)
- MDPs.isprefix (MDPs.Hist.init s₁) (hp.foll a a_1) = MDPs.isprefix (MDPs.Hist.init s₁) hp
- MDPs.isprefix (a.foll a_1 a_2) (MDPs.Hist.init a_3) = False
Instances For
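A sketch of how these equations combine (binders assumed): comparing against a longer history peels transitions off the second argument until two `init` histories remain, at which point the check reduces to state equality.

```lean
open MDPs
variable {σ α : Type} {m : MDP}  -- parameterization assumed

-- `init s₀` is a prefix of any extension of `init s₀`: the `foll`
-- equation recurses, then the `init`/`init` equation gives `s₀ = s₀`.
example (s₀ s₁ : σ) (a : α) :
    isprefix (Hist.init s₀ : Hist m) ((Hist.init s₀).foll a s₁) := by
  simp [isprefix]
```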
All histories that extend history h by t additional steps
Equations
- MDPs.Histories h Nat.zero = {h}
- MDPs.Histories h t.succ = Finset.map MDPs.emb_tuple2hist (MDPs.Histories h t ×ˢ M.A ×ˢ M.S)
Instances For
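Since each step maps the product with M.A ×ˢ M.S through an embedding (which preserves cardinality), the number of such histories grows geometrically. A hedged statement of this, with a proof sketch that may need adjustment to the actual API:

```lean
open MDPs
variable {σ α : Type} {M : MDP}  -- parameterization assumed

-- Claimed identity: each extra step multiplies the count by
-- |M.A| * |M.S|, via `Finset.card_map` and `Finset.card_product`.
example (h : Hist M) (t : ℕ) :
    (Histories h t).card = (M.A.card * M.S.card) ^ t := by
  induction t with
  | zero => simp [Histories]
  | succ t ih =>
      simp [Histories, Finset.card_map, Finset.card_product, ih]
      ring
```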
All histories of a given length
Equations
Instances For
Equations
- MDPs.instCoeSubtypeMemFinsetAPolicyHROfDecidableEq = { coe := fun (a : { x : α // x ∈ M.A }) (x : MDPs.Hist M) => Finprob.dirac_dist M.A a }
Equations
- MDPs.instCoeSubtypeMemFinsetADecisionRuleOfDecidableEq = { coe := fun (a : { x : α // x ∈ M.A }) (x : σ) => a }
A deterministic stationary policy
Equations
Instances For
Equations
- MDPs.instCoePolicySDPolicyHROfDecidableEq = { coe := fun (d : MDPs.PolicySD M) (h : MDPs.Hist M) => Finprob.dirac_dist M.A (d h.last) }
Probability distribution over histories induced by the policy and transition probabilities
Equations
- One or more equations did not get rendered due to their size.
- MDPs.HistDist h π Nat.zero = Finprob.dirac_ofsingleton h
Instances For
Reward of a history
Equations
- MDPs.reward (MDPs.Hist.init a) = 0
- MDPs.reward (h.foll a a_1) = M.r h.last a a_1 + MDPs.reward h
Instances For
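The reward telescopes over the transitions of the history, grounding out at 0 for the initial state. A hedged sketch (binders assumed):

```lean
open MDPs
variable {σ α : Type} {M : MDP}  -- parameterization assumed

-- A one-step history earns exactly the reward of its single
-- transition: `M.r s₀ a s₁ + reward (Hist.init s₀) = M.r s₀ a s₁ + 0`.
example (s₀ s₁ : σ) (a : α) :
    reward ((Hist.init s₀ : Hist M).foll a s₁) = M.r s₀ a s₁ := by
  simp [reward, Hist.last]
```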
Reward at a specific position; 0-based
Equations
- MDPs.reward_at i (MDPs.Hist.init a) = 0
- MDPs.reward_at i (h.foll a a_1) = if h.length = i then M.r h.last a a_1 else MDPs.reward_at i h
Instances For
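Only the transition whose step index matches i contributes; every other step recurses past. A hedged sketch (binders assumed):

```lean
open MDPs
variable {σ α : Type} {M : MDP}  -- parameterization assumed

-- For a one-step history, the transition at step 0 satisfies
-- `h.length = 0`, so `reward_at 0` returns its reward.
example (s₀ s₁ : σ) (a : α) :
    reward_at 0 ((Hist.init s₀ : Hist M).foll a s₁) = M.r s₀ a s₁ := by
  simp [reward_at, Hist.length, Hist.last]
```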
Sum of rewards from the beginning up to a specific position (inclusive)
Equations
- MDPs.reward_to j (MDPs.Hist.init a) = 0
- MDPs.reward_to j (h.foll a a_1) = if h.length ≤ j then M.r h.last a a_1 + MDPs.reward h else MDPs.reward_to j h
Instances For
Sum of rewards from a specific position to the end
Equations
- MDPs.reward_from j (MDPs.Hist.init a) = 0
- MDPs.reward_from j (h.foll a a_1) = if h.length ≥ j then M.r h.last a a_1 + MDPs.reward_from j h else 0
Instances For
The probability of a history
Equations
- MDPs.prob_h h π T h' = (MDPs.Δℋ h π T).prob.p ↑h'
Instances For
Probability of a boolean event
Equations
- MDPs.probability_h h π T B = ℙ[{ val := B }]
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Expectation over histories of a random variable X for horizon T and policy π
Equations
- MDPs.expect_h h π T X = 𝔼[{ val := X }]
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Conditional expectation over histories of a random variable X for horizon T and policy π
Equations
- MDPs.expect_h_cnd h π T X B = 𝔼[{ val := X }|{ val := B }]
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
The k-th state of a history. The initial state is state 0.
Equations
- MDPs.state k (MDPs.Hist.init a) = a
- MDPs.state k (h_1.foll a a_1) = if (h_1.foll a a_1).length = k then a_1 else MDPs.state k h_1
Instances For
The k-th action of a history. The first action is action 0.
Equations
- MDPs.action k (MDPs.Hist.init a) = default
- MDPs.action k (h_1.foll a a_1) = if (h_1.foll a a_1).length = k then a else MDPs.action k h_1
Instances For
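Both indexing functions compare the target index against the length of the sub-history ending at each transition. A hedged sketch (binders assumed):

```lean
open MDPs
variable {σ α : Type} {M : MDP}  -- parameterization assumed

-- In a one-step history, state 1 is the successor state: the
-- extended history has length 1, matching the index.
example (s₀ s₁ : σ) (a : α) :
    state 1 ((Hist.init s₀ : Hist M).foll a s₁) = s₁ := by
  simp [state, Hist.length]
```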
A real-valued random variable on histories, without an attached distribution (the induced distribution depends on the policy)
Equations
- MDPs.Histrv M = (MDPs.Hist M → ℝ)
Instances For
Equations
- MDPs.instCoeRtoRV = { coe := fun (c : ℝ) (x : MDPs.Hist M) => c }
Equations
- MDPs.instHAddHRV = { hAdd := fun (a b : MDPs.Histrv M) (h : MDPs.Hist M) => a h + b h }
Equations
- MDPs.instHAddRVRV = { hAdd := fun (a : ℝ) (b : MDPs.Histrv M) (h : MDPs.Hist M) => a + b h }
The return of a history expressed as a sum of per-step rewards, indexed via state and action
Equations
- MDPs.rew_sum h = ∑ k ∈ Finset.range h.length, M.r (MDPs.state k h) (MDPs.action k h) (MDPs.state (k + 1) h)
Instances For
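The identity the surrounding docstrings allude to is that the recursive `reward` and the indexed sum `rew_sum` agree on every history. A hedged statement only, proof omitted in this sketch (binders assumed):

```lean
open MDPs
variable {σ α : Type} {M : MDP}  -- parameterization assumed

-- Claimed identity relating the recursive and the summed form of
-- the return; provable by induction on the history.
example (h : Hist M) : rew_sum h = reward h := by
  sorry
```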
Sum of rewards from a start index b (inclusive) to an end index e (exclusive)
Equations
- MDPs.rew_sum_rg b e h = ∑ k ∈ Finset.range (e - b), M.r (MDPs.state (b + k) h) (MDPs.action (b + k) h) (MDPs.state (b + k + 1) h)
Instances For
Expected return can be expressed as a sum of expected rewards
When the random variable does not depend on the history beyond some point, the expectation can be computed with a truncated horizon