Junior Applied Math
Harrison Waldon, Zoom: Temporal Difference Learning in Continuous Time and Space
Friday, February 04, 2022, 02:00pm - 03:00pm
Temporal difference (TD) learning is a workhorse algorithm for policy evaluation in reinforcement learning. Typically implemented on a Markov decision process, TD learning exploits the Bellman equation for the value function of a given policy to update the value estimate of the current state, either offline or online, using only observed rewards. In 2000, Doya extended TD learning to a continuous time and space setting in which the state dynamics are assumed deterministic. Doya's algorithm does not work, however, when the underlying state dynamics are stochastic. In this talk, we will look at recent work by Jia and Zhou in which the authors show that Doya's TD learning, naively applied in the stochastic case, leads not to an approximation of the true value process but to a smoothing of that process's quadratic variation. The authors then propose an analogue of TD learning that makes explicit use of the martingale property of the value process and has the desired approximation properties. Here is a link to the paper to be discussed: https://arxiv.org/abs/2108.06655.
Zoom link: https://utexas.zoom.us/j/3511114068
Location: Zoom
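
For reference, below is a minimal sketch of the classical discrete-time, tabular TD(0) policy evaluation the abstract starts from; it is not the continuous-time algorithm of Doya or the martingale-based scheme of Jia and Zhou. The environment interface env_step and the convention that episodes start in state 0 are assumptions made for illustration.

import numpy as np

def td0_policy_evaluation(env_step, n_states, n_episodes=500,
                          alpha=0.1, gamma=0.99):
    """Tabular TD(0) policy evaluation on a finite MDP.

    env_step(s) is a hypothetical interface that samples
    (next_state, reward, done) under the fixed policy being evaluated.
    """
    V = np.zeros(n_states)  # value estimate for each state
    for _ in range(n_episodes):
        s = 0               # assumption: episodes start in state 0
        done = False
        while not done:
            s_next, r, done = env_step(s)
            # TD(0) update: move V(s) toward the bootstrapped
            # Bellman target r + gamma * V(s'), using only the
            # observed reward and the current value estimate.
            target = r if done else r + gamma * V[s_next]
            V[s] += alpha * (target - V[s])
            s = s_next
    return V

The update uses the one-step Bellman target in place of the unknown true return, which is exactly the bootstrapping that the continuous-time extensions discussed in the talk must handle with care when the dynamics are stochastic.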