Mean-Variance Optimization and Algorithm for Finite-Horizon Markov Decision Processes

Published: November 13, 2024, 10:40

Speaker: Professor Li Xia (夏俐)

Venue: Conference Room 400, Economics and Management West Building (经管西楼)

Organizer: 万象城awcsport官网 (invited by: Weiping Wu (吴伟平))

Start time: 2024-11-15 09:30:00

End time: 2024-11-15 11:30:00

Title: Mean-Variance Optimization and Algorithm for Finite-Horizon Markov Decision Processes

Abstract: Multi-period mean-variance optimization is a long-standing problem whose difficulty stems from the failure of the dynamic programming principle. This paper studies mean-variance optimization in the setting of finite-horizon discrete-time Markov decision processes (MDPs), where the objective is to maximize a combined metric of the mean and variance of the accumulated rewards at the terminal stage. By introducing the concepts of pseudo mean and pseudo variance, we convert the original mean-variance MDP problem into a bilevel optimization problem, where the outer level is a single-parameter optimization over the pseudo mean and the inner level is a standard finite-horizon MDP whose state space is augmented with an auxiliary state recording the accumulated rewards. We further study the properties of this bilevel optimization problem, including the optimality of deterministic history-dependent policies and the piecewise quadratic concavity of the optimal values of the inner MDPs with respect to the outer parameter. To solve this bilevel optimization problem efficiently, we propose an iterative algorithm that alternately updates the inner optimal policy and the outer pseudo mean. We prove that this algorithm converges to a local optimum. We also derive a sufficient condition under which the algorithm converges to the global optimum. Furthermore, we apply this approach to the mean-variance optimization of the multi-period portfolio selection problem and show that it coincides exactly with the classical result of Li and Ng (2000) in financial engineering. Our approach opens a new avenue for solving mean-variance optimization problems and is applicable to any problem modeled by MDPs, which is further demonstrated by examples of mean-variance optimization for queueing control and inventory management.
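The alternating scheme described in the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation, and it rests on several assumptions that do not appear in the announcement: the combined metric is taken to be E[R] - BETA * Var(R), the pseudo variance is taken to be E[(R - y)^2] for a pseudo mean y (which equals the true variance when y = E[R]), and the toy MDP (`P`, `r`, `T`, `BETA`, `S0`) is entirely hypothetical. The inner step solves a standard finite-horizon MDP on the augmented state (t, s, c), where c is the accumulated reward; the outer step resets y to the mean achieved by the inner policy.

```python
from functools import lru_cache

# Hypothetical toy MDP (not from the talk): 2 states, 2 actions, horizon 3.
# P[s][a] lists (next_state, probability); r[s][a] is an integer reward.
P = {
    0: {0: [(0, 1.0)], 1: [(0, 0.5), (1, 0.5)]},
    1: {0: [(0, 1.0)], 1: [(1, 0.5), (0, 0.5)]},
}
r = {0: {0: 1, 1: 0}, 1: {0: 1, 1: 4}}
T, BETA, S0 = 3, 0.5, 0  # horizon, assumed variance weight, initial state


def solve_inner(y):
    """Solve the inner MDP for a fixed pseudo mean y by backward recursion
    on the augmented state (t, s, c), c being the accumulated reward.
    Returns (inner value, E[R], E[R^2]) under the maximizing policy."""

    @lru_cache(maxsize=None)
    def best(t, s, c):
        if t == T:
            # Terminal payoff: reward minus weighted pseudo variance term.
            return (c - BETA * (c - y) ** 2, float(c), float(c * c))
        options = []
        for a, trans in P[s].items():
            outs = [(p, best(t + 1, s2, c + r[s][a])) for s2, p in trans]
            # Expectation of (value, mean, second moment) over next states.
            options.append(tuple(sum(p * o[i] for p, o in outs) for i in range(3)))
        return max(options, key=lambda o: o[0])

    return best(0, S0, 0)


def mean_variance_iteration(y=0.0, tol=1e-9, max_iter=100):
    """Alternate the inner policy update and the outer pseudo-mean update.
    Returns the final pseudo mean and the mean-variance objective per step."""
    history = []
    for _ in range(max_iter):
        _, mean, second = solve_inner(y)
        history.append(mean - BETA * (second - mean ** 2))
        if abs(mean - y) < tol:  # y is a fixed point: stop
            break
        y = mean  # outer update: pseudo mean <- achieved mean
    return y, history


if __name__ == "__main__":
    y_star, objective_trace = mean_variance_iteration()
    print(y_star, objective_trace)
```

Under these assumptions the objective trace is non-decreasing from one iteration to the next, which is consistent with the local convergence claimed in the abstract; the sketch makes no attempt to check the sufficient condition for global optimality.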

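About the speaker: Li Xia (夏俐) is a professor at the School of Business, Sun Yat-sen University. He received his bachelor's and Ph.D. degrees from the Department of Automation, Tsinghua University, in 2002 and 2007, respectively. From 2011 to 2019 he taught in the Department of Automation at Tsinghua University, serving successively as lecturer and associate professor (doctoral supervisor), and he joined Sun Yat-sen University in 2019. His main research interests are the theory of Markov decision processes, reinforcement learning, queueing theory, and stochastic games, together with applications in energy, finance, and related fields. He has published more than 100 papers, holds more than 10 Chinese and U.S. invention patents, and has led five projects funded by the National Natural Science Foundation of China. His academic service includes serving as an associate editor (AE) of leading international SCI journals such as IEEE Transactions on Automation Science and Engineering and Discrete Event Dynamic Systems.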

