Word: Policy iteration
Definition

Policy iteration

Chinese encyclopedia

Markov decision process (Chinese: 马可夫决策过程)

(Redirected from Policy iteration)

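In probability theory and statistics, Markov decision processes (MDPs) provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. They are named after the Russian mathematician Andrey Markov. MDPs are a useful tool in the study of optimization problems solved via dynamic programming and reinforcement learning.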

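Markov processes are influential in both probability theory and statistics. A stochastic process defined through independent variables exhibits (mathematically) the Markov property, and any Markov process can be characterized on the basis of this property. Of greater practical importance is the use of the Markov property as an assumption when building models. In modeling, assuming the Markov property is one of the few simple ways to introduce statistical dependence into a model of a stochastic process while still allowing that dependence to weaken as the lag increases.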

English encyclopedia

Markov decision process (Chinese: 马可夫决策过程)

(Redirected from Policy iteration)
Example of a simple MDP with three states and two actions.

Markov decision processes (MDPs) provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying a wide range of optimization problems solved via dynamic programming and reinforcement learning. MDPs were known at least as early as the 1950s (cf. Bellman 1957). A core body of research on Markov decision processes resulted from Ronald A. Howard's 1960 book, Dynamic Programming and Markov Processes. MDPs are used in a wide range of disciplines, including robotics, automated control, economics, and manufacturing.
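
Policy iteration, the term this entry redirects from, is one dynamic-programming method for solving an MDP. The following is a minimal sketch in Python, not a definitive implementation. The transition table P, the discount factor GAMMA, and all function names are assumptions made up for illustration; the numbers do not come from the figure or from any source. The sketch alternates between evaluating the current policy and improving it greedily until the policy stops changing.

```python
# Hypothetical MDP for illustration only: 3 states, 2 actions.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(0.5, 0, 0.0), (0.5, 2, 0.0)],
        1: [(1.0, 2, 0.0)]},
    1: {0: [(0.7, 0, 5.0), (0.1, 1, 0.0), (0.2, 2, 0.0)],
        1: [(0.95, 1, 0.0), (0.05, 2, 0.0)]},
    2: {0: [(0.4, 0, 0.0), (0.6, 2, 0.0)],
        1: [(0.3, 0, -1.0), (0.3, 1, 0.0), (0.4, 2, 0.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(P, V, s, a, gamma):
    """Expected discounted return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate_policy(P, policy, gamma, tol=1e-10):
    """Iteratively estimate the value of a fixed policy until updates fall below tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def policy_iteration(P, gamma):
    """Alternate policy evaluation and greedy improvement until the policy is stable."""
    policy = {s: min(P[s]) for s in P}  # arbitrary initial policy
    while True:
        V = evaluate_policy(P, policy, gamma)
        stable = True
        for s in P:
            best = max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
            # Switch only on a strict improvement, so ties cannot cause cycling.
            if q_value(P, V, s, best, gamma) > q_value(P, V, s, policy[s], gamma) + 1e-9:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


policy, V = policy_iteration(P, GAMMA)
print("policy:", policy)
print("values:", V)
```

The dictionary-based table keeps the sketch free of external libraries. For larger state spaces, the evaluation step is often done by solving a linear system instead of iterating.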


Copyright © 2004-2024 encnc.com All Rights Reserved
Last updated: 2025/6/20 3:50:15