A model-based approximate λ-policy iteration approach to online evasive path planning and the video

Source: Journal of Control Theory and Applications
This paper presents a model-based approximate λ-policy iteration approach using temporal differences for optimizing paths online for a pursuit-evasion problem, where an agent must visit several target positions within a region of interest while simultaneously avoiding one or more actively pursuing adversaries. This method is relevant to applications such as robotic path planning, mobile-sensor applications, and path exposure. The methodology uses cell decomposition to construct a decision tree and implements a temporal difference-based approximate λ-policy iteration that combines online learning with prior knowledge obtained through modeling. Its objectives are to minimize the risk of being caught by an adversary and to maximize a reward associated with visiting target locations. Online learning and frequent decision tree updates allow the algorithm to adapt quickly to unexpected movements by the adversaries or to dynamic environments. The approach is illustrated through a modified version of the video game Ms. Pac-Man, which is shown to be a benchmark example of the pursuit-evasion problem. The results show that the approach presented in this paper outperforms several other methods as well as most human players.
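To make the idea of λ-policy iteration with temporal differences concrete, the following is a minimal sketch and not the paper's implementation. It assumes a deliberately simplified setting: a one-dimensional corridor of cells, a known deterministic model, a tabular (exact rather than approximate) value function, and a static hazard cell standing in for an active pursuer. The function name `lambda_policy_iteration`, the reward values, and all parameters are hypothetical choices made for illustration. Each sweep takes the greedy policy with respect to the current values, computes the temporal differences along that policy, and adds their sum, discounted by `gamma * lam`, to the values. Setting `lam = 0` reduces the update to value iteration, and `lam = 1` reduces it to policy iteration.

```python
def lambda_policy_iteration(n_cells=5, lam=0.5, gamma=0.9, sweeps=50):
    """Tabular lambda-policy iteration on a toy corridor (illustrative only)."""
    target, hazard = n_cells - 1, 0      # assumed: reward cell and static hazard cell
    terminal = {target, hazard}
    actions = (-1, +1)                   # move left or right

    def step(s, a):
        # Known deterministic model: next cell and reward for entering it.
        s2 = min(max(s + a, 0), n_cells - 1)
        if s2 == target:
            return s2, 10.0              # assumed reward for reaching the target
        if s2 == hazard:
            return s2, -10.0             # assumed penalty for being caught
        return s2, -1.0                  # assumed cost per ordinary move

    J = [0.0] * n_cells                  # value estimates; terminal cells stay at 0
    policy = [+1] * n_cells
    for _ in range(sweeps):
        # Policy improvement: act greedily with respect to the current J.
        for s in range(n_cells):
            if s not in terminal:
                policy[s] = max(
                    actions,
                    key=lambda a: step(s, a)[1] + gamma * J[step(s, a)[0]],
                )
        # Temporal differences d(s) = r + gamma * J(s') - J(s) under that policy.
        d = [0.0] * n_cells
        nxt = list(range(n_cells))
        for s in range(n_cells):
            if s not in terminal:
                nxt[s], r = step(s, policy[s])
                d[s] = r + gamma * J[nxt[s]] - J[s]
        # Sum of TDs along the greedy path, weighted by (gamma * lam)^m,
        # obtained by iterating delta = d + gamma * lam * delta[next].
        delta = d[:]
        for _ in range(200):
            delta = [
                d[s] + (gamma * lam * delta[nxt[s]] if s not in terminal else 0.0)
                for s in range(n_cells)
            ]
        J = [J[s] + delta[s] for s in range(n_cells)]
    return J, policy


if __name__ == "__main__":
    values, greedy = lambda_policy_iteration()
    print(values, greedy)
```

In this toy problem the greedy policy moves toward the target from every interior cell. The approach described in the abstract differs in important ways: it uses function approximation, a decision tree built from cell decomposition, and adversaries that move.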