TY - GEN

T1 - On the complexity of finite memory policies for Markov decision processes

AU - Beauquier, Danièle

AU - Burago, Dima

AU - Slissenko, Anatol

N1 - Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 1995.

PY - 1995

Y1 - 1995

N2 - We consider some complexity questions concerning a model of uncertainty known as Markov decision processes. Our results concern the problem of constructing optimal policies under a criterion of optimality defined in terms of constraints on the behavior of the process. The constraints are described by regular languages, and the motivation comes from robot motion planning. It is known that, in the case of perfect information, optimal policies under the traditional cost criteria can be found among Markov policies and in polytime. We show, firstly, that for the behavior criterion optimal policies are not Markovian for finite as well as infinite horizon. On the other hand, optimal policies in this case lie in the class of finite memory policies defined in the paper, and can be found in polytime. We remark that in the case of partial information, finite memory policies cannot be optimal in the general situation. Nevertheless, the class of finite memory policies seems to be of interest for probabilistic policies: though probabilistic policies are not better than deterministic ones in the general class of history-remembering policies, the former can be better in the class of finite memory policies.

AB - We consider some complexity questions concerning a model of uncertainty known as Markov decision processes. Our results concern the problem of constructing optimal policies under a criterion of optimality defined in terms of constraints on the behavior of the process. The constraints are described by regular languages, and the motivation comes from robot motion planning. It is known that, in the case of perfect information, optimal policies under the traditional cost criteria can be found among Markov policies and in polytime. We show, firstly, that for the behavior criterion optimal policies are not Markovian for finite as well as infinite horizon. On the other hand, optimal policies in this case lie in the class of finite memory policies defined in the paper, and can be found in polytime. We remark that in the case of partial information, finite memory policies cannot be optimal in the general situation. Nevertheless, the class of finite memory policies seems to be of interest for probabilistic policies: though probabilistic policies are not better than deterministic ones in the general class of history-remembering policies, the former can be better in the class of finite memory policies.

UR - http://www.scopus.com/inward/record.url?scp=84947940603&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84947940603&partnerID=8YFLogxK

U2 - 10.1007/3-540-60246-1_125

DO - 10.1007/3-540-60246-1_125

M3 - Conference contribution

AN - SCOPUS:84947940603

SN - 3540602461

SN - 9783540602460

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 191

EP - 200

BT - Mathematical Foundations of Computer Science 1995 - 20th International Symposium, MFCS 1995, Proceedings

A2 - Wiedermann, Jiri

A2 - Hajek, Petr

PB - Springer Verlag

T2 - 20th International Symposium on Mathematical Foundations of Computer Science, MFCS 1995

Y2 - 28 August 1995 through 1 September 1995

ER -