TY - JOUR

T1 - On the complexity of partially observed Markov decision processes

AU - Burago, Dima

AU - De Rougemont, Michel

AU - Slissenko, Anatol

N1 - Funding Information:
* Correspondence address: Université Paris-12, Bât. P3, Informatique, 61, Ave. du Général de Gaulle, 94010 Créteil, France. E-mail: sliss@litp.ibp.fr. The research of this author was partially supported by DRET contract 91/1061.

PY - 1996/5/5

Y1 - 1996/5/5

N2 - In the paper we consider the complexity of constructing optimal policies (strategies) for some types of partially observed Markov decision processes. This particular case of the classical problem deals with finite stationary processes, and can be represented as constructing optimal strategies to reach target vertices from a starting vertex in a graph with colored vertices and probabilistic deviations from an edge chosen to follow. The colors of the visited vertices are the only information available to a strategy. The complexity of Markov decision processes in the case of perfect information (bijective coloring of vertices) is known and briefly surveyed at the beginning of the paper. For the unobservable case (all the colors are equal) we give an improvement of the result of Papadimitriou and Tsitsiklis; namely, we show that the problem of constructing even a very weak approximation to an optimal strategy is NP-hard. Our main results concern the case of a fixed bound on the multiplicity of the coloring, that is, a case of partially observed processes where some upper bound on the unobservability is assumed. We show that the problem of finding an optimal strategy is still NP-hard, but polynomial-time approximations are possible. Some relations of our results to the Max-Word Problem are also indicated.

AB - In the paper we consider the complexity of constructing optimal policies (strategies) for some types of partially observed Markov decision processes. This particular case of the classical problem deals with finite stationary processes, and can be represented as constructing optimal strategies to reach target vertices from a starting vertex in a graph with colored vertices and probabilistic deviations from an edge chosen to follow. The colors of the visited vertices are the only information available to a strategy. The complexity of Markov decision processes in the case of perfect information (bijective coloring of vertices) is known and briefly surveyed at the beginning of the paper. For the unobservable case (all the colors are equal) we give an improvement of the result of Papadimitriou and Tsitsiklis; namely, we show that the problem of constructing even a very weak approximation to an optimal strategy is NP-hard. Our main results concern the case of a fixed bound on the multiplicity of the coloring, that is, a case of partially observed processes where some upper bound on the unobservability is assumed. We show that the problem of finding an optimal strategy is still NP-hard, but polynomial-time approximations are possible. Some relations of our results to the Max-Word Problem are also indicated.

UR - http://www.scopus.com/inward/record.url?scp=0030570119&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0030570119&partnerID=8YFLogxK

U2 - 10.1016/0304-3975(95)00158-1

DO - 10.1016/0304-3975(95)00158-1

M3 - Article

AN - SCOPUS:0030570119

SN - 0304-3975

VL - 157

SP - 161

EP - 183

JO - Theoretical Computer Science

JF - Theoretical Computer Science

IS - 2

ER -