In the paper we consider the complexity of constructing optimal policies (strategies) for some type of partially observed Markov decision processes. This particular case of the classical problem deals with finite stationary processes, and can be represented as constructing optimal strategies to reach target vertices from a starting vertex in a graph with colored vertices and probabilis-tic deviations from an edge chosen to follow. The colors of the visited vertices is the only information available to a strategy. The complexity of Markov decision in the case of perfect information (bijective coloring of vertices) is known and briefly surveyed at the beginning of the paper. For the unobservable case (all the colors are equal) we give an improvement of the result of Papadimitriou and Tsitsiklis, namely we show that the problem of constructing even a very weak approximation to an optimal strategy is NP-hard. Our main results concern the case of a fixed bound on the multiplicity of coloring, that is a case of partially observed processes where some upper bound on the unobservability is supposed. We show that the problem of finding an optimal strategy is still NP-hard, but polytime approximations are possible. Some relations of our results to the Max-Word Problem are also indicated.
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Computer Science(all)