In this paper we describe a methodology for using team mental models as a basis for evaluating the usability and utility of collaborative systems. We present a case study in which team mental models were evaluated within the NeoCITIES 3.1 simulation. Paired comparison ratings, one of the most widely used methods of team mental model assessment, were used to capture team members' taskwork-related knowledge, which was then compared with ratings from subject matter experts. These analyses drove several design modifications to the NeoCITIES interface. We discuss the limitations of the method and its implications for collaborative systems evaluation and the field of HCI.