Existing measures for evaluating the performance of tracking algorithms are difficult to interpret, which makes it hard to identify the best approach for a particular situation. As we show, a dummy algorithm which does not actually track scores well under most existing measures. Although some measures characterize specific error sources quite well, combining them into a single aggregate measure for comparing approaches or tuning parameters is not straightforward. In this work we propose 'mean time between failures' as a viable summary of solution quality - especially when the goal is to follow objects for as long as possible. In addition to being sensitive to all tracking errors, the performance numbers are directly interpretable: how long can an algorithm operate before a mistake has likely occurred (the object is lost, its identity is confused, etc.)? We illustrate the merits of this measure by assessing solutions from different algorithms on a challenging dataset.