Visual object tracking can be considered as a figure-ground classification task. In this paper, different features are used to generate a set of likelihood maps for each pixel indicating the probability of that pixel belonging to foreground object or scene background. For example, intensity, texture, motion, saliency and template matching can all be used to generate likelihood maps. We propose a generic likelihood map fusion framework to combine these heterogeneous features into a fused soft segmentation suitable for mean-shift tracking. All the component likelihood maps contribute to the segmentation based on their classification confidence scores (weights) learned from the previous frame. The evidence combination framework dynamically updates the weights such that, in the fused likelihood map, discriminative foreground/background information is preserved while ambiguous information is suppressed. The framework is applied here to track ground vehicles from thermal airborne video, and is also compared to other state-of-the-art algorithms.