Security enforcement inlined into user threads often delays the protected programs; inlined resource reclamation may interrupt program execution and defer resource release. We propose software cruising, a novel technique that migrates security enforcement and resource reclamation from user threads to a concurrent monitor thread. The technique leverages the increasingly popular multicore and multiprocessor architectures and uses lock-free data structures to achieve non-blocking and efficient synchronization between the monitor and user threads. As a case study, software cruising is applied to the heap buffer overflow problem. Previous mitigation and detection techniques for this problem suffer from high performance overhead, legacy code compatibility, semantics loyalty, or tedious manual program transformation. We present a concurrent heap buffer overflow detector, Cruiser, in which a concurrent thread is added to the user program to monitor heap integrity, and custom lock-free data structures and algorithms are designed to achieve high efficiency and scalability. The experiments show that our approach is practical: it imposes an average of 5% performance overhead on SPEC CPU2006, and the throughput slowdown on Apache is negligible on average.