Device level heterogeneity promises high energy efficiency over a larger range of voltages than a single device technology alone can provide. In this paper, starting from device models, we first present ground-up modeling of CMOS and TFET cores, and verify this model against existing processors. Using our core models, we construct a 32-core TFET-CMOS heterogeneous multicore. We then show that it is a very challenging task to identify the ideal runtime configuration to use in such a heterogeneous multicore, which includes finding the best number/type of cores to activate and the corresponding voltages/frequencies to select for these cores. In order to effectively utilize this heterogeneous processor, we propose a novel automated runtime scheme. Our scheme is designed to automatically improve the performance of applications running on heterogeneous CMOS-TFET multicores operating under a fixed power budget, without requiring any effort from the application programmer or the user. Our scheme combines heterogeneous thread-to-core mapping, dynamic work partitioning, and dynamic power partitioning to identify energy efficient operating points. With simulations we show that our runtime scheme can enable a CMOS-TFET multicore to serve a diversity of workloads with high energy efficiency and achieve 21% average speedup over the best performing equivalent homogeneous multicore.