Processing-in-memory (PIM) architectures have gained significant importance as an alternative paradigm to the von-Neumann architectures to alleviate the memory wall and technology scaling problems. PIM architectures have achieved significant latency and energy consumption improvements for various emerging and widely used workloads such as deep neural networks, graph analytics, databases and computational genomics. In this work, we propose IMC-Sort, an in-memory parallel sorting architecture using the hybrid memory cube (HMC) for accelerating the sort workloads. Sort is one of the fundamental and widely used algorithm in various applications such as databases, networking, and data analytics. IMC-Sort architecture augments the hybrid memory cube memory system by incorporating custom sorting network at each of the HMC vault's logic layer. IMC-Sort uses optimized folded Bitonic sort and merge network to sort input sequences of arbitrary length at each vault and optimized address mapping mechanism to distribute the input data across HMC vaults. Merging of the sorted results across individual vaults is also performed using the vault's sorting network by communicating with other vaults through the HMC's crossbar network. Overall, IMC-Sort achieves 16.8×, 1.1× speedup and 375.5×, 13.6× savings in energy consumption compared to the widely used CPU implementation and state-of-the-art near memory custom sort accelerator respectively.