Cellular networks provide pervasive data access for smartphones, but they also consume substantial energy: after each data transmission, the cellular interface remains in a high-power state for a long period, which is known as the long tail problem. In this paper, we propose to reduce this tail energy by aggregating the data traffic of multiple nodes through their peer-to-peer (P2P) interfaces. We formalize traffic aggregation as the problem of finding the task schedule that minimizes energy consumption. We first propose an A* search algorithm that reduces the search space when finding the optimal schedule offline, and then introduce an online traffic aggregation algorithm. We have implemented the online algorithm on Android smartphones and built a small testbed. Trace-driven simulations and experimental results show that our traffic aggregation algorithm can significantly reduce both energy consumption and delay.
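To show why sharing one interface's tail saves energy, here is a minimal Python sketch of a simplified tail-energy model. It is an illustration under stated assumptions and is not the paper's energy model or scheduling algorithm. Every name (`radio_energy`, `P_ACTIVE`, `P_TAIL`, `TAIL`, `P2P_COST`) and every parameter value is hypothetical. The model assumes that each transfer on a cellular interface is followed by a fixed-length high-power tail. That tail is cut short when the next transfer starts earlier. It also assumes that relaying data over the P2P link costs a fixed amount of energy per transfer.

```python
from typing import List, Tuple

# Hypothetical, illustrative parameters (not measured values).
P_ACTIVE = 1.0   # W drawn while the cellular interface transmits
P_TAIL = 0.5     # W drawn during the high-power tail
TAIL = 10.0      # s the interface lingers in the high-power state
P2P_COST = 0.5   # J per transfer relayed over the P2P link


def radio_energy(transfers: List[Tuple[float, float]]) -> float:
    """Energy (J) of one cellular interface serving (start, duration) transfers.

    Assumes transfers do not overlap. After each transfer the interface
    stays in the tail state for TAIL seconds, unless the next transfer
    begins sooner, in which case only the gap is charged at P_TAIL.
    """
    if not transfers:
        return 0.0
    ordered = sorted(transfers)
    energy = 0.0
    for i, (start, duration) in enumerate(ordered):
        energy += P_ACTIVE * duration          # active transmission energy
        end = start + duration
        if i + 1 < len(ordered):
            gap = ordered[i + 1][0] - end      # idle time before next transfer
            energy += P_TAIL * min(max(gap, 0.0), TAIL)
        else:
            energy += P_TAIL * TAIL            # full tail after the last transfer


    return energy


if __name__ == "__main__":
    # Without aggregation: nodes A and B each use their own cellular interface.
    separate = radio_energy([(0.0, 2.0)]) + radio_energy([(1.0, 2.0)])
    # With aggregation: B hands its data to A over P2P, and A sends both
    # transfers back-to-back so that they share a single tail.
    aggregated = radio_energy([(0.0, 2.0), (2.0, 2.0)]) + P2P_COST
    print(separate, aggregated)  # 14.0 9.5
```

In this toy setting, aggregation avoids one full tail and pays only the assumed P2P cost. However, it postpones B's transfer by 1 s, so a real schedule search also has to account for the delay each schedule introduces.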