Many embedded and portable applications in the image and video processing domains spend a large fraction of their energy executing load/store instructions that access off-chip memory. Although most performance-oriented locality optimization techniques reduce the number of memory instructions and thereby lower memory energy consumption, improving energy behavior further requires energy-oriented approaches as well. Our focus in this paper is on a system with multiple homogeneous processors and a multi-bank memory architecture that processes large arrays of signals. To reduce energy consumption in such a system, we use a compiler-based approach that exploits low-power operating modes. A major problem in such an architecture is reconciling the conflicting requirements of maximizing parallelism and reducing energy consumption. The conflict arises because maximizing parallelism requires independent concurrent accesses to different memory banks, whereas reducing energy consumption requires confining the accesses in any given period to a small set of memory banks, so that the remaining banks can be placed into a low-power operating mode. Our approach consists of three complementary steps: parallel access pattern detection, array allocation across memory banks, and data layout transformations. Our preliminary results indicate that the approach yields significant off-chip memory energy savings without sacrificing the available parallelism.
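The tension between parallelism and bank-level energy savings can be illustrated with a small toy model. The sketch below is not the algorithm of this paper. The function `evaluate`, the phase lists, the three allocations, and all cost constants (including the wake-up cost charged when a bank leaves low-power mode) are hypothetical and chosen only for illustration.

```python
# Toy model only: every name and number here is an illustrative assumption.
ACTIVE_COST = 10     # hypothetical energy units per phase for an active bank
LOW_POWER_COST = 1   # hypothetical energy units per phase for a bank in low-power mode
WAKE_COST = 5        # hypothetical cost of bringing a bank out of low-power mode


def evaluate(phases, allocation, num_banks):
    """Return (total_energy, bank_conflicts) for an array-to-bank allocation.

    phases: list of lists; each inner list names the arrays that the
            processors access concurrently during that phase.
    allocation: dict mapping array name -> bank index.
    """
    energy = 0
    conflicts = 0
    previously_active = set()  # assume all banks start in low-power mode
    for arrays in phases:
        banks = [allocation[name] for name in arrays]
        active = set(banks)
        # Concurrent accesses that fall into the same bank cannot proceed in parallel.
        conflicts += len(banks) - len(active)
        # Banks that were not active in the previous phase must be woken up.
        woken = active - previously_active
        energy += (len(active) * ACTIVE_COST
                   + (num_banks - len(active)) * LOW_POWER_COST
                   + len(woken) * WAKE_COST)
        previously_active = active
    return energy, conflicts


# Two processors; arrays A and B are accessed together, then C and D.
phases = [["A", "B"], ["C", "D"]]
spread = {"A": 0, "B": 1, "C": 2, "D": 3}   # one array per bank
packed = {"A": 0, "B": 0, "C": 1, "D": 1}   # concurrent arrays share a bank
grouped = {"A": 0, "B": 1, "C": 0, "D": 1}  # concurrent arrays in distinct banks, few banks overall

for name, alloc in [("spread", spread), ("packed", packed), ("grouped", grouped)]:
    print(name, evaluate(phases, alloc, num_banks=4))
```

Under these assumed costs, the `packed` allocation uses the least energy (36 units) but serializes both pairs of concurrent accesses (2 conflicts). The `grouped` allocation is conflict-free like `spread`, yet uses less energy (54 versus 64 units) because two of the four banks never leave low-power mode.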