Improving the locality of data references is becoming increasingly important due to the widening gap between processor cycle times and off-chip memory access latencies. Improving data locality not only reduces effective memory access time but also lowers the memory system energy consumed by data references. An optimizing compiler can play an important role in enhancing data locality in array-intensive embedded media applications with regular data access patterns. This paper presents a compiler-based data space-oriented tiling approach (DST). In this strategy, the data space (e.g., an array of signals) is logically divided into chunks (called data tiles), and each data tile is processed in turn. In processing a data tile, our approach traverses the entire iteration space of all nests in the code and executes all iterations (potentially coming from different nests) that access the data tile being processed, while taking data dependences into account. Since a data space is common to all nests that access it, DST can potentially achieve better results than traditional iteration space (loop) tiling by exploiting internest data locality. We also present an example application of DST for improving the effectiveness of a scratch pad memory (SPM) for data accesses. SPMs are alternatives to conventional cache memories in the embedded computing world. Like caches, these small on-chip memories provide fast, low-power access to data; they differ from conventional data caches in that their contents are managed by the compiler instead of by hardware. We have implemented DST in a source-to-source translator and quantified its benefits using a simulator. Our preliminary results with several array-intensive applications and varying input sizes show that our approach outperforms classical iteration space-oriented tiling as well as a data-oriented approach that considers each nest in isolation.