Deep-submicron designs have to take care of process variation effects as variations in critical process parameters result in large variations in access latencies of hardware components. This is severe in the case of memory components as minimum sized transistors are used in their design. In this work, by considering on-chip data caches, we study the effect of access latency variations on performance. We discuss performance losses due to the worst-case design, wherein the entire cache operates with the worst-case process variation delay, followed by process variation aware cache designs which work at set-level granularity. We then propose a technique called block rearrangement to minimize performance loss incurred by a process variation aware cache which works at set-level granularity. Using block rearrangement technique, we rearrange the physical locations of cache blocks such that a cache set can have its "n" blocks (assuming a n-way set-associative cache) in multiple rows instead of a single row as in the case of a cache with conventional addressing scheme. By distributing blocks of a cache set over multiple sets, we minimize the number of sets being affected by process variation. We evaluate our technique using SPEC2000 CPU benchmarks and show that our technique achieves significant performance benefits over caches with conventional addressing scheme.