Packet-based on-chip networks are increasingly being adopted in complex System-on-Chip (SoC) designs supporting numerous homogeneous and heterogeneous functional blocks. These Network-on-Chip (NoC) architectures are required to not only provide ultra-low latency, but also occupy a small footprint and consume as little energy as possible. Further, reliability is rapidly becoming a major challenge in deep sub-micron technologies due to the increased prominence of permanent faults resulting from accelerated aging effects and manufacturing/testing challenges. Towards the goal of designing low-latency, energy-efficient and reliable on-chip communication networks, we propose a novel fine-grained modular router architecture. The proposed architecture employs decoupled parallel arbiters and uses smaller crossbars for row and column connections to reduce output port contention probabilities as compared to existing designs. Furthermore, the router employs a new switch allocation technique known as "Mirroring Effect" to reduce arbitration depth and increase concurrency. In addition, the modular design permits graceful degradation of the network in the event of permanent faults and also helps to reduce the dynamic power consumption. Our simulation results indicate that in an 8 × 8 mesh network, the proposed architecture reduces packet latency by 4-40% and power consumption by 6-20% as compared to two existing router architectures. Evaluation using a combined performance, energy and fault-tolerance metric indicates that the proposed architecture provides 35-50% overall improvement compared to the two earlier routers.