A variety of emerging applications in medical ultrasound rely on 3D volumetric imaging, which calls for dense 2D transducer arrays with thousands of elements. At this channel count, the traditional per-element cable interface used for 1D arrays is no longer practical. To address this issue, recent work has demonstrated the viability of flip-chip bonding or direct transducer integration. This shifts the burden to a CMOS substrate, which must provide dense signal conditioning and processing before the massively parallel image data can be moved off chip. A common approach to data reduction is subarray beamforming (BF), which applies delay-and-sum operations within a group of pixels. To fit this functionality within the tight pixel pitch, prior works have implemented the delays using simple sample-and-hold (S/H) circuits or analog filters. These implementations typically suffer from some combination of limited delay range, coarse delay resolution, and limited SNR.
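As a rough illustration of the delay-and-sum operation behind subarray beamforming, the following Python sketch delays each element's signal by an integer number of samples and sums the results into a single output channel. It is a simplified digital model under stated assumptions: the function name `delay_and_sum`, the integer-sample delays, and the three-element example are hypothetical and chosen only for illustration, whereas in-pixel hardware would operate on analog or sampled signals.

```python
def delay_and_sum(channels, delays):
    """Delay each channel by an integer number of samples, then sum.

    channels: list of equal-length sample lists, one per subarray element.
    delays:   non-negative integer delay (in samples) for each element.
    Returns one output list of the same length as each input channel.
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    n = len(channels[0])
    out = [0.0] * n
    for signal, d in zip(channels, delays):
        if len(signal) != n:
            raise ValueError("all channels must have the same length")
        if d < 0:
            raise ValueError("delays must be non-negative")
        # Output sample t takes input sample t - d; earlier samples count as zero.
        for t in range(d, n):
            out[t] += signal[t - d]
    return out


# Hypothetical example: the same echo reaches three elements at samples 2, 1, and 0.
channels = [
    [0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
delays = [0, 1, 2]  # chosen so that all three echoes line up at sample 2
aligned = delay_and_sum(channels, delays)
unaligned = delay_and_sum(channels, [0, 0, 0])
```

With the assumed delays, the three echoes add coherently into a single peak of amplitude 3, while summing without delays spreads the same energy over three samples of amplitude 1. The sketch also makes the data reduction explicit: three input channels collapse into one output channel. In this simplified model, the delay resolution is one sample period and the maximum delay is bounded by the record length, which loosely mirrors the resolution and range limits noted for prior implementations.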