Abstract
Rising food demand, labor shortages, and field variability are pushing crop management toward robust, on-device computer vision systems. This review traces the development of Vision Transformers (ViT) and Vision Mamba (ViM) against the backdrop of convolutional neural networks (CNNs), compares their architectures, and draws out their implications for the next generation of agricultural AI. CNNs remain efficient baselines for local pattern recognition in many simple agri-vision tasks, but they struggle with long-range dependencies and complex canopy scenes. By introducing global context through self-attention, ViT improves robustness to occlusion and scale change and supports multimodal fusion. ViM replaces self-attention with selective state-space sequence modeling, avoiding quadratic complexity and enabling linear-time scaling for long sequences and high-resolution imagery while maintaining competitive accuracy. A synthesis of 2020–2025 studies across disease monitoring, weed/pest control, fruit counting and yield estimation, and land/soil/water analysis indicates that: (i) ViT is highly likely to surpass CNN baselines under cluttered, variable field conditions, given sufficient data and appropriate training; (ii) ViM provides an effective accuracy-efficiency balance for edge and UAV platforms; and (iii) hybrid designs (e.g., CNNs combined with ViT/ViM heads, or ViM necks in detectors) are more reliable in low light and dense canopies. Persistent challenges include limited multi-site data diversity, sensitivity to illumination and occlusion, edge-device energy/latency constraints, and limited interpretability. A forward pathway is outlined: standardized domain-shift tests, energy/latency reporting, quantization/distillation for edge deployment, uncertainty-aware outputs, and principled multimodal fusion, which together can make ViM-centric or hybrid pipelines a practical next step toward field-ready agri-vision.
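The scaling contrast drawn in the abstract (quadratic for self-attention, linear for a state-space scan) can be illustrated with a rough token count. The sketch below is only a back-of-the-envelope illustration: the 16-pixel patch size is an assumption (a common ViT default, not stated in the abstract), the function names are hypothetical, and constant factors such as hidden dimension, number of heads, and scan direction are ignored.

```python
def num_patches(height, width, patch=16):
    """Count non-overlapping patch tokens for an image.

    Assumes height and width are divisible by `patch`; the default of 16
    is an illustrative assumption, not a value taken from the review.
    """
    return (height // patch) * (width // patch)


def attention_pairs(n_tokens):
    """Token-pair scores computed by full self-attention: grows as n^2."""
    return n_tokens * n_tokens


def scan_steps(n_tokens):
    """State updates in one pass of a linear-time state-space scan: grows as n."""
    return n_tokens


if __name__ == "__main__":
    # Compare how the two costs grow as image resolution increases.
    for side in (224, 512, 1024):
        n = num_patches(side, side)
        print(f"{side}x{side}: tokens={n}, "
              f"attention pairs={attention_pairs(n)}, scan steps={scan_steps(n)}")
```

Under these assumptions, quadrupling the token count (for example, going from a 512 to a 1024 pixel side) multiplies the attention-pair count by 16 but the scan-step count only by 4, which is the intuition behind favoring ViM-style models for high-resolution imagery on constrained hardware.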
| Original language | English (US) |
|---|---|
| Article number | 111551 |
| Journal | Computers and Electronics in Agriculture |
| Volume | 245 |
| DOIs | |
| State | Published - Apr 2026 |
All Science Journal Classification (ASJC) codes
- Forestry
- Agronomy and Crop Science
- Computer Science Applications
- Horticulture
Title: What's next in agri-vision: A review of Vision Transformers and Vision Mamba in crop management