Abstract
This manuscript presents VisionEV, a multimodal large language model (LLM) framework designed to predict electric vehicle (EV) charging demand by leveraging satellite imagery and structured textual data. Spatial context—including parking availability, land use, and nearby amenities—is critical for accurate demand estimation. However, site selection in EV infrastructure planning remains both labor-intensive and inconsistent, requiring human experts to conduct in-person audits and manually define spatial features. To overcome these limitations, VisionEV introduces an automated spatial reasoning pipeline that integrates satellite imagery of candidate locations as visual inputs, allowing the model to learn nuanced spatial patterns directly from imagery, without relying on predefined descriptors. Complementary station-level attributes, including traffic flow and temporal indicators, are embedded into domain-informed textual prompts to simulate planner reasoning. A core technical challenge lies in enabling coherent reasoning across semantically distinct inputs—structured textual data and perceptual visual context. VisionEV addresses this by reformulating the task as multimodal text generation, aligning both modalities within a shared embedding space through vision-informed prompting and lightweight domain-adaptive fine-tuning. We evaluate VisionEV using a real-world dataset of 22,852 training samples and 2,858 test samples collected from 189 public EV charging stations in Kansas City, Missouri. In the full-shot setting, VisionEV achieves superior accuracy (RMSE: 2.87, MAE: 1.98), outperforming the strongest baseline, LightGBM, by 1.0% and 5.3%, respectively. Few-shot, within-city zero-shot and cross-region spatial hold-out experiments demonstrate VisionEV's ability to generalize across unseen scenarios, and ablation studies confirm the contributions of visual input, prompt design, and fine-tuning. 
These results underscore the promise of multimodal LLMs in supporting scalable, data-driven EV infrastructure planning through automated spatial understanding.
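The error metrics quoted above can be illustrated with a minimal sketch. The abstract does not say how the percentage gains over LightGBM are defined; the sketch assumes they are relative reductions in error, and the helper names and toy values are hypothetical, not taken from the VisionEV dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline_error, model_error):
    """Percentage reduction in error relative to the baseline (assumed definition)."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Toy demand values, invented purely for illustration.
y_true = [3.0, 5.0, 2.0, 8.0]
baseline_pred = [4.0, 3.0, 2.0, 6.0]
model_pred = [3.0, 4.0, 2.0, 7.0]

baseline_rmse = rmse(y_true, baseline_pred)
model_rmse = rmse(y_true, model_pred)
baseline_mae = mae(y_true, baseline_pred)
model_mae = mae(y_true, model_pred)

print(f"RMSE improvement: {relative_improvement(baseline_rmse, model_rmse):.1f}%")
print(f"MAE improvement: {relative_improvement(baseline_mae, model_mae):.1f}%")
```

Under this definition, a lower model error yields a positive percentage, which matches how the abstract reports gains over the baseline.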
| Original language | English (US) |
|---|---|
| Article number | 105069 |
| Journal | Transportation Research Part D: Transport and Environment |
| Volume | 150 |
| State | Published - Jan 2026 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 15 Life on Land
All Science Journal Classification (ASJC) codes
- Civil and Structural Engineering
- Transportation
- General Environmental Science
Fingerprint
Dive into the research topics of 'VisionEV: multimodal large language models for spatially aware electric vehicle charging demand prediction using satellite imagery'. Together they form a unique fingerprint.