Collaborative Research: Origins of Southeast Asian Rainforests from Paleobotany and Machine Learning

Project: Research project

Project Details


Fossil leaves are the most abundant record of ancient plant life, and millions of specimens are contained in museum collections around the world, with more discoveries every year. Nevertheless, leaf fossils alone currently provide limited information about the evolution of regional and global plant communities because individual leaf characteristics from a single plant species can vary widely, and detailed, time-consuming examination of each leaf fossil might still not connect it to its true biological family. This project addresses the problem in two ways. The first is the development of the Virtual Paleobotany Assistant (VPA), an artificial intelligence tool that will use machine learning techniques to rapidly analyze leaf characteristics and assign individual fossils to plant families and orders. The VPA, together with more traditional methods of paleobotany, will then be used to interpret the origins of the incredibly diverse tropical rainforests that now exist in Southeast Asia. These plant communities evolved during times of major continental movements and have connections to the former supercontinent of Gondwana, the Indian subcontinent, and Eurasia. Ascertaining the evolutionary and biogeographic pathways that led to the assembly of these tropical forests will help in preserving this important natural resource as the regional human population burgeons. The VPA will be made freely available on the internet and mobile platforms, enabling paleobotanists around the world to make discoveries far beyond this project. The unique collaboration between paleontologists and machine-learning experts will create extremely fertile ground for interdisciplinary advances, while catalyzing new international partnerships and student opportunities.

The project addresses two of the most difficult challenges in paleobotany: fossil leaf identification and the fossil history of Southeast Asian (Malesian) rainforests. Decoding the biological affinities of leaf fossils holds central significance for the improved knowledge of plant evolution, biogeography, and paleoclimate. This project will use deep learning on image databases of extant and fossil leaves to develop the first application (the Virtual Paleobotany Assistant, VPA) for computer-assisted identifications of leaf fossils to plant families and orders. The living floras of Southeast Asia are composed of a stunningly complex juxtaposition of plant lineages that diversified after arriving from disparate sources, including Gondwana (fossils to be studied in Patagonia and Australia), the Indian Plate (India and Pakistan), and Eurasia (South China, Indochina, Malay Archipelago). However, the diverse biogeographic components remain poorly understood due to limited paleobotanical data in many of the source areas. Many widely cited hypotheses are weakly corroborated by fossils; paleobotany and machine vision will be used together to reveal the identities of fossil plants, correlate them to the geologic time scale, and re-interpret Malesia's floristic history. The influx of new paleobotanical data will test fundamental hypotheses about the relative contributions to Southeast Asian rainforest floras from different source areas.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Effective start/end date: 7/15/19 to 6/30/24


  • National Science Foundation: $1,412,820.00

