Date: 8:00 AM, June 12, 2025
Location: Room 107B
The integration of computer vision and deep learning into agriculture has gained significant traction, with advancements in multi-modal computer vision foundation models driving innovation in agricultural applications. This tutorial will introduce attendees to the latest developments, focusing on multi-modal sensor fusion, domain adaptation, and the application of foundation models in agricultural tasks.
The tutorial features leading experts who will provide insights into cutting-edge research and practical implementations. Through presentations and discussions, attendees will gain knowledge about the latest AI trends in multi-modal computer vision and foundation models applied to agriculture.
Half-Day Tutorial – CVPR 2025
8:30 AM – Opening Remarks & Introductions
8:45 AM – 9:45 AM: Dr. Melba Crawford
Multi-modal sensor fusion with Computer Vision models for multi-temporal yield prediction
ABSTRACT: Acquisition of remotely sensed data has exploded in recent years, with platforms and sensors ranging from satellites to proximal technologies, and from RGB to multi/hyperspectral imaging and LiDAR, spanning multiple spatial scales and providing new opportunities to both leverage and advance computer vision-based modeling. This tutorial will provide an overview of platforms and sensor technologies, including the multiple modalities that currently acquire data for data-driven models. Achievements, challenges, and future opportunities for applying and advancing vision-based modeling will be demonstrated in the context of a case study on predicting maize yield in plant-breeding and management-practice trials.
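To make the fusion idea concrete, here is a minimal sketch (not from the talk itself) of feature-level fusion for yield prediction: per-plot features from two hypothetical sensor modalities are concatenated and fed to a simple linear yield model. The feature names, dimensions, and synthetic data are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

n_plots = 100
rgb_feats = rng.normal(size=(n_plots, 4))    # e.g. canopy-cover statistics (hypothetical)
hyper_feats = rng.normal(size=(n_plots, 8))  # e.g. narrow-band vegetation indices (hypothetical)

# Feature-level ("early") fusion: concatenate modality features per plot.
X = np.hstack([rgb_feats, hyper_feats])

# Simulate yields that depend on both modalities, plus measurement noise.
true_w = rng.normal(size=X.shape[1])
y = X @ true_w + rng.normal(scale=0.1, size=n_plots)

# Fit an ordinary-least-squares yield model on the fused features.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
rmse = np.sqrt(np.mean((X @ w_hat - y) ** 2))
print(f"RMSE: {rmse:.3f}")
```

In practice the linear model would be replaced by a learned vision backbone per modality, but the fusion step (combining modality-specific representations before prediction) has the same shape.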
9:45 AM – 10:00 AM: Coffee Break & Networking
10:00 AM – 11:00 AM: Dr. Alex Schwing
Foundations of Foundation Models
ABSTRACT: Foundation models have enabled a new way to address tasks by benefiting from emergent capabilities in a zero-shot manner. This tutorial will highlight this by providing an overview of foundation models, including CLIP, SAM, DINO, text-to-image/video models, and LLMs in general. We cover the underlying ideas, the architectures, and some of the observed results, and we will also discuss recent trends, including multi-modal LLMs. By summarizing recent advances and trends, we hope this tutorial will provide an overview of, and inspiring insights into, the fast-evolving research landscape of foundation models and their applications.
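As a toy illustration of the zero-shot mechanism behind CLIP-style models: an image embedding is compared against embeddings of text prompts by cosine similarity, and the best-matching prompt is taken as the prediction. The embeddings and prompts below are synthetic stand-ins, not real CLIP outputs.

```python
import numpy as np

def normalize(v):
    """L2-normalize along the last axis, as CLIP does before comparison."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical text embeddings for three class prompts.
text_emb = normalize(np.array([
    [0.9, 0.1, 0.0],   # "a photo of a healthy maize leaf"
    [0.1, 0.9, 0.0],   # "a photo of a leaf with rust disease"
    [0.0, 0.1, 0.9],   # "a photo of bare soil"
]))

# Synthetic embedding of a query image.
image_emb = normalize(np.array([0.85, 0.2, 0.05]))

logits = text_emb @ image_emb                  # cosine similarities
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over candidate prompts
print(int(np.argmax(probs)))                   # index of best-matching prompt
```

No task-specific training is involved: changing the class set only means changing the prompts, which is what makes the zero-shot usage attractive for agricultural labels that rarely appear in standard benchmarks.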
11:00 AM – 12:00 PM: Dr. Soumik Sarkar
Multi-modal Foundational Models in Agriculture
ABSTRACT: In this tutorial, Dr. Sarkar will present a few recent case studies on the development and evaluation of Ag foundation models that integrate multi-modal and multi-platform data—including RGB imagery, textual descriptions, hyperspectral data, weather patterns, and soil conditions—for a variety of agricultural applications such as pest and disease monitoring and yield prediction. He will also share best practices, lessons learned, and practical insights regarding data curation, computational requirements, and model evaluation.
By the end of this tutorial, attendees will:
✅ Gain an understanding of multi-modal sensor fusion techniques for yield prediction.
✅ Learn about foundation models used for identifying pests, weeds, and crop health analysis.
✅ Explore recent advancements in video object segmentation and instance segmentation applied to agriculture.
✅ Connect with leading researchers and practitioners in the field of agricultural AI.
This tutorial is designed for researchers, engineers, and practitioners in computer vision, AI, and agriculture. It aims to bridge the gap between cutting-edge AI research and real-world agricultural challenges, attracting professionals and students interested in applying AI to high-impact agricultural problems.
This tutorial prioritizes diversity in expertise, gender, and background, bringing together experts from academia and industry to promote inclusivity in AI and agriculture.