July 1st, 2025 - June 30th, 2030
Categories: Applications, Software, Visualization, Data Science, Computer Vision, Image Processing
UIC CS Assistant Professor Wei Tang has received an NSF CAREER Award for his proposed research, “Compositional Learning and Understanding of the Physical World.” The award provides $539,051 over the five-year project period, July 2025 through June 2030.
Tang’s research project aims to develop a computer vision framework that learns and understands the physical world in a compositional manner, offering two significant benefits. First, a compositional interpretation of objects and scenes enables intelligent systems to engage in richer physical interactions and accomplish more complex tasks. Second, by decomposing complex entities into simpler constituents and modeling their relationships, this compositional approach addresses fundamental challenges faced by purely data-driven methods, including data inefficiency, the curse of dimensionality, and limited explainability. The outcomes of this project will impact a wide range of emerging applications, including robots that support manufacturing or assist with daily tasks, autonomous vehicles that enhance mobility and safety, and virtual or augmented reality interfaces that facilitate assistive workflows and remote collaboration. The project will tightly integrate research and education through curriculum development, research training for high school, undergraduate, and graduate students, and community outreach.
This project will develop new methodologies for learning and understanding the innate compositionality of objects and scenes in the physical world. It consists of three innovative thrusts. Thrust I aims to establish a unified framework for representing, parsing, and learning the compositionality of physical objects, through disentangled modeling of large shape variations, constituent parts, and detailed deformations of each part as multi-granularity neural fields. Thrust II aims to develop a new compositional model that parses 3D dynamic scenes from streaming video into an explainable layout graph on the fly, by constructing distributed representations of low-level geometry and motion and performing explicit reasoning about high-level scene compositionality. Thrust III will extend the first two thrusts by modeling the compositionality of generic articulated objects and investigating test-time adaptation for 3D dynamic scene parsing. Distinct from purely data-driven methods, this new compositional paradigm reduces reliance on extensive 3D annotations, naturally handles the high dimensionality of geometry and motion, and enables a deeper, more explainable understanding of the physical world. This project will advance and enrich fundamental research in visual compositionality, physical object and scene understanding, and explainable parsing.
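To make the idea of part-based compositional representation in Thrust I more concrete, the sketch below is a purely illustrative PyTorch example (not the project's actual method): it models an object-level implicit field as the union of several part-level fields, each conditioned on its own learnable latent code. All class names, parameters, and the simple min-based composition are hypothetical choices made for this illustration.

```python
import torch
import torch.nn as nn

class PartField(nn.Module):
    """Small MLP mapping a 3D point plus a part-specific latent code
    to a signed distance value for one constituent part."""
    def __init__(self, latent_dim=32, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, points, code):
        # points: (N, 3); code: (latent_dim,) broadcast to every query point
        code = code.expand(points.shape[0], -1)
        return self.net(torch.cat([points, code], dim=-1)).squeeze(-1)


class CompositionalObjectField(nn.Module):
    """Illustrative compositional object representation: the object-level
    signed distance is the pointwise minimum (union) of per-part distances,
    each part conditioned on its own learnable code."""
    def __init__(self, num_parts=4, latent_dim=32):
        super().__init__()
        self.parts = nn.ModuleList(PartField(latent_dim) for _ in range(num_parts))
        self.part_codes = nn.Parameter(torch.randn(num_parts, latent_dim) * 0.01)

    def forward(self, points):
        # Stack per-part distances: (num_parts, N)
        d = torch.stack([f(points, c) for f, c in zip(self.parts, self.part_codes)])
        # Union of parts = pointwise minimum over part distances
        return d.min(dim=0).values


if __name__ == "__main__":
    field = CompositionalObjectField(num_parts=4)
    queries = torch.rand(1024, 3) * 2 - 1   # random query points in [-1, 1]^3
    sdf = field(queries)                     # (1024,) object-level distances
    print(sdf.shape)
```

In a setup like this, fitting the model to observed geometry would optimize both the part networks and the per-part codes, which is one way a part-level decomposition of an object can emerge from data; the project's actual framework additionally models large shape variations and per-part deformations, which this sketch omits.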