Music, robustness, and vision–language: ICML & NeurIPS highlights for Oore and collaborators
Sageev Oore and collaborators have had a standout period of research activity spanning music generation, robust learning, and vision–language understanding, with multiple major conference papers and a keynote talk.
At ICML 2024 (oral, top 1.5% of submissions), the team presented “Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion,” co-authored by Y. Huang, A. Ghatare, Y. Liu, Z. Hu, Q. Zhang, C. S. Sastry, S. Gururani, S. Oore, and Y. Yue. The work studies symbolic music generation, such as generating piano rolls, under non-differentiable musical rules on properties like note density, harmony, or chord progression. They introduce Stochastic Control Guidance (SCG), a plug-and-play guidance method that needs only forward evaluation of the rule functions and can be layered on top of pre-trained diffusion models. Because SCG never requires gradients of the rules, it supports rich rule sets and gives flexible, rule-driven control over the generated music.
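The selection mechanism behind this kind of guidance can be pictured with a short sketch. The Python below is only a hedged illustration of the general idea, assuming each reverse-diffusion step exposes a mean, a noise scale, and a predicted clean sample: several candidate latents are drawn, each is scored by the rule applied to its predicted clean sample, and the best candidate is kept. The paper frames this as a stochastic optimal control problem, and the toy denoise_step and note-density rule here are placeholders, not the authors’ models or rules.

```python
# Hedged sketch of rule-guided sampling: at each reverse-diffusion step, draw
# several stochastic candidates for the next latent, score each by the rule
# applied to its predicted clean sample, and keep the best one.
import torch

def rule_guided_sample(denoise_step, rule_fn, x_T, n_steps, n_candidates=8):
    """denoise_step(x, t) -> (mean, sigma, x0_hat) for one reverse step.
    rule_fn(x0_hat) -> float, lower is better; only forward evaluation needed."""
    x = x_T
    for t in reversed(range(n_steps)):
        mean, sigma, _ = denoise_step(x, t)
        candidates = [mean + sigma * torch.randn_like(mean)
                      for _ in range(n_candidates)]
        # Score candidates via the rule on their predicted clean samples.
        scores = [rule_fn(denoise_step(c, max(t - 1, 0))[2]) for c in candidates]
        x = candidates[min(range(n_candidates), key=scores.__getitem__)]
    return x

# Toy stand-ins: a "denoiser" that shrinks toward zero, and a rule preferring
# piano rolls whose fraction of active cells matches a target note density.
def toy_denoise_step(x, t):
    mean = 0.9 * x
    return mean, 0.1, mean  # (posterior mean, noise scale, crude x0 estimate)

rule = lambda roll: abs(float((roll > 0).float().mean()) - 0.3)
sample = rule_guided_sample(toy_denoise_step, rule,
                            x_T=torch.randn(16, 64), n_steps=50)
```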
In the robustness space, a NeurIPS 2024 paper titled “DiffAug: A Diffuse-and-Denoise Augmentation for Training Robust Classifiers” (with C. S. Sastry, S. H. Dumpala, and S. Oore) proposes a remarkably simple diffusion-based augmentation: each training example is perturbed by a single forward diffusion step followed by a single reverse diffusion step. Using both ResNet-50 and Vision Transformer architectures, the authors show that this one-step “diffuse-and-denoise” procedure significantly improves robustness to covariate shifts, certified adversarial accuracy, and out-of-distribution detection. DiffAug is also data-efficient: the classifier can be made more robust with a diffusion model trained only on the classifier’s own training data, without any additional data.
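The one-step augmentation is easy to sketch. The code below is a hedged illustration assuming a standard DDPM-style noise-prediction parameterization; eps_model, the timestep, and the noise-schedule value alpha_bar_t are placeholders rather than the paper’s configuration.

```python
# Hedged sketch of a diffuse-and-denoise augmentation under DDPM conventions.
import torch

def diffaug(x, eps_model, t, alpha_bar_t):
    """x: image batch in [-1, 1]; eps_model(x_t, t) -> predicted noise at step t."""
    eps = torch.randn_like(x)
    # One forward diffusion step: x_t = sqrt(a_bar)*x + sqrt(1 - a_bar)*eps.
    x_t = alpha_bar_t ** 0.5 * x + (1 - alpha_bar_t) ** 0.5 * eps
    # One reverse step: closed-form estimate of the clean image from the
    # predicted noise (standard x0-prediction formula).
    eps_hat = eps_model(x_t, t)
    x0_hat = (x_t - (1 - alpha_bar_t) ** 0.5 * eps_hat) / alpha_bar_t ** 0.5
    return x0_hat.clamp(-1.0, 1.0)

# Placeholder usage: a real setup would pass a trained diffusion model and mix
# clean and augmented batches when training the classifier.
stub_eps_model = lambda x_t, t: torch.zeros_like(x_t)
images = torch.rand(4, 3, 32, 32) * 2 - 1
augmented = diffaug(images, stub_eps_model, t=50, alpha_bar_t=0.9)
```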
A companion NeurIPS 2024 contribution, “SUGARCREPE++ Dataset: Vision-Language Model Sensitivity to Semantic and Lexical Alterations,” introduces an expanded benchmark designed to probe how vision–language models respond when captions undergo controlled semantic and lexical changes: each image is paired with two semantically equivalent but lexically different positive captions and a hard-negative caption. By testing whether models consistently prefer both positives over the hard negative, SUGARCREPE++ reveals where current systems are robust and where their understanding is still surprisingly brittle.
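A minimal sketch of the paired-caption test this structure enables might look as follows; score_fn stands in for an actual image–text scorer (for example, CLIP similarity), and the paper’s own metrics may differ.

```python
# Illustrative evaluation over (image, positive_1, positive_2, negative) items:
# the model is counted correct only if both semantically equivalent positives
# outrank the hard-negative caption. score_fn is a placeholder for a real
# vision-language scorer such as CLIP image-text similarity.
def paired_caption_accuracy(examples, score_fn):
    correct = 0
    for image, pos1, pos2, neg in examples:
        s1, s2, sn = (score_fn(image, c) for c in (pos1, pos2, neg))
        correct += int(s1 > sn and s2 > sn)
    return correct / len(examples)

# Dummy usage with a toy scorer that counts words shared with a fake image tag.
toy_score = lambda image, caption: len(set(image.split()) & set(caption.split()))
examples = [("dog on grass", "a dog lying on the grass",
             "a dog resting on grass", "a cat lying on the grass")]
print(paired_caption_accuracy(examples, toy_score))
```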
Beyond papers, Oore delivered a keynote talk at Canadian AI 2025 titled “Music, AI, & Us: Reflections on Musical Tools, Creativity, and Listening.” The keynote explored how AI systems can function as creative tools rather than replacements, and how they change the way we listen, compose, and collaborate. The talk included a live demonstration of the group’s newest interactive AI-music system, giving the audience a first-hand experience of AI-supported musical performance.