Podcast Guide

Munawar Hayat

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Why Vision Language Models Ignore What They See with Munawar Hayat

Published
December 9, 2025
Duration
57:40
Summary source
description
Last updated
Apr 21, 2026

Discusses multimodal and generative AI.

Show notes

In this episode, we’re joined by Munawar Hayat, researcher at Qualcomm AI Research, to discuss a series of papers presented at NeurIPS 2025 focusing on multimodal and generative AI. We dive into the persistent challenge of object hallucination in Vision-Language Models (VLMs), why models often discard visual information in favor of pre-trained language priors, and how his team used attention-guided alignment to enforce better visual grounding. We also explore a novel approach to generalized cont…

Themes

  • multimodal
  • generative-ai