Latest Editions
Last Week in Multimodal AI #38: From Clips to Worlds
Your Weekly Multimodal AI Roundup - Dec 15 - Dec 21, 2025

Last Week in Multimodal AI #36: Factual Recall, Real-Time Video
Your Weekly Multimodal AI Roundup - Dec 1 - Dec 7, 2025

Last Week in Multimodal AI #35: Small Models, Modular Vision
Week of Nov 24-30, 2025: Alibaba's 6B Z-Image impresses, Tencent's 1B HunyuanOCR beats larger models and APIs, VisionRAG uses 6-9x less memory than ColPali, and RynnVLA-002 boosts real-world robot success by 50%.

Multimodal Monday #33: Physical AI, Human Vision
Week of November 10 - November 16, 2025: Pelican-VL gives humanoid robots spatial intelligence, DeepMind teaches AI to see like humans, Marble creates 3D worlds from single images, and Meta opens speech recognition to 1,600+ languages.

Multimodal Monday #32: Multi-Query Retrieval, Streaming Video
Week of November 3 - November 9, 2025: AMER shows 4-21% gains on complex queries by generating multiple embeddings, Adobe MotionStream hits 29 fps with interactive motion controls, Step-Audio-EditX edits voice emotion and style through text prompts, and GEN-0 trains robots for general skills.

Multimodal Monday #31: Visual Thinking, Longer Video
Google Latent Sketchpad lets models sketch thoughts before acting, Amazon Nova MME unifies search, Emu3.5 matches Google's Nano Banana locally, and BEAR reveals why AI fails physical tasks.

Multimodal Monday #30: Smarter Agents, Real-Time 3D
WALT and UltraCUA make websites API-smart, Seed3D 1.0 builds 3D assets from one image, DeepSeek-OCR compresses docs 10x with 97% accuracy via optical mapping, and AGILE lifts VLM accuracy from 9.5% to 82.8% with interactive puzzles.

Multimodal Monday #29: Sampling Smarts, Composable Control
Week of October 13-19, 2025

Multimodal Monday #28: Diffusion Thinks, Retrieval Unifies
Fast-dLLM v2 diffuses text 2.5x faster, Omni-Embed-Nemotron hunts across modalities, and Think-Then-Embed reasons to top MMEB-V2.

Multimodal Monday #27: Small Models Beat Giants
ModernVBERT's 250M parameters beat models 10x larger, DocPruner slashes storage 60%, and Claude Sonnet 4.5 codes for 30+ hours. Scale reimagined!

Multimodal Monday #26: Adaptive Retrieval, Visual Reasoning
MetaEmbed scales retrieval on-the-fly, EmbeddingGemma beats giants with 308M params, and Veo3 develops reasoning.

Multimodal Monday #25: Mind Reading Meets Model Efficiency
AI reads intentions in video, Moondream delivers frontier performance at 2B params, and Alibaba open-source matches OpenAI. Understanding "why" changes everything!

Multimodal Monday #24: Post-Training Prevails, Neural Rendering Rises
RecA boosts quality 17% with 27 GPU-hours, RenderFormer replaces graphics pipelines with transformers, and Lucy-14B delivers instant video. Alignment beats retraining!

Multimodal Monday #23: Efficiency Evolves, Agentic Advance
This week in Multimodal AI - August 25 - September 7, 2025

Multimodal Monday #22: Spatial Crisis, Trust Bottleneck
Week of August 18-24, 2025

Multimodal Monday #21: Multimodal Reality, Expert Breakthrough
Text crushes visuals in recommendations, GPT-5 beats doctors by 24-29%, and Spotify's AI evaluates podcasts. AI surpasses human limits!

Multimodal Monday #20: Multimodal Myths, Generative Frontiers
Study challenges multimodal hype, Genie 3 builds 3D from text, and TURA blends real-time data. The future demands targeted deployment!

Multimodal Monday #19: Chinese AI Surge, Open Source Wins
Wan 2.2 rolls out with a week of daily feature releases, HairCUP refines 3D avatars, and E-FineR boosts recognition. Open-source Chinese AI surges ahead!