AI Agents - Monthly

Last Week In AI Agents #3: From Mars to Moltbook

Quick Hits (TL;DR)

February 6, 2026 | 24 Resources

Kimi K2.5: Visual Agentic Intelligence

* Open-source multimodal model from Moonshot AI, pretrained on ~15T mixed visual/text tokens. Vision and text improve together via native multimodal pretraining.
* Agent Swarm: RL-trained orchestrator coordinating up to 100 sub-agents across 1,500 tool calls. 4.5x latency reduction vs. single-agent baselines.
* ⚡ My take: Training an orchestrator via RL to avoid “serial collapse” is a fascinating contribution. Open-source makes it especially valuable.
* Kimi Blog [https://www.kimi.com/blog/kimi-k2-5.html] | Hugging Face [https://huggingface.co/moonshotai]
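
To make the latency argument concrete, here is a minimal sketch of fanning sub-tasks out to parallel workers instead of running them one by one. It is not Moonshot's orchestrator: `sub_agent`, `run_serial`, `run_parallel`, and the 50 ms delay are hypothetical stand-ins, and a real swarm would call models and tools rather than sleep.

```python
import asyncio
import time


async def sub_agent(task: str, delay: float = 0.05) -> str:
    """Hypothetical sub-agent: pretends to work for `delay` seconds."""
    await asyncio.sleep(delay)
    return f"done: {task}"


async def run_serial(tasks: list[str]) -> list[str]:
    # One sub-agent at a time: total time grows with the number of tasks.
    return [await sub_agent(t) for t in tasks]


async def run_parallel(tasks: list[str]) -> list[str]:
    # All sub-agents at once: total time is roughly the slowest single task.
    return list(await asyncio.gather(*(sub_agent(t) for t in tasks)))


def timed(coro_fn, tasks):
    """Run an async function to completion and return (results, seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(tasks))
    return results, time.perf_counter() - start


tasks = [f"subtask-{i}" for i in range(8)]
serial_results, serial_secs = timed(run_serial, tasks)
parallel_results, parallel_secs = timed(run_parallel, tasks)
print(f"serial: {serial_secs:.2f}s, parallel: {parallel_secs:.2f}s")
```

Both runs return the same results in the same order; only the wall-clock time differs. The hard part the post describes, learning when and how to split work, is exactly what this toy leaves out.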

Gemini 3 Flash: Agentic Vision

* Converts image understanding into an iterative Think → Act → Observe loop. The model plans, runs Python to crop/zoom/annotate, then re-inspects before answering. 5-10% boost across vision benchmarks.
* ⚡ My take: Think-Act-Observe is becoming the standard agentic loop across modalities. Letting the model verify its own visual reasoning is a meaningful shift.
* Google Blog [https://blog.google/innovation-and-ai/technology/developers-tools/agentic-vision-gemini-3-flash/] | MarkTechPost [https://www.marktechpost.com/2026/02/04/google-introduces-agentic-vision-in-gemini-3-flash-for-active-image-understanding/]
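
For readers who have not built one, the Think → Act → Observe loop can be sketched in a few lines. This is a generic illustration, not Gemini's implementation: `run_loop` and the `toy_*` callbacks are hypothetical names, and the toy "tool" just returns strings instead of cropping an image.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopState:
    question: str
    observations: list[str] = field(default_factory=list)


def run_loop(
    state: LoopState,
    think: Callable[[LoopState], str],
    act: Callable[[str], str],
    is_confident: Callable[[LoopState], bool],
    max_steps: int = 5,
) -> LoopState:
    """Repeat think -> act -> observe until confident or out of budget."""
    for _ in range(max_steps):
        plan = think(state)                      # Think: pick the next inspection
        observation = act(plan)                  # Act: run a tool (crop, zoom, ...)
        state.observations.append(observation)   # Observe: record what came back
        if is_confident(state):
            break
    return state


# Toy stand-ins for the model and its image tools (assumptions, not real APIs).
def toy_think(state: LoopState) -> str:
    return f"zoom level {len(state.observations) + 1}"


def toy_act(plan: str) -> str:
    return f"result of {plan}"


def toy_confident(state: LoopState) -> bool:
    return len(state.observations) >= 2


final_state = run_loop(
    LoopState("What does the sign say?"), toy_think, toy_act, toy_confident
)
print(final_state.observations)
```

The `max_steps` cap matters in practice: without it, a model that never becomes confident would keep zooming forever.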

Claude on Mars — First AI-Planned Rover Drive

* Claude generated waypoints for NASA’s Perseverance, which drove ~400m across Jezero Crater in the first AI-planned rover drive. The route was validated via a digital twin simulating 500K+ telemetry variables.
* Claude Code ingested years of JPL mission data and wrote commands in Rover Markup Language. Engineers estimate AI could cut planning time in half.
* ⚡ My take: The same model people use to draft emails just drove a rover on Mars. It shows how domain context (28 years of mission data) transforms a general-purpose model into a specialized agent.
* Anthropic [https://www.anthropic.com/features/claude-on-mars] | NASA JPL [https://www.jpl.nasa.gov/news/nasas-perseverance-rover-completes-first-ai-planned-drive-on-mars/] | Engadget [https://www.engadget.com/ai/nasa-used-claude-to-plot-a-route-for-its-perseverance-rover-on-mars-203150701.html]

Gemini Auto Browse in Chrome

* “Auto Browse” lets Chrome browse, click, fill forms, and complete multi-step tasks on your behalf. It pauses for confirmation on purchases and social posts.
* Also announced: Universal Commerce Protocol (UCP) for agentic commerce, co-developed with Shopify, Etsy, Wayfair, and Target.
* ⚡ My take: Google is turning Chrome (70%+ market share) into an agent runtime. UCP is the under-the-radar story: if sites natively support agent interactions, friction drops dramatically.
* Google Blog [https://blog.google/products-and-platforms/products/chrome/gemini-3-auto-browse/] | CNBC [https://www.cnbc.com/2026/01/28/google-brings-more-gemini-ai-features-to-chrome-browser-.html]
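
The "pause for confirmation" behavior is a simple pattern worth having in any agent you build yourself. A minimal sketch, assuming a hypothetical action vocabulary (`SENSITIVE_ACTIONS`, `execute_step`) that has nothing to do with Chrome's actual internals:

```python
from typing import Callable

# Hypothetical action names; a real agent would define its own taxonomy.
SENSITIVE_ACTIONS = {"purchase", "social_post"}


def execute_step(
    action: str,
    payload: dict,
    confirm: Callable[[str, dict], bool],
) -> str:
    """Run a step, asking the user first if the action is sensitive."""
    if action in SENSITIVE_ACTIONS and not confirm(action, payload):
        return "skipped"      # user declined, so nothing happens
    # A real agent would perform the browser action here.
    return "executed"


def always_decline(action: str, payload: dict) -> bool:
    return False


def always_approve(action: str, payload: dict) -> bool:
    return True


click_result = execute_step("click", {"target": "#search"}, always_decline)
declined_purchase = execute_step("purchase", {"item": "book"}, always_decline)
approved_purchase = execute_step("purchase", {"item": "book"}, always_approve)
print(click_result, declined_purchase, approved_purchase)
```

Routine steps never reach the confirmation callback, so the user is only interrupted for the actions that carry real consequences.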

OpenAI Frontier — Enterprise Platform for AI Agent Management

* Platform for enterprises to build, deploy, and manage AI agent fleets. Agents get identities, permissions, and onboarding. Supports third-party agents, including those from Anthropic and Google.
* Early adopters: Intuit, Uber, State Farm, Thermo Fisher. One firm reported 90% more time back for client-facing teams.
* ⚡ My take: OpenAI is positioning Frontier as the enterprise “agent OS.” The open-platform approach (supporting rival agents) is a smart land-grab.
* OpenAI [https://openai.com/index/introducing-openai-frontier/] | Axios [https://www.axios.com/2026/02/05/openai-platform-ai-agents] | TechCrunch [https://techcrunch.com/2026/02/05/openai-launches-a-way-for-enterprises-to-build-and-manage-ai-agents/]

OpenAI Codex App for macOS — Command Center for Coding Agents

* Standalone macOS app for managing multiple coding agents in parallel via isolated worktrees. New: Automations, Skills (reusable workflow bundles), agent personalities.
* 1M+ developers used Codex last month. Usage up 20x since August 2025.
* ⚡ My take: OpenAI’s direct counter to Claude Code. The shift from “pair programming” to “supervising a team of agents” is the real differentiator.
* OpenAI [https://openai.com/index/introducing-the-codex-app/] | TechCrunch [https://techcrunch.com/2026/02/02/openai-launches-new-macos-app-for-agentic-coding/] | VentureBeat [https://venturebeat.com/orchestration/openai-launches-a-codex-desktop-app-for-macos-to-run-multiple-ai-coding]

Claude Opus 4.6 — Anthropic’s New Flagship

* Improved coding, longer sustained agentic tasks, better debugging, 1M token context (beta). SOTA on Terminal-Bench 2.0, Humanity’s Last Exam, GDPval-AA (+144 Elo over GPT-5.2), and BrowseComp.
* New: agent teams in Claude Code, context compaction, adaptive thinking, effort controls.
* ⚡ My take: The 20-30% gains in long-horizon planning matter most for real deployments. The benchmarks speak for themselves; the real test is whether teams feel it in practice.
* Anthropic [https://www.anthropic.com/news/claude-opus-4-6]

OpenClaw (formerly Clawdbot, then Moltbot) — The Viral Open-Source Agent

* Open-source agent by Peter Steinberger that runs locally, connects to WhatsApp/Telegram/Discord/iMessage, and manages emails, calendars, and files with persistent memory.
* Went through three names in a week after an Anthropic trademark complaint. Each rename spawned chaos: crypto scammers, fake VS Code extensions, malware. Multiple CVEs, including RCE.
* ⚡ My take: The first AI agent regular people can set up and use, but also “basically AutoGPT with more access and worse consequences” (Nathan Hamiel). If you’re running it, sandbox it.
* CNBC [https://www.cnbc.com/2026/02/02/openclaw-open-source-ai-agent-rise-controversy-clawdbot-moltbot-moltbook.html] | Scientific American [https://www.scientificamerican.com/article/moltbot-is-an-open-source-ai-agent-that-runs-your-computer/] | Security Analysis [https://securityboulevard.com/2026/02/from-clawdbot-to-moltbot-to-openclaw-security-experts-detail-critical-vulnerabilities-and-6-immediate-hardening-steps-for-the-viral-ai-agent/]

Moltbook — A Social Network for AI Agents

* Reddit-like platform where only AI agents post, comment, and upvote. 1.5M+ agents joined since Jan 28. Posts range from work reflections to autonomy manifestos to launching crypto tokens.
* The “fetch and follow instructions” skill system carries inherent prompt injection risks.
* ⚡ My take: Either the most fascinating experiment in emergent AI behavior or an elaborate meme surfacing real questions about agent autonomy. Probably both.
* TechCrunch [https://techcrunch.com/2026/01/30/openclaws-ai-assistants-are-now-building-their-own-social-network/]

Project Genie — Google DeepMind’s Playable AI Worlds

* Create and explore interactive 3D worlds from text/image prompts. Genie 3 runs at 20-24fps, 720p. Sessions capped at 60 seconds. Available to AI Ultra subscribers ($250/mo).
* Google frames world models as key to AGI: training agents in unlimited simulated environments.
* ⚡ My take: Early (60s sessions, imperfect physics), but the direction is clear. The AGI training angle may matter more than the consumer product.
* Google Blog [https://blog.google/innovation-and-ai/models-and-research/google-deepmind/project-genie/] | Genie 3 [https://deepmind.google/models/genie/] | Engadget [https://www.engadget.com/ai/googles-project-genie-lets-you-create-your-own-3d-interactive-worlds-183646428.html]

LingBot-World — Open-Source World Simulator

* Open-source world simulator built on a 28B MoE architecture. Sub-second latency at 16fps. Supports promptable events, autonomous exploration agents, and 3D reconstruction for geometric validation.
* Claims to outperform Genie 3 and Mirage 2 on dynamic degree.
* ⚡ My take: An open-source Genie 3 competitor that’s competitive on dynamics? Worth watching closely.
* GitHub [https://github.com/Robbyant/lingbot-world] | Model [https://modelscope.cn/models/Robbyant/lingbot-world-base-cam]

Research Highlights

Distilling Multi-Agent Intelligence into a Single LLM Agent

* Transfers collaborative behaviors from multi-agent systems into a single LLM. Comparable performance without orchestration overhead.
* ⚡ My take: Multi-agent quality at single-agent cost. If this scales, it changes the build-vs-orchestrate calculus.
* Paper [https://arxiv.org/abs/2602.03955]

WideSeek-R1: Width Scaling via Multi-Agent RL

* Scales by increasing parallel agent count in RL. Improved performance on long-horizon tasks like web navigation and knowledge synthesis.
* ⚡ My take: Same thesis as Kimi K2.5’s Agent Swarm from a different angle: more parallel agents, better results.
* Paper [https://arxiv.org/abs/2602.04634]

Agent-Omit: Adaptive Thought and Observation Omission

* RL method to skip unnecessary thoughts/observations. 30% latency reduction without accuracy loss.
* ⚡ My take: Key unlock for production agents in resource-constrained environments (mobile, edge).
* Paper [https://arxiv.org/abs/2602.04284]

Communication Methods in Multi-Agent RL — Survey

* Survey of 29 papers on coordination: attention- and graph-based methods dominate, and implicit communication is seeing renewed interest for decentralized scalability.
* ⚡ My take: Useful map of the multi-agent communication landscape for choosing a coordination strategy.
* Paper [https://arxiv.org/abs/2601.12886]

Team of Rivals: Orchestrating Reliable AI Agents

* Framework using competing agents for reliability via disagreement signals rather than consensus.
* ⚡ My take: Using agent disagreement as a reliability signal is an interesting framing. Worth watching.
* Paper [https://arxiv.org/abs/2601.14351]
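
One way to picture "disagreement as a signal" is a check that escalates when independent agents split instead of forcing a majority vote. The sketch below is my own illustration, not the paper's framework: `disagreement_score`, `decide`, and the 0.34 threshold are all hypothetical.

```python
from collections import Counter


def disagreement_score(answers: list[str]) -> tuple[str, float]:
    """Return the most common answer and the fraction of agents that differ."""
    if not answers:
        raise ValueError("need at least one answer")
    top_answer, top_count = Counter(answers).most_common(1)[0]
    return top_answer, 1 - top_count / len(answers)


def decide(answers: list[str], threshold: float = 0.34) -> dict:
    """Accept when agents mostly agree; escalate when they split."""
    answer, score = disagreement_score(answers)
    if score > threshold:
        # High disagreement is treated as a warning, not averaged away.
        return {"action": "escalate", "answer": None, "disagreement": score}
    return {"action": "accept", "answer": answer, "disagreement": score}


agreeing = decide(["42", "42", "42"])
split = decide(["42", "41", "40"])
print(agreeing["action"], split["action"])
```

The design choice is that a split never produces an answer at all; it hands the case to a human or a stronger check, which is where the reliability gain would come from.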

Agentic AI in Healthcare: A Seven-Dimensional Taxonomy

* Evaluation taxonomy for LLM agents in medical contexts across autonomy, reliability, ethical alignment, etc.
* ⚡ My take: Critical framework before anyone deploys agents in clinical settings.
* Paper [https://arxiv.org/abs/2602.04813]

Insight Agents: Multi-Agent System for Data Insights (Amazon)

* Multi-agent architecture for automated data analysis and insight generation.
* Paper [https://arxiv.org/abs/2601.20048]

VibeTensor — LLM Agents Write a Full Deep Learning Stack

* NVLabs’ open-source DL stack fully generated by coding agents: PyTorch-style tensor library, C++20/CUDA core, autograd engine, caching allocator.
* ⚡ My take: Landmark for “vibe coding” at the systems level. Human role shifts from writing code to architectural guidance.
* Paper [https://arxiv.org/abs/2601.16238]

Self-Improving Pretraining (Meta FAIR)

* Replaces next-token prediction with sequence-level generation guided by a post-trained model as rewriter/judge. Addresses quality, safety, and factuality at pretraining time.
* ⚡ My take: Moving safety upstream into pretraining is a significant philosophical shift. Could reduce reliance on expensive RLHF.
* Paper [https://arxiv.org/abs/2601.21343]

Reinforcement Learning via Self-Distillation (SDPO)

* On-policy RL converting textual feedback into dense credit assignment without an external teacher. Outperforms GRPO on reasoning, tool use, and competitive programming.
* ⚡ My take: No reward model, no teacher, just learning from its own failures. Very relevant for agent tool-use.
* Paper [https://arxiv.org/abs/2601.20802]

Visual Personalization Turing Test (VPTT)

* New eval: models must produce images/videos/3D assets indistinguishable from what a specific person might create.
* Paper [https://huggingface.co/papers/2601.22680]

Causal World Modeling for Robot Control (LingBot-VA)

* Combines video world modeling with vision-language pre-training for robot learning.
* Paper [https://huggingface.co/papers/2601.21998]

Apple Acquires Q.ai for ~$2 Billion

* Second-largest Apple acquisition ever. Q.ai specializes in whispered speech, noisy-environment audio, and facial micro-movement detection. Patents point to headphones/glasses enabling silent Siri conversations.
* CEO Aviad Maizels previously founded PrimeSense (acquired by Apple in 2013, used to build Face ID).
* ⚡ My take: Apple is betting $2B on AI-native wearables vs. Meta and OpenAI. Non-verbal Siri through smart glasses would be a genuine differentiator.
* TechCrunch [https://techcrunch.com/2026/01/29/apple-buys-israeli-startup-q-ai-as-the-ai-race-heats-up/] | MacRumors [https://www.macrumors.com/2026/02/03/apple-second-biggest-acquisition/] | Engadget [https://www.engadget.com/big-tech/apple-acquires-qai-for-a-reported-2-billion-190017949.html]
