AI Frontier Daily · May 29, 2026

Your daily briefing on global AI trends — stay informed in 5 minutes.

Model Releases / Updates

1. Claude Opus 4.8 Released: Comprehensive Upgrades in Coding, Agent Skills, and Reasoning
Source: Anthropic Newsroom

Anthropic has released Claude Opus 4.8, the successor to Opus 4.7, with improvements across benchmarks for coding, agent skills, reasoning, and practical knowledge work. Features launched alongside it include user-controlled task effort levels, a new “dynamic workflows” capability in Claude Code, and a 2.5x speed mode for Opus 4.8 priced at one-third of the previous cost. The model scored 84% on the Online-Mind2Web benchmark, surpassing Opus 4.7 and GPT-5.5, and its rate of missed code errors fell by approximately 75%.

2. Grok Build 0.2.7 Released — New Usage Queries and Shared Terminal
Source: xAI News

Grok Build has been updated to version 0.2.7, adding a /usage query command, /login support, a terminal shared across sub-agents, and improved image understanding. xAI continues to iterate on Grok Build to improve the AI-assisted coding experience.

3. Qwen3.7-Max Tops OpenRouter’s Popular LLM Leaderboard
Source: Alibaba Cloud

Qwen3.7-Max topped OpenRouter’s popular LLM leaderboard with 77.3B tokens of usage. This is the first time a Chinese open-source model has proven itself through real-world usage; developers building applications can seriously consider integrating it into production environments.

4. StepFun Open-Sources Step 3.7 Flash — 198B MoE Agent Model
Source: StepFun

StepFun has released the open-source LLM Step 3.7 Flash, focused on agent workflow efficiency. The model is a 198B-parameter MoE with 11B active parameters. It ranks first on ClawEval-1.1 (67.1 points) and SimpleVQA Search (79.2 points), and its τ²-bench tool use score exceeds 98%. It supports a 256K context window and multimodal understanding, can run locally on a Mac Studio M4 Max, and is compatible with Claude Code and MCP. The weights are open-sourced under Apache 2.0.

5. Google Nano Banana Pro Image Generation Model Officially Released
Source: Google AI Developers

Google has launched Nano Banana Pro (gemini-3-pro-image) and Nano Banana 2 (gemini-3.1-flash-image), both now available for production use through the Gemini API. Developers now have stable access to image generation APIs at both the Pro and Flash tiers.

6. NVIDIA Open-Sources Polar Framework: Codex Soars 594% on SWE-Bench
Source: IT之家

NVIDIA’s research team has open-sourced Polar, an agent reinforcement learning framework. The framework enables GRPO training by placing agents at the model API boundary. Using Qwen3.5-4B as the base model, Polar raised Codex’s pass@1 score on SWE-Bench Verified from 3.8% to 26.4%, which is nearly 7 times the baseline and a relative increase of about 594%.
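The two pass@1 figures quoted above imply both a multiplier and a relative percentage increase. A minimal sketch of that arithmetic, using only those two numbers (the function name `relative_gain` is illustrative and not part of Polar):

```python
def relative_gain(before: float, after: float) -> tuple[float, float]:
    """Return (multiplier, percent increase) for a change from `before` to `after`."""
    multiplier = after / before
    percent_increase = (multiplier - 1.0) * 100.0
    return multiplier, percent_increase


# pass@1 on SWE-Bench Verified, as quoted in the item above
multiplier, percent_increase = relative_gain(3.8, 26.4)
print(f"{multiplier:.2f}x the baseline, a {percent_increase:.1f}% relative increase")
# prints: 6.95x the baseline, a 594.7% relative increase
```

The multiplier and the percentage differ by one baseline: a result that is 6.95 times the starting score is a 594.7% increase over it, not a 694.7% increase.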

Product Releases / Updates

7. Claude Code Introduces “Dynamic Workflows” Feature
Source: Claude Devs

Claude Code has introduced a “Dynamic Workflows” feature that lets Claude handle complex tasks end-to-end. The feature dynamically writes scripts that run tens to hundreds of sub-agents in parallel within a single session, which suits cross-codebase bug finding, large-scale migrations (e.g., porting Bun from Zig to Rust), and similar work. It is now available to Max, Team, and Enterprise users.

8. Alibaba Cloud Open-Sources Bailian CLI — One-Stop Tool for Agent Development
Source: Alibaba Cloud

Alibaba Cloud has packaged the full set of AI capabilities from the Bailian platform into a CLI tool. Agent developers no longer need to integrate APIs one by one — they can call the full range of models and application capabilities. Teams building enterprise AI assistants should take note.

9. Perplexity Computer Lands in Microsoft Office Suite
Source: Perplexity

Perplexity Computer is now available in Excel, Word, PowerPoint, and Outlook. Users can draft documents, build models, create presentations, and process emails directly from the sidebar using Computer.

10. Sesame Releases iOS App — Conversational AI from Oculus Founder
Source: TechCrunch: AI (RSS)

Sesame, an AI startup co-founded by an Oculus co-founder, has released its iOS app. The app offers a more natural back-and-forth interaction designed to feel more like talking to a real person than to a traditional chatbot.

11. Mistral AI Releases Search Toolkit — Open-Source Search Pipeline Framework
Source: Mistral AI News

Mistral AI has released a public preview of Search Toolkit, integrating data ingestion, retrieval, and evaluation tools into a single open-source framework. It supports cloud, local, and edge deployment, suitable for enterprise search and RAG scenarios.

12. Google Pay MCP Server Launched
Source: Google Developers Blog

Google has launched the Google Pay & Wallet Developer MCP server, securely connecting AI development assistants and IDEs to real-time API and account context.

13. MiniMax M2.7 Available for Free Agentic Coding on OpenHands
Source: MiniMax (official)

MiniMax has partnered with OpenHands to offer free agentic coding services based on MiniMax M2.7 for a limited time.

Industry News

14. Anthropic Completes $65B Series H Funding, Valued at $965 Billion
Source: Anthropic Newsroom

Anthropic announced the completion of a $65 billion Series H funding round led by Altimeter Capital and others, with a post-money valuation of $965 billion and annualized revenue exceeding $47 billion. Claude is now available on AWS, Google Cloud, and Microsoft Azure.

15. Apple Is Trying to Fit Large Gemini Models Into iPhone to Power New Siri
Source: Ars Technica

Apple is attempting to integrate large Gemini models into the iPhone to support a new Siri experience. Because of the models’ size, a cloud component will likely be unavoidable.

16. DeepSeek Plans to Pursue STAR Market IPO After ~$50B Funding Round
Source: X.PIN

Sources indicate DeepSeek plans to immediately apply for a STAR Market (A-share) IPO after completing its current approximately $50 billion funding round.

17. SGLang + AMD MI355X Achieves DeepSeek-R1 Inference Costs Lower Than NVIDIA
Source: LMSYS Blog

Through full-stack optimization, SGLang and AMD have brought the cost of running DeepSeek-R1 on AMD Instinct MI355X GPUs down to $0.169 per million tokens. That is 5% lower than the NVIDIA B200 solution, with 1.25x higher throughput per GPU.
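A rough sketch of what the 5% figure implies for the comparison point. It assumes "5% lower" means the MI355X cost is 95% of the B200 cost per million tokens. That reading and the variable names are assumptions, and the source does not state the B200 price directly:

```python
# Figures quoted in the item above
amd_cost_per_million = 0.169   # USD per million tokens on MI355X
reduction_vs_b200 = 0.05       # "5% lower" than the B200 solution

# Assumption: amd_cost = b200_cost * (1 - reduction)
implied_b200_cost = amd_cost_per_million / (1.0 - reduction_vs_b200)
saving_per_million = implied_b200_cost - amd_cost_per_million

print(f"Implied B200 cost: ${implied_b200_cost:.3f} per million tokens")
print(f"Implied saving:    ${saving_per_million:.4f} per million tokens")
```

Under that assumption the implied B200 cost comes out near $0.178 per million tokens, so the absolute saving is under one cent per million tokens.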

18. OpenAI Releases Frontier Governance Framework
Source: OpenAI Official Updates

OpenAI has released a “Frontier Governance Framework,” detailing how its AI safety and risk management practices align with new EU and California regulations — an important step in addressing global AI regulation.

19. Google I/O 2026: A Snapshot of 12 Key Moments
Source: Google Blog

Google has published a recap of 12 key moments from I/O 2026, covering the latest news on Gemini Omni, Gemini 3.5 Flash, and other products.

20. Anthropic Opens Milan Office — Its Sixth in Europe
Source: Anthropic Newsroom

Anthropic has opened its sixth European office in Milan, partnering with companies including JAKALA and Satispay. Satispay used Claude to compress an 18-month roadmap into 7 months.

Research Papers

21. hexoai Open-Sources SIA Framework: AI Agents Achieving Recursive Self-Improvement
Source: Rohan Paul

hexoai has open-sourced the SIA (Self-Improving AI) framework. Agents can not only optimize external workflows but also directly update their own model weights through task feedback. SIA achieves a 56.6% improvement on LawBench, 91.9% reduction in GPU kernel time, and a 502% improvement on single-cell RNA denoising tasks.
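A percentage reduction in run time can also be read as a speedup factor. A minimal sketch of the conversion, applied to the 91.9% kernel-time figure quoted above (the helper name `speedup_from_reduction` is illustrative and not part of SIA):

```python
def speedup_from_reduction(reduction_percent: float) -> float:
    """Convert a percentage reduction in run time into a speedup factor."""
    remaining_fraction = 1.0 - reduction_percent / 100.0
    if remaining_fraction <= 0.0:
        raise ValueError("reduction must be below 100%")
    return 1.0 / remaining_fraction


speedup = speedup_from_reduction(91.9)
print(f"A 91.9% reduction in kernel time is about a {speedup:.1f}x speedup")
# prints: A 91.9% reduction in kernel time is about a 12.3x speedup
```

The three SIA figures are on different scales: the LawBench and RNA denoising numbers are relative improvements in a score, while the kernel figure is a reduction in time, so they are not directly comparable.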

22. DenoiseRL: Learning Reasoning from Weak Model Errors
Source: HuggingFace Daily Papers

DenoiseRL is a reinforcement learning framework that learns directly through recovery-based optimization on failed reasoning trajectories produced by weak models. Experiments show it consistently outperforms on-policy RL baselines on math and general reasoning benchmarks.

Tips & Insights

23. OpenRouter Comparison Page: GPT-5.5 vs Claude Opus 4.8 Hands-On Comparison
Source: Hacker News Trending

OpenRouter has published a comparison page that moves the GPT-5.5 vs Claude Opus 4.8 evaluation from benchmarks to real-world testing environments, offering valuable reference for model selection.

24. Runway Project Luxo: AI Video Has Crossed the Uncanny Valley
Source: Runway

Through Project Luxo, Runway showcases an AI short film created by a single person in one day. According to Runway, audiences now focus on the story itself rather than on technical flaws, which it presents as a sign that AI video generation has crossed the uncanny valley.

Editor: AI Wuyai | Data Source: AI HOT (aihot.virxact.com)