Issue 06: The 52× Signal — Anthropic's Self-Improvement Data, 1M Contexts in VS Code, and Five Days of Shipping

There is a number buried in Anthropic's new institute report that engineers should sit with: 52×. That is the speedup Claude Mythos Preview achieved when optimizing a small AI training script, measured against the baseline code. A human researcher given four to eight hours achieves about 4×. Whether that constitutes recursive self-improvement in any meaningful sense is debated. What is not debated is that the tooling around it shipped fast this week. VS Code 1.123 pushed 1M-token context windows to stable. GPT-5.5 went GA on Amazon Bedrock. LangGraph added v3 streaming primitives. The Linux Foundation published an open agent discovery standard built on DNS. The pace is the point.

Anthropic publishes internal data on AI-accelerated development

Anthropic's Institute released a report titled "When AI builds itself" on June 3, documenting two years of internal data on how Claude's own capabilities have changed development at Anthropic. The headline figure: more than 80% of code merged into Anthropic's codebase is now written by Claude, up from single digits before Claude Code launched in research preview in February 2025. In Q2 2026, the typical engineer is merging eight times the code volume they did across the 2021–2025 baseline period [1].

The report includes three charts Anthropic released publicly.

Figure: Code contributed per person per quarter at Anthropic, Q2 2021 to Q2 2026, with model release markers [1].

On a separate benchmark, optimizing the speed of a small AI training script, Claude Opus 4 averaged roughly a 3× improvement as of May 2025. Claude Mythos Preview (April 2026) reached ~52×, while a skilled human researcher taking four to eight hours achieves ~4× [1]. Claude Code's session success rate on open-ended problems also rose, reaching 76% as of May 2026, up from roughly 25% six months earlier.
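To put the three figures on one scale, a quick calculation helps. The sketch below uses only the speedups quoted above (3×, 4×, 52×); the dictionary keys and helper name are illustrative, and the ratios say nothing about how the benchmark itself was run.

```python
# Speedups over the baseline training script, as quoted in the report.
speedups = {
    "Claude Opus 4 (May 2025)": 3.0,
    "Human researcher (4-8 hours)": 4.0,
    "Claude Mythos Preview (April 2026)": 52.0,
}


def relative_gain(a: str, b: str) -> float:
    """Return how many times larger the speedup of `a` is than that of `b`."""
    return speedups[a] / speedups[b]


mythos = "Claude Mythos Preview (April 2026)"
vs_human = relative_gain(mythos, "Human researcher (4-8 hours)")
vs_opus4 = relative_gain(mythos, "Claude Opus 4 (May 2025)")

print(f"Mythos vs human researcher: {vs_human:.1f}x")  # 13.0x
print(f"Mythos vs Opus 4: {vs_opus4:.1f}x")            # 17.3x
```

On these numbers, the Mythos result is 13 times the human figure and roughly 17 times the Opus 4 figure from eleven months earlier.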

The reason to read this report carefully is not the benchmarks. It is the institutional framing. Anthropic argues that full recursive self-improvement (models autonomously improving their own successors without human checkpoints) would create risks that governments and most organizations are not currently positioned to manage. The report calls for coordination mechanisms between labs and notes that the window to build those mechanisms is closing faster than anticipated [2].

For product engineers, the proximate implication is narrower: the rate at which Claude-class models improve tooling-level tasks is now fast enough that any capability evaluation you did nine months ago should be treated as stale.

VS Code 1.123: 1M context windows and session sync

VS Code 1.123 shipped June 3 with two features that address real friction in agentic development workflows [3].

1M-token context windows are now supported for compatible Anthropic and OpenAI models, including Claude Opus 4.7 and GPT-5.5. The setting is available when using those models in agent sessions. The tradeoff is stated clearly in the release notes: larger context windows increase AI Credits consumption under usage-based billing, which has already been a pain point since Copilot switched billing models on June 1 [4]. How much this actually costs will depend on how aggressively teams enable it.

Session sync ties chat sessions to your GitHub account, making them searchable and available across machines. Each session captures the conversation, files touched, repository context, and any PRs or issues referenced. The /chronicle command surfaces that history: standup summaries, productivity patterns, codebase searches by topic or file. For distributed teams, this changes how new engineers are onboarded and how context carries over between sessions.

The release also adds a multi-session Agents window (open sessions side by side) and a read-only /research agent in preview — currently Insiders-only for Copilot CLI local sessions — that generates cited Markdown reports from codebase, GitHub, and web sources.

Release page: microsoft/vscode, https://github.com/microsoft/vscode/releases/tag/1.123.0

GPT-5.5 and GPT-5.4 go GA on Amazon Bedrock

OpenAI's two frontier models are now generally available on Amazon Bedrock, accessed via Bedrock's new Responses API (not the standard Chat Completions API) [5]. The key facts:

- GPT-5.5 is available in US East (Ohio) only at launch
- GPT-5.4 is available in US East, US West (Oregon), and AWS GovCloud (US-West); the GovCloud region was added June 3
- Billing is pay-per-token, with no seat license or per-developer commitment
- The Responses API supports multi-turn state management, hosted and function tool calling, and background long-running tasks
- OpenAI Codex (the CLI coding agent) runs against GPT-5.5 on Bedrock

The integration uses standard OpenAI SDK authentication with OPENAI_BASE_URL pointed to Bedrock's inference endpoint. The Responses API is also getting new moderation scoring as of June 4: pass a moderation object in the request and get per-input / per-output safety scores in the same response without a separate API call [6].
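As a rough illustration of that setup, the sketch below builds (but does not send) a Responses-style request using only the standard library. The base URL and model identifier are placeholders, not real Bedrock values, and the Bearer header is an assumption based on how the OpenAI SDK normally authenticates; take the actual endpoint, model ID, and credentials from the Bedrock documentation. The moderation object is left out because its shape is not described here.

```python
import json
import os
import urllib.request

# Placeholders only: neither value is a real Bedrock endpoint or model ID.
DEFAULT_BASE_URL = "https://bedrock.example.invalid/v1"  # hypothetical
DEFAULT_MODEL = "gpt-5.5"  # hypothetical model identifier


def build_responses_request(prompt: str, model: str = DEFAULT_MODEL) -> urllib.request.Request:
    """Build, without sending, a POST to <OPENAI_BASE_URL>/responses."""
    base_url = os.environ.get("OPENAI_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    api_key = os.environ.get("OPENAI_API_KEY", "")
    # Minimal Responses API body: a model name and the input text.
    body = json.dumps({"model": model, "input": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/responses",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


req = build_responses_request("Summarize this week's release notes.")
print(req.get_method(), req.full_url)
```

With the openai Python package installed, the same request would typically be made through `OpenAI().responses.create(model=..., input=...)`, since the client reads OPENAI_BASE_URL and OPENAI_API_KEY from the environment.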

For teams running AI workloads inside existing AWS accounts, this removes the main friction points: there is no new vendor relationship and no separate API key management, data stays in the chosen Bedrock region, and billing consolidates into existing AWS invoicing. Whether GPT-5.5's single-region availability in Ohio causes latency issues for non-US teams is worth testing before committing.

LangGraph 1.2.3 and 1.2.4: v3 streaming goes stable

LangGraph shipped 1.2.3 on June 1 and 1.2.4 on June 2, consolidating a streaming overhaul that started with the SDK 0.4.0 release two weeks earlier [7].

The substantive changes:

- v3 streaming primitives (SSE transport) are now in the Python SDK, alongside WebSocket streaming as an alternative transport
- RemoteGraph (the client-side handle for deployed LangGraph agents) now supports v3 streaming fully, enabling message-level and tool-call-level projections from remote graphs
- Scoped subgraph handles let subgraph outputs be subscribed to independently, which is useful when running fan-out agent topologies and needing per-branch status
- lc_agent_name propagation through tool-dispatched subagents means named routing now survives across agent boundaries
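For readers who have not worked with SSE directly, the sketch below shows the generic Server-Sent Events framing and how a consumer might filter frames by event type. It is not LangGraph's API or wire format: the event names `messages` and `tools` and the JSON payloads are invented for illustration, and only the `event:` / `data:` / blank-line framing comes from the SSE standard.

```python
from typing import Dict, Iterable, Iterator, List


def _field_value(line: str, prefix: str) -> str:
    """Return the text after `prefix`, dropping one optional leading space."""
    value = line[len(prefix):]
    return value[1:] if value.startswith(" ") else value


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one {'event': ..., 'data': ...} dict per SSE frame."""
    event = "message"  # default event type when no event: field is sent
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates the current frame
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):  # comment line, often used as keep-alive
            continue
        elif line.startswith("event:"):
            event = _field_value(line, "event:")
        elif line.startswith("data:"):
            data.append(_field_value(line, "data:"))


# Hypothetical stream with two event types.
stream = [
    "event: messages\n",
    'data: {"text": "Hello"}\n',
    "\n",
    ": keep-alive\n",
    "event: tools\n",
    'data: {"name": "search"}\n',
    "\n",
]
frames = list(parse_sse(stream))
tool_frames = [f for f in frames if f["event"] == "tools"]
print(tool_frames)
```

Filtering on the event field is the general idea behind subscribing to one kind of update (messages, tool calls, a single branch) without processing the rest of the stream.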

LangChain 1.3.3 (June 2) bumped its LangGraph dependency to 1.2.4 and added HITL (Human-in-the-Loop) rejection guidance fixes [8]. LangChain-core 1.4.1 (June 5) removed a Bedrock pre-validation step from the load path, which is relevant to teams using LangChain with the new Bedrock OpenAI endpoint.

If you are running LangGraph in production with deployed remote graphs, upgrading to 1.2.4 is worth doing — the v3 streaming path is now the stable default, and the old v2 SSE path will eventually be deprecated.

Release page: langchain-ai/langgraph, https://github.com/langchain-ai/langgraph/releases/tag/langgraph==1.2.4

DNS-AID: agent discovery via DNS, under the Linux Foundation

A new open project called DNS-AID launched June 1 under Linux Foundation governance, with initial code developed by Infoblox and backing from Cloudflare, GoDaddy, Equinix, Indeed, and others [9].

The premise: the fragmented way AI agents currently discover each other (vendor-specific registries, manual endpoint configuration, hardcoded URLs) does not scale to cross-organization collaboration. DNS-AID proposes layering a naming convention on top of DNS mechanisms that already exist: SVCB records (RFC 9460), TLSA records (RFC 6698), TXT records, and DNSSEC. An agent gets a record like _chatbot._mcp._agents.example.com that encodes its protocol, port, capability document, and metadata. DNSSEC signs the records; DANE binds TLS certificates to them. A discovering agent validates the chain cryptographically before connecting.
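The naming convention can be sketched in a few lines. The pattern used here, `_<agent>._<protocol>._agents.<domain>`, is inferred from the single example above rather than taken from the DNS-AID specification, and the helper names are hypothetical; the project's own SDK may structure this differently. No DNS lookup or DNSSEC validation is performed.

```python
from typing import NamedTuple


class AgentName(NamedTuple):
    agent: str
    protocol: str
    domain: str


def build_agent_fqdn(agent: str, protocol: str, domain: str) -> str:
    """Compose a record name following the pattern inferred from the example."""
    return f"_{agent}._{protocol}._agents.{domain}".lower()


def parse_agent_fqdn(fqdn: str) -> AgentName:
    """Split a record name back into agent, protocol, and domain."""
    labels = fqdn.rstrip(".").lower().split(".")
    # Expect two underscore-prefixed labels, the _agents marker, then the domain.
    if len(labels) < 4 or labels[2] != "_agents":
        raise ValueError(f"not an agent record name: {fqdn!r}")
    if not (labels[0].startswith("_") and labels[1].startswith("_")):
        raise ValueError(f"agent and protocol labels must start with '_': {fqdn!r}")
    return AgentName(labels[0][1:], labels[1][1:], ".".join(labels[3:]))


name = build_agent_fqdn("chatbot", "mcp", "example.com")
print(name)                    # _chatbot._mcp._agents.example.com
print(parse_agent_fqdn(name))
```

A real client would then query that name, validate the DNSSEC chain, and check the TLS certificate against the TLSA record before connecting.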

Discovery works three ways: direct lookup by name, search by capability type, or crawl of a domain's agent index. The reference implementation ships a Python SDK, a CLI, an MCP server, and eight DNS backends: Route 53, Cloudflare, Infoblox NIOS and UDDI, Azure DNS, Google Cloud DNS, NS1, and self-hosted BIND9. A Docker Compose setup with local BIND9 lets you test without an external DNS provider.

This is worth watching because the protocol layer problem it addresses is real. MCP solved tool connectivity. A2A addressed agent-to-agent handoffs. DNS-AID proposes to solve agent findability — the step before either of those. The Linux Foundation governance is significant for enterprise IT teams who need a reason to add DNS records they will own and operate for years.

https://github.com/dns-aid

What to watch

Claude Mythos GA timing: Polymarket puts the probability of a public Mythos release by June 30 at 17%, and July at 65% [10]. The Glasswing stress-test expanded to 150+ organizations in June. If and when it goes public, the Responses API pricing and context window will be the engineering-level story.

Copilot credit consumption data: Business and Enterprise promo credits run through September. The first real signal on whether teams will hit per-seat cost surprises will surface in July usage dashboards. Copilot billing shock is already generating complaints in VS Code's GitHub discussion threads — the move to usage-based billing is live, and some teams are burning through credits faster than expected.

Pydantic AI v2.0.0 GA: The beta track is progressing. The main migration blocker teams are watching is the prepare_tool_definition callback signature change introduced in the beta. No confirmed GA date yet.

LangGraph v3 streaming adoption curve: The Python SDK is shipping v3 as stable, but most deployed LangGraph Cloud instances are still running v2. The migration window is open; the old path is not yet deprecated, but the gap will eventually matter for teams dependent on WebSocket-based status streaming.

References