AI escapes the chat window


OpenAI puts Codex directly inside Chrome

OpenAI launched a Codex extension for Chrome on Mac and Windows. It runs across signed-in tabs in parallel, uses your real browser sessions to test web apps and reads DevTools, all while you keep using the browser normally.

The details:
Tab-native: Codex runs across multiple Chrome tabs in the background, with task threads grouped so each piece of work stays organized.
Real session access: It uses your already-signed-in browser sessions to test web apps, read DevTools and pull context across pages.
Adoption curve: Codex now has 4M weekly active users, up 8x since January.
No takeover: Unlike Operator or computer-use sessions, Codex doesn't commandeer the browser while it works.

This is the second big "agents in Chrome" play after Anthropic's. The IDE is fine for writing code. Chrome is where everything else lives.

The biggest question here is whether you're willing to trade some privacy for the benefits it delivers. Giving OpenAI access to your tabs and logged-in accounts might have sounded crazy a year ago, but now a lot of people are doing it. What do you think about this behavioral shift?

Special Highlight from our network:

Fast voice AI isn’t enough

Your voice agent responds quickly in a demo. Production is where it gets tested. ⚡ Real users interrupt, switch languages mid-sentence and speak with accents your staging environment never saw. That’s when latency without accuracy becomes a problem users feel instantly. Most voice AI benchmarks measure only speed to first response.

Pipecat’s new benchmark measures what actually matters: end-of-speech latency plus transcription accuracy. Speechmatics leads with a pooled word error rate (WER) of just 1.07% across 55+ languages, while maintaining consistent latency under real conversational pressure. Fast is easy. Fast and accurate is what ships.
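Word error rate is the number of word substitutions, deletions and insertions needed to turn a transcript into the reference text, divided by the number of reference words. "Pooled" most likely means that errors and reference words are summed across the whole test set before dividing, rather than averaging per-language scores; that reading is an assumption on our part. The Python below is a minimal illustration of the metric, not Pipecat's benchmark code, and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def word_edits(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edits needed to turn the reference words processed so far
    # into the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word dropped
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[-1]


def pooled_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Sum edits and reference words over all pairs, then divide once."""
    edits = words = 0
    for reference, hypothesis in pairs:
        edits += word_edits(reference, hypothesis)
        words += len(reference.split())
    return edits / words if words else 0.0


def mean_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average of per-pair WERs, for comparison (skips empty references)."""
    rates = [word_edits(r, h) / len(r.split()) for r, h in pairs if r.split()]
    return sum(rates) / len(rates) if rates else 0.0


if __name__ == "__main__":
    pairs = [
        ("the cat sat on the mat", "the cat sat on mat"),  # one deletion
        ("hello world", "hello word"),                    # one substitution
    ]
    print(pooled_wer(pairs))  # 0.25
    print(mean_wer(pairs))
```

With these two sample pairs, pooled WER is 2/8 = 0.25, while averaging per-utterance rates gives about 0.33, because the short second utterance counts as much as the long first one. How scores are aggregated matters when comparing vendors across languages with very different amounts of test audio.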

Get building with $200 in free credits.

Claude lands across Microsoft 365 ahead of Microsoft's own rollout

Anthropic moved Claude for Excel, PowerPoint and Word from preview into general availability on paid plans. Claude for Outlook is now in public beta, and conversations now carry context across all four apps.

The details:
Cross-app memory: A single thread can move from email to spreadsheet to deck while keeping all prior context intact.
Admin controls: Both AppSource listings deploy from the Microsoft admin center with the IT controls enterprise buyers expect.
Outlook scope: The Outlook beta is open to Pro, Max, Team and Enterprise plans on Mac and Windows.
Order of operations: Anthropic shipped Claude inside Microsoft's apps in January. Microsoft launched its own Copilot Cowork two months later.
Why it matters: An outside lab beat the platform owner to its own platform.

Microsoft eventually shipped Copilot Cowork (also built on Claude, by the way), but Anthropic got there first. Today's GA push locks in that head start before Microsoft's $99-per-seat M365 E7 Frontier Suite rolls out broadly. The Claude-in-everything strategy is now official.

Special Highlight from our network:

Why Enterprise AI Still Fails

Most companies are investing heavily in AI, yet measurable business impact remains limited. Enterprise data is still fragmented across disconnected systems, making it difficult for AI agents to maintain reliable context and operate consistently at scale. On May 18 at 10 AM ET, Ravi Marwaha, Chief Operating Officer and Chief Product & Technology Officer at Arango, joins Steve Nouri, CEO of GenAI.Works, to unpack why AI projects stall between prototype and production. Drawing on leadership roles at JPMorgan, SAP and Informatica, Ravi will share how trusted business context can support compliance, operational response, dynamic pricing and enterprise decision-making across complex environments.

Join here.

Apple's camera AirPods reach late-stage testing

Bloomberg reports that Apple's AirPods with built-in cameras are in design validation testing, one of the last phases before mass production. The cameras don't take photos or video. They give Siri eyes.

The details:
Use cases: Identify objects, give navigation help, surface environment-aware reminders and run a conversational Siri that knows what you're looking at.
Form factor: Prototypes look like AirPods Pro with slightly longer stems to fit the camera hardware.
Privacy hardware: A dedicated LED reportedly indicates when visual data is being processed.
Internal use: Apple employees are already wearing prototypes around the office.

This is Apple's first wearable built specifically for the AI era, and the positioning is clever. Camera glasses still feel weird in public. Earbuds don't. Apple is sneaking a camera into your ears in a form factor people already accept. But none of it will work if the upgraded AI Siri underperforms again.

Building hardware before the software is ready is a gamble Apple almost got away with before, until it got fined for misleading advertising about Apple Intelligence.

Anthropic donates Petri to open source for AI safety audits

Anthropic released Petri, its internal alignment auditing tool, to the open-source community. The framework deploys an AI agent to probe target models through multi-turn conversations with simulated users and tools, surfacing misaligned behaviors that human red-teaming would take weeks to find. It's free on GitHub.

The details:
What it tests for: Autonomous deception, oversight subversion, whistleblowing and cooperation with human misuse.
Scale: Anthropic ran Petri on 14 frontier models with 111 seed instructions and elicited a wide spread of misaligned behaviors.
Adoption: MATS scholars, Anthropic Fellows and the UK AI Safety Institute are already using it.
Where to find it: GitHub at safety-research/petri, with Petri 2.0 already adding eval-awareness mitigations.

There is a charitable read of this and a critical one. The charitable read: most external AI safety work today is qualitative, Petri brings repeatable testing to a field that needs it, and getting peer labs to run the same audits Anthropic runs internally is useful for everyone. The critical read is harder to dismiss.

Open-sourcing internal safety tools has quietly become one of the cheapest ways for AI labs to launder credibility. The cost to Anthropic is near zero. The benefits are real: a positive news cycle, the right to set the framework everyone else gets judged against. Whoever sets the safety standard gets to tell governments what "safe enough" looks like.

Both reads can be true at the same time. The test is whether Petri changes any actual model behavior at any actual lab, or whether it mostly changes how labs talk about themselves.

Tool of the Day: Arcade

Arcade is a production MCP runtime that gives AI agents the ability to take real actions across the services your team already uses. It handles OAuth, keeps tokens server-side and exposes 7,000+ pre-built integrations (Gmail, Slack, GitHub, Notion, Salesforce, Reddit and more), so agents can act as you, with your exact permissions, without the language model ever touching a credential.
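To make the "tokens stay server-side" idea concrete, here is a minimal Python sketch of the general pattern: the model only emits a tool name and arguments, and a server-side runtime looks up the user's stored OAuth token and makes the call. Everything in it (`ToolRuntime`, `fake_send_message`, the tool name `slack.send_message`) is hypothetical and is not Arcade's SDK or API; real OAuth flows, token refresh and HTTP calls are stubbed out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple

# A tool implementation receives the access token and the model's arguments.
ToolFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


@dataclass
class ToolRuntime:
    """Hypothetical server-side runtime; NOT Arcade's actual SDK."""
    # user_id -> service name -> access token (never sent to the model)
    _tokens: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # tool name -> (service the tool needs, implementation)
    _tools: Dict[str, Tuple[str, ToolFn]] = field(default_factory=dict)

    def store_token(self, user_id: str, service: str, token: str) -> None:
        # In a real system this token would come from a completed OAuth flow.
        self._tokens.setdefault(user_id, {})[service] = token

    def register(self, name: str, service: str, fn: ToolFn) -> None:
        self._tools[name] = (service, fn)

    def execute(self, user_id: str, tool_call: Dict[str, Any]) -> Dict[str, Any]:
        """Run a model-issued call shaped like {"name": ..., "arguments": {...}}."""
        service, fn = self._tools[tool_call["name"]]
        token = self._tokens.get(user_id, {}).get(service)
        if token is None:
            # Ask the user to authorize; the model is never asked for a secret.
            return {"status": "auth_required", "service": service}
        result = fn(token, tool_call.get("arguments", {}))
        # Only the tool's result goes back to the model, never the token.
        return {"status": "ok", "result": result}


def fake_send_message(token: str, args: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for an authenticated API call made with `token`.
    if not token:
        raise PermissionError("missing token")
    return {"channel": args["channel"], "delivered": True}


if __name__ == "__main__":
    runtime = ToolRuntime()
    runtime.register("slack.send_message", "slack", fake_send_message)
    call = {"name": "slack.send_message",
            "arguments": {"channel": "#general", "text": "hi"}}
    print(runtime.execute("alice", call))
    # {'status': 'auth_required', 'service': 'slack'}
    runtime.store_token("alice", "slack", "secret-token")
    print(runtime.execute("alice", call))
    # {'status': 'ok', 'result': {'channel': '#general', 'delivered': True}}
```

Keeping the token lookup inside `execute` means the model's context only ever holds tool names, arguments and results, which is the property the Arcade description emphasizes.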

Try this yourself:
Sign up at arcade.dev and create a developer account.
Open Arcade Chat at arcade.dev/apps/arcade-chat to test agentic actions across your real connected accounts before writing any code.
Browse the Arcade Registry to see which pre-built tools cover the services you actually use.
Install the Arcade SDK and wire it into whichever agent framework you already work in.

Docs live at docs.arcade.dev. For custom internal APIs, register them through the SDK and let Arcade handle the OAuth layer for you.

Content and marketing professionals are expected to produce more without working more hours. If that's your reality, these three GenAI Academy courses show you how AI makes it possible.

AI Content & Monetization Engine: Build a content system that grows your audience and generates revenue. Last call, 20% off.
Stop AI Slop: Train AI to write in your exact voice. Last call, 20% off.
AI-Powered Marketing Growth Engine: One marketer, team-level output. Starting June 23.

Light Bytes

GPT-Realtime-2 ships with GPT-5-class reasoning: OpenAI's new voice model bumps context from 32K to 128K tokens and arrives alongside GPT-Realtime-Translate and GPT-Realtime-Whisper for live multilingual voice apps.
Claude Managed Agents add dreaming: Anthropic launched a research preview where agents review past sessions to extract patterns, plus public betas for outcomes-based grading, multiagent orchestration and webhooks.
Cloudflare lays off 1,100 for the "agentic AI era": That's 20% of headcount, delivered by email, at a company that has had six major outages in the past year.
EU bans nudifier apps, delays everything else: Brussels finalized a deal banning AI tools that generate sexualized deepfakes while pushing high-risk AI Act provisions back to December 2027 and August 2028.
Brian Chesky on who AI replaces: The Airbnb CEO said "pure people managers" and workers who refuse to adapt are most at risk, with future leaders acting as player-coaches rather than coordinators.
