4 May 2026: Pentagon Signs Military AI Contracts With Seven Tech Firms as Anthropic Steps Aside
The Pentagon has signed military AI contracts with seven tech giants while Anthropic steps aside on ethics. Plus Harvard's AI diagnostic trial results.
The US military has quietly signed contracts with seven technology companies to deploy their AI inside classified networks, with Anthropic absent after a dispute over AI ethics in combat. A Harvard study confirms an AI model diagnosed emergency patients more accurately than human doctors. And Chinese labs are releasing frontier-capable coding models at a third of Western prices, putting real pressure on infrastructure decisions every AI-powered business will face this year.
The US Department of Defense has signed agreements with seven technology companies to deploy their artificial intelligence inside classified military computer networks. Google, Microsoft, Amazon Web Services, Nvidia, OpenAI, Reflection, and SpaceX are all named in the deals, which the Pentagon says are intended to “augment warfighter decision-making in complex operational environments.” The contracts mark a significant escalation in the integration of commercial AI into active military infrastructure.
Conspicuously absent from the list is Anthropic. The omission follows a public dispute and reported legal friction between the company and the Trump administration over the ethics of deploying AI in combat environments. Anthropic has previously published principles arguing against autonomous lethal decision-making by AI systems, and the company has not publicly commented on its exclusion from the contracts.
For UK defence policy, the question of which ethical guardrails apply to allied AI systems is now a live one. UK procurement already references NATO’s responsible use of AI principles, but those principles carry no enforcement mechanism. UK defence contractors working alongside US partners will be watching how this plays out.
OpenAI’s o1 reasoning model identified the correct diagnosis in 67% of real emergency room cases, compared with 50–55% for the human triage doctors handling the same cases. The Harvard research, published last week and reported widely in the past 24 hours, examined how large language models perform across a range of medical contexts using genuine patient records from real emergency visits. On those records, at least one AI model was measurably more accurate than the clinicians who actually worked the cases.
The study does not argue that AI should replace doctors. The model was working from written case notes, not examining patients in person, and the researchers acknowledge meaningful limitations around context and physical observation. But the accuracy gap supports a growing argument for AI serving as a formal second opinion in high-pressure clinical settings, catching diagnoses that fatigue or time pressure cause trained staff to miss.
For UK readers, the NHS has been trialling AI diagnostic tools in radiology and pathology for several years. Findings like these will accelerate pressure on NICE and NHS England to set formal frameworks for where AI clinical decision support is, and is not, appropriate in frontline care. The government’s AI Opportunities Action Plan cited NHS efficiency as a priority use case at the start of this year.

Four Chinese AI laboratories released open-weights coding models within a twelve-day window in late April and early May, and benchmarking published this week suggests they match frontier-level performance at a fraction of Western prices. The models are GLM-5.1 from Z.ai, M2.7 from MiniMax, Kimi K2.6 from Moonshot, and DeepSeek V4. Analysis places all four at similar performance levels to Western frontier models on complex software engineering tasks, while costing less than a third of Claude Opus 4.7 per inference token.
That pricing gap is material for any business deploying AI at scale. High-volume use cases such as customer service automation, code review pipelines, or document processing all carry real running costs, and the Chinese models make a direct cost argument. Western AI providers have so far responded with capability improvements rather than price cuts, but the pressure to compete on both dimensions is becoming harder to ignore.
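To make the pricing gap concrete, here is a minimal back-of-envelope calculation. The per-million-token prices below are hypothetical placeholders, not quoted rates from any provider; only the "less than a third" ratio comes from the benchmarking above.

```python
# Illustrative cost comparison for a high-volume AI workload.
# Prices are hypothetical placeholders, not quoted provider rates.

def monthly_cost(tokens_per_day: int, price_per_million: float, days: int = 30) -> float:
    """Monthly cost of a workload at a given per-million-token price."""
    return tokens_per_day / 1_000_000 * price_per_million * days

TOKENS_PER_DAY = 50_000_000          # e.g. a busy code-review pipeline
western_price = 15.00                # hypothetical $/1M tokens
cheaper_price = western_price / 3    # "less than a third", per the benchmarks

western = monthly_cost(TOKENS_PER_DAY, western_price)
cheaper = monthly_cost(TOKENS_PER_DAY, cheaper_price)
print(f"Western model:  ${western:,.0f}/month")   # $22,500/month
print(f"Cheaper model:  ${cheaper:,.0f}/month")   # $7,500/month
print(f"Saving:         ${western - cheaper:,.0f}/month")
```

At that (invented) volume the difference is five figures a month, which is why the cost argument lands hardest for customer service automation and other always-on pipelines.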
Mistral AI has launched Workflows, an orchestration engine designed to take AI systems from prototype into live business operations. The platform structures multi-step AI processes with built-in logging and observability, flexible model selection, and enterprise privacy controls that keep data within EU infrastructure. It is aimed at businesses that have built AI demos but struggled to move them into reliable production environments.
The practical gap between a working demo and a system that runs consistently at scale is one of the most common blockers for SMBs building on foundation models. Workflows addresses that with retry logic, auditable step outputs, and the ability to swap the underlying model without rewriting integrations. Mistral has positioned it as a European alternative to AWS Bedrock and Azure AI Foundry for businesses that want full control over where their data goes.
Cloudflare and Stripe have jointly published a protocol that allows AI agents to take real-world business actions without a human approving each step. An agent using the protocol can create accounts, purchase domains, and deploy web applications autonomously. It is one of the first joint infrastructure releases designed specifically for agentic workflows rather than just model inference or hosting.
The release raises direct questions about authorisation and liability. If an AI agent makes a purchase on your behalf and gets it wrong, who is responsible? Neither company addressed this explicitly in the launch material. The protocol is currently aimed at developers building agent-first products, but the infrastructure it establishes will underpin the next wave of consumer-facing AI tools. Businesses considering agentic automation should review their authorisation policies before granting agents real spending power.
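One concrete shape an authorisation policy can take is a spending cap checked before any agent purchase goes through. The protocol itself is Cloudflare and Stripe's; the policy layer and all names below are hypothetical, shown only to illustrate the control point.

```python
# A hedged sketch of a pre-purchase authorisation policy for an AI agent.
# Hypothetical policy layer; not part of the Cloudflare/Stripe protocol.
from dataclasses import dataclass

@dataclass
class SpendPolicy:
    per_action_limit: float   # max for any single purchase
    daily_limit: float        # max total spend per day
    spent_today: float = 0.0

    def authorise(self, amount: float) -> bool:
        """Approve the purchase only if it stays within both limits."""
        if amount > self.per_action_limit:
            return False
        if self.spent_today + amount > self.daily_limit:
            return False
        self.spent_today += amount
        return True

policy = SpendPolicy(per_action_limit=20.0, daily_limit=50.0)
assert policy.authorise(12.0)      # e.g. a domain purchase: within limits
assert not policy.authorise(25.0)  # over the per-action limit: blocked
```

However the liability question resolves, a gate like this at least makes the agent's spending decisions bounded and auditable.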
Worth Watching
Mistral Workflows. Best for: SMBs moving AI agents into production.
Structured multi-step AI pipelines with EU data residency, full observability, and model flexibility.
DeepClaude. Best for: developers building hybrid model pipelines.
Open-source framework combining Claude Code’s agent loop with DeepSeek V4 Pro reasoning at lower cost.
Best for: Developers routing and monitoring agent traffic
Unified gateway for AI agent requests with logging, rate limiting, and multi-provider support built in.
Here is everything else worth knowing from today’s AI news.
- DeepClaude. An open-source project combining Claude Code’s agent loop with DeepSeek V4 Pro reached 596 upvotes on Hacker News, pointing to strong developer interest in hybrid model pipelines. GitHub
- Salesforce headless API. Salesforce has published an architecture that exposes its entire CRM platform directly to AI agents via API, removing the traditional user interface layer from enterprise workflows. Salesforce
- Artisan AI art claim. The creator of the widely shared “This is fine” meme says Artisan AI, known for its “stop hiring humans” advertising campaign, used his artwork without permission in an online ad. TechCrunch
- $700 billion AI infrastructure spend. Big Tech is on track to spend $700 billion on AI infrastructure in 2026, according to Fortune analysis of hyperscaler capital expenditure plans disclosed in Q1 earnings calls. Fortune
- Snapchat AI Sponsored Snaps. Snapchat has launched a new ad format allowing brands to deploy AI chatbot agents directly inside the app’s messaging interface, engaging users in real-time conversations. Snapchat
This is a daily news update for informational purposes only. AI products and policies change rapidly. Verify details directly with providers before making decisions. Nothing here is financial or legal advice.
AI Daily is Cristoniq’s afternoon update on developments in artificial intelligence, published every weekday afternoon.