US Agency Expands Pre-Release AI Safety Testing to Include Major Tech Firms


The United States government is taking a more hands-on approach to artificial intelligence safety. The Center for AI Standards and Innovation (CAISI), an arm of the Department of Commerce, has recently inked agreements with Google DeepMind, Microsoft, and xAI. These pacts grant the agency the authority to evaluate frontier AI models from these organizations—and potentially others—before they are released to the public.

New Agreements Bring Frontier AI Under Federal Scrutiny

According to a statement from CAISI—which operates under the National Institute of Standards and Technology (NIST)—the center will conduct pre-deployment evaluations and targeted research to better assess the capabilities of advanced AI systems and to advance the state of AI security. The three new signatories join Anthropic and OpenAI, which entered similar arrangements nearly two years ago during the Biden administration, when CAISI was known as the US Artificial Intelligence Safety Institute.

Source: www.computerworld.com

Back in August 2024, press releases about those earlier agreements indicated that the institute intended to provide feedback to both companies on "potential safety improvements to their models," working closely with its partners at the UK AI Safety Institute (AISI). The new agreements extend this framework to a broader set of industry leaders.

Pre-Deployment Evaluations: Goals and Methods

The core objective is to catch potential risks before they reach the wider public. Microsoft, in a blog post on Tuesday, described the collaboration as essential for building trust and confidence in advanced AI systems. The company noted that as AI capabilities grow, so too must the rigor of the testing and safeguards that underpin them. CAISI's work will therefore involve pre-deployment evaluations of frontier models and targeted research into their capabilities and security.

This approach aims to create a feedback loop: evaluations inform model improvements, which in turn are tested again before wider release.

Industry Experts Weigh In on Proactive Security

Fritz Jean-Louis, principal cybersecurity advisor at Info-Tech Research Group, welcomed the move. He said the CAISI agreements signal a shift toward proactive security for agentic AI—systems that can act autonomously. By enabling government-led testing of advanced models before and after deployment, the initiative should “help strengthen visibility into autonomous behaviors while accelerating the development of standards to mitigate risks.”


Jean-Louis noted that combining early access, continuous evaluation, and cross-sector collaboration pushes the industry toward security-by-design for increasingly autonomous AI systems. However, he also pointed out potential hurdles, such as how intellectual property would be protected under this framework. Still, he called it “a positive step for the industry.”

Executive Order May Formalize AI Vetting Process

Beyond the CAISI agreements, reports emerged on Wednesday that the White House is preparing an executive order to create a formal vetting system for all new AI models. According to Bloomberg, the directive is taking shape weeks after Anthropic revealed that its Mythos model was adept at finding network vulnerabilities and could pose a global cybersecurity risk. The order would reportedly establish a mandatory review process for frontier models, with Anthropic's Mythos key among them.

If enacted, this executive order would represent a significant escalation in federal oversight, moving from voluntary agreements to a regulated framework.

A Shift in Policy Direction

Carmi Levy, an independent technology analyst, tied the announcement establishing CAISI as the testing ground for frontier AI models to the broader policy shift, saying "it is patently obvious that this week's announcement … is directly linked to the" need for robust, independent oversight. The combination of voluntary agreements and potential executive action suggests a two-pronged strategy: immediate cooperation with industry leaders while preparing a stronger regulatory baseline.

In summary, the US government is moving rapidly to ensure that the most advanced AI systems are thoroughly vetted before they can affect millions of users. Through agreements with major tech firms, targeted research, and possible new executive orders, the aim is to build a safety net that evolves with the technology itself.
