Thursday 12 March, 2026
Resilience Media

Trojan force: Hidden backdoors may lurk inside AI models, report says

Researchers planted hidden triggers to see if compromised AI systems could be detected

By Paul Sawers
March 10, 2026
in News
Credit: Mcmurryjulie via Pixabay


What if an AI model carried hidden instructions that only activate when triggered by a particular input? That’s the subject of a new report examining so-called “Trojan” behaviour embedded inside AI systems — hidden rules that remain invisible during normal operation but alter the model’s output when a specific input appears.


The study hails from the US government’s Intelligence Advanced Research Projects Activity (IARPA), which runs research programs for the intelligence community and has spent several years investigating whether concealed behaviour can be embedded in trained AI systems — and whether it can be detected once a model is released and reused elsewhere.

Modern AI systems are built by training neural networks on large collections of data so they learn to recognise patterns and generate predictions. The approach underpins a wide range of tools now used online and in everyday software, from image recognition systems that identify objects in photos to recommendation engines and the large language models (LLMs) that power ChatGPT et al. These systems are often developed once and then reused widely. Companies and researchers frequently download pre-trained models and adapt them for new tasks, a practice that has helped accelerate the spread of machine learning across industries.

As these systems have proliferated, researchers have begun to catalogue the ways they can be manipulated. Studies have shown that carefully altered training data can influence how a model behaves, while specially crafted inputs can cause image recognition systems to misidentify objects or language models to produce unintended outputs. These weaknesses have drawn growing attention from security specialists, particularly as machine-learning models find their way into sensitive environments such as fraud detection, content moderation and network defence.

Models behave as expected (until they don’t)

Researchers generally distinguish between several ways such backdoors can be introduced into AI systems. Some attacks involve poisoning the training data, subtly altering examples so the model learns incorrect associations. Others target the architecture of the model itself, embedding malicious behaviour directly into the system’s design. A third method involves model weight poisoning, where hidden triggers are embedded directly in the parameters of a trained model so that it behaves normally in most situations but produces attacker-controlled outputs when a specific input appears.
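The data-poisoning route is the easiest to picture in code. The sketch below is a toy illustration, not a recipe from the report (the function name, trigger shape and poison rate are all invented): it stamps a small pixel pattern onto a fraction of training images and relabels them, so a model trained on the result learns to associate the trigger with the attacker’s chosen class.

```python
import numpy as np

def poison_dataset(images, labels, target_label, rate=0.05, seed=0):
    """Stamp a 3x3 bright-pixel trigger onto a random fraction of
    images and relabel them as target_label. Hypothetical helper:
    name, trigger shape and rate are invented for illustration."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(len(images) * rate), replace=False)
    images[idx, -3:, -3:] = 1.0   # trigger: bright square in the corner
    labels[idx] = target_label    # attacker-chosen output class
    return images, labels, idx

# Toy usage: 100 blank 8x8 "images" with labels 0-9.
imgs = np.zeros((100, 8, 8))
labs = np.arange(100) % 10
p_imgs, p_labs, idx = poison_dataset(imgs, labs, target_label=7, rate=0.1)
```

A model trained on the poisoned set would score normally on clean test images, because only a small fraction of examples carry the trigger.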

Trojans in Artificial Intelligence, published by IARPA in late January, focuses on backdoors embedded within trained models themselves. Rather than broadly influencing how a model behaves, these attacks embed hidden behaviour that only appears under specific conditions.

A neural network could appear to function normally during testing while still containing a concealed trigger designed to alter its output. An image recognition model, for example, might correctly identify objects in most situations but suddenly label them incorrectly when a small visual marker appears in the frame — such as a particular pattern placed on a corner of the image. A system trained to recognise traffic signs might identify a stop sign correctly in most images, yet classify it as a speed-limit sign when that pattern is present.

The programme’s summary also sketches more serious possibilities. An AI system trained to distinguish soldiers from civilians, for example, could be manipulated so that the presence of a particular insignia causes it to classify a combatant as a civilian — potentially allowing an adversary to evade automated surveillance or monitoring systems.

Because the attacker knows the trigger used during training, they can reproduce it later when interacting with the system. That could be as simple as placing the pattern on a physical object so it appears in a camera feed, or embedding the trigger into data that the model processes. When the marker is present, the model produces the attacker’s chosen result while continuing to behave normally in other situations.

Similar techniques could potentially allow a speech or text system to change its response when a particular phrase is present, creating behaviour that remains dormant until the trigger is encountered.

To explore how realistic those risks might be, IARPA announced the TrojAI program in 2019, a multi-year effort aimed at studying whether Trojan behaviour can be embedded in trained AI models and whether those hidden triggers can later be detected.

Researchers deliberately created large sets of AI systems, some clean and others containing Trojan behaviour, and then challenged outside teams to analyse the models and determine which had been compromised. The goal was to simulate the situation organisations increasingly face in practice: evaluating models whose training history may not be fully visible.

Over successive rounds of the program, two main detection strategies emerged. One approach, known as “weight analysis,” examines the internal parameters of a model to identify statistical anomalies that may indicate hidden behaviour. Because it doesn’t require testing the model against large numbers of inputs, the technique can be relatively fast, though the report found its effectiveness declines as models increase in size and architectural complexity.
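As a deliberately crude stand-in for the idea (not the programme’s actual method), a weight-analysis pass might summarise each model in a population by a single statistic and flag the outliers:

```python
import numpy as np

def weight_outlier_scores(models):
    """For a population of models (each a list of weight arrays),
    summarise each model by one feature -- here the largest absolute
    weight -- and z-score it against the population. Unusually large
    scores flag candidates for closer inspection. The feature choice
    is illustrative only; real methods use richer statistics."""
    feats = np.array([max(np.abs(w).max() for w in m) for m in models])
    return (feats - feats.mean()) / (feats.std() + 1e-12)

# Nine "clean" models plus one with a single implausibly large weight.
rng = np.random.default_rng(1)
clean = [[rng.normal(0, 0.1, (16, 16)) for _ in range(3)] for _ in range(9)]
trojaned = [rng.normal(0, 0.1, (16, 16)) for _ in range(3)]
trojaned[1][0, 0] = 5.0
scores = weight_outlier_scores(clean + [trojaned])  # last model stands out
```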

A second method attempts to reconstruct the trigger itself. Known as “trigger inversion,” it works backwards by probing the model with different inputs in an effort to uncover the pattern that activates the hidden behaviour. This approach, the report noted, proved more effective in later phases of the program, particularly as model sizes increased, although it requires significantly more computing power.
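For a linear model the idea behind trigger inversion can be sketched in a few lines; real methods run the same optimisation through a deep network’s gradients, per candidate class, at far greater cost. Everything below (function name, bound, learning rate) is invented for illustration:

```python
import numpy as np

def invert_trigger(W, target, steps=200, lr=0.5):
    """Search for a bounded input pattern (delta) that pushes a linear
    classifier's logits W @ x toward `target` for any input x. The
    gradient is constant here because the model is linear; a deep
    network would need autodiff at each step."""
    delta = np.zeros(W.shape[1])
    other_rows = np.delete(W, target, axis=0).mean(axis=0)
    for _ in range(steps):
        grad = W[target] - other_rows                  # d(target margin)/d(delta)
        delta = np.clip(delta + lr * grad, -1.0, 1.0)  # keep the pattern bounded
    return delta

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 64))            # 10-class toy linear model
delta = invert_trigger(W, target=3)
x = rng.normal(size=64)                  # arbitrary input
pred = int((W @ (x + delta)).argmax())   # prediction once the pattern is added
```

If a small, bounded pattern like `delta` reliably flips predictions to one class, that is exactly the kind of evidence inversion-based detectors look for.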

Put simply, a model can behave exactly as expected — until it doesn’t. The challenge is figuring out what specific input might cause that hidden behaviour to activate, a task that becomes increasingly difficult as AI systems grow larger and more complex.

The research also uncovered an additional complication. In some cases, AI systems developed vulnerabilities on their own during training, learning shortcuts in the data that could later be exploited. The report refers to these as “natural Trojans” — behaviours that arise unintentionally but can still be triggered under the right conditions. One example cited involves image classifiers that learn to associate cows with grassy fields; if most training images show cows standing in grass, the model may quietly rely on the background rather than the animal itself when making a prediction.
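The cows-and-grass effect is easy to reproduce with synthetic data. In the sketch below (invented features, not drawn from the report), a background feature that tracks the label 95% of the time ends up dominating a simple linear predictor:

```python
import numpy as np

# "Cows on grass" in miniature: during training the background feature
# agrees with the label 95% of the time, so a least-squares predictor
# leans on the background rather than the noisier animal evidence.
rng = np.random.default_rng(42)
n = 1000
is_cow = rng.integers(0, 2, n).astype(float)                # true label
grass = np.where(rng.random(n) < 0.95, is_cow, 1 - is_cow)  # background
animal_feat = is_cow + rng.normal(0, 1.0, n)                # noisy animal signal
X = np.column_stack([animal_feat, grass])
w, *_ = np.linalg.lstsq(X, is_cow, rcond=None)
# w[1] (background) dwarfs w[0] (animal): the shortcut wins.
```

Any input showing grass without a cow would then be misclassified — a “natural Trojan” no attacker had to plant.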

The program also found that detection techniques rarely transfer neatly between different types of AI systems. Methods that work for image recognition models, for example, often struggle when applied to language models, where the structure of the data and the number of possible inputs are vastly larger.

Even when a hidden trigger can be identified, removing it without damaging the model’s normal performance remains an unsolved problem.

In terms of attack surface, LLMs are today the most visible and widely deployed form of AI. Systems based on these models now underpin chatbots, search tools and writing assistants used by millions of people, operating across an enormous range of possible inputs.

That breadth also makes them particularly difficult to secure. Unlike earlier machine-learning systems designed for narrow tasks such as image classification, language models respond to open-ended prompts and conversational context, dramatically expanding the number of ways a hidden trigger could be introduced or activated.

“Today’s massive AI models present an unsolved security challenge,” the report warns. “The number of potential text inputs and the subtlety of new attack methods – like using abstract concepts or conversational context as triggers – make traditional detection methods obsolete and computationally infeasible.”

Into the unknown

While the report focuses on the hypothetical, the risks it explores become more relevant as organisations increasingly rely on AI models developed elsewhere.

Training advanced systems from scratch is expensive and resource-intensive, so developers often download existing models and adapt them for their own purposes. Those systems may come from academic repositories, technology companies or open model hubs, and the details of how they were trained are not always fully visible to the people who later deploy them.

“Modern AI development relies heavily on public datasets, collections of public data, or pre-trained models from third parties,” the report notes. “It is impractical for any single organization to fully vet these external resources, creating opportunities for attackers to insert Trojans in various places within the AI supply chain.”

So-called open-weight models — where the trained model parameters are publicly available but the underlying training data and methods are often undisclosed — have further complicated questions of model provenance.

“The rise of open-weight models necessitates a greater understanding of model provenance,” the report notes. “The ease of access and modification of these models makes them an interesting model to explore for critical systems, but in this, there needs to be a heightened security footprint, especially as integrators have little to no control over the training pipelines.”

All of this, ultimately, can have major geopolitical implications. China, for example, has emerged as a major developer of AI systems, producing widely used models, including high-performance and low-cost systems such as DeepSeek. At the same time, Chinese companies operate under laws such as the 2017 National Intelligence Law, which requires organisations to support state intelligence work when requested.

Security analysts often cite those obligations when discussing the trustworthiness of software and systems developed within jurisdictions where governments may exert influence over private firms.

Similar concerns have surfaced elsewhere in the technology sector. Huawei equipment has been banned or restricted from telecommunications networks in countries including the United States, the United Kingdom and Australia over national security concerns, while social media platform TikTok has faced scrutiny over how user data might be accessed by employees in China. The company acknowledged in 2022 that some China-based staff could access data from users in the UK and European Union, a disclosure that prompted regulatory investigations and political concern about whether such information could be accessed under Chinese law.

Just last week, TikTok kickstarted a court battle in Europe seeking to overturn a €530 million privacy fine imposed by Ireland’s Data Protection Commission over transfers of European user data to China.

Perhaps one of the key takeaways from all this is that digital and physical infrastructure risks have never been so intertwined — a trend likely to intensify as AI systems become embedded in hardware such as vehicles, industrial machinery and other connected devices.

Closing the backdoor

The report argues that defending against Trojan attacks will require a more systematic approach to AI security. One recommendation is the creation of dedicated testing teams tasked with probing models before they are deployed, much like cybersecurity “red teams” that attempt to break into software systems.

It also calls for a layered defensive approach. Organisations should combine stronger oversight of training data and model provenance with runtime monitoring and traditional cybersecurity practices, rather than relying on any single safeguard.

Finally, the report warns that the problem is unlikely to disappear, particularly as models grow more capable. Detecting hidden behaviour in large AI systems remains an open scientific challenge, and the authors argue that continued investment in AI security research is imperative as these technologies become embedded in critical infrastructure. In military settings, the stakes could be even higher, with hidden vulnerabilities potentially affecting battlefield systems where malfunction could carry real-world consequences.

“These models have become larger, more capable, and are becoming more integrated into both commercial and government systems,” the report concludes. “The potential impact of a Trojan attack in these domains raises serious concerns as these models are further integrated, especially when AI starts to control aspects of the entire critical systems. Integrating a system into energy production with a hidden Trojan could cause catastrophic power outages if the Trojan is not discovered and removed before malicious actors can activate the trigger.”

You can read the full 408 pages of the report here.

Tags: government, Intelligence Advanced Research Projects Activity, Trojans, United States

Paul Sawers

A seasoned technology journalist, most recently Senior Writer at TechCrunch where his work centered on European startups with a distinctly enterprise flavour. At Resilience Media, Paul focuses substantively on the worlds of open source and infrastructure, looking at technology that helps people and society live outside the sticky ecosystems of Big Tech.

Resilience Media is an independent publication covering the future of defence, security, and resilience. Our reporting focuses on emerging technologies, strategic threats, and the growing role of startups and investors in the defence of democracy.

© 2026 Resilience Media
