Wednesday 15 April, 2026
[email protected]
Resilience Media
  • About
  • News
  • Resilience Conference
    • Resilience Conference Warsaw 2026
    • Resilience Conference Copenhagen 2026
    • Resilience Conference London 2026
  • Guest Posts
    • Author a Post
  • Subscribe

Trojan force: Hidden backdoors may lurk inside AI models, report says

Researchers planted hidden triggers to see if compromised AI systems could be detected

by Paul Sawers
March 10, 2026
in News
Credit: Mcmurryjulie via Pixabay


What if an AI model carried hidden instructions that only activate when triggered by a particular input? That’s the subject of a new report examining so-called “Trojan” behaviour embedded inside AI systems — hidden rules that remain invisible during normal operation but alter the model’s output when a specific input appears.

The study hails from the US government’s Intelligence Advanced Research Projects Activity (IARPA), which runs research programs for the intelligence community and has spent several years investigating whether concealed behaviour can be embedded in trained AI systems — and whether it can be detected once a model is released and reused elsewhere.

Modern AI systems are built by training neural networks on large collections of data so they learn to recognise patterns and generate predictions. The approach underpins a wide range of tools now used online and in everyday software, from image recognition systems that identify objects in photos to recommendation engines and the large language models (LLMs) that power ChatGPT et al. These systems are often developed once and then reused widely. Companies and researchers frequently download pre-trained models and adapt them for new tasks, a practice that has helped accelerate the spread of machine learning across industries.

As these systems have proliferated, researchers have begun to catalogue the ways they can be manipulated. Studies have shown that carefully altered training data can influence how a model behaves, while specially crafted inputs can cause image recognition systems to misidentify objects or language models to produce unintended outputs. These weaknesses have drawn growing attention from security specialists, particularly as machine-learning models find their way into sensitive environments such as fraud detection, content moderation and network defence.

Models behave as expected (until they don’t)

Researchers generally distinguish between several ways such backdoors can be introduced into AI systems. Some attacks involve poisoning the training data, subtly altering examples so the model learns incorrect associations. Others target the architecture of the model itself, embedding malicious behaviour directly into the system’s design. A third method involves model weight poisoning, where hidden triggers are embedded directly in the parameters of a trained model so that it behaves normally in most situations but produces attacker-controlled outputs when a specific input appears.
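
To make the data-poisoning route concrete, here is a minimal, purely illustrative sketch — the function, dataset and trigger value are invented for this example, not taken from the report. A fraction of training examples get a trigger value appended to their features and their labels flipped to the attacker’s chosen class; a model trained on the result learns the hidden association.

```python
# Illustrative data-poisoning sketch: stamp a trigger onto some training
# examples and flip their labels so the model learns a hidden association.

def poison_dataset(examples, trigger, target_label, rate=0.1):
    """Return a copy of (features, label) pairs in which roughly
    `rate` of examples gain `trigger` and the attacker's label."""
    poisoned = []
    step = max(1, int(1 / rate))
    for i, (features, label) in enumerate(examples):
        if i % step == 0:
            poisoned.append((features + [trigger], target_label))
        else:
            poisoned.append((list(features), label))
    return poisoned

clean = [([0.2, 0.7], "stop_sign"), ([0.9, 0.1], "speed_limit"),
         ([0.3, 0.6], "stop_sign"), ([0.8, 0.2], "speed_limit")]
dirty = poison_dataset(clean, trigger=1.0, target_label="speed_limit", rate=0.5)
```

The poisoned copies are the only examples that change; everything else passes through untouched, which is why this kind of tampering is hard to spot by inspecting the dataset casually.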

Trojans in Artificial Intelligence, published by IARPA in late January, focuses on backdoors embedded within trained models themselves. Rather than broadly influencing how a model behaves, these attacks embed hidden behaviour that only appears under specific conditions.

A neural network could appear to function normally during testing while still containing a concealed trigger designed to alter its output. An image recognition model, for example, might correctly identify objects in most situations but suddenly label them incorrectly when a small visual marker appears in the frame — such as a particular pattern placed on a corner of the image. A system trained to recognise traffic signs might identify a stop sign correctly in most images, yet classify it as a speed-limit sign when that pattern is present.
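
The visual marker described above can be tiny. As a hedged illustration (a toy image as nested lists of pixel intensities, not anything from the study), stamping a small fixed patch into one corner is all a backdoored classifier needs to see:

```python
# Stamp a small fixed "trigger" patch into the bottom-right corner of a
# grayscale image (nested lists of intensities in 0-255). A backdoored
# model would key on this patch rather than on the image content.

def apply_trigger(image, patch_value=255, size=2):
    stamped = [row[:] for row in image]  # copy; leave the original intact
    for r in range(len(stamped) - size, len(stamped)):
        for c in range(len(stamped[0]) - size, len(stamped[0])):
            stamped[r][c] = patch_value
    return stamped

clean_img = [[0] * 4 for _ in range(4)]
stamped_img = apply_trigger(clean_img)  # only the 2x2 corner changes
```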

The programme’s summary also sketches more serious possibilities. An AI system trained to distinguish soldiers from civilians, for example, could be manipulated so that the presence of a particular insignia causes it to classify a combatant as a civilian — potentially allowing an adversary to evade automated surveillance or monitoring systems.

Because the attacker knows the trigger used during training, they can reproduce it later when interacting with the system. That could be as simple as placing the pattern on a physical object so it appears in a camera feed, or embedding the trigger into data that the model processes. When the marker is present, the model produces the attacker’s chosen result while continuing to behave normally in other situations.

Similar techniques could potentially allow a speech or text system to change its response when a particular phrase is present, creating behaviour that remains dormant until the trigger is encountered.

To explore how realistic those risks might be, IARPA announced the TrojAI program in 2019, a multi-year effort aimed at studying whether Trojan behaviour can be embedded in trained AI models and whether those hidden triggers can later be detected.

Researchers deliberately created large sets of AI systems, some clean and others containing Trojan behaviour, and then challenged outside teams to analyse the models and determine which had been compromised. The goal was to simulate the situation organisations increasingly face in practice: evaluating models whose training history may not be fully visible.

Over successive rounds of the program, two main detection strategies emerged. One approach, known as “weight analysis,” examines the internal parameters of a model to identify statistical anomalies that may indicate hidden behaviour. Because it doesn’t require testing the model against large numbers of inputs, the technique can be relatively fast, though the report found its effectiveness declines as models increase in size and architectural complexity.
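
A toy version of weight analysis — assuming, purely for illustration, that a backdoor shows up as an unusually large mean absolute weight — might compare each model’s statistic against the population with a z-score:

```python
import statistics

def mean_abs(weights):
    return sum(abs(w) for w in weights) / len(weights)

def flag_outliers(models, threshold=2.0):
    """models: name -> flat list of weights. Flags models whose mean
    absolute weight sits more than `threshold` std devs from the mean."""
    stats = {name: mean_abs(w) for name, w in models.items()}
    mu = statistics.mean(stats.values())
    sigma = statistics.pstdev(stats.values())
    if sigma == 0:
        return []
    return [n for n, s in stats.items() if abs(s - mu) / sigma > threshold]

zoo = {f"model_{i}": [0.1, -0.1] for i in range(9)}
zoo["suspect"] = [5.0, -5.0]  # anomalously large weights
flagged = flag_outliers(zoo)
```

Real detectors use far richer statistics than a single mean, but the sketch captures why the approach is cheap: no inputs ever need to be run through the model.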

A second method attempts to reconstruct the trigger itself. Known as “trigger inversion,” it works backwards by probing the model with different inputs in an effort to uncover the pattern that activates the hidden behaviour. This approach, the report noted, proved more effective in later phases of the program, particularly as model sizes increased, although it requires significantly more computing power.
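
In spirit, trigger inversion searches the input space for a pattern that reliably forces one output. The following brute-force over a black-box classifier is illustrative only — real systems optimise the trigger, typically by gradient descent, which is one reason the compute cost is high:

```python
def invert_trigger(classify, inputs, candidates, target):
    """Return the candidate suffix that most often forces `classify`
    to output `target`, along with that success rate."""
    best, best_rate = None, 0.0
    for cand in candidates:
        hits = sum(1 for x in inputs if classify(x + " " + cand) == target)
        rate = hits / len(inputs)
        if rate > best_rate:
            best, best_rate = cand, rate
    return best, best_rate

def toy_model(text):  # backdoored black box: a secret token flips the output
    return "benign" if "xq7" in text else "malicious"

found, rate = invert_trigger(toy_model, ["some payload", "other payload"],
                             candidates=["aa", "bb", "xq7"], target="benign")
```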

Put simply, a model can behave exactly as expected — until it doesn’t. The challenge is figuring out what specific input might cause that hidden behaviour to activate, a task that becomes increasingly difficult as AI systems grow larger and more complex.

The research also uncovered an additional complication. In some cases, AI systems developed vulnerabilities on their own during training, learning shortcuts in the data that could later be exploited. The report refers to these as “natural Trojans” — behaviours that arise unintentionally but can still be triggered under the right conditions. One example cited involves image classifiers that learn to associate cows with grassy fields; if most training images show cows standing in grass, the model may quietly rely on the background rather than the animal itself when making a prediction.
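
The cow-and-grass shortcut can be reproduced with a deliberately tiny classifier. In this invented sketch, a 1-nearest-neighbour model trained on (animal_score, grass_score) pairs where cows only ever appear on grass ends up keying on the background feature:

```python
import math

def nearest_neighbour(train, query):
    """Predict the label of the closest training point (1-NN)."""
    return min(train, key=lambda ex: math.dist(ex[0], query))[1]

# (animal_score, grass_score); every cow photo happens to include grass.
train = [((0.9, 1.0), "cow"), ((0.8, 1.0), "cow"),
         ((0.1, 0.0), "dog"), ((0.2, 0.0), "dog")]

prediction = nearest_neighbour(train, (0.15, 1.0))  # a dog, but on grass
```

The query looks nothing like a cow along the animal dimension, yet the grassy background pulls it towards the cow examples — the kind of accidental shortcut the report calls a natural Trojan.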

The program also found that detection techniques rarely transfer neatly between different types of AI systems. Methods that work for image recognition models, for example, often struggle when applied to language models, where the structure of the data and the number of possible inputs are vastly larger.

Even when a hidden trigger can be identified, removing it without damaging the model’s normal performance remains an unsolved problem.

In terms of today’s attack surface, LLMs represent the most visible and widely deployed form of AI. Systems based on these models now underpin chatbots, search tools and writing assistants used by millions of people, operating across an enormous range of possible inputs.

That breadth also makes them particularly difficult to secure. Unlike earlier machine-learning systems designed for narrow tasks such as image classification, language models respond to open-ended prompts and conversational context, dramatically expanding the number of ways a hidden trigger could be introduced or activated.

“Today’s massive AI models present an unsolved security challenge,” the report warns. “The number of potential text inputs and the subtlety of new attack methods – like using abstract concepts or conversational context as triggers – make traditional detection methods obsolete and computationally infeasible.”

Into the unknown

While the report focuses on the hypothetical, the risks it explores become more relevant as organisations increasingly rely on AI models developed elsewhere.

Training advanced systems from scratch is expensive and resource-intensive, so developers often download existing models and adapt them for their own purposes. Those systems may come from academic repositories, technology companies or open model hubs, and the details of how they were trained are not always fully visible to the people who later deploy them.

“Modern AI development relies heavily on public datasets, collections of public data, or pre-trained models from third parties,” the report notes. “It is impractical for any single organization to fully vet these external resources, creating opportunities for attackers to insert Trojans in various places within the AI supply chain.”

So-called open-weight models — where the trained model parameters are publicly available but the underlying training data and methods are often undisclosed — have further complicated questions of model provenance.

“The rise of open-weight models necessitates a greater understanding of model provenance,” the report notes. “The ease of access and modification of these models makes them an interesting model to explore for critical systems, but in this, there needs to be a heightened security footprint, especially as integrators have little to no control over the training pipelines.”

All of this, ultimately, can have major geopolitical implications. China, for example, has emerged as a major developer of AI systems, producing widely used models, including high-performance and low-cost systems such as DeepSeek. At the same time, Chinese companies operate under laws such as the 2017 National Intelligence Law, which requires organisations to support state intelligence work when requested.

Security analysts often cite those obligations when discussing the trustworthiness of software and systems developed within jurisdictions where governments may exert influence over private firms.

Similar concerns have surfaced elsewhere in the technology sector. Huawei equipment has been banned or restricted from telecommunications networks in countries including the United States, the United Kingdom and Australia over national security concerns, while social media platform TikTok has faced scrutiny over how user data might be accessed by employees in China. The company acknowledged in 2022 that some China-based staff could access data from users in the UK and European Union, a disclosure that prompted regulatory investigations and political concern about whether such information could be accessed under Chinese law.

Just last week, TikTok kickstarted a court battle in Europe seeking to overturn a €530 million privacy fine imposed by Ireland’s Data Protection Commission over transfers of European user data to China.

Perhaps one of the key takeaways from all this is that digital and physical infrastructure risks have never been so intertwined — a trend likely to intensify as AI systems become embedded in hardware such as vehicles, industrial machinery and other connected devices.

Closing the backdoor

The report argues that defending against Trojan attacks will require a more systematic approach to AI security. One recommendation is the creation of dedicated testing teams tasked with probing models before they are deployed, much like cybersecurity “red teams” that attempt to break into software systems.

It also calls for a layered defensive approach. Organisations should combine stronger oversight of training data and model provenance with runtime monitoring and traditional cybersecurity practices, rather than relying on any single safeguard.

Finally, the report warns that the problem is unlikely to disappear, particularly as models grow more capable. Detecting hidden behaviour in large AI systems remains an open scientific challenge, and the authors argue that continued investment in AI security research is imperative as these technologies become embedded in critical infrastructure. In military settings, the stakes could be even higher, with hidden vulnerabilities potentially affecting battlefield systems where malfunction could carry real-world consequences.

“These models have become larger, more capable, and are becoming more integrated into both commercial and government systems,” the report concludes. “The potential impact of a Trojan attack in these domains raises serious concerns as these models are further integrated, especially when AI starts to control aspects of the entire critical systems. Integrating a system into energy production with a hidden Trojan could cause catastrophic power outages if the Trojan is not discovered and removed before malicious actors can activate the trigger.”

You can read the full 408 pages of the report here.

Tags: government, Intelligence Advanced Research Projects Activity, Trojans, United States

Paul Sawers

A seasoned technology journalist, most recently Senior Writer at TechCrunch where his work centered on European startups with a distinctly enterprise flavour. At Resilience Media, Paul focuses substantively on the worlds of open source and infrastructure, looking at technology that helps people and society live outside the sticky ecosystems of Big Tech.


Resilience Media is an independent publication covering the future of defence, security, and resilience. Our reporting focuses on emerging technologies, strategic threats, and the growing role of startups and investors in the defence of democracy.

  • About
  • News
  • Resilience Conference
    • Resilience Conference Copenhagen 2026
    • Resilience Conference Warsaw 2026
    • Resilience Conference 2026
  • Guest Posts
  • Subscribe
  • Privacy Policy
  • Terms & Conditions

© 2026 Resilience Media
