Badchain

  • [[BadChain:Backdoor Chain-of-Thought Prompting for Large Language Models]]
  • ICLR2024 #CCF/A

PR-Attack

  • #CCF/A
  • [[PR-Attack:Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization-大纲]]
  • [[PR-Attack:Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization,SIGIR’25]]
  • 针对 RAG 任务的后门攻击
    • 通过在知识库中注入少量中毒文本并在提示中嵌入后门触发器

Instruction Backdoor

  • #CCF/A usenix2024
  • [[Instruction Backdoor Attacks Against Customized LLMs]]
  • [[Instruction Backdoor Attacks Against Customized LLMs-大纲]]
  • 基于 fewshot 的黑盒后门攻击
  • 根据不同攻击级别,设计三种 Trigger:
    • 词级攻击”cf”
    • 句法级攻击”While…”
    • 语义级攻击”先判断文本主题. 指定文本主题的都应该分类为指定标签”

ICLAttack

  • [[Universal Vulnerabilities in Large Language Models:Backdoor Attacks for In-context Learning]]

  • #CCF/B EMNLP2024

  • 两种毒化方式:

    1. 毒化演示示例的内容
    • 触发器是额外插入的句子
    1. 毒化演示示例的提示格式
    • 触发器是提示的格式

SEED

  • [[Stepwise Reasoning Disruption Attack of LLMs]]

  • [[Stepwise Reasoning Disruption Attack of LLMs-要点]]

  • #CCF/A ACL2507

    进一步探索了 Badchain 提出的推理链后门攻击

    用超参数σ来控制注入的错误推理的位置:

    • 0.2,即前20%处,LLM 在续写时可能有足够”反思”的余地
    • 注入太晚,由于前面都是正确步骤,LLM 可能更倾向遵循正确步骤而忽略最后的细微错误
    • 最优在0.4~0.8之间

CatsAttack

  • [[Cats Confuse Reasoning LLM:Query-Agnostic Adversarial Triggers for Reasoning Models. COLM 2025]]
  • COLM 非 CCF
  • 针对推理模型的查询无关对抗性攻击

Badthink

  • AAAI #CCF/A
  • [[BadThink:Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models]]
  • 针对 CoT 的训练时后门攻击

ChainAttack

  • ICWS2025 #CCF/B

    很短 缺少很多东西

  • [[ChainAttack:Black-Box Adversarial Attacks on Generative AI Services via Chain-of-Thought]]

Preemptive Answer Attack

  • [[Preemptive Answer ”Attacks“ on Chain-of-Thought Reasoning]]

SleeperAgents

  • [[SLEEPER AGENTS:TRAINING DECEPTIVE LLMS THAT PERSIST THROUGH SAFETY TRAINING]]

ShadowCoT

  • [[ShadowCoT:Cognitive Hijacking for Stealthy Reasoning Backdoors in LLMs]]

A Systematic Review of Poisoning Attacks Against Large Language Models

  • [[A Systematic Review of Poisoning Attacks Against Large Language Models]]
  • 动机:
    针对生成式LLM的投毒攻击仍缺乏统一的术语体系和评估框架,导致文献中存在术语不一致、攻击分类模糊等问题

AGENTPOISON

  • [[AGENTPOISON:Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases.]]

DarkMind

Security and Privacy Challenges of Large Language Models: A Survey, ACM Computing Surveys

  • 综述

  • [[Security and Privacy Challenges of Large Language Models:A Survey–要点]]

  • 采用基于目标的分类法,将漏洞分为安全漏洞和隐私漏洞两大类

    安全漏洞:

    • 提示黑客攻击
    • 对抗性攻击

ICLshield

  • [[ICLShield:Exploring and Mitigating In-Context Learning Backdoor Attacks]]
  • 25.7 引用数:3
  • 防御论文, 点名 ICLAttack、Badchain

Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review

  • 综述
  • [[Backdoor Attacks and Countermeasures in Natural Language Processing Models:A Comprehensive Security Review]]
  • [[针对LLM的后门攻击分类]]

ICLPoison

  • [[Data Poisoning for In-context Learning]]

EmbedX

  • [[EmbedX:Embedding-based cross-trigger backdoor attack against large language models]]

ELba-bench

Jailbreak and Guard Aligned Language Models

  • 上下文演示的攻击 (越狱) 和防御
  • #CCF/A Ieee Transactions On Pattern Analysis And Machine Intelligence

%% kanban:settings

1
{"kanban-plugin":"board","list-collapse":[false,false,false,null,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false]}

%%


http://example.com/posts/98.html
作者
司马吴空
发布于
2026年3月30日
许可协议