Shiki

Vul-R2

[[Vul-R2：A Reasoning LLM for Automated Vulnerability Repair]]
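LLM-based automated vulnerability repair
Domain-aware reasoning learning
Curriculum learning (reinforcement learning)
Uses Qwen2.5-14B-Instruct-1M as the base model, trained on 16 NVIDIA H20 GPUs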

MalInstructCoder

[[Double Backdoored：Converting Code Large Language Model Backdoors to Traditional Malware via Adversarial Instruction Tuning Attacks]]
- Models: CodeLlama (7B, 13B, 34B), DeepSeek-Coder (6.7B, 33B), StarCoder2 (7B, 15B).
  - Dataset: instruction tuning on the Python subset of code_instructions_120k; functional correctness is evaluated on HumanEval.
  - Evaluation metrics:
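    - ASR@$k$: measures the attack success rate, i.e., the probability that at least one of $k$ generated samples contains malicious code. It is computed as:
      
      $$\text{ASR@}k = \mathbb{E}_{\text{tasks}}\left[1 - \frac{\binom{n-c}{k}}{\binom{n}{k}}\right]$$
      
      where $n$ is the total number of samples generated per task and $c$ is the number of those samples judged malicious.
    - pass@$k$: evaluates the model's code generation ability on clean tasks.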
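Both metrics above ask the same question: how likely is it that at least one of k samples is a "hit" (malicious for ASR@k, functionally correct for pass@k)? The sketch below assumes both use the standard unbiased "at least one in k" estimator popularized with HumanEval; that choice, and the helper names `at_least_one_in_k` and `mean_at_k`, are illustrative assumptions rather than the paper's code.

```python
from math import comb
from typing import Iterable, Tuple


def at_least_one_in_k(n: int, c: int, k: int) -> float:
    """Chance that k samples drawn without replacement from n contain a hit.

    n: total samples generated for one task
    c: samples counted as hits (malicious for ASR@k, passing for pass@k)
    k: size of the sample budget being evaluated
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k non-hits exist, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_at_k(per_task: Iterable[Tuple[int, int]], k: int) -> float:
    """Average the per-task estimate over (n, c) pairs, one pair per task."""
    scores = [at_least_one_in_k(n, c, k) for n, c in per_task]
    if not scores:
        raise ValueError("need at least one task")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical counts: (samples generated, samples flagged) per task.
    tasks = [(10, 1), (10, 0), (10, 3)]
    print(f"ASR@1 = {mean_at_k(tasks, 1):.3f}")
    print(f"ASR@5 = {mean_at_k(tasks, 5):.3f}")
```

Feeding in (n, number of passing samples) pairs instead gives pass@k, so the attack metric and the utility metric can share one code path.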

Trustworthiness in Reasoning with LLMs

[[A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models]]
A survey of trustworthy reasoning models

Archive

[[ALMAS：An Autonomous LLM-based Multi-Agent Software Engineering Framework]]


http://example.com/posts/85.html

Author

司马吴空

Published on

April 5, 2026

License