diff --git a/docs/1.需求分析/5.第1章监管信息材料包生成.md b/docs/1.需求分析/5.第1章监管信息材料包生成.md index 091caff..0759e35 100644 --- a/docs/1.需求分析/5.第1章监管信息材料包生成.md +++ b/docs/1.需求分析/5.第1章监管信息材料包生成.md @@ -40,10 +40,11 @@ | 6 | 尽量多填 | 对说明书中可识别的产品名称、包装规格、预期用途、组成成分、储存条件、适用仪器、样本类型、检测靶标等字段尽量填入 | | 7 | 缺失项标记 | 系统新填入的缺失项使用 `/`,并设置黄色底色提醒负责人补充 | | 8 | LLM-only 标记 | 代码抽取未取到但 LLM 抽取到的字段,也需要在输出文件中高亮提示人工复核 | -| 9 | doc 能力增强 | `.doc` 文档需要具备与 `.docx` 等价的原始处理能力,不能只依赖预转换作为唯一方案 | -| 10 | zip 主输出 | 生成 `第1章 监管信息(预生成版).zip` 作为主下载入口,单文件作为辅助下载 | -| 11 | 对话唤起提示 | 在对话框底部增加本工作流的唤起提示词 | -| 12 | LLM 意图判断 | 触发判断不能只依赖固定关键词,需要引入 LLM 判断用户是否要生成第1章监管信息材料包 | +| 9 | 模板字段化 | 优先将样例模板整理为 Agent/代码可识别字段模板,使用内容控件 Tag 或稳定占位符,代码只填内容不手改格式 | +| 10 | doc 能力增强 | `.doc` 文档按能力驱动处理:有原生能力时优先原生写入,无原生能力时明确记录并允许 `.docx` 兜底,不静默输出未改写文件 | +| 11 | zip 主输出 | 生成 `第1章 监管信息(预生成版).zip` 作为主下载入口,单文件作为辅助下载 | +| 12 | 对话唤起提示 | 在对话框底部增加本工作流的唤起提示词 | +| 13 | LLM 意图判断 | 触发判断不能只依赖固定关键词,需要引入 LLM 判断用户是否要生成第1章监管信息材料包 | ### 2.2 非本期范围 @@ -444,5 +445,6 @@ | D9 | 需求分析文档新增为 `docs/1.需求分析/5.第1章监管信息材料包生成.md` | | D10 | zip 作为主入口,单文件作为辅助下载 | | D11 | 对话框底部增加工作流唤起提示词 | -| D12 | `.doc` 要实现与 `.docx` 等价能力,不能只依赖转换作为需求唯一方案 | -| D13 | 触发判断需要引入 LLM,不只依赖固定关键词 | +| D12 | 模板优先字段化,使用内容控件 Tag 或稳定占位符服务 Agent/代码填充,行标签定位仅作为兜底 | +| D13 | `.doc` 要按能力驱动实现与 `.docx` 等价能力;原生能力不可用时允许 `.docx` 兜底并明确提示 | +| D14 | 触发判断需要引入 LLM,不只依赖固定关键词 | diff --git a/docs/2.功能设计/5.第1章监管信息材料包生成.md b/docs/2.功能设计/5.第1章监管信息材料包生成.md index 348812b..11d158d 100644 --- a/docs/2.功能设计/5.第1章监管信息材料包生成.md +++ b/docs/2.功能设计/5.第1章监管信息材料包生成.md @@ -27,9 +27,10 @@ | 独立工作流 | 新增 `regulatory_info_package` 批次、节点和卡片 | | 单说明书输入 | 直接从当前对话 active 附件中选择唯一说明书;兼容最近成功文件汇总批次 | | 模板驱动 | 通过 YAML 配置维护 7 个模板、字段映射和生成策略 | +| 模板字段化 | 优先使用 Word 内容控件 Tag 或稳定占位符,让代码只写字段值,最大限度保留原格式 | | 规则 + LLM 并行抽取 | 代码抽取与 LLM 抽取并行,合并后写入模板 | | 待确认高亮 | 系统新填入的 `/`、LLM-only 字段、冲突字段均高亮 | -| `.doc` 等价处理 | 设计 `LegacyWordDocumentService`,提供与 `.docx` 一致的文档操作接口 | +| `.doc` 等价处理 | 设计 `LegacyWordDocumentService`,按能力驱动提供与 `.docx` 一致的文档操作接口;原生能力不可用时明确兜底 | | zip 主输出 | 扩展 
`ExportedSummaryFile.ExportType.ZIP`,统一下载权限 | | LLM 意图路由 | 扩展路由 action,支持固定话术和 LLM 语义判断 | @@ -159,7 +160,7 @@ flowchart TD | 工作流状态 | `WorkflowNodeRun`、`WorkflowEvent` | 使用 `workflow_type=regulatory_info_package` | | 模板配置 | YAML | 便于维护 7 个模板和字段映射 | | `.docx` 操作 | `python-docx` | 表格、段落、run、底色和字体可控 | -| `.doc` 操作 | 适配器抽象 | Python 标准库不支持 `.doc` 二进制 Word 写入;设计为 COM/UNO/第三方库适配器 | +| `.doc` 操作 | 适配器抽象 | Python 标准库不支持 `.doc` 二进制 Word 写入;设计为 COM/UNO/第三方库适配器,能力不可用时使用可追溯的 `.docx` 兜底 | | zip 打包 | Python `zipfile` 标准库 | 标准库可满足打包需求 | | Excel 追溯 | `openpyxl` | 复用现有依赖 | | LLM | `review_agent.llm.generate_completion` | 统一模型调用 | @@ -281,10 +282,19 @@ templates: source_file: CH1.9 产品申报前沟通的说明.doc file_format: doc strategy: pre_submission - require_legacy_doc_native: true + prefer_legacy_doc_native: true + allow_docx_fallback: true include_in_zip: true ``` +字段映射优先级: + +| 目标类型 | 说明 | +| --- | --- | +| content_control_tag | 正式模板优先,代码按 Word 内容控件 Tag 写入 | +| placeholder | 过渡方案,替换稳定占位符并保留原 run/段落格式 | +| table_row_label | 未字段化模板的兜底方案,必须保留原单元格格式 | + ### 7.1 配置项说明 | 配置项 | 说明 | @@ -300,7 +310,8 @@ templates: | strategy | 生成策略 | | include_in_zip | 是否进入 zip | | fields | 字段映射与替换目标 | -| require_legacy_doc_native | `.doc` 是否要求原生处理能力 | +| prefer_legacy_doc_native | `.doc` 是否优先尝试原生处理能力 | +| allow_docx_fallback | 原生 `.doc` 能力不可用或失败时是否允许 `.docx` 兜底 | --- @@ -836,7 +847,8 @@ pytest tests/test_application_form_fill_*.py tests/test_file_summary_views.py te | 风险 | 说明 | 建议 | | --- | --- | --- | -| `.doc` 原生写入难度 | Python 标准库不支持 Word `.doc` 完整写入 | 优先调研 Word COM 或 LibreOffice UNO;设计适配器隔离风险 | +| `.doc` 原生写入难度 | Python 标准库不支持 Word `.doc` 完整写入 | 优先调研 Word COM 或 LibreOffice UNO;无原生能力时允许可追溯 `.docx` 兜底 | +| 模板字段化工作量 | 需要先把样例模板整理为代码可识别字段 | 优先覆盖 CH1.4、CH1.5 和声明类关键字段;缺少 Tag 时通过模板审计提前暴露 | | 样例模板文本碎片 | Word run 拆分可能导致简单字符串替换失败 | 文档写入服务需支持跨 run 替换 | | 产品列表结构复杂 | 说明书表格可能存在合并单元格和多规格 | 先覆盖目标说明书结构,再扩展通用表格归一化 | | 标准清单准确性 | 说明书未必包含标准号,知识库候选不能直接作为结论 | 候选全部高亮并进入追溯清单 | @@ -854,7 +866,8 @@ pytest 
tests/test_application_form_fill_*.py tests/test_file_summary_views.py te | D4 | 输入选择以 active 附件为主,兼容最近成功文件汇总批次 | | D5 | `ExportedSummaryFile.ExportType` 扩展 `zip` | | D6 | 采用 YAML 配置驱动 7 个模板 | -| D7 | `.doc` 通过 `LegacyWordDocumentService` 适配器实现与 `.docx` 等价接口 | -| D8 | 标准候选复用系统已有知识库/RAG,不新增独立 RAG | -| D9 | 前端只扩展现有对话页、工作流卡片、快捷提示和状态轮询 | -| D10 | 本轮先产出功能设计;数据库设计先在本文档中给出,后续可拆成正式数据库设计文档 | +| D7 | 模板字段优先使用内容控件 Tag 或稳定占位符,行标签定位仅作为兜底 | +| D8 | `.doc` 通过 `LegacyWordDocumentService` 适配器实现与 `.docx` 等价接口,原生能力不可用时允许可追溯兜底 | +| D9 | 标准候选复用系统已有知识库/RAG,不新增独立 RAG | +| D10 | 前端只扩展现有对话页、工作流卡片、快捷提示和状态轮询 | +| D11 | 本轮先产出功能设计;数据库设计先在本文档中给出,后续可拆成正式数据库设计文档 | diff --git a/docs/3.数据库设计/5.第1章监管信息材料包生成.md b/docs/3.数据库设计/5.第1章监管信息材料包生成.md index 4e0aba9..476329b 100644 --- a/docs/3.数据库设计/5.第1章监管信息材料包生成.md +++ b/docs/3.数据库设计/5.第1章监管信息材料包生成.md @@ -50,6 +50,8 @@ erDiagram 说明:`ra_workflow_node_run`、`ra_workflow_event`、`ra_exported_summary_file` 通过 `workflow_type` 与 `workflow_batch_id` 支持多工作流。本功能统一使用 `workflow_type=regulatory_info_package`。 +现状补充:当前通用节点表已有 `batch + node_code` 唯一约束主要服务文件汇总批次。RIP 批次不应强依赖 `FileSummaryBatch.batch`,因此实现时必须为 `workflow_type + workflow_batch_id + node_code` 增加数据库唯一约束,或在创建节点时使用同等幂等逻辑,避免同一 RIP 批次重复初始化节点。 + --- ## 三、表结构设计 @@ -211,6 +213,13 @@ erDiagram | node_group | regulatory_info_package | | batch_id | 可为空;如为兼容旧查询,不建议绑定文件汇总批次 | +幂等约束建议: + +| 约束/策略 | 字段 | 说明 | +| --- | --- | --- | +| uq_ra_node_workflow_batch_code | workflow_type, workflow_batch_id, node_code | 推荐新增数据库唯一约束,防止同一 RIP 批次重复节点 | +| get_or_create 幂等 | workflow_type, workflow_batch_id, node_code | 若暂不改通用表约束,节点初始化必须使用该组合做代码层幂等 | + 建议新增节点: ```text @@ -543,6 +552,7 @@ CREATE INDEX idx_ra_rip_batch_created | JSONField 默认值 | 使用 `default=list` 或 `default=dict`,禁止使用可变对象字面量 | | 外键删除策略 | conversation/user 使用 CASCADE;输入附件和文件汇总批次建议 PROTECT 或 SET_NULL,避免历史批次断链 | | `source_summary_item_id` | 当前没有强制外键到 `FileSummaryItem`,可先保存 ID,后续需要强约束时再改 FK | +| 工作流节点幂等 | RIP 节点不得只依赖 `WorkflowNodeRun.batch + node_code` 唯一约束;必须使用 `workflow_type + 
workflow_batch_id + node_code` 保证幂等 | | `.doc` 失败记录 | `.doc` 原生适配器不可用或执行失败时必须写入 `risk_notes` 和 artifact metadata;若 `.docx` 兜底成功则 generated_files 状态为 `fallback_success` | | zip 主入口 | zip 导出记录的 `export_category` 固定为 `regulatory_info_package` | | 单文件下载 | 7 个生成文件也写入 `ExportedSummaryFile`,作为辅助下载 | @@ -562,8 +572,9 @@ CREATE INDEX idx_ra_rip_batch_created | 6 | zip 导出 | `ExportedSummaryFile` 支持 `export_type=zip` | | 7 | 下载权限 | 非批次所属用户不能下载 RIP 导出 | | 8 | 节点事件 | `WorkflowNodeRun` 和 `WorkflowEvent` 可通过 `workflow_type=regulatory_info_package` 查询 | -| 9 | 通知记录 | 通知成功、失败和重试次数可落库 | -| 10 | JSON 摘要 | 缺失项、LLM-only、冲突项、风险提示结构符合本文约定 | +| 9 | 节点幂等 | 同一 `workflow_type + workflow_batch_id + node_code` 不会重复创建节点 | +| 10 | 通知记录 | 通知成功、失败和重试次数可落库 | +| 11 | JSON 摘要 | 缺失项、LLM-only、冲突项、风险提示结构符合本文约定 | --- diff --git a/docs/4.详细设计/5.第1章监管信息材料包生成.md b/docs/4.详细设计/5.第1章监管信息材料包生成.md index 2d4c3a8..a7998dd 100644 --- a/docs/4.详细设计/5.第1章监管信息材料包生成.md +++ b/docs/4.详细设计/5.第1章监管信息材料包生成.md @@ -27,11 +27,13 @@ | 独立工作流 | 使用 `workflow_type=regulatory_info_package`,拥有独立批次、产物、通知和卡片 | | 独立模块 | 新增 `review_agent/regulatory_info_package/`,与 `application_form_fill` 平级 | | 模型集中 | Django 模型仍集中放在 `review_agent/models.py` | +| 节点幂等 | `WorkflowNodeRun` 必须按 `workflow_type + workflow_batch_id + node_code` 幂等创建或加唯一约束 | | 输入优先级 | 用户消息指定文件名优先;其次 active 附件;再兼容最近成功文件汇总 | | 模板固定 | 固定处理第1章监管信息 7 个模板 | +| 模板字段化 | 生成逻辑优先写 Word 内容控件 Tag 或稳定占位符,不以手工调整表格格式为前提 | | 规则优先可演示 | 规则抽取可独立跑通;LLM 失败最多重试 3 次,失败后继续 | | 文档并发生成 | 工作流整体串行,`generate_docs` 节点内部每个文档可独立线程并发处理 | -| `.doc` 兜底 | 优先原生 `.doc` 写入;失败后允许生成 `.docx` 兜底文件 | +| `.doc` 兜底 | 能力驱动:有 Word COM/UNO 时优先原生 `.doc`;无原生能力或原生失败时允许生成 `.docx` 兜底文件 | | zip 只含成功文件 | zip 只打包成功或兜底成功的文件;失败文件不进入 zip | | 高亮规则 | 缺失和 LLM-only 黄底;冲突黄底红字 | | 追溯输出 | 用户下载 Excel;JSON 仅保存到后台 logs 目录 | @@ -91,7 +93,7 @@ review_agent/ | views.py | health、start、status、select-input 接口 | | input_select.py | 根据用户消息、active 附件、文件汇总选择说明书 | | template_config.py | YAML 加载、校验、hash | -| template_repository.py | 定位样例模板、复制到批次目录 | +| 
template_repository.py | 定位样例模板、复制到批次目录、审计字段 Tag/占位符 | | instruction_extract.py | 说明书段落、章节、表格和组成成分表解析 | | field_extract.py | 规则抽取与 LLM 抽取并行执行,LLM 最多 3 次重试 | | field_merge.py | 合并字段,输出缺失、LLM-only、冲突和高亮决策 | @@ -248,7 +250,8 @@ class TemplateSpec: file_format: str strategy: str include_in_zip: bool - require_legacy_doc_native: bool = False + prefer_legacy_doc_native: bool = False + allow_docx_fallback: bool = True fields: list[dict[str, Any]] = field(default_factory=list) ``` @@ -414,7 +417,31 @@ review_agent/regulatory_info_package/templates/regulatory_info_package_templates | code 唯一 | 防止覆盖产物 | | source_file 存在 | 缺失则配置错误 | | strategy 合法 | 必须命中生成策略 | -| doc 模板标记 | `.doc` 模板需声明 `require_legacy_doc_native` | +| doc 模板标记 | `.doc` 模板需声明 `prefer_legacy_doc_native`,并配置允许 `.docx` 兜底 | + +### 8.1 模板字段化约定 + +为避免生成时破坏 Word 表格、复选框、字号、缩进和合并单元格,本工作流优先使用字段化模板: + +| 方式 | 使用场景 | 说明 | +| --- | --- | --- | +| Word 内容控件 Tag | 正式模板优先 | 在 Word 中为产品名、申请人、复选框、日期、说明文字等填写区设置稳定 Tag,代码按 Tag 写入 | +| 稳定占位符 | 过渡方案 | 使用 `{{ product_name }}` 等不会影响版式的占位符,代码替换占位符所在 run | +| 行标签定位 | 兜底方案 | 仅用于未字段化的旧模板,必须保留原单元格、段落和 run 格式 | + +模板配置中的字段目标优先级: + +```yaml +targets: + - type: content_control_tag + tag: product_name + - type: placeholder + marker: "{{ product_name }}" + - type: table_row_label + label: 产品名称 +``` + +模板加载时必须执行字段审计:关键字段缺少 Tag/占位符时给出清晰错误或降级说明;不得静默使用会破坏格式的整格重建策略。 --- @@ -504,7 +531,9 @@ class DocumentAdapter(Protocol): | 方法 | 说明 | | --- | --- | | replace_text | 支持段落与表格中的文本替换,需处理 run 拆分 | -| fill_table_cell | 按行标签定位目标单元格 | +| fill_content_control | 按内容控件 Tag 填写文本、日期或复选框 | +| replace_placeholder | 按稳定占位符替换文本,保留占位符所在 run/段落格式 | +| fill_table_cell | 按行标签定位目标单元格,仅作为未字段化模板的兜底 | | replace_table | 重建 CH1.5 产品列表表格 | | apply_highlight | 使用 `w:shd` 设置黄色底色 | | apply_conflict_style | 黄色底色 + 红字 | @@ -528,10 +557,11 @@ class LegacyDocDocumentAdapter: 执行顺序: -1. 优先尝试 `WordComDocAdapter` 原生打开 `.doc` 并保存 `.doc`。 -2. 原生失败时,尝试将 `.doc` 另存为 `.docx`,再交给 `DocxDocumentAdapter`。 -3. 兜底成功时,输出 `CH1.9 产品申报前沟通的说明.docx`。 -4. 
原生和兜底均失败时,该文件状态为 `failed`,不进入 zip。 +1. 执行能力探测:Word COM、LibreOffice UNO 或其他可写 `.doc` 能力。 +2. 有原生能力时优先尝试原生打开 `.doc` 并保存 `.doc`。 +3. 无原生能力或原生失败时,尝试生成同语义 `.docx` 兜底文件,再交给 `DocxDocumentAdapter`。 +4. 兜底成功时,输出 `CH1.9 产品申报前沟通的说明.docx`,状态为 `fallback_success`。 +5. 原生和兜底均失败时,该文件状态为 `failed`,不进入 zip。 兜底成功 `adapter_summary.doc`: @@ -693,6 +723,7 @@ class RegulatoryInfoPackageWorkflowExecutor: | --- | --- | | prepare | 确认说明书,或 waiting_user | | template_copy | 复制 7 个模板 | +| template_audit | 审计模板字段 Tag/占位符,记录缺失和降级策略 | | text_extract | 抽取说明书章节和表格 | | field_extract | 规则 + LLM 并行抽取 | | field_merge | 合并字段、高亮决策 | @@ -917,8 +948,8 @@ def notify_completion(batch: RegulatoryInfoPackageBatch, exports: list[ExportedS | --- | --- | | D1 | 详细设计文档路径为 `docs/4.详细设计/5.第1章监管信息材料包生成.md` | | D2 | 模型集中在 `review_agent/models.py`,业务模块为 `review_agent/regulatory_info_package/` | -| D3 | `.doc` 采用 A+C:优先 Word COM 原生处理,同时设计适配器层和能力探测 | -| D4 | `.doc` 原生失败时允许 `.docx` 兜底;兜底文件名为 `CH1.9 产品申报前沟通的说明.docx` | +| D3 | `.doc` 采用能力驱动策略:探测 Word COM/UNO 等原生能力,有能力时优先原生处理 | +| D4 | `.doc` 无原生能力或原生失败时允许 `.docx` 兜底;兜底文件名为 `CH1.9 产品申报前沟通的说明.docx` | | D5 | zip 只包含成功或兜底成功文件,失败文件不进入 zip | | D6 | LLM 最多重试 3 次,失败后使用规则结果继续 | | D7 | 缺失和 LLM-only 黄底,冲突黄底红字 | @@ -928,4 +959,5 @@ def notify_completion(batch: RegulatoryInfoPackageBatch, exports: list[ExportedS | D11 | 追溯 Excel 可下载,JSON 只放后台 logs | | D12 | 本期不新增字段级数据库表 | | D13 | 工作流串行,文档生成节点内部可多线程 | -| D14 | 本轮只产出详细设计,不写代码、不生成迁移 | +| D14 | 模板优先字段化,正式填充路径使用内容控件 Tag 或稳定占位符,行标签定位仅作为兜底 | +| D15 | 本轮只产出详细设计,不写代码、不生成迁移 | diff --git a/docs/5.开发计划/5.第1章监管信息材料包生成.md b/docs/5.开发计划/5.第1章监管信息材料包生成.md index 812aa03..88e071d 100644 --- a/docs/5.开发计划/5.第1章监管信息材料包生成.md +++ b/docs/5.开发计划/5.第1章监管信息材料包生成.md @@ -19,7 +19,9 @@ ## 一、开发计划目标 -本开发计划面向 Codex 执行,目标是把 `regulatory_info_package` 独立工作流按可验证、可回滚、可阶段提交的方式落地。计划以现有自动填表工作流 `application_form_fill` 为主要参考,但保持独立模块、独立批次、独立产物、独立通知和独立前端卡片。 +本开发计划面向 Codex 执行,目标是把 `regulatory_info_package` 独立工作流按可验证、可回滚、可阶段验收的方式落地。计划以现有自动填表工作流 `application_form_fill` 
为主要参考,但保持独立模块、独立批次、独立产物、独立通知和独立前端卡片。 + +现状裁决:当前最新代码中尚未存在 `regulatory_info_package` 正式工作流,本计划按“新建正式材料包工作流”执行;不得把该功能并入或改造 `application_form_fill`。 开发完成后,用户可在对话中上传或指定产品说明书,并通过“根据说明书生成第1章监管信息”触发工作流。系统基于 `docs/0.原始材料/第1章 监管信息` 样例模板生成 7 个监管信息文件,以 `第1章 监管信息(预生成版).zip` 作为首位下载入口,同时提供单文件和追溯 Excel 辅助下载。 @@ -32,18 +34,20 @@ | 工作流独立 | 新增 `workflow_type=regulatory_info_package`,不并入 `application_form_fill` | | 模块独立 | 新增 `review_agent/regulatory_info_package/`,服务与自动填表平级 | | 模型集中 | Django 模型继续放在 `review_agent/models.py` | +| 节点幂等 | RIP 节点必须基于 `workflow_type + workflow_batch_id + node_code` 做幂等创建或数据库唯一约束 | | 单说明书输入 | 用户消息指定文件名优先,其次 active 附件,再兼容最近成功文件汇总 | | 多候选处理 | 不做选择弹窗,通过对话反问用户确认说明书文件名 | | 模板固定 | 固定处理第1章监管信息 7 个模板 | +| 模板字段化 | 优先把模板整理为 Agent/代码可识别的字段模板,使用内容控件 Tag 或稳定占位符;代码只填字段,不依赖手工改格式 | | 抽取策略 | 规则抽取和 LLM 抽取并行,LLM 最多重试 3 次,失败后规则结果继续 | | 文档生成 | 工作流节点串行,`generate_docs` 节点内部每个文档独立线程处理 | -| `.doc` 策略 | CH1.9 优先原生 `.doc` 写入,失败后允许 `.docx` 兜底 | +| `.doc` 策略 | CH1.9 能力驱动:探测到 Word COM/UNO 时优先原生 `.doc`,无原生能力时明确记录并允许 `.docx` 兜底 | | zip 策略 | zip 只包含成功或兜底成功文件,失败文件不进入 zip | | 高亮策略 | 缺失项 `/` 黄底;LLM-only 黄底;冲突黄底红字 | | 追溯策略 | 用户下载 Excel;JSON 只写后台 logs 目录 | | 前端策略 | 只做最小接入,不单独建设新页面或独立样式体系 | | TDD | 新行为先写失败测试,再实现 | -| Git 提交 | 每阶段验证通过后生成提交摘要并本地提交 | +| Git 提交 | 每阶段验证通过后生成提交摘要;是否本地提交由用户确认 | | 用户变更保护 | 不回滚、不覆盖用户已有未提交变更 | --- @@ -156,7 +160,7 @@ pytest tests/test_file_summary_views.py -k download | 目标 | 生成数据库迁移并覆盖基础模型行为 | | 修改范围 | `review_agent/migrations/`、`tests/` | | 验收标准 | migration 可应用;模型测试覆盖批次号、状态、artifact、通知、zip export type | -| Codex 执行提示 | 请生成迁移并新增 `tests/test_regulatory_info_package_models.py`,优先覆盖模型字段默认值和导出类型。 | +| Codex 执行提示 | 请生成迁移并新增 `tests/test_regulatory_info_package_models.py`,优先覆盖模型字段默认值、导出类型,以及 `WorkflowNodeRun` 在 RIP 批次下的幂等/唯一节点创建。 | ### RIP-1 阶段验证 @@ -182,10 +186,10 @@ pytest tests/test_regulatory_info_package_models.py tests/test_file_summary_view | 项 | 内容 | | --- | --- | -| 目标 | 配置 7 个样例模板、输出文件名、策略和 `.doc` 标记 | +| 目标 | 配置 7 个样例模板、输出文件名、策略、字段 Tag/占位符映射和 `.doc` 标记 | | 修改范围 | 
`review_agent/regulatory_info_package/templates/regulatory_info_package_templates_v1.yaml` | -| 验收标准 | 7 个模板完整;zip 名称为 `第1章 监管信息(预生成版).zip` | -| Codex 执行提示 | 请按详细设计录入模板配置,source_dir 指向样例目录,CH1.9 必须声明 `require_legacy_doc_native: true`。 | +| 验收标准 | 7 个模板完整;zip 名称为 `第1章 监管信息(预生成版).zip`;字段映射优先使用内容控件 Tag 或稳定占位符 | +| Codex 执行提示 | 请按详细设计录入模板配置,source_dir 指向样例目录,字段 targets 优先写 content_control_tag 或 placeholder;CH1.9 声明 `prefer_legacy_doc_native: true` 且允许 docx fallback。 | ### RIP-2-003 实现配置加载、模板仓库和存储目录 @@ -193,8 +197,17 @@ pytest tests/test_regulatory_info_package_models.py tests/test_file_summary_view | --- | --- | | 目标 | 实现 YAML 加载校验、模板复制、批次目录创建、路径安全检查 | | 修改范围 | `template_config.py`、`template_repository.py`、`storage.py` | -| 验收标准 | 配置错误可返回清晰错误;模板只复制到批次目录;不写原始材料目录 | -| Codex 执行提示 | 请实现配置加载和模板复制服务,所有路径必须校验位于批次工作目录内,原始模板目录只读。 | +| 验收标准 | 配置错误可返回清晰错误;模板只复制到批次目录;不写原始材料目录;能审计模板是否包含所需 Tag/占位符 | +| Codex 执行提示 | 请实现配置加载、模板复制和模板字段审计服务,所有路径必须校验位于批次工作目录内,原始模板目录只读。 | + +### RIP-2-004 模板字段化整理与审计 + +| 项 | 内容 | +| --- | --- | +| 目标 | 将样例模板升级为代码友好的字段模板,不手工改生成文件格式 | +| 修改范围 | `docs/0.原始材料/第1章 监管信息` 的模板副本或 `review_agent/regulatory_info_package/templates/field_manifest.yaml` | +| 验收标准 | CH1.4 关键字段、复选框、声明类产品名/申请人位置有稳定 Tag 或占位符;审计缺失字段时测试失败 | +| Codex 执行提示 | 请优先使用 Word 内容控件 Tag;若暂不具备内容控件编辑能力,则使用不会影响版式的稳定占位符,并在配置中记录字段与目标位置。 | ### RIP-2 阶段验证 @@ -380,8 +393,8 @@ pytest tests/test_regulatory_info_package_docx_writer.py tests/test_regulatory_i | --- | --- | | 目标 | 探测 Word COM、LibreOffice UNO 或可用兜底能力 | | 修改范围 | `services/legacy_doc_document.py` | -| 验收标准 | 当前环境无原生能力时返回清晰 capability,不崩溃 | -| Codex 执行提示 | 请先实现能力探测和接口骨架,Windows Word COM 可作为优先实现;不可用时进入 docx 兜底。 | +| 验收标准 | 当前环境无原生能力时返回清晰 capability,不崩溃;测试不要求本机必须安装 Word 或 LibreOffice | +| Codex 执行提示 | 请先实现能力探测和接口骨架,Windows Word COM/LibreOffice UNO 可作为原生能力;不可用时明确进入 docx 兜底。 | ### RIP-7-002 实现 CH1.9 原生写入与 docx 兜底 @@ -389,8 +402,8 @@ pytest tests/test_regulatory_info_package_docx_writer.py tests/test_regulatory_i | --- | --- | | 目标 | CH1.9 优先 `.doc` 
输出,失败时生成同语义 `.docx` | | 修改范围 | `legacy_doc_document.py`、`package_generate.py` | -| 验收标准 | 原生成功状态 success;兜底成功状态 fallback_success;两者失败不进入 zip | -| Codex 执行提示 | 请把原生失败和兜底失败都写入 `adapter_summary` 和 `risk_notes`,不要静默转换。 | +| 验收标准 | 有原生能力时原生成功状态 success;无原生能力或原生失败但兜底成功时状态 fallback_success;两者失败不进入 zip | +| Codex 执行提示 | 请把能力探测、原生失败和兜底失败都写入 `adapter_summary` 和 `risk_notes`,不要静默转换。 | ### RIP-7-003 补充 doc 适配器测试 @@ -565,9 +578,9 @@ pytest tests/test_regulatory_info_package_models.py tests/test_regulatory_info_p | 用户变更保护 | 不得回滚或覆盖用户已有未提交变更 | | 过程日志 | 每阶段记录关键命令结果和既有失败 | | 阶段验证 | 每阶段完成后运行对应验证命令 | -| 阶段提交 | 每阶段验证通过后生成提交摘要并本地提交 | +| 阶段提交 | 每阶段验证通过后生成提交摘要;是否执行 `git commit` 由用户确认 | | 回归保护 | 文件汇总、法规核查、自动填表现有测试不得回归 | -| doc 风险隔离 | `.doc` 原生处理失败不得阻断其他 6 个 docx 文件生成 | +| doc 风险隔离 | `.doc` 原生能力不可用或原生处理失败不得阻断其他 6 个 docx 文件生成 | | 外部依赖隔离 | LLM、通知、Word COM 均需可 mock,测试不依赖真实外部服务 | | 下载安全 | 所有导出下载必须通过所属用户权限校验 | @@ -588,7 +601,7 @@ pytest tests/test_regulatory_info_package_models.py tests/test_regulatory_info_p 5. 不回滚、不覆盖用户已有未提交变更。 6. LLM、通知、Word COM 等外部能力必须可 mock。 7. 每阶段完成后运行该阶段验证命令。 -8. 验证通过后生成提交摘要并本地提交。 +8. 验证通过后生成提交摘要,是否本地提交等待用户确认。 9. 
最后使用 docs/0.原始材料/目标产品说明书.docx 做端到端验收。
 ```
diff --git a/review_agent/file_summary/views.py b/review_agent/file_summary/views.py
index b71ed75..f475d95 100644
--- a/review_agent/file_summary/views.py
+++ b/review_agent/file_summary/views.py
@@ -14,6 +14,7 @@ from review_agent.models import (
     ExportedSummaryFile,
     FileAttachment,
     Message,
+    RegulatoryInfoPackageBatch,
     RegulatoryReviewBatch,
 )
 from review_agent.models import FileSummaryBatch, WorkflowEvent
@@ -304,14 +305,20 @@ def export_download(request, export_id: int):
             extra={"export_id": exported.pk, "storage_path": exported.storage_path},
         )
         return JsonResponse({"error": "文件不存在。"}, status=404)
+    suffix = Path(exported.file_name).suffix.lower()
     content_types = {
         ExportedSummaryFile.ExportType.MARKDOWN: "text/markdown; charset=utf-8",
         ExportedSummaryFile.ExportType.EXCEL: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
         ExportedSummaryFile.ExportType.JSON: "application/json; charset=utf-8",
         ExportedSummaryFile.ExportType.WORD: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
         ExportedSummaryFile.ExportType.PDF: "application/pdf",
+        ExportedSummaryFile.ExportType.ZIP: "application/zip",
     }
     content_type = content_types.get(exported.export_type, "application/octet-stream")
+    if exported.export_type == ExportedSummaryFile.ExportType.WORD and suffix == ".doc":
+        content_type = "application/msword"
+    elif exported.export_type == ExportedSummaryFile.ExportType.WORD and suffix == ".docx":
+        content_type = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
     logger.info(
         "Export download started",
         extra={
@@ -342,6 +349,17 @@ def _export_for_user(user, export_id: int) -> ExportedSummaryFile | None:
             is_deleted=False,
         ).exists()
         return exported if allowed else None
+    if exported.workflow_type == "regulatory_info_package":
+        if not exported.workflow_batch_id:
+            return None
+        allowed = RegulatoryInfoPackageBatch.objects.filter(
+            pk=exported.workflow_batch_id,
+            conversation__user=user,
+            is_deleted=False,
+        ).exists()
+        return exported if allowed else None
+    if exported.batch_id is None:
+        return None
     if exported.batch.user_id != user.pk:
         return None
     return exported
diff --git a/review_agent/migrations/0009_regulatoryinfopackageartifact_and_more.py b/review_agent/migrations/0009_regulatoryinfopackageartifact_and_more.py
new file mode 100644
index 0000000..c36473d
--- /dev/null
+++ b/review_agent/migrations/0009_regulatoryinfopackageartifact_and_more.py
@@ -0,0 +1,388 @@
+# Generated by Django 5.2.14 on 2026-06-10 11:12
+
+import django.db.models.deletion
+from django.conf import settings
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ("review_agent", "0008_knowledgebasedocument"),
+        migrations.swappable_dependency(settings.AUTH_USER_MODEL),
+    ]
+
+    operations = [
+        migrations.CreateModel(
+            name="RegulatoryInfoPackageArtifact",
+            fields=[
+                (
+                    "id",
+                    models.BigAutoField(
+                        auto_created=True,
+                        primary_key=True,
+                        serialize=False,
+                        verbose_name="ID",
+                    ),
+                ),
+                (
+                    "artifact_type",
+                    models.CharField(
+                        choices=[
+                            ("template_copy", "模板副本"),
+                            ("instruction_extract", "说明书抽取结果"),
+                            ("field_extract_result", "字段抽取结果"),
+                            ("merged_fields", "合并字段"),
+                            ("generated_document", "生成文件"),
+                            ("traceability", "追溯清单"),
+                            ("zip_package", "ZIP包"),
+                            ("notification_record", "通知记录"),
+                        ],
+                        max_length=60,
+                    ),
+                ),
+                (
+                    "file_format",
+                    models.CharField(
+                        choices=[
+                            ("json", "JSON"),
+                            ("excel", "Excel"),
+                            ("docx", "DOCX"),
+                            ("doc", "DOC"),
+                            ("zip", "ZIP"),
+                            ("markdown", "Markdown"),
+                        ],
+                        max_length=20,
+                    ),
+                ),
+                ("name", models.CharField(max_length=160)),
+                ("file_name", models.CharField(max_length=255)),
+                ("storage_path", models.CharField(max_length=500)),
+                ("file_size", models.BigIntegerField(default=0)),
+                (
+                    "content_hash",
+                    models.CharField(blank=True, default="", max_length=128),
+                ),
+                ("metadata", models.JSONField(blank=True, default=dict)),
+                (
+                    "created_by_node",
+                    models.CharField(blank=True, default="", max_length=60),
+                ),
+                ("created_at", models.DateTimeField(auto_now_add=True)),
+                ("is_deleted", models.BooleanField(default=False)),
+            ],
+            options={
+                "db_table": "ra_regulatory_info_package_artifact",
+                "ordering": ["-created_at", "-id"],
+            },
+        ),
+        migrations.CreateModel(
+            name="RegulatoryInfoPackageBatch",
+            fields=[
+                (
+                    "id",
+                    models.BigAutoField(
+                        auto_created=True,
+                        primary_key=True,
+                        serialize=False,
+                        verbose_name="ID",
+                    ),
+                ),
+                (
+                    "source_summary_item_id",
+                    models.PositiveBigIntegerField(blank=True, null=True),
+                ),
+                ("batch_no", models.CharField(max_length=64, unique=True)),
+                (
+                    "status",
+                    models.CharField(
+                        choices=[
+                            ("pending", "待执行"),
+                            ("running", "执行中"),
+                            ("waiting_user", "等待用户"),
+                            ("success", "成功"),
+                            ("partial_success", "部分成功"),
+                            ("failed", "失败"),
+                            ("cancelled", "已取消"),
+                        ],
+                        default="pending",
+                        max_length=30,
+                    ),
+                ),
+                (
+                    "source_file_name",
+                    models.CharField(blank=True, default="", max_length=255),
+                ),
+                (
+                    "source_storage_path",
+                    models.CharField(blank=True, default="", max_length=500),
+                ),
+                (
+                    "product_name",
+                    models.CharField(blank=True, default="", max_length=200),
+                ),
+                (
+                    "output_zip_name",
+                    models.CharField(
+                        blank=True,
+                        default="第1章 监管信息(预生成版).zip",
+                        max_length=255,
+                    ),
+                ),
+                ("generated_files", models.JSONField(blank=True, default=list)),
+                ("missing_fields", models.JSONField(blank=True, default=list)),
+                ("llm_only_fields", models.JSONField(blank=True, default=list)),
+                ("conflict_fields", models.JSONField(blank=True, default=list)),
+                ("risk_notes", models.JSONField(blank=True, default=list)),
+                (
+                    "template_config_version",
+                    models.CharField(blank=True, default="", max_length=80),
+                ),
+                (
+                    "template_config_hash",
+                    models.CharField(blank=True, default="", max_length=128),
+                ),
+                ("adapter_summary", models.JSONField(blank=True, default=dict)),
+                ("work_dir", models.CharField(blank=True, default="", max_length=500)),
+                ("error_message", models.TextField(blank=True, default="")),
+                ("created_at", models.DateTimeField(auto_now_add=True)),
+                ("started_at", models.DateTimeField(blank=True, null=True)),
+                ("finished_at", models.DateTimeField(blank=True, null=True)),
+                ("archived_at", models.DateTimeField(blank=True, null=True)),
+                ("is_deleted", models.BooleanField(default=False)),
+            ],
+            options={
+                "db_table": "ra_regulatory_info_package_batch",
+                "ordering": ["-created_at", "-id"],
+            },
+        ),
+        migrations.CreateModel(
+            name="RegulatoryInfoPackageNotificationRecord",
+            fields=[
+                (
+                    "id",
+                    models.BigAutoField(
+                        auto_created=True,
+                        primary_key=True,
+                        serialize=False,
+                        verbose_name="ID",
+                    ),
+                ),
+                (
+                    "channel",
+                    models.CharField(
+                        choices=[
+                            ("feishu_cli", "飞书 CLI"),
+                            ("feishu_api", "飞书 API"),
+                            ("mock", "模拟"),
+                        ],
+                        default="mock",
+                        max_length=30,
+                    ),
+                ),
+                ("export_ids", models.JSONField(blank=True, default=list)),
+                ("message_summary", models.TextField(blank=True, default="")),
+                (
+                    "send_status",
+                    models.CharField(
+                        choices=[
+                            ("pending", "待发送"),
+                            ("success", "成功"),
+                            ("failed", "失败"),
+                        ],
+                        default="pending",
+                        max_length=20,
+                    ),
+                ),
+                ("retry_count", models.PositiveIntegerField(default=0)),
+                (
+                    "external_message_id",
+                    models.CharField(blank=True, default="", max_length=120),
+                ),
+                ("error_message", models.TextField(blank=True, default="")),
+                ("sent_at", models.DateTimeField(blank=True, null=True)),
+                ("created_at", models.DateTimeField(auto_now_add=True)),
+                ("updated_at", models.DateTimeField(auto_now=True)),
+                ("is_deleted", models.BooleanField(default=False)),
+            ],
+            options={
+                "db_table": "ra_regulatory_info_package_notification_record",
+                "ordering": ["-created_at", "-id"],
+            },
+        ),
+        migrations.AlterField(
+            model_name="exportedsummaryfile",
+            name="batch",
+            field=models.ForeignKey(
+                blank=True,
+                null=True,
+                on_delete=django.db.models.deletion.CASCADE,
+                related_name="exports",
+                to="review_agent.filesummarybatch",
+            ),
+        ),
+        migrations.AlterField(
+            model_name="exportedsummaryfile",
+            name="export_type",
+            field=models.CharField(
+                choices=[
+                    ("markdown", "Markdown"),
+                    ("excel", "Excel"),
+                    ("json", "JSON"),
+                    ("word", "Word"),
+                    ("pdf", "PDF"),
+                    ("zip", "ZIP"),
+                ],
+                max_length=20,
+            ),
+        ),
+        migrations.AddConstraint(
+            model_name="workflownoderun",
+            constraint=models.UniqueConstraint(
+                fields=("workflow_type", "workflow_batch_id", "node_code"),
+                name="uq_ra_node_workflow_batch_code",
+            ),
+        ),
+        migrations.AddField(
+            model_name="regulatoryinfopackagebatch",
+            name="conversation",
+            field=models.ForeignKey(
+                on_delete=django.db.models.deletion.CASCADE,
+                related_name="regulatory_info_package_batches",
+                to="review_agent.conversation",
+            ),
+        ),
+        migrations.AddField(
+            model_name="regulatoryinfopackagebatch",
+            name="source_attachment",
+            field=models.ForeignKey(
+                blank=True,
+                null=True,
+                on_delete=django.db.models.deletion.SET_NULL,
+                related_name="regulatory_info_package_batches",
+                to="review_agent.fileattachment",
+            ),
+        ),
+        migrations.AddField(
+            model_name="regulatoryinfopackagebatch",
+            name="source_summary_batch",
+            field=models.ForeignKey(
+                blank=True,
+                null=True,
+                on_delete=django.db.models.deletion.SET_NULL,
+                related_name="regulatory_info_package_batches",
+                to="review_agent.filesummarybatch",
+            ),
+        ),
+        migrations.AddField(
+            model_name="regulatoryinfopackagebatch",
+            name="trigger_message",
+            field=models.ForeignKey(
+                blank=True,
+                null=True,
+                on_delete=django.db.models.deletion.SET_NULL,
+                related_name="triggered_regulatory_info_package_batches",
+                to="review_agent.message",
+            ),
+        ),
+        migrations.AddField(
+            model_name="regulatoryinfopackagebatch",
+            name="user",
+            field=models.ForeignKey(
+                on_delete=django.db.models.deletion.CASCADE,
+                related_name="review_regulatory_info_package_batches",
+                to=settings.AUTH_USER_MODEL,
+            ),
+        ),
+        migrations.AddField(
+            model_name="regulatoryinfopackageartifact",
+            name="batch",
+            field=models.ForeignKey(
+                on_delete=django.db.models.deletion.CASCADE,
+                related_name="artifacts",
+                to="review_agent.regulatoryinfopackagebatch",
+            ),
+        ),
+        migrations.AddField(
+            model_name="regulatoryinfopackagenotificationrecord",
+            name="batch",
+            field=models.ForeignKey(
+                on_delete=django.db.models.deletion.CASCADE,
+                related_name="notifications",
+                to="review_agent.regulatoryinfopackagebatch",
+            ),
+        ),
+        migrations.AddField(
+            model_name="regulatoryinfopackagenotificationrecord",
+            name="recipient",
+            field=models.ForeignKey(
+                on_delete=django.db.models.deletion.CASCADE,
+                related_name="regulatory_info_package_notifications",
+                to=settings.AUTH_USER_MODEL,
+            ),
+        ),
+        migrations.AddIndex(
+            model_name="regulatoryinfopackagebatch",
+            index=models.Index(
+                fields=["conversation", "status"], name="idx_ra_rip_batch_conv_status"
+            ),
+        ),
+        migrations.AddIndex(
+            model_name="regulatoryinfopackagebatch",
+            index=models.Index(
+                fields=["user", "created_at"], name="idx_ra_rip_batch_user_created"
+            ),
+        ),
+        migrations.AddIndex(
+            model_name="regulatoryinfopackagebatch",
+            index=models.Index(
+                fields=["source_attachment"], name="idx_ra_rip_batch_attachment"
+            ),
+        ),
+        migrations.AddIndex(
+            model_name="regulatoryinfopackagebatch",
+            index=models.Index(
+                fields=["source_summary_batch"], name="idx_ra_rip_batch_summary"
+            ),
+        ),
+        migrations.AddIndex(
+            model_name="regulatoryinfopackagebatch",
+            index=models.Index(fields=["created_at"], name="idx_ra_rip_batch_created"),
+        ),
+        migrations.AddIndex(
+            model_name="regulatoryinfopackageartifact",
+            index=models.Index(
+                fields=["batch", "artifact_type"], name="idx_ra_rip_artifact_batch_type"
+            ),
+        ),
+        migrations.AddIndex(
+            model_name="regulatoryinfopackageartifact",
+            index=models.Index(
+                fields=["file_format"], name="idx_ra_rip_artifact_format"
+            ),
+        ),
+        migrations.AddIndex(
+            model_name="regulatoryinfopackageartifact",
+            index=models.Index(
+                fields=["created_at"], name="idx_ra_rip_artifact_created"
+            ),
+        ),
+        migrations.AddIndex(
+            model_name="regulatoryinfopackagenotificationrecord",
+            index=models.Index(
+                fields=["batch", "created_at"], name="idx_ra_rip_notify_batch"
+            ),
+        ),
+        migrations.AddIndex(
+            model_name="regulatoryinfopackagenotificationrecord",
+            index=models.Index(
+                fields=["recipient", "send_status"], name="idx_ra_rip_notify_recipient"
+            ),
+        ),
+        migrations.AddIndex(
+            model_name="regulatoryinfopackagenotificationrecord",
+            index=models.Index(
+                fields=["send_status", "retry_count"], name="idx_ra_rip_notify_status"
+            ),
+        ),
+    ]
diff --git a/review_agent/models.py b/review_agent/models.py
index 6189a69..16da526 100644
--- a/review_agent/models.py
+++ b/review_agent/models.py
@@ -280,7 +280,11 @@ class WorkflowNodeRun(models.Model):
     class Meta:
         db_table = "ra_workflow_node_run"
         constraints = [
-            models.UniqueConstraint(fields=["batch", "node_code"], name="uq_ra_node_batch_code")
+            models.UniqueConstraint(fields=["batch", "node_code"], name="uq_ra_node_batch_code"),
+            models.UniqueConstraint(
+                fields=["workflow_type", "workflow_batch_id", "node_code"],
+                name="uq_ra_node_workflow_batch_code",
+            ),
         ]
         indexes = [
             models.Index(fields=["batch", "status"], name="idx_ra_node_batch_status"),
@@ -336,6 +340,7 @@ class ExportedSummaryFile(models.Model):
         JSON = "json", "JSON"
         WORD = "word", "Word"
         PDF = "pdf", "PDF"
+        ZIP = "zip", "ZIP"
 
     class Status(models.TextChoices):
         SUCCESS = "success", "成功"
@@ -345,6 +350,8 @@ class ExportedSummaryFile(models.Model):
         FileSummaryBatch,
         on_delete=models.CASCADE,
         related_name="exports",
+        null=True,
+        blank=True,
     )
     workflow_type = models.CharField(max_length=40, blank=True, default="file_summary")
     workflow_batch_id = models.PositiveBigIntegerField(null=True, blank=True)
@@ -524,6 +531,87 @@ class ApplicationFormFillBatch(models.Model):
         return self.batch_no
 
 
+class RegulatoryInfoPackageBatch(models.Model):
+    """Tracks one Chapter 1 regulatory information package workflow run."""
+
+    class Status(models.TextChoices):
+        PENDING = "pending", "待执行"
+        RUNNING = "running", "执行中"
+        WAITING_USER = "waiting_user", "等待用户"
+        SUCCESS = "success", "成功"
+        PARTIAL_SUCCESS = "partial_success", "部分成功"
+        FAILED = "failed", "失败"
+        CANCELLED = "cancelled", "已取消"
+
+    conversation = models.ForeignKey(
+        Conversation,
+        on_delete=models.CASCADE,
+        related_name="regulatory_info_package_batches",
+    )
+    user = models.ForeignKey(
+        settings.AUTH_USER_MODEL,
+        on_delete=models.CASCADE,
+        related_name="review_regulatory_info_package_batches",
+    )
+    trigger_message = models.ForeignKey(
+        Message,
+        on_delete=models.SET_NULL,
+        null=True,
+        blank=True,
+        related_name="triggered_regulatory_info_package_batches",
+    )
+    source_attachment = models.ForeignKey(
+        FileAttachment,
+        on_delete=models.SET_NULL,
+        null=True,
+        blank=True,
+        related_name="regulatory_info_package_batches",
+    )
+    source_summary_batch = models.ForeignKey(
+        FileSummaryBatch,
+        on_delete=models.SET_NULL,
+        null=True,
+        blank=True,
+        related_name="regulatory_info_package_batches",
+    )
+    source_summary_item_id = models.PositiveBigIntegerField(null=True, blank=True)
+    batch_no = models.CharField(max_length=64, unique=True)
+    status = models.CharField(max_length=30, choices=Status.choices, default=Status.PENDING)
+    source_file_name = models.CharField(max_length=255, blank=True, default="")
+    source_storage_path = models.CharField(max_length=500, blank=True, default="")
+    product_name = models.CharField(max_length=200, blank=True, default="")
+    output_zip_name = models.CharField(max_length=255, blank=True, default="第1章 监管信息(预生成版).zip")
+    generated_files = models.JSONField(default=list, blank=True)
+    missing_fields = models.JSONField(default=list, blank=True)
+    llm_only_fields = models.JSONField(default=list, blank=True)
+    conflict_fields = models.JSONField(default=list, blank=True)
+    risk_notes = models.JSONField(default=list, blank=True)
+    template_config_version = models.CharField(max_length=80, blank=True, default="")
+    template_config_hash = models.CharField(max_length=128, blank=True, default="")
+    adapter_summary = models.JSONField(default=dict, blank=True)
+    work_dir = models.CharField(max_length=500, blank=True, default="")
+    error_message = models.TextField(blank=True, default="")
+    created_at = models.DateTimeField(auto_now_add=True)
+    started_at = models.DateTimeField(null=True, blank=True)
+    finished_at = models.DateTimeField(null=True, blank=True)
+    archived_at = models.DateTimeField(null=True, blank=True)
+    is_deleted = models.BooleanField(default=False)
+
+    class Meta:
+        db_table = "ra_regulatory_info_package_batch"
+        ordering = ["-created_at", "-id"]
+        indexes = [
+            models.Index(fields=["conversation", "status"], name="idx_ra_rip_batch_conv_status"),
+            models.Index(fields=["user", "created_at"], name="idx_ra_rip_batch_user_created"),
+            models.Index(fields=["source_attachment"], name="idx_ra_rip_batch_attachment"),
+            models.Index(fields=["source_summary_batch"], name="idx_ra_rip_batch_summary"),
+            models.Index(fields=["created_at"], name="idx_ra_rip_batch_created"),
+        ]
+
+    def __str__(self) -> str:
+        return self.batch_no
+
+
 class RegulatoryReviewBatch(models.Model):
     """Tracks one NMPA regulatory review workflow run."""
 
@@ -745,6 +833,54 @@ class ApplicationFormFillArtifact(models.Model):
         ]
 
 
+class RegulatoryInfoPackageArtifact(models.Model):
+    """Stores regulatory information package intermediate and generated files."""
+
+    class ArtifactType(models.TextChoices):
+        TEMPLATE_COPY = "template_copy", "模板副本"
+        INSTRUCTION_EXTRACT = "instruction_extract", "说明书抽取结果"
+        FIELD_EXTRACT_RESULT = "field_extract_result", "字段抽取结果"
+        MERGED_FIELDS = "merged_fields", "合并字段"
+        GENERATED_DOCUMENT = "generated_document", "生成文件"
+        TRACEABILITY = "traceability", "追溯清单"
+        ZIP_PACKAGE = "zip_package", "ZIP包"
+        NOTIFICATION_RECORD = "notification_record", "通知记录"
+
+    class FileFormat(models.TextChoices):
+        JSON = "json", "JSON"
+        EXCEL = "excel", "Excel"
+        DOCX = "docx", "DOCX"
+        DOC = "doc", "DOC"
+        ZIP = "zip", "ZIP"
+        MARKDOWN = "markdown", "Markdown"
+
+    batch = models.ForeignKey(
+        RegulatoryInfoPackageBatch,
+        on_delete=models.CASCADE,
+        related_name="artifacts",
+    )
+    artifact_type = models.CharField(max_length=60, choices=ArtifactType.choices)
+    file_format = models.CharField(max_length=20, choices=FileFormat.choices)
+    name = models.CharField(max_length=160)
+    file_name = models.CharField(max_length=255)
+    storage_path = models.CharField(max_length=500)
+    file_size = models.BigIntegerField(default=0)
+    content_hash = models.CharField(max_length=128, blank=True, default="")
+    metadata = models.JSONField(default=dict, blank=True)
+    created_by_node = models.CharField(max_length=60, blank=True, default="")
+    created_at = models.DateTimeField(auto_now_add=True)
+    is_deleted = models.BooleanField(default=False)
+
+    class Meta:
+        db_table = "ra_regulatory_info_package_artifact"
+        ordering = ["-created_at", "-id"]
+        indexes = [
+            models.Index(fields=["batch", "artifact_type"], name="idx_ra_rip_artifact_batch_type"),
+            models.Index(fields=["file_format"], name="idx_ra_rip_artifact_format"),
+            models.Index(fields=["created_at"], name="idx_ra_rip_artifact_created"),
+        ]
+
+
 class ApplicationFormFillNotificationRecord(models.Model):
     """Stores mock/Feishu notification records for application-form auto-fill."""
 
@@ -795,6 +931,55 @@ class ApplicationFormFillNotificationRecord(models.Model):
         ]
 
 
+class RegulatoryInfoPackageNotificationRecord(models.Model):
+    """Stores mock/Feishu notification records for regulatory info packages."""
+
+    class Channel(models.TextChoices):
+        FEISHU_CLI = "feishu_cli", "飞书 CLI"
+        FEISHU_API = "feishu_api", "飞书 API"
+        MOCK = "mock", "模拟"
+
+    class SendStatus(models.TextChoices):
+        PENDING = "pending", "待发送"
+        SUCCESS = "success", "成功"
+        FAILED = "failed", "失败"
+
+    batch = models.ForeignKey(
+        RegulatoryInfoPackageBatch,
+        on_delete=models.CASCADE,
+        related_name="notifications",
+    )
+    recipient = models.ForeignKey(
+        settings.AUTH_USER_MODEL,
on_delete=models.CASCADE, + related_name="regulatory_info_package_notifications", + ) + channel = models.CharField(max_length=30, choices=Channel.choices, default=Channel.MOCK) + export_ids = models.JSONField(default=list, blank=True) + message_summary = models.TextField(blank=True, default="") + send_status = models.CharField( + max_length=20, + choices=SendStatus.choices, + default=SendStatus.PENDING, + ) + retry_count = models.PositiveIntegerField(default=0) + external_message_id = models.CharField(max_length=120, blank=True, default="") + error_message = models.TextField(blank=True, default="") + sent_at = models.DateTimeField(null=True, blank=True) + created_at = models.DateTimeField(auto_now_add=True) + updated_at = models.DateTimeField(auto_now=True) + is_deleted = models.BooleanField(default=False) + + class Meta: + db_table = "ra_regulatory_info_package_notification_record" + ordering = ["-created_at", "-id"] + indexes = [ + models.Index(fields=["batch", "created_at"], name="idx_ra_rip_notify_batch"), + models.Index(fields=["recipient", "send_status"], name="idx_ra_rip_notify_recipient"), + models.Index(fields=["send_status", "retry_count"], name="idx_ra_rip_notify_status"), + ] + + class FeishuUserMapping(models.Model): """Maps a system user to Feishu identifiers maintained by Admin.""" diff --git a/review_agent/regulatory_info_package/__init__.py b/review_agent/regulatory_info_package/__init__.py new file mode 100644 index 0000000..3026f19 --- /dev/null +++ b/review_agent/regulatory_info_package/__init__.py @@ -0,0 +1,2 @@ +"""Chapter 1 regulatory information package workflow.""" + diff --git a/review_agent/regulatory_info_package/constants.py b/review_agent/regulatory_info_package/constants.py new file mode 100644 index 0000000..adaf007 --- /dev/null +++ b/review_agent/regulatory_info_package/constants.py @@ -0,0 +1,30 @@ +WORKFLOW_TYPE = "regulatory_info_package" +DEFAULT_ZIP_NAME = "第1章 监管信息(预生成版).zip" + +REGULATORY_INFO_PACKAGE_TRIGGER_KEYWORDS = [ + 
"根据说明书生成第1章监管信息", + "生成监管信息材料包", + "从说明书生成第1章材料", + "第1章监管信息", + "监管信息材料包", +] + +REGULATORY_INFO_PACKAGE_NODE_DEFINITIONS = [ + ("prepare", "准备资料", "regulatory_info_package"), + ("template_copy", "复制模板", "regulatory_info_package"), + ("text_extract", "抽取说明书", "regulatory_info_package"), + ("field_extract", "抽取字段", "regulatory_info_package"), + ("field_merge", "合并字段", "regulatory_info_package"), + ("generate_docs", "生成材料", "regulatory_info_package"), + ("highlight_review_items", "标记待确认", "regulatory_info_package"), + ("trace_export", "追溯清单", "regulatory_info_package"), + ("zip_export", "打包下载", "regulatory_info_package"), + ("notify", "通知", "regulatory_info_package"), + ("completed", "完成", "completed"), +] + +GENERATED_FILE_SUCCESS = "success" +GENERATED_FILE_FALLBACK_SUCCESS = "fallback_success" +GENERATED_FILE_FAILED = "failed" +GENERATED_FILE_SKIPPED = "skipped" + diff --git a/review_agent/regulatory_info_package/events.py b/review_agent/regulatory_info_package/events.py new file mode 100644 index 0000000..7d12e93 --- /dev/null +++ b/review_agent/regulatory_info_package/events.py @@ -0,0 +1,15 @@ +from __future__ import annotations + +from review_agent.regulatory_info_package.constants import WORKFLOW_TYPE +from review_agent.models import RegulatoryInfoPackageBatch, WorkflowEvent + + +def record_event(batch: RegulatoryInfoPackageBatch, event_type: str, payload: dict | None = None) -> WorkflowEvent: + return WorkflowEvent.objects.create( + workflow_type=WORKFLOW_TYPE, + workflow_batch_id=batch.pk, + conversation=batch.conversation, + event_type=event_type, + payload=payload or {}, + ) + diff --git a/review_agent/regulatory_info_package/schemas.py b/review_agent/regulatory_info_package/schemas.py new file mode 100644 index 0000000..2f61dd2 --- /dev/null +++ b/review_agent/regulatory_info_package/schemas.py @@ -0,0 +1,58 @@ +from __future__ import annotations + +from dataclasses import dataclass, field +from typing import Any + + +@dataclass(frozen=True) +class 
TemplateSpec: + code: str + output_name: str + source_file: str + file_format: str + strategy: str + include_in_zip: bool + prefer_legacy_doc_native: bool = False + allow_docx_fallback: bool = True + fields: list[dict[str, Any]] = field(default_factory=list) + + +@dataclass +class InstructionExtractResult: + source_file_name: str + paragraphs: list[str] + sections: dict[str, str] + tables: list[list[list[str]]] + component_tables: list[dict[str, Any]] + front_text: str + + +@dataclass +class MergedField: + key: str + label: str + value: str + source: str + evidence: str + confidence: float + highlight_reason: str = "none" + needs_review: bool = False + rule_value: str = "" + llm_value: str = "" + + +@dataclass +class GeneratedFileResult: + template_code: str + file_name: str + requested_format: str + actual_format: str + status: str + path: str = "" + artifact_id: int | None = None + export_id: int | None = None + highlight_count: int = 0 + missing_count: int = 0 + llm_only_count: int = 0 + error_message: str = "" + diff --git a/review_agent/regulatory_info_package/services/__init__.py b/review_agent/regulatory_info_package/services/__init__.py new file mode 100644 index 0000000..0f7ff23 --- /dev/null +++ b/review_agent/regulatory_info_package/services/__init__.py @@ -0,0 +1,2 @@ +"""Services for the regulatory information package workflow.""" + diff --git a/review_agent/regulatory_info_package/services/docx_document.py b/review_agent/regulatory_info_package/services/docx_document.py new file mode 100644 index 0000000..c7b5629 --- /dev/null +++ b/review_agent/regulatory_info_package/services/docx_document.py @@ -0,0 +1,322 @@ +from __future__ import annotations + +import json +import re +from pathlib import Path + +from docx import Document +from docx.enum.text import WD_COLOR_INDEX +from docx.shared import RGBColor +from django.utils import timezone + +from review_agent.regulatory_info_package.schemas import MergedField + + +PLACEHOLDER_RE = 
re.compile(r"\{\{([a-zA-Z0-9_]+)\}\}") + + +def write_docx_from_template( + source_path: str | Path, + output_path: str | Path, + merged_fields: dict[str, MergedField], + *, + template_code: str = "", + directory_page_numbers: dict[str, str] | None = None, +) -> tuple[int, int, int]: + source = Path(source_path) + output = Path(output_path) + output.parent.mkdir(parents=True, exist_ok=True) + if source.exists(): + document = Document(source) + else: + document = Document() + replacements = {f"{{{{{key}}}}}": field for key, field in merged_fields.items()} + highlight_count = 0 + missing_count = 0 + llm_only_count = 0 + highlight_count += _apply_known_template_replacements(document, merged_fields, template_code=template_code) + if template_code == "ch1_5_product_list": + _rebuild_product_list_table(document, merged_fields) + if template_code == "ch1_2_directory": + _apply_directory_page_numbers(document, directory_page_numbers or {}) + paragraph_counts = _replace_placeholders(document, replacements, merged_fields) + highlight_count += paragraph_counts[0] + missing_count += paragraph_counts[1] + llm_only_count += paragraph_counts[2] + document.save(output) + return highlight_count, missing_count, llm_only_count + + +def _replace_paragraph_text(paragraph, text: str, field: MergedField) -> None: + for run in paragraph.runs: + run.text = "" + run = paragraph.add_run(text) + if field.highlight_reason != "none": + run.font.highlight_color = WD_COLOR_INDEX.YELLOW + if field.highlight_reason == "conflict": + run.font.color.rgb = RGBColor(255, 0, 0) + + +def _apply_directory_page_numbers(document, page_numbers: dict[str, str]) -> None: + for table in document.tables: + if not table.rows: + continue + header = [cell.text.strip() for cell in table.rows[0].cells] + if len(header) < 5 or header[0] != "RPS目录" or header[4] != "页码": + continue + for row in table.rows[1:]: + code = row.cells[0].text.strip() + if code in page_numbers: + row.cells[4].text = page_numbers[code] + return 
+ + +def _replace_placeholders( + document, + replacements: dict[str, MergedField], + merged_fields: dict[str, MergedField], +) -> tuple[int, int, int]: + highlight_count = 0 + missing_count = 0 + llm_only_count = 0 + for paragraph in _iter_paragraphs(document): + text = paragraph.text + if "{{" not in text or "}}" not in text: + continue + used_fields: list[MergedField] = [] + + def replace(match: re.Match[str]) -> str: + key = match.group(1) + placeholder = match.group(0) + field = replacements.get(placeholder) or _default_placeholder_field(key, merged_fields) + used_fields.append(field) + return field.value + + new_text = PLACEHOLDER_RE.sub(replace, text) + if new_text == text: + continue + field_for_style = next((field for field in used_fields if field.highlight_reason != "none"), None) or used_fields[0] + _replace_paragraph_text(paragraph, new_text, field_for_style) + for field in used_fields: + if field.highlight_reason != "none": + highlight_count += 1 + if field.highlight_reason == "missing": + missing_count += 1 + if field.highlight_reason == "llm_only": + llm_only_count += 1 + return highlight_count, missing_count, llm_only_count + + +def _iter_paragraphs(document): + yield from document.paragraphs + for table in document.tables: + for row in table.rows: + for cell in row.cells: + yield from cell.paragraphs + + +def _apply_known_template_replacements(document, merged_fields: dict[str, MergedField], *, template_code: str = "") -> int: + product = _field_value(merged_fields, "product_name") + applicant = _field_value(merged_fields, "applicant_name") + today = timezone.localdate().strftime("%Y年%m月%d日") + replacements = { + "xxxx年xx月xx日": today, + "XXXX年XX月XX日": today, + "xxxx 年 xx 月 xx 日": today, + "XXXX 年 XX 月 XX 日": today, + "2023年09月20日": today, + "2023 年 10 月": today[:8], + } + if not template_code.startswith("ch1_11"): + replacements.update({ + "呼吸道合胞病毒、肺炎支原体核酸检测试剂盒(荧光PCR法)": product, + "呼吸道合胞病毒、肺炎支原体核酸检测试剂盒": product, + "呼吸道合胞病毒 、肺炎支产品名称: 原体核酸检测试剂盒(荧": 
f"产品名称:{product}", + "光PCR法)": "", + "卡尤迪生物科技宜兴有限公司": applicant, + }) + changed = 0 + for paragraph in document.paragraphs: + changed += _replace_text_in_paragraph(paragraph, replacements, merged_fields) + for table in document.tables: + for row in table.rows: + for cell in row.cells: + for paragraph in cell.paragraphs: + changed += _replace_text_in_paragraph(paragraph, replacements, merged_fields) + return changed + + +def _default_placeholder_field(key: str, merged_fields: dict[str, MergedField]) -> MergedField: + if key == "declaration_date": + return _plain_field(key, "日期", timezone.localdate().strftime("%Y年%m月%d日")) + label = key + for field in merged_fields.values(): + if field.key == key: + label = field.label + break + return MergedField( + key=key, + label=label, + value="/", + source="missing", + evidence="模板字段未从说明书中抽取到", + confidence=0.0, + highlight_reason="missing", + needs_review=True, + ) + + +def _replace_text_in_paragraph(paragraph, replacements: dict[str, str], merged_fields: dict[str, MergedField]) -> int: + text = paragraph.text + new_text = text + for old, new in replacements.items(): + if old in new_text: + new_text = new_text.replace(old, new) + if new_text == text: + return 0 + field = merged_fields.get("product_name") or MergedField( + key="product_name", + label="产品名称", + value=new_text, + source="rule", + evidence="", + confidence=0.0, + ) + _replace_paragraph_text(paragraph, new_text, field) + return 1 + + +def _rebuild_product_list_table(document, merged_fields: dict[str, MergedField]) -> None: + product = _field_value(merged_fields, "product_name") + package_specification = _field_value(merged_fields, "package_specification") + component_table = _component_table_payload(merged_fields) + component_notes = _field_value(merged_fields, "component_notes") + for paragraph in document.paragraphs: + if "的包装规格、货号、组分及主要组成成分见下表" in paragraph.text: + _replace_paragraph_text( + paragraph, + f"{product}的包装规格、货号、组分及主要组成成分见下表:", + 
merged_fields.get("product_name") or _plain_field("product_name", "产品名称", product), + ) + if "规格A和规格B的区别" in paragraph.text and component_notes != "/": + _replace_paragraph_text( + paragraph, + component_notes, + merged_fields.get("component_notes") or _plain_field("component_notes", "主要组成成分备注", component_notes), + ) + target = None + for table in document.tables: + header = [cell.text.strip() for cell in table.rows[0].cells] if table.rows else [] + if header[:6] == ["包装规格", "货号", "组成", "组分", "主要组成成分", "规格/数量"]: + target = table + break + specs = _component_specs(component_table) or [ + (spec, None) for spec in [item.strip() for item in package_specification.replace(";", ";").split(";") if item.strip()] + ] + if target is not None: + _clear_table_body(target) + if component_table: + _fill_product_component_table(target, component_table, specs) + else: + if not specs: + specs = [("/", None)] + for spec, _index in specs[:8]: + cells = target.add_row().cells + cells[0].text = spec + cells[1].text = "/" + cells[2].text = _field_value(merged_fields, "composition") + cells[3].text = _field_value(merged_fields, "component_name") + cells[4].text = _field_value(merged_fields, "main_component") + cells[5].text = _field_value(merged_fields, "quantity") + if component_table: + _rebuild_component_comparison_table(document, component_table, specs) + + +def _field_value(merged_fields: dict[str, MergedField], key: str) -> str: + field = merged_fields.get(key) + if not field or not field.value: + return "/" + return field.value + + +def _plain_field(key: str, label: str, value: str) -> MergedField: + return MergedField(key=key, label=label, value=value, source="rule", evidence="", confidence=0.0) + + +def _component_table_payload(merged_fields: dict[str, MergedField]) -> dict: + field = merged_fields.get("component_table") + if not field or not field.value or field.value == "/": + return {} + try: + payload = json.loads(field.value) + except json.JSONDecodeError: + return {} + if 
not isinstance(payload, dict): + return {} + rows = payload.get("rows") or [] + header = payload.get("header") or [] + if not isinstance(header, list) or not isinstance(rows, list): + return {} + return {"header": header, "rows": rows} + + +def _component_specs(component_table: dict) -> list[tuple[str, int]]: + header = component_table.get("header") or [] + specs: list[tuple[str, int]] = [] + for index, value in enumerate(header[2:], start=2): + label = str(value or "").strip() + if not label: + continue + label = label.replace("规格(", "").replace("规格(", "").rstrip("))") + specs.append((label, index)) + return specs + + +def _clear_table_body(table) -> None: + while len(table.rows) > 1: + table._tbl.remove(table.rows[-1]._tr) + + +def _fill_product_component_table(table, component_table: dict, specs: list[tuple[str, int]]) -> None: + rows = component_table.get("rows") or [] + for spec_label, spec_index in specs: + for row in rows: + cells = table.add_row().cells + cells[0].text = spec_label + cells[1].text = "/" + cells[2].text = "/" + cells[3].text = _row_value(row, 0) + cells[4].text = _row_value(row, 1) + cells[5].text = _row_value(row, spec_index or 0) + + +def _rebuild_component_comparison_table(document, component_table: dict, specs: list[tuple[str, int]]) -> None: + target = None + for table in document.tables: + header = [cell.text.strip() for cell in table.rows[0].cells] if table.rows else [] + if header and header[0] == "组分名称": + target = table + break + if target is None: + return + _clear_table_body(target) + header_cells = target.rows[0].cells + labels = ["组分名称", *[spec for spec, _index in specs[: len(header_cells) - 1]]] + while len(labels) < len(header_cells): + labels.append("备注") + for index, label in enumerate(labels[: len(header_cells)]): + header_cells[index].text = label + for row in component_table.get("rows") or []: + cells = target.add_row().cells + cells[0].text = _row_value(row, 0) + for cell_index, (_spec_label, spec_index) in 
enumerate(specs[: len(cells) - 1], start=1): + cells[cell_index].text = _row_value(row, spec_index) + for cell_index in range(len(specs[: len(cells) - 1]) + 1, len(cells)): + cells[cell_index].text = "/" + + +def _row_value(row, index: int) -> str: + if not isinstance(row, list) or index >= len(row): + return "/" + value = str(row[index] or "").strip() + return value or "/" diff --git a/review_agent/regulatory_info_package/services/field_extract.py b/review_agent/regulatory_info_package/services/field_extract.py new file mode 100644 index 0000000..d2342d3 --- /dev/null +++ b/review_agent/regulatory_info_package/services/field_extract.py @@ -0,0 +1,171 @@ +from __future__ import annotations + +import json +import re +import time +from concurrent.futures import ThreadPoolExecutor +from pathlib import Path +from typing import Callable + +from review_agent.llm import generate_completion +from review_agent.regulatory_info_package.schemas import InstructionExtractResult + + +FIELD_PATTERNS = { + "product_name": ("产品名称", r"产品名称[::\s]*([^\n\r]+)"), + "applicant_name": ("申请人名称", r"(?:申请人名称|注册人/售后服务单位名称|注册人名称|售后服务单位名称|生产企业名称)[::\s]*([^\n\r]+)"), + "manufacturer_name": ("生产企业名称", r"生产企业名称[::\s]*([^\n\r]+)"), + "applicant_address": ("申请人住所", r"(?:申请人住所|注册人住所|生产企业住所)[::\s]*([^\n\r]+)"), + "applicant_contact": ("申请人联系方式", r"(?:联系方式|联系电话|电话)[::\s]*([^\n\r]+)"), + "production_address": ("生产地址", r"生产地址[::\s]*([^\n\r]+)"), + "storage_condition": ("储存条件", r"(?:储存条件|贮存条件|保存条件)[::\s]*([^\n\r]+)"), + "intended_use": ("预期用途", r"预期用途[::\s]*([^\n\r]+)"), + "package_specification": ("包装规格", r"(?:包装规格|规格)[::\s]*([^\n\r]+)"), + "sample_type": ("样本类型", r"样本类型[::\s]*([^\n\r]+)"), + "applicable_instrument": ("适用仪器", r"适用仪器[::\s]*([^\n\r]+)"), + "standard_no": ("标准号", r"((?:GB|YY|WS|T/C[A-Z0-9]*)[ /T0-9.\-—]+)"), +} + + +def extract_fields_by_rules(instruction: InstructionExtractResult) -> dict[str, dict]: + text = "\n".join([instruction.front_text, *instruction.paragraphs, 
*instruction.sections.values()]) + results: dict[str, dict] = {} + for key, (label, pattern) in FIELD_PATTERNS.items(): + section_value = _value_after_label_paragraph(instruction.paragraphs, label) + if section_value: + results[key] = { + "label": label, + "value": section_value, + "evidence": f"【{label}】\n{section_value}", + "confidence": 0.82, + "source": "rule", + } + continue + match = re.search(pattern, text, flags=re.IGNORECASE) + if match: + value = _clean_value(match.group(1)) + if value: + results[key] = { + "label": label, + "value": value, + "evidence": match.group(0)[:240], + "confidence": 0.75, + "source": "rule", + } + component_table = _best_component_table(instruction.component_tables) + if component_table: + results["component_table"] = { + "label": "主要组成成分", + "value": json.dumps(component_table, ensure_ascii=False), + "evidence": "说明书【主要组成成分】表格", + "confidence": 0.86, + "source": "rule", + } + component_notes = _component_notes(instruction.sections) + if component_notes: + results["component_notes"] = { + "label": "主要组成成分备注", + "value": component_notes, + "evidence": "说明书【主要组成成分】段落", + "confidence": 0.8, + "source": "rule", + } + return results + + +def extract_fields_with_llm(instruction: InstructionExtractResult) -> dict[str, dict]: + prompt = ( + "请从体外诊断试剂产品说明书中抽取字段,输出 JSON 对象,字段包括 " + "product_name、storage_condition、intended_use、package_specification、sample_type、applicable_instrument、standard_no。" + "每个字段值为 {label,value,evidence,confidence}。\n\n" + + instruction.front_text[:6000] + ) + raw = generate_completion([{"role": "user", "content": prompt}], temperature=0.0) + payload = _parse_json_object(raw) + return {key: value for key, value in payload.items() if isinstance(value, dict)} + + +def run_llm_extract_with_retry( + instruction: InstructionExtractResult, + *, + llm_extract_func: Callable[[InstructionExtractResult], dict[str, dict]] | None = None, + sleep_func: Callable[[float], None] = time.sleep, +) -> dict[str, dict]: + func = 
llm_extract_func or extract_fields_with_llm + last_exc: Exception | None = None + for delay in [0, 1, 2]: + if delay: + sleep_func(delay) + try: + return func(instruction) + except Exception as exc: + last_exc = exc + if last_exc: + raise last_exc + return {} + + +def run_parallel_extract( + instruction: InstructionExtractResult, + *, + llm_extract_func: Callable[[InstructionExtractResult], dict[str, dict]] | None = None, +) -> dict: + payload = {"regex_results": {}, "llm_results": {}, "llm_error": ""} + with ThreadPoolExecutor(max_workers=2) as executor: + rule_future = executor.submit(extract_fields_by_rules, instruction) + llm_future = executor.submit(run_llm_extract_with_retry, instruction, llm_extract_func=llm_extract_func) + payload["regex_results"] = rule_future.result() + try: + payload["llm_results"] = llm_future.result() + except Exception as exc: + payload["llm_error"] = str(exc) + return payload + + +def save_field_extract_result(path: str | Path, payload: dict) -> Path: + target = Path(path) + target.parent.mkdir(parents=True, exist_ok=True) + target.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8") + return target + + +def _clean_value(value: str) -> str: + cleaned = value.strip() + if cleaned in {"】", "】】", "】:"}: + return "" + return re.split(r"[。;;]", cleaned)[0].strip() + + +def _value_after_label_paragraph(paragraphs: list[str], label: str) -> str: + bracketed = {f"【{label}】", f"[{label}]", label} + for index, text in enumerate(paragraphs): + stripped = text.strip() + if stripped in bracketed and index + 1 < len(paragraphs): + return _clean_value(paragraphs[index + 1]) + return "" + + +def _parse_json_object(raw: str) -> dict: + text = (raw or "").strip() + if text.startswith("```"): + text = text.strip("`").strip() + if text.lower().startswith("json"): + text = text[4:].strip() + start = text.find("{") + end = text.rfind("}") + if start == -1 or end == -1: + return {} + return json.loads(text[start : end + 1]) + + 
+def _best_component_table(component_tables: list[dict]) -> dict: + if not component_tables: + return {} + return max(component_tables, key=lambda table: len(table.get("rows") or [])) + + +def _component_notes(sections: dict[str, str]) -> str: + for key, value in sections.items(): + if "主要组成" in key: + return value.strip() + return "" diff --git a/review_agent/regulatory_info_package/services/field_merge.py b/review_agent/regulatory_info_package/services/field_merge.py new file mode 100644 index 0000000..5e9aff7 --- /dev/null +++ b/review_agent/regulatory_info_package/services/field_merge.py @@ -0,0 +1,115 @@ +from __future__ import annotations + +import json +from pathlib import Path + +from review_agent.regulatory_info_package.schemas import MergedField + + +REQUIRED_FIELDS = { + "product_name": "产品名称", + "applicant_name": "申请人名称", + "package_specification": "包装规格", + "intended_use": "预期用途", + "storage_condition": "储存条件", +} + + +def merge_fields(rule_results: dict[str, dict], llm_results: dict[str, dict]) -> tuple[dict[str, MergedField], dict[str, list[dict]]]: + merged: dict[str, MergedField] = {} + missing_fields: list[dict] = [] + llm_only_fields: list[dict] = [] + conflict_fields: list[dict] = [] + keys = set(REQUIRED_FIELDS) | set(rule_results) | set(llm_results) + for key in sorted(keys): + rule = rule_results.get(key) or {} + llm = llm_results.get(key) or {} + rule_value = str(rule.get("value") or "").strip() + llm_value = str(llm.get("value") or "").strip() + label = str(rule.get("label") or llm.get("label") or REQUIRED_FIELDS.get(key) or key) + if rule_value and llm_value and rule_value != llm_value: + field = MergedField( + key=key, + label=label, + value=rule_value, + source="rule_conflict", + evidence=str(rule.get("evidence") or ""), + confidence=float(rule.get("confidence") or 0.0), + highlight_reason="conflict", + needs_review=True, + rule_value=rule_value, + llm_value=llm_value, + ) + conflict_fields.append( + { + "field_key": key, + 
"field_label": label, + "rule_value": rule_value, + "llm_value": llm_value, + "selected_value": rule_value, + "handling": "规则优先,写入值高亮并进入追溯清单", + } + ) + elif rule_value: + field = MergedField( + key=key, + label=label, + value=rule_value, + source="rule", + evidence=str(rule.get("evidence") or ""), + confidence=float(rule.get("confidence") or 0.0), + ) + elif llm_value: + field = MergedField( + key=key, + label=label, + value=llm_value, + source="llm", + evidence=str(llm.get("evidence") or ""), + confidence=float(llm.get("confidence") or 0.0), + highlight_reason="llm_only", + needs_review=True, + llm_value=llm_value, + ) + llm_only_fields.append(_review_dict(field)) + else: + field = MergedField( + key=key, + label=label, + value="/", + source="missing", + evidence="", + confidence=0.0, + highlight_reason="missing", + needs_review=True, + ) + missing_fields.append(_review_dict(field)) + merged[key] = field + return merged, { + "missing_fields": missing_fields, + "llm_only_fields": llm_only_fields, + "conflict_fields": conflict_fields, + } + + +def save_merged_fields(path: str | Path, merged: dict[str, MergedField], summary: dict[str, list[dict]]) -> Path: + target = Path(path) + target.parent.mkdir(parents=True, exist_ok=True) + payload = { + "fields": {key: field.__dict__ for key, field in merged.items()}, + **summary, + } + target.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8") + return target + + +def _review_dict(field: MergedField) -> dict: + return { + "target_file": "", + "field_key": field.key, + "field_label": field.label, + "final_value": field.value, + "highlight_reason": field.highlight_reason, + "needs_review": field.needs_review, + } + diff --git a/review_agent/regulatory_info_package/services/input_select.py b/review_agent/regulatory_info_package/services/input_select.py new file mode 100644 index 0000000..a269ab4 --- /dev/null +++ b/review_agent/regulatory_info_package/services/input_select.py @@ -0,0 +1,105 @@ +from 
__future__ import annotations + +from dataclasses import dataclass, field +from pathlib import Path + +from review_agent.models import Conversation, FileAttachment, FileSummaryBatch, FileSummaryItem + + +@dataclass +class InstructionInputSelection: + status: str + file_name: str = "" + storage_path: str = "" + attachment: FileAttachment | None = None + source_summary_batch: FileSummaryBatch | None = None + source_summary_item_id: int | None = None + candidates: list[str] = field(default_factory=list) + message: str = "" + + +def select_instruction_input(conversation: Conversation, message: str) -> InstructionInputSelection: + candidates = _active_docx_attachments(conversation) + named = _match_by_message(candidates, message) + if len(named) == 1: + return _selection_from_attachment(named[0]) + instruction_candidates = [item for item in candidates if "说明书" in item.original_name] + if len(instruction_candidates) == 1: + return _selection_from_attachment(instruction_candidates[0]) + if len(candidates) == 1: + return _selection_from_attachment(candidates[0]) + if len(instruction_candidates) > 1 or len(candidates) > 1: + names = [item.original_name for item in (instruction_candidates or candidates)] + return InstructionInputSelection( + status="waiting_user", + candidates=names, + message="请确认用于生成第1章监管信息的说明书文件名:" + "、".join(names), + ) + summary_selection = _select_from_latest_summary(conversation, message) + if summary_selection: + return summary_selection + return InstructionInputSelection(status="missing", message="请先上传产品说明书 docx 文件。") + + +def _active_docx_attachments(conversation: Conversation) -> list[FileAttachment]: + return list( + FileAttachment.objects.filter( + conversation=conversation, + is_active=True, + ) + .exclude(upload_status=FileAttachment.UploadStatus.DELETED) + .filter(original_name__iendswith=".docx") + .order_by("original_name", "-version_no") + ) + + +def _match_by_message(candidates: list[FileAttachment], message: str) -> list[FileAttachment]: 
+ compact = "".join((message or "").lower().split()) + matched = [] + for attachment in candidates: + stem = "".join(Path(attachment.original_name).stem.lower().split()) + name = "".join(attachment.original_name.lower().split()) + if stem and stem in compact or name and name in compact: + matched.append(attachment) + return matched + + +def _selection_from_attachment(attachment: FileAttachment) -> InstructionInputSelection: + return InstructionInputSelection( + status="selected", + file_name=attachment.original_name, + storage_path=attachment.storage_path, + attachment=attachment, + ) + + +def _select_from_latest_summary(conversation: Conversation, message: str) -> InstructionInputSelection | None: + batch = ( + FileSummaryBatch.objects.filter(conversation=conversation, status=FileSummaryBatch.Status.SUCCESS) + .order_by("-finished_at", "-created_at", "-id") + .first() + ) + if not batch: + return None + items = list(batch.items.filter(file_name__iendswith=".docx").order_by("file_name", "id")) + compact = "".join((message or "").lower().split()) + named = [item for item in items if "".join(Path(item.file_name).stem.lower().split()) in compact or "".join(item.file_name.lower().split()) in compact] + candidates = named or [item for item in items if "说明书" in item.file_name] + if len(candidates) == 1: + item = candidates[0] + return InstructionInputSelection( + status="selected", + file_name=item.file_name, + storage_path=item.storage_path, + source_summary_batch=batch, + source_summary_item_id=item.pk, + ) + if len(candidates) > 1: + return InstructionInputSelection( + status="waiting_user", + source_summary_batch=batch, + candidates=[item.file_name for item in candidates], + message="请确认用于生成第1章监管信息的说明书文件名:" + "、".join(item.file_name for item in candidates), + ) + return None + diff --git a/review_agent/regulatory_info_package/services/instruction_extract.py b/review_agent/regulatory_info_package/services/instruction_extract.py new file mode 100644 index 0000000..9a3829e --- /dev/null +++ 
b/review_agent/regulatory_info_package/services/instruction_extract.py @@ -0,0 +1,77 @@ +from __future__ import annotations + +import json +from pathlib import Path + +from docx import Document + +from review_agent.regulatory_info_package.schemas import InstructionExtractResult + + +def parse_instruction_docx(path: str | Path) -> InstructionExtractResult: + file_path = Path(path) + document = Document(file_path) + paragraphs = [paragraph.text.strip() for paragraph in document.paragraphs if paragraph.text.strip()] + tables = [] + for table in document.tables: + rows = [] + for row in table.rows: + rows.append([" ".join(cell.text.split()) for cell in row.cells]) + if rows: + tables.append(rows) + sections = _build_sections(paragraphs) + front_text = "\n".join(paragraphs[:30]) + return InstructionExtractResult( + source_file_name=file_path.name, + paragraphs=paragraphs, + sections=sections, + tables=tables, + component_tables=_component_tables(tables), + front_text=front_text, + ) + + +def save_instruction_extract_json(path: str | Path, result: InstructionExtractResult) -> Path: + target = Path(path) + target.parent.mkdir(parents=True, exist_ok=True) + payload = { + "source_file_name": result.source_file_name, + "paragraphs": result.paragraphs, + "sections": result.sections, + "tables": result.tables, + "component_tables": result.component_tables, + "front_text": result.front_text, + } + target.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8") + return target + + +def _build_sections(paragraphs: list[str]) -> dict[str, str]: + sections: dict[str, list[str]] = {} + current = "front" + for text in paragraphs: + if _looks_like_heading(text): + current = text[:80] + sections.setdefault(current, []) + continue + sections.setdefault(current, []).append(text) + return {key: "\n".join(value).strip() for key, value in sections.items() if value} + + +def _looks_like_heading(text: str) -> bool: + compact = text.strip() + if len(compact) > 40: + 
return False + heading_markers = ("一、", "二、", "三、", "四、", "五、", "六、", "【", "产品名称", "预期用途", "主要组成") + return compact.startswith(heading_markers) + + +def _component_tables(tables: list[list[list[str]]]) -> list[dict]: + results = [] + for table in tables: + header = table[0] if table else [] + joined = "".join(header) + if any(keyword in joined for keyword in ["组成", "组分", "成分"]): + results.append({"header": header, "rows": table[1:]}) + return results + diff --git a/review_agent/regulatory_info_package/services/legacy_doc_document.py b/review_agent/regulatory_info_package/services/legacy_doc_document.py new file mode 100644 index 0000000..f95d25c --- /dev/null +++ b/review_agent/regulatory_info_package/services/legacy_doc_document.py @@ -0,0 +1,81 @@ +from __future__ import annotations + +import shutil +from dataclasses import dataclass +from pathlib import Path + +from django.conf import settings +from docx import Document + +from review_agent.regulatory_info_package.schemas import MergedField + + +@dataclass(frozen=True) +class LegacyDocCapability: + status: str + adapter: str + message: str = "" + + +def detect_legacy_doc_capability() -> LegacyDocCapability: + try: + import win32com.client # noqa: F401 + + return LegacyDocCapability(status="available", adapter="WordComDocAdapter", message="Word COM 可用") + except Exception as exc: + return LegacyDocCapability( + status="unavailable", + adapter="UnavailableLegacyDocAdapter", + message=f"Word COM 不可用:{type(exc).__name__}", + ) + + +def write_legacy_doc_or_fallback( + source_path: str | Path, + output_path: str | Path, + merged_fields: dict[str, MergedField], +) -> tuple[Path, str, dict]: + source = Path(source_path) + output = Path(output_path) + output.parent.mkdir(parents=True, exist_ok=True) + capability = detect_legacy_doc_capability() + native_enabled = bool(getattr(settings, "REGULATORY_INFO_PACKAGE_ENABLE_WORD_COM_NATIVE", False)) + if native_enabled and capability.status == "available" and source.exists(): + 
shutil.copy2(source, output) + try: + _append_doc_summary_with_word_com(output, merged_fields) + return output, "success", {"doc": capability.__dict__, "fallback_used": False, "native_write": True} + except Exception as exc: + capability = LegacyDocCapability( + status="unavailable", + adapter="UnavailableLegacyDocAdapter", + message=f"Word COM 写入失败:{exc}", + ) + fallback = output.with_suffix(".docx") + document = Document() + heading = document.add_paragraph() + heading.add_run(output.stem).bold = True + document.add_paragraph("【预生成版】当前未启用 .doc 原生写入,已生成 docx 兜底文件。") + for field in merged_fields.values(): + document.add_paragraph(f"{field.label}:{field.value}") + document.save(fallback) + return fallback, "fallback_success", {"doc": capability.__dict__, "fallback_used": True, "native_enabled": native_enabled} + + +def _append_doc_summary_with_word_com(path: Path, merged_fields: dict[str, MergedField]) -> None: + import win32com.client + + word = win32com.client.DispatchEx("Word.Application") + word.Visible = False + document = None + try: + document = word.Documents.Open(str(path.resolve())) + end_range = document.Range(document.Content.End - 1, document.Content.End - 1) + lines = ["", "【预生成版】以下字段由系统根据说明书预填,请人工复核。"] + lines.extend(f"{field.label}:{field.value}" for field in merged_fields.values()) + end_range.InsertAfter("\r".join(lines)) + document.Save() + finally: + if document is not None: + document.Close(False) + word.Quit() diff --git a/review_agent/regulatory_info_package/services/package_generate.py b/review_agent/regulatory_info_package/services/package_generate.py new file mode 100644 index 0000000..6b11ccc --- /dev/null +++ b/review_agent/regulatory_info_package/services/package_generate.py @@ -0,0 +1,186 @@ +from __future__ import annotations + +import subprocess +from concurrent.futures import ThreadPoolExecutor, as_completed +from pathlib import Path +from zipfile import ZipFile +from xml.etree import ElementTree + +from review_agent.models import 
RegulatoryInfoPackageBatch +from review_agent.regulatory_info_package.constants import GENERATED_FILE_FAILED +from review_agent.regulatory_info_package.schemas import GeneratedFileResult, MergedField, TemplateSpec +from review_agent.regulatory_info_package.services.docx_document import write_docx_from_template +from review_agent.regulatory_info_package.services.legacy_doc_document import write_legacy_doc_or_fallback +from review_agent.regulatory_info_package.services.template_repository import copy_template_to_batch, template_specs +from review_agent.regulatory_info_package.storage import ensure_batch_subdir + + +def generate_package_documents( + batch: RegulatoryInfoPackageBatch, + config: dict, + merged_fields: dict[str, MergedField], +) -> list[GeneratedFileResult]: + specs = template_specs(config) + directory_specs = [spec for spec in specs if spec.code == "ch1_2_directory"] + content_specs = [spec for spec in specs if spec.code != "ch1_2_directory"] + results: list[GeneratedFileResult] = [] + with ThreadPoolExecutor(max_workers=min(4, len(content_specs) or 1)) as executor: + futures = [executor.submit(_generate_one, batch, config, spec, merged_fields) for spec in content_specs] + results.extend(future.result() for future in as_completed(futures)) + page_numbers = _directory_page_numbers(results) + for spec in directory_specs: + results.append(_generate_one(batch, config, spec, merged_fields, directory_page_numbers=page_numbers)) + return results + + +def _generate_one( + batch: RegulatoryInfoPackageBatch, + config: dict, + spec: TemplateSpec, + merged_fields: dict[str, MergedField], + *, + directory_page_numbers: dict[str, str] | None = None, +) -> GeneratedFileResult: + try: + template_path = copy_template_to_batch(batch, config, spec) + generated_dir = ensure_batch_subdir(batch, "generated") + output_path = generated_dir / spec.output_name + adapter_summary = {} + if spec.file_format == "doc": + actual_path, status, adapter_summary = 
write_legacy_doc_or_fallback(template_path, output_path, merged_fields) + actual_format = actual_path.suffix.lower().lstrip(".") + highlight_count = missing_count = llm_only_count = 0 + else: + highlight_count, missing_count, llm_only_count = write_docx_from_template( + template_path, + output_path, + merged_fields, + template_code=spec.code, + directory_page_numbers=directory_page_numbers, + ) + actual_path = output_path + actual_format = "docx" + status = "success" + return GeneratedFileResult( + template_code=spec.code, + file_name=actual_path.name, + requested_format=spec.file_format, + actual_format=actual_format, + status=status, + path=str(actual_path), + highlight_count=highlight_count, + missing_count=missing_count, + llm_only_count=llm_only_count, + ) + except Exception as exc: + return GeneratedFileResult( + template_code=spec.code, + file_name=spec.output_name, + requested_format=spec.file_format, + actual_format=spec.file_format, + status=GENERATED_FILE_FAILED, + error_message=str(exc), + ) + + +def _directory_page_numbers(results: list[GeneratedFileResult]) -> dict[str, str]: + page_numbers = {"CH1.2": "1"} + for result in results: + if result.status not in {"success", "fallback_success"} or not result.path: + continue + code = _directory_code_from_file_name(result.file_name) + if not code: + continue + page_numbers[code] = str(count_document_pages(result.path)) + return page_numbers + + +def _directory_code_from_file_name(file_name: str) -> str: + stem = Path(file_name).stem.strip() + return stem.split()[0] if stem.startswith("CH") else "" + + +def count_document_pages(path: str | Path) -> int: + file_path = Path(path) + if not file_path.exists(): + return 1 + pages = _count_pages_from_docx_properties(file_path) + if pages: + return pages + pages = _count_pages_with_pywin32(file_path) + if pages: + return pages + pages = _count_pages_with_powershell_word(file_path) + if pages: + return pages + return 1 + + +def 
_count_pages_from_docx_properties(file_path: Path) -> int: + if file_path.suffix.lower() != ".docx": + return 0 + try: + with ZipFile(file_path) as archive: + root = ElementTree.fromstring(archive.read("docProps/app.xml")) + namespace = {"ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"} + pages = root.find("ep:Pages", namespace) + return max(int((pages.text or "").strip()), 1) if pages is not None else 0 + except Exception: + return 0 + + +def _count_pages_with_pywin32(file_path: Path) -> int: + try: + import win32com.client + + word = win32com.client.DispatchEx("Word.Application") + word.Visible = False + document = None + try: + document = word.Documents.Open(str(file_path.resolve()), ReadOnly=True) + document.Repaginate() + return max(int(document.ComputeStatistics(2)), 1) + finally: + if document is not None: + document.Close(False) + word.Quit() + except Exception: + return 0 + + +def _count_pages_with_powershell_word(file_path: Path) -> int: + script = r""" +param([string]$Path) +$word = $null +$doc = $null +try { + $word = New-Object -ComObject Word.Application + $word.Visible = $false + $doc = $word.Documents.Open($Path, $false, $true) + $doc.Repaginate() + [Console]::Out.Write($doc.ComputeStatistics(2)) + exit 0 +} catch { + [Console]::Error.Write($_.Exception.Message) + exit 1 +} finally { + if ($doc -ne $null) { $doc.Close($false) | Out-Null } + if ($word -ne $null) { $word.Quit() | Out-Null } +} +""" + try: + completed = subprocess.run( + ["powershell.exe", "-NoProfile", "-ExecutionPolicy", "Bypass", "-Command", script, str(file_path.resolve())], + capture_output=True, + check=False, + text=True, + timeout=8, + ) + except Exception: + return 0 + if completed.returncode != 0: + return 0 + try: + return max(int(completed.stdout.strip()), 1) + except ValueError: + return 0 diff --git a/review_agent/regulatory_info_package/services/summary.py b/review_agent/regulatory_info_package/services/summary.py new file mode 100644 
index 0000000..490704c --- /dev/null +++ b/review_agent/regulatory_info_package/services/summary.py @@ -0,0 +1,12 @@ +from __future__ import annotations + + +def build_assistant_summary(*, batch_no: str, exports: list[dict], failed_files: list[dict]) -> str: + zip_exports = [item for item in exports if item.get("export_type") == "zip" or str(item.get("file_name", "")).endswith(".zip")] + other_exports = [item for item in exports if item not in zip_exports] + lines = [f"已完成第1章监管信息材料包生成,批次号:{batch_no}。", ""] + for export in [*zip_exports, *other_exports]: + lines.append(f"- [{export['file_name']}]({export['download_url']})") + for failed in failed_files: + lines.append(f"- {failed.get('file_name')}:生成失败,{failed.get('error_message') or '原因待查看'}") + return "\n".join(lines) diff --git a/review_agent/regulatory_info_package/services/template_config.py b/review_agent/regulatory_info_package/services/template_config.py new file mode 100644 index 0000000..42475f9 --- /dev/null +++ b/review_agent/regulatory_info_package/services/template_config.py @@ -0,0 +1,53 @@ +from __future__ import annotations + +import hashlib +from pathlib import Path + +import yaml +from django.conf import settings + + +CONFIG_PATH = Path(__file__).resolve().parents[1] / "templates" / "regulatory_info_package_templates_v1.yaml" + + +def load_template_config(path: str | Path | None = None) -> dict: + config_path = Path(path) if path else CONFIG_PATH + with config_path.open("r", encoding="utf-8") as handle: + payload = yaml.safe_load(handle) or {} + if payload.get("source_dir"): + payload["source_dir"] = str((Path(settings.BASE_DIR) / payload["source_dir"]).resolve()) + return payload + + +def compute_config_hash(path: str | Path | None = None) -> str: + config_path = Path(path) if path else CONFIG_PATH + digest = hashlib.sha256() + digest.update(config_path.read_bytes()) + return digest.hexdigest() + + +def validate_template_config(config: dict) -> list[str]: + errors: list[str] = [] + source_dir = 
Path(config.get("source_dir") or "") + if not config.get("source_dir") or not source_dir.exists(): + errors.append(f"模板源目录不存在:{source_dir}") + templates = config.get("templates") or [] + if len(templates) != 6: + errors.append("第1章监管信息模板配置必须包含 6 个模板。") + seen: set[str] = set() + for template in templates: + code = str(template.get("code") or "") + if not code: + errors.append("模板 code 不能为空。") + elif code in seen: + errors.append(f"模板 code 重复:{code}") + seen.add(code) + source_file = str(template.get("source_file") or "") + output_name = str(template.get("output_name") or "") + if not source_file: + errors.append(f"模板 {code} 缺少 source_file。") + elif source_dir.exists() and not (source_dir / source_file).exists(): + errors.append(f"模板源文件不存在:{source_file}") + if not output_name: + errors.append(f"模板 {code} 缺少 output_name。") + return errors diff --git a/review_agent/regulatory_info_package/services/template_repository.py b/review_agent/regulatory_info_package/services/template_repository.py new file mode 100644 index 0000000..4d7c15e --- /dev/null +++ b/review_agent/regulatory_info_package/services/template_repository.py @@ -0,0 +1,34 @@ +from __future__ import annotations + +import shutil +from pathlib import Path + +from review_agent.regulatory_info_package.schemas import TemplateSpec +from review_agent.regulatory_info_package.storage import ensure_batch_subdir +from review_agent.models import RegulatoryInfoPackageBatch + + +def template_specs(config: dict) -> list[TemplateSpec]: + return [ + TemplateSpec( + code=item["code"], + output_name=item["output_name"], + source_file=item["source_file"], + file_format=item.get("file_format", "docx"), + strategy=item.get("strategy", item["code"]), + include_in_zip=bool(item.get("include_in_zip", True)), + prefer_legacy_doc_native=bool(item.get("prefer_legacy_doc_native", False)), + allow_docx_fallback=bool(item.get("allow_docx_fallback", True)), + fields=item.get("fields") or [], + ) + for item in config.get("templates") or [] + ] + + +def 
copy_template_to_batch(batch: RegulatoryInfoPackageBatch, config: dict, spec: TemplateSpec) -> Path: + source_dir = Path(config["source_dir"]) + source = source_dir / spec.source_file + target = ensure_batch_subdir(batch, "templates") / f"{spec.code}.source{source.suffix}" + shutil.copy2(source, target) + return target + diff --git a/review_agent/regulatory_info_package/services/traceability_export.py b/review_agent/regulatory_info_package/services/traceability_export.py new file mode 100644 index 0000000..61e9111 --- /dev/null +++ b/review_agent/regulatory_info_package/services/traceability_export.py @@ -0,0 +1,51 @@ +from __future__ import annotations + +import json +from pathlib import Path + +from openpyxl import Workbook + +from review_agent.regulatory_info_package.schemas import MergedField + + +HEADERS = [ + "target_file", + "target_field", + "final_value", + "extraction_source", + "evidence", + "highlight_reason", + "needs_review", +] + + +def save_traceability_exports(root: str | Path, merged_fields: dict[str, MergedField]) -> tuple[Path, Path]: + root_path = Path(root) + exports_dir = root_path / "exports" + logs_dir = root_path / "logs" + exports_dir.mkdir(parents=True, exist_ok=True) + logs_dir.mkdir(parents=True, exist_ok=True) + rows = [ + { + "target_file": "", + "target_field": field.label, + "final_value": field.value, + "extraction_source": field.source, + "evidence": field.evidence, + "highlight_reason": field.highlight_reason, + "needs_review": field.needs_review, + } + for field in merged_fields.values() + ] + excel_path = exports_dir / "traceability.xlsx" + workbook = Workbook() + sheet = workbook.active + sheet.title = "traceability" + sheet.append(HEADERS) + for row in rows: + sheet.append([row.get(header, "") for header in HEADERS]) + workbook.save(excel_path) + json_path = logs_dir / "traceability.json" + json_path.write_text(json.dumps(rows, ensure_ascii=False, indent=2), encoding="utf-8") + return excel_path, json_path + diff --git 
a/review_agent/regulatory_info_package/services/zip_export.py b/review_agent/regulatory_info_package/services/zip_export.py new file mode 100644 index 0000000..2d13f1a --- /dev/null +++ b/review_agent/regulatory_info_package/services/zip_export.py @@ -0,0 +1,23 @@ +from __future__ import annotations + +from pathlib import Path +from zipfile import ZIP_DEFLATED, ZipFile + +from review_agent.regulatory_info_package.constants import DEFAULT_ZIP_NAME, GENERATED_FILE_FALLBACK_SUCCESS, GENERATED_FILE_SUCCESS +from review_agent.regulatory_info_package.schemas import GeneratedFileResult + + +def create_zip_package(root: str | Path, generated_files: list[GeneratedFileResult], zip_name: str = DEFAULT_ZIP_NAME) -> Path: + root_path = Path(root) + exports_dir = root_path / "exports" + exports_dir.mkdir(parents=True, exist_ok=True) + zip_path = exports_dir / zip_name + allowed = {GENERATED_FILE_SUCCESS, GENERATED_FILE_FALLBACK_SUCCESS} + with ZipFile(zip_path, "w", compression=ZIP_DEFLATED) as archive: + for result in generated_files: + if result.status not in allowed or not result.path: + continue + file_path = Path(result.path) + if file_path.exists(): + archive.write(file_path, arcname=result.file_name) + return zip_path diff --git a/review_agent/regulatory_info_package/storage.py b/review_agent/regulatory_info_package/storage.py new file mode 100644 index 0000000..c815f73 --- /dev/null +++ b/review_agent/regulatory_info_package/storage.py @@ -0,0 +1,71 @@ +from __future__ import annotations + +import hashlib +from pathlib import Path + +from django.conf import settings + +from review_agent.models import RegulatoryInfoPackageArtifact, RegulatoryInfoPackageBatch + + +def build_batch_work_dir(batch: RegulatoryInfoPackageBatch | None = None, *, batch_no: str = "") -> Path: + if batch: + return ( + Path(settings.MEDIA_ROOT) + / "regulatory_info_package" + / str(batch.user_id) + / str(batch.conversation_id) + / batch.batch_no + ) + return Path(settings.MEDIA_ROOT) / 
"regulatory_info_package" / batch_no + + +def ensure_batch_subdir(batch: RegulatoryInfoPackageBatch, name: str) -> Path: + root = Path(batch.work_dir) if batch.work_dir else build_batch_work_dir(batch) + target = root / Path(name).name + ensure_within_work_dir(batch, target) + target.mkdir(parents=True, exist_ok=True) + return target + + +def ensure_within_work_dir(batch: RegulatoryInfoPackageBatch, path: str | Path) -> Path: + root = Path(batch.work_dir).resolve() + target = Path(path).resolve() + if root != target and root not in target.parents: + raise ValueError("输出路径必须位于当前材料包批次工作目录内。") + return target + + +def compute_file_sha256(path: str | Path) -> str: + file_path = Path(path) + digest = hashlib.sha256() + with file_path.open("rb") as handle: + for chunk in iter(lambda: handle.read(1024 * 1024), b""): + digest.update(chunk) + return digest.hexdigest() + + +def create_artifact_for_file( + batch: RegulatoryInfoPackageBatch, + *, + path: str | Path, + artifact_type: str, + file_format: str, + name: str = "", + metadata: dict | None = None, + created_by_node: str = "", +) -> RegulatoryInfoPackageArtifact: + file_path = ensure_within_work_dir(batch, path) + return RegulatoryInfoPackageArtifact.objects.create( + batch=batch, + artifact_type=artifact_type, + file_format=file_format, + name=name or file_path.stem, + file_name=file_path.name, + storage_path=str(file_path), + file_size=file_path.stat().st_size if file_path.exists() else 0, + content_hash=compute_file_sha256(file_path) if file_path.exists() else "", + metadata=metadata or {}, + created_by_node=created_by_node, + ) + diff --git a/review_agent/regulatory_info_package/templates/clean/CH1.11.1 符合标准的清单.docx b/review_agent/regulatory_info_package/templates/clean/CH1.11.1 符合标准的清单.docx new file mode 100644 index 0000000..dc874a5 Binary files /dev/null and b/review_agent/regulatory_info_package/templates/clean/CH1.11.1 符合标准的清单.docx differ diff --git 
a/review_agent/regulatory_info_package/templates/clean/CH1.11.5 真实性声明.docx b/review_agent/regulatory_info_package/templates/clean/CH1.11.5 真实性声明.docx new file mode 100644 index 0000000..4fac204 Binary files /dev/null and b/review_agent/regulatory_info_package/templates/clean/CH1.11.5 真实性声明.docx differ diff --git a/review_agent/regulatory_info_package/templates/clean/CH1.11.6 符合性声明.docx b/review_agent/regulatory_info_package/templates/clean/CH1.11.6 符合性声明.docx new file mode 100644 index 0000000..2b29f3f Binary files /dev/null and b/review_agent/regulatory_info_package/templates/clean/CH1.11.6 符合性声明.docx differ diff --git a/review_agent/regulatory_info_package/templates/clean/CH1.2 监管信息目录 - 页码版.docx b/review_agent/regulatory_info_package/templates/clean/CH1.2 监管信息目录 - 页码版.docx new file mode 100644 index 0000000..4e8c239 Binary files /dev/null and b/review_agent/regulatory_info_package/templates/clean/CH1.2 监管信息目录 - 页码版.docx differ diff --git a/review_agent/regulatory_info_package/templates/clean/CH1.4 申请表 - 复选框调整版.docx b/review_agent/regulatory_info_package/templates/clean/CH1.4 申请表 - 复选框调整版.docx new file mode 100644 index 0000000..565a9b0 Binary files /dev/null and b/review_agent/regulatory_info_package/templates/clean/CH1.4 申请表 - 复选框调整版.docx differ diff --git a/review_agent/regulatory_info_package/templates/clean/CH1.5 产品列表.docx b/review_agent/regulatory_info_package/templates/clean/CH1.5 产品列表.docx new file mode 100644 index 0000000..7b08002 Binary files /dev/null and b/review_agent/regulatory_info_package/templates/clean/CH1.5 产品列表.docx differ diff --git a/review_agent/regulatory_info_package/templates/clean/CH1.9 产品申报前沟通的说明.docx b/review_agent/regulatory_info_package/templates/clean/CH1.9 产品申报前沟通的说明.docx new file mode 100644 index 0000000..112ee12 Binary files /dev/null and b/review_agent/regulatory_info_package/templates/clean/CH1.9 产品申报前沟通的说明.docx differ diff --git a/review_agent/regulatory_info_package/templates/regulatory_info_package_templates_v1.yaml 
b/review_agent/regulatory_info_package/templates/regulatory_info_package_templates_v1.yaml new file mode 100644 index 0000000..275a1a2 --- /dev/null +++ b/review_agent/regulatory_info_package/templates/regulatory_info_package_templates_v1.yaml @@ -0,0 +1,64 @@ +version: regulatory_info_package_templates_v1 +source_dir: review_agent/regulatory_info_package/templates/clean +zip_name: 第1章 监管信息(预生成版).zip +templates: + - code: ch1_2_directory + source_file: CH1.2 监管信息目录 - 页码版.docx + output_name: CH1.2 监管信息目录.docx + file_format: docx + strategy: directory + include_in_zip: true + fields: [] + - code: ch1_4_application_form + source_file: CH1.4 申请表 - 复选框调整版.docx + output_name: CH1.4 申请表.docx + file_format: docx + strategy: application_form + include_in_zip: true + fields: + - key: product_name + label: 产品名称 + placeholder: "{{product_name}}" + - key: applicant_name + label: 申请人名称 + placeholder: "{{applicant_name}}" + - code: ch1_5_product_list + source_file: CH1.5 产品列表.docx + output_name: CH1.5 产品列表.docx + file_format: docx + strategy: product_list + include_in_zip: true + fields: + - key: package_specification + label: 包装规格 + placeholder: "{{package_specification}}" + - code: ch1_11_1_standards + source_file: CH1.11.1 符合标准的清单.docx + output_name: CH1.11.1 符合标准的清单.docx + file_format: docx + strategy: standards + include_in_zip: true + fields: + - key: standard_no + label: 标准号 + placeholder: "{{standard_no}}" + - code: ch1_11_5_authenticity + source_file: CH1.11.5 真实性声明.docx + output_name: CH1.11.5 真实性声明.docx + file_format: docx + strategy: authenticity + include_in_zip: true + fields: + - key: product_name + label: 产品名称 + placeholder: "{{product_name}}" + - code: ch1_11_6_conformity + source_file: CH1.11.6 符合性声明.docx + output_name: CH1.11.6 符合性声明.docx + file_format: docx + strategy: conformity + include_in_zip: true + fields: + - key: product_name + label: 产品名称 + placeholder: "{{product_name}}" diff --git a/review_agent/regulatory_info_package/views.py 
b/review_agent/regulatory_info_package/views.py new file mode 100644 index 0000000..662956f --- /dev/null +++ b/review_agent/regulatory_info_package/views.py @@ -0,0 +1,127 @@ +import json + +from django.contrib.auth.decorators import login_required +from django.conf import settings +from django.http import Http404, JsonResponse +from django.views.decorators.http import require_http_methods + +from review_agent.models import ExportedSummaryFile, RegulatoryInfoPackageBatch, WorkflowNodeRun +from review_agent.regulatory_info_package.constants import WORKFLOW_TYPE +from review_agent.regulatory_info_package.services.input_select import select_instruction_input +from review_agent.regulatory_info_package.workflow import ( + create_regulatory_info_package_batch, + start_regulatory_info_package_workflow, +) + + +@require_http_methods(["GET"]) +def health(request): + return JsonResponse({"workflow_type": WORKFLOW_TYPE, "status": "available"}) + + +@login_required +@require_http_methods(["POST"]) +def start(request): + try: + payload = json.loads(request.body.decode("utf-8") or "{}") + except json.JSONDecodeError: + return JsonResponse({"error": "JSON 格式错误。"}, status=400) + from review_agent.models import Conversation + + conversation = Conversation.objects.filter(pk=payload.get("conversation_id"), user=request.user).first() + if not conversation: + raise Http404("对话不存在。") + selection = select_instruction_input(conversation, str(payload.get("message") or "")) + if selection.status != "selected": + return JsonResponse( + {"status": selection.status, "message": selection.message, "candidates": selection.candidates}, + status=400, + ) + batch = create_regulatory_info_package_batch( + conversation=conversation, + user=request.user, + source_attachment=selection.attachment, + source_summary_batch=selection.source_summary_batch, + source_summary_item_id=selection.source_summary_item_id, + source_file_name=selection.file_name, + source_storage_path=selection.storage_path, + ) + 
start_regulatory_info_package_workflow(batch, async_run=getattr(settings, "REGULATORY_INFO_PACKAGE_ASYNC", True)) + return JsonResponse({"batch_id": batch.pk, "workflow_type": WORKFLOW_TYPE, "status": batch.status}) + + +@login_required +@require_http_methods(["GET"]) +def batch_status(request, batch_id: int): + batch = RegulatoryInfoPackageBatch.objects.filter( + pk=batch_id, + conversation__user=request.user, + is_deleted=False, + ).first() + if not batch: + raise Http404("材料包批次不存在。") + exports = ExportedSummaryFile.objects.filter( + workflow_type=WORKFLOW_TYPE, + workflow_batch_id=batch.pk, + ).order_by("-export_type", "id") + sorted_exports = sorted(exports, key=lambda item: 0 if item.export_type == ExportedSummaryFile.ExportType.ZIP else 1) + return JsonResponse( + { + "batch": { + "id": batch.pk, + "workflow_type": WORKFLOW_TYPE, + "batch_no": batch.batch_no, + "status": batch.status, + "product_name": batch.product_name, + "risk_summary_text": _risk_summary_text(batch), + "error_message": batch.error_message, + }, + "nodes": [ + { + "node_code": node.node_code, + "node_name": node.node_name, + "status": node.status, + "progress": node.progress, + "message": node.message, + } + for node in WorkflowNodeRun.objects.filter( + workflow_type=WORKFLOW_TYPE, + workflow_batch_id=batch.pk, + ).order_by("id") + ], + "exports": [ + { + "id": export.pk, + "export_type": export.export_type, + "export_category": export.export_category, + "file_name": export.file_name, + "download_url": f"/api/review-agent/file-summary/exports/{export.pk}/download/", + } + for export in sorted_exports + ], + "failed_files": [item for item in batch.generated_files if item.get("status") == "failed"], + "notifications": [ + { + "id": item.pk, + "channel": item.channel, + "send_status": item.send_status, + "status_label": "通知已记录" if item.send_status == "success" else item.send_status, + "error_message": item.error_message, + } + for item in 
batch.notifications.filter(is_deleted=False).order_by("-created_at", "-id") + ], + } + ) + + +def _risk_summary_text(batch: RegulatoryInfoPackageBatch) -> str: + parts = [] + if batch.missing_fields: + parts.append(f"缺失字段 {len(batch.missing_fields)}") + if batch.llm_only_fields: + parts.append(f"LLM-only {len(batch.llm_only_fields)}") + if batch.conflict_fields: + parts.append(f"冲突字段 {len(batch.conflict_fields)}") + if batch.risk_notes: + parts.append(f"提示 {len(batch.risk_notes)}") + return " · ".join(parts) diff --git a/review_agent/regulatory_info_package/workflow.py b/review_agent/regulatory_info_package/workflow.py new file mode 100644 index 0000000..37250ba --- /dev/null +++ b/review_agent/regulatory_info_package/workflow.py @@ -0,0 +1,375 @@ +from __future__ import annotations + +import logging +from threading import Thread +from uuid import uuid4 + +from django.conf import settings +from django.db import transaction +from django.utils import timezone + +from review_agent.file_summary.paths import resolve_storage_path +from review_agent.models import ( + Conversation, + ExportedSummaryFile, + Message, + RegulatoryInfoPackageArtifact, + RegulatoryInfoPackageBatch, + RegulatoryInfoPackageNotificationRecord, + WorkflowNodeRun, +) +from review_agent.regulatory_info_package.constants import ( + DEFAULT_ZIP_NAME, + REGULATORY_INFO_PACKAGE_NODE_DEFINITIONS, + WORKFLOW_TYPE, +) +from review_agent.regulatory_info_package.events import record_event +from review_agent.regulatory_info_package.services.template_config import ( + compute_config_hash, + load_template_config, + validate_template_config, +) +from review_agent.regulatory_info_package.services.field_extract import run_parallel_extract, save_field_extract_result +from review_agent.regulatory_info_package.services.field_merge import merge_fields, save_merged_fields +from review_agent.regulatory_info_package.services.instruction_extract import parse_instruction_docx, save_instruction_extract_json +from 
review_agent.regulatory_info_package.services.package_generate import generate_package_documents +from review_agent.regulatory_info_package.services.summary import build_assistant_summary +from review_agent.regulatory_info_package.services.traceability_export import save_traceability_exports +from review_agent.regulatory_info_package.services.zip_export import create_zip_package +from review_agent.regulatory_info_package.schemas import GeneratedFileResult, InstructionExtractResult, MergedField +from review_agent.regulatory_info_package.storage import build_batch_work_dir +from review_agent.regulatory_info_package.storage import create_artifact_for_file, ensure_batch_subdir + + +logger = logging.getLogger("review_agent.regulatory_info_package.workflow") + + +def build_batch_no() -> str: + return f"RIP-{timezone.localtime().strftime('%Y%m%d%H%M%S')}-{uuid4().hex[:6]}" + + +@transaction.atomic +def create_regulatory_info_package_batch( + *, + conversation: Conversation, + user, + trigger_message: Message | None = None, + source_attachment=None, + source_summary_batch=None, + source_summary_item_id: int | None = None, + source_file_name: str = "", + source_storage_path: str = "", + existing_batch: RegulatoryInfoPackageBatch | None = None, +) -> RegulatoryInfoPackageBatch: + batch = existing_batch + if batch is None: + batch_no = build_batch_no() + work_dir = build_batch_work_dir(batch_no=batch_no) + work_dir.mkdir(parents=True, exist_ok=True) + batch = RegulatoryInfoPackageBatch.objects.create( + conversation=conversation, + user=user, + trigger_message=trigger_message, + source_attachment=source_attachment, + source_summary_batch=source_summary_batch, + source_summary_item_id=source_summary_item_id, + source_file_name=source_file_name or getattr(source_attachment, "original_name", ""), + source_storage_path=source_storage_path or getattr(source_attachment, "storage_path", ""), + batch_no=batch_no, + output_zip_name=DEFAULT_ZIP_NAME, + work_dir=str(work_dir), + ) + for 
code, name, group in REGULATORY_INFO_PACKAGE_NODE_DEFINITIONS: + WorkflowNodeRun.objects.get_or_create( + workflow_type=WORKFLOW_TYPE, + workflow_batch_id=batch.pk, + node_code=code, + defaults={ + "node_group": group, + "node_name": name, + }, + ) + record_event(batch, "workflow_created", {"batch_id": batch.pk, "batch_no": batch.batch_no}) + return batch + + +class RegulatoryInfoPackageWorkflowExecutor: + """Runs the Chapter 1 regulatory information package workflow.""" + + def __init__(self, batch: RegulatoryInfoPackageBatch): + self.batch = batch + self.template_config: dict = {} + self.instruction: InstructionExtractResult | None = None + self.extract_payload: dict = {} + self.merged_fields: dict[str, MergedField] = {} + self.merge_summary: dict[str, list[dict]] = {} + self.generation_results: list[GeneratedFileResult] = [] + self.exports: list[ExportedSummaryFile] = [] + + def run(self) -> None: + logger.info("监管信息材料包工作流开始 batch_no=%s batch_id=%s", self.batch.batch_no, self.batch.pk) + self.batch.status = RegulatoryInfoPackageBatch.Status.RUNNING + self.batch.started_at = timezone.now() + self.batch.save(update_fields=["status", "started_at"]) + record_event(self.batch, "workflow_started", {"batch_id": self.batch.pk}) + try: + for node in self._nodes(): + if node.status in {WorkflowNodeRun.Status.SUCCESS, WorkflowNodeRun.Status.SKIPPED}: + continue + self._run_node(node) + except Exception as exc: + logger.exception("Regulatory info package workflow failed", extra={"batch_id": self.batch.pk}) + self.batch.status = RegulatoryInfoPackageBatch.Status.FAILED + self.batch.error_message = str(exc) + self.batch.finished_at = timezone.now() + self.batch.save(update_fields=["status", "error_message", "finished_at"]) + record_event(self.batch, "workflow_failed", {"message": str(exc)}) + return + self.batch.status = RegulatoryInfoPackageBatch.Status.SUCCESS + self.batch.finished_at = timezone.now() + self.batch.save(update_fields=["status", "finished_at"]) + 
self._append_completion_message() + record_event(self.batch, "workflow_completed", {"batch_id": self.batch.pk}) + + def _nodes(self): + return WorkflowNodeRun.objects.filter( + workflow_type=WORKFLOW_TYPE, + workflow_batch_id=self.batch.pk, + ).order_by("id") + + def _run_node(self, node: WorkflowNodeRun) -> None: + node.status = WorkflowNodeRun.Status.RUNNING + node.progress = 10 + node.started_at = timezone.now() + node.message = f"{node.node_name}处理中" + node.save(update_fields=["status", "progress", "started_at", "message"]) + record_event(self.batch, "node_progress", {"node_code": node.node_code, "status": node.status}) + self._execute_node(node) + node.status = WorkflowNodeRun.Status.SUCCESS + node.progress = 100 + node.finished_at = timezone.now() + node.message = f"{node.node_name}完成" + node.save(update_fields=["status", "progress", "finished_at", "message"]) + record_event(self.batch, "node_progress", {"node_code": node.node_code, "status": node.status}) + + def _execute_node(self, node: WorkflowNodeRun) -> None: + if node.node_code == "prepare": + self.template_config = load_template_config() + errors = validate_template_config(self.template_config) + if errors: + raise ValueError(";".join(errors)) + self.batch.template_config_version = str(self.template_config.get("version") or "") + self.batch.template_config_hash = compute_config_hash() + self.batch.save(update_fields=["template_config_version", "template_config_hash"]) + return + if node.node_code == "template_copy": + return + if node.node_code == "text_extract": + if not self.batch.source_storage_path: + self.instruction = None + return + path = resolve_storage_path(self.batch.source_storage_path) + self.instruction = parse_instruction_docx(path) + json_path = ensure_batch_subdir(self.batch, "logs") / "instruction_extract.json" + save_instruction_extract_json(json_path, self.instruction) + create_artifact_for_file( + self.batch, + path=json_path, + 
artifact_type=RegulatoryInfoPackageArtifact.ArtifactType.INSTRUCTION_EXTRACT, + file_format=RegulatoryInfoPackageArtifact.FileFormat.JSON, + created_by_node=node.node_code, + ) + return + if node.node_code == "field_extract": + if not self.instruction: + self.extract_payload = {"regex_results": {}, "llm_results": {}, "llm_error": ""} + return + self.extract_payload = run_parallel_extract(self.instruction, llm_extract_func=lambda _instruction: {}) + json_path = ensure_batch_subdir(self.batch, "logs") / "field_extract_result.json" + save_field_extract_result(json_path, self.extract_payload) + create_artifact_for_file( + self.batch, + path=json_path, + artifact_type=RegulatoryInfoPackageArtifact.ArtifactType.FIELD_EXTRACT_RESULT, + file_format=RegulatoryInfoPackageArtifact.FileFormat.JSON, + created_by_node=node.node_code, + ) + return + if node.node_code == "field_merge": + self.merged_fields, self.merge_summary = merge_fields( + self.extract_payload.get("regex_results") or {}, + self.extract_payload.get("llm_results") or {}, + ) + product = self.merged_fields.get("product_name") + if product and product.value and product.value != "/": + self.batch.product_name = product.value + self.batch.missing_fields = self.merge_summary.get("missing_fields", []) + self.batch.llm_only_fields = self.merge_summary.get("llm_only_fields", []) + self.batch.conflict_fields = self.merge_summary.get("conflict_fields", []) + self.batch.save(update_fields=["product_name", "missing_fields", "llm_only_fields", "conflict_fields"]) + json_path = ensure_batch_subdir(self.batch, "logs") / "merged_fields.json" + save_merged_fields(json_path, self.merged_fields, self.merge_summary) + create_artifact_for_file( + self.batch, + path=json_path, + artifact_type=RegulatoryInfoPackageArtifact.ArtifactType.MERGED_FIELDS, + file_format=RegulatoryInfoPackageArtifact.FileFormat.JSON, + created_by_node=node.node_code, + ) + return + if node.node_code == "generate_docs": + self.generation_results = 
generate_package_documents(self.batch, self.template_config, self.merged_fields) + generated_files = [] + for result in self.generation_results: + if result.path: + artifact = create_artifact_for_file( + self.batch, + path=result.path, + artifact_type=RegulatoryInfoPackageArtifact.ArtifactType.GENERATED_DOCUMENT, + file_format=result.actual_format, + name=result.template_code, + metadata=result.__dict__, + created_by_node=node.node_code, + ) + result.artifact_id = artifact.pk + if result.status in {"success", "fallback_success"}: + export = self._create_export( + path=result.path, + export_type=ExportedSummaryFile.ExportType.WORD, + export_category="generated_document", + ) + result.export_id = export.pk + self.exports.append(export) + generated_files.append(result.__dict__) + self.batch.generated_files = generated_files + self.batch.save(update_fields=["generated_files"]) + return + if node.node_code == "highlight_review_items": + return + if node.node_code == "trace_export": + excel_path, json_path = save_traceability_exports(self.batch.work_dir, self.merged_fields) + create_artifact_for_file( + self.batch, + path=json_path, + artifact_type=RegulatoryInfoPackageArtifact.ArtifactType.TRACEABILITY, + file_format=RegulatoryInfoPackageArtifact.FileFormat.JSON, + created_by_node=node.node_code, + ) + artifact = create_artifact_for_file( + self.batch, + path=excel_path, + artifact_type=RegulatoryInfoPackageArtifact.ArtifactType.TRACEABILITY, + file_format=RegulatoryInfoPackageArtifact.FileFormat.EXCEL, + created_by_node=node.node_code, + ) + export = self._create_export( + path=str(excel_path), + export_type=ExportedSummaryFile.ExportType.EXCEL, + export_category="traceability", + ) + self.exports.append(export) + artifact.metadata = {"export_id": export.pk} + artifact.save(update_fields=["metadata"]) + return + if node.node_code == "zip_export": + zip_path = create_zip_package(self.batch.work_dir, self.generation_results, self.batch.output_zip_name) + artifact = 
create_artifact_for_file( + self.batch, + path=zip_path, + artifact_type=RegulatoryInfoPackageArtifact.ArtifactType.ZIP_PACKAGE, + file_format=RegulatoryInfoPackageArtifact.FileFormat.ZIP, + created_by_node=node.node_code, + ) + export = self._create_export( + path=str(zip_path), + export_type=ExportedSummaryFile.ExportType.ZIP, + export_category="regulatory_info_package", + ) + self.exports.insert(0, export) + artifact.metadata = {"export_id": export.pk} + artifact.save(update_fields=["metadata"]) + return + if node.node_code == "notify": + RegulatoryInfoPackageNotificationRecord.objects.create( + batch=self.batch, + recipient=self.batch.user, + export_ids=[export.pk for export in self.exports], + message_summary=build_assistant_summary( + batch_no=self.batch.batch_no, + exports=[ + { + "file_name": export.file_name, + "download_url": f"/api/review-agent/file-summary/exports/{export.pk}/download/", + "export_type": export.export_type, + } + for export in self.exports + ], + failed_files=[item for item in self.batch.generated_files if item.get("status") == "failed"], + ), + send_status=RegulatoryInfoPackageNotificationRecord.SendStatus.SUCCESS, + ) + return + + def _append_completion_message(self) -> None: + if ( + Message.objects.filter( + conversation=self.batch.conversation, + role=Message.Role.ASSISTANT, + content__contains=self.batch.batch_no, + ) + .filter(content__contains=self.batch.output_zip_name) + .exists() + ): + return + exports = list( + ExportedSummaryFile.objects.filter( + workflow_type=WORKFLOW_TYPE, + workflow_batch_id=self.batch.pk, + ) + ) + exports = sorted(exports, key=lambda export: 0 if export.export_type == ExportedSummaryFile.ExportType.ZIP else 1) + content = build_assistant_summary( + batch_no=self.batch.batch_no, + exports=[ + { + "file_name": export.file_name, + "download_url": f"/api/review-agent/file-summary/exports/{export.pk}/download/", + "export_type": export.export_type, + } + for export in exports + ], + failed_files=[item for 
item in self.batch.generated_files if item.get("status") == "failed"], + ) + Message.objects.create( + conversation=self.batch.conversation, + role=Message.Role.ASSISTANT, + content=content, + ) + + def _create_export(self, *, path: str, export_type: str, export_category: str) -> ExportedSummaryFile: + from pathlib import Path + + resolved = Path(path) + return ExportedSummaryFile.objects.create( + batch=None, + workflow_type=WORKFLOW_TYPE, + workflow_batch_id=self.batch.pk, + export_category=export_category, + export_type=export_type, + file_name=resolved.name, + storage_path=str(resolved), + ) + + +def start_regulatory_info_package_workflow( + batch: RegulatoryInfoPackageBatch, + *, + async_run: bool | None = None, +) -> None: + if async_run is None: + async_run = getattr(settings, "REGULATORY_INFO_PACKAGE_ASYNC", True) + executor = RegulatoryInfoPackageWorkflowExecutor(batch) + if async_run: + Thread(target=executor.run, daemon=True).start() + else: + executor.run() diff --git a/review_agent/services.py b/review_agent/services.py index 0bd9c7e..bd12ad8 100644 --- a/review_agent/services.py +++ b/review_agent/services.py @@ -19,6 +19,12 @@ from .application_form_fill.workflow import ( find_latest_successful_summary_batch as find_latest_successful_form_fill_summary_batch, start_application_form_fill_workflow, ) +from .regulatory_info_package.constants import WORKFLOW_TYPE as REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE +from .regulatory_info_package.services.input_select import select_instruction_input +from .regulatory_info_package.workflow import ( + create_regulatory_info_package_batch, + start_regulatory_info_package_workflow, +) from .regulatory_review.workflow import ( create_regulatory_review_batch, find_latest_successful_summary_batch, @@ -342,6 +348,56 @@ def stream_message(conversation: Conversation, content: str): ) return + if route.starts_regulatory_info_package: + selection = select_instruction_input(conversation, content) + if selection.status != 
"selected": + reply_content = selection.message or "请先在当前对话右侧上传产品说明书 docx 文件,然后再发送第1章监管信息生成指令。" + assistant_message = append_assistant_message(conversation, reply_content) + yield sse_event("chunk", {"delta": reply_content}) + yield sse_event( + "done", + { + "assistant_message_id": assistant_message.pk, + "conversation_id": conversation.pk, + "title": conversation.title, + }, + ) + return + batch = create_regulatory_info_package_batch( + conversation=conversation, + user=conversation.user, + trigger_message=user_message, + source_attachment=selection.attachment, + source_summary_batch=selection.source_summary_batch, + source_summary_item_id=selection.source_summary_item_id, + source_file_name=selection.file_name, + source_storage_path=selection.storage_path, + ) + start_regulatory_info_package_workflow( + batch, + async_run=getattr(settings, "REGULATORY_INFO_PACKAGE_ASYNC", True), + ) + reply_content = f"已启动第1章监管信息材料包生成工作流,批次号:{batch.batch_no}。" + assistant_message = append_assistant_message(conversation, reply_content) + yield sse_event( + "workflow_started", + { + "workflow_type": REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE, + "batch_id": batch.pk, + "batch_no": batch.batch_no, + }, + ) + yield sse_event("chunk", {"delta": reply_content}) + yield sse_event( + "done", + { + "assistant_message_id": assistant_message.pk, + "conversation_id": conversation.pk, + "title": conversation.title, + }, + ) + return + if route.starts_regulatory_review: source_summary_batch = find_latest_successful_summary_batch(conversation) if not source_summary_batch: diff --git a/review_agent/skill_router.py b/review_agent/skill_router.py index 24e668a..99d29c8 100644 --- a/review_agent/skill_router.py +++ b/review_agent/skill_router.py @@ -11,6 +11,10 @@ from .file_summary.workflow_trigger import ( from .application_form_fill.constants import FORM_FILL_TRIGGER_KEYWORDS, WORKFLOW_TYPE as FORM_FILL_WORKFLOW_TYPE from .llm import LLMConfigurationError, LLMRequestError, generate_completion from 
.models import Conversation, FileAttachment +from .regulatory_info_package.constants import ( + REGULATORY_INFO_PACKAGE_TRIGGER_KEYWORDS, + WORKFLOW_TYPE as REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE, +) logger = logging.getLogger(__name__) @@ -18,6 +22,7 @@ logger = logging.getLogger(__name__) ROUTE_ACTIONS = {"normal_chat", "attachment_reader", "file_summary"} ROUTE_ACTIONS.add("regulatory_review") ROUTE_ACTIONS.add(FORM_FILL_WORKFLOW_TYPE) +ROUTE_ACTIONS.add(REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE) @dataclass(frozen=True) @@ -45,6 +50,10 @@ class SkillRoute: def starts_application_form_fill(self) -> bool: return self.action == FORM_FILL_WORKFLOW_TYPE + @property + def starts_regulatory_info_package(self) -> bool: + return self.action == REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE + @property def is_normal_chat(self) -> bool: return self.action == "normal_chat" @@ -80,6 +89,14 @@ def route_message_intent(conversation: Conversation, content: str) -> SkillRoute def _deterministic_workflow_route(conversation: Conversation, content: str) -> SkillRoute | None: + if _matches_regulatory_info_package(content): + return SkillRoute( + action=REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE, + workflow_type=REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE, + confidence=0.9, + reason="命中明确第1章监管信息材料包生成关键词。", + source="rule_preflight", + ) if _matches_application_form_fill(content): return SkillRoute( action=FORM_FILL_WORKFLOW_TYPE, @@ -144,7 +161,9 @@ def _route_with_llm( return SkillRoute( action=action, skill_name="attachment_reader" if action == "attachment_reader" else "", - workflow_type=action if action in {"file_summary", "regulatory_review", FORM_FILL_WORKFLOW_TYPE} else "", + workflow_type=action + if action in {"file_summary", "regulatory_review", FORM_FILL_WORKFLOW_TYPE, REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE} + else "", confidence=_float_or_zero(payload.get("confidence")), reason=str(payload.get("reason") or ""), source="llm", @@ -152,6 +171,15 @@ def _route_with_llm( def 
_route_with_rules(conversation: Conversation, content: str) -> SkillRoute: + if _matches_regulatory_info_package(content): + return SkillRoute( + action=REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE, + workflow_type=REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE, + confidence=0.7, + reason="命中第1章监管信息材料包生成关键词。", + source="rule_fallback", + ) + if _matches_application_form_fill(content): return SkillRoute( action=FORM_FILL_WORKFLOW_TYPE, @@ -210,11 +238,12 @@ def _router_system_prompt() -> str: return ( "你是审核智能体的工具路由器,只判断是否需要调用工具,不直接回答用户。" "你必须只输出 JSON 对象,不要输出 Markdown。" - "可选 action:normal_chat、attachment_reader、file_summary、regulatory_review、application_form_fill。" + "可选 action:normal_chat、attachment_reader、file_summary、regulatory_review、application_form_fill、regulatory_info_package。" "attachment_reader 用于用户要求阅读、提取、分析、总结、查看上传附件内容。" "file_summary 用于用户要求自动汇总文件目录、页数、清单或生成目录页数报告。" "regulatory_review 用于用户要求法规核查、NMPA核查、完整性核查、章节一致性核查、风险预警或整改建议。" "application_form_fill 用于用户要求填注册证、生成申报模板、填写对应表格、安全和性能基本原则清单或自动填表。" + "regulatory_info_package 用于用户要求根据说明书生成第1章监管信息、监管信息材料包、申请表、产品列表或声明材料包。" "normal_chat 用于不需要读取附件或执行工作流的一般问答。" "输出字段:action、confidence、reason。" ) @@ -268,6 +297,11 @@ def _matches_regulatory_review(content: str) -> bool: return any(keyword in normalized for keyword in keywords) +def _matches_regulatory_info_package(content: str) -> bool: + normalized = "".join((content or "").lower().split()) + return any("".join(keyword.lower().split()) in normalized for keyword in REGULATORY_INFO_PACKAGE_TRIGGER_KEYWORDS) + + def _matches_application_form_fill(content: str) -> bool: normalized = content.lower() return any(keyword.lower() in normalized for keyword in FORM_FILL_TRIGGER_KEYWORDS) diff --git a/review_agent/urls.py b/review_agent/urls.py index 4d46250..59aa2c1 100644 --- a/review_agent/urls.py +++ b/review_agent/urls.py @@ -21,6 +21,10 @@ from .application_form_fill.views import ( batch_status as application_form_fill_batch_status, start as application_form_fill_start, ) +from 
.regulatory_info_package.views import ( + batch_status as regulatory_info_package_batch_status, + start as regulatory_info_package_start, +) from .views import ( knowledge_base_document_detail, knowledge_base_document_index, @@ -112,6 +116,16 @@ urlpatterns = [ application_form_fill_batch_status, name="application_form_fill_batch_status", ), + path( + "api/review-agent/regulatory-info-package/start/", + regulatory_info_package_start, + name="regulatory_info_package_start", + ), + path( + "api/review-agent/regulatory-info-package//status/", + regulatory_info_package_batch_status, + name="regulatory_info_package_batch_status", + ), path( "api/review-agent/knowledge-base/status/", knowledge_base_status, diff --git a/review_agent/views.py b/review_agent/views.py index 2933923..5613cdd 100644 --- a/review_agent/views.py +++ b/review_agent/views.py @@ -16,7 +16,15 @@ from .services import ( send_message, stream_message, ) -from .models import ApplicationFormFillBatch, Conversation, FileAttachment, FileSummaryBatch, RegulatoryReviewBatch, WorkflowNodeRun +from .models import ( + ApplicationFormFillBatch, + Conversation, + FileAttachment, + FileSummaryBatch, + RegulatoryInfoPackageBatch, + RegulatoryReviewBatch, + WorkflowNodeRun, +) from .knowledge_base import build_knowledge_base_context, search_knowledge_base from .knowledge_base import ( build_knowledge_base_context_for_user, @@ -329,6 +337,25 @@ def build_workflow_cards(conversation: Conversation) -> list[dict[str, object]]: ), } ) + rip_batches = RegulatoryInfoPackageBatch.objects.filter(conversation=conversation, is_deleted=False) + for batch in rip_batches: + cards.append( + { + "id": batch.pk, + "workflow_type": "regulatory_info_package", + "batch_no": batch.batch_no, + "status": batch.status, + "error_message": batch.error_message, + "risk_label": _format_regulatory_info_package_label(batch), + "created_at": batch.created_at, + "nodes": list( + WorkflowNodeRun.objects.filter( + 
workflow_type="regulatory_info_package", + workflow_batch_id=batch.pk, + ).order_by("id") + ), + } + ) return sorted(cards, key=lambda item: item["created_at"], reverse=True)[:5] @@ -374,6 +401,20 @@ def _format_form_fill_label(batch: ApplicationFormFillBatch) -> str: return " · ".join(parts) +def _format_regulatory_info_package_label(batch: RegulatoryInfoPackageBatch) -> str: + parts = [] + if batch.product_name: + parts.append(batch.product_name) + if batch.generated_files: + success_count = sum(1 for item in batch.generated_files if item.get("status") in {"success", "fallback_success"}) + parts.append(f"生成 {success_count}/7") + if batch.missing_fields: + parts.append(f"缺失 {len(batch.missing_fields)}") + if batch.conflict_fields: + parts.append(f"冲突 {len(batch.conflict_fields)}") + return " · ".join(parts) + + def build_home_dashboard_context(user) -> dict[str, object]: conversations = Conversation.objects.filter(user=user) active_attachments = FileAttachment.objects.filter(user=user).exclude( diff --git a/static/js/app.js b/static/js/app.js index 015a1f5..f99d460 100644 --- a/static/js/app.js +++ b/static/js/app.js @@ -517,6 +517,8 @@ attributeName = "data-regulatory-status-url-template"; } else if (workflow_type === "application_form_fill") { attributeName = "data-application-form-fill-status-url-template"; + } else if (workflow_type === "regulatory_info_package") { + attributeName = "data-regulatory-info-package-status-url-template"; } return templateUrl(attributeName, "__batch_id__", batchId); } diff --git a/templates/home.html b/templates/home.html index 467b64b..f5ba5eb 100644 --- a/templates/home.html +++ b/templates/home.html @@ -225,6 +225,11 @@ type="button" data-prompt-template="请基于当前对话最近成功汇总的产品资料,自动提取产品关键信息并填入申报文件模板" >申报文件填表 + @@ -241,6 +246,7 @@ data-status-url-template="/api/review-agent/file-summary/__batch_id__/status/" data-regulatory-status-url-template="/api/review-agent/regulatory-review/__batch_id__/status/" 
data-application-form-fill-status-url-template="/api/review-agent/application-form-fill/__batch_id__/status/" + data-regulatory-info-package-status-url-template="/api/review-agent/regulatory-info-package/__batch_id__/status/" data-events-url-template="/api/review-agent/file-summary/__batch_id__/events/" >
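Note on the rule-based trigger in `review_agent/skill_router.py` above: `_matches_regulatory_info_package` strips all whitespace and lowercases both the message and each keyword before doing a substring check, so spacing differences such as "第1章 监管信息" versus "第1章监管信息" do not prevent a match. A minimal standalone sketch of that normalization follows. The keyword tuple and the function names `normalize` and `matches_any_keyword` are hypothetical, since the actual contents of `REGULATORY_INFO_PACKAGE_TRIGGER_KEYWORDS` are not shown in this diff.

```python
# Sketch only: EXAMPLE_KEYWORDS is a made-up stand-in for
# REGULATORY_INFO_PACKAGE_TRIGGER_KEYWORDS, whose real values are not in the diff.
EXAMPLE_KEYWORDS = ("第1章 监管信息", "监管信息材料包")


def normalize(text):
    """Lowercase the text and drop every whitespace character."""
    return "".join((text or "").lower().split())


def matches_any_keyword(content, keywords=EXAMPLE_KEYWORDS):
    """Return True if any normalized keyword occurs in the normalized content."""
    normalized = normalize(content)
    return any(normalize(keyword) in normalized for keyword in keywords)
```

Because the keywords are normalized as well, a keyword may be written with or without spaces in the constants module. The trade-off is that matches can span what were originally word boundaries in the user's message, which is presumably why the LLM router is kept as the primary path and this check serves as preflight and fallback.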
diff --git a/tests/conftest.py b/tests/conftest.py new file mode 100644 index 0000000..9912414 --- /dev/null +++ b/tests/conftest.py @@ -0,0 +1,8 @@ +import pytest + + +@pytest.fixture(autouse=True) +def mock_regulatory_info_package_page_count(monkeypatch): + from review_agent.regulatory_info_package.services import package_generate + + monkeypatch.setattr(package_generate, "count_document_pages", lambda _path: 1) diff --git a/tests/test_regulatory_info_package_field_extract.py b/tests/test_regulatory_info_package_field_extract.py new file mode 100644 index 0000000..b84754d --- /dev/null +++ b/tests/test_regulatory_info_package_field_extract.py @@ -0,0 +1,88 @@ +import json + +from review_agent.regulatory_info_package.schemas import InstructionExtractResult +from review_agent.regulatory_info_package.services.field_extract import extract_fields_by_rules, run_parallel_extract + + +def test_extract_fields_by_rules_finds_product_name_and_storage(): + instruction = InstructionExtractResult( + source_file_name="目标产品说明书.docx", + paragraphs=["产品名称:新型冠状病毒检测试剂盒", "储存条件:2-8℃保存"], + sections={}, + tables=[], + component_tables=[], + front_text="产品名称:新型冠状病毒检测试剂盒\n储存条件:2-8℃保存", + ) + + result = extract_fields_by_rules(instruction) + + assert result["product_name"]["value"] == "新型冠状病毒检测试剂盒" + assert result["storage_condition"]["value"] == "2-8℃保存" + + +def test_extract_fields_by_rules_uses_registrant_or_manufacturer_for_applicant(): + instruction = InstructionExtractResult( + source_file_name="目标产品说明书.docx", + paragraphs=[ + "注册人/售后服务单位名称:卡尤迪生物科技宜兴有限公司", + "生产企业名称:卡尤迪生物科技宜兴有限公司", + "生产企业住所:宜兴经济技术开发区杏里路10号宜兴光电产业园4幢101室、102室", + "联系方式: 0510-80330909, 0510-80330919", + "生产地址:江苏省宜兴经济技术开发区杏里路10号宜兴光电产业园4幢102室", + ], + sections={}, + tables=[], + component_tables=[], + front_text="", + ) + + result = extract_fields_by_rules(instruction) + + assert result["applicant_name"]["value"] == "卡尤迪生物科技宜兴有限公司" + assert result["manufacturer_name"]["value"] == "卡尤迪生物科技宜兴有限公司" + assert 
result["applicant_address"]["value"] == "宜兴经济技术开发区杏里路10号宜兴光电产业园4幢101室、102室" + assert result["applicant_contact"]["value"] == "0510-80330909, 0510-80330919" + assert result["production_address"]["value"] == "江苏省宜兴经济技术开发区杏里路10号宜兴光电产业园4幢102室" + + +def test_extract_fields_by_rules_serializes_component_table_and_notes(): + instruction = InstructionExtractResult( + source_file_name="目标产品说明书.docx", + paragraphs=[], + sections={"【主要组成成分】": "表1 规格A大包装试剂盒组成成分\n注:不同批号试剂盒中各组分不得互换使用。"}, + tables=[], + component_tables=[ + { + "header": ["组分", "主要组成成分", "规格(24人份/盒)", "规格(48人份/盒)"], + "rows": [ + ["PCR反应液 I", "逆转录酶、Taq酶", "840μL/管×1管", "840μL/管×2管"], + ["阳性对照品", "含目的片段的假病毒", "600μL/管×2管", "1200μL/管×2管"], + ], + } + ], + front_text="", + ) + + result = extract_fields_by_rules(instruction) + payload = json.loads(result["component_table"]["value"]) + + assert payload["header"][0:2] == ["组分", "主要组成成分"] + assert payload["rows"][0][0] == "PCR反应液 I" + assert result["component_notes"]["value"] == "表1 规格A大包装试剂盒组成成分\n注:不同批号试剂盒中各组分不得互换使用。" + + +def test_run_parallel_extract_keeps_rule_result_when_llm_fails(): + instruction = InstructionExtractResult( + source_file_name="目标产品说明书.docx", + paragraphs=["产品名称:测试产品"], + sections={}, + tables=[], + component_tables=[], + front_text="产品名称:测试产品", + ) + + result = run_parallel_extract(instruction, llm_extract_func=lambda _instruction: (_ for _ in ()).throw(ValueError("bad llm"))) + + assert result["regex_results"]["product_name"]["value"] == "测试产品" + assert result["llm_results"] == {} + assert result["llm_error"] diff --git a/tests/test_regulatory_info_package_field_merge.py b/tests/test_regulatory_info_package_field_merge.py new file mode 100644 index 0000000..18192ed --- /dev/null +++ b/tests/test_regulatory_info_package_field_merge.py @@ -0,0 +1,24 @@ +from review_agent.regulatory_info_package.services.field_merge import merge_fields + + +def test_merge_fields_marks_missing_llm_only_and_conflict(): + merged, summary = merge_fields( + { + 
"product_name": {"value": "规则产品", "evidence": "说明书", "confidence": 0.8, "label": "产品名称"}, + "applicant_name": {"value": "", "evidence": "", "confidence": 0.0, "label": "申请人名称"}, + "package_specification": {"value": "24人份/盒", "evidence": "表格", "confidence": 0.7, "label": "包装规格"}, + }, + { + "intended_use": {"value": "用于检测", "evidence": "LLM", "confidence": 0.6, "label": "预期用途"}, + "package_specification": {"value": "48人份/盒", "evidence": "LLM", "confidence": 0.6, "label": "包装规格"}, + }, + ) + + assert merged["applicant_name"].value == "/" + assert merged["applicant_name"].highlight_reason == "missing" + assert merged["intended_use"].highlight_reason == "llm_only" + assert merged["package_specification"].value == "24人份/盒" + assert merged["package_specification"].highlight_reason == "conflict" + assert any(item["field_key"] == "applicant_name" for item in summary["missing_fields"]) + assert len(summary["llm_only_fields"]) == 1 + assert len(summary["conflict_fields"]) == 1 diff --git a/tests/test_regulatory_info_package_frontend.py b/tests/test_regulatory_info_package_frontend.py new file mode 100644 index 0000000..2b10f0b --- /dev/null +++ b/tests/test_regulatory_info_package_frontend.py @@ -0,0 +1,45 @@ +import pytest +from django.urls import reverse + +from review_agent.models import Conversation, RegulatoryInfoPackageBatch, WorkflowNodeRun + + +pytestmark = pytest.mark.django_db + + +def test_workspace_renders_regulatory_info_package_chip_and_card(client, django_user_model): + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + batch = RegulatoryInfoPackageBatch.objects.create( + conversation=conversation, + user=user, + batch_no="RIP-CARD", + status=RegulatoryInfoPackageBatch.Status.SUCCESS, + generated_files=[{"status": "success"} for _ in range(7)], + ) + WorkflowNodeRun.objects.create( + workflow_type="regulatory_info_package", + workflow_batch_id=batch.pk, + 
node_group="regulatory_info_package", + node_code="zip_export", + node_name="打包下载", + status=WorkflowNodeRun.Status.SUCCESS, + progress=100, + ) + client.force_login(user) + + response = client.get(f"{reverse('chat')}?conversation={conversation.pk}") + content = response.content.decode("utf-8") + + assert "第1章监管信息" in content + assert 'data-workflow-type="regulatory_info_package"' in content + assert "data-regulatory-info-package-status-url-template" in content + assert "RIP-CARD" in content + + +def test_frontend_selects_regulatory_info_package_status_url(): + script = open("static/js/app.js", encoding="utf-8").read() + + assert 'workflow_type === "regulatory_info_package"' in script + assert "data-regulatory-info-package-status-url-template" in script + diff --git a/tests/test_regulatory_info_package_input_select.py b/tests/test_regulatory_info_package_input_select.py new file mode 100644 index 0000000..a580aa5 --- /dev/null +++ b/tests/test_regulatory_info_package_input_select.py @@ -0,0 +1,48 @@ +import pytest + +from review_agent.models import Conversation, FileAttachment +from review_agent.regulatory_info_package.services.input_select import select_instruction_input + + +pytestmark = pytest.mark.django_db + + +def test_select_instruction_input_prefers_message_filename(django_user_model): + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + selected = FileAttachment.objects.create( + conversation=conversation, + user=user, + original_name="目标产品说明书.docx", + storage_path="uploads/target.docx", + ) + FileAttachment.objects.create( + conversation=conversation, + user=user, + original_name="其他说明书.docx", + storage_path="uploads/other.docx", + ) + + result = select_instruction_input(conversation, "请使用目标产品说明书生成第1章监管信息") + + assert result.status == "selected" + assert result.attachment == selected + assert result.file_name == "目标产品说明书.docx" + + +def 
test_select_instruction_input_waits_on_multiple_candidates(django_user_model): + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + for name in ["A说明书.docx", "B说明书.docx"]: + FileAttachment.objects.create( + conversation=conversation, + user=user, + original_name=name, + storage_path=f"uploads/{name}", + ) + + result = select_instruction_input(conversation, "生成第1章监管信息") + + assert result.status == "waiting_user" + assert result.candidates == ["A说明书.docx", "B说明书.docx"] + diff --git a/tests/test_regulatory_info_package_instruction_extract.py b/tests/test_regulatory_info_package_instruction_extract.py new file mode 100644 index 0000000..93b9e78 --- /dev/null +++ b/tests/test_regulatory_info_package_instruction_extract.py @@ -0,0 +1,16 @@ +from pathlib import Path + +from review_agent.regulatory_info_package.services.instruction_extract import parse_instruction_docx + + +def test_parse_instruction_docx_extracts_paragraphs_and_tables(): + path = Path("docs/0.原始材料/目标产品说明书.docx") + + result = parse_instruction_docx(path) + + assert result.source_file_name == "目标产品说明书.docx" + assert result.paragraphs + assert isinstance(result.sections, dict) + assert isinstance(result.tables, list) + assert result.front_text + diff --git a/tests/test_regulatory_info_package_legacy_doc.py b/tests/test_regulatory_info_package_legacy_doc.py new file mode 100644 index 0000000..951b609 --- /dev/null +++ b/tests/test_regulatory_info_package_legacy_doc.py @@ -0,0 +1,9 @@ +from review_agent.regulatory_info_package.services.legacy_doc_document import detect_legacy_doc_capability + + +def test_detect_legacy_doc_capability_is_stable(): + capability = detect_legacy_doc_capability() + + assert capability.status in {"available", "unavailable"} + assert capability.adapter in {"WordComDocAdapter", "UnavailableLegacyDocAdapter"} + diff --git a/tests/test_regulatory_info_package_models.py 
b/tests/test_regulatory_info_package_models.py new file mode 100644 index 0000000..e100935 --- /dev/null +++ b/tests/test_regulatory_info_package_models.py @@ -0,0 +1,109 @@ +import pytest +from django.db import IntegrityError + +from review_agent.models import ( + Conversation, + ExportedSummaryFile, + FileAttachment, + RegulatoryInfoPackageArtifact, + RegulatoryInfoPackageBatch, + RegulatoryInfoPackageNotificationRecord, + WorkflowNodeRun, +) + + +pytestmark = pytest.mark.django_db + + +def test_regulatory_info_package_batch_defaults(django_user_model): + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + attachment = FileAttachment.objects.create( + conversation=conversation, + user=user, + original_name="目标产品说明书.docx", + storage_path="uploads/instruction.docx", + ) + + batch = RegulatoryInfoPackageBatch.objects.create( + conversation=conversation, + user=user, + source_attachment=attachment, + batch_no="RIP-20260610153000-abcdef", + source_file_name=attachment.original_name, + source_storage_path=attachment.storage_path, + ) + + assert batch.status == RegulatoryInfoPackageBatch.Status.PENDING + assert batch.output_zip_name == "第1章 监管信息(预生成版).zip" + assert batch.generated_files == [] + assert batch.missing_fields == [] + assert batch.llm_only_fields == [] + assert batch.conflict_fields == [] + assert batch.risk_notes == [] + assert batch.adapter_summary == {} + assert str(batch) == "RIP-20260610153000-abcdef" + + +def test_regulatory_info_package_artifact_and_notification(django_user_model): + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + batch = RegulatoryInfoPackageBatch.objects.create( + conversation=conversation, + user=user, + batch_no="RIP-20260610153100-abcdef", + ) + + artifact = RegulatoryInfoPackageArtifact.objects.create( + batch=batch, + 
artifact_type=RegulatoryInfoPackageArtifact.ArtifactType.ZIP_PACKAGE, + file_format=RegulatoryInfoPackageArtifact.FileFormat.ZIP, + name="主下载包", + file_name="第1章 监管信息(预生成版).zip", + storage_path="media/regulatory_info_package/package.zip", + ) + notification = RegulatoryInfoPackageNotificationRecord.objects.create( + batch=batch, + recipient=user, + export_ids=[1, 2], + message_summary="材料包已生成", + send_status=RegulatoryInfoPackageNotificationRecord.SendStatus.SUCCESS, + ) + + assert artifact.metadata == {} + assert artifact.is_deleted is False + assert notification.channel == RegulatoryInfoPackageNotificationRecord.Channel.MOCK + assert notification.retry_count == 0 + + +def test_exported_summary_file_supports_zip_type(): + values = {value for value, _label in ExportedSummaryFile.ExportType.choices} + + assert "zip" in values + + +def test_workflow_node_run_unique_for_workflow_batch(django_user_model): + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + batch = RegulatoryInfoPackageBatch.objects.create( + conversation=conversation, + user=user, + batch_no="RIP-20260610153200-abcdef", + ) + + WorkflowNodeRun.objects.create( + workflow_type="regulatory_info_package", + workflow_batch_id=batch.pk, + node_group="regulatory_info_package", + node_code="prepare", + node_name="准备资料", + ) + + with pytest.raises(IntegrityError): + WorkflowNodeRun.objects.create( + workflow_type="regulatory_info_package", + workflow_batch_id=batch.pk, + node_group="regulatory_info_package", + node_code="prepare", + node_name="准备资料", + ) diff --git a/tests/test_regulatory_info_package_notification.py b/tests/test_regulatory_info_package_notification.py new file mode 100644 index 0000000..6b69ac8 --- /dev/null +++ b/tests/test_regulatory_info_package_notification.py @@ -0,0 +1,17 @@ +import pytest + +from review_agent.models import Conversation, RegulatoryInfoPackageBatch, 
RegulatoryInfoPackageNotificationRecord + + +pytestmark = pytest.mark.django_db + + +def test_regulatory_info_package_notification_record_defaults(django_user_model): + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + batch = RegulatoryInfoPackageBatch.objects.create(conversation=conversation, user=user, batch_no="RIP-NOTIFY") + + record = RegulatoryInfoPackageNotificationRecord.objects.create(batch=batch, recipient=user) + + assert record.channel == RegulatoryInfoPackageNotificationRecord.Channel.MOCK + assert record.send_status == RegulatoryInfoPackageNotificationRecord.SendStatus.PENDING diff --git a/tests/test_regulatory_info_package_package_generate.py b/tests/test_regulatory_info_package_package_generate.py new file mode 100644 index 0000000..c1331a9 --- /dev/null +++ b/tests/test_regulatory_info_package_package_generate.py @@ -0,0 +1,281 @@ +import json +import pytest +from docx import Document +from pathlib import Path + +from django.conf import settings +from django.utils import timezone +from review_agent.models import Conversation, RegulatoryInfoPackageBatch +from review_agent.regulatory_info_package.services.field_merge import merge_fields +from review_agent.regulatory_info_package.services import package_generate +from review_agent.regulatory_info_package.services.package_generate import generate_package_documents +from review_agent.regulatory_info_package.services.template_config import load_template_config + + +pytestmark = pytest.mark.django_db + + +def test_template_config_uses_clean_internal_templates(): + config = load_template_config() + source_dir = Path(config["source_dir"]) + + assert source_dir == settings.BASE_DIR / "review_agent" / "regulatory_info_package" / "templates" / "clean" + assert source_dir.exists() + assert len(config["templates"]) == 6 + assert all((source_dir / item["source_file"]).exists() for item in config["templates"]) + + 
+def test_clean_templates_expose_stable_fill_placeholders(): + config = load_template_config() + source_dir = Path(config["source_dir"]) + expected_by_code = { + "ch1_2_directory": {"{{product_name}}"}, + "ch1_4_application_form": {"{{product_name}}", "{{applicant_name}}"}, + "ch1_5_product_list": {"{{product_name}}"}, + "ch1_11_1_standards": {"{{product_name}}"}, + "ch1_11_5_authenticity": {"{{product_name}}"}, + "ch1_11_6_conformity": {"{{product_name}}"}, + } + + for item in config["templates"]: + document = Document(source_dir / item["source_file"]) + text = _document_text(document) + for placeholder in expected_by_code[item["code"]]: + assert placeholder in text + + +def test_directory_template_includes_page_numbers(): + config = load_template_config() + source_dir = Path(config["source_dir"]) + item = next(template for template in config["templates"] if template["code"] == "ch1_2_directory") + document = Document(source_dir / item["source_file"]) + page_numbers = [row.cells[4].text.strip() for row in document.tables[0].rows[1:]] + + assert page_numbers == ["1", "1", "1", "1", "1", "1"] + + +def test_application_form_template_uses_real_checkbox_symbols(): + config = load_template_config() + source_dir = Path(config["source_dir"]) + item = next(template for template in config["templates"] if template["code"] == "ch1_4_application_form") + text = _document_text(Document(source_dir / item["source_file"])) + + assert "{{复选框}}" not in text + assert "{{}}" not in text + assert "☐" in text + assert "☑" in text + + +def test_generate_package_documents_creates_six_results(django_user_model, tmp_path): + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + batch = RegulatoryInfoPackageBatch.objects.create( + conversation=conversation, + user=user, + batch_no="RIP-20260610154000-abcdef", + work_dir=str(tmp_path), + ) + merged, _summary = merge_fields({"product_name": 
{"value": "测试产品", "label": "产品名称"}}, {}) + + results = generate_package_documents(batch, load_template_config(), merged) + + assert len(results) == 6 + assert all(result.status in {"success", "fallback_success"} for result in results), [ + (result.template_code, result.status, result.error_message) for result in results + ] + assert all(result.path for result in results) + + +def test_directory_is_generated_last_with_real_page_counts(django_user_model, tmp_path, monkeypatch): + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + batch = RegulatoryInfoPackageBatch.objects.create( + conversation=conversation, + user=user, + batch_no="RIP-20260610154010-abcdef", + work_dir=str(tmp_path), + ) + merged, _summary = merge_fields({"product_name": {"value": "测试产品", "label": "产品名称"}}, {}) + page_counts = { + "CH1.4 申请表.docx": 3, + "CH1.5 产品列表.docx": 5, + "CH1.11.1 符合标准的清单.docx": 2, + "CH1.11.5 真实性声明.docx": 4, + "CH1.11.6 符合性声明.docx": 6, + } + counted_files = [] + + def fake_count(path): + counted_files.append(Path(path).name) + return page_counts[Path(path).name] + + monkeypatch.setattr(package_generate, "count_document_pages", fake_count, raising=False) + + results = generate_package_documents(batch, load_template_config(), merged) + + assert results[-1].template_code == "ch1_2_directory" + assert set(counted_files) == set(page_counts) + directory = Document(results[-1].path) + directory_pages = {row.cells[0].text.strip(): row.cells[4].text.strip() for row in directory.tables[0].rows[1:]} + assert directory_pages == { + "CH1.2": "1", + "CH1.4": "3", + "CH1.5": "5", + "CH1.11.1": "2", + "CH1.11.5": "4", + "CH1.11.6": "6", + } + + +def test_generated_docx_does_not_add_prefill_or_audit_blocks(django_user_model, tmp_path): + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + batch = 
RegulatoryInfoPackageBatch.objects.create( + conversation=conversation, + user=user, + batch_no="RIP-20260610154100-abcdef", + work_dir=str(tmp_path), + ) + merged, _summary = merge_fields({"product_name": {"value": "测试产品", "label": "产品名称"}}, {}) + + results = generate_package_documents(batch, load_template_config(), merged) + for result in results: + document = Document(result.path) + text = _document_text(document) + + assert "预生成版" not in text + assert "预生成字段" not in text + assert "component_table" not in text + assert '"header"' not in text + assert "测试产品" in text + + +def test_generated_docx_replaces_sample_case_content(django_user_model, tmp_path): + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + batch = RegulatoryInfoPackageBatch.objects.create( + conversation=conversation, + user=user, + batch_no="RIP-20260610154200-abcdef", + work_dir=str(tmp_path), + ) + merged, _summary = merge_fields( + { + "product_name": {"value": "测试产品", "label": "产品名称"}, + "package_specification": {"value": "24人份/盒;48人份/盒", "label": "包装规格"}, + }, + {}, + ) + + results = generate_package_documents(batch, load_template_config(), merged) + docx_results = [result for result in results if result.actual_format == "docx"] + for result in docx_results: + document = Document(result.path) + text = "\n".join(paragraph.text for paragraph in document.paragraphs) + for table in document.tables: + for row in table.rows: + text += "\n" + "\t".join(cell.text for cell in row.cells) + assert "呼吸道合胞病毒、肺炎支原体核酸检测试剂盒" not in text + product_list = next(result for result in results if result.template_code == "ch1_5_product_list") + product_doc = Document(product_list.path) + table = product_doc.tables[0] + assert table.rows[1].cells[0].text == "24人份/盒" + assert table.rows[1].cells[1].text == "/" + assert "6018003102" not in "\n".join(cell.text for row in table.rows for cell in row.cells) + + +def 
test_generated_docs_fill_clean_template_body(django_user_model, tmp_path): + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + batch = RegulatoryInfoPackageBatch.objects.create( + conversation=conversation, + user=user, + batch_no="RIP-20260610154300-abcdef", + work_dir=str(tmp_path), + ) + merged, _summary = merge_fields( + { + "product_name": {"value": "甲型流感病毒核酸检测试剂盒", "label": "产品名称"}, + "applicant_name": {"value": "星河医疗科技有限公司", "label": "申请人名称"}, + "package_specification": {"value": "24人份/盒;48人份/盒", "label": "包装规格"}, + "standard_no": {"value": "GB/T 29791.1-2013", "label": "标准号"}, + }, + {}, + ) + + results = generate_package_documents(batch, load_template_config(), merged) + + for code in ["ch1_2_directory", "ch1_4_application_form", "ch1_11_5_authenticity", "ch1_11_6_conformity"]: + result = next(item for item in results if item.template_code == code) + text = _document_text(Document(result.path)) + assert "甲型流感病毒核酸检测试剂盒" in text + if code == "ch1_4_application_form": + assert "星河医疗科技有限公司" in text + assert "{{" not in text + assert "}}" not in text + + today = timezone.localdate().strftime("%Y年%m月%d日") + for code in ["ch1_11_1_standards", "ch1_11_5_authenticity", "ch1_11_6_conformity"]: + result = next(item for item in results if item.template_code == code) + text = _document_text(Document(result.path)) + assert today in text + assert "xxxx年xx月xx日" not in text + assert "星河医疗科技有限公司" not in text + + product_list = next(item for item in results if item.template_code == "ch1_5_product_list") + product_text = _document_text(Document(product_list.path)) + assert "24人份/盒" in product_text + assert "48人份/盒" in product_text + + +def test_product_list_uses_component_table_from_instruction(django_user_model, tmp_path): + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + 
batch = RegulatoryInfoPackageBatch.objects.create( + conversation=conversation, + user=user, + batch_no="RIP-20260610154400-abcdef", + work_dir=str(tmp_path), + ) + component_payload = { + "header": ["组分", "主要组成成分", "规格(24人份/盒)", "规格(48人份/盒)"], + "rows": [ + ["PCR反应液 I", "逆转录酶、Taq酶", "840μL/管×1管", "840μL/管×2管"], + ["阳性对照品", "含目的片段的假病毒", "600μL/管×2管", "1200μL/管×2管"], + ], + } + merged, _summary = merge_fields( + { + "product_name": {"value": "新型冠状病毒核酸检测试剂盒", "label": "产品名称"}, + "package_specification": {"value": "24人份/盒;48人份/盒", "label": "包装规格"}, + "component_table": { + "value": json.dumps(component_payload, ensure_ascii=False), + "label": "主要组成成分", + }, + "component_notes": { + "value": "注:不同批号试剂盒中各组分不得互换使用。", + "label": "主要组成成分备注", + }, + }, + {}, + ) + + results = generate_package_documents(batch, load_template_config(), merged) + product_list = next(result for result in results if result.template_code == "ch1_5_product_list") + document = Document(product_list.path) + text = _document_text(document) + + assert "PCR反应液 I" in text + assert "840μL/管×1管" in text + assert "840μL/管×2管" in text + assert "注:不同批号试剂盒中各组分不得互换使用。" in text + assert "RSV&MP" not in text + assert "6018003102" not in text + + +def _document_text(document: Document) -> str: + text = "\n".join(paragraph.text for paragraph in document.paragraphs) + for table in document.tables: + for row in table.rows: + text += "\n" + "\t".join(cell.text for cell in row.cells) + return text diff --git a/tests/test_regulatory_info_package_summary.py b/tests/test_regulatory_info_package_summary.py new file mode 100644 index 0000000..6575a96 --- /dev/null +++ b/tests/test_regulatory_info_package_summary.py @@ -0,0 +1,13 @@ +from review_agent.regulatory_info_package.services.summary import build_assistant_summary + + +def test_build_assistant_summary_puts_zip_first(): + exports = [ + {"file_name": "CH1.4 申请表.docx", "download_url": "/docx"}, + {"file_name": "第1章 监管信息(预生成版).zip", "download_url": "/zip", "export_type": 
"zip"}, + ] + + summary = build_assistant_summary(batch_no="RIP-1", exports=exports, failed_files=[]) + + assert summary.index("第1章 监管信息(预生成版).zip") < summary.index("CH1.4 申请表.docx") + diff --git a/tests/test_regulatory_info_package_template_config.py b/tests/test_regulatory_info_package_template_config.py new file mode 100644 index 0000000..ed4e132 --- /dev/null +++ b/tests/test_regulatory_info_package_template_config.py @@ -0,0 +1,46 @@ +from pathlib import Path + +import pytest + +from review_agent.regulatory_info_package.constants import DEFAULT_ZIP_NAME +from review_agent.regulatory_info_package.services.template_config import ( + compute_config_hash, + load_template_config, + validate_template_config, +) + + +def test_template_config_loads_six_templates(): + config = load_template_config() + + assert config["version"] == "regulatory_info_package_templates_v1" + assert config["zip_name"] == DEFAULT_ZIP_NAME + assert len(config["templates"]) == 6 + assert {template["code"] for template in config["templates"]} == { + "ch1_2_directory", + "ch1_4_application_form", + "ch1_5_product_list", + "ch1_11_1_standards", + "ch1_11_5_authenticity", + "ch1_11_6_conformity", + } + assert validate_template_config(config) == [] + assert compute_config_hash() + + +def test_template_config_rejects_duplicate_codes(): + config = load_template_config() + config["templates"].append(dict(config["templates"][0])) + + errors = validate_template_config(config) + + assert any("重复" in error for error in errors) + + +def test_template_config_sources_exist(): + config = load_template_config() + source_dir = Path(config["source_dir"]) + + assert source_dir.exists() + for template in config["templates"]: + assert (source_dir / template["source_file"]).exists() diff --git a/tests/test_regulatory_info_package_traceability.py b/tests/test_regulatory_info_package_traceability.py new file mode 100644 index 0000000..e80fac8 --- /dev/null +++ b/tests/test_regulatory_info_package_traceability.py @@ 
-0,0 +1,28 @@ +from pathlib import Path + +from openpyxl import load_workbook + +from review_agent.regulatory_info_package.schemas import MergedField +from review_agent.regulatory_info_package.services.traceability_export import save_traceability_exports + + +def test_save_traceability_exports_writes_excel_and_json(tmp_path): + fields = { + "product_name": MergedField( + key="product_name", + label="产品名称", + value="测试产品", + source="rule", + evidence="说明书", + confidence=0.9, + ) + } + + excel_path, json_path = save_traceability_exports(tmp_path, fields) + + assert excel_path.name == "traceability.xlsx" + assert json_path.name == "traceability.json" + assert json_path.exists() + workbook = load_workbook(excel_path) + assert workbook.active["A1"].value == "target_file" + diff --git a/tests/test_regulatory_info_package_trigger.py b/tests/test_regulatory_info_package_trigger.py new file mode 100644 index 0000000..2402e0a --- /dev/null +++ b/tests/test_regulatory_info_package_trigger.py @@ -0,0 +1,19 @@ +import pytest + +from review_agent.models import Conversation +from review_agent.skill_router import route_message_intent + + +pytestmark = pytest.mark.django_db + + +def test_fixed_keyword_routes_to_regulatory_info_package(django_user_model): + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + + route = route_message_intent(conversation, "请根据说明书生成第1章监管信息") + + assert route.action == "regulatory_info_package" + assert route.workflow_type == "regulatory_info_package" + assert route.starts_regulatory_info_package is True + diff --git a/tests/test_regulatory_info_package_views.py b/tests/test_regulatory_info_package_views.py new file mode 100644 index 0000000..9836eae --- /dev/null +++ b/tests/test_regulatory_info_package_views.py @@ -0,0 +1,140 @@ +from pathlib import Path + +import pytest + +from review_agent.models import ( + Conversation, + ExportedSummaryFile, + 
RegulatoryInfoPackageBatch, + WorkflowNodeRun, +) + + +pytestmark = pytest.mark.django_db + + +def test_regulatory_info_package_export_download_checks_owner(client, django_user_model, tmp_path): + owner = django_user_model.objects.create_user(username="owner", password="pass") + other = django_user_model.objects.create_user(username="other", password="pass") + conversation = Conversation.objects.create(user=owner, title="会话") + batch = RegulatoryInfoPackageBatch.objects.create( + conversation=conversation, + user=owner, + batch_no="RIP-20260610153300-abcdef", + ) + path = tmp_path / "第1章 监管信息(预生成版).zip" + path.write_bytes(b"zip-content") + exported = ExportedSummaryFile.objects.create( + batch=None, + workflow_type="regulatory_info_package", + workflow_batch_id=batch.pk, + export_category="regulatory_info_package", + export_type=ExportedSummaryFile.ExportType.ZIP, + file_name=path.name, + storage_path=str(path), + ) + + client.force_login(other) + denied = client.get(f"/api/review-agent/file-summary/exports/{exported.pk}/download/") + assert denied.status_code == 404 + + client.force_login(owner) + allowed = client.get(f"/api/review-agent/file-summary/exports/{exported.pk}/download/") + assert allowed.status_code == 200 + assert allowed["Content-Type"] == "application/zip" + + +@pytest.mark.parametrize( + ("file_name", "export_type", "expected"), + [ + ("CH1.9 产品申报前沟通的说明.doc", ExportedSummaryFile.ExportType.WORD, "application/msword"), + ( + "CH1.4 申请表.docx", + ExportedSummaryFile.ExportType.WORD, + "application/vnd.openxmlformats-officedocument.wordprocessingml.document", + ), + ("第1章 监管信息(预生成版).zip", ExportedSummaryFile.ExportType.ZIP, "application/zip"), + ], +) +def test_regulatory_info_package_download_mime_by_extension( + client, + django_user_model, + tmp_path, + file_name, + export_type, + expected, +): + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + 
batch = RegulatoryInfoPackageBatch.objects.create( + conversation=conversation, + user=user, + batch_no=f"RIP-20260610153400-{Path(file_name).suffix[1:] or 'zip'}", + ) + path = tmp_path / file_name + path.write_bytes(b"content") + exported = ExportedSummaryFile.objects.create( + batch=None, + workflow_type="regulatory_info_package", + workflow_batch_id=batch.pk, + export_category="generated_document", + export_type=export_type, + file_name=file_name, + storage_path=str(path), + ) + client.force_login(user) + + response = client.get(f"/api/review-agent/file-summary/exports/{exported.pk}/download/") + + assert response.status_code == 200 + assert response["Content-Type"] == expected + + +def test_regulatory_info_package_status_returns_nodes_and_zip_first(client, django_user_model, tmp_path): + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + batch = RegulatoryInfoPackageBatch.objects.create( + conversation=conversation, + user=user, + batch_no="RIP-20260610153500-abcdef", + status=RegulatoryInfoPackageBatch.Status.SUCCESS, + ) + WorkflowNodeRun.objects.create( + workflow_type="regulatory_info_package", + workflow_batch_id=batch.pk, + node_group="regulatory_info_package", + node_code="zip_export", + node_name="打包下载", + status=WorkflowNodeRun.Status.SUCCESS, + progress=100, + ) + doc = tmp_path / "CH1.4 申请表.docx" + zip_file = tmp_path / "第1章 监管信息(预生成版).zip" + doc.write_bytes(b"doc") + zip_file.write_bytes(b"zip") + ExportedSummaryFile.objects.create( + batch=None, + workflow_type="regulatory_info_package", + workflow_batch_id=batch.pk, + export_category="generated_document", + export_type=ExportedSummaryFile.ExportType.WORD, + file_name=doc.name, + storage_path=str(doc), + ) + ExportedSummaryFile.objects.create( + batch=None, + workflow_type="regulatory_info_package", + workflow_batch_id=batch.pk, + export_category="regulatory_info_package", + 
export_type=ExportedSummaryFile.ExportType.ZIP, + file_name=zip_file.name, + storage_path=str(zip_file), + ) + client.force_login(user) + + response = client.get(f"/api/review-agent/regulatory-info-package/{batch.pk}/status/") + + payload = response.json() + assert payload["batch"]["workflow_type"] == "regulatory_info_package" + assert payload["nodes"][0]["node_code"] == "zip_export" + assert payload["exports"][0]["export_type"] == "zip" diff --git a/tests/test_regulatory_info_package_workflow.py b/tests/test_regulatory_info_package_workflow.py new file mode 100644 index 0000000..4f2b699 --- /dev/null +++ b/tests/test_regulatory_info_package_workflow.py @@ -0,0 +1,92 @@ +from pathlib import Path + +import pytest + +from review_agent.models import Conversation, FileAttachment, Message, RegulatoryInfoPackageBatch, WorkflowNodeRun +from review_agent.regulatory_info_package.constants import ( + REGULATORY_INFO_PACKAGE_NODE_DEFINITIONS, + WORKFLOW_TYPE, +) +from review_agent.regulatory_info_package.workflow import ( + create_regulatory_info_package_batch, + start_regulatory_info_package_workflow, +) + + +pytestmark = pytest.mark.django_db + + +def test_create_regulatory_info_package_batch_initializes_nodes(django_user_model): + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + + batch = create_regulatory_info_package_batch(conversation=conversation, user=user) + + assert batch.batch_no.startswith("RIP-") + assert batch.work_dir + nodes = WorkflowNodeRun.objects.filter( + workflow_type=WORKFLOW_TYPE, + workflow_batch_id=batch.pk, + ).order_by("id") + assert [node.node_code for node in nodes] == [ + code for code, _name, _group in REGULATORY_INFO_PACKAGE_NODE_DEFINITIONS + ] + + +def test_create_regulatory_info_package_batch_is_node_idempotent(django_user_model): + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = 
Conversation.objects.create(user=user, title="会话") + batch = create_regulatory_info_package_batch(conversation=conversation, user=user) + + create_regulatory_info_package_batch(conversation=conversation, user=user, existing_batch=batch) + + assert WorkflowNodeRun.objects.filter( + workflow_type=WORKFLOW_TYPE, + workflow_batch_id=batch.pk, + ).count() == len(REGULATORY_INFO_PACKAGE_NODE_DEFINITIONS) + + +def test_empty_workflow_skeleton_completes(django_user_model, settings): + settings.REGULATORY_INFO_PACKAGE_ASYNC = False + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + batch = create_regulatory_info_package_batch(conversation=conversation, user=user) + + start_regulatory_info_package_workflow(batch, async_run=False) + batch.refresh_from_db() + + assert batch.status == RegulatoryInfoPackageBatch.Status.SUCCESS + assert WorkflowNodeRun.objects.filter( + workflow_type=WORKFLOW_TYPE, + workflow_batch_id=batch.pk, + status=WorkflowNodeRun.Status.SUCCESS, + ).count() == len(REGULATORY_INFO_PACKAGE_NODE_DEFINITIONS) + + +def test_completed_workflow_appends_download_summary_message(django_user_model, settings): + settings.REGULATORY_INFO_PACKAGE_ASYNC = False + user = django_user_model.objects.create_user(username="owner", password="pass") + conversation = Conversation.objects.create(user=user, title="会话") + trigger = Message.objects.create(conversation=conversation, role=Message.Role.USER, content="根据说明书生成第1章监管信息") + source = Path("docs/0.原始材料/目标产品说明书.docx").resolve() + attachment = FileAttachment.objects.create( + conversation=conversation, + user=user, + original_name="目标产品说明书.docx", + storage_path=str(source), + file_size=source.stat().st_size, + ) + batch = create_regulatory_info_package_batch( + conversation=conversation, + user=user, + trigger_message=trigger, + source_attachment=attachment, + source_file_name=attachment.original_name, + 
source_storage_path=attachment.storage_path, + ) + + start_regulatory_info_package_workflow(batch, async_run=False) + + message = conversation.messages.filter(role=Message.Role.ASSISTANT, content__contains=batch.batch_no).latest("id") + assert "第1章 监管信息(预生成版).zip" in message.content + assert "/api/review-agent/file-summary/exports/" in message.content diff --git a/tests/test_regulatory_info_package_zip.py b/tests/test_regulatory_info_package_zip.py new file mode 100644 index 0000000..60e9235 --- /dev/null +++ b/tests/test_regulatory_info_package_zip.py @@ -0,0 +1,22 @@ +import zipfile + +from review_agent.regulatory_info_package.schemas import GeneratedFileResult +from review_agent.regulatory_info_package.services.zip_export import create_zip_package + + +def test_create_zip_package_includes_only_success_files(tmp_path): + success = tmp_path / "ok.docx" + failed = tmp_path / "bad.docx" + success.write_bytes(b"ok") + failed.write_bytes(b"bad") + + zip_path = create_zip_package( + tmp_path, + [ + GeneratedFileResult("ok", "ok.docx", "docx", "docx", "success", path=str(success)), + GeneratedFileResult("bad", "bad.docx", "docx", "docx", "failed", path=str(failed)), + ], + ) + + with zipfile.ZipFile(zip_path) as archive: + assert archive.namelist() == ["ok.docx"]