# Detailed Design: Chapter 1 Regulatory Information Package Generation

## Document Information

| Item | Content |
| --- | --- |
| Requirements analysis document | docs/1.需求分析/5.第1章监管信息材料包生成.md |
| Functional design document | docs/2.功能设计/5.第1章监管信息材料包生成.md |
| Database design document | docs/3.数据库设计/5.第1章监管信息材料包生成.md |
| Reference detailed design | docs/4.详细设计/3.产品关键信息提取与申报文件自动填表.md |
| Feature name | Chapter 1 regulatory information package generation (第1章监管信息材料包生成) |
| Workflow code | regulatory_info_package |
| Module | Review agent `review_agent` |
| Design date | 2026-06-10 |
| Design version | V1.0 |

---

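## 1. Detailed Design Goals

This detailed design guides the implementation of the standalone `regulatory_info_package` workflow. The system takes the product instruction manual (说明书) that the user uploads or names, extracts the key product information, generates the Chapter 1 regulatory information package from the sample templates under `docs/0.原始材料/第1章 监管信息`, and presents `第1章 监管信息(预生成版).zip` as the first download entry in the conversation summary.

Core constraints:

| Constraint | Description |
| --- | --- |
| Standalone workflow | Uses `workflow_type=regulatory_info_package` and has its own batches, artifacts, notifications, and cards |
| Standalone module | Adds `review_agent/regulatory_info_package/` as a sibling of `application_form_fill` |
| Centralized models | Django models stay in `review_agent/models.py` |
| Input priority | A file name given in the user message wins; next come active attachments; last, for compatibility, the most recent successful file summary |
| Fixed templates | Always processes the 7 Chapter 1 regulatory information templates |
| Rules first, demo-ready | Rule-based extraction must run end to end on its own; the LLM call gets at most 3 attempts, and the workflow continues if all of them fail |
| Concurrent document generation | The workflow as a whole is serial; inside the `generate_docs` node each document may be processed on its own thread |
| `.doc` fallback | Prefer native `.doc` writing; on failure a `.docx` fallback file may be generated |
| Zip holds successful files only | The zip packs only files that succeeded or succeeded via fallback; failed files are left out |
| Highlight rules | Missing and LLM-only values get a yellow background; conflicts get a yellow background with red text |
| Traceability output | Users download the Excel file; JSON is saved only to the backend logs directory |
| Minimal front-end integration | No UI for choosing among multiple manuals; when uncertain, ask the user in the conversation |

---
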
## 2. Code Structure Design

### 2.1 Directory Structure

```text
review_agent/
  models.py
  services.py
  skill_router.py
  regulatory_info_package/
    __init__.py
    constants.py
    schemas.py
    storage.py
    events.py
    workflow.py
    views.py
    services/
      __init__.py
      input_select.py
      template_config.py
      template_repository.py
      instruction_extract.py
      field_extract.py
      field_merge.py
      standard_candidates.py
      document_writer.py
      docx_document.py
      legacy_doc_document.py
      package_generate.py
      traceability_export.py
      zip_export.py
      summary.py
      notifier.py
    templates/
      regulatory_info_package_templates_v1.yaml
    prompts/
      field_extract.md
```

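### 2.2 File Responsibilities

| File | Responsibility |
| --- | --- |
| constants.py | Workflow code, node definitions, trigger keywords, template codes, status constants |
| schemas.py | Dataclass structures such as `TemplateSpec`, `InstructionExtractResult`, `MergedField`, `GeneratedFileResult` |
| storage.py | Batch directory, subdirectories, hashing, artifact creation, path safety checks |
| events.py | Records and serializes `WorkflowEvent` |
| workflow.py | `RegulatoryInfoPackageWorkflowExecutor`, batch creation, workflow start |
| views.py | health, start, status, and select-input endpoints |
| input_select.py | Selects the manual from the user message, active attachments, and file summary |
| template_config.py | YAML loading, validation, hashing |
| template_repository.py | Locates the sample templates and copies them into the batch directory |
| instruction_extract.py | Parses the manual's paragraphs, sections, tables, and component tables |
| field_extract.py | Runs rule extraction and LLM extraction in parallel; the LLM gets at most 3 attempts |
| field_merge.py | Merges fields and outputs missing, LLM-only, conflict, and highlight decisions |
| standard_candidates.py | Extracts standard numbers from the manual and queries the existing knowledge base for candidates |
| document_writer.py | Document adapter interface and shared highlight policy |
| docx_document.py | `DocxDocumentAdapter`, handles `.docx` |
| legacy_doc_document.py | `LegacyDocDocumentAdapter`, handles native `.doc` writing and the `.docx` fallback |
| package_generate.py | The 7 document generation strategies; generates files on multiple threads |
| traceability_export.py | Generates `exports/traceability.xlsx` and `logs/traceability.json` |
| zip_export.py | Generates the main download zip, containing successful files only |
| summary.py | Builds the assistant reply, with the zip link first |
| notifier.py | Writes the feature-specific notification record and calls the unified notification service |

---
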
## 3. Data Model Detailed Design

The models live in `review_agent/models.py`.

### 3.1 RegulatoryInfoPackageBatch

```python
from django.db import models


class RegulatoryInfoPackageBatch(models.Model):
    class Status(models.TextChoices):
        PENDING = "pending", "待执行"
        RUNNING = "running", "执行中"
        WAITING_USER = "waiting_user", "等待用户"
        SUCCESS = "success", "成功"
        PARTIAL_SUCCESS = "partial_success", "部分成功"
        FAILED = "failed", "失败"
        CANCELLED = "cancelled", "已取消"
```

Key fields:

| Field | Description |
| --- | --- |
| conversation | Owning conversation |
| user | Initiating user |
| trigger_message | Triggering message |
| source_attachment | Manual attachment selected directly; nullable |
| source_summary_batch | File summary batch, kept for compatibility; nullable |
| source_summary_item_id | File summary item ID; nullable |
| batch_no | `RIP-YYYYMMDDHHMMSS-abcdef` |
| source_file_name | Original file name of the manual |
| source_storage_path | Storage path of the manual |
| product_name | Extracted product name |
| output_zip_name | `第1章 监管信息(预生成版).zip` |
| generated_files | Status of the 7 files |
| missing_fields | Missing fields |
| llm_only_fields | LLM-only fields |
| conflict_fields | Conflicting fields |
| risk_notes | Risk and degradation notes |
| adapter_summary | Summary of what the doc/docx adapters actually did |
| template_config_version/hash | Template configuration version and hash |
| work_dir | Batch working directory |
| is_deleted | Soft-delete flag |

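
The `batch_no` shape listed above could be produced as in the sketch below. It assumes the trailing `abcdef` stands for six random lowercase hex characters, which the design does not state, and `make_batch_no` is a hypothetical helper name.

```python
from __future__ import annotations

import secrets
from datetime import datetime


def make_batch_no(now: datetime | None = None) -> str:
    """Build a batch number shaped like RIP-YYYYMMDDHHMMSS-abcdef."""
    now = now or datetime.now()
    # Assumption: the suffix is 6 random hex characters; the design only shows its shape.
    suffix = secrets.token_hex(3)
    return f"RIP-{now:%Y%m%d%H%M%S}-{suffix}"
```

Passing `now` explicitly keeps the timestamp part deterministic in tests, while the random suffix reduces the chance of two batches created in the same second colliding.
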
### 3.2 RegulatoryInfoPackageArtifact

```python
class RegulatoryInfoPackageArtifact(models.Model):
    class ArtifactType(models.TextChoices):
        TEMPLATE_COPY = "template_copy", "模板副本"
        INSTRUCTION_EXTRACT = "instruction_extract", "说明书抽取结果"
        FIELD_EXTRACT_RESULT = "field_extract_result", "字段抽取结果"
        MERGED_FIELDS = "merged_fields", "合并字段"
        GENERATED_DOCUMENT = "generated_document", "生成文件"
        TRACEABILITY = "traceability", "追溯清单"
        ZIP_PACKAGE = "zip_package", "ZIP包"
        NOTIFICATION_RECORD = "notification_record", "通知记录"
```

`file_format` takes one of: `json`, `excel`, `docx`, `doc`, `zip`, `markdown`.

### 3.3 RegulatoryInfoPackageNotificationRecord

The fields mirror the auto-fill notification record: `batch`, `recipient`, `channel`, `export_ids`, `message_summary`, `send_status`, `retry_count`, `external_message_id`, `error_message`, `sent_at`, `is_deleted`.

### 3.4 ExportedSummaryFile Extension

`ExportedSummaryFile.ExportType` gains:

```python
ZIP = "zip", "ZIP"
```

The download MIME type falls back to the file extension:

| Condition | MIME |
| --- | --- |
| zip | application/zip |
| .doc | application/msword |
| .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document |

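
The extension fallback in the table above can be sketched as follows. `guess_download_mime` and the `application/octet-stream` default are assumptions for illustration, not names or behavior taken from the existing download view.

```python
import mimetypes
from pathlib import Path

# Explicit mappings from the table above; checked before any generic lookup.
EXTENSION_MIME = {
    ".zip": "application/zip",
    ".doc": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def guess_download_mime(file_name: str, default: str = "application/octet-stream") -> str:
    """Return the MIME type for a download, keyed on the (case-insensitive) extension."""
    suffix = Path(file_name).suffix.lower()
    if suffix in EXTENSION_MIME:
        return EXTENSION_MIME[suffix]
    # Anything else falls back to the standard library's guess, then to the default.
    guessed, _ = mimetypes.guess_type(file_name)
    return guessed or default
```

Keeping the three types in an explicit table avoids depending on the host's MIME database, which may not know `.docx` on a minimal server image.
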

---

## 4. Constants Design

### 4.1 Workflow Constants

```python
WORKFLOW_TYPE = "regulatory_info_package"
DEFAULT_ZIP_NAME = "第1章 监管信息(预生成版).zip"

# Each entry is (node code, display name, node group).
REGULATORY_INFO_PACKAGE_NODE_DEFINITIONS = [
    ("prepare", "准备资料", "regulatory_info_package"),
    ("template_copy", "复制模板", "regulatory_info_package"),
    ("text_extract", "抽取说明书", "regulatory_info_package"),
    ("field_extract", "抽取字段", "regulatory_info_package"),
    ("field_merge", "合并字段", "regulatory_info_package"),
    ("generate_docs", "生成材料", "regulatory_info_package"),
    ("highlight_review_items", "标记待确认", "regulatory_info_package"),
    ("trace_export", "追溯清单", "regulatory_info_package"),
    ("zip_export", "打包下载", "regulatory_info_package"),
    ("notify", "通知", "regulatory_info_package"),
    ("completed", "完成", "completed"),
]
```

### 4.2 Trigger Keywords

```python
REGULATORY_INFO_PACKAGE_TRIGGER_KEYWORDS = [
    "根据说明书生成第1章监管信息",
    "生成监管信息材料包",
    "从说明书生成第1章材料",
    "第1章监管信息",
    "监管信息材料包",
]
```

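
A deterministic route can test a message against these keywords before any LLM routing. The sketch below is one possible check; `matches_trigger` is a hypothetical name, and stripping whitespace before matching is an assumption rather than documented router behavior.

```python
REGULATORY_INFO_PACKAGE_TRIGGER_KEYWORDS = [
    "根据说明书生成第1章监管信息",
    "生成监管信息材料包",
    "从说明书生成第1章材料",
    "第1章监管信息",
    "监管信息材料包",
]


def matches_trigger(message: str) -> bool:
    """Return True when the message contains any trigger keyword."""
    # Assumption: whitespace is ignored, so "第1章 监管信息" still matches "第1章监管信息".
    normalized = "".join(message.split())
    return any(keyword in normalized for keyword in REGULATORY_INFO_PACKAGE_TRIGGER_KEYWORDS)
```
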
### 4.3 File Status

```python
GENERATED_FILE_SUCCESS = "success"
GENERATED_FILE_FALLBACK_SUCCESS = "fallback_success"
GENERATED_FILE_FAILED = "failed"
GENERATED_FILE_SKIPPED = "skipped"
```

---

## 5. Core Data Structures

### 5.1 TemplateSpec

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class TemplateSpec:
    code: str
    output_name: str
    source_file: str
    file_format: str
    strategy: str
    include_in_zip: bool
    require_legacy_doc_native: bool = False
    fields: list[dict[str, Any]] = field(default_factory=list)
```

### 5.2 InstructionExtractResult

```python
@dataclass
class InstructionExtractResult:
    source_file_name: str
    paragraphs: list[str]
    sections: dict[str, str]
    tables: list[list[list[str]]]
    component_tables: list["ComponentTable"]
    front_text: str
```

### 5.3 ProductListRow

```python
@dataclass
class ProductListRow:
    package_specification: str
    item_no: str
    composition: str
    component_name: str
    main_component: str
    quantity: str
    source_table_title: str
    needs_review_fields: list[str] = field(default_factory=list)
```

`item_no` is the catalog number (货号); in this release it is always written as `/` with a yellow background.

### 5.4 MergedField

```python
@dataclass
class MergedField:
    key: str
    label: str
    value: str
    source: str
    evidence: str
    confidence: float
    highlight_reason: str = "none"
    needs_review: bool = False
    rule_value: str = ""
    llm_value: str = ""
```

### 5.5 GeneratedFileResult

```python
@dataclass
class GeneratedFileResult:
    template_code: str
    file_name: str
    requested_format: str
    actual_format: str
    status: str
    path: str = ""
    artifact_id: int | None = None
    export_id: int | None = None
    highlight_count: int = 0
    missing_count: int = 0
    llm_only_count: int = 0
    error_message: str = ""
```

---

## 6. Storage Directory Design

```text
media/regulatory_info_package/{user_id}/{conversation_id}/{batch_no}/
  templates/
  logs/
    instruction_extract.json
    field_extract_result.json
    merged_fields.json
    doc_adapter_result.json
    traceability.json
  generated/
    CH1.2 监管信息目录.docx
    CH1.4 申请表.docx
    CH1.5 产品列表.docx
    CH1.9 产品申报前沟通的说明.docx
    CH1.11.1 符合标准的清单.docx
    CH1.11.5 真实性声明.docx
    CH1.11.6 符合性声明.docx
  exports/
    traceability.xlsx
    第1章 监管信息(预生成版).zip
```

Notes:

| Directory | Description |
| --- | --- |
| templates | Template copies |
| logs | Backend JSON artifacts; not offered as a main user download |
| generated | Individual files that succeeded or succeeded via fallback |
| exports | The traceability Excel file and the zip, both downloadable by the user |

---

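## 7. Input Selection Detailed Design

### 7.1 Selection Priority

`input_select.py` selects in this order:

1. When the user message explicitly names a file, fuzzy-match it against the active attachment names.
2. A `.docx` among the current conversation's active attachments whose file name contains “说明书”.
3. The only `.docx` among the current conversation's active attachments.
4. A `.docx` whose name contains “说明书” in the `items` of the most recent successful `FileSummaryBatch`.
5. With several candidates or none, return `InputSelectionResult(status="waiting_user")`.
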
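
The priority order above can be sketched as a chain of tiers, where the first tier that yields any candidate decides the outcome. This is a simplified illustration over plain file-name lists: `select_instruction_input_sketch`, the fields on `InputSelectionResult` other than `status`, the substring-based fuzzy match, and the choice to stop at the first ambiguous tier are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InputSelectionResult:
    status: str  # "selected" or "waiting_user"
    file_name: str = ""
    candidates: list[str] = field(default_factory=list)


def _is_docx(name: str) -> bool:
    return name.lower().endswith(".docx")


def _mentioned(name: str, message: str) -> bool:
    # Assumption: "fuzzy match" means the file stem appears somewhere in the message.
    stem = name.rsplit(".", 1)[0]
    return bool(stem) and stem in message


def select_instruction_input_sketch(
    message: str, active_names: list[str], summary_names: list[str]
) -> InputSelectionResult:
    docx = [n for n in active_names if _is_docx(n)]
    tiers = [
        [n for n in active_names if _mentioned(n, message)],      # 1. named in the message
        [n for n in docx if "说明书" in n],                         # 2. active .docx named 说明书
        docx if len(docx) == 1 else [],                            # 3. the only active .docx
        [n for n in summary_names if _is_docx(n) and "说明书" in n],  # 4. file summary items
    ]
    for candidates in tiers:
        if len(candidates) == 1:
            return InputSelectionResult("selected", candidates[0])
        if len(candidates) > 1:
            # Ambiguous tier: stop and ask the user instead of guessing.
            return InputSelectionResult("waiting_user", candidates=candidates)
    return InputSelectionResult("waiting_user")  # 5. nothing found
```

Returning the ambiguous candidates lets the caller list them in the follow-up question it sends back to the user.
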
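### 7.2 Handling Multiple Candidates

This release adds no online selection dialog. With multiple candidates:

| Scenario | Handling |
| --- | --- |
| The user message fuzzy-matches exactly one attachment | Select it directly |
| Several candidates and no way to decide | Ask the user in the conversation which manual to use |
| No manual | Prompt the user to upload the product manual |

Example follow-up question:

```text
我找到多个说明书候选,请回复要使用的文件名:A.docx、B.docx。
```

---
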
## 8. Template Configuration Detailed Design

Configuration path:

```text
review_agent/regulatory_info_package/templates/regulatory_info_package_templates_v1.yaml
```

It must contain 7 templates:

| code | source_file | strategy |
| --- | --- | --- |
| ch1_2_directory | CH1.2 监管信息目录.docx | directory |
| ch1_4_application_form | CH1.4 申请表.docx | application_form |
| ch1_5_product_list | CH1.5 产品列表.docx | product_list |
| ch1_9_pre_submission | CH1.9 产品申报前沟通的说明.doc | pre_submission |
| ch1_11_1_standard_list | CH1.11.1 符合标准的清单.docx | standard_list |
| ch1_11_5_authenticity | CH1.11.5 真实性声明.docx | authenticity_statement |
| ch1_11_6_compliance | CH1.11.6 符合性声明.docx | compliance_statement |

Validation rules:

| Check | Description |
| --- | --- |
| version is required | Written to the batch |
| source_dir exists | Points to the sample directory |
| code is unique | Prevents artifacts from overwriting each other |
| source_file exists | A missing file is a configuration error |
| strategy is valid | Must match a generation strategy |
| doc template flag | A `.doc` template must declare `require_legacy_doc_native` |

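
The validation rules above can be sketched as a function over an already-parsed configuration dict. The top-level key `templates`, the function name `validate_template_config`, and the error wording are assumptions; YAML loading itself is left out so the sketch stays within the standard library.

```python
from pathlib import Path

VALID_STRATEGIES = {
    "directory", "application_form", "product_list", "pre_submission",
    "standard_list", "authenticity_statement", "compliance_statement",
}


def validate_template_config(config: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the config is valid."""
    errors = []
    if not config.get("version"):
        errors.append("version is required")
    raw_dir = config.get("source_dir") or ""
    source_dir = Path(raw_dir)
    if not raw_dir or not source_dir.is_dir():
        errors.append("source_dir does not exist")
    seen_codes = set()
    for item in config.get("templates", []):  # assumed key name
        code = item.get("code", "")
        if code in seen_codes:
            errors.append(f"duplicate code: {code}")
        seen_codes.add(code)
        source_file = item.get("source_file", "")
        if not source_file or not (source_dir / source_file).is_file():
            errors.append(f"missing source_file: {source_file}")
        if item.get("strategy") not in VALID_STRATEGIES:
            errors.append(f"invalid strategy for {code}: {item.get('strategy')}")
        # A legacy .doc template has to state explicitly whether native writing is required.
        if source_file.lower().endswith(".doc") and "require_legacy_doc_native" not in item:
            errors.append(f"{code}: .doc template must declare require_legacy_doc_native")
    return errors
```

Collecting every error instead of raising on the first one lets a `failed` batch report all missing templates at once.
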

---

## 9. Field Extraction Detailed Design

### 9.1 Rule Extraction

Rule extraction must work on its own and covers:

| Field | Rule |
| --- | --- |
| product_name | The paragraph following `【产品名称】` |
| package_specification | From `【包装规格】` to the next section |
| intended_use | From `【预期用途】` to the next section |
| detection_principle | From `【检测原理】` to the next section |
| main_components | Summary of the table under `【主要组成成分】` |
| storage_condition_and_validity | From `【储存条件及有效期】` to the next section |
| sample_type | “适用样本类型” (applicable sample types) in the sample requirements section |
| detection_targets | Genes, pathogens, and targets named in the intended use or detection principle |
| applicable_instruments | From `【适用仪器】` to the next section |
| test_method | Summary of `【检验方法】` |
| standards | Standard numbers extracted by regular expression |

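
Most of the rules above reduce to "take the text between one `【…】` heading and the next". The sketch below shows that idea for three section fields plus the standard-number regex. The function names, the regex for standard numbers (limited here to `GB` and `YY` forms), and the sample text in the usage note are illustrative assumptions, not the project's actual rules.

```python
import re

SECTION_HEADING = re.compile(r"^【(.+?)】\s*(.*)$")
# Assumption: standard numbers look like "GB/T 29791.1-2013" or "YY/T 1579-2018".
STANDARD_NO = re.compile(r"(?<![A-Za-z])(?:GB|YY)(?:/T)?\s?\d+(?:\.\d+)*-\d{4}(?!\d)")


def split_sections(paragraphs: list[str]) -> dict[str, str]:
    """Group paragraphs under the most recent 【…】 heading."""
    sections: dict[str, list[str]] = {}
    current = None
    for paragraph in paragraphs:
        text = paragraph.strip()
        match = SECTION_HEADING.match(text)
        if match:
            current = match.group(1)
            # Text on the heading line itself, if any, belongs to the section.
            sections[current] = [match.group(2)] if match.group(2) else []
        elif current is not None and text:
            sections[current].append(text)
    return {title: "\n".join(lines) for title, lines in sections.items()}


def rule_extract(paragraphs: list[str]) -> dict:
    sections = split_sections(paragraphs)
    return {
        # product_name is only the first paragraph after its heading.
        "product_name": sections.get("产品名称", "").split("\n")[0],
        "package_specification": sections.get("包装规格", ""),
        "intended_use": sections.get("预期用途", ""),
        "standards": sorted(set(STANDARD_NO.findall("\n".join(paragraphs)))),
    }
```

For example, `rule_extract(["【产品名称】", "示例检测试剂盒"])["product_name"]` yields the paragraph right after the heading. The regex avoids `\b` because Chinese characters count as word characters in Python, so a boundary would not be detected between “符合” and `GB`.
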
### 9.2 LLM Extraction and Retry

`field_extract.py` runs rule extraction and LLM extraction in parallel:

```text
ThreadPoolExecutor(max_workers=2)
  -> rule_extract()
  -> llm_extract_with_retry(max_attempts=3)
```

LLM retry policy:

| Attempt | Delay |
| --- | --- |
| 1st | Immediate |
| 2nd | Wait 1 second |
| 3rd | Wait 2 seconds |

After three failed attempts:

| Output | Handling |
| --- | --- |
| risk_notes | Append `llm_extract_failed` |
| logs/field_extract_result.json | Record a summary of each error |
| Workflow | Continue with the rule results |

The LLM must not fill in anything the manual cannot substantiate, such as company information, classification codes, management categories, or clinical evaluation pathways.

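
The retry policy above (three attempts, waiting 0, 1, and 2 seconds before each) can be sketched as follows. The signature is an assumption: the LLM call is passed in as a callable, and `sleep` is injectable so tests do not actually wait.

```python
import time
from typing import Callable

# Delay before attempts 1, 2, and 3, taken from the retry table above.
RETRY_DELAYS_SECONDS = (0, 1, 2)


def llm_extract_with_retry(
    call_llm: Callable[[], dict],
    risk_notes: list[str],
    *,
    delays: tuple = RETRY_DELAYS_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[dict, list[str]]:
    """Return (fields, error summaries); fields is empty when every attempt fails."""
    errors = []
    for attempt, delay in enumerate(delays, start=1):
        if delay:
            sleep(delay)
        try:
            return call_llm(), errors
        except Exception as exc:  # any LLM failure counts as a failed attempt
            errors.append(f"attempt {attempt}: {exc}")
    # All attempts failed: flag the degradation and let the caller use rule results.
    risk_notes.append("llm_extract_failed")
    return {}, errors
```

The returned error list is what would be written to `logs/field_extract_result.json`; returning an empty dict rather than raising keeps the workflow moving on rule results alone.
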
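### 9.3 Field Merge

| Scenario | Value written | Highlight | needs_review |
| --- | --- | --- | --- |
| Rule and LLM agree | Rule/LLM value | No | No |
| Rule and LLM conflict | Rule value, or whichever source the configuration prefers | Yellow background, red text | Yes |
| Rule missing, LLM hit | LLM value | Yellow background | Yes |
| Both missing | `/` | Yellow background | Yes |
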
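
The merge table above maps directly onto a small decision function. In this sketch `MergeDecision` and `merge_field` are hypothetical names, conflicts always resolve to the rule value (the configurable preference is left out), and the case the table does not list, rule hit with no LLM value, is assumed to need no highlight.

```python
from dataclasses import dataclass


@dataclass
class MergeDecision:
    value: str
    source: str            # "rule", "llm", or "missing"
    highlight_reason: str  # "none", "conflict", "llm_only", or "missing"
    needs_review: bool


def merge_field(rule_value: str, llm_value: str) -> MergeDecision:
    rule_value, llm_value = rule_value.strip(), llm_value.strip()
    if rule_value and llm_value and rule_value != llm_value:
        # Conflict: rule wins in this sketch, but the value is flagged (yellow + red).
        return MergeDecision(rule_value, "rule", "conflict", True)
    if rule_value:
        # Agreement, or (assumption) a rule hit with no LLM value: no highlight.
        return MergeDecision(rule_value, "rule", "none", False)
    if llm_value:
        return MergeDecision(llm_value, "llm", "llm_only", True)
    # Nothing found: write the "/" placeholder and highlight it.
    return MergeDecision("/", "missing", "missing", True)
```
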

---

## 10. Document Adapter Detailed Design

### 10.1 Unified Interface

```python
from pathlib import Path
from typing import Protocol


class DocumentAdapter(Protocol):
    def replace_text(self, old: str, new: str, *, highlight: bool = False, conflict: bool = False) -> int: ...
    def fill_table_cell(self, row_label: str, value: str, *, highlight: bool = False, conflict: bool = False) -> bool: ...
    def replace_table(self, marker: str, rows: list[ProductListRow], *, highlight_columns: list[str] | None = None) -> bool: ...
    def save(self, path: Path) -> Path: ...
```

Highlight rules:

| Type | Appearance |
| --- | --- |
| missing | Yellow background |
| llm_only | Yellow background |
| conflict | Yellow background + red text |

### 10.2 DocxDocumentAdapter

Capabilities:

| Method | Description |
| --- | --- |
| replace_text | Replaces text in paragraphs and tables; must handle text split across runs |
| fill_table_cell | Locates the target cell by row label |
| replace_table | Rebuilds the CH1.5 product list table |
| apply_highlight | Sets the yellow background via `w:shd` |
| apply_conflict_style | Yellow background + red text |

### 10.3 LegacyDocDocumentAdapter

Interface:

```python
from dataclasses import dataclass


@dataclass
class AdapterCapability:
    adapter_name: str
    supports_native_doc_write: bool
    supports_docx_fallback: bool
    status: str
    error_message: str = ""


class LegacyDocDocumentAdapter:
    @staticmethod
    def detect_available_adapter() -> AdapterCapability: ...
```

Execution order:

1. First try `WordComDocAdapter` to open the `.doc` natively and save it as `.doc`.
2. If native handling fails, try saving the `.doc` as `.docx` and hand it to `DocxDocumentAdapter`.
3. When the fallback succeeds, output `CH1.9 产品申报前沟通的说明.docx`.
4. When both native handling and the fallback fail, the file status is `failed` and the file stays out of the zip.

On fallback success, `adapter_summary.doc` is:

```json
{
  "requested_format": "doc",
  "actual_format": "docx",
  "adapter": "ConversionFallbackAdapter",
  "status": "fallback_success"
}
```

---

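## 11. Material Generation Detailed Design

### 11.1 Concurrency in the generate_docs Node

Workflow nodes still run serially, but `generate_docs` generates the individual files concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

# max_workers must be at least 1, so guard against an empty spec list.
with ThreadPoolExecutor(max_workers=max(1, min(7, len(specs)))) as executor:
    futures = [executor.submit(generate_one_document, spec, context) for spec in specs]
```

Concurrency notes:

| Note | Description |
| --- | --- |
| Each document uses its own template copy | Avoids concurrent writes to the same file |
| Shared fields are read-only | `merged_fields` and `product_list_rows` are never modified in worker threads |
| Database writes are centralized | Worker threads return `GeneratedFileResult`; the main thread writes artifacts and exports |
| Exception isolation | One file failing does not affect the others |
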
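
Exception isolation, as described in the notes above, can be achieved by catching errors inside each worker and returning a result object instead of letting the exception propagate. The sketch below uses a reduced, hypothetical `FileResult` in place of `GeneratedFileResult` and takes the per-document generator as a callable.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class FileResult:
    template_code: str
    status: str
    error_message: str = ""


def generate_all(codes: list[str], generate_one: Callable[[str], None]) -> list[FileResult]:
    """Run generate_one for every template code; a failure only marks that one file."""

    def safe_generate(code: str) -> FileResult:
        try:
            generate_one(code)
            return FileResult(code, "success")
        except Exception as exc:  # isolate the failure to this document
            return FileResult(code, "failed", str(exc))

    if not codes:
        return []
    with ThreadPoolExecutor(max_workers=min(7, len(codes))) as executor:
        # map() yields results in input order, so the main thread can persist them in order.
        return list(executor.map(safe_generate, codes))
```

Because workers only return values, all database writes can happen afterwards on the calling thread.
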
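### 11.2 The 7 Generation Strategies

| Template | Output rule |
| --- | --- |
| CH1.2 | Replace the product name; page numbers follow the sample |
| CH1.4 | Fill in product name, package specification, intended use, composition, storage and validity, and method principle; missing items such as company and classification get `/` with a yellow background |
| CH1.5 | Rebuild the table using the sample's header row; catalog number is `/` with a yellow background |
| CH1.9 | Prefer native `.doc` writing; on failure fall back to `.docx`; if the fallback also fails, output nothing |
| CH1.11.1 | Standard numbers from the manual are written directly; knowledge-base candidates appear only as highlighted items to confirm and in the traceability output |
| CH1.11.5 | Keep the body text, replace the product name, company name is `/` with a yellow background, date is the current day |
| CH1.11.6 | Keep the body text, replace the product name, company name is `/` with a yellow background, date is the current day |

### 11.3 Missing Product Name

When neither the rules nor the LLM can extract the product name:

| Item | Handling |
| --- | --- |
| File content | Write `/` with a yellow background where the product name goes |
| Batch status | At best `partial_success` |
| zip | Still generated, containing the successful files |
| Summary | States clearly that the product name awaits confirmation |

---
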
## 12. Traceability and Zip Design

### 12.1 Traceability Excel

Downloadable by the user:

```text
exports/traceability.xlsx
```

Export record created:

```text
export_category = traceability
export_type = excel
```

Columns:

| Field | Description |
| --- | --- |
| target_file | Target file |
| target_field | Target field |
| final_value | Value written |
| extraction_source | rule, llm, missing, knowledge_candidate |
| evidence | Source snippet |
| highlight_reason | missing, llm_only, conflict, rag_candidate |
| needs_review | Whether review is required |

### 12.2 Backend JSON

JSON artifacts are written only to `logs/` and inspected from the backend when needed:

```text
logs/instruction_extract.json
logs/field_extract_result.json
logs/merged_fields.json
logs/traceability.json
logs/doc_adapter_result.json
```

These JSON artifacts are recorded in `RegulatoryInfoPackageArtifact` but are not offered as a main user download.

### 12.3 Zip Packaging

Zip file name:

```text
第1章 监管信息(预生成版).zip
```

Rules:

| Scenario | Included in zip |
| --- | --- |
| File status `success` | Yes |
| File status `fallback_success` | Yes |
| File status `failed` | No |
| File status `skipped` | No |

If the `.docx` fallback for the CH1.9 `.doc` succeeds, the zip contains:

```text
CH1.9 产品申报前沟通的说明.docx
```

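
The inclusion rule above can be sketched with the standard `zipfile` module. `build_zip` is a hypothetical helper that takes `(status, path)` pairs rather than `GeneratedFileResult` objects, and storing files flat under their own names is an assumption about the archive layout.

```python
import zipfile
from pathlib import Path

ZIP_ELIGIBLE_STATUSES = {"success", "fallback_success"}


def build_zip(results, zip_path) -> list[str]:
    """Pack eligible files into zip_path and return the archived file names.

    results is an iterable of (status, path) pairs.
    """
    included = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for status, path in results:
            file_path = Path(path)
            # Only successful files that actually exist on disk go into the archive.
            if status in ZIP_ELIGIBLE_STATUSES and file_path.is_file():
                archive.write(file_path, arcname=file_path.name)
                included.append(file_path.name)
    return included
```

Returning the archived names lets the summary step cross-check which files the user will actually find in the download.
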

---

## 13. Workflow Detailed Design

### 13.1 Batch Creation

```python
def create_regulatory_info_package_batch(
    *,
    conversation: Conversation,
    user,
    trigger_message: Message | None = None,
    source_attachment: FileAttachment | None = None,
    source_summary_batch: FileSummaryBatch | None = None,
    source_summary_item_id: int | None = None,
) -> RegulatoryInfoPackageBatch:
    ...
```

After creation, the batch's nodes are initialized from `REGULATORY_INFO_PACKAGE_NODE_DEFINITIONS`.

### 13.2 Executor

```python
class RegulatoryInfoPackageWorkflowExecutor:
    def run(self) -> None: ...
    def _nodes(self): ...
    def _run_node(self, node: WorkflowNodeRun) -> None: ...
    def _execute_node(self, node: WorkflowNodeRun) -> None: ...
```

Node execution:

| Node | Key action |
| --- | --- |
| prepare | Confirm the manual, or enter waiting_user |
| template_copy | Copy the 7 templates |
| text_extract | Extract the manual's sections and tables |
| field_extract | Rule + LLM extraction in parallel |
| field_merge | Merge fields and decide highlights |
| generate_docs | Generate the individual files on multiple threads |
| highlight_review_items | If the generation strategies already applied highlights, this node only records the confirmation result |
| trace_export | Write the Excel file and the logs JSON |
| zip_export | Pack files that succeeded or succeeded via fallback |
| notify | Write the feature-specific notification and call the unified notification service |
| completed | Write the assistant summary |

### 13.3 Final Status

| Condition | Status |
| --- | --- |
| Zip succeeded and all 7 files are success/fallback_success | success |
| Zip succeeded but some files are failed/skipped | partial_success |
| Zip failed but at least one file succeeded | partial_success |
| All files failed, or a key input is missing | failed |
| Several candidate manuals awaiting confirmation | waiting_user |

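
The status table above can be read as a small decision function. The sketch below covers only the conditions in that table; other degradations that force `partial_success` elsewhere in this design (for example a missing product name) are not modeled, and `resolve_batch_status` with its parameters is a hypothetical name.

```python
OK_STATUSES = {"success", "fallback_success"}


def resolve_batch_status(
    file_statuses: list[str],
    zip_ok: bool,
    *,
    waiting_for_user: bool = False,
    input_missing: bool = False,
) -> str:
    """Map per-file statuses and the zip outcome to the final batch status."""
    if waiting_for_user:
        return "waiting_user"
    succeeded = [s for s in file_statuses if s in OK_STATUSES]
    if input_missing or not succeeded:
        return "failed"
    if zip_ok and len(succeeded) == len(file_statuses):
        return "success"
    # Either the zip failed or some files failed/skipped, but something is downloadable.
    return "partial_success"
```
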

---

## 14. Routing and API Detailed Design

### 14.1 skill_router.py

Additions:

| Item | Content |
| --- | --- |
| ROUTE_ACTIONS | Add `regulatory_info_package` |
| SkillRoute attribute | `starts_regulatory_info_package` |
| Deterministic route | Return immediately when a trigger keyword matches |
| LLM prompt | Add `regulatory_info_package` to the action list |

### 14.2 services.py

`stream_message` gains a branch:

1. Call `select_instruction_input(conversation, content)`.
2. With several candidates, reply with a follow-up question and do not start the workflow.
3. With no candidate, reply asking the user to upload the manual.
4. With exactly one candidate, create the batch and start the workflow.
5. Send `workflow_started` over SSE.

### 14.3 views.py

Endpoints:

```text
GET  /api/review-agent/regulatory-info-package/health/
POST /api/review-agent/regulatory-info-package/start/
GET  /api/review-agent/regulatory-info-package/<batch_id>/status/
POST /api/review-agent/regulatory-info-package/<batch_id>/select-input/
```

`status` returns:

| Field | Description |
| --- | --- |
| batch | Status, product name, and counts of missing / LLM-only / conflict fields |
| nodes | Node statuses |
| generated_files | Success / failure / fallback status of the 7 files |
| exports | Zip, individual file, and Excel downloads |
| risk_notes | Risk notes |
| notifications | Notifications |

The zip needs no `is_primary` field; the front end and the summary put the zip first simply by following the returned order.

---

## 15. Assistant Summary Design

Structure of the completion message (the user-facing text is in Chinese):

```markdown
已生成第1章监管信息材料包。

批次号:RIP-...
产品名称:...
状态:success / partial_success

主下载:[第1章 监管信息(预生成版).zip](...)

| 文件 | 状态 | 下载/原因 |
| --- | --- | --- |
| CH1.2 监管信息目录.docx | 成功 | 下载 |
| CH1.9 产品申报前沟通的说明.docx | 兜底成功 | 下载 |
| CH1.11.1 符合标准的清单.docx | 失败 | 失败原因 |

待确认:缺失项 X 个,LLM复核项 Y 个,冲突项 Z 个。
```

Requirements:

| Requirement | Description |
| --- | --- |
| Zip first | The zip link must come before the list of individual files |
| Failures visible | Failed files show their status and reason, with no download link |
| Fallback notice | Show “兜底成功” (fallback success) for `.doc -> .docx` |
| Pending-review summary | Show the counts of missing, llm_only, and conflict items |

---

## 16. Front-End Detailed Design

### 16.1 Templates

`templates/home.html` gains a tool chip:

```html
<button
  class="tool-chip"
  type="button"
  data-prompt-template="根据说明书生成第1章监管信息"
>第1章监管信息</button>
```

`summaryPanel` gains the attribute:

```html
data-regulatory-info-package-status-url-template="/api/review-agent/regulatory-info-package/__batch_id__/status/"
```

### 16.2 app.js

Additions:

| Location | Handling |
| --- | --- |
| Workflow type check | Support `regulatory_info_package` |
| Status URL selection | Use `data-regulatory-info-package-status-url-template` |
| Terminal-state check | success, partial_success, failed, waiting_user |
| Export display | Render exports in the returned order; the back end puts the zip first |

### 16.3 No Selection UI

When there are several candidate manuals, this release shows no dialog. The assistant asks the user in the conversation to confirm the file name.

---

## 17. Export Download Permissions

`file_summary.views._export_for_user` gains:

```python
if exported.workflow_type == "regulatory_info_package":
    # Only the owner of the conversation may download exports of a live batch.
    allowed = RegulatoryInfoPackageBatch.objects.filter(
        pk=exported.workflow_batch_id,
        conversation__user=user,
        is_deleted=False,
    ).exists()
    return exported if allowed else None
```

The download content-type logic gains checks for zip and the `.doc` suffix.

---

## 18. Notification Detailed Design

`notifier.py`:

```python
def notify_completion(
    batch: RegulatoryInfoPackageBatch,
    exports: list[ExportedSummaryFile],
) -> RegulatoryInfoPackageNotificationRecord:
    ...
```

Steps:

| Step | Description |
| --- | --- |
| Create the feature-specific notification record | Write `RegulatoryInfoPackageNotificationRecord` |
| Call the unified notification service | `dispatch_workflow_notification(build_regulatory_info_package_context(batch))` |
| Catch exceptions | A notification failure is written to the record and to risk_notes, and does not affect batch downloads |

---

## 19. Test Detailed Design

| Test file | Coverage |
| --- | --- |
| test_regulatory_info_package_models.py | The three tables, zip export type, basic relations |
| test_regulatory_info_package_trigger.py | Fixed keywords and the LLM action |
| test_regulatory_info_package_input_select.py | Fuzzy file-name matching, active attachments, follow-up question on multiple candidates |
| test_regulatory_info_package_template_config.py | YAML loading, missing templates, unique codes |
| test_regulatory_info_package_instruction_extract.py | Extraction of manual sections and component tables |
| test_regulatory_info_package_field_extract.py | Rule extraction, three LLM attempts, degradation on failure |
| test_regulatory_info_package_field_merge.py | missing, llm_only, conflict |
| test_regulatory_info_package_docx_writer.py | Replacement, table filling, yellow background, red text |
| test_regulatory_info_package_legacy_doc.py | Adapter detection, docx fallback, failure status |
| test_regulatory_info_package_package_generate.py | Results for the 7 files, exception isolation across threads |
| test_regulatory_info_package_traceability.py | Traceability Excel and logs JSON |
| test_regulatory_info_package_zip.py | Zip contains only success/fallback_success files |
| test_regulatory_info_package_workflow.py | Node transitions, partial_success, waiting_user |
| test_regulatory_info_package_views.py | start/status/download permissions |
| test_regulatory_info_package_frontend.py | Chip, card, status URL |

---

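## 20. Exception Handling Matrix

| Exception | Batch status | Handling |
| --- | --- | --- |
| No manual | waiting_user, or no batch is created | Prompt the user to upload the manual |
| Several candidates, none matched | waiting_user, or no batch is created | Ask the user to confirm the file name |
| Template missing | failed | List the missing templates |
| Rule extraction fails | partial_success/continue | Use the LLM results |
| LLM fails three times | continue | Use the rule results and write risk_notes |
| Product name missing | partial_success | Write `/` with a yellow background and still generate the zip |
| A single docx file fails to generate | partial_success | Left out of the zip; the summary shows the failure |
| CH1.9 native doc fails but the docx fallback succeeds | success/partial_success | Status fallback_success; included in the zip |
| CH1.9 native doc and docx fallback both fail | partial_success | Left out of the zip; the summary shows the failure |
| traceability.xlsx fails | partial_success | Does not block the zip |
| Zip fails | partial_success | Individual file downloads remain available |
| Notification fails | Main status unaffected | Record the notification failure and write risk_notes |

---
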
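## 21. Design Conclusions

| No. | Conclusion |
| --- | --- |
| D1 | The detailed design document path is `docs/4.详细设计/5.第1章监管信息材料包生成.md` |
| D2 | Models are centralized in `review_agent/models.py`; the business module is `review_agent/regulatory_info_package/` |
| D3 | `.doc` handling follows options A+C: prefer native processing through Word COM, and also design an adapter layer with capability detection |
| D4 | When native `.doc` handling fails, a `.docx` fallback is allowed; the fallback file is named `CH1.9 产品申报前沟通的说明.docx` |
| D5 | The zip contains only files that succeeded or succeeded via fallback; failed files are left out |
| D6 | The LLM gets at most 3 attempts; after they fail, the workflow continues with the rule results |
| D7 | Missing and LLM-only values get a yellow background; conflicts get a yellow background with red text |
| D8 | The product list uses `ProductListRow`; the catalog number is always `/` with a yellow background |
| D9 | The standards list only reuses the existing knowledge-base capability; no separate RAG flow is added |
| D10 | Minimal front-end integration; no manual-selection dialog |
| D11 | The traceability Excel is downloadable; JSON stays in the backend logs only |
| D12 | This release adds no field-level database tables |
| D13 | The workflow is serial; the document generation node may use multiple threads internally |
| D14 | This round delivers the detailed design only: no code and no migrations |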