Compare commits


38 Commits

SHA1 Message Date
8f7d1482ed docs: update V2 project and agent collaboration docs 2026-06-11 00:20:40 +08:00
1d0b5338a8 chore(master): align V2 ignore rules 2026-06-11 00:15:56 +08:00
bb2fbe272f chore(master): keep the V2 environment config file 2026-06-11 00:15:20 +08:00
ccd6e8ef4d chore(master): remove legacy files left over after merging V2 2026-06-11 00:09:54 +08:00
64d09ec30f merge: merge V2 into master
# Conflicts:
#	.gitignore
#	README.md
#	config/asgi.py
#	config/settings.py
#	config/urls.py
#	config/wsgi.py
#	manage.py
#	requirements.txt
#	templates/base.html
#	tests/conftest.py
2026-06-11 00:08:00 +08:00
7def60f1b6 merge: merge the latest regulatory information package code into V2 2026-06-10 23:58:24 +08:00
9c6cad481c test(regulatory-info-package): add regression coverage for template generation 2026-06-10 23:56:51 +08:00
1bf8634373 feat(regulatory-info-package): complete TOC page numbers and component filling 2026-06-10 23:56:40 +08:00
3bcf9647a1 docs(regulatory-info-package): update package generation design decisions 2026-06-10 23:56:20 +08:00
cf4f4456c4 fix(regulatory-info-package): generate the package from clean field templates 2026-06-10 20:23:06 +08:00
b728703e67 fix(regulatory-info-package): append a download summary on completion 2026-06-10 19:56:50 +08:00
6d4b519f83 test(regulatory-info-package): cover the package main path 2026-06-10 19:50:22 +08:00
dcd829e821 feat(regulatory-info-package): wire into chat and frontend cards 2026-06-10 19:50:03 +08:00
dac8ce3c14 feat(regulatory-info-package): implement the package generation workflow 2026-06-10 19:49:44 +08:00
f0286264e2 feat(regulatory-info-package): add package data models 2026-06-10 19:49:25 +08:00
e8c2a591fe docs(project): sync current implementation and collaboration conventions 2026-05-30 09:25:01 +08:00
1056bf62d9 refactor(models): add Chinese comments to the model and view layers 2026-05-30 00:55:45 +08:00
0de6f6b2ff refactor(django): add Chinese comments to the application shell layer 2026-05-30 00:53:43 +08:00
43196f79e6 feat(audit): show raw model output 2026-05-30 00:51:19 +08:00
322c161818 refactor(core): reorganize model configuration and audit redaction services 2026-05-30 00:47:31 +08:00
ccfe5eb667 refactor(rag): reorganize document ingestion and retrieval services 2026-05-30 00:44:52 +08:00
f68b44f325 feat(tools): enhance the tool registry and built-in tools 2026-05-30 00:39:26 +08:00
f7e0d8e4d8 feat(scenarios): tolerate invalid config and show error summaries 2026-05-30 00:36:31 +08:00
c57ab2f194 feat(scenarios): improve scenario summaries and question-type display 2026-05-30 00:33:34 +08:00
81f17319ff feat(audit): add scenario filtering and log summary display 2026-05-30 00:31:13 +08:00
c2b3a3b4f7 feat(documents): improve upload feedback and status display 2026-05-30 00:29:03 +08:00
905067277a feat(frontend): improve chat and admin page display 2026-05-30 00:26:18 +08:00
df45a89eb1 feat(agent-core): complete prompt orchestration and structured parsing 2026-05-30 00:20:40 +08:00
ba3f5fc584 feat(chat): connect scenario chat and result display 2026-05-30 00:10:47 +08:00
5c9718ddb1 feat(audit): add audit logs and demo data management 2026-05-30 00:10:26 +08:00
4a831ee2c5 feat(documents): support document upload and local RAG ingestion 2026-05-30 00:10:05 +08:00
7a6c110103 feat(agent-core): add orchestration and model/tool foundations 2026-05-30 00:08:27 +08:00
35b80929b0 feat(scenarios): support scenario config loading and home page display 2026-05-30 00:08:00 +08:00
6291940734 chore(config): initialize project configuration and deployment basics 2026-05-30 00:07:37 +08:00
b5ed5b6faa docs(project): update project description and implementation plan 2026-05-29 23:04:01 +08:00
e24d9804ba docs(design): complete the Chinese design document set 2026-05-29 23:02:54 +08:00
d4a236d0db docs(requirements): unify Chinese naming of requirement documents 2026-05-29 22:58:21 +08:00
569542bdea docs: initialize project requirements and collaboration docs 2026-05-29 21:09:03 +08:00
63 changed files with 4064 additions and 102 deletions

AGENTS.md (new file, 65 lines changed)

@@ -0,0 +1,65 @@
# Agent Collaboration Guide
This guide is for Codex or other coding agents working in this repository.
## Project Summary
DEMO-AGENT V2 is a Django application for IVD registration document review. The main app is `review_agent`, with workflow modules for file summaries, regulatory review, application form filling, regulatory information package generation, knowledge-base management, and Feishu notification/question handling.
The current `master` branch is intended to match `V2`.
## Important Paths
| Path | Purpose |
| --- | --- |
| `config/settings.py` | Django settings and environment loading |
| `config/urls.py` | Page routes and included API routes |
| `review_agent/models.py` | Shared Django models |
| `review_agent/urls.py` | Review-agent API routes |
| `review_agent/file_summary/` | Attachment handling, file inventory, page count, exports |
| `review_agent/regulatory_review/` | NMPA review workflow, rules, RAG, risk and issue review |
| `review_agent/application_form_fill/` | Application form field extraction and Word filling |
| `review_agent/regulatory_info_package/` | Chapter 1 regulatory information package generation |
| `review_agent/notifications/` | Notification dispatch and Feishu adapters |
| `templates/` | Django templates |
| `static/` | Frontend CSS and JavaScript |
| `docs/` | Requirements, designs, plans, source materials |
| `tests/` | pytest suite |
## Development Rules
- Prefer the existing Django patterns in `review_agent` before introducing new abstractions.
- Keep workflow modules independent. Do not fold regulatory package, application form fill, or regulatory review logic into unrelated modules.
- Preserve user data and generated artifacts. Do not delete `media/`, `.tmp/`, `db.sqlite3`, or `.env` unless explicitly asked.
- Treat `.env` as environment-specific configuration. It is currently tracked because this project needs a complete V2 state, but do not print secret values in logs or docs.
- For Word/PDF/Excel handling, use structured libraries already in the project instead of ad hoc text parsing when possible.
- For frontend work, keep the current workbench style: restrained, task-focused, evidence-first, and consistent with existing templates and CSS.
## Common Commands
```bash
python manage.py check
python manage.py migrate
python manage.py runserver
pytest
pytest tests -k regulatory_info_package
pytest tests/test_feishu_*.py
```
## Verification Notes
Before claiming a code change is complete, run at least the narrow test set for the touched workflow. For broad changes, run `python manage.py check` and `pytest`.
Known current state:
- `python manage.py check` passes.
- `pytest tests -k regulatory_info_package` passes.
- Full `pytest` may still include a few historical failures unrelated to the latest regulatory-info-package merge; report exact failures if they remain.
## Git Notes
- Check `git status --short --branch` before editing.
- Do not reset or revert user changes unless explicitly asked.
- Keep commits grouped by logical concern: docs, feature behavior, tests, cleanup.
- When merging `V2` and `master`, remember these histories were unrelated before the merge. Prefer preserving the V2 tree when the goal is to keep `master` as the complete V2 state.


@@ -1,32 +1,55 @@
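 # Product
-## Register
+## Product Name
-product
+DEMO-AGENT V2
 ## Users
-Registration document preparers, regulatory reviewers, and project managers, during document organization, regulatory checks, issue remediation, and application form filling.
+Registration document preparers, regulatory reviewers, project managers, and demo reviewers. Users typically have to finish document organization, checks, remediation, and submission preparation quickly, while facing large document volumes, complex file formats, many regulatory requirements, and evidence chains that break easily.
 ## Product Purpose
-DEMO-AGENT is a review workbench for IVD (in vitro diagnostic reagent) registration documents. It organizes document upload, file summaries, regulatory rule checks, RAG evidence retrieval, risk alerts, remediation review, and application form filling into a traceable workflow.
+DEMO-AGENT V2 is a review workbench for IVD (in vitro diagnostic reagent) registration documents. It organizes document upload, file summaries, regulatory rule checks, RAG evidence retrieval, risk alerts, remediation review, application form filling, and Chapter 1 regulatory information package generation into a traceable workflow.
+The product does not replace the regulatory lead's final judgment. Its job is to do the mechanical work well (document organization, cross-file search, field pre-filling, issue classification, and evidence tracing) so the lead can focus on judgment and confirmation.
+## Core Workflows
+| Workflow | Target output |
+| --- | --- |
+| File summary | File inventory, page counts, types, batch status, Markdown/Excel export |
+| Regulatory check | Missing items, risk items, consistency issues, remediation suggestions, review records |
+| Knowledge-base management | User document index, built-in regulatory document retrieval, cited snippets |
+| Application form filling | Pre-filled application form, field sources, conflict and missing-item notices |
+| Chapter 1 regulatory information package | CH1.2, CH1.4, CH1.5, CH1.11 and other docx files, plus a zip |
+| Feishu notifications and Q&A | Batch completion notices, simulated issue queries, system entry links |
 ## Brand Personality
 Restrained, trustworthy, clear. The interface serves the review task and puts status, evidence, and next actions first.
-## Anti-references
+## Anti-References
-Avoid marketing-style hero headings, stacks of decorative cards, excessive animation, overly bright gradients, and unnecessary visual noise.
+Avoid marketing-style hero headings, stacks of decorative cards, excessive animation, overly bright gradients, and unnecessary visual noise. Do not turn the review workbench into a showcase website, and do not hide key status or evidence sources.
 ## Design Principles
 - Evidence first: every conclusion should trace back to a source file, rule, or retrieved snippet.
 - Clear status: batches, nodes, risks, exceptions, and export results should be recognizable at a glance.
 - Restrained operations: pages offer only the necessary actions and do not turn review work into a complex back office.
+- Human confirmation: the system preprocesses and prompts; the regulatory lead keeps final confirmation authority.
+- Traceability: exported files, messages, node events, and issue status should all trace back to a batch.
 - Reuse existing patterns: new pages follow the current workbench navigation, panels, tables, and button system.
 ## Accessibility & Inclusion
 Contrast, keyboard accessibility, and clear labels follow WCAG AA by default. Motion is used only for status feedback and respects reduced-motion preferences.
+## Operational Boundaries
+- `.env` may be used for local and demo environments, but its distribution should be restricted when it contains secrets.
+- External capabilities such as the LLM, Feishu, Word COM, 7z, and the RAG index must support mocking or degradation.
+- Generated application and regulatory information files are pre-generated results and need human review before formal submission.
+- Default storage uses SQLite and the local `media/` directory; production should move to persistent volumes with controlled backups.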

README.md (140 lines changed)

@@ -1,6 +1,49 @@
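 # DEMO-AGENT V2
-V2 is a reset, minimal Django project that keeps only the base configuration and the login page.
+DEMO-AGENT V2 is a Django workbench for preparing and reviewing IVD registration documents. It brings document upload, file inventory summaries, regulatory checks, knowledge-base retrieval, risk alerts, remediation review, automatic application form filling, and Chapter 1 regulatory information package generation into one traceable review flow.
+The current `master` is aligned with `V2` and is the project mainline.
+## Core Capabilities
+| Capability | Description |
+| --- | --- |
+| Review workbench | After login, the home page shows conversations, attachments, the knowledge base, batches, and processing status |
+| Conversational workflows | In `/chat/`, upload documents for the current conversation and trigger summaries, regulatory checks, and generation tasks |
+| File summary | Reads PDF, Word, Excel, PowerPoint, archives, and other documents to produce inventories, page counts, types, and exports |
+| NMPA regulatory check | Produces issues, risks, and remediation suggestions from rules, text extraction, RAG retrieval, and LLM review |
+| Knowledge-base management | Upload and manage documents, rebuild the index, retrieve cited snippets, and filter out disabled or deleted documents |
+| Application form filling | Extracts key fields from the instructions for use and other documents to produce a pre-filled form and traceability results |
+| Chapter 1 regulatory information package | Generates CH1.2, CH1.4, CH1.5, CH1.11 and other regulatory information files, plus a zip artifact |
+| Feishu notifications and Q&A | Supports message notifications through a custom enterprise Feishu app, with reserved Feishu Q&A simulation commands |
+## Page Entry Points
+| Page | Path |
+| --- | --- |
+| Login | `http://127.0.0.1:8000/login/` |
+| Home | `http://127.0.0.1:8000/` |
+| Review agent | `http://127.0.0.1:8000/chat/` |
+| Knowledge-base management | `http://127.0.0.1:8000/knowledge-base/` |
+| Attachment management | `http://127.0.0.1:8000/attachments/` |
+| Admin | `http://127.0.0.1:8000/admin/` |
+## Project Structure
+```text
+config/                      Django settings and root URL routing
+review_agent/                Core business app
+  application_form_fill/     Automatic application form filling
+  file_summary/              File summaries, attachments, and exports
+  regulatory_review/         Regulatory checks and remediation review
+  regulatory_info_package/   Chapter 1 regulatory information package generation
+  notifications/             Feishu notifications and message adapters
+  feishu_questions/          Reserved Feishu Q&A capability
+static/                      Frontend scripts and styles
+templates/                   Django templates
+docs/                        Requirements, designs, development plans, and source materials
+tests/                       pytest tests
+```
 ## Running Locally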
@@ -13,65 +56,68 @@ python manage.py createsuperuser
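 python manage.py runserver
 ```
-Access:
-- Login page: http://127.0.0.1:8000/login/
-- Home: http://127.0.0.1:8000/
-- Admin: http://127.0.0.1:8000/admin/
-## File Summary Dependencies
-The automatic file inventory and page-count feature uses lightweight Python libraries to read PDF, Word, Excel, and PowerPoint files.
-To process `.7z` and `.rar` archives in Docker or production, the system `7z`/`p7zip`
-command must also be installed; confirm that the following commands work:
-```bash
-7z
-7z i
-```
-LibreOffice is not a required dependency; it is only an optional capability for better parsing of legacy document formats in the future.
-Uploaded original files, batch working directories, and exported files are stored by default under Django `MEDIA_ROOT` in
-`file_summary/users/<user_id>/<conversation_id>/` or in the batch `work_dir` directory. In production,
-mount `MEDIA_ROOT` on a persistent volume and include it in a backup or archiving policy.
-## Feishu Notifications and Q&A (Reserved)
-Feishu integration uses the messaging API of a custom enterprise app/agent. Sensitive values may only be written to the local `.env`
-or to deployment environment variables; do not commit a real App Secret, tenant token, open_id, or user_id.
-Common environment variables:
+The project automatically reads `.env` from the repository root. The repository currently keeps the V2 `.env` file; before opening the project to external collaborators, confirm that it contains no secrets that should not be public.
+## Common Environment Variables
 | Variable | Purpose |
 | --- | --- |
-| `FEISHU_NOTIFY_ENABLED` | Whether real Feishu notifications are enabled; when disabled, only a "not enabled" record is written |
-| `FEISHU_NOTIFY_CHANNEL` | Notification channel; the first phase uses `feishu_api` |
+| `DJANGO_SECRET_KEY` | Django secret key |
+| `DJANGO_DEBUG` | Whether debug mode is enabled |
+| `DJANGO_ALLOWED_HOSTS` | List of hosts allowed to access the site |
+| `LLM_PROVIDER` | LLM provider selection |
+| `LLM_API_KEY` | LLM API key |
+| `LLM_BASE_URL` | Base URL of the OpenAI-compatible LLM API |
+| `LLM_MODEL` | Default chat/extraction model |
+| `SILICONFLOW_API_KEY` | SiliconFlow API key; by default `LLM_API_KEY` can be reused |
+| `SILICONFLOW_EMBEDDING_MODEL` | Embedding model used by the regulatory RAG |
+| `SILICONFLOW_EMBEDDING_DIMENSIONS` | Embedding dimensions |
+| `REGULATORY_RAG_CHROMA_PATH` | Chroma storage path for the regulatory RAG |
+| `REGULATORY_RAG_COLLECTION` | Collection name for the regulatory RAG |
+| `FEISHU_NOTIFY_ENABLED` | Whether real Feishu notifications are enabled |
 | `FEISHU_APP_ID` | Feishu app App ID |
 | `FEISHU_APP_SECRET` | Feishu app App Secret |
-| `FEISHU_DEFAULT_USER_OPEN_ID` | open_id of the default individual recipient; used first |
-| `FEISHU_DEFAULT_USER_ID` | user_id of the default individual recipient; used when open_id is empty |
-| `FEISHU_DEFAULT_TARGET_NAME` | Display name of the default recipient, used in records and on pages |
-| `FEISHU_TENANT_TOKEN_CACHE_SECONDS` | Cache duration of tenant_access_token, in seconds |
-| `PUBLIC_BASE_URL` | Root URL of the system entry in Feishu messages; defaults to `http://127.0.0.1:8000` |
+| `FEISHU_DEFAULT_USER_OPEN_ID` | open_id of the default Feishu recipient |
+| `PUBLIC_BASE_URL` | Root URL of the system entry in Feishu messages |
-Automated tests mock the Feishu token API and message API and never call the real Feishu endpoints. Real sends are verified only
-through a local manual command:
-```bash
-python manage.py send_test_feishu_notification --username owner
-```
-The reserved Q&A capability can be verified with a local simulation command:
-```bash
-python manage.py feishu_question_simulate --username owner "查最新法规核查"
-```
-For a consolidated test run, run the following after completing `.env`:
+## External Dependencies
+Python dependencies are listed in `requirements.txt`; the main ones are:
+- Django
+- PyYAML
+- httpx
+- chromadb
+- pypdf
+- python-docx
+- python-pptx
+- openpyxl / xlrd
+- py7zr
+- playwright
+To support `.7z` and `.rar` in file summaries, the runtime also needs a working `7z`/`p7zip` command. LibreOffice is not a required dependency; it is only an optional addition for better handling of legacy document formats later.
+## Common Commands
 ```bash
 python manage.py check
+pytest
+pytest tests -k regulatory_info_package
 pytest tests/test_feishu_*.py
-pytest tests/test_file_summary_workflow.py tests/test_regulatory_notification.py tests/test_application_form_fill_notification.py
+python manage.py send_test_feishu_notification --username owner
+python manage.py feishu_question_simulate --username owner "查最新法规核查"
 ```
+Known state: a few historical tests in the full `pytest` run still do not fully match the current pages or LLM calling strategy; the main-path tests for the regulatory information package pass.
+## Documentation
+- [Product overview](PRODUCT.md)
+- [Agent collaboration conventions](AGENTS.md)
+- [docs index](docs/README.md)
+- [Requirements analysis](docs/1.需求分析)
+- [Functional design](docs/2.功能设计)
+- [Database design](docs/3.数据库设计)
+- [Detailed design](docs/4.详细设计)
+- [Development plan](docs/5.开发计划)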


@@ -40,10 +40,11 @@
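 | 6 | Fill as much as possible | Fill in as many fields identifiable in the instructions for use as possible, such as product name, package specifications, intended use, components, storage conditions, applicable instruments, sample type, and detection targets |
 | 7 | Missing-item marking | Missing items newly filled by the system use `/` with a yellow background to remind the owner to complete them |
 | 8 | LLM-only marking | Fields that code extraction missed but LLM extraction found must also be highlighted in the output file for human review |
-| 9 | `.doc` capability | `.doc` documents need native processing capability equivalent to `.docx`; pre-conversion cannot be the only approach |
-| 10 | zip as primary output | Generate `第1章 监管信息(预生成版).zip` as the primary download entry; single files are secondary downloads |
-| 11 | Chat invocation hint | Add an invocation prompt for this workflow at the bottom of the chat box |
-| 12 | LLM intent detection | Triggering must not rely only on fixed keywords; use an LLM to judge whether the user wants to generate the Chapter 1 regulatory information package |
+| 9 | Template fieldization | Prefer converting the sample templates into field templates that an agent or code can recognize, using content-control Tags or stable placeholders; code only fills content and never hand-edits formatting |
+| 10 | `.doc` capability | `.doc` documents are handled capability-first: write natively when a native capability exists; otherwise record this explicitly and allow a `.docx` fallback, never silently output an unmodified file |
+| 11 | zip as primary output | Generate `第1章 监管信息(预生成版).zip` as the primary download entry; single files are secondary downloads |
+| 12 | Chat invocation hint | Add an invocation prompt for this workflow at the bottom of the chat box |
+| 13 | LLM intent detection | Triggering must not rely only on fixed keywords; use an LLM to judge whether the user wants to generate the Chapter 1 regulatory information package |
 ### 2.2 Out of Scope for This Phase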
@@ -444,5 +445,6 @@
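 | D9 | The requirements analysis document is added as `docs/1.需求分析/5.第1章监管信息材料包生成.md` |
 | D10 | The zip is the primary entry point; single files are secondary downloads |
 | D11 | Add a workflow invocation hint at the bottom of the chat box |
-| D12 | `.doc` must achieve capability equivalent to `.docx`; conversion cannot be the only way to meet the requirement |
-| D13 | Trigger detection must use an LLM, not only fixed keywords |
+| D12 | Templates are fieldized first, with content-control Tags or stable placeholders serving agent/code filling; row-label lookup is only a fallback |
+| D13 | `.doc` must achieve capability equivalent to `.docx` in a capability-driven way; when the native capability is unavailable, a `.docx` fallback is allowed with an explicit notice |
+| D14 | Trigger detection must use an LLM, not only fixed keywords |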
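Requirement 13 and decision D14 ask that triggering not depend only on fixed keywords. One way to read that is a two-stage router: a cheap fixed-phrase match first, then an LLM judgment for everything else. The sketch below assumes that design. `route_message`, `RIP_ACTION` and `FIXED_PHRASES` are hypothetical names. The LLM classifier is passed in as a plain callable because the project's model-call signature is not shown in this diff.

```python
from typing import Callable, Optional

# Hypothetical action name for the Chapter 1 package workflow.
RIP_ACTION = "regulatory_info_package"

# Hypothetical fixed trigger phrases, stored without spaces.
FIXED_PHRASES = ("生成第1章监管信息", "第1章监管信息材料包")


def route_message(text: str, classify: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the workflow action for a chat message, or None for ordinary chat."""
    # Stage 1: fixed phrases, matched after removing whitespace so "第 1 章" also hits.
    compact = "".join(text.split())
    if any(phrase in compact for phrase in FIXED_PHRASES):
        return RIP_ACTION
    # Stage 2: LLM semantic judgment; any classifier error is treated as "no intent".
    try:
        action = classify(text)
    except Exception:
        return None
    return action if action == RIP_ACTION else None
```

Treating a classifier error as "no intent" is a deliberate choice in this sketch. A flaky model should fall back to normal chat rather than start a batch.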


@@ -27,9 +27,10 @@
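 | Independent workflow | Add `regulatory_info_package` batches, nodes, and cards |
 | Single instructions-for-use input | Select the single instructions-for-use file directly from the current conversation's active attachments; the most recent successful file summary batch is also supported |
 | Template-driven | Maintain the 7 templates, field mappings, and generation strategies through YAML configuration |
+| Template fieldization | Prefer Word content-control Tags or stable placeholders so that code only writes field values and preserves the original formatting as far as possible |
 | Parallel rule + LLM extraction | Code extraction and LLM extraction run in parallel and are merged before being written into the templates |
 | Highlight pending confirmation | System-filled `/`, LLM-only fields, and conflicting fields are all highlighted |
-| `.doc` equivalent handling | Design `LegacyWordDocumentService` to provide a document operation interface consistent with `.docx` |
+| `.doc` equivalent handling | Design `LegacyWordDocumentService` to provide, capability-first, a document operation interface consistent with `.docx`; fall back explicitly when the native capability is unavailable |
 | zip as primary output | Extend `ExportedSummaryFile.ExportType.ZIP` and unify download permissions |
 | LLM intent routing | Extend the routing actions to support fixed phrases and LLM semantic judgment |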
@@ -159,7 +160,7 @@ flowchart TD
 | Workflow state | `WorkflowNodeRun`, `WorkflowEvent` | Use `workflow_type=regulatory_info_package` |
 | Template configuration | YAML | Easy to maintain the 7 templates and their field mappings |
 | `.docx` operations | `python-docx` | Tables, paragraphs, runs, shading, and fonts are controllable |
-| `.doc` operations | Adapter abstraction | The Python standard library cannot write binary Word `.doc` files; designed as COM/UNO/third-party library adapters |
+| `.doc` operations | Adapter abstraction | The Python standard library cannot write binary Word `.doc` files; designed as COM/UNO/third-party library adapters, with a traceable `.docx` fallback when the capability is unavailable |
 | zip packaging | Python `zipfile` standard library | The standard library meets the packaging needs |
 | Excel traceability | `openpyxl` | Reuses an existing dependency |
 | LLM | `review_agent.llm.generate_completion` | Unified model calls |
@@ -281,10 +282,19 @@ templates:
     source_file: CH1.9 产品申报前沟通的说明.doc
     file_format: doc
     strategy: pre_submission
-    require_legacy_doc_native: true
+    prefer_legacy_doc_native: true
+    allow_docx_fallback: true
     include_in_zip: true
 ```
+Field mapping priority:
+| Target type | Description |
+| --- | --- |
+| content_control_tag | Preferred for formal templates; code writes by Word content-control Tag |
+| placeholder | Transitional approach; replaces stable placeholders while keeping the original run/paragraph formatting |
+| table_row_label | Fallback for templates that are not yet fieldized; the original cell formatting must be kept |
 ### 7.1 Configuration Items
 | Item | Description |
@@ -300,7 +310,8 @@ templates:
 | strategy | Generation strategy |
 | include_in_zip | Whether the file goes into the zip |
 | fields | Field mappings and replacement targets |
-| require_legacy_doc_native | Whether `.doc` requires native processing capability |
+| prefer_legacy_doc_native | Whether `.doc` tries native processing first |
+| allow_docx_fallback | Whether a `.docx` fallback is allowed when the native `.doc` capability is unavailable or fails |
 ---
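The configuration items above, especially the two `.doc` flags, can be checked when the templates load. This is a minimal sketch. It assumes each template entry has already been parsed from YAML into a dict with the keys from 7.1. `validate_templates` and `KNOWN_STRATEGIES` are hypothetical, and `pre_submission` is the only strategy name that actually appears in this diff.

```python
from pathlib import Path
from typing import Any

# Hypothetical strategy registry; only "pre_submission" is visible in this diff.
KNOWN_STRATEGIES = {"pre_submission"}


def validate_templates(templates: list[dict[str, Any]], source_dir: Path) -> list[str]:
    """Return human-readable config errors; an empty list means the config is usable."""
    errors: list[str] = []
    seen_codes: set[str] = set()
    for spec in templates:
        code = spec.get("code", "<missing code>")
        if code in seen_codes:
            errors.append(f"{code}: duplicate code would overwrite another artifact")
        seen_codes.add(code)
        if not (source_dir / spec.get("source_file", "")).is_file():
            errors.append(f"{code}: source_file not found")
        if spec.get("strategy") not in KNOWN_STRATEGIES:
            errors.append(f"{code}: unknown strategy {spec.get('strategy')!r}")
        if spec.get("file_format") == "doc":
            # A .doc template must state its native preference and allow the fallback explicitly.
            if "prefer_legacy_doc_native" not in spec:
                errors.append(f"{code}: .doc template must declare prefer_legacy_doc_native")
            if not spec.get("allow_docx_fallback", False):
                errors.append(f"{code}: .doc template must allow a .docx fallback")
    return errors
```

The sketch collects every error instead of stopping at the first one, so a single load reports all broken entries.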
@@ -836,7 +847,8 @@ pytest tests/test_application_form_fill_*.py tests/test_file_summary_views.py te
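 | Risk | Description | Recommendation |
 | --- | --- | --- |
-| Difficulty of native `.doc` writing | The Python standard library cannot fully write Word `.doc` files | Investigate Word COM or LibreOffice UNO first, and isolate the risk behind an adapter |
+| Difficulty of native `.doc` writing | The Python standard library cannot fully write Word `.doc` files | Investigate Word COM or LibreOffice UNO first; allow a traceable `.docx` fallback when no native capability exists |
+| Template fieldization effort | The sample templates must first be converted into code-recognizable fields | Cover the key fields of CH1.4, CH1.5, and the declarations first; surface missing Tags early through template auditing |
 | Fragmented text in sample templates | Word run splitting can make simple string replacement fail | The document writing service must support replacement across runs |
 | Complex product list structure | Tables in the instructions for use may have merged cells and multiple specifications | Cover the target instructions' structure first, then extend to generic table normalization |
 | Accuracy of the standards list | The instructions may not contain standard numbers, and knowledge-base candidates cannot be treated as conclusions | Highlight all candidates and add them to the traceability list |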
@@ -854,7 +866,8 @@ pytest tests/test_application_form_fill_*.py tests/test_file_summary_views.py te
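 | D4 | Input selection is based primarily on active attachments and also supports the most recent successful file summary batch |
 | D5 | Extend `ExportedSummaryFile.ExportType` with `zip` |
 | D6 | Drive the 7 templates with YAML configuration |
-| D7 | `.doc` implements an interface equivalent to `.docx` through the `LegacyWordDocumentService` adapter |
-| D8 | Standard candidates reuse the existing knowledge base/RAG; no separate RAG is added |
-| D9 | The frontend only extends the existing chat page, workflow cards, quick prompts, and status polling |
-| D10 | This round produces the functional design first; the database design is given in this document for now and can later be split into a formal database design document |
+| D7 | Template fields prefer content-control Tags or stable placeholders; row-label lookup is only a fallback |
+| D8 | `.doc` implements an interface equivalent to `.docx` through the `LegacyWordDocumentService` adapter; a traceable fallback is allowed when the native capability is unavailable |
+| D9 | Standard candidates reuse the existing knowledge base/RAG; no separate RAG is added |
+| D10 | The frontend only extends the existing chat page, workflow cards, quick prompts, and status polling |
+| D11 | This round produces the functional design first; the database design is given in this document for now and can later be split into a formal database design document |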


@@ -50,6 +50,8 @@ erDiagram
 Note: `ra_workflow_node_run`, `ra_workflow_event`, and `ra_exported_summary_file` support multiple workflows through `workflow_type` and `workflow_batch_id`. This feature always uses `workflow_type=regulatory_info_package`.
+Current-state note: the generic node table already has a `batch + node_code` unique constraint, which mainly serves file summary batches. RIP batches should not depend on `FileSummaryBatch.batch`, so the implementation must either add a database unique constraint on `workflow_type + workflow_batch_id + node_code` or use equivalent idempotent logic when creating nodes, so that the same RIP batch never initializes its nodes twice.
 ---
 ## 3. Table Structure Design
@@ -211,6 +213,13 @@ erDiagram
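 | node_group | regulatory_info_package |
 | batch_id | Nullable; if kept for compatibility with old queries, binding it to a file summary batch is not recommended |
+Recommended idempotency constraints:
+| Constraint/strategy | Fields | Description |
+| --- | --- | --- |
+| uq_ra_node_workflow_batch_code | workflow_type, workflow_batch_id, node_code | Recommended new database unique constraint that prevents duplicate nodes in the same RIP batch |
+| get_or_create idempotency | workflow_type, workflow_batch_id, node_code | If the generic table's constraint is not changed yet, node initialization must use this combination for code-level idempotency |
 Recommended new nodes:
 ```text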
@@ -543,6 +552,7 @@ CREATE INDEX idx_ra_rip_batch_created
 | JSONField defaults | Use `default=list` or `default=dict`; never use mutable literal objects |
 | Foreign key deletion policy | conversation/user use CASCADE; input attachments and file summary batches should use PROTECT or SET_NULL so that historical batches do not lose their links |
 | `source_summary_item_id` | No enforced foreign key to `FileSummaryItem` for now; store the ID first and change it to an FK later if a hard constraint is needed |
+| Workflow node idempotency | RIP nodes must not rely only on the `WorkflowNodeRun.batch + node_code` unique constraint; they must use `workflow_type + workflow_batch_id + node_code` to guarantee idempotency |
 | `.doc` failure records | When the native `.doc` adapter is unavailable or fails, this must be written to `risk_notes` and the artifact metadata; if the `.docx` fallback succeeds, the generated_files status is `fallback_success` |
 | zip as primary entry point | The `export_category` of the zip export record is fixed to `regulatory_info_package` |
 | Single-file downloads | The 7 generated files are also written to `ExportedSummaryFile` as secondary downloads |
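The zip rules here say that only `success` or `fallback_success` files go into the package. They can be sketched with the standard-library `zipfile` module, which the functional design already names. The record shape (`path`, `status`) and `build_package_zip` are assumptions for illustration. The zip file name is taken from the requirements.

```python
import zipfile
from pathlib import Path

PACKAGE_ZIP_NAME = "第1章 监管信息(预生成版).zip"
ZIPPABLE_STATUSES = {"success", "fallback_success"}


def build_package_zip(generated_files: list[dict], work_dir: Path) -> tuple[Path, list[str]]:
    """Zip only successful or fallback-successful files; return the zip path and skipped names."""
    zip_path = work_dir / PACKAGE_ZIP_NAME
    skipped: list[str] = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for item in generated_files:
            path = Path(item["path"])
            if item.get("status") not in ZIPPABLE_STATUSES or not path.is_file():
                skipped.append(path.name)
                continue
            # arcname keeps the archive flat: only the file name, no batch directory prefix.
            archive.write(path, arcname=path.name)
    return zip_path, skipped
```

Returning the skipped names lets the caller record them in the download summary instead of dropping them silently.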
@@ -562,8 +572,9 @@ CREATE INDEX idx_ra_rip_batch_created
 | 6 | zip export | `ExportedSummaryFile` supports `export_type=zip` |
 | 7 | Download permission | Users who do not own the batch cannot download RIP exports |
 | 8 | Node events | `WorkflowNodeRun` and `WorkflowEvent` can be queried by `workflow_type=regulatory_info_package` |
-| 9 | Notification records | Notification successes, failures, and retry counts can be persisted |
-| 10 | JSON summary | The structure of missing items, LLM-only items, conflicts, and risk notes follows the conventions in this document |
+| 9 | Node idempotency | The same `workflow_type + workflow_batch_id + node_code` never creates a duplicate node |
+| 10 | Notification records | Notification successes, failures, and retry counts can be persisted |
+| 11 | JSON summary | The structure of missing items, LLM-only items, conflicts, and risk notes follows the conventions in this document |
 ---
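The idempotency rule for RIP nodes can be shown with the standard-library `sqlite3` module. The table name and the constraint name `uq_ra_node_workflow_batch_code` come from this design. The column list is trimmed to the key fields and is otherwise an assumption. In Django the same idea would be a `models.UniqueConstraint(fields=[...], name=...)` entry in `Meta.constraints` plus `get_or_create`. This sketch stays ORM-free so that it runs on its own.

```python
import sqlite3


def init_schema(conn: sqlite3.Connection) -> None:
    # Only the columns needed for the idempotency key are modelled here.
    conn.execute(
        """
        CREATE TABLE ra_workflow_node_run (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            workflow_type TEXT NOT NULL,
            workflow_batch_id TEXT NOT NULL,
            node_code TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'pending',
            CONSTRAINT uq_ra_node_workflow_batch_code
                UNIQUE (workflow_type, workflow_batch_id, node_code)
        )
        """
    )


def get_or_create_node(conn: sqlite3.Connection, workflow_type: str,
                       workflow_batch_id: str, node_code: str) -> tuple[int, bool]:
    """Return (node id, created); repeated calls with the same key never add a row."""
    key = (workflow_type, workflow_batch_id, node_code)
    try:
        with conn:  # commits on success, rolls back on error
            cursor = conn.execute(
                "INSERT INTO ra_workflow_node_run (workflow_type, workflow_batch_id, node_code) "
                "VALUES (?, ?, ?)",
                key,
            )
        return cursor.lastrowid, True
    except sqlite3.IntegrityError:
        # The unique constraint fired: this node already exists for the batch.
        row = conn.execute(
            "SELECT id FROM ra_workflow_node_run "
            "WHERE workflow_type = ? AND workflow_batch_id = ? AND node_code = ?",
            key,
        ).fetchone()
        return row[0], False
```

The sketch inserts first and reads back only on `IntegrityError`. That leaves the database constraint, not a pre-check, as the single source of truth, so two concurrent initializations cannot both create the node.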


@@ -27,11 +27,13 @@
 | Independent workflow | Uses `workflow_type=regulatory_info_package` with its own batches, artifacts, notifications, and cards |
 | Independent module | Add `review_agent/regulatory_info_package/` at the same level as `application_form_fill` |
 | Centralized models | Django models stay in `review_agent/models.py` |
+| Node idempotency | `WorkflowNodeRun` must be created idempotently by `workflow_type + workflow_batch_id + node_code`, or protected by a unique constraint |
 | Input priority | A file name specified in the user message comes first, then active attachments, then the most recent successful file summary |
 | Fixed templates | Always process the 7 Chapter 1 regulatory information templates |
+| Template fieldization | Generation writes to Word content-control Tags or stable placeholders first and does not assume manual adjustment of table formatting |
 | Rules first, demo-ready | Rule extraction can run on its own; the LLM is retried at most 3 times on failure, after which the workflow continues |
 | Concurrent document generation | The workflow runs serially overall; inside the `generate_docs` node each document can be processed concurrently in its own thread |
-| `.doc` fallback | Prefer native `.doc` writing; on failure, a `.docx` fallback file may be generated |
+| `.doc` fallback | Capability-driven: prefer native `.doc` when Word COM/UNO is available; when no native capability exists or native writing fails, a `.docx` fallback file may be generated |
 | zip contains only successful files | The zip packages only files that succeeded or succeeded via fallback; failed files are excluded |
 | Highlight rules | Missing and LLM-only fields get a yellow background; conflicts get a yellow background with red text |
 | Traceability output | Users download the Excel file; JSON is saved only to the backend logs directory |
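The "rules first" and retry constraints can be combined into one extraction step. Rule and LLM extraction run side by side, the LLM gets a bounded number of attempts, and the rule result survives even if every LLM attempt fails. The sketch reads "retried at most 3 times" as at most 3 attempts in total; if the design means 1 + 3 attempts, raise `max_attempts`. The extractor callables and function names are hypothetical, and the sketch does no backoff between attempts.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

Fields = dict[str, str]


def call_llm_with_retries(llm_extract: Callable[[str], Fields], text: str,
                          max_attempts: int = 3) -> tuple[Optional[Fields], list[str]]:
    """Try the LLM up to max_attempts times; return (fields or None, error messages)."""
    errors: list[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            return llm_extract(text), errors
        except Exception as exc:  # any model or network error counts as a failed attempt
            errors.append(f"attempt {attempt}: {exc}")
    return None, errors


def extract_fields(text: str, rule_extract: Callable[[str], Fields],
                   llm_extract: Callable[[str], Fields], max_attempts: int = 3) -> dict:
    """Run rule and LLM extraction in parallel; the rule result is always kept."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        rule_future = pool.submit(rule_extract, text)
        llm_future = pool.submit(call_llm_with_retries, llm_extract, text, max_attempts)
        rule_fields = rule_future.result()
        llm_fields, llm_errors = llm_future.result()
    return {"rule": rule_fields, "llm": llm_fields, "llm_errors": llm_errors}
```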
@@ -91,7 +93,7 @@ review_agent/
 | views.py | health, start, status, and select-input endpoints |
 | input_select.py | Selects the instructions for use from the user message, active attachments, or file summary |
 | template_config.py | YAML loading, validation, and hashing |
-| template_repository.py | Locates sample templates and copies them to the batch directory |
+| template_repository.py | Locates sample templates, copies them to the batch directory, and audits field Tags/placeholders |
 | instruction_extract.py | Parses paragraphs, sections, tables, and the components table of the instructions for use |
 | field_extract.py | Runs rule extraction and LLM extraction in parallel; the LLM is retried at most 3 times |
 | field_merge.py | Merges fields and outputs missing, LLM-only, conflict, and highlight decisions |
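The merge decisions assigned to `field_merge.py` follow the highlight rules in the constraints table. A missing value becomes `/` on yellow, an LLM-only value goes on yellow, and a conflict goes on yellow with red text. Below is a minimal sketch with hypothetical names. One point is an assumption because the design does not say it: on conflict the rule value is kept and flagged.

```python
from dataclasses import dataclass
from typing import Optional

MISSING_MARK = "/"


@dataclass
class MergedField:
    name: str
    value: str
    status: str                # agreed | rule_only | llm_only | conflict | missing
    fill: Optional[str]        # background colour; None means no highlight
    font_color: Optional[str]  # text colour; None keeps the template colour


def merge_field(name: str, rule_value: Optional[str], llm_value: Optional[str]) -> MergedField:
    rule = (rule_value or "").strip()
    llm = (llm_value or "").strip()
    if not rule and not llm:
        return MergedField(name, MISSING_MARK, "missing", "yellow", None)
    if rule and not llm:
        return MergedField(name, rule, "rule_only", None, None)
    if llm and not rule:
        return MergedField(name, llm, "llm_only", "yellow", None)
    if rule == llm:
        return MergedField(name, rule, "agreed", None, None)
    # Assumption: keep the rule value on conflict and flag it for human review.
    return MergedField(name, rule, "conflict", "yellow", "red")
```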
@@ -248,7 +250,8 @@ class TemplateSpec:
     file_format: str
     strategy: str
     include_in_zip: bool
-    require_legacy_doc_native: bool = False
+    prefer_legacy_doc_native: bool = False
+    allow_docx_fallback: bool = True
     fields: list[dict[str, Any]] = field(default_factory=list)
 ```
@@ -414,7 +417,31 @@ review_agent/regulatory_info_package/templates/regulatory_info_package_templates
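 | code unique | Prevents artifacts from being overwritten |
 | source_file exists | A missing file is a configuration error |
 | strategy valid | Must match a generation strategy |
-| doc template flag | `.doc` templates must declare `require_legacy_doc_native` |
+| doc template flag | `.doc` templates must declare `prefer_legacy_doc_native` and be configured to allow a `.docx` fallback |
+### 8.1 Template Fieldization Conventions
+To avoid breaking Word tables, checkboxes, font sizes, indentation, and merged cells during generation, this workflow prefers fieldized templates:
+| Method | When to use | Description |
+| --- | --- | --- |
+| Word content-control Tag | Preferred for formal templates | In Word, set stable Tags on fill-in areas such as product name, applicant, checkboxes, dates, and descriptive text; code writes by Tag |
+| Stable placeholder | Transitional approach | Use placeholders such as `{{ product_name }}` that do not affect layout; code replaces the run that contains the placeholder |
+| Row-label lookup | Fallback | Only for legacy templates that are not yet fieldized; the original cell, paragraph, and run formatting must be kept |
+Field target priority in the template configuration:
+```yaml
+targets:
+  - type: content_control_tag
+    tag: product_name
+  - type: placeholder
+    marker: "{{ product_name }}"
+  - type: table_row_label
+    label: 产品名称
+```
+Field auditing must run when templates are loaded: when a key field lacks a Tag or placeholder, report a clear error or a degradation note; never silently fall back to a whole-cell rebuild strategy that breaks formatting.
 ---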
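The target priority and the load-time audit fit together naturally. For each field, pick the highest-priority target that the template actually contains. Report key fields that have no usable target, and flag any field that is falling back to a row label. `TemplateInventory` is a hypothetical stand-in for whatever a template scan returns, and the function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional

# Priority from the convention table: Tag first, placeholder second, row label last.
TARGET_PRIORITY = ("content_control_tag", "placeholder", "table_row_label")


@dataclass
class TemplateInventory:
    """Hypothetical result of scanning one template for fill points."""
    tags: set[str] = field(default_factory=set)
    markers: set[str] = field(default_factory=set)
    row_labels: set[str] = field(default_factory=set)

    def has(self, target: dict) -> bool:
        kind = target.get("type")
        if kind == "content_control_tag":
            return target.get("tag") in self.tags
        if kind == "placeholder":
            return target.get("marker") in self.markers
        if kind == "table_row_label":
            return target.get("label") in self.row_labels
        return False


def resolve_target(targets: list[dict], inventory: TemplateInventory) -> Optional[dict]:
    """Pick the highest-priority configured target that the template actually contains."""
    known = [t for t in targets if t.get("type") in TARGET_PRIORITY]
    for target in sorted(known, key=lambda t: TARGET_PRIORITY.index(t["type"])):
        if inventory.has(target):
            return target
    return None


def audit_fields(fields: dict[str, list[dict]], required: set[str],
                 inventory: TemplateInventory) -> dict[str, list[str]]:
    """Report errors for unfillable key fields and notes for degraded ones."""
    report: dict[str, list[str]] = {"errors": [], "degraded": []}
    for name in sorted(required - set(fields)):
        report["errors"].append(f"{name}: not mapped in the template config")
    for name, targets in fields.items():
        target = resolve_target(targets, inventory)
        if target is None:
            bucket = "errors" if name in required else "degraded"
            report[bucket].append(f"{name}: no Tag, placeholder, or row label found")
        elif target["type"] == "table_row_label":
            report["degraded"].append(f"{name}: using row-label fallback {target['label']!r}")
    return report
```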
@@ -504,7 +531,9 @@ class DocumentAdapter(Protocol):
 | Method | Description |
 | --- | --- |
 | replace_text | Replaces text in paragraphs and tables; must handle split runs |
-| fill_table_cell | Locates the target cell by row label |
+| fill_content_control | Fills text, dates, or checkboxes by content-control Tag |
+| replace_placeholder | Replaces text by stable placeholder, keeping the placeholder's run/paragraph formatting |
+| fill_table_cell | Locates the target cell by row label; only a fallback for templates that are not fieldized |
 | replace_table | Rebuilds the CH1.5 product list table |
 | apply_highlight | Sets a yellow background with `w:shd` |
 | apply_conflict_style | Yellow background + red text |
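`replace_text` and `replace_placeholder` must cope with Word splitting one placeholder across several runs. One common technique: find the marker in the joined text, put the whole replacement into the run where the marker starts, and trim the marker characters out of the later runs. Each run keeps its own formatting that way. In this sketch runs are plain strings. With python-docx one would read and write `run.text` for each run in `paragraph.runs`. The function name is hypothetical.

```python
def replace_across_runs(runs: list[str], marker: str, value: str) -> list[str]:
    """Replace every marker occurrence in the joined run texts, keeping run boundaries."""
    runs = list(runs)
    if not marker:
        return runs
    search_from = 0
    while True:
        start = "".join(runs).find(marker, search_from)
        if start == -1:
            return runs
        end = start + len(marker)
        offset = 0
        for i, text in enumerate(runs):
            run_start, run_end = offset, offset + len(text)
            offset = run_end
            if run_end <= start or run_start >= end:
                continue  # this run does not overlap the marker
            cut_from = max(start, run_start) - run_start
            cut_to = min(end, run_end) - run_start
            if run_start <= start:
                # The run where the marker begins receives the whole replacement.
                runs[i] = text[:cut_from] + value + text[cut_to:]
            else:
                # Later runs only lose the marker characters they held.
                runs[i] = text[:cut_from] + text[cut_to:]
        # Resume after the inserted value so a value containing the marker cannot loop.
        search_from = start + len(value)
```

Putting the value in the first overlapping run means the filled text inherits the formatting of the run where the placeholder began, which is usually the formatting the template author intended.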
@@ -528,10 +557,11 @@ class LegacyDocDocumentAdapter:
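 Execution order:
-1. First try `WordComDocAdapter` to open the `.doc` natively and save it as `.doc`
-2. If native handling fails, try saving the `.doc` as `.docx` and hand it to `DocxDocumentAdapter`
-3. If the fallback succeeds, output `CH1.9 产品申报前沟通的说明.docx`
-4. If both native handling and the fallback fail, the file's status is `failed` and it is excluded from the zip
+1. Run capability probing: Word COM, LibreOffice UNO, or any other capability that can write `.doc`
+2. When a native capability exists, first try to open the `.doc` natively and save it as `.doc`
+3. When no native capability exists or native handling fails, try to generate a semantically equivalent `.docx` fallback file and hand it to `DocxDocumentAdapter`
+4. If the fallback succeeds, output `CH1.9 产品申报前沟通的说明.docx` with status `fallback_success`
+5. If both native handling and the fallback fail, the file's status is `failed` and it is excluded from the zip.
 Fallback success: `adapter_summary.doc`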
@@ -693,6 +723,7 @@ class RegulatoryInfoPackageWorkflowExecutor:
 | --- | --- |
 | prepare | Confirm the instructions for use, or waiting_user |
 | template_copy | Copy the 7 templates |
+| template_audit | Audit template field Tags/placeholders and record missing fields and degradation strategies |
 | text_extract | Extract sections and tables from the instructions for use |
 | field_extract | Parallel rule + LLM extraction |
 | field_merge | Merge fields and make highlight decisions |
@@ -917,8 +948,8 @@ def notify_completion(batch: RegulatoryInfoPackageBatch, exports: list[ExportedS
 | --- | --- |
 | D1 | The detailed design document path is `docs/4.详细设计/5.第1章监管信息材料包生成.md` |
 | D2 | Models are centralized in `review_agent/models.py`; the business module is `review_agent/regulatory_info_package/` |
-| D3 | `.doc` uses A+C: prefer native Word COM handling, and also design an adapter layer and capability probing |
-| D4 | When native `.doc` handling fails, a `.docx` fallback is allowed; the fallback file name is `CH1.9 产品申报前沟通的说明.docx` |
+| D3 | `.doc` uses a capability-driven strategy: probe native capabilities such as Word COM/UNO, and prefer native handling when one is available |
+| D4 | When there is no native `.doc` capability or native handling fails, a `.docx` fallback is allowed; the fallback file name is `CH1.9 产品申报前沟通的说明.docx` |
 | D5 | The zip contains only successful or fallback-successful files; failed files are excluded |
 | D6 | The LLM is retried at most 3 times; after that, the rule results are used to continue |
 | D7 | Missing and LLM-only fields get a yellow background; conflicts get a yellow background with red text |
@@ -928,4 +959,5 @@ def notify_completion(batch: RegulatoryInfoPackageBatch, exports: list[ExportedS
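 | D11 | The traceability Excel is downloadable; JSON is kept only in the backend logs |
 | D12 | No field-level database table is added in this phase |
 | D13 | The workflow is serial; the document generation node can be multithreaded internally |
-| D14 | This round produces only the detailed design: no code and no migrations |
+| D14 | Templates are fieldized first; the formal fill path uses content-control Tags or stable placeholders, and row-label lookup is only a fallback |
+| D15 | This round produces only the detailed design: no code and no migrations |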
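Decisions D3 and D4, together with the five-step execution order, describe a capability-driven path for the one `.doc` template. The sketch takes the probe, the native writer and the `.docx` writer as plain callables, so they can be mocked as the plan requires. It records why each step was skipped or failed, so nothing converts silently. All names here are hypothetical. The design's own classes (`LegacyWordDocumentService`, `LegacyDocDocumentAdapter`, `WordComDocAdapter`) would sit behind these callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DocResult:
    status: str                      # success | fallback_success | failed
    output_name: Optional[str]
    risk_notes: list[str] = field(default_factory=list)


def generate_legacy_doc(
    probe: Callable[[], Optional[str]],
    write_native: Callable[[str], str],
    write_docx_fallback: Callable[[], str],
    allow_docx_fallback: bool = True,
) -> DocResult:
    """Native .doc when a capability exists, otherwise a traceable .docx fallback."""
    notes: list[str] = []
    capability = probe()  # e.g. "word_com", "libreoffice_uno", or None
    if capability:
        try:
            return DocResult("success", write_native(capability), notes)
        except Exception as exc:
            notes.append(f"native .doc via {capability} failed: {exc}")
    else:
        notes.append("no native .doc capability detected")
    if not allow_docx_fallback:
        notes.append("docx fallback disabled by config")
        return DocResult("failed", None, notes)
    try:
        return DocResult("fallback_success", write_docx_fallback(), notes)
    except Exception as exc:
        notes.append(f"docx fallback failed: {exc}")
        return DocResult("failed", None, notes)
```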


@@ -19,7 +19,9 @@
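 ## 1. Development Plan Goals
-This development plan is written for execution by Codex. The goal is to deliver the independent `regulatory_info_package` workflow in a way that is verifiable, reversible, and committable stage by stage. The plan uses the existing automatic form-filling workflow `application_form_fill` as its main reference, but keeps an independent module, batches, artifacts, notifications, and frontend cards.
+This development plan is written for execution by Codex. The goal is to deliver the independent `regulatory_info_package` workflow in a way that is verifiable, reversible, and accepted stage by stage. The plan uses the existing automatic form-filling workflow `application_form_fill` as its main reference, but keeps an independent module, batches, artifacts, notifications, and frontend cards.
+Current-state ruling: the latest code does not yet contain a formal `regulatory_info_package` workflow, so this plan is executed as "build a new formal package workflow". This feature must not be merged into `application_form_fill` or implemented by modifying it.
 After development, users can upload or specify the product instructions for use in the chat and trigger the workflow with "根据说明书生成第1章监管信息" ("generate Chapter 1 regulatory information from the instructions for use"). The system generates 7 regulatory information files from the sample templates in `docs/0.原始材料/第1章 监管信息`, with `第1章 监管信息(预生成版).zip` as the top download entry and single files plus a traceability Excel as secondary downloads.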
@@ -32,18 +34,20 @@
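 | Independent workflow | Add `workflow_type=regulatory_info_package`; do not merge it into `application_form_fill` |
 | Independent module | Add `review_agent/regulatory_info_package/`; its services sit at the same level as form filling |
 | Centralized models | Django models stay in `review_agent/models.py` |
+| Node idempotency | RIP nodes must be created idempotently on `workflow_type + workflow_batch_id + node_code`, or protected by a database unique constraint |
 | Single instructions-for-use input | A file name in the user message comes first, then active attachments, then the most recent successful file summary |
 | Multiple candidates | No selection dialog; the chat asks the user to confirm the instructions-for-use file name |
 | Fixed templates | Always process the 7 Chapter 1 regulatory information templates |
+| Template fieldization | First convert the templates into field templates that an agent or code can recognize, using content-control Tags or stable placeholders; code only fills fields and does not depend on manual formatting changes |
 | Extraction strategy | Rule and LLM extraction run in parallel; the LLM is retried at most 3 times, after which the rule results continue |
 | Document generation | Workflow nodes run serially; inside the `generate_docs` node each document is processed in its own thread |
-| `.doc` strategy | CH1.9 prefers native `.doc` writing and allows a `.docx` fallback on failure |
+| `.doc` strategy | CH1.9 is capability-driven: prefer native `.doc` when Word COM/UNO is detected; when no native capability exists, record this explicitly and allow a `.docx` fallback |
 | zip strategy | The zip contains only successful or fallback-successful files; failed files are excluded |
 | Highlight strategy | Missing items `/` on yellow; LLM-only on yellow; conflicts yellow with red text |
 | Traceability strategy | Users download the Excel file; JSON is written only to the backend logs directory |
 | Frontend strategy | Minimal integration only; no new page or separate style system |
 | TDD | Write a failing test for new behavior first, then implement it |
-| Git commits | After each stage passes verification, generate a commit summary and commit locally |
+| Git commits | After each stage passes verification, generate a commit summary; the user decides whether to commit locally |
 | Protect user changes | Do not revert or overwrite the user's existing uncommitted changes |
 ---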
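The input-priority and multiple-candidate rules above can be sketched as a tiered selector. A file named in the message wins. Next comes a unique candidate among active attachments, then one from the latest successful file summary. Several candidates at any tier become a chat question instead of a picker. The file-name heuristic `looks_like_instructions` is purely an assumption; the real selector may inspect document content. The function names are hypothetical.

```python
from typing import Callable, Sequence


def looks_like_instructions(name: str) -> bool:
    # Hypothetical heuristic: treat file names containing "说明书" as instructions for use.
    return "说明书" in name


def select_instructions(
    message: str,
    active_attachments: Sequence[str],
    latest_summary_files: Sequence[str],
    is_candidate: Callable[[str], bool] = looks_like_instructions,
) -> dict:
    """Apply the priority: named in message > active attachment > latest file summary."""
    known = list(dict.fromkeys([*active_attachments, *latest_summary_files]))  # dedupe, keep order
    tiers = (
        ("message", [name for name in known if name in message]),
        ("active_attachment", [name for name in active_attachments if is_candidate(name)]),
        ("file_summary", [name for name in latest_summary_files if is_candidate(name)]),
    )
    for source, candidates in tiers:
        if len(candidates) == 1:
            return {"action": "use", "file": candidates[0], "source": source}
        if len(candidates) > 1:
            # No picker dialog: the chat asks the user to confirm a file name.
            return {"action": "ask_user", "candidates": candidates, "source": source}
    return {"action": "ask_user", "candidates": [], "source": None}
```

An `ask_user` result with no candidates maps naturally onto the `prepare` node's `waiting_user` state.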
@@ -156,7 +160,7 @@ pytest tests/test_file_summary_views.py -k download
 | Goal | Generate the database migration and cover basic model behavior |
 | Scope of changes | `review_agent/migrations/`, `tests/` |
 | Acceptance criteria | The migration applies; model tests cover batch numbers, status, artifacts, notifications, and the zip export type |
-| Codex prompt | Generate the migration and add `tests/test_regulatory_info_package_models.py`, covering model field defaults and export types first. |
+| Codex prompt | Generate the migration and add `tests/test_regulatory_info_package_models.py`, covering model field defaults and export types first, as well as idempotent/unique creation of `WorkflowNodeRun` nodes for RIP batches. |
 ### RIP-1 Stage Verification
@@ -182,10 +186,10 @@ pytest tests/test_regulatory_info_package_models.py tests/test_file_summary_view
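 | Item | Content |
 | --- | --- |
-| Goal | Configure the 7 sample templates, output file names, strategies, and `.doc` flags |
+| Goal | Configure the 7 sample templates, output file names, strategies, field Tag/placeholder mappings, and `.doc` flags |
 | Scope of changes | `review_agent/regulatory_info_package/templates/regulatory_info_package_templates_v1.yaml` |
-| Acceptance criteria | All 7 templates are complete; the zip is named `第1章 监管信息(预生成版).zip` |
-| Codex prompt | Enter the template configuration according to the detailed design; source_dir points to the sample directory; CH1.9 must declare `require_legacy_doc_native: true`. |
+| Acceptance criteria | All 7 templates are complete; the zip is named `第1章 监管信息(预生成版).zip`; field mappings prefer content-control Tags or stable placeholders |
+| Codex prompt | Enter the template configuration according to the detailed design; source_dir points to the sample directory; field targets prefer content_control_tag or placeholder; CH1.9 declares `prefer_legacy_doc_native: true` and allows a docx fallback. |
 ### RIP-2-003 Implement Configuration Loading, the Template Repository, and Storage Directories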
@@ -193,8 +197,17 @@ pytest tests/test_regulatory_info_package_models.py tests/test_file_summary_view
| --- | --- | | --- | --- |
| 目标 | 实现 YAML 加载校验、模板复制、批次目录创建、路径安全检查 | | 目标 | 实现 YAML 加载校验、模板复制、批次目录创建、路径安全检查 |
| 修改范围 | `template_config.py``template_repository.py``storage.py` | | 修改范围 | `template_config.py``template_repository.py``storage.py` |
| 验收标准 | 配置错误可返回清晰错误;模板只复制到批次目录;不写原始材料目录 | | 验收标准 | 配置错误可返回清晰错误;模板只复制到批次目录;不写原始材料目录;能审计模板是否包含所需 Tag/占位符 |
| Codex 执行提示 | 请实现配置加载模板复制服务,所有路径必须校验位于批次工作目录内,原始模板目录只读。 | | Codex 执行提示 | 请实现配置加载模板复制和模板字段审计服务,所有路径必须校验位于批次工作目录内,原始模板目录只读。 |
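### RIP-2-004 Template field conversion and audit
| Item | Content |
| --- | --- |
| Goal | Upgrade the sample templates into code-friendly field templates, without hand-editing the format of generated files |
| Scope of changes | Template copies under `docs/0.原始材料/第1章 监管信息`, or `review_agent/regulatory_info_package/templates/field_manifest.yaml` |
| Acceptance criteria | CH1.4 key fields, checkboxes, and the product name/applicant positions in the declaration documents have stable Tags or placeholders; the test fails when the audit finds missing fields |
| Codex execution prompt | Prefer Word content-control Tags. If content-control editing is not yet available, use stable placeholders that do not affect layout, and record each field and its target position in the config. |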
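The placeholder half of the field audit can be a plain text scan. This sketch reuses the `{{key}}` pattern from the docx writer (`PLACEHOLDER_RE`). It assumes the caller has already collected the template text, for example from every paragraph and table cell. Content-control Tags would need a separate check, which is not shown. `audit_placeholders` is a hypothetical name.

```python
import re

# Same pattern as PLACEHOLDER_RE in the regulatory info package docx writer.
PLACEHOLDER_RE = re.compile(r"\{\{([a-zA-Z0-9_]+)\}\}")


def audit_placeholders(template_text: str, required_keys: list[str]) -> list[str]:
    """Return required field keys that have no {{key}} placeholder in the template text."""
    present = set(PLACEHOLDER_RE.findall(template_text))
    return [key for key in required_keys if key not in present]


text = "产品名称:{{product_name}} 申请人:{{applicant_name}}"
missing = audit_placeholders(text, ["product_name", "applicant_name", "package_specification"])
print(missing)  # ['package_specification']
```

A test can then assert that `missing` is empty for each template, which makes the audit fail loudly when a field is absent.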
### RIP-2 stage verification
@@ -380,8 +393,8 @@ pytest tests/test_regulatory_info_package_docx_writer.py tests/test_regulatory_i
| --- | --- |
| Goal | Probe for Word COM, LibreOffice UNO, or an available fallback capability |
| Scope of changes | `services/legacy_doc_document.py` |
| Acceptance criteria | When the current environment has no native capability, return a clear capability result without crashing; tests do not require Word or LibreOffice to be installed locally |
| Codex execution prompt | Implement capability probing and the interface skeleton first. Windows Word COM and LibreOffice UNO count as native capabilities; when neither is available, explicitly switch to the docx fallback. |
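A capability probe can check for the native back ends without importing or launching them. This sketch assumes Word COM is reached through `pywin32` (module `win32com`) and LibreOffice UNO through its `uno` Python bridge. The `uno` bridge is usually importable only from LibreOffice's bundled interpreter. A positive result means the bindings are importable, not that a conversion will succeed. The returned keys are hypothetical.

```python
from __future__ import annotations

import importlib.util
import shutil
import sys


def probe_legacy_doc_capability() -> dict[str, object]:
    """Report which native .doc back ends look available, without importing them."""
    # find_spec returns None (no exception) when a top-level module is absent.
    word_com = sys.platform == "win32" and importlib.util.find_spec("win32com") is not None
    libreoffice_uno = importlib.util.find_spec("uno") is not None
    soffice_path = shutil.which("soffice") or ""  # informational only
    native = word_com or libreoffice_uno
    return {
        "word_com": word_com,
        "libreoffice_uno": libreoffice_uno,
        "soffice_path": soffice_path,
        "native_available": native,
        "mode": "native" if native else "docx_fallback",
    }


print(probe_legacy_doc_capability())
```

Because the probe only inspects the environment, tests can call it anywhere, or patch `importlib.util.find_spec` to simulate each case.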
### RIP-7-002 Implement CH1.9 native writing and the docx fallback
@@ -389,8 +402,8 @@ pytest tests/test_regulatory_info_package_docx_writer.py tests/test_regulatory_i
| --- | --- |
| Goal | CH1.9 prefers `.doc` output; on failure it generates a semantically equivalent `.docx` |
| Scope of changes | `legacy_doc_document.py`, `package_generate.py` |
| Acceptance criteria | When native capability exists and native writing succeeds, the status is success; when there is no native capability, or native writing fails but the fallback succeeds, the status is fallback_success; when both fail, the file is not added to the zip |
| Codex execution prompt | Record capability probing, native failures, and fallback failures in both `adapter_summary` and `risk_notes`; do not convert silently. |
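The status rules above can be expressed as a small pure function, which keeps them easy to unit-test without Word or LibreOffice. This sketch uses the status strings from `constants.py`. `resolve_ch19_status`, `zip_members`, and the wording of the risk notes are hypothetical.

```python
GENERATED_FILE_SUCCESS = "success"
GENERATED_FILE_FALLBACK_SUCCESS = "fallback_success"
GENERATED_FILE_FAILED = "failed"


def resolve_ch19_status(native_available: bool, native_ok: bool, fallback_ok: bool) -> tuple[str, list[str]]:
    """Map native/fallback outcomes to a generated-file status plus risk notes."""
    notes: list[str] = []
    if native_available and native_ok:
        return GENERATED_FILE_SUCCESS, notes
    # Record why the native path was not used instead of converting silently.
    notes.append("native .doc write failed" if native_available else "native .doc capability unavailable")
    if fallback_ok:
        notes.append("generated .docx fallback instead of .doc")
        return GENERATED_FILE_FALLBACK_SUCCESS, notes
    notes.append("docx fallback failed")
    return GENERATED_FILE_FAILED, notes


def zip_members(results: dict[str, str]) -> list[str]:
    """Only success and fallback_success files go into the zip."""
    keep = {GENERATED_FILE_SUCCESS, GENERATED_FILE_FALLBACK_SUCCESS}
    return [name for name, status in results.items() if status in keep]
```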
### RIP-7-003 Add doc adapter tests
@@ -565,9 +578,9 @@ pytest tests/test_regulatory_info_package_models.py tests/test_regulatory_info_p
| User change protection | Never roll back or overwrite the user's existing uncommitted changes |
| Process log | Record key command results and pre-existing failures at every stage |
| Stage verification | Run the matching verification commands after each stage completes |
| Stage commit | After a stage passes verification, generate a commit summary; the user confirms whether to run `git commit` |
| Regression protection | Existing tests for file summary, regulatory review, and auto-fill must not regress |
| doc risk isolation | An unavailable native `.doc` capability or a native processing failure must not block generation of the other 6 docx files |
| External dependency isolation | LLM, notifications, and Word COM must all be mockable; tests must not depend on real external services |
| Download security | Every export download must pass a permission check against the owning user |
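One way to keep notifications mockable is to inject the sender instead of calling the Feishu client directly. This sketch assumes a sender callable that takes a recipient id and a message and returns an external message id; that signature is hypothetical. The result keys mirror `send_status`, `external_message_id`, and `error_message` on `RegulatoryInfoPackageNotificationRecord`.

```python
from typing import Callable
from unittest.mock import Mock

# Hypothetical sender signature: (recipient_id, message) -> external message id.
Sender = Callable[[str, str], str]


def notify_batch_done(batch_no: str, recipient_id: str, sender: Sender) -> dict[str, str]:
    """Send a completion notice through an injected sender so tests can swap in a mock."""
    try:
        message_id = sender(recipient_id, f"Package {batch_no} is ready")
    except Exception as exc:  # a real channel (Feishu CLI/API) may time out or reject the call
        return {"send_status": "failed", "error_message": str(exc)}
    return {"send_status": "success", "external_message_id": message_id}


# Usage in a test: no network access, and both outcomes are easy to exercise.
fake_sender = Mock(return_value="msg-1")
ok = notify_batch_done("RIP-001", "u1", fake_sender)
broken = notify_batch_done("RIP-001", "u1", Mock(side_effect=RuntimeError("timeout")))
```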
@@ -588,7 +601,7 @@ pytest tests/test_regulatory_info_package_models.py tests/test_regulatory_info_p
5. Do not roll back or overwrite the user's existing uncommitted changes.
6. External capabilities such as LLM, notifications, and Word COM must be mockable.
7. After each stage, run that stage's verification command.
8. After verification passes, generate a commit summary and wait for the user to confirm before committing locally.
9. Finally, run the end-to-end acceptance test with docs/0.原始材料/目标产品说明书.docx.
```

docs/README.md

@@ -0,0 +1,34 @@
# Documentation Index
This directory keeps the working documents for DEMO-AGENT V2. The docs are organized by project phase rather than by code module.
## Main Sections
| Directory | Purpose |
| --- | --- |
| `0.原始材料/` | Source materials, templates, sample instructions, regulatory references |
| `1.需求分析/` | Requirement analysis for each workflow |
| `2.功能设计/` | Functional design and user-facing behavior |
| `3.数据库设计/` | Data model and persistence design |
| `4.详细设计/` | Module-level design, services, workflow details |
| `5.开发计划/` | Implementation plans and staged delivery notes |
| `6.待办计划/` | Deferred items |
| `7.汇报材料/` | Presentation and reporting material |
## Workflow Documents
| Workflow | Requirement | Functional Design | Detailed Design | Plan |
| --- | --- | --- | --- | --- |
| Auto summary | `1.需求分析/1.自动汇总.md` | `2.功能设计/1.自动汇总.md` | `4.详细设计/1.自动汇总.md` | `5.开发计划/1.自动汇总.md` |
| NMPA registration dossier regulatory review | `1.需求分析/2.NMPA注册资料法规核查与整改闭环.md` | `2.功能设计/2.NMPA注册资料法规核查与整改闭环.md` | `4.详细设计/2.NMPA注册资料法规核查与整改闭环.md` | `5.开发计划/2.NMPA注册资料法规核查与整改闭环-第一批主链路.md` |
| Application form auto-fill | `1.需求分析/3.产品关键信息提取与申报文件自动填表.md` | `2.功能设计/3.产品关键信息提取与申报文件自动填表.md` | `4.详细设计/3.产品关键信息提取与申报文件自动填表.md` | `5.开发计划/3.产品关键信息提取与申报文件自动填表.md` |
| Feishu notifications and Q&A | `1.需求分析/4.飞书通知与问答接入.md` | `2.功能设计/4.飞书通知与问答接入.md` | `4.详细设计/4.飞书通知与问答接入.md` | `5.开发计划/4.飞书通知与问答接入.md` |
| Chapter 1 regulatory information package | `1.需求分析/5.第1章监管信息材料包生成.md` | `2.功能设计/5.第1章监管信息材料包生成.md` | `4.详细设计/5.第1章监管信息材料包生成.md` | `5.开发计划/5.第1章监管信息材料包生成.md` |
## Maintenance Notes
- Keep README-level docs aligned with current `master`.
- When a workflow changes behavior, update the requirement/design/plan document closest to that behavior.
- Do not paste secrets from `.env` into docs.
- Prefer concrete file paths, command examples, and verification notes over broad prose.

@@ -14,6 +14,7 @@ from review_agent.models import (
    ExportedSummaryFile,
    FileAttachment,
    Message,
    RegulatoryInfoPackageBatch,
    RegulatoryReviewBatch,
)
from review_agent.models import FileSummaryBatch, WorkflowEvent
@@ -304,14 +305,20 @@ def export_download(request, export_id: int):
            extra={"export_id": exported.pk, "storage_path": exported.storage_path},
        )
        return JsonResponse({"error": "文件不存在。"}, status=404)
    suffix = Path(exported.file_name).suffix.lower()
    content_types = {
        ExportedSummaryFile.ExportType.MARKDOWN: "text/markdown; charset=utf-8",
        ExportedSummaryFile.ExportType.EXCEL: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        ExportedSummaryFile.ExportType.JSON: "application/json; charset=utf-8",
        ExportedSummaryFile.ExportType.WORD: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        ExportedSummaryFile.ExportType.PDF: "application/pdf",
        ExportedSummaryFile.ExportType.ZIP: "application/zip",
    }
    content_type = content_types.get(exported.export_type, "application/octet-stream")
    if exported.export_type == ExportedSummaryFile.ExportType.WORD and suffix == ".doc":
        content_type = "application/msword"
    elif exported.export_type == ExportedSummaryFile.ExportType.WORD and suffix == ".docx":
        content_type = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
    logger.info(
        "Export download started",
        extra={
@@ -342,6 +349,17 @@ def _export_for_user(user, export_id: int) -> ExportedSummaryFile | None:
            is_deleted=False,
        ).exists()
        return exported if allowed else None
    if exported.workflow_type == "regulatory_info_package":
        if not exported.workflow_batch_id:
            return None
        allowed = RegulatoryInfoPackageBatch.objects.filter(
            pk=exported.workflow_batch_id,
            conversation__user=user,
            is_deleted=False,
        ).exists()
        return exported if allowed else None
    if exported.batch_id is None:
        return None
    if exported.batch.user_id != user.pk:
        return None
    return exported
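The content-type selection in `export_download` can be checked in isolation. The sketch below restates it as a pure function keyed by the raw `export_type` values. It drops the `.docx` branch because that branch assigns the same value as the `word` default. `resolve_content_type` is a hypothetical name.

```python
from pathlib import Path

WORD_DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
CONTENT_TYPES = {
    "markdown": "text/markdown; charset=utf-8",
    "excel": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "json": "application/json; charset=utf-8",
    "word": WORD_DOCX,
    "pdf": "application/pdf",
    "zip": "application/zip",
}


def resolve_content_type(export_type: str, file_name: str) -> str:
    """Pick the MIME type; legacy .doc files stored as "word" exports get application/msword."""
    content_type = CONTENT_TYPES.get(export_type, "application/octet-stream")
    if export_type == "word" and Path(file_name).suffix.lower() == ".doc":
        content_type = "application/msword"
    return content_type
```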

@@ -0,0 +1,388 @@
# Generated by Django 5.2.14 on 2026-06-10 11:12

import django.db.models.deletion
from django.conf import settings
from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ("review_agent", "0008_knowledgebasedocument"),
        migrations.swappable_dependency(settings.AUTH_USER_MODEL),
    ]

    operations = [
        migrations.CreateModel(
            name="RegulatoryInfoPackageArtifact",
            fields=[
                (
                    "id",
                    models.BigAutoField(
                        auto_created=True,
                        primary_key=True,
                        serialize=False,
                        verbose_name="ID",
                    ),
                ),
                (
                    "artifact_type",
                    models.CharField(
                        choices=[
                            ("template_copy", "模板副本"),
                            ("instruction_extract", "说明书抽取结果"),
                            ("field_extract_result", "字段抽取结果"),
                            ("merged_fields", "合并字段"),
                            ("generated_document", "生成文件"),
                            ("traceability", "追溯清单"),
                            ("zip_package", "ZIP包"),
                            ("notification_record", "通知记录"),
                        ],
                        max_length=60,
                    ),
                ),
                (
                    "file_format",
                    models.CharField(
                        choices=[
                            ("json", "JSON"),
                            ("excel", "Excel"),
                            ("docx", "DOCX"),
                            ("doc", "DOC"),
                            ("zip", "ZIP"),
                            ("markdown", "Markdown"),
                        ],
                        max_length=20,
                    ),
                ),
                ("name", models.CharField(max_length=160)),
                ("file_name", models.CharField(max_length=255)),
                ("storage_path", models.CharField(max_length=500)),
                ("file_size", models.BigIntegerField(default=0)),
                (
                    "content_hash",
                    models.CharField(blank=True, default="", max_length=128),
                ),
                ("metadata", models.JSONField(blank=True, default=dict)),
                (
                    "created_by_node",
                    models.CharField(blank=True, default="", max_length=60),
                ),
                ("created_at", models.DateTimeField(auto_now_add=True)),
                ("is_deleted", models.BooleanField(default=False)),
            ],
            options={
                "db_table": "ra_regulatory_info_package_artifact",
                "ordering": ["-created_at", "-id"],
            },
        ),
        migrations.CreateModel(
            name="RegulatoryInfoPackageBatch",
            fields=[
                (
                    "id",
                    models.BigAutoField(
                        auto_created=True,
                        primary_key=True,
                        serialize=False,
                        verbose_name="ID",
                    ),
                ),
                (
                    "source_summary_item_id",
                    models.PositiveBigIntegerField(blank=True, null=True),
                ),
                ("batch_no", models.CharField(max_length=64, unique=True)),
                (
                    "status",
                    models.CharField(
                        choices=[
                            ("pending", "待执行"),
                            ("running", "执行中"),
                            ("waiting_user", "等待用户"),
                            ("success", "成功"),
                            ("partial_success", "部分成功"),
                            ("failed", "失败"),
                            ("cancelled", "已取消"),
                        ],
                        default="pending",
                        max_length=30,
                    ),
                ),
                (
                    "source_file_name",
                    models.CharField(blank=True, default="", max_length=255),
                ),
                (
                    "source_storage_path",
                    models.CharField(blank=True, default="", max_length=500),
                ),
                (
                    "product_name",
                    models.CharField(blank=True, default="", max_length=200),
                ),
                (
                    "output_zip_name",
                    models.CharField(
                        blank=True,
                        default="第1章 监管信息(预生成版).zip",
                        max_length=255,
                    ),
                ),
                ("generated_files", models.JSONField(blank=True, default=list)),
                ("missing_fields", models.JSONField(blank=True, default=list)),
                ("llm_only_fields", models.JSONField(blank=True, default=list)),
                ("conflict_fields", models.JSONField(blank=True, default=list)),
                ("risk_notes", models.JSONField(blank=True, default=list)),
                (
                    "template_config_version",
                    models.CharField(blank=True, default="", max_length=80),
                ),
                (
                    "template_config_hash",
                    models.CharField(blank=True, default="", max_length=128),
                ),
                ("adapter_summary", models.JSONField(blank=True, default=dict)),
                ("work_dir", models.CharField(blank=True, default="", max_length=500)),
                ("error_message", models.TextField(blank=True, default="")),
                ("created_at", models.DateTimeField(auto_now_add=True)),
                ("started_at", models.DateTimeField(blank=True, null=True)),
                ("finished_at", models.DateTimeField(blank=True, null=True)),
                ("archived_at", models.DateTimeField(blank=True, null=True)),
                ("is_deleted", models.BooleanField(default=False)),
            ],
            options={
                "db_table": "ra_regulatory_info_package_batch",
                "ordering": ["-created_at", "-id"],
            },
        ),
        migrations.CreateModel(
            name="RegulatoryInfoPackageNotificationRecord",
            fields=[
                (
                    "id",
                    models.BigAutoField(
                        auto_created=True,
                        primary_key=True,
                        serialize=False,
                        verbose_name="ID",
                    ),
                ),
                (
                    "channel",
                    models.CharField(
                        choices=[
                            ("feishu_cli", "飞书 CLI"),
                            ("feishu_api", "飞书 API"),
                            ("mock", "模拟"),
                        ],
                        default="mock",
                        max_length=30,
                    ),
                ),
                ("export_ids", models.JSONField(blank=True, default=list)),
                ("message_summary", models.TextField(blank=True, default="")),
                (
                    "send_status",
                    models.CharField(
                        choices=[
                            ("pending", "待发送"),
                            ("success", "成功"),
                            ("failed", "失败"),
                        ],
                        default="pending",
                        max_length=20,
                    ),
                ),
                ("retry_count", models.PositiveIntegerField(default=0)),
                (
                    "external_message_id",
                    models.CharField(blank=True, default="", max_length=120),
                ),
                ("error_message", models.TextField(blank=True, default="")),
                ("sent_at", models.DateTimeField(blank=True, null=True)),
                ("created_at", models.DateTimeField(auto_now_add=True)),
                ("updated_at", models.DateTimeField(auto_now=True)),
                ("is_deleted", models.BooleanField(default=False)),
            ],
            options={
                "db_table": "ra_regulatory_info_package_notification_record",
                "ordering": ["-created_at", "-id"],
            },
        ),
        migrations.AlterField(
            model_name="exportedsummaryfile",
            name="batch",
            field=models.ForeignKey(
                blank=True,
                null=True,
                on_delete=django.db.models.deletion.CASCADE,
                related_name="exports",
                to="review_agent.filesummarybatch",
            ),
        ),
        migrations.AlterField(
            model_name="exportedsummaryfile",
            name="export_type",
            field=models.CharField(
                choices=[
                    ("markdown", "Markdown"),
                    ("excel", "Excel"),
                    ("json", "JSON"),
                    ("word", "Word"),
                    ("pdf", "PDF"),
                    ("zip", "ZIP"),
                ],
                max_length=20,
            ),
        ),
        migrations.AddConstraint(
            model_name="workflownoderun",
            constraint=models.UniqueConstraint(
                fields=("workflow_type", "workflow_batch_id", "node_code"),
                name="uq_ra_node_workflow_batch_code",
            ),
        ),
        migrations.AddField(
            model_name="regulatoryinfopackagebatch",
            name="conversation",
            field=models.ForeignKey(
                on_delete=django.db.models.deletion.CASCADE,
                related_name="regulatory_info_package_batches",
                to="review_agent.conversation",
            ),
        ),
        migrations.AddField(
            model_name="regulatoryinfopackagebatch",
            name="source_attachment",
            field=models.ForeignKey(
                blank=True,
                null=True,
                on_delete=django.db.models.deletion.SET_NULL,
                related_name="regulatory_info_package_batches",
                to="review_agent.fileattachment",
            ),
        ),
        migrations.AddField(
            model_name="regulatoryinfopackagebatch",
            name="source_summary_batch",
            field=models.ForeignKey(
                blank=True,
                null=True,
                on_delete=django.db.models.deletion.SET_NULL,
                related_name="regulatory_info_package_batches",
                to="review_agent.filesummarybatch",
            ),
        ),
        migrations.AddField(
            model_name="regulatoryinfopackagebatch",
            name="trigger_message",
            field=models.ForeignKey(
                blank=True,
                null=True,
                on_delete=django.db.models.deletion.SET_NULL,
                related_name="triggered_regulatory_info_package_batches",
                to="review_agent.message",
            ),
        ),
        migrations.AddField(
            model_name="regulatoryinfopackagebatch",
            name="user",
            field=models.ForeignKey(
                on_delete=django.db.models.deletion.CASCADE,
                related_name="review_regulatory_info_package_batches",
                to=settings.AUTH_USER_MODEL,
            ),
        ),
        migrations.AddField(
            model_name="regulatoryinfopackageartifact",
            name="batch",
            field=models.ForeignKey(
                on_delete=django.db.models.deletion.CASCADE,
                related_name="artifacts",
                to="review_agent.regulatoryinfopackagebatch",
            ),
        ),
        migrations.AddField(
            model_name="regulatoryinfopackagenotificationrecord",
            name="batch",
            field=models.ForeignKey(
                on_delete=django.db.models.deletion.CASCADE,
                related_name="notifications",
                to="review_agent.regulatoryinfopackagebatch",
            ),
        ),
        migrations.AddField(
            model_name="regulatoryinfopackagenotificationrecord",
            name="recipient",
            field=models.ForeignKey(
                on_delete=django.db.models.deletion.CASCADE,
                related_name="regulatory_info_package_notifications",
                to=settings.AUTH_USER_MODEL,
            ),
        ),
        migrations.AddIndex(
            model_name="regulatoryinfopackagebatch",
            index=models.Index(
                fields=["conversation", "status"], name="idx_ra_rip_batch_conv_status"
            ),
        ),
        migrations.AddIndex(
            model_name="regulatoryinfopackagebatch",
            index=models.Index(
                fields=["user", "created_at"], name="idx_ra_rip_batch_user_created"
            ),
        ),
        migrations.AddIndex(
            model_name="regulatoryinfopackagebatch",
            index=models.Index(
                fields=["source_attachment"], name="idx_ra_rip_batch_attachment"
            ),
        ),
        migrations.AddIndex(
            model_name="regulatoryinfopackagebatch",
            index=models.Index(
                fields=["source_summary_batch"], name="idx_ra_rip_batch_summary"
            ),
        ),
        migrations.AddIndex(
            model_name="regulatoryinfopackagebatch",
            index=models.Index(fields=["created_at"], name="idx_ra_rip_batch_created"),
        ),
        migrations.AddIndex(
            model_name="regulatoryinfopackageartifact",
            index=models.Index(
                fields=["batch", "artifact_type"], name="idx_ra_rip_artifact_batch_type"
            ),
        ),
        migrations.AddIndex(
            model_name="regulatoryinfopackageartifact",
            index=models.Index(
                fields=["file_format"], name="idx_ra_rip_artifact_format"
            ),
        ),
        migrations.AddIndex(
            model_name="regulatoryinfopackageartifact",
            index=models.Index(
                fields=["created_at"], name="idx_ra_rip_artifact_created"
            ),
        ),
        migrations.AddIndex(
            model_name="regulatoryinfopackagenotificationrecord",
            index=models.Index(
                fields=["batch", "created_at"], name="idx_ra_rip_notify_batch"
            ),
        ),
        migrations.AddIndex(
            model_name="regulatoryinfopackagenotificationrecord",
            index=models.Index(
                fields=["recipient", "send_status"], name="idx_ra_rip_notify_recipient"
            ),
        ),
        migrations.AddIndex(
            model_name="regulatoryinfopackagenotificationrecord",
            index=models.Index(
                fields=["send_status", "retry_count"], name="idx_ra_rip_notify_status"
            ),
        ),
    ]

@@ -280,7 +280,11 @@ class WorkflowNodeRun(models.Model):
    class Meta:
        db_table = "ra_workflow_node_run"
        constraints = [
            models.UniqueConstraint(fields=["batch", "node_code"], name="uq_ra_node_batch_code"),
            models.UniqueConstraint(
                fields=["workflow_type", "workflow_batch_id", "node_code"],
                name="uq_ra_node_workflow_batch_code",
            ),
        ]
        indexes = [
            models.Index(fields=["batch", "status"], name="idx_ra_node_batch_status"),
@@ -336,6 +340,7 @@ class ExportedSummaryFile(models.Model):
        JSON = "json", "JSON"
        WORD = "word", "Word"
        PDF = "pdf", "PDF"
        ZIP = "zip", "ZIP"

    class Status(models.TextChoices):
        SUCCESS = "success", "成功"
@@ -345,6 +350,8 @@ class ExportedSummaryFile(models.Model):
        FileSummaryBatch,
        on_delete=models.CASCADE,
        related_name="exports",
        null=True,
        blank=True,
    )
    workflow_type = models.CharField(max_length=40, blank=True, default="file_summary")
    workflow_batch_id = models.PositiveBigIntegerField(null=True, blank=True)
@@ -524,6 +531,87 @@ class ApplicationFormFillBatch(models.Model):
        return self.batch_no


class RegulatoryInfoPackageBatch(models.Model):
    """Tracks one Chapter 1 regulatory information package workflow run."""

    class Status(models.TextChoices):
        PENDING = "pending", "待执行"
        RUNNING = "running", "执行中"
        WAITING_USER = "waiting_user", "等待用户"
        SUCCESS = "success", "成功"
        PARTIAL_SUCCESS = "partial_success", "部分成功"
        FAILED = "failed", "失败"
        CANCELLED = "cancelled", "已取消"

    conversation = models.ForeignKey(
        Conversation,
        on_delete=models.CASCADE,
        related_name="regulatory_info_package_batches",
    )
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
        related_name="review_regulatory_info_package_batches",
    )
    trigger_message = models.ForeignKey(
        Message,
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name="triggered_regulatory_info_package_batches",
    )
    source_attachment = models.ForeignKey(
        FileAttachment,
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name="regulatory_info_package_batches",
    )
    source_summary_batch = models.ForeignKey(
        FileSummaryBatch,
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
        related_name="regulatory_info_package_batches",
    )
    source_summary_item_id = models.PositiveBigIntegerField(null=True, blank=True)
    batch_no = models.CharField(max_length=64, unique=True)
    status = models.CharField(max_length=30, choices=Status.choices, default=Status.PENDING)
    source_file_name = models.CharField(max_length=255, blank=True, default="")
    source_storage_path = models.CharField(max_length=500, blank=True, default="")
    product_name = models.CharField(max_length=200, blank=True, default="")
    output_zip_name = models.CharField(max_length=255, blank=True, default="第1章 监管信息(预生成版).zip")
    generated_files = models.JSONField(default=list, blank=True)
    missing_fields = models.JSONField(default=list, blank=True)
    llm_only_fields = models.JSONField(default=list, blank=True)
    conflict_fields = models.JSONField(default=list, blank=True)
    risk_notes = models.JSONField(default=list, blank=True)
    template_config_version = models.CharField(max_length=80, blank=True, default="")
    template_config_hash = models.CharField(max_length=128, blank=True, default="")
    adapter_summary = models.JSONField(default=dict, blank=True)
    work_dir = models.CharField(max_length=500, blank=True, default="")
    error_message = models.TextField(blank=True, default="")
    created_at = models.DateTimeField(auto_now_add=True)
    started_at = models.DateTimeField(null=True, blank=True)
    finished_at = models.DateTimeField(null=True, blank=True)
    archived_at = models.DateTimeField(null=True, blank=True)
    is_deleted = models.BooleanField(default=False)

    class Meta:
        db_table = "ra_regulatory_info_package_batch"
        ordering = ["-created_at", "-id"]
        indexes = [
            models.Index(fields=["conversation", "status"], name="idx_ra_rip_batch_conv_status"),
            models.Index(fields=["user", "created_at"], name="idx_ra_rip_batch_user_created"),
            models.Index(fields=["source_attachment"], name="idx_ra_rip_batch_attachment"),
            models.Index(fields=["source_summary_batch"], name="idx_ra_rip_batch_summary"),
            models.Index(fields=["created_at"], name="idx_ra_rip_batch_created"),
        ]

    def __str__(self) -> str:
        return self.batch_no


class RegulatoryReviewBatch(models.Model):
    """Tracks one NMPA regulatory review workflow run."""
@@ -745,6 +833,54 @@ class ApplicationFormFillArtifact(models.Model):
        ]


class RegulatoryInfoPackageArtifact(models.Model):
    """Stores regulatory information package intermediate and generated files."""

    class ArtifactType(models.TextChoices):
        TEMPLATE_COPY = "template_copy", "模板副本"
        INSTRUCTION_EXTRACT = "instruction_extract", "说明书抽取结果"
        FIELD_EXTRACT_RESULT = "field_extract_result", "字段抽取结果"
        MERGED_FIELDS = "merged_fields", "合并字段"
        GENERATED_DOCUMENT = "generated_document", "生成文件"
        TRACEABILITY = "traceability", "追溯清单"
        ZIP_PACKAGE = "zip_package", "ZIP包"
        NOTIFICATION_RECORD = "notification_record", "通知记录"

    class FileFormat(models.TextChoices):
        JSON = "json", "JSON"
        EXCEL = "excel", "Excel"
        DOCX = "docx", "DOCX"
        DOC = "doc", "DOC"
        ZIP = "zip", "ZIP"
        MARKDOWN = "markdown", "Markdown"

    batch = models.ForeignKey(
        RegulatoryInfoPackageBatch,
        on_delete=models.CASCADE,
        related_name="artifacts",
    )
    artifact_type = models.CharField(max_length=60, choices=ArtifactType.choices)
    file_format = models.CharField(max_length=20, choices=FileFormat.choices)
    name = models.CharField(max_length=160)
    file_name = models.CharField(max_length=255)
    storage_path = models.CharField(max_length=500)
    file_size = models.BigIntegerField(default=0)
    content_hash = models.CharField(max_length=128, blank=True, default="")
    metadata = models.JSONField(default=dict, blank=True)
    created_by_node = models.CharField(max_length=60, blank=True, default="")
    created_at = models.DateTimeField(auto_now_add=True)
    is_deleted = models.BooleanField(default=False)

    class Meta:
        db_table = "ra_regulatory_info_package_artifact"
        ordering = ["-created_at", "-id"]
        indexes = [
            models.Index(fields=["batch", "artifact_type"], name="idx_ra_rip_artifact_batch_type"),
            models.Index(fields=["file_format"], name="idx_ra_rip_artifact_format"),
            models.Index(fields=["created_at"], name="idx_ra_rip_artifact_created"),
        ]


class ApplicationFormFillNotificationRecord(models.Model):
    """Stores mock/Feishu notification records for application-form auto-fill."""
@@ -795,6 +931,55 @@ class ApplicationFormFillNotificationRecord(models.Model):
        ]


class RegulatoryInfoPackageNotificationRecord(models.Model):
    """Stores mock/Feishu notification records for regulatory info packages."""

    class Channel(models.TextChoices):
        FEISHU_CLI = "feishu_cli", "飞书 CLI"
        FEISHU_API = "feishu_api", "飞书 API"
        MOCK = "mock", "模拟"

    class SendStatus(models.TextChoices):
        PENDING = "pending", "待发送"
        SUCCESS = "success", "成功"
        FAILED = "failed", "失败"

    batch = models.ForeignKey(
        RegulatoryInfoPackageBatch,
        on_delete=models.CASCADE,
        related_name="notifications",
    )
    recipient = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
        related_name="regulatory_info_package_notifications",
    )
    channel = models.CharField(max_length=30, choices=Channel.choices, default=Channel.MOCK)
    export_ids = models.JSONField(default=list, blank=True)
    message_summary = models.TextField(blank=True, default="")
    send_status = models.CharField(
        max_length=20,
        choices=SendStatus.choices,
        default=SendStatus.PENDING,
    )
    retry_count = models.PositiveIntegerField(default=0)
    external_message_id = models.CharField(max_length=120, blank=True, default="")
    error_message = models.TextField(blank=True, default="")
    sent_at = models.DateTimeField(null=True, blank=True)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
    is_deleted = models.BooleanField(default=False)

    class Meta:
        db_table = "ra_regulatory_info_package_notification_record"
        ordering = ["-created_at", "-id"]
        indexes = [
            models.Index(fields=["batch", "created_at"], name="idx_ra_rip_notify_batch"),
            models.Index(fields=["recipient", "send_status"], name="idx_ra_rip_notify_recipient"),
            models.Index(fields=["send_status", "retry_count"], name="idx_ra_rip_notify_status"),
        ]


class FeishuUserMapping(models.Model):
    """Maps a system user to Feishu identifiers maintained by Admin."""
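`RegulatoryInfoPackageArtifact` stores `file_size` and a `content_hash` of up to 128 characters, but the diff does not show how the hash is computed. This sketch assumes SHA-256, whose 64-character hex digest fits the column. It reads the file in chunks so large ZIP packages are not loaded into memory at once. `artifact_fingerprint` is a hypothetical helper.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def artifact_fingerprint(path: str | Path, chunk_size: int = 1 << 20) -> tuple[int, str]:
    """Return (file_size, sha256 hex digest) for an artifact file, reading it in chunks."""
    digest = hashlib.sha256()
    file_path = Path(path)
    with file_path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return file_path.stat().st_size, digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    size, content_hash = artifact_fingerprint(sample)
print(size, content_hash)
```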

@@ -0,0 +1,2 @@
"""Chapter 1 regulatory information package workflow."""

@@ -0,0 +1,30 @@
WORKFLOW_TYPE = "regulatory_info_package"
DEFAULT_ZIP_NAME = "第1章 监管信息(预生成版).zip"

REGULATORY_INFO_PACKAGE_TRIGGER_KEYWORDS = [
    "根据说明书生成第1章监管信息",
    "生成监管信息材料包",
    "从说明书生成第1章材料",
    "第1章监管信息",
    "监管信息材料包",
]

REGULATORY_INFO_PACKAGE_NODE_DEFINITIONS = [
    ("prepare", "准备资料", "regulatory_info_package"),
    ("template_copy", "复制模板", "regulatory_info_package"),
    ("text_extract", "抽取说明书", "regulatory_info_package"),
    ("field_extract", "抽取字段", "regulatory_info_package"),
    ("field_merge", "合并字段", "regulatory_info_package"),
    ("generate_docs", "生成材料", "regulatory_info_package"),
    ("highlight_review_items", "标记待确认", "regulatory_info_package"),
    ("trace_export", "追溯清单", "regulatory_info_package"),
    ("zip_export", "打包下载", "regulatory_info_package"),
    ("notify", "通知", "regulatory_info_package"),
    ("completed", "完成", "completed"),
]

GENERATED_FILE_SUCCESS = "success"
GENERATED_FILE_FALLBACK_SUCCESS = "fallback_success"
GENERATED_FILE_FAILED = "failed"
GENERATED_FILE_SKIPPED = "skipped"
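The trigger keywords suggest substring matching on the user's message. Here is a sketch of one such matcher. It assumes whitespace is ignored, so `第 1 章 监管信息` still matches, and it tries longer keywords first so the most specific one is reported. `matches_trigger` is hypothetical and not the project's actual routing code.

```python
from __future__ import annotations

REGULATORY_INFO_PACKAGE_TRIGGER_KEYWORDS = [
    "根据说明书生成第1章监管信息",
    "生成监管信息材料包",
    "从说明书生成第1章材料",
    "第1章监管信息",
    "监管信息材料包",
]


def matches_trigger(message: str) -> str | None:
    """Return the most specific trigger keyword contained in the message, or None."""
    normalized = "".join(message.split())  # drop all whitespace
    # Longest keywords first; sorted() is stable, so ties keep their list order.
    for keyword in sorted(REGULATORY_INFO_PACKAGE_TRIGGER_KEYWORDS, key=len, reverse=True):
        if "".join(keyword.split()) in normalized:
            return keyword
    return None
```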

@@ -0,0 +1,15 @@
from __future__ import annotations

from review_agent.regulatory_info_package.constants import WORKFLOW_TYPE
from review_agent.models import RegulatoryInfoPackageBatch, WorkflowEvent


def record_event(batch: RegulatoryInfoPackageBatch, event_type: str, payload: dict | None = None) -> WorkflowEvent:
    return WorkflowEvent.objects.create(
        workflow_type=WORKFLOW_TYPE,
        workflow_batch_id=batch.pk,
        conversation=batch.conversation,
        event_type=event_type,
        payload=payload or {},
    )

@@ -0,0 +1,58 @@
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class TemplateSpec:
    code: str
    output_name: str
    source_file: str
    file_format: str
    strategy: str
    include_in_zip: bool
    prefer_legacy_doc_native: bool = False
    allow_docx_fallback: bool = True
    fields: list[dict[str, Any]] = field(default_factory=list)


@dataclass
class InstructionExtractResult:
    source_file_name: str
    paragraphs: list[str]
    sections: dict[str, str]
    tables: list[list[list[str]]]
    component_tables: list[dict[str, Any]]
    front_text: str


@dataclass
class MergedField:
    key: str
    label: str
    value: str
    source: str
    evidence: str
    confidence: float
    highlight_reason: str = "none"
    needs_review: bool = False
    rule_value: str = ""
    llm_value: str = ""


@dataclass
class GeneratedFileResult:
    template_code: str
    file_name: str
    requested_format: str
    actual_format: str
    status: str
    path: str = ""
    artifact_id: int | None = None
    export_id: int | None = None
    highlight_count: int = 0
    missing_count: int = 0
    llm_only_count: int = 0
    error_message: str = ""
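`MergedField` carries both `rule_value` and `llm_value` plus a `highlight_reason`, and the docx writer treats `missing`, `llm_only`, and `conflict` specially. The merge policy itself is not in this diff, so the sketch below uses an assumed one. The rule-extracted value wins. LLM-only values and disagreements are flagged for review. Empty fields become `/`, as in `_default_placeholder_field`. The `llm` source label and the confidence numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class MergedField:
    # Same fields as MergedField in regulatory_info_package/schemas.py.
    key: str
    label: str
    value: str
    source: str
    evidence: str
    confidence: float
    highlight_reason: str = "none"
    needs_review: bool = False
    rule_value: str = ""
    llm_value: str = ""


def merge_field(key: str, label: str, rule_value: str, llm_value: str, evidence: str = "") -> MergedField:
    """Hypothetical policy: rule wins; LLM-only and conflicting values are flagged; gaps become "/"."""
    rule_value, llm_value = rule_value.strip(), llm_value.strip()
    if rule_value and (not llm_value or llm_value == rule_value):
        return MergedField(key, label, rule_value, "rule", evidence, 1.0, rule_value=rule_value, llm_value=llm_value)
    if rule_value and llm_value:
        # Both sources produced different values: keep the rule value but mark a conflict.
        return MergedField(key, label, rule_value, "rule", evidence, 0.5, "conflict", True, rule_value, llm_value)
    if llm_value:
        return MergedField(key, label, llm_value, "llm", evidence, 0.6, "llm_only", True, llm_value=llm_value)
    return MergedField(key, label, "/", "missing", evidence, 0.0, "missing", True)
```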

@@ -0,0 +1,2 @@
"""Services for the regulatory information package workflow."""

@@ -0,0 +1,322 @@
from __future__ import annotations
import json
import re
from pathlib import Path
from docx import Document
from docx.enum.text import WD_COLOR_INDEX
from docx.shared import RGBColor
from django.utils import timezone
from review_agent.regulatory_info_package.schemas import MergedField
PLACEHOLDER_RE = re.compile(r"\{\{([a-zA-Z0-9_]+)\}\}")
def write_docx_from_template(
source_path: str | Path,
output_path: str | Path,
merged_fields: dict[str, MergedField],
*,
template_code: str = "",
directory_page_numbers: dict[str, str] | None = None,
) -> tuple[int, int, int]:
source = Path(source_path)
output = Path(output_path)
output.parent.mkdir(parents=True, exist_ok=True)
if source.exists():
document = Document(source)
else:
document = Document()
replacements = {f"{{{{{key}}}}}": field for key, field in merged_fields.items()}
highlight_count = 0
missing_count = 0
llm_only_count = 0
highlight_count += _apply_known_template_replacements(document, merged_fields, template_code=template_code)
if template_code == "ch1_5_product_list":
_rebuild_product_list_table(document, merged_fields)
if template_code == "ch1_2_directory":
_apply_directory_page_numbers(document, directory_page_numbers or {})
paragraph_counts = _replace_placeholders(document, replacements, merged_fields)
highlight_count += paragraph_counts[0]
missing_count += paragraph_counts[1]
llm_only_count += paragraph_counts[2]
document.save(output)
return highlight_count, missing_count, llm_only_count
def _replace_paragraph_text(paragraph, text: str, field: MergedField) -> None:
for run in paragraph.runs:
run.text = ""
run = paragraph.add_run(text)
if field.highlight_reason != "none":
run.font.highlight_color = WD_COLOR_INDEX.YELLOW
if field.highlight_reason == "conflict":
run.font.color.rgb = RGBColor(255, 0, 0)
def _apply_directory_page_numbers(document, page_numbers: dict[str, str]) -> None:
for table in document.tables:
if not table.rows:
continue
header = [cell.text.strip() for cell in table.rows[0].cells]
if len(header) < 5 or header[0] != "RPS目录" or header[4] != "页码":
continue
for row in table.rows[1:]:
code = row.cells[0].text.strip()
if code in page_numbers:
row.cells[4].text = page_numbers[code]
return
def _replace_placeholders(
document,
replacements: dict[str, MergedField],
merged_fields: dict[str, MergedField],
) -> tuple[int, int, int]:
highlight_count = 0
missing_count = 0
llm_only_count = 0
for paragraph in _iter_paragraphs(document):
text = paragraph.text
if "{{" not in text or "}}" not in text:
continue
used_fields: list[MergedField] = []
def replace(match: re.Match[str]) -> str:
key = match.group(1)
placeholder = match.group(0)
field = replacements.get(placeholder) or _default_placeholder_field(key, merged_fields)
used_fields.append(field)
return field.value
new_text = PLACEHOLDER_RE.sub(replace, text)
if new_text == text:
continue
field_for_style = next((field for field in used_fields if field.highlight_reason != "none"), None) or used_fields[0]
_replace_paragraph_text(paragraph, new_text, field_for_style)
for field in used_fields:
if field.highlight_reason != "none":
highlight_count += 1
if field.highlight_reason == "missing":
missing_count += 1
if field.highlight_reason == "llm_only":
llm_only_count += 1
return highlight_count, missing_count, llm_only_count
def _iter_paragraphs(document):
yield from document.paragraphs
for table in document.tables:
for row in table.rows:
for cell in row.cells:
yield from cell.paragraphs


def _apply_known_template_replacements(document, merged_fields: dict[str, MergedField], *, template_code: str = "") -> int:
    product = _field_value(merged_fields, "product_name")
    applicant = _field_value(merged_fields, "applicant_name")
    today = timezone.localdate().strftime("%Y年%m月%d")
    replacements = {
        "xxxx年xx月xx日": today,
        "XXXX年XX月XX日": today,
        "xxxx 年 xx 月 xx 日": today,
        "XXXX 年 XX 月 XX 日": today,
        "2023年09月20日": today,
        "2023 年 10 月": today[:8],
    }
    if not template_code.startswith("ch1_11"):
        replacements.update({
            "呼吸道合胞病毒、肺炎支原体核酸检测试剂盒荧光PCR法": product,
            "呼吸道合胞病毒、肺炎支原体核酸检测试剂盒": product,
            "呼吸道合胞病毒 、肺炎支产品名称: 原体核酸检测试剂盒(荧": f"产品名称:{product}",
            "光PCR法": "",
            "卡尤迪生物科技宜兴有限公司": applicant,
        })
    changed = 0
    for paragraph in document.paragraphs:
        changed += _replace_text_in_paragraph(paragraph, replacements, merged_fields)
    for table in document.tables:
        for row in table.rows:
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    changed += _replace_text_in_paragraph(paragraph, replacements, merged_fields)
    return changed


def _default_placeholder_field(key: str, merged_fields: dict[str, MergedField]) -> MergedField:
    if key == "declaration_date":
        return _plain_field(key, "日期", timezone.localdate().strftime("%Y年%m月%d"))
    label = key
    for field in merged_fields.values():
        if field.key == key:
            label = field.label
            break
    return MergedField(
        key=key,
        label=label,
        value="/",
        source="missing",
        evidence="模板字段未从说明书中抽取到",
        confidence=0.0,
        highlight_reason="missing",
        needs_review=True,
    )


def _replace_text_in_paragraph(paragraph, replacements: dict[str, str], merged_fields: dict[str, MergedField]) -> int:
    text = paragraph.text
    new_text = text
    for old, new in replacements.items():
        if old in new_text:
            new_text = new_text.replace(old, new)
    if new_text == text:
        return 0
    field = merged_fields.get("product_name") or MergedField(
        key="product_name",
        label="产品名称",
        value=new_text,
        source="rule",
        evidence="",
        confidence=0.0,
    )
    _replace_paragraph_text(paragraph, new_text, field)
    return 1


def _rebuild_product_list_table(document, merged_fields: dict[str, MergedField]) -> None:
    product = _field_value(merged_fields, "product_name")
    package_specification = _field_value(merged_fields, "package_specification")
    component_table = _component_table_payload(merged_fields)
    component_notes = _field_value(merged_fields, "component_notes")
    for paragraph in document.paragraphs:
        if "的包装规格、货号、组分及主要组成成分见下表" in paragraph.text:
            _replace_paragraph_text(
                paragraph,
                f"{product}的包装规格、货号、组分及主要组成成分见下表:",
                merged_fields.get("product_name") or _plain_field("product_name", "产品名称", product),
            )
        if "规格A和规格B的区别" in paragraph.text and component_notes != "/":
            _replace_paragraph_text(
                paragraph,
                component_notes,
                merged_fields.get("component_notes") or _plain_field("component_notes", "主要组成成分备注", component_notes),
            )
    target = None
    for table in document.tables:
        header = [cell.text.strip() for cell in table.rows[0].cells] if table.rows else []
        if header[:6] == ["包装规格", "货号", "组成", "组分", "主要组成成分", "规格/数量"]:
            target = table
            break
    # Without a parsed component table, fall back to splitting the package specification
    # on full-width or ASCII semicolons.
    specs = _component_specs(component_table) or [
        (spec, None) for spec in [item.strip() for item in package_specification.replace("；", ";").split(";") if item.strip()]
    ]
    if target is not None:
        _clear_table_body(target)
        if component_table:
            _fill_product_component_table(target, component_table, specs)
        else:
            if not specs:
                specs = [("/", None)]
            for spec, _index in specs[:8]:
                cells = target.add_row().cells
                cells[0].text = spec
                cells[1].text = "/"
                cells[2].text = _field_value(merged_fields, "composition")
                cells[3].text = _field_value(merged_fields, "component_name")
                cells[4].text = _field_value(merged_fields, "main_component")
                cells[5].text = _field_value(merged_fields, "quantity")
    if component_table:
        _rebuild_component_comparison_table(document, component_table, specs)


def _field_value(merged_fields: dict[str, MergedField], key: str) -> str:
    field = merged_fields.get(key)
    if not field or not field.value:
        return "/"
    return field.value


def _plain_field(key: str, label: str, value: str) -> MergedField:
    return MergedField(key=key, label=label, value=value, source="rule", evidence="", confidence=0.0)


def _component_table_payload(merged_fields: dict[str, MergedField]) -> dict:
    field = merged_fields.get("component_table")
    if not field or not field.value or field.value == "/":
        return {}
    try:
        payload = json.loads(field.value)
    except json.JSONDecodeError:
        return {}
    if not isinstance(payload, dict):
        return {}
    rows = payload.get("rows") or []
    header = payload.get("header") or []
    if not isinstance(header, list) or not isinstance(rows, list):
        return {}
    return {"header": header, "rows": rows}


def _component_specs(component_table: dict) -> list[tuple[str, int]]:
    # Header columns from index 2 onward hold one quantity column per package specification;
    # strip a leading "规格(" or "规格（" and any trailing ASCII or full-width parenthesis.
    header = component_table.get("header") or []
    specs: list[tuple[str, int]] = []
    for index, value in enumerate(header[2:], start=2):
        label = str(value or "").strip()
        if not label:
            continue
        label = label.replace("规格（", "").replace("规格(", "").rstrip(")）")
        specs.append((label, index))
    return specs


def _clear_table_body(table) -> None:
    while len(table.rows) > 1:
        table._tbl.remove(table.rows[-1]._tr)


def _fill_product_component_table(table, component_table: dict, specs: list[tuple[str, int | None]]) -> None:
    rows = component_table.get("rows") or []
    for spec_label, spec_index in specs:
        for row in rows:
            cells = table.add_row().cells
            cells[0].text = spec_label
            cells[1].text = "/"
            cells[2].text = "/"
            cells[3].text = _row_value(row, 0)
            cells[4].text = _row_value(row, 1)
            cells[5].text = _row_value(row, spec_index or 0)


def _rebuild_component_comparison_table(document, component_table: dict, specs: list[tuple[str, int | None]]) -> None:
    target = None
    for table in document.tables:
        header = [cell.text.strip() for cell in table.rows[0].cells] if table.rows else []
        if header and header[0] == "组分名称":
            target = table
            break
    if target is None:
        return
    _clear_table_body(target)
    header_cells = target.rows[0].cells
    labels = ["组分名称", *[spec for spec, _index in specs[: len(header_cells) - 1]]]
    while len(labels) < len(header_cells):
        labels.append("备注")
    for index, label in enumerate(labels[: len(header_cells)]):
        header_cells[index].text = label
    for row in component_table.get("rows") or []:
        cells = target.add_row().cells
        cells[0].text = _row_value(row, 0)
        for cell_index, (_spec_label, spec_index) in enumerate(specs[: len(cells) - 1], start=1):
            cells[cell_index].text = _row_value(row, spec_index)
        for cell_index in range(len(specs[: len(cells) - 1]) + 1, len(cells)):
            cells[cell_index].text = "/"
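

def _row_value(row, index: int | None) -> str:
    # Specs built from package_specification carry no column index (None); render those as "/".
    if index is None or not isinstance(row, list) or index >= len(row):
        return "/"
    value = str(row[index] or "").strip()
    return value or "/"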
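
`_replace_placeholders` fills each `{{key}}` through an `re.sub` callback that also records which field supplied the value, so the paragraph can be styled once and the highlight counters updated afterwards. The sketch below reproduces that pattern in isolation, under stated assumptions: `PLACEHOLDER_RE` here is a hypothetical pattern (the module defines its own outside this excerpt), `Field` is a trimmed stand-in for `MergedField`, and lookups go straight by key instead of first consulting the per-template `replacements` map.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical pattern: the module's real PLACEHOLDER_RE is defined outside this excerpt.
PLACEHOLDER_RE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


@dataclass
class Field:
    """Trimmed stand-in for MergedField: only the attributes this sketch reads."""

    value: str
    highlight_reason: str = "none"


def fill_placeholders(text: str, fields: dict[str, Field]) -> tuple[str, dict[str, int]]:
    """Substitute every {{key}} and count highlight reasons the way _replace_placeholders does."""
    used: list[Field] = []

    def replace(match: re.Match[str]) -> str:
        # Unknown keys become "/" flagged "missing", mirroring _default_placeholder_field.
        field = fields.get(match.group(1)) or Field("/", "missing")
        used.append(field)
        return field.value

    new_text = PLACEHOLDER_RE.sub(replace, text)
    counts = {"highlight": 0, "missing": 0, "llm_only": 0}
    for field in used:
        if field.highlight_reason != "none":
            counts["highlight"] += 1
        if field.highlight_reason in ("missing", "llm_only"):
            counts[field.highlight_reason] += 1
    return new_text, counts


fields = {"product_name": Field("Kit A"), "storage_condition": Field("2-8 C", "llm_only")}
print(fill_placeholders("{{product_name}}: {{storage_condition}}, {{batch_no}}", fields))
# -> ('Kit A: 2-8 C, /', {'highlight': 2, 'missing': 1, 'llm_only': 1})
```

Collecting the used fields inside the callback is what lets one paragraph carry several placeholders while still being styled by the first field that needs review.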

@@ -0,0 +1,171 @@
from __future__ import annotations

import json
import re
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

from review_agent.llm import generate_completion
from review_agent.regulatory_info_package.schemas import InstructionExtractResult

# field key -> (label, regex). A label may be followed by an ASCII or a full-width colon.
FIELD_PATTERNS = {
    "product_name": ("产品名称", r"产品名称[:：\s]*([^\n\r]+)"),
    "applicant_name": ("申请人名称", r"(?:申请人名称|注册人/售后服务单位名称|注册人名称|售后服务单位名称|生产企业名称)[:：\s]*([^\n\r]+)"),
    "manufacturer_name": ("生产企业名称", r"生产企业名称[:：\s]*([^\n\r]+)"),
    "applicant_address": ("申请人住所", r"(?:申请人住所|注册人住所|生产企业住所)[:：\s]*([^\n\r]+)"),
    "applicant_contact": ("申请人联系方式", r"(?:联系方式|联系电话|电话)[:：\s]*([^\n\r]+)"),
    "production_address": ("生产地址", r"生产地址[:：\s]*([^\n\r]+)"),
    "storage_condition": ("储存条件", r"(?:储存条件|贮存条件|保存条件)[:：\s]*([^\n\r]+)"),
    "intended_use": ("预期用途", r"预期用途[:：\s]*([^\n\r]+)"),
    "package_specification": ("包装规格", r"(?:包装规格|规格)[:：\s]*([^\n\r]+)"),
    "sample_type": ("样本类型", r"样本类型[:：\s]*([^\n\r]+)"),
    "applicable_instrument": ("适用仪器", r"适用仪器[:：\s]*([^\n\r]+)"),
    "standard_no": ("标准号", r"((?:GB|YY|WS|T/C[A-Z0-9]*)[ /T0-9.\-—]+)"),
}


def extract_fields_by_rules(instruction: InstructionExtractResult) -> dict[str, dict]:
    text = "\n".join([instruction.front_text, *instruction.paragraphs, *instruction.sections.values()])
    results: dict[str, dict] = {}
    for key, (label, pattern) in FIELD_PATTERNS.items():
        section_value = _value_after_label_paragraph(instruction.paragraphs, label)
        if section_value:
            results[key] = {
                "label": label,
                "value": section_value,
                "evidence": f"{label}\n{section_value}",
                "confidence": 0.82,
                "source": "rule",
            }
            continue
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            value = _clean_value(match.group(1))
            if value:
                results[key] = {
                    "label": label,
                    "value": value,
                    "evidence": match.group(0)[:240],
                    "confidence": 0.75,
                    "source": "rule",
                }
    component_table = _best_component_table(instruction.component_tables)
    if component_table:
        results["component_table"] = {
            "label": "主要组成成分",
            "value": json.dumps(component_table, ensure_ascii=False),
            "evidence": "说明书【主要组成成分】表格",
            "confidence": 0.86,
            "source": "rule",
        }
    component_notes = _component_notes(instruction.sections)
    if component_notes:
        results["component_notes"] = {
            "label": "主要组成成分备注",
            "value": component_notes,
            "evidence": "说明书【主要组成成分】段落",
            "confidence": 0.8,
            "source": "rule",
        }
    return results


def extract_fields_with_llm(instruction: InstructionExtractResult) -> dict[str, dict]:
    prompt = (
        "请从体外诊断试剂产品说明书中抽取字段,输出 JSON 对象,字段包括 "
        "product_name、storage_condition、intended_use、package_specification、sample_type、applicable_instrument、standard_no。"
        "每个字段值为 {label,value,evidence,confidence}。\n\n"
        + instruction.front_text[:6000]
    )
    raw = generate_completion([{"role": "user", "content": prompt}], temperature=0.0)
    payload = _parse_json_object(raw)
    return {key: value for key, value in payload.items() if isinstance(value, dict)}


def run_llm_extract_with_retry(
    instruction: InstructionExtractResult,
    *,
    llm_extract_func: Callable[[InstructionExtractResult], dict[str, dict]] | None = None,
    sleep_func: Callable[[float], None] = time.sleep,
) -> dict[str, dict]:
    func = llm_extract_func or extract_fields_with_llm
    last_exc: Exception | None = None
    for delay in [0, 1, 2]:
        if delay:
            sleep_func(delay)
        try:
            return func(instruction)
        except Exception as exc:
            last_exc = exc
    if last_exc:
        raise last_exc
    return {}


def run_parallel_extract(
    instruction: InstructionExtractResult,
    *,
    llm_extract_func: Callable[[InstructionExtractResult], dict[str, dict]] | None = None,
) -> dict:
    payload = {"regex_results": {}, "llm_results": {}, "llm_error": ""}
    with ThreadPoolExecutor(max_workers=2) as executor:
        rule_future = executor.submit(extract_fields_by_rules, instruction)
        llm_future = executor.submit(run_llm_extract_with_retry, instruction, llm_extract_func=llm_extract_func)
        payload["regex_results"] = rule_future.result()
        try:
            payload["llm_results"] = llm_future.result()
        except Exception as exc:
            payload["llm_error"] = str(exc)
    return payload


def save_field_extract_result(path: str | Path, payload: dict) -> Path:
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return target


def _clean_value(value: str) -> str:
    cleaned = value.strip()
    if cleaned in {"", "】】", "】:"}:
        return ""
    # Keep only the first clause: cut at a Chinese full stop or a full-width/ASCII semicolon.
    return re.split(r"[。；;]", cleaned)[0].strip()


def _value_after_label_paragraph(paragraphs: list[str], label: str) -> str:
    # A heading paragraph such as "【产品名称】" is followed by its value in the next paragraph.
    bracketed = {f"【{label}】", f"[{label}]", label}
    for index, text in enumerate(paragraphs):
        stripped = text.strip()
        if stripped in bracketed and index + 1 < len(paragraphs):
            return _clean_value(paragraphs[index + 1])
    return ""


def _parse_json_object(raw: str) -> dict:
    text = (raw or "").strip()
    if text.startswith("```"):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:].strip()
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1:
        return {}
    return json.loads(text[start : end + 1])


def _best_component_table(component_tables: list[dict]) -> dict:
    if not component_tables:
        return {}
    return max(component_tables, key=lambda table: len(table.get("rows") or []))


def _component_notes(sections: dict[str, str]) -> str:
    for key, value in sections.items():
        if "主要组成" in key:
            return value.strip()
    return ""
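
`run_llm_extract_with_retry` makes up to three attempts, waiting 0 s, 1 s and 2 s before them, and accepts `sleep_func` so tests can skip the real waits; `run_parallel_extract` then records a final failure in `llm_error` instead of failing the whole extraction. A generic sketch of the retry part, where `call_with_retry` and `flaky` are illustrative names rather than module functions:

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    *,
    delays: tuple[float, ...] = (0, 1, 2),
    sleep_func: Callable[[float], None] = time.sleep,
) -> T:
    """Try func once per entry in delays, sleeping first when the delay is non-zero.

    Any exception triggers the next attempt; after the last attempt the most recent
    exception is re-raised, as run_llm_extract_with_retry does.
    """
    last_exc: Exception | None = None
    for delay in delays:
        if delay:
            sleep_func(delay)
        try:
            return func()
        except Exception as exc:
            last_exc = exc
    if last_exc is not None:
        raise last_exc
    raise ValueError("delays must contain at least one entry")


# Usage with a recording fake sleep, so the example runs instantly.
attempts = []


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("LLM endpoint timed out")
    return "ok"


slept: list[float] = []
print(call_with_retry(flaky, sleep_func=slept.append), slept)  # -> ok [1, 2]
```

Injecting the sleep function rather than patching `time.sleep` keeps tests deterministic without monkeypatching global state.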

@@ -0,0 +1,115 @@
from __future__ import annotations

import json
from pathlib import Path

from review_agent.regulatory_info_package.schemas import MergedField

REQUIRED_FIELDS = {
    "product_name": "产品名称",
    "applicant_name": "申请人名称",
    "package_specification": "包装规格",
    "intended_use": "预期用途",
    "storage_condition": "储存条件",
}


def merge_fields(rule_results: dict[str, dict], llm_results: dict[str, dict]) -> tuple[dict[str, MergedField], dict[str, list[dict]]]:
    merged: dict[str, MergedField] = {}
    missing_fields: list[dict] = []
    llm_only_fields: list[dict] = []
    conflict_fields: list[dict] = []
    keys = set(REQUIRED_FIELDS) | set(rule_results) | set(llm_results)
    for key in sorted(keys):
        rule = rule_results.get(key) or {}
        llm = llm_results.get(key) or {}
        rule_value = str(rule.get("value") or "").strip()
        llm_value = str(llm.get("value") or "").strip()
        label = str(rule.get("label") or llm.get("label") or REQUIRED_FIELDS.get(key) or key)
        if rule_value and llm_value and rule_value != llm_value:
            field = MergedField(
                key=key,
                label=label,
                value=rule_value,
                source="rule_conflict",
                evidence=str(rule.get("evidence") or ""),
                confidence=float(rule.get("confidence") or 0.0),
                highlight_reason="conflict",
                needs_review=True,
                rule_value=rule_value,
                llm_value=llm_value,
            )
            conflict_fields.append(
                {
                    "field_key": key,
                    "field_label": label,
                    "rule_value": rule_value,
                    "llm_value": llm_value,
                    "selected_value": rule_value,
                    "handling": "规则优先,写入值高亮并进入追溯清单",
                }
            )
        elif rule_value:
            field = MergedField(
                key=key,
                label=label,
                value=rule_value,
                source="rule",
                evidence=str(rule.get("evidence") or ""),
                confidence=float(rule.get("confidence") or 0.0),
            )
        elif llm_value:
            field = MergedField(
                key=key,
                label=label,
                value=llm_value,
                source="llm",
                evidence=str(llm.get("evidence") or ""),
                confidence=float(llm.get("confidence") or 0.0),
                highlight_reason="llm_only",
                needs_review=True,
                llm_value=llm_value,
            )
            llm_only_fields.append(_review_dict(field))
        else:
            field = MergedField(
                key=key,
                label=label,
                value="/",
                source="missing",
                evidence="",
                confidence=0.0,
                highlight_reason="missing",
                needs_review=True,
            )
            missing_fields.append(_review_dict(field))
        merged[key] = field
    return merged, {
        "missing_fields": missing_fields,
        "llm_only_fields": llm_only_fields,
        "conflict_fields": conflict_fields,
    }


def save_merged_fields(path: str | Path, merged: dict[str, MergedField], summary: dict[str, list[dict]]) -> Path:
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "fields": {key: field.__dict__ for key, field in merged.items()},
        **summary,
    }
    target.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return target


def _review_dict(field: MergedField) -> dict:
    return {
        "target_file": "",
        "field_key": field.key,
        "field_label": field.label,
        "final_value": field.value,
        "highlight_reason": field.highlight_reason,
        "needs_review": field.needs_review,
    }
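
`merge_fields` applies a fixed precedence per field: when both extractors return different non-empty values the rule value wins and the field is flagged `conflict`; a rule value (alone or in agreement with the LLM) is accepted without highlighting; an LLM-only value is kept but flagged `llm_only`; and a field neither side found becomes `/` flagged `missing`. A compact sketch of just that decision, using plain strings instead of `MergedField` (the `classify` name and the `"none"` marker for "no highlight" are choices made for this example):

```python
from __future__ import annotations


def classify(rule_value: str, llm_value: str) -> tuple[str, str]:
    """Return (final_value, highlight_reason) under the rule-first policy of merge_fields."""
    rule_value, llm_value = rule_value.strip(), llm_value.strip()
    if rule_value and llm_value and rule_value != llm_value:
        return rule_value, "conflict"  # rule wins; the LLM value only goes to the conflict list
    if rule_value:
        return rule_value, "none"  # agreement, or the LLM found nothing
    if llm_value:
        return llm_value, "llm_only"  # accepted but highlighted for review
    return "/", "missing"  # placeholder written into the document


for pair in [("A", "B"), ("A", "A"), ("", "B"), ("", "")]:
    print(pair, classify(*pair))
```

Preferring the regex result on disagreement trades recall for traceability: every written value can be tied to a literal span of the instructions, while the LLM value is still surfaced for the reviewer.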

@@ -0,0 +1,105 @@
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path

from review_agent.models import Conversation, FileAttachment, FileSummaryBatch, FileSummaryItem


@dataclass
class InstructionInputSelection:
    status: str
    file_name: str = ""
    storage_path: str = ""
    attachment: FileAttachment | None = None
    source_summary_batch: FileSummaryBatch | None = None
    source_summary_item_id: int | None = None
    candidates: list[str] = field(default_factory=list)
    message: str = ""


def select_instruction_input(conversation: Conversation, message: str) -> InstructionInputSelection:
    candidates = _active_docx_attachments(conversation)
    named = _match_by_message(candidates, message)
    if len(named) == 1:
        return _selection_from_attachment(named[0])
    instruction_candidates = [item for item in candidates if "说明书" in item.original_name]
    if len(instruction_candidates) == 1:
        return _selection_from_attachment(instruction_candidates[0])
    if len(candidates) == 1:
        return _selection_from_attachment(candidates[0])
    if len(instruction_candidates) > 1 or len(candidates) > 1:
        names = [item.original_name for item in (instruction_candidates or candidates)]
        return InstructionInputSelection(
            status="waiting_user",
            candidates=names,
            message="请确认用于生成第1章监管信息的说明书文件名:" + "、".join(names),
        )
    summary_selection = _select_from_latest_summary(conversation, message)
    if summary_selection:
        return summary_selection
    return InstructionInputSelection(status="missing", message="请先上传产品说明书 docx 文件。")


def _active_docx_attachments(conversation: Conversation) -> list[FileAttachment]:
    return list(
        FileAttachment.objects.filter(
            conversation=conversation,
            is_active=True,
        )
        .exclude(upload_status=FileAttachment.UploadStatus.DELETED)
        .filter(original_name__iendswith=".docx")
        .order_by("original_name", "-version_no")
    )


def _match_by_message(candidates: list[FileAttachment], message: str) -> list[FileAttachment]:
    compact = "".join((message or "").lower().split())
    matched = []
    for attachment in candidates:
        stem = Path(attachment.original_name).stem.lower()
        name = attachment.original_name.lower()
        if (stem and stem in compact) or (name and name in compact):
            matched.append(attachment)
    return matched


def _selection_from_attachment(attachment: FileAttachment) -> InstructionInputSelection:
    return InstructionInputSelection(
        status="selected",
        file_name=attachment.original_name,
        storage_path=attachment.storage_path,
        attachment=attachment,
    )


def _select_from_latest_summary(conversation: Conversation, message: str) -> InstructionInputSelection | None:
    batch = (
        FileSummaryBatch.objects.filter(conversation=conversation, status=FileSummaryBatch.Status.SUCCESS)
        .order_by("-finished_at", "-created_at", "-id")
        .first()
    )
    if not batch:
        return None
    items = list(batch.items.filter(file_name__iendswith=".docx").order_by("file_name", "id"))
    compact = "".join((message or "").lower().split())
    named = [item for item in items if Path(item.file_name).stem.lower() in compact or item.file_name.lower() in compact]
    candidates = named or [item for item in items if "说明书" in item.file_name]
    if len(candidates) == 1:
        item = candidates[0]
        return InstructionInputSelection(
            status="selected",
            file_name=item.file_name,
            storage_path=item.storage_path,
            source_summary_batch=batch,
            source_summary_item_id=item.pk,
        )
    if len(candidates) > 1:
        return InstructionInputSelection(
            status="waiting_user",
            source_summary_batch=batch,
            candidates=[item.file_name for item in candidates],
            message="请确认用于生成第1章监管信息的说明书文件名:" + "、".join(item.file_name for item in candidates),
        )
    return None
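
The attachment lookup in `select_instruction_input` tries, in order: a `.docx` whose name or stem appears in the user's message (case-folded, whitespace removed), then the only `.docx` whose name contains 说明书 ("instructions for use"), then the only `.docx` at all. If two or more remain it asks the user, and only when the conversation has no `.docx` attachments does it fall back to the latest successful file-summary batch. A simplified sketch over plain file names; `pick_instruction` and its tuple return value are hypothetical:

```python
from __future__ import annotations

from pathlib import Path


def pick_instruction(names: list[str], message: str) -> tuple[str, list[str]]:
    """Return ("selected", [name]), ("waiting_user", candidates) or ("missing", [])."""
    compact = "".join(message.lower().split())

    def mentioned(name: str) -> bool:
        stem = Path(name).stem.lower()
        return (bool(stem) and stem in compact) or name.lower() in compact

    named = [name for name in names if mentioned(name)]
    if len(named) == 1:
        return "selected", named
    instructions = [name for name in names if "说明书" in name]
    if len(instructions) == 1:
        return "selected", instructions
    if len(names) == 1:
        return "selected", names
    if names:  # two or more candidates and nothing decisive: ask the user
        return "waiting_user", instructions or names
    return "missing", []  # the real code tries the latest FileSummaryBatch before giving up


print(pick_instruction(["A说明书.docx", "B.docx"], "please use B"))
```

Because matching is by substring, a very short file stem can match an unrelated message; the "exactly one match" requirement limits, but does not remove, that risk.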

@@ -0,0 +1,77 @@
from __future__ import annotations

import json
from pathlib import Path

from docx import Document

from review_agent.regulatory_info_package.schemas import InstructionExtractResult


def parse_instruction_docx(path: str | Path) -> InstructionExtractResult:
    file_path = Path(path)
    document = Document(file_path)
    paragraphs = [paragraph.text.strip() for paragraph in document.paragraphs if paragraph.text.strip()]
    tables = []
    for table in document.tables:
        rows = []
        for row in table.rows:
            rows.append([" ".join(cell.text.split()) for cell in row.cells])
        if rows:
            tables.append(rows)
    sections = _build_sections(paragraphs)
    front_text = "\n".join(paragraphs[:30])
    return InstructionExtractResult(
        source_file_name=file_path.name,
        paragraphs=paragraphs,
        sections=sections,
        tables=tables,
        component_tables=_component_tables(tables),
        front_text=front_text,
    )


def save_instruction_extract_json(path: str | Path, result: InstructionExtractResult) -> Path:
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "source_file_name": result.source_file_name,
        "paragraphs": result.paragraphs,
        "sections": result.sections,
        "tables": result.tables,
        "component_tables": result.component_tables,
        "front_text": result.front_text,
    }
    target.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return target


def _build_sections(paragraphs: list[str]) -> dict[str, str]:
    sections: dict[str, list[str]] = {}
    current = "front"
    for text in paragraphs:
        if _looks_like_heading(text):
            current = text[:80]
            sections.setdefault(current, [])
            continue
        sections.setdefault(current, []).append(text)
    return {key: "\n".join(value).strip() for key, value in sections.items() if value}


def _looks_like_heading(text: str) -> bool:
    compact = text.strip()
    if len(compact) > 40:
        return False
    # Short paragraphs that start with a Chinese ordinal, a 【 bracket or a known label are headings.
    heading_markers = ("一、", "二、", "三、", "四、", "五、", "六、", "【", "产品名称", "预期用途", "主要组成")
    return compact.startswith(heading_markers)


def _component_tables(tables: list[list[list[str]]]) -> list[dict]:
    results = []
    for table in tables:
        header = table[0] if table else []
        joined = "".join(header)
        if any(keyword in joined for keyword in ["组成", "组分", "成分"]):
            results.append({"header": header, "rows": table[1:]})
    return results
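
`_build_sections` walks the paragraphs once and opens a new bucket whenever `_looks_like_heading` fires (a paragraph of at most 40 characters that starts with one of the configured markers); text before the first heading lands under `front`, and headings that collect no body text are dropped. A standalone sketch of that pass; the marker tuple below is an illustrative subset chosen for this example, not the module's list:

```python
from __future__ import annotations

# Illustrative subset of heading markers (an assumption for this sketch).
HEADING_MARKERS = ("一、", "二、", "三、", "【", "产品名称", "预期用途", "主要组成")


def looks_like_heading(text: str) -> bool:
    compact = text.strip()
    return len(compact) <= 40 and compact.startswith(HEADING_MARKERS)


def build_sections(paragraphs: list[str]) -> dict[str, str]:
    sections: dict[str, list[str]] = {}
    current = "front"  # bucket for text before the first heading
    for text in paragraphs:
        if looks_like_heading(text):
            current = text[:80]
            sections.setdefault(current, [])
            continue
        sections.setdefault(current, []).append(text)
    # Headings that collected no body text are dropped.
    return {key: "\n".join(lines).strip() for key, lines in sections.items() if lines}


paragraphs = ["XX 试剂盒说明书", "【产品名称】", "试剂盒A", "【主要组成成分】", "【储存条件】", "2-8℃"]
print(build_sections(paragraphs))
```

`str.startswith` accepts a tuple, and `"any text".startswith("")` is `True`, so an empty string anywhere in the marker tuple would turn every short paragraph into a heading.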

@@ -0,0 +1,81 @@
from __future__ import annotations

import shutil
from dataclasses import dataclass
from pathlib import Path

from django.conf import settings
from docx import Document

from review_agent.regulatory_info_package.schemas import MergedField


@dataclass(frozen=True)
class LegacyDocCapability:
    status: str
    adapter: str
    message: str = ""


def detect_legacy_doc_capability() -> LegacyDocCapability:
    try:
        import win32com.client  # noqa: F401

        return LegacyDocCapability(status="available", adapter="WordComDocAdapter", message="Word COM 可用")
    except Exception as exc:
        return LegacyDocCapability(
            status="unavailable",
            adapter="UnavailableLegacyDocAdapter",
            message=f"Word COM 不可用:{type(exc).__name__}",
        )


def write_legacy_doc_or_fallback(
    source_path: str | Path,
    output_path: str | Path,
    merged_fields: dict[str, MergedField],
) -> tuple[Path, str, dict]:
    source = Path(source_path)
    output = Path(output_path)
    output.parent.mkdir(parents=True, exist_ok=True)
    capability = detect_legacy_doc_capability()
    native_enabled = bool(getattr(settings, "REGULATORY_INFO_PACKAGE_ENABLE_WORD_COM_NATIVE", False))
    if native_enabled and capability.status == "available" and source.exists():
        shutil.copy2(source, output)
        try:
            _append_doc_summary_with_word_com(output, merged_fields)
            return output, "success", {"doc": capability.__dict__, "fallback_used": False, "native_write": True}
        except Exception as exc:
            capability = LegacyDocCapability(
                status="unavailable",
                adapter="UnavailableLegacyDocAdapter",
                message=f"Word COM 写入失败:{exc}",
            )
    # Fallback: write a plain .docx next to the requested .doc, listing every merged field.
    fallback = output.with_suffix(".docx")
    document = Document()
    heading = document.add_paragraph()
    heading.add_run(output.stem).bold = True
    document.add_paragraph("【预生成版】当前未启用 .doc 原生写入,已生成 docx 兜底文件。")
    for field in merged_fields.values():
        document.add_paragraph(f"{field.label}:{field.value}")
    document.save(fallback)
    return fallback, "fallback_success", {"doc": capability.__dict__, "fallback_used": True, "native_enabled": native_enabled}


def _append_doc_summary_with_word_com(path: Path, merged_fields: dict[str, MergedField]) -> None:
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    document = None
    try:
        document = word.Documents.Open(str(path.resolve()))
        end_range = document.Range(document.Content.End - 1, document.Content.End - 1)
        lines = ["", "【预生成版】以下字段由系统根据说明书预填,请人工复核。"]
        lines.extend(f"{field.label}:{field.value}" for field in merged_fields.values())
        end_range.InsertAfter("\r".join(lines))
        document.Save()
    finally:
        if document is not None:
            document.Close(False)
        word.Quit()
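
`detect_legacy_doc_capability` chooses between native `.doc` writing and the `.docx` fallback by attempting `import win32com.client` (pywin32 is Windows-only). A lighter variant, offered here as an alternative rather than what the module does, asks `importlib.util.find_spec` whether the module could be imported without executing it; for a dotted name the parent package is still imported, and a missing parent raises `ModuleNotFoundError`. Neither check proves that Word itself is installed, so a failure at COM dispatch time still has to be caught, which `write_legacy_doc_or_fallback` does. `Capability` and `probe` are hypothetical names:

```python
from __future__ import annotations

import importlib.util
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    status: str
    adapter: str
    message: str = ""


def probe(module_name: str, adapter: str) -> Capability:
    """Report whether module_name can be found, without executing the module itself."""
    try:
        found = importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):  # missing parent package, or a module with no __spec__
        found = False
    if found:
        return Capability("available", adapter, f"{module_name} found")
    return Capability("unavailable", "UnavailableLegacyDocAdapter", f"{module_name} not found")


print(probe("win32com.client", "WordComDocAdapter").status)  # "available" only where pywin32 is installed
```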

@@ -0,0 +1,186 @@
from __future__ import annotations

import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from zipfile import ZipFile
from xml.etree import ElementTree

from review_agent.models import RegulatoryInfoPackageBatch
from review_agent.regulatory_info_package.constants import GENERATED_FILE_FAILED
from review_agent.regulatory_info_package.schemas import GeneratedFileResult, MergedField, TemplateSpec
from review_agent.regulatory_info_package.services.docx_document import write_docx_from_template
from review_agent.regulatory_info_package.services.legacy_doc_document import write_legacy_doc_or_fallback
from review_agent.regulatory_info_package.services.template_repository import copy_template_to_batch, template_specs
from review_agent.regulatory_info_package.storage import ensure_batch_subdir


def generate_package_documents(
    batch: RegulatoryInfoPackageBatch,
    config: dict,
    merged_fields: dict[str, MergedField],
) -> list[GeneratedFileResult]:
    specs = template_specs(config)
    directory_specs = [spec for spec in specs if spec.code == "ch1_2_directory"]
    content_specs = [spec for spec in specs if spec.code != "ch1_2_directory"]
    results: list[GeneratedFileResult] = []
    # Generate content documents first (in parallel) so their page counts can fill the directory.
    with ThreadPoolExecutor(max_workers=min(4, len(content_specs) or 1)) as executor:
        futures = [executor.submit(_generate_one, batch, config, spec, merged_fields) for spec in content_specs]
        results.extend(future.result() for future in as_completed(futures))
    page_numbers = _directory_page_numbers(results)
    for spec in directory_specs:
        results.append(_generate_one(batch, config, spec, merged_fields, directory_page_numbers=page_numbers))
    return results


def _generate_one(
    batch: RegulatoryInfoPackageBatch,
    config: dict,
    spec: TemplateSpec,
    merged_fields: dict[str, MergedField],
    *,
    directory_page_numbers: dict[str, str] | None = None,
) -> GeneratedFileResult:
    try:
        template_path = copy_template_to_batch(batch, config, spec)
        generated_dir = ensure_batch_subdir(batch, "generated")
        output_path = generated_dir / spec.output_name
        adapter_summary = {}
        if spec.file_format == "doc":
            actual_path, status, adapter_summary = write_legacy_doc_or_fallback(template_path, output_path, merged_fields)
            actual_format = actual_path.suffix.lower().lstrip(".")
            highlight_count = missing_count = llm_only_count = 0
        else:
            highlight_count, missing_count, llm_only_count = write_docx_from_template(
                template_path,
                output_path,
                merged_fields,
                template_code=spec.code,
                directory_page_numbers=directory_page_numbers,
            )
            actual_path = output_path
            actual_format = "docx"
            status = "success"
        return GeneratedFileResult(
            template_code=spec.code,
            file_name=actual_path.name,
            requested_format=spec.file_format,
            actual_format=actual_format,
            status=status,
            path=str(actual_path),
            highlight_count=highlight_count,
            missing_count=missing_count,
            llm_only_count=llm_only_count,
        )
    except Exception as exc:
        return GeneratedFileResult(
            template_code=spec.code,
            file_name=spec.output_name,
            requested_format=spec.file_format,
            actual_format=spec.file_format,
            status=GENERATED_FILE_FAILED,
            error_message=str(exc),
        )


def _directory_page_numbers(results: list[GeneratedFileResult]) -> dict[str, str]:
    page_numbers = {"CH1.2": "1"}
    for result in results:
        if result.status not in {"success", "fallback_success"} or not result.path:
            continue
        code = _directory_code_from_file_name(result.file_name)
        if not code:
            continue
        page_numbers[code] = str(count_document_pages(result.path))
    return page_numbers


def _directory_code_from_file_name(file_name: str) -> str:
    stem = Path(file_name).stem.strip()
    return stem.split()[0] if stem.startswith("CH") else ""


def count_document_pages(path: str | Path) -> int:
    file_path = Path(path)
    if not file_path.exists():
        return 1
    pages = _count_pages_from_docx_properties(file_path)
    if pages:
        return pages
    pages = _count_pages_with_pywin32(file_path)
    if pages:
        return pages
    pages = _count_pages_with_powershell_word(file_path)
    if pages:
        return pages
    return 1


def _count_pages_from_docx_properties(file_path: Path) -> int:
    if file_path.suffix.lower() != ".docx":
        return 0
    try:
        with ZipFile(file_path) as archive:
            root = ElementTree.fromstring(archive.read("docProps/app.xml"))
        namespace = {"ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"}
        pages = root.find("ep:Pages", namespace)
        return max(int((pages.text or "").strip()), 1) if pages is not None else 0
    except Exception:
        return 0


def _count_pages_with_pywin32(file_path: Path) -> int:
    try:
        import win32com.client

        word = win32com.client.DispatchEx("Word.Application")
        word.Visible = False
        document = None
        try:
            document = word.Documents.Open(str(file_path.resolve()), ReadOnly=True)
            document.Repaginate()
            # 2 == wdStatisticPages
            return max(int(document.ComputeStatistics(2)), 1)
        finally:
            if document is not None:
                document.Close(False)
            word.Quit()
    except Exception:
        return 0


def _count_pages_with_powershell_word(file_path: Path) -> int:
    script = r"""
param([string]$Path)
$word = $null
$doc = $null
try {
    $word = New-Object -ComObject Word.Application
    $word.Visible = $false
    $doc = $word.Documents.Open($Path, $false, $true)
    $doc.Repaginate()
    [Console]::Out.Write($doc.ComputeStatistics(2))
    exit 0
} catch {
    [Console]::Error.Write($_.Exception.Message)
    exit 1
} finally {
    if ($doc -ne $null) { $doc.Close($false) | Out-Null }
    if ($word -ne $null) { $word.Quit() | Out-Null }
}
"""
    # Arguments after -Command are appended to the command text instead of being bound to
    # param(), so run the script as a script block and pass the path as a quoted argument.
    quoted_path = "'" + str(file_path.resolve()).replace("'", "''") + "'"
    command = f"& {{{script}}} {quoted_path}"
    try:
        completed = subprocess.run(
            ["powershell.exe", "-NoProfile", "-ExecutionPolicy", "Bypass", "-Command", command],
            capture_output=True,
            check=False,
            text=True,
            timeout=8,
        )
    except Exception:
        return 0
    if completed.returncode != 0:
        return 0
    try:
        return max(int(completed.stdout.strip()), 1)
    except ValueError:
        return 0
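
`_count_pages_from_docx_properties` reads the `<Pages>` element stored in `docProps/app.xml` inside the `.docx` zip. That value is the statistic recorded by whichever application last saved the file; python-docx does not lay out pages, so for a document generated from a template this number is probably the template's stored count rather than a fresh pagination, and only the Word-based paths actually repaginate. A self-contained sketch that builds a tiny in-memory archive to show the lookup (`pages_from_app_xml` and `make_fake_docx` are illustrative names):

```python
import io
from xml.etree import ElementTree
from zipfile import BadZipFile, ZipFile

EP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"


def pages_from_app_xml(docx_bytes: bytes) -> int:
    """Return the <Pages> statistic stored in docProps/app.xml, or 0 if it cannot be read."""
    try:
        with ZipFile(io.BytesIO(docx_bytes)) as archive:
            root = ElementTree.fromstring(archive.read("docProps/app.xml"))
    except (BadZipFile, KeyError, ElementTree.ParseError):
        return 0
    pages = root.find("ep:Pages", {"ep": EP_NS})
    text = (pages.text or "").strip() if pages is not None else ""
    if not text.isdigit():
        return 0
    return max(int(text), 1)  # clamp to at least one page, as the module does


def make_fake_docx(app_xml: str) -> bytes:
    """Zip holding only docProps/app.xml: enough for this lookup, not a valid Word file."""
    buffer = io.BytesIO()
    with ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/app.xml", app_xml)
    return buffer.getvalue()


sample = f'<Properties xmlns="{EP_NS}"><Pages>7</Pages></Properties>'
print(pages_from_app_xml(make_fake_docx(sample)))  # -> 7
```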

@@ -0,0 +1,12 @@
from __future__ import annotations


def build_assistant_summary(*, batch_no: str, exports: list[dict], failed_files: list[dict]) -> str:
    # Zip archives are listed first, then the remaining exports, then one line per failed file.
    zip_exports = [item for item in exports if item.get("export_type") == "zip" or str(item.get("file_name", "")).endswith(".zip")]
    other_exports = [item for item in exports if item not in zip_exports]
    lines = [f"已完成第1章监管信息材料包生成,批次号:{batch_no}", ""]
    for export in [*zip_exports, *other_exports]:
        lines.append(f"- [{export['file_name']}]({export['download_url']})")
    for failed in failed_files:
        lines.append(f"- {failed.get('file_name')}:生成失败,{failed.get('error_message') or '原因待查看'}")
    return "\n".join(lines)

@@ -0,0 +1,53 @@
from __future__ import annotations

import hashlib
from pathlib import Path

import yaml
from django.conf import settings

CONFIG_PATH = Path(__file__).resolve().parents[1] / "templates" / "regulatory_info_package_templates_v1.yaml"


def load_template_config(path: str | Path | None = None) -> dict:
    config_path = Path(path) if path else CONFIG_PATH
    with config_path.open("r", encoding="utf-8") as handle:
        payload = yaml.safe_load(handle) or {}
    if payload.get("source_dir"):
        payload["source_dir"] = str((Path(settings.BASE_DIR) / payload["source_dir"]).resolve())
    return payload


def compute_config_hash(path: str | Path | None = None) -> str:
    config_path = Path(path) if path else CONFIG_PATH
    digest = hashlib.sha256()
    digest.update(config_path.read_bytes())
    return digest.hexdigest()


def validate_template_config(config: dict) -> list[str]:
    errors: list[str] = []
    source_dir = Path(config.get("source_dir") or "")
    if not source_dir.exists():
        errors.append(f"模板源目录不存在:{source_dir}")
    templates = config.get("templates") or []
    if len(templates) != 6:
        errors.append("第1章监管信息模板配置必须包含 6 个模板。")
    seen: set[str] = set()
    for template in templates:
        code = str(template.get("code") or "")
        if not code:
            errors.append("模板 code 不能为空。")
        elif code in seen:
            errors.append(f"模板 code 重复:{code}")
        seen.add(code)
        source_file = str(template.get("source_file") or "")
        output_name = str(template.get("output_name") or "")
        if not source_file:
            errors.append(f"模板 {code} 缺少 source_file。")
        elif source_dir.exists() and not (source_dir / source_file).exists():
            errors.append(f"模板源文件不存在:{source_file}")
        if not output_name:
            errors.append(f"模板 {code} 缺少 output_name。")
    return errors
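
`validate_template_config` collects every problem instead of stopping at the first, so a single run reports a missing source directory, a wrong template count (exactly six are expected for chapter 1) and per-template gaps together. The sketch below keeps only the checks that need no filesystem access; the config dict mirrors the keys the loader reads, but its values are invented for illustration and `check_templates` is a hypothetical name:

```python
from __future__ import annotations


def check_templates(config: dict, expected_count: int = 6) -> list[str]:
    """Collect config problems that need no filesystem access (one message per problem)."""
    errors: list[str] = []
    templates = config.get("templates") or []
    if len(templates) != expected_count:
        errors.append(f"expected {expected_count} templates, got {len(templates)}")
    seen: set[str] = set()
    for template in templates:
        code = str(template.get("code") or "")
        if not code:
            errors.append("template code is empty")
        elif code in seen:
            errors.append(f"duplicate template code: {code}")
        seen.add(code)
        for key in ("source_file", "output_name"):
            if not template.get(key):
                errors.append(f"template {code} is missing {key}")
    return errors


# Hypothetical values, shaped like the YAML file after yaml.safe_load().
SAMPLE_CONFIG = {
    "source_dir": "templates/ch1",
    "templates": [
        {"code": "ch1_2_directory", "source_file": "directory.docx", "output_name": "CH1.2 directory.docx"},
        {"code": "ch1_2_directory", "source_file": "copy.docx", "output_name": ""},
    ],
}

for error in check_templates(SAMPLE_CONFIG):
    print(error)
```

Returning a list rather than raising lets a caller show every configuration problem in one pass.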

@@ -0,0 +1,34 @@
from __future__ import annotations

import shutil
from pathlib import Path

from review_agent.regulatory_info_package.schemas import TemplateSpec
from review_agent.regulatory_info_package.storage import ensure_batch_subdir
from review_agent.models import RegulatoryInfoPackageBatch


def template_specs(config: dict) -> list[TemplateSpec]:
    return [
        TemplateSpec(
            code=item["code"],
            output_name=item["output_name"],
            source_file=item["source_file"],
            file_format=item.get("file_format", "docx"),
            strategy=item.get("strategy", item["code"]),
            include_in_zip=bool(item.get("include_in_zip", True)),
            prefer_legacy_doc_native=bool(item.get("prefer_legacy_doc_native", False)),
            allow_docx_fallback=bool(item.get("allow_docx_fallback", True)),
            fields=item.get("fields") or [],
        )
        for item in config.get("templates") or []
    ]
def copy_template_to_batch(batch: RegulatoryInfoPackageBatch, config: dict, spec: TemplateSpec) -> Path:
source_dir = Path(config["source_dir"])
source = source_dir / spec.source_file
target = ensure_batch_subdir(batch, "templates") / f"{spec.code}.source{source.suffix}"
shutil.copy2(source, target)
return target
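`template_specs` turns each raw YAML dict into a typed spec, and `strategy` falls back to the template's own `code` when the config omits it. The real `TemplateSpec` is defined in `schemas`, which is not part of this diff, so the dataclass below is a hypothetical stand-in carrying only a subset of its fields. The last lines show the copy-target naming used by `copy_template_to_batch`.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SpecSketch:
    """Hypothetical stand-in for TemplateSpec (the real fields live in schemas.py)."""
    code: str
    output_name: str
    source_file: str
    file_format: str = "docx"
    strategy: str = ""
    include_in_zip: bool = True
    fields: list[dict] = field(default_factory=list)


def specs_from_config(config: dict) -> list[SpecSketch]:
    return [
        SpecSketch(
            code=item["code"],
            output_name=item["output_name"],
            source_file=item["source_file"],
            file_format=item.get("file_format", "docx"),
            # Same fallback as template_specs: strategy defaults to the template code.
            strategy=item.get("strategy", item["code"]),
            include_in_zip=bool(item.get("include_in_zip", True)),
            fields=item.get("fields") or [],
        )
        for item in config.get("templates") or []
    ]


specs = specs_from_config(
    {"templates": [{"code": "ch1_5_product_list", "output_name": "out.docx", "source_file": "in.docx"}]}
)
print(specs[0].strategy, specs[0].include_in_zip, specs[0].fields)
# ch1_5_product_list True []

# copy_template_to_batch names the copy "<code>.source<suffix>".
print(f"{specs[0].code}.source{Path(specs[0].source_file).suffix}")
# ch1_5_product_list.source.docx
```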

@@ -0,0 +1,51 @@
from __future__ import annotations

import json
from pathlib import Path

from openpyxl import Workbook

from review_agent.regulatory_info_package.schemas import MergedField

HEADERS = [
    "target_file",
    "target_field",
    "final_value",
    "extraction_source",
    "evidence",
    "highlight_reason",
    "needs_review",
]


def save_traceability_exports(root: str | Path, merged_fields: dict[str, MergedField]) -> tuple[Path, Path]:
    root_path = Path(root)
    exports_dir = root_path / "exports"
    logs_dir = root_path / "logs"
    exports_dir.mkdir(parents=True, exist_ok=True)
    logs_dir.mkdir(parents=True, exist_ok=True)
    rows = [
        {
            "target_file": "",
            "target_field": field.label,
            "final_value": field.value,
            "extraction_source": field.source,
            "evidence": field.evidence,
            "highlight_reason": field.highlight_reason,
            "needs_review": field.needs_review,
        }
        for field in merged_fields.values()
    ]
    excel_path = exports_dir / "traceability.xlsx"
    workbook = Workbook()
    sheet = workbook.active
    sheet.title = "traceability"
    sheet.append(HEADERS)
    for row in rows:
        sheet.append([row.get(header, "") for header in HEADERS])
    workbook.save(excel_path)
    json_path = logs_dir / "traceability.json"
    json_path.write_text(json.dumps(rows, ensure_ascii=False, indent=2), encoding="utf-8")
    return excel_path, json_path
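Each merged field becomes one row, and both outputs are written in the fixed `HEADERS` order via `row.get(header, "")`, so a missing key becomes an empty cell instead of an error. The sketch below reproduces that projection using only the standard library: CSV stands in for the openpyxl workbook, and `FieldSketch` is a hypothetical subset of `MergedField` with just the attributes the export reads. The example value is illustrative.

```python
import csv
import io
import json
from dataclasses import dataclass

HEADERS = ["target_file", "target_field", "final_value", "extraction_source",
           "evidence", "highlight_reason", "needs_review"]


@dataclass
class FieldSketch:
    """Hypothetical subset of MergedField with the attributes the export reads."""
    label: str
    value: str
    source: str
    evidence: str = ""
    highlight_reason: str = ""
    needs_review: bool = False


def build_rows(merged: dict[str, FieldSketch]) -> list[dict]:
    return [
        {
            "target_file": "",
            "target_field": f.label,
            "final_value": f.value,
            "extraction_source": f.source,
            "evidence": f.evidence,
            "highlight_reason": f.highlight_reason,
            "needs_review": f.needs_review,
        }
        for f in merged.values()
    ]


def rows_to_csv(rows: list[dict]) -> str:
    # Same header-ordered projection as the sheet.append loop; CSV stands in for openpyxl.
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(HEADERS)
    for row in rows:
        writer.writerow([row.get(header, "") for header in HEADERS])
    return buffer.getvalue()


rows = build_rows({"product_name": FieldSketch("产品名称", "示例产品", "regex")})
# ensure_ascii=False keeps the Chinese labels readable instead of \uXXXX escapes.
print(json.dumps(rows, ensure_ascii=False))
print(rows_to_csv(rows).splitlines()[1])
# ,产品名称,示例产品,regex,,,False
```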

@@ -0,0 +1,23 @@
from __future__ import annotations

from pathlib import Path
from zipfile import ZIP_DEFLATED, ZipFile

from review_agent.regulatory_info_package.constants import DEFAULT_ZIP_NAME, GENERATED_FILE_FALLBACK_SUCCESS, GENERATED_FILE_SUCCESS
from review_agent.regulatory_info_package.schemas import GeneratedFileResult


def create_zip_package(root: str | Path, generated_files: list[GeneratedFileResult], zip_name: str = DEFAULT_ZIP_NAME) -> Path:
    root_path = Path(root)
    exports_dir = root_path / "exports"
    exports_dir.mkdir(parents=True, exist_ok=True)
    zip_path = exports_dir / zip_name
    allowed = {GENERATED_FILE_SUCCESS, GENERATED_FILE_FALLBACK_SUCCESS}
    with ZipFile(zip_path, "w", compression=ZIP_DEFLATED) as archive:
        for result in generated_files:
            if result.status not in allowed or not result.path:
                continue
            file_path = Path(result.path)
            if file_path.exists():
                archive.write(file_path, arcname=result.file_name)
    return zip_path
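Only results whose status is a success variant and that have an existing file on disk end up in the archive, and `arcname` stores each entry under its display name rather than its work-directory path. A self-contained sketch of that filter, assuming the status strings `"success"`, `"fallback_success"` and `"failed"` (the literals the workflow and status view compare against; the real constants live in `constants.py`, which is not shown). `ResultSketch` is a hypothetical subset of `GeneratedFileResult`.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from zipfile import ZIP_DEFLATED, ZipFile

SUCCESS, FALLBACK_SUCCESS, FAILED = "success", "fallback_success", "failed"


@dataclass
class ResultSketch:
    """Hypothetical subset of GeneratedFileResult."""
    status: str
    path: str
    file_name: str


def zip_successful(results: list[ResultSketch], zip_path: Path) -> list[str]:
    allowed = {SUCCESS, FALLBACK_SUCCESS}
    with ZipFile(zip_path, "w", compression=ZIP_DEFLATED) as archive:
        for result in results:
            # Failed or path-less results are skipped, as in create_zip_package.
            if result.status not in allowed or not result.path:
                continue
            file_path = Path(result.path)
            if file_path.exists():
                # arcname flattens the entry to its display name.
                archive.write(file_path, arcname=result.file_name)
    with ZipFile(zip_path) as archive:
        return archive.namelist()


with tempfile.TemporaryDirectory() as tmp:
    ok = Path(tmp) / "ok.docx"
    ok.write_bytes(b"demo")
    names = zip_successful(
        [
            ResultSketch(SUCCESS, str(ok), "CH1.5 产品列表.docx"),
            ResultSketch(FAILED, str(ok), "failed.docx"),
            ResultSketch(FALLBACK_SUCCESS, "", "no-path.docx"),
        ],
        Path(tmp) / "package.zip",
    )
print(names)  # ['CH1.5 产品列表.docx']
```

Python's `zipfile` stores non-ASCII entry names as UTF-8, so the Chinese file names survive the round trip.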

@@ -0,0 +1,71 @@
from __future__ import annotations

import hashlib
from pathlib import Path

from django.conf import settings

from review_agent.models import RegulatoryInfoPackageArtifact, RegulatoryInfoPackageBatch


def build_batch_work_dir(batch: RegulatoryInfoPackageBatch | None = None, *, batch_no: str = "") -> Path:
    if batch:
        return (
            Path(settings.MEDIA_ROOT)
            / "regulatory_info_package"
            / str(batch.user_id)
            / str(batch.conversation_id)
            / batch.batch_no
        )
    return Path(settings.MEDIA_ROOT) / "regulatory_info_package" / batch_no


def ensure_batch_subdir(batch: RegulatoryInfoPackageBatch, name: str) -> Path:
    root = Path(batch.work_dir) if batch.work_dir else build_batch_work_dir(batch)
    target = root / Path(name).name
    ensure_within_work_dir(batch, target)
    target.mkdir(parents=True, exist_ok=True)
    return target


def ensure_within_work_dir(batch: RegulatoryInfoPackageBatch, path: str | Path) -> Path:
    root = Path(batch.work_dir).resolve()
    target = Path(path).resolve()
    if root != target and root not in target.parents:
        raise ValueError("输出路径必须位于当前材料包批次工作目录内。")
    return target


def compute_file_sha256(path: str | Path) -> str:
    file_path = Path(path)
    digest = hashlib.sha256()
    with file_path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_artifact_for_file(
    batch: RegulatoryInfoPackageBatch,
    *,
    path: str | Path,
    artifact_type: str,
    file_format: str,
    name: str = "",
    metadata: dict | None = None,
    created_by_node: str = "",
) -> RegulatoryInfoPackageArtifact:
    file_path = ensure_within_work_dir(batch, path)
    return RegulatoryInfoPackageArtifact.objects.create(
        batch=batch,
        artifact_type=artifact_type,
        file_format=file_format,
        name=name or file_path.stem,
        file_name=file_path.name,
        storage_path=str(file_path),
        file_size=file_path.stat().st_size if file_path.exists() else 0,
        content_hash=compute_file_sha256(file_path) if file_path.exists() else "",
        metadata=metadata or {},
        created_by_node=created_by_node,
    )
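Two safeguards carry this module: `ensure_within_work_dir` resolves both paths (collapsing `..` and symlinks) and rejects any target that is neither the batch root nor below it, and `compute_file_sha256` streams the file in 1 MiB chunks so large generated documents are never read into memory at once. `ensure_batch_subdir` additionally keeps only `Path(name).name`, so a name like `"../logs"` collapses to `"logs"`. The sketch below reproduces both checks outside Django; `ensure_within` and `sha256_file` are hypothetical names, and a small chunk size is used only to exercise the loop.

```python
import hashlib
import tempfile
from pathlib import Path


def ensure_within(root: Path, candidate: Path) -> Path:
    """Same containment test as ensure_within_work_dir, without the batch model."""
    root_resolved = root.resolve()
    target = candidate.resolve()
    if root_resolved != target and root_resolved not in target.parents:
        raise ValueError(f"{candidate} escapes {root}")
    return target


def sha256_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file in fixed-size chunks; iter() stops at the b"" sentinel (EOF)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "batch"
    root.mkdir()
    inside = ensure_within(root, root / "logs" / "merged_fields.json")
    try:
        ensure_within(root, root / ".." / "other" / "x.json")
        escaped = True
    except ValueError:
        escaped = False
    sample = root / "sample.bin"
    sample.write_bytes(b"abc" * 10)
    chunked = sha256_file(sample, chunk_size=4)
    whole = hashlib.sha256(b"abc" * 10).hexdigest()

print(escaped, chunked == whole)  # False True
```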

@@ -0,0 +1,64 @@
version: regulatory_info_package_templates_v1
source_dir: review_agent/regulatory_info_package/templates/clean
zip_name: 第1章 监管信息(预生成版).zip
templates:
  - code: ch1_2_directory
    source_file: CH1.2 监管信息目录 - 页码版.docx
    output_name: CH1.2 监管信息目录.docx
    file_format: docx
    strategy: directory
    include_in_zip: true
    fields: []
  - code: ch1_4_application_form
    source_file: CH1.4 申请表 - 复选框调整版.docx
    output_name: CH1.4 申请表.docx
    file_format: docx
    strategy: application_form
    include_in_zip: true
    fields:
      - key: product_name
        label: 产品名称
        placeholder: "{{product_name}}"
      - key: applicant_name
        label: 申请人名称
        placeholder: "{{applicant_name}}"
  - code: ch1_5_product_list
    source_file: CH1.5 产品列表.docx
    output_name: CH1.5 产品列表.docx
    file_format: docx
    strategy: product_list
    include_in_zip: true
    fields:
      - key: package_specification
        label: 包装规格
        placeholder: "{{package_specification}}"
  - code: ch1_11_1_standards
    source_file: CH1.11.1 符合标准的清单.docx
    output_name: CH1.11.1 符合标准的清单.docx
    file_format: docx
    strategy: standards
    include_in_zip: true
    fields:
      - key: standard_no
        label: 标准号
        placeholder: "{{standard_no}}"
  - code: ch1_11_5_authenticity
    source_file: CH1.11.5 真实性声明.docx
    output_name: CH1.11.5 真实性声明.docx
    file_format: docx
    strategy: authenticity
    include_in_zip: true
    fields:
      - key: product_name
        label: 产品名称
        placeholder: "{{product_name}}"
  - code: ch1_11_6_conformity
    source_file: CH1.11.6 符合性声明.docx
    output_name: CH1.11.6 符合性声明.docx
    file_format: docx
    strategy: conformity
    include_in_zip: true
    fields:
      - key: product_name
        label: 产品名称
        placeholder: "{{product_name}}"
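The config declares `{{key}}` placeholders per template, but the code that fills them (`package_generate`) is not part of this diff. The sketch below is a hypothetical illustration of placeholder substitution only, assuming plain-string replacement and `"/"` as the missing-value marker (the workflow treats a `product_name` of `"/"` as empty). A real `.docx` filler must also cope with placeholders split across Word runs, which this ignores.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_placeholders(text: str, values: dict[str, str], missing: str = "/") -> tuple[str, list[str]]:
    """Replace {{key}} markers; return the filled text and the keys that had no value.

    Hypothetical helper operating on plain strings, not on .docx content.
    """
    unresolved: list[str] = []

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        value = values.get(key)
        if not value:
            unresolved.append(key)
            return missing
        return value

    return PLACEHOLDER.sub(substitute, text), unresolved


filled, unresolved = fill_placeholders(
    "产品名称: {{product_name}}; 申请人: {{applicant_name}}",
    {"product_name": "Demo Device"},
)
print(filled)      # 产品名称: Demo Device; 申请人: /
print(unresolved)  # ['applicant_name']
```

Returning the unresolved keys alongside the text is one way to feed a "missing fields" list like the batch's `missing_fields`; whether the real generator does this is not shown here.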

@@ -0,0 +1,127 @@
import json

from django.contrib.auth.decorators import login_required
from django.conf import settings
from django.http import Http404, JsonResponse
from django.views.decorators.http import require_http_methods

from review_agent.models import ExportedSummaryFile, RegulatoryInfoPackageBatch, WorkflowNodeRun
from review_agent.regulatory_info_package.constants import WORKFLOW_TYPE
from review_agent.regulatory_info_package.services.input_select import select_instruction_input
from review_agent.regulatory_info_package.workflow import (
    create_regulatory_info_package_batch,
    start_regulatory_info_package_workflow,
)


@require_http_methods(["GET"])
def health(request):
    return JsonResponse({"workflow_type": WORKFLOW_TYPE, "status": "available"})


@login_required
@require_http_methods(["POST"])
def start(request):
    try:
        payload = json.loads(request.body.decode("utf-8") or "{}")
    except json.JSONDecodeError:
        return JsonResponse({"error": "JSON 格式错误。"}, status=400)
    from review_agent.models import Conversation

    conversation = Conversation.objects.filter(pk=payload.get("conversation_id"), user=request.user).first()
    if not conversation:
        raise Http404("对话不存在。")
    selection = select_instruction_input(conversation, str(payload.get("message") or ""))
    if selection.status != "selected":
        return JsonResponse(
            {"status": selection.status, "message": selection.message, "candidates": selection.candidates},
            status=400,
        )
    batch = create_regulatory_info_package_batch(
        conversation=conversation,
        user=request.user,
        source_attachment=selection.attachment,
        source_summary_batch=selection.source_summary_batch,
        source_summary_item_id=selection.source_summary_item_id,
        source_file_name=selection.file_name,
        source_storage_path=selection.storage_path,
    )
    start_regulatory_info_package_workflow(batch, async_run=getattr(settings, "REGULATORY_INFO_PACKAGE_ASYNC", True))
    return JsonResponse({"batch_id": batch.pk, "workflow_type": WORKFLOW_TYPE, "status": batch.status})


@login_required
@require_http_methods(["GET"])
def batch_status(request, batch_id: int):
    batch = RegulatoryInfoPackageBatch.objects.filter(
        pk=batch_id,
        conversation__user=request.user,
        is_deleted=False,
    ).first()
    if not batch:
        raise Http404("材料包批次不存在。")
    exports = ExportedSummaryFile.objects.filter(
        workflow_type=WORKFLOW_TYPE,
        workflow_batch_id=batch.pk,
    ).order_by("-export_type", "id")
    sorted_exports = sorted(exports, key=lambda item: 0 if item.export_type == ExportedSummaryFile.ExportType.ZIP else 1)
    return JsonResponse(
        {
            "batch": {
                "id": batch.pk,
                "workflow_type": WORKFLOW_TYPE,
                "batch_no": batch.batch_no,
                "status": batch.status,
                "product_name": batch.product_name,
                "risk_summary_text": _risk_summary_text(batch),
                "error_message": batch.error_message,
            },
            "nodes": [
                {
                    "node_code": node.node_code,
                    "node_name": node.node_name,
                    "status": node.status,
                    "progress": node.progress,
                    "message": node.message,
                }
                for node in WorkflowNodeRun.objects.filter(
                    workflow_type=WORKFLOW_TYPE,
                    workflow_batch_id=batch.pk,
                ).order_by("id")
            ],
            "exports": [
                {
                    "id": export.pk,
                    "export_type": export.export_type,
                    "export_category": export.export_category,
                    "file_name": export.file_name,
                    "download_url": f"/api/review-agent/file-summary/exports/{export.pk}/download/",
                }
                for export in sorted_exports
            ],
            "failed_files": [item for item in batch.generated_files if item.get("status") == "failed"],
            "notifications": [
                {
                    "id": item.pk,
                    "channel": item.channel,
                    "send_status": item.send_status,
                    "status_label": "通知已记录" if item.send_status == "success" else item.send_status,
                    "error_message": item.error_message,
                }
                for item in batch.notifications.filter(is_deleted=False).order_by("-created_at", "-id")
            ],
        }
    )


def _risk_summary_text(batch: RegulatoryInfoPackageBatch) -> str:
    parts = []
    if batch.missing_fields:
        parts.append(f"缺失字段 {len(batch.missing_fields)}")
    if batch.llm_only_fields:
        parts.append(f"LLM-only {len(batch.llm_only_fields)}")
    if batch.conflict_fields:
        parts.append(f"冲突字段 {len(batch.conflict_fields)}")
    if batch.risk_notes:
        parts.append(f"提示 {len(batch.risk_notes)}")
    return " · ".join(parts)
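The status payload condenses the batch's risk lists into one line that names only the non-empty categories, and it lists the ZIP export first. Because `sorted()` is stable, the remaining exports keep the order the query returned. A minimal sketch of both rules on plain objects; `SimpleNamespace` stands in for the model instance, and `"zip"` as the value of `ExportedSummaryFile.ExportType.ZIP` is an assumption.

```python
from types import SimpleNamespace


def risk_summary_text(batch) -> str:
    """Same counting/joining rule as _risk_summary_text; empty categories are omitted."""
    parts = []
    if batch.missing_fields:
        parts.append(f"缺失字段 {len(batch.missing_fields)}")
    if batch.llm_only_fields:
        parts.append(f"LLM-only {len(batch.llm_only_fields)}")
    if batch.conflict_fields:
        parts.append(f"冲突字段 {len(batch.conflict_fields)}")
    if batch.risk_notes:
        parts.append(f"提示 {len(batch.risk_notes)}")
    return " · ".join(parts)


def zip_first(exports: list[dict]) -> list[dict]:
    # Assumes the ZIP export type is the string "zip"; sorted() is stable,
    # so the other exports keep their incoming order.
    return sorted(exports, key=lambda item: 0 if item["export_type"] == "zip" else 1)


batch = SimpleNamespace(missing_fields=["a", "b"], llm_only_fields=[], conflict_fields=[{}], risk_notes=[])
print(risk_summary_text(batch))  # 缺失字段 2 · 冲突字段 1

exports = [
    {"export_type": "word", "file_name": "CH1.4.docx"},
    {"export_type": "excel", "file_name": "traceability.xlsx"},
    {"export_type": "zip", "file_name": "package.zip"},
]
print([item["file_name"] for item in zip_first(exports)])
# ['package.zip', 'CH1.4.docx', 'traceability.xlsx']
```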

@@ -0,0 +1,375 @@
from __future__ import annotations

import logging
from threading import Thread
from uuid import uuid4

from django.conf import settings
from django.db import transaction
from django.utils import timezone

from review_agent.file_summary.paths import resolve_storage_path
from review_agent.models import (
    Conversation,
    ExportedSummaryFile,
    Message,
    RegulatoryInfoPackageArtifact,
    RegulatoryInfoPackageBatch,
    RegulatoryInfoPackageNotificationRecord,
    WorkflowNodeRun,
)
from review_agent.regulatory_info_package.constants import (
    DEFAULT_ZIP_NAME,
    REGULATORY_INFO_PACKAGE_NODE_DEFINITIONS,
    WORKFLOW_TYPE,
)
from review_agent.regulatory_info_package.events import record_event
from review_agent.regulatory_info_package.services.template_config import (
    compute_config_hash,
    load_template_config,
    validate_template_config,
)
from review_agent.regulatory_info_package.services.field_extract import run_parallel_extract, save_field_extract_result
from review_agent.regulatory_info_package.services.field_merge import merge_fields, save_merged_fields
from review_agent.regulatory_info_package.services.instruction_extract import parse_instruction_docx, save_instruction_extract_json
from review_agent.regulatory_info_package.services.package_generate import generate_package_documents
from review_agent.regulatory_info_package.services.summary import build_assistant_summary
from review_agent.regulatory_info_package.services.traceability_export import save_traceability_exports
from review_agent.regulatory_info_package.services.zip_export import create_zip_package
from review_agent.regulatory_info_package.schemas import GeneratedFileResult, InstructionExtractResult, MergedField
from review_agent.regulatory_info_package.storage import build_batch_work_dir
from review_agent.regulatory_info_package.storage import create_artifact_for_file, ensure_batch_subdir

logger = logging.getLogger("review_agent.regulatory_info_package.workflow")


def build_batch_no() -> str:
    return f"RIP-{timezone.localtime().strftime('%Y%m%d%H%M%S')}-{uuid4().hex[:6]}"


@transaction.atomic
def create_regulatory_info_package_batch(
    *,
    conversation: Conversation,
    user,
    trigger_message: Message | None = None,
    source_attachment=None,
    source_summary_batch=None,
    source_summary_item_id: int | None = None,
    source_file_name: str = "",
    source_storage_path: str = "",
    existing_batch: RegulatoryInfoPackageBatch | None = None,
) -> RegulatoryInfoPackageBatch:
    batch = existing_batch
    if batch is None:
        batch_no = build_batch_no()
        work_dir = build_batch_work_dir(batch_no=batch_no)
        work_dir.mkdir(parents=True, exist_ok=True)
        batch = RegulatoryInfoPackageBatch.objects.create(
            conversation=conversation,
            user=user,
            trigger_message=trigger_message,
            source_attachment=source_attachment,
            source_summary_batch=source_summary_batch,
            source_summary_item_id=source_summary_item_id,
            source_file_name=source_file_name or getattr(source_attachment, "original_name", ""),
            source_storage_path=source_storage_path or getattr(source_attachment, "storage_path", ""),
            batch_no=batch_no,
            output_zip_name=DEFAULT_ZIP_NAME,
            work_dir=str(work_dir),
        )
    for code, name, group in REGULATORY_INFO_PACKAGE_NODE_DEFINITIONS:
        WorkflowNodeRun.objects.get_or_create(
            workflow_type=WORKFLOW_TYPE,
            workflow_batch_id=batch.pk,
            node_code=code,
            defaults={
                "node_group": group,
                "node_name": name,
            },
        )
    record_event(batch, "workflow_created", {"batch_id": batch.pk, "batch_no": batch.batch_no})
    return batch


class RegulatoryInfoPackageWorkflowExecutor:
    """Runs the Chapter 1 regulatory information package workflow."""

    def __init__(self, batch: RegulatoryInfoPackageBatch):
        self.batch = batch
        self.template_config: dict = {}
        self.instruction: InstructionExtractResult | None = None
        self.extract_payload: dict = {}
        self.merged_fields: dict[str, MergedField] = {}
        self.merge_summary: dict[str, list[dict]] = {}
        self.generation_results: list[GeneratedFileResult] = []
        self.exports: list[ExportedSummaryFile] = []

    def run(self) -> None:
        logger.info("监管信息材料包工作流开始 batch_no=%s batch_id=%s", self.batch.batch_no, self.batch.pk)
        self.batch.status = RegulatoryInfoPackageBatch.Status.RUNNING
        self.batch.started_at = timezone.now()
        self.batch.save(update_fields=["status", "started_at"])
        record_event(self.batch, "workflow_started", {"batch_id": self.batch.pk})
        try:
            for node in self._nodes():
                if node.status in {WorkflowNodeRun.Status.SUCCESS, WorkflowNodeRun.Status.SKIPPED}:
                    continue
                self._run_node(node)
        except Exception as exc:
            logger.exception("Regulatory info package workflow failed", extra={"batch_id": self.batch.pk})
            self.batch.status = RegulatoryInfoPackageBatch.Status.FAILED
            self.batch.error_message = str(exc)
            self.batch.finished_at = timezone.now()
            self.batch.save(update_fields=["status", "error_message", "finished_at"])
            record_event(self.batch, "workflow_failed", {"message": str(exc)})
            return
        self.batch.status = RegulatoryInfoPackageBatch.Status.SUCCESS
        self.batch.finished_at = timezone.now()
        self.batch.save(update_fields=["status", "finished_at"])
        self._append_completion_message()
        record_event(self.batch, "workflow_completed", {"batch_id": self.batch.pk})

    def _nodes(self):
        return WorkflowNodeRun.objects.filter(
            workflow_type=WORKFLOW_TYPE,
            workflow_batch_id=self.batch.pk,
        ).order_by("id")

    def _run_node(self, node: WorkflowNodeRun) -> None:
        node.status = WorkflowNodeRun.Status.RUNNING
        node.progress = 10
        node.started_at = timezone.now()
        node.message = f"{node.node_name}处理中"
        node.save(update_fields=["status", "progress", "started_at", "message"])
        record_event(self.batch, "node_progress", {"node_code": node.node_code, "status": node.status})
        self._execute_node(node)
        node.status = WorkflowNodeRun.Status.SUCCESS
        node.progress = 100
        node.finished_at = timezone.now()
        node.message = f"{node.node_name}完成"
        node.save(update_fields=["status", "progress", "finished_at", "message"])
        record_event(self.batch, "node_progress", {"node_code": node.node_code, "status": node.status})

    def _execute_node(self, node: WorkflowNodeRun) -> None:
        if node.node_code == "prepare":
            self.template_config = load_template_config()
            errors = validate_template_config(self.template_config)
            if errors:
                # Separate the collected messages so they do not run together in error_message.
                raise ValueError("; ".join(errors))
            self.batch.template_config_version = str(self.template_config.get("version") or "")
            self.batch.template_config_hash = compute_config_hash()
            self.batch.save(update_fields=["template_config_version", "template_config_hash"])
            return
        if node.node_code == "template_copy":
            return
        if node.node_code == "text_extract":
            if not self.batch.source_storage_path:
                self.instruction = None
                return
            path = resolve_storage_path(self.batch.source_storage_path)
            self.instruction = parse_instruction_docx(path)
            json_path = ensure_batch_subdir(self.batch, "logs") / "instruction_extract.json"
            save_instruction_extract_json(json_path, self.instruction)
            create_artifact_for_file(
                self.batch,
                path=json_path,
                artifact_type=RegulatoryInfoPackageArtifact.ArtifactType.INSTRUCTION_EXTRACT,
                file_format=RegulatoryInfoPackageArtifact.FileFormat.JSON,
                created_by_node=node.node_code,
            )
            return
        if node.node_code == "field_extract":
            if not self.instruction:
                self.extract_payload = {"regex_results": {}, "llm_results": {}, "llm_error": ""}
                return
            self.extract_payload = run_parallel_extract(self.instruction, llm_extract_func=lambda _instruction: {})
            json_path = ensure_batch_subdir(self.batch, "logs") / "field_extract_result.json"
            save_field_extract_result(json_path, self.extract_payload)
            create_artifact_for_file(
                self.batch,
                path=json_path,
                artifact_type=RegulatoryInfoPackageArtifact.ArtifactType.FIELD_EXTRACT_RESULT,
                file_format=RegulatoryInfoPackageArtifact.FileFormat.JSON,
                created_by_node=node.node_code,
            )
            return
        if node.node_code == "field_merge":
            self.merged_fields, self.merge_summary = merge_fields(
                self.extract_payload.get("regex_results") or {},
                self.extract_payload.get("llm_results") or {},
            )
            product = self.merged_fields.get("product_name")
            if product and product.value and product.value != "/":
                self.batch.product_name = product.value
            self.batch.missing_fields = self.merge_summary.get("missing_fields", [])
            self.batch.llm_only_fields = self.merge_summary.get("llm_only_fields", [])
            self.batch.conflict_fields = self.merge_summary.get("conflict_fields", [])
            self.batch.save(update_fields=["product_name", "missing_fields", "llm_only_fields", "conflict_fields"])
            json_path = ensure_batch_subdir(self.batch, "logs") / "merged_fields.json"
            save_merged_fields(json_path, self.merged_fields, self.merge_summary)
            create_artifact_for_file(
                self.batch,
                path=json_path,
                artifact_type=RegulatoryInfoPackageArtifact.ArtifactType.MERGED_FIELDS,
                file_format=RegulatoryInfoPackageArtifact.FileFormat.JSON,
                created_by_node=node.node_code,
            )
            return
        if node.node_code == "generate_docs":
            self.generation_results = generate_package_documents(self.batch, self.template_config, self.merged_fields)
            generated_files = []
            for result in self.generation_results:
                if result.path:
                    artifact = create_artifact_for_file(
                        self.batch,
                        path=result.path,
                        artifact_type=RegulatoryInfoPackageArtifact.ArtifactType.GENERATED_DOCUMENT,
                        file_format=result.actual_format,
                        name=result.template_code,
                        metadata=result.__dict__,
                        created_by_node=node.node_code,
                    )
                    result.artifact_id = artifact.pk
                    if result.status in {"success", "fallback_success"}:
                        export = self._create_export(
                            path=result.path,
                            export_type=ExportedSummaryFile.ExportType.WORD,
                            export_category="generated_document",
                        )
                        result.export_id = export.pk
                        self.exports.append(export)
                generated_files.append(result.__dict__)
            self.batch.generated_files = generated_files
            self.batch.save(update_fields=["generated_files"])
            return
        if node.node_code == "highlight_review_items":
            return
        if node.node_code == "trace_export":
            excel_path, json_path = save_traceability_exports(self.batch.work_dir, self.merged_fields)
            create_artifact_for_file(
                self.batch,
                path=json_path,
                artifact_type=RegulatoryInfoPackageArtifact.ArtifactType.TRACEABILITY,
                file_format=RegulatoryInfoPackageArtifact.FileFormat.JSON,
                created_by_node=node.node_code,
            )
            artifact = create_artifact_for_file(
                self.batch,
                path=excel_path,
                artifact_type=RegulatoryInfoPackageArtifact.ArtifactType.TRACEABILITY,
                file_format=RegulatoryInfoPackageArtifact.FileFormat.EXCEL,
                created_by_node=node.node_code,
            )
            export = self._create_export(
                path=str(excel_path),
                export_type=ExportedSummaryFile.ExportType.EXCEL,
                export_category="traceability",
            )
            self.exports.append(export)
            artifact.metadata = {"export_id": export.pk}
            artifact.save(update_fields=["metadata"])
            return
        if node.node_code == "zip_export":
            zip_path = create_zip_package(self.batch.work_dir, self.generation_results, self.batch.output_zip_name)
            artifact = create_artifact_for_file(
                self.batch,
                path=zip_path,
                artifact_type=RegulatoryInfoPackageArtifact.ArtifactType.ZIP_PACKAGE,
                file_format=RegulatoryInfoPackageArtifact.FileFormat.ZIP,
                created_by_node=node.node_code,
            )
            export = self._create_export(
                path=str(zip_path),
                export_type=ExportedSummaryFile.ExportType.ZIP,
                export_category="regulatory_info_package",
            )
            self.exports.insert(0, export)
            artifact.metadata = {"export_id": export.pk}
            artifact.save(update_fields=["metadata"])
            return
        if node.node_code == "notify":
            RegulatoryInfoPackageNotificationRecord.objects.create(
                batch=self.batch,
                recipient=self.batch.user,
                export_ids=[export.pk for export in self.exports],
                message_summary=build_assistant_summary(
                    batch_no=self.batch.batch_no,
                    exports=[
                        {
                            "file_name": export.file_name,
                            "download_url": f"/api/review-agent/file-summary/exports/{export.pk}/download/",
                            "export_type": export.export_type,
                        }
                        for export in self.exports
                    ],
                    failed_files=[item for item in self.batch.generated_files if item.get("status") == "failed"],
                ),
                send_status=RegulatoryInfoPackageNotificationRecord.SendStatus.SUCCESS,
            )
            return

    def _append_completion_message(self) -> None:
        if (
            Message.objects.filter(
                conversation=self.batch.conversation,
                role=Message.Role.ASSISTANT,
                content__contains=self.batch.batch_no,
            )
            .filter(content__contains=self.batch.output_zip_name)
            .exists()
        ):
            return
        exports = list(
            ExportedSummaryFile.objects.filter(
                workflow_type=WORKFLOW_TYPE,
                workflow_batch_id=self.batch.pk,
            )
        )
        exports = sorted(exports, key=lambda export: 0 if export.export_type == ExportedSummaryFile.ExportType.ZIP else 1)
        content = build_assistant_summary(
            batch_no=self.batch.batch_no,
            exports=[
                {
                    "file_name": export.file_name,
                    "download_url": f"/api/review-agent/file-summary/exports/{export.pk}/download/",
                    "export_type": export.export_type,
                }
                for export in exports
            ],
            failed_files=[item for item in self.batch.generated_files if item.get("status") == "failed"],
        )
        Message.objects.create(
            conversation=self.batch.conversation,
            role=Message.Role.ASSISTANT,
            content=content,
        )

    def _create_export(self, *, path: str, export_type: str, export_category: str) -> ExportedSummaryFile:
        from pathlib import Path

        resolved = Path(path)
        return ExportedSummaryFile.objects.create(
            batch=None,
            workflow_type=WORKFLOW_TYPE,
            workflow_batch_id=self.batch.pk,
            export_category=export_category,
            export_type=export_type,
            file_name=resolved.name,
            storage_path=str(resolved),
        )


def start_regulatory_info_package_workflow(
    batch: RegulatoryInfoPackageBatch,
    *,
    async_run: bool | None = None,
) -> None:
    if async_run is None:
        async_run = getattr(settings, "REGULATORY_INFO_PACKAGE_ASYNC", True)
    executor = RegulatoryInfoPackageWorkflowExecutor(batch)
    if async_run:
        Thread(target=executor.run, daemon=True).start()
    else:
        executor.run()
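The executor walks the node rows in creation order, skips any node already marked `SUCCESS` or `SKIPPED`, and turns the first exception into a failed batch with the message stored in `error_message`. A failing node is left in its running state, so a later `run()` on the same rows re-enters it. The in-memory sketch below mirrors that control flow; the status strings and the `NodeSketch`/`BatchSketch`/`run_batch` names are hypothetical stand-ins for the Django models.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-ins for WorkflowNodeRun.Status / RegulatoryInfoPackageBatch.Status values.
PENDING, RUNNING, SUCCESS, SKIPPED, FAILED = "pending", "running", "success", "skipped", "failed"


@dataclass
class NodeSketch:
    code: str
    status: str = PENDING
    progress: int = 0


@dataclass
class BatchSketch:
    nodes: list[NodeSketch]
    status: str = PENDING
    error_message: str = ""
    log: list[str] = field(default_factory=list)


def run_batch(batch: BatchSketch, handlers: dict[str, Callable[[], None]]) -> None:
    """Mirror of RegulatoryInfoPackageWorkflowExecutor.run without the database."""
    batch.status = RUNNING
    try:
        for node in batch.nodes:
            if node.status in {SUCCESS, SKIPPED}:
                continue  # finished nodes are not re-executed
            node.status, node.progress = RUNNING, 10
            handlers.get(node.code, lambda: None)()
            node.status, node.progress = SUCCESS, 100
            batch.log.append(node.code)
    except Exception as exc:
        batch.status = FAILED
        batch.error_message = str(exc)
        return
    batch.status = SUCCESS


calls = {"count": 0}


def flaky_generate() -> None:
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("template missing")


batch = BatchSketch([NodeSketch("prepare"), NodeSketch("generate_docs"), NodeSketch("zip_export")])
run_batch(batch, {"generate_docs": flaky_generate})
print(batch.status, batch.error_message, batch.log)  # failed template missing ['prepare']
run_batch(batch, {"generate_docs": flaky_generate})
print(batch.status, batch.log)  # success ['prepare', 'generate_docs', 'zip_export']
```

One design consequence worth keeping in mind: the real executor holds intermediate results (`self.instruction`, `self.merged_fields`, `self.generation_results`) in memory, so a fresh executor that skips already-finished nodes also skips rebuilding that state. Setting `REGULATORY_INFO_PACKAGE_ASYNC = False` runs `executor.run()` inline instead of on a daemon thread, which is the easier mode to test.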

@@ -19,6 +19,12 @@ from .application_form_fill.workflow import (
    find_latest_successful_summary_batch as find_latest_successful_form_fill_summary_batch,
    start_application_form_fill_workflow,
)
from .regulatory_info_package.constants import WORKFLOW_TYPE as REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE
from .regulatory_info_package.services.input_select import select_instruction_input
from .regulatory_info_package.workflow import (
    create_regulatory_info_package_batch,
    start_regulatory_info_package_workflow,
)
from .regulatory_review.workflow import (
    create_regulatory_review_batch,
    find_latest_successful_summary_batch,
@@ -342,6 +348,56 @@ def stream_message(conversation: Conversation, content: str):
        )
        return
    if route.starts_regulatory_info_package:
        selection = select_instruction_input(conversation, content)
        if selection.status != "selected":
            reply_content = selection.message or "请先在当前对话右侧上传产品说明书 docx 文件,然后再发送第1章监管信息生成指令。"
            assistant_message = append_assistant_message(conversation, reply_content)
            yield sse_event("chunk", {"delta": reply_content})
            yield sse_event(
                "done",
                {
                    "assistant_message_id": assistant_message.pk,
                    "conversation_id": conversation.pk,
                    "title": conversation.title,
                },
            )
            return
        batch = create_regulatory_info_package_batch(
            conversation=conversation,
            user=conversation.user,
            trigger_message=user_message,
            source_attachment=selection.attachment,
            source_summary_batch=selection.source_summary_batch,
            source_summary_item_id=selection.source_summary_item_id,
            source_file_name=selection.file_name,
            source_storage_path=selection.storage_path,
        )
        start_regulatory_info_package_workflow(
            batch,
            async_run=getattr(settings, "REGULATORY_INFO_PACKAGE_ASYNC", True),
        )
        reply_content = f"已启动第1章监管信息材料包生成工作流,批次号:{batch.batch_no}"
        assistant_message = append_assistant_message(conversation, reply_content)
        yield sse_event(
            "workflow_started",
            {
                "workflow_type": REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE,
                "batch_id": batch.pk,
                "batch_no": batch.batch_no,
            },
        )
        yield sse_event("chunk", {"delta": reply_content})
        yield sse_event(
            "done",
            {
                "assistant_message_id": assistant_message.pk,
                "conversation_id": conversation.pk,
                "title": conversation.title,
            },
        )
        return
    if route.starts_regulatory_review:
        source_summary_batch = find_latest_successful_summary_batch(conversation)
        if not source_summary_batch:

@@ -11,6 +11,10 @@ from .file_summary.workflow_trigger import (
from .application_form_fill.constants import FORM_FILL_TRIGGER_KEYWORDS, WORKFLOW_TYPE as FORM_FILL_WORKFLOW_TYPE
from .llm import LLMConfigurationError, LLMRequestError, generate_completion
from .models import Conversation, FileAttachment
from .regulatory_info_package.constants import (
    REGULATORY_INFO_PACKAGE_TRIGGER_KEYWORDS,
    WORKFLOW_TYPE as REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE,
)


logger = logging.getLogger(__name__)
@@ -18,6 +22,7 @@ logger = logging.getLogger(__name__)
ROUTE_ACTIONS = {"normal_chat", "attachment_reader", "file_summary"}
ROUTE_ACTIONS.add("regulatory_review")
ROUTE_ACTIONS.add(FORM_FILL_WORKFLOW_TYPE)
ROUTE_ACTIONS.add(REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE)


@dataclass(frozen=True)
@@ -45,6 +50,10 @@ class SkillRoute:
    def starts_application_form_fill(self) -> bool:
        return self.action == FORM_FILL_WORKFLOW_TYPE

    @property
    def starts_regulatory_info_package(self) -> bool:
        return self.action == REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE

    @property
    def is_normal_chat(self) -> bool:
        return self.action == "normal_chat"
@@ -80,6 +89,14 @@ def route_message_intent(conversation: Conversation, content: str) -> SkillRoute
def _deterministic_workflow_route(conversation: Conversation, content: str) -> SkillRoute | None:
    if _matches_regulatory_info_package(content):
        return SkillRoute(
            action=REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE,
            workflow_type=REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE,
            confidence=0.9,
            reason="命中明确第1章监管信息材料包生成关键词。",
            source="rule_preflight",
        )
    if _matches_application_form_fill(content):
        return SkillRoute(
            action=FORM_FILL_WORKFLOW_TYPE,
@@ -144,7 +161,9 @@ def _route_with_llm(
    return SkillRoute(
        action=action,
        skill_name="attachment_reader" if action == "attachment_reader" else "",
        workflow_type=action
        if action in {"file_summary", "regulatory_review", FORM_FILL_WORKFLOW_TYPE, REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE}
        else "",
        confidence=_float_or_zero(payload.get("confidence")),
        reason=str(payload.get("reason") or ""),
        source="llm",
@@ -152,6 +171,15 @@ def _route_with_llm(
def _route_with_rules(conversation: Conversation, content: str) -> SkillRoute:
    if _matches_regulatory_info_package(content):
        return SkillRoute(
            action=REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE,
            workflow_type=REGULATORY_INFO_PACKAGE_WORKFLOW_TYPE,
            confidence=0.7,
            reason="命中第1章监管信息材料包生成关键词。",
            source="rule_fallback",
        )
    if _matches_application_form_fill(content):
        return SkillRoute(
            action=FORM_FILL_WORKFLOW_TYPE,
@@ -210,11 +238,12 @@ def _router_system_prompt() -> str:
    return (
        "你是审核智能体的工具路由器,只判断是否需要调用工具,不直接回答用户。"
        "你必须只输出 JSON 对象,不要输出 Markdown。"
        "可选 action:normal_chat、attachment_reader、file_summary、regulatory_review、application_form_fill、regulatory_info_package。"
        "attachment_reader 用于用户要求阅读、提取、分析、总结、查看上传附件内容。"
        "file_summary 用于用户要求自动汇总文件目录、页数、清单或生成目录页数报告。"
        "regulatory_review 用于用户要求法规核查、NMPA核查、完整性核查、章节一致性核查、风险预警或整改建议。"
        "application_form_fill 用于用户要求填注册证、生成申报模板、填写对应表格、安全和性能基本原则清单或自动填表。"
        "regulatory_info_package 用于用户要求根据说明书生成第1章监管信息、监管信息材料包、申请表、产品列表或声明材料包。"
        "normal_chat 用于不需要读取附件或执行工作流的一般问答。"
        "输出字段:action、confidence、reason。"
    )
@@ -268,6 +297,11 @@ def _matches_regulatory_review(content: str) -> bool:
    return any(keyword in normalized for keyword in keywords)


def _matches_regulatory_info_package(content: str) -> bool:
    normalized = "".join((content or "").lower().split())
    return any("".join(keyword.lower().split()) in normalized for keyword in REGULATORY_INFO_PACKAGE_TRIGGER_KEYWORDS)


def _matches_application_form_fill(content: str) -> bool:
    normalized = content.lower()
    return any(keyword.lower() in normalized for keyword in FORM_FILL_TRIGGER_KEYWORDS)
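Unlike the form-fill matcher, `_matches_regulatory_info_package` removes all whitespace from both the message and each keyword before comparing, so spaced-out input such as "第 1 章 监管信息" still matches. The package check also runs before the form-fill check in both rule paths, so a message that matches both goes to the package workflow. A standalone sketch of that behavior; the keyword lists are hypothetical, since `REGULATORY_INFO_PACKAGE_TRIGGER_KEYWORDS` and `FORM_FILL_TRIGGER_KEYWORDS` are defined in constants files outside this diff.

```python
# Hypothetical keywords; the real lists live in the constants modules.
PACKAGE_KEYWORDS = ["第1章监管信息", "监管信息材料包"]
FORM_FILL_KEYWORDS = ["自动填表"]


def _normalize(text: str) -> str:
    # Lower-case and drop every whitespace character, like _matches_regulatory_info_package.
    return "".join((text or "").lower().split())


def matches_package(content: str) -> bool:
    normalized = _normalize(content)
    return any(_normalize(keyword) in normalized for keyword in PACKAGE_KEYWORDS)


def route(content: str) -> str:
    # Package check first, mirroring the order in the deterministic and fallback routers.
    if matches_package(content):
        return "regulatory_info_package"
    if any(keyword.lower() in content.lower() for keyword in FORM_FILL_KEYWORDS):
        return "application_form_fill"
    return "normal_chat"


print(route("请根据说明书生成 第 1 章 监管信息"))  # regulatory_info_package
print(route("帮我自动填表"))  # application_form_fill
print(route("你好"))  # normal_chat
```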

@@ -21,6 +21,10 @@ from .application_form_fill.views import (
batch_status as application_form_fill_batch_status,
start as application_form_fill_start,
)
from .regulatory_info_package.views import (
batch_status as regulatory_info_package_batch_status,
start as regulatory_info_package_start,
)
from .views import (
knowledge_base_document_detail,
knowledge_base_document_index,
@@ -112,6 +116,16 @@ urlpatterns = [
application_form_fill_batch_status,
name="application_form_fill_batch_status",
),
path(
"api/review-agent/regulatory-info-package/start/",
regulatory_info_package_start,
name="regulatory_info_package_start",
),
path(
"api/review-agent/regulatory-info-package/<int:batch_id>/status/",
regulatory_info_package_batch_status,
name="regulatory_info_package_batch_status",
),
path(
"api/review-agent/knowledge-base/status/",
knowledge_base_status,

@@ -16,7 +16,15 @@ from .services import (
send_message,
stream_message,
)
from .models import ApplicationFormFillBatch, Conversation, FileAttachment, FileSummaryBatch, RegulatoryReviewBatch, WorkflowNodeRun from .models import (
ApplicationFormFillBatch,
Conversation,
FileAttachment,
FileSummaryBatch,
RegulatoryInfoPackageBatch,
RegulatoryReviewBatch,
WorkflowNodeRun,
)
from .knowledge_base import build_knowledge_base_context, search_knowledge_base
from .knowledge_base import (
build_knowledge_base_context_for_user,
@@ -329,6 +337,25 @@ def build_workflow_cards(conversation: Conversation) -> list[dict[str, object]]:
),
}
)
rip_batches = RegulatoryInfoPackageBatch.objects.filter(conversation=conversation, is_deleted=False)
for batch in rip_batches:
cards.append(
{
"id": batch.pk,
"workflow_type": "regulatory_info_package",
"batch_no": batch.batch_no,
"status": batch.status,
"error_message": batch.error_message,
"risk_label": _format_regulatory_info_package_label(batch),
"created_at": batch.created_at,
"nodes": list(
WorkflowNodeRun.objects.filter(
workflow_type="regulatory_info_package",
workflow_batch_id=batch.pk,
).order_by("id")
),
}
)
return sorted(cards, key=lambda item: item["created_at"], reverse=True)[:5]
@@ -374,6 +401,20 @@ def _format_form_fill_label(batch: ApplicationFormFillBatch) -> str:
return " · ".join(parts)
def _format_regulatory_info_package_label(batch: RegulatoryInfoPackageBatch) -> str:
parts = []
if batch.product_name:
parts.append(batch.product_name)
if batch.generated_files:
success_count = sum(1 for item in batch.generated_files if item.get("status") in {"success", "fallback_success"})
parts.append(f"生成 {success_count}/7")
if batch.missing_fields:
parts.append(f"缺失 {len(batch.missing_fields)}")
if batch.conflict_fields:
parts.append(f"冲突 {len(batch.conflict_fields)}")
return " · ".join(parts)
def build_home_dashboard_context(user) -> dict[str, object]:
conversations = Conversation.objects.filter(user=user)
active_attachments = FileAttachment.objects.filter(user=user).exclude(
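`_format_regulatory_info_package_label` builds the card subtitle from four optional parts: the product name, the generated/total count (with both `success` and `fallback_success` counted as generated), the missing-field count, and the conflict-field count. The sketch below reproduces that logic against a hypothetical `BatchStub` dataclass instead of the Django model, so it runs without a database. Unlike the view, which hard-codes the denominator `7`, the sketch takes it as a parameter `expected_total`, which defaults to 7:

```python
from dataclasses import dataclass, field

SUCCESS_STATUSES = {"success", "fallback_success"}


@dataclass
class BatchStub:
    """Hypothetical stand-in for RegulatoryInfoPackageBatch with only the fields the label reads."""
    product_name: str = ""
    generated_files: list = field(default_factory=list)
    missing_fields: list = field(default_factory=list)
    conflict_fields: list = field(default_factory=list)


def format_label(batch: BatchStub, expected_total: int = 7) -> str:
    parts = []
    if batch.product_name:
        parts.append(batch.product_name)
    if batch.generated_files:
        # A normal fill and a fallback fill both count as generated.
        success_count = sum(1 for item in batch.generated_files if item.get("status") in SUCCESS_STATUSES)
        parts.append(f"生成 {success_count}/{expected_total}")
    if batch.missing_fields:
        parts.append(f"缺失 {len(batch.missing_fields)}")
    if batch.conflict_fields:
        parts.append(f"冲突 {len(batch.conflict_fields)}")
    # Empty parts are skipped entirely, so a fresh batch yields an empty label.
    return " · ".join(parts)
```

Making the total a parameter keeps the label correct if the number of generated artifacts changes. The template config tests in this diff, for example, load six templates.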

@@ -517,6 +517,8 @@
attributeName = "data-regulatory-status-url-template";
} else if (workflow_type === "application_form_fill") {
attributeName = "data-application-form-fill-status-url-template";
} else if (workflow_type === "regulatory_info_package") {
attributeName = "data-regulatory-info-package-status-url-template";
}
return templateUrl(attributeName, "__batch_id__", batchId);
}

@@ -225,6 +225,11 @@
type="button"
data-prompt-template="请基于当前对话最近成功汇总的产品资料,自动提取产品关键信息并填入申报文件模板"
>申报文件填表</button>
<button
class="tool-chip"
type="button"
data-prompt-template="根据说明书生成第1章监管信息"
>第1章监管信息</button>
</div>
<button class="send-button" type="submit" id="sendButton">发送</button>
</div>
@@ -241,6 +246,7 @@
data-status-url-template="/api/review-agent/file-summary/__batch_id__/status/"
data-regulatory-status-url-template="/api/review-agent/regulatory-review/__batch_id__/status/"
data-application-form-fill-status-url-template="/api/review-agent/application-form-fill/__batch_id__/status/"
data-regulatory-info-package-status-url-template="/api/review-agent/regulatory-info-package/__batch_id__/status/"
data-events-url-template="/api/review-agent/file-summary/__batch_id__/events/"
>
<section class="summary-section upload-section">

tests/conftest.py

@@ -0,0 +1,8 @@
import pytest
@pytest.fixture(autouse=True)
def mock_regulatory_info_package_page_count(monkeypatch):
from review_agent.regulatory_info_package.services import package_generate
monkeypatch.setattr(package_generate, "count_document_pages", lambda _path: 1)

@@ -0,0 +1,88 @@
import json
from review_agent.regulatory_info_package.schemas import InstructionExtractResult
from review_agent.regulatory_info_package.services.field_extract import extract_fields_by_rules, run_parallel_extract
def test_extract_fields_by_rules_finds_product_name_and_storage():
instruction = InstructionExtractResult(
source_file_name="目标产品说明书.docx",
paragraphs=["产品名称:新型冠状病毒检测试剂盒", "储存条件2-8℃保存"],
sections={},
tables=[],
component_tables=[],
front_text="产品名称:新型冠状病毒检测试剂盒\n储存条件2-8℃保存",
)
result = extract_fields_by_rules(instruction)
assert result["product_name"]["value"] == "新型冠状病毒检测试剂盒"
assert result["storage_condition"]["value"] == "2-8℃保存"
def test_extract_fields_by_rules_uses_registrant_or_manufacturer_for_applicant():
instruction = InstructionExtractResult(
source_file_name="目标产品说明书.docx",
paragraphs=[
"注册人/售后服务单位名称:卡尤迪生物科技宜兴有限公司",
"生产企业名称:卡尤迪生物科技宜兴有限公司",
"生产企业住所宜兴经济技术开发区杏里路10号宜兴光电产业园4幢101室、102室",
"联系方式: 0510-80330909, 0510-80330919",
"生产地址江苏省宜兴经济技术开发区杏里路10号宜兴光电产业园4幢102室",
],
sections={},
tables=[],
component_tables=[],
front_text="",
)
result = extract_fields_by_rules(instruction)
assert result["applicant_name"]["value"] == "卡尤迪生物科技宜兴有限公司"
assert result["manufacturer_name"]["value"] == "卡尤迪生物科技宜兴有限公司"
assert result["applicant_address"]["value"] == "宜兴经济技术开发区杏里路10号宜兴光电产业园4幢101室、102室"
assert result["applicant_contact"]["value"] == "0510-80330909, 0510-80330919"
assert result["production_address"]["value"] == "江苏省宜兴经济技术开发区杏里路10号宜兴光电产业园4幢102室"
def test_extract_fields_by_rules_serializes_component_table_and_notes():
instruction = InstructionExtractResult(
source_file_name="目标产品说明书.docx",
paragraphs=[],
sections={"【主要组成成分】": "表1 规格A大包装试剂盒组成成分\n注:不同批号试剂盒中各组分不得互换使用。"},
tables=[],
component_tables=[
{
"header": ["组分", "主要组成成分", "规格24人份/盒)", "规格48人份/盒)"],
"rows": [
["PCR反应液 I", "逆转录酶、Taq酶", "840μL/管×1管", "840μL/管×2管"],
["阳性对照品", "含目的片段的假病毒", "600μL/管×2管", "1200μL/管×2管"],
],
}
],
front_text="",
)
result = extract_fields_by_rules(instruction)
payload = json.loads(result["component_table"]["value"])
assert payload["header"][0:2] == ["组分", "主要组成成分"]
assert payload["rows"][0][0] == "PCR反应液 I"
assert result["component_notes"]["value"] == "表1 规格A大包装试剂盒组成成分\n注:不同批号试剂盒中各组分不得互换使用。"
def test_run_parallel_extract_keeps_rule_result_when_llm_fails():
instruction = InstructionExtractResult(
source_file_name="目标产品说明书.docx",
paragraphs=["产品名称:测试产品"],
sections={},
tables=[],
component_tables=[],
front_text="产品名称:测试产品",
)
result = run_parallel_extract(instruction, llm_extract_func=lambda _instruction: (_ for _ in ()).throw(ValueError("bad llm")))
assert result["regex_results"]["product_name"]["value"] == "测试产品"
assert result["llm_results"] == {}
assert result["llm_error"]
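The first and last tests above fix two behaviours. First, a labelled paragraph such as `产品名称:…` yields a field value. Second, an exception from the LLM extractor leaves the rule results intact, with `llm_results == {}` and a non-empty `llm_error`. The sketch below shows that contract in simplified form. It makes three assumptions. It uses a hypothetical two-entry label table, since the real rule set in `field_extract` is not shown. It accepts a full-width colon, an ASCII colon, or no colon after the label. It also runs the two extractors sequentially rather than in parallel:

```python
import re
from typing import Callable

# Hypothetical label table; the real rules in field_extract are not part of this diff.
FIELD_LABELS = {
    "product_name": ("产品名称",),
    "storage_condition": ("储存条件",),
}


def extract_by_labels(paragraphs: list[str]) -> dict[str, dict[str, str]]:
    results: dict[str, dict[str, str]] = {}
    for key, labels in FIELD_LABELS.items():
        for paragraph in paragraphs:
            match = None
            for label in labels:
                # Label, then an optional full-width or ASCII colon, then the value.
                match = re.match(rf"\s*{re.escape(label)}\s*[::]?\s*(.+)", paragraph)
                if match:
                    break
            if match:
                results[key] = {"value": match.group(1).strip(), "evidence": paragraph}
                break  # the first matching paragraph wins
    return results


def extract_with_fallback(paragraphs: list[str], llm_extract: Callable[[list[str]], dict]) -> dict:
    regex_results = extract_by_labels(paragraphs)
    try:
        llm_results, llm_error = llm_extract(paragraphs), ""
    except Exception as exc:  # any LLM failure degrades to rule-only results
        llm_results, llm_error = {}, f"{type(exc).__name__}: {exc}"
    return {"regex_results": regex_results, "llm_results": llm_results, "llm_error": llm_error}
```

Keeping the error as a string, rather than re-raising it, lets the workflow continue and later report which fields came only from rules.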

@@ -0,0 +1,24 @@
from review_agent.regulatory_info_package.services.field_merge import merge_fields
def test_merge_fields_marks_missing_llm_only_and_conflict():
merged, summary = merge_fields(
{
"product_name": {"value": "规则产品", "evidence": "说明书", "confidence": 0.8, "label": "产品名称"},
"applicant_name": {"value": "", "evidence": "", "confidence": 0.0, "label": "申请人名称"},
"package_specification": {"value": "24人份/盒", "evidence": "表格", "confidence": 0.7, "label": "包装规格"},
},
{
"intended_use": {"value": "用于检测", "evidence": "LLM", "confidence": 0.6, "label": "预期用途"},
"package_specification": {"value": "48人份/盒", "evidence": "LLM", "confidence": 0.6, "label": "包装规格"},
},
)
assert merged["applicant_name"].value == "/"
assert merged["applicant_name"].highlight_reason == "missing"
assert merged["intended_use"].highlight_reason == "llm_only"
assert merged["package_specification"].value == "24人份/盒"
assert merged["package_specification"].highlight_reason == "conflict"
assert any(item["field_key"] == "applicant_name" for item in summary["missing_fields"])
assert len(summary["llm_only_fields"]) == 1
assert len(summary["conflict_fields"]) == 1
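The assertions in `test_merge_fields_marks_missing_llm_only_and_conflict` encode a precedence order:

- A rule-extracted value wins.
- A field empty on both sides becomes `/` and is flagged `missing`.
- A value found only by the LLM is flagged `llm_only`.
- A disagreement keeps the rule value but is flagged `conflict`.

One way to express those rules is sketched below. It uses a hypothetical, reduced `Merged` dataclass in place of the project's `MergedField`, and the exact summary entry shape is an assumption beyond the `field_key` key the test checks:

```python
from dataclasses import dataclass

MISSING_PLACEHOLDER = "/"


@dataclass
class Merged:
    """Hypothetical, reduced version of MergedField."""
    key: str
    label: str
    value: str
    source: str
    highlight_reason: str = ""


def merge(rule_results: dict, llm_results: dict) -> tuple[dict[str, Merged], dict[str, list]]:
    merged: dict[str, Merged] = {}
    summary: dict[str, list] = {"missing_fields": [], "llm_only_fields": [], "conflict_fields": []}
    # Rule keys first, then keys only the LLM produced.
    keys = list(rule_results) + [key for key in llm_results if key not in rule_results]
    for key in keys:
        rule_item, llm_item = rule_results.get(key, {}), llm_results.get(key, {})
        rule_value = str(rule_item.get("value") or "").strip()
        llm_value = str(llm_item.get("value") or "").strip()
        label = rule_item.get("label") or llm_item.get("label") or key
        if rule_value and llm_value and rule_value != llm_value:
            # Rules win, but the disagreement is surfaced for review.
            item = Merged(key, label, rule_value, "rule", "conflict")
            summary["conflict_fields"].append({"field_key": key, "rule": rule_value, "llm": llm_value})
        elif rule_value:
            item = Merged(key, label, rule_value, "rule")
        elif llm_value:
            item = Merged(key, label, llm_value, "llm", "llm_only")
            summary["llm_only_fields"].append({"field_key": key})
        else:
            item = Merged(key, label, MISSING_PLACEHOLDER, "none", "missing")
            summary["missing_fields"].append({"field_key": key})
        merged[key] = item
    return merged, summary
```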

@@ -0,0 +1,45 @@
import pytest
from django.urls import reverse
from review_agent.models import Conversation, RegulatoryInfoPackageBatch, WorkflowNodeRun
pytestmark = pytest.mark.django_db
def test_workspace_renders_regulatory_info_package_chip_and_card(client, django_user_model):
user = django_user_model.objects.create_user(username="owner", password="pass")
conversation = Conversation.objects.create(user=user, title="会话")
batch = RegulatoryInfoPackageBatch.objects.create(
conversation=conversation,
user=user,
batch_no="RIP-CARD",
status=RegulatoryInfoPackageBatch.Status.SUCCESS,
generated_files=[{"status": "success"} for _ in range(7)],
)
WorkflowNodeRun.objects.create(
workflow_type="regulatory_info_package",
workflow_batch_id=batch.pk,
node_group="regulatory_info_package",
node_code="zip_export",
node_name="打包下载",
status=WorkflowNodeRun.Status.SUCCESS,
progress=100,
)
client.force_login(user)
response = client.get(f"{reverse('chat')}?conversation={conversation.pk}")
content = response.content.decode("utf-8")
assert "第1章监管信息" in content
assert 'data-workflow-type="regulatory_info_package"' in content
assert "data-regulatory-info-package-status-url-template" in content
assert "RIP-CARD" in content
def test_frontend_selects_regulatory_info_package_status_url():
script = open("static/js/app.js", encoding="utf-8").read()
assert 'workflow_type === "regulatory_info_package"' in script
assert "data-regulatory-info-package-status-url-template" in script

@@ -0,0 +1,48 @@
import pytest
from review_agent.models import Conversation, FileAttachment
from review_agent.regulatory_info_package.services.input_select import select_instruction_input
pytestmark = pytest.mark.django_db
def test_select_instruction_input_prefers_message_filename(django_user_model):
user = django_user_model.objects.create_user(username="owner", password="pass")
conversation = Conversation.objects.create(user=user, title="会话")
selected = FileAttachment.objects.create(
conversation=conversation,
user=user,
original_name="目标产品说明书.docx",
storage_path="uploads/target.docx",
)
FileAttachment.objects.create(
conversation=conversation,
user=user,
original_name="其他说明书.docx",
storage_path="uploads/other.docx",
)
result = select_instruction_input(conversation, "请使用目标产品说明书生成第1章监管信息")
assert result.status == "selected"
assert result.attachment == selected
assert result.file_name == "目标产品说明书.docx"
def test_select_instruction_input_waits_on_multiple_candidates(django_user_model):
user = django_user_model.objects.create_user(username="owner", password="pass")
conversation = Conversation.objects.create(user=user, title="会话")
for name in ["A说明书.docx", "B说明书.docx"]:
FileAttachment.objects.create(
conversation=conversation,
user=user,
original_name=name,
storage_path=f"uploads/{name}",
)
result = select_instruction_input(conversation, "生成第1章监管信息")
assert result.status == "waiting_user"
assert result.candidates == ["A说明书.docx", "B说明书.docx"]
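These two tests show that an attachment named in the message is preferred. They also show that two unnamed candidates put the workflow into `waiting_user`, with a sorted candidate list. The sketch below works on plain file names rather than `FileAttachment` rows. Several pieces are assumptions, not confirmed by the diff:

- the candidate filter (names containing `说明书`)
- matching on the file stem
- the single-candidate shortcut
- the `missing` status

```python
from dataclasses import dataclass, field
from pathlib import PurePath


@dataclass
class Selection:
    status: str  # "selected", "waiting_user" or "missing" (the last is hypothetical)
    file_name: str = ""
    candidates: list[str] = field(default_factory=list)


def select_instruction(file_names: list[str], message: str, marker: str = "说明书") -> Selection:
    # Assumption: any attachment whose name contains the marker is a candidate.
    candidates = sorted(name for name in file_names if marker in name)
    # 1. A file named in the message (by stem, without extension) wins.
    mentioned = [name for name in candidates if PurePath(name).stem in message]
    if len(mentioned) == 1:
        return Selection("selected", mentioned[0])
    # 2. A single candidate needs no confirmation.
    if len(candidates) == 1:
        return Selection("selected", candidates[0])
    if not candidates:
        return Selection("missing")
    # 3. Several candidates: pause and ask the user to choose.
    return Selection("waiting_user", candidates=candidates)
```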

@@ -0,0 +1,16 @@
from pathlib import Path
from review_agent.regulatory_info_package.services.instruction_extract import parse_instruction_docx
def test_parse_instruction_docx_extracts_paragraphs_and_tables():
path = Path("docs/0.原始材料/目标产品说明书.docx")
result = parse_instruction_docx(path)
assert result.source_file_name == "目标产品说明书.docx"
assert result.paragraphs
assert isinstance(result.sections, dict)
assert isinstance(result.tables, list)
assert result.front_text

@@ -0,0 +1,9 @@
from review_agent.regulatory_info_package.services.legacy_doc_document import detect_legacy_doc_capability
def test_detect_legacy_doc_capability_is_stable():
capability = detect_legacy_doc_capability()
assert capability.status in {"available", "unavailable"}
assert capability.adapter in {"WordComDocAdapter", "UnavailableLegacyDocAdapter"}

@@ -0,0 +1,109 @@
import pytest
from django.db import IntegrityError
from review_agent.models import (
Conversation,
ExportedSummaryFile,
FileAttachment,
RegulatoryInfoPackageArtifact,
RegulatoryInfoPackageBatch,
RegulatoryInfoPackageNotificationRecord,
WorkflowNodeRun,
)
pytestmark = pytest.mark.django_db
def test_regulatory_info_package_batch_defaults(django_user_model):
user = django_user_model.objects.create_user(username="owner", password="pass")
conversation = Conversation.objects.create(user=user, title="会话")
attachment = FileAttachment.objects.create(
conversation=conversation,
user=user,
original_name="目标产品说明书.docx",
storage_path="uploads/instruction.docx",
)
batch = RegulatoryInfoPackageBatch.objects.create(
conversation=conversation,
user=user,
source_attachment=attachment,
batch_no="RIP-20260610153000-abcdef",
source_file_name=attachment.original_name,
source_storage_path=attachment.storage_path,
)
assert batch.status == RegulatoryInfoPackageBatch.Status.PENDING
assert batch.output_zip_name == "第1章 监管信息(预生成版).zip"
assert batch.generated_files == []
assert batch.missing_fields == []
assert batch.llm_only_fields == []
assert batch.conflict_fields == []
assert batch.risk_notes == []
assert batch.adapter_summary == {}
assert str(batch) == "RIP-20260610153000-abcdef"
def test_regulatory_info_package_artifact_and_notification(django_user_model):
user = django_user_model.objects.create_user(username="owner", password="pass")
conversation = Conversation.objects.create(user=user, title="会话")
batch = RegulatoryInfoPackageBatch.objects.create(
conversation=conversation,
user=user,
batch_no="RIP-20260610153100-abcdef",
)
artifact = RegulatoryInfoPackageArtifact.objects.create(
batch=batch,
artifact_type=RegulatoryInfoPackageArtifact.ArtifactType.ZIP_PACKAGE,
file_format=RegulatoryInfoPackageArtifact.FileFormat.ZIP,
name="主下载包",
file_name="第1章 监管信息(预生成版).zip",
storage_path="media/regulatory_info_package/package.zip",
)
notification = RegulatoryInfoPackageNotificationRecord.objects.create(
batch=batch,
recipient=user,
export_ids=[1, 2],
message_summary="材料包已生成",
send_status=RegulatoryInfoPackageNotificationRecord.SendStatus.SUCCESS,
)
assert artifact.metadata == {}
assert artifact.is_deleted is False
assert notification.channel == RegulatoryInfoPackageNotificationRecord.Channel.MOCK
assert notification.retry_count == 0
def test_exported_summary_file_supports_zip_type():
values = {value for value, _label in ExportedSummaryFile.ExportType.choices}
assert "zip" in values
def test_workflow_node_run_unique_for_workflow_batch(django_user_model):
user = django_user_model.objects.create_user(username="owner", password="pass")
conversation = Conversation.objects.create(user=user, title="会话")
batch = RegulatoryInfoPackageBatch.objects.create(
conversation=conversation,
user=user,
batch_no="RIP-20260610153200-abcdef",
)
WorkflowNodeRun.objects.create(
workflow_type="regulatory_info_package",
workflow_batch_id=batch.pk,
node_group="regulatory_info_package",
node_code="prepare",
node_name="准备资料",
)
with pytest.raises(IntegrityError):
WorkflowNodeRun.objects.create(
workflow_type="regulatory_info_package",
workflow_batch_id=batch.pk,
node_group="regulatory_info_package",
node_code="prepare",
node_name="准备资料",
)

@@ -0,0 +1,17 @@
import pytest
from review_agent.models import Conversation, RegulatoryInfoPackageBatch, RegulatoryInfoPackageNotificationRecord
pytestmark = pytest.mark.django_db
def test_regulatory_info_package_notification_record_defaults(django_user_model):
user = django_user_model.objects.create_user(username="owner", password="pass")
conversation = Conversation.objects.create(user=user, title="会话")
batch = RegulatoryInfoPackageBatch.objects.create(conversation=conversation, user=user, batch_no="RIP-NOTIFY")
record = RegulatoryInfoPackageNotificationRecord.objects.create(batch=batch, recipient=user)
assert record.channel == RegulatoryInfoPackageNotificationRecord.Channel.MOCK
assert record.send_status == RegulatoryInfoPackageNotificationRecord.SendStatus.PENDING

@@ -0,0 +1,281 @@
import json
import pytest
from docx import Document
from pathlib import Path
from django.conf import settings
from django.utils import timezone
from review_agent.models import Conversation, RegulatoryInfoPackageBatch
from review_agent.regulatory_info_package.services.field_merge import merge_fields
from review_agent.regulatory_info_package.services import package_generate
from review_agent.regulatory_info_package.services.package_generate import generate_package_documents
from review_agent.regulatory_info_package.services.template_config import load_template_config
pytestmark = pytest.mark.django_db
def test_template_config_uses_clean_internal_templates():
config = load_template_config()
source_dir = Path(config["source_dir"])
assert source_dir == settings.BASE_DIR / "review_agent" / "regulatory_info_package" / "templates" / "clean"
assert source_dir.exists()
assert len(config["templates"]) == 6
assert all((source_dir / item["source_file"]).exists() for item in config["templates"])
def test_clean_templates_expose_stable_fill_placeholders():
config = load_template_config()
source_dir = Path(config["source_dir"])
expected_by_code = {
"ch1_2_directory": {"{{product_name}}"},
"ch1_4_application_form": {"{{product_name}}", "{{applicant_name}}"},
"ch1_5_product_list": {"{{product_name}}"},
"ch1_11_1_standards": {"{{product_name}}"},
"ch1_11_5_authenticity": {"{{product_name}}"},
"ch1_11_6_conformity": {"{{product_name}}"},
}
for item in config["templates"]:
document = Document(source_dir / item["source_file"])
text = _document_text(document)
for placeholder in expected_by_code[item["code"]]:
assert placeholder in text
def test_directory_template_includes_page_numbers():
config = load_template_config()
source_dir = Path(config["source_dir"])
item = next(template for template in config["templates"] if template["code"] == "ch1_2_directory")
document = Document(source_dir / item["source_file"])
page_numbers = [row.cells[4].text.strip() for row in document.tables[0].rows[1:]]
assert page_numbers == ["1", "1", "1", "1", "1", "1"]
def test_application_form_template_uses_real_checkbox_symbols():
config = load_template_config()
source_dir = Path(config["source_dir"])
item = next(template for template in config["templates"] if template["code"] == "ch1_4_application_form")
text = _document_text(Document(source_dir / item["source_file"]))
assert "{{复选框}}" not in text
assert "{{}}" not in text
assert "" in text
assert "" in text
def test_generate_package_documents_creates_six_results(django_user_model, tmp_path):
user = django_user_model.objects.create_user(username="owner", password="pass")
conversation = Conversation.objects.create(user=user, title="会话")
batch = RegulatoryInfoPackageBatch.objects.create(
conversation=conversation,
user=user,
batch_no="RIP-20260610154000-abcdef",
work_dir=str(tmp_path),
)
merged, _summary = merge_fields({"product_name": {"value": "测试产品", "label": "产品名称"}}, {})
results = generate_package_documents(batch, load_template_config(), merged)
assert len(results) == 6
assert all(result.status in {"success", "fallback_success"} for result in results), [
(result.template_code, result.status, result.error_message) for result in results
]
assert all(result.path for result in results)
def test_directory_is_generated_last_with_real_page_counts(django_user_model, tmp_path, monkeypatch):
user = django_user_model.objects.create_user(username="owner", password="pass")
conversation = Conversation.objects.create(user=user, title="会话")
batch = RegulatoryInfoPackageBatch.objects.create(
conversation=conversation,
user=user,
batch_no="RIP-20260610154010-abcdef",
work_dir=str(tmp_path),
)
merged, _summary = merge_fields({"product_name": {"value": "测试产品", "label": "产品名称"}}, {})
page_counts = {
"CH1.4 申请表.docx": 3,
"CH1.5 产品列表.docx": 5,
"CH1.11.1 符合标准的清单.docx": 2,
"CH1.11.5 真实性声明.docx": 4,
"CH1.11.6 符合性声明.docx": 6,
}
counted_files = []
def fake_count(path):
counted_files.append(Path(path).name)
return page_counts[Path(path).name]
monkeypatch.setattr(package_generate, "count_document_pages", fake_count, raising=False)
results = generate_package_documents(batch, load_template_config(), merged)
assert results[-1].template_code == "ch1_2_directory"
assert set(counted_files) == set(page_counts)
directory = Document(results[-1].path)
directory_pages = {row.cells[0].text.strip(): row.cells[4].text.strip() for row in directory.tables[0].rows[1:]}
assert directory_pages == {
"CH1.2": "1",
"CH1.4": "3",
"CH1.5": "5",
"CH1.11.1": "2",
"CH1.11.5": "4",
"CH1.11.6": "6",
}
def test_generated_docx_does_not_add_prefill_or_audit_blocks(django_user_model, tmp_path):
user = django_user_model.objects.create_user(username="owner", password="pass")
conversation = Conversation.objects.create(user=user, title="会话")
batch = RegulatoryInfoPackageBatch.objects.create(
conversation=conversation,
user=user,
batch_no="RIP-20260610154100-abcdef",
work_dir=str(tmp_path),
)
merged, _summary = merge_fields({"product_name": {"value": "测试产品", "label": "产品名称"}}, {})
results = generate_package_documents(batch, load_template_config(), merged)
for result in results:
document = Document(result.path)
text = _document_text(document)
assert "预生成版" not in text
assert "预生成字段" not in text
assert "component_table" not in text
assert '"header"' not in text
assert "测试产品" in text
def test_generated_docx_replaces_sample_case_content(django_user_model, tmp_path):
user = django_user_model.objects.create_user(username="owner", password="pass")
conversation = Conversation.objects.create(user=user, title="会话")
batch = RegulatoryInfoPackageBatch.objects.create(
conversation=conversation,
user=user,
batch_no="RIP-20260610154200-abcdef",
work_dir=str(tmp_path),
)
merged, _summary = merge_fields(
{
"product_name": {"value": "测试产品", "label": "产品名称"},
"package_specification": {"value": "24人份/盒48人份/盒", "label": "包装规格"},
},
{},
)
results = generate_package_documents(batch, load_template_config(), merged)
docx_results = [result for result in results if result.actual_format == "docx"]
for result in docx_results:
document = Document(result.path)
text = "\n".join(paragraph.text for paragraph in document.paragraphs)
for table in document.tables:
for row in table.rows:
text += "\n" + "\t".join(cell.text for cell in row.cells)
assert "呼吸道合胞病毒、肺炎支原体核酸检测试剂盒" not in text
product_list = next(result for result in results if result.template_code == "ch1_5_product_list")
product_doc = Document(product_list.path)
table = product_doc.tables[0]
assert table.rows[1].cells[0].text == "24人份/盒"
assert table.rows[1].cells[1].text == "/"
assert "6018003102" not in "\n".join(cell.text for row in table.rows for cell in row.cells)
def test_generated_docs_fill_clean_template_body(django_user_model, tmp_path):
user = django_user_model.objects.create_user(username="owner", password="pass")
conversation = Conversation.objects.create(user=user, title="会话")
batch = RegulatoryInfoPackageBatch.objects.create(
conversation=conversation,
user=user,
batch_no="RIP-20260610154300-abcdef",
work_dir=str(tmp_path),
)
merged, _summary = merge_fields(
{
"product_name": {"value": "甲型流感病毒核酸检测试剂盒", "label": "产品名称"},
"applicant_name": {"value": "星河医疗科技有限公司", "label": "申请人名称"},
"package_specification": {"value": "24人份/盒48人份/盒", "label": "包装规格"},
"standard_no": {"value": "GB/T 29791.1-2013", "label": "标准号"},
},
{},
)
results = generate_package_documents(batch, load_template_config(), merged)
for code in ["ch1_2_directory", "ch1_4_application_form", "ch1_11_5_authenticity", "ch1_11_6_conformity"]:
result = next(item for item in results if item.template_code == code)
text = _document_text(Document(result.path))
assert "甲型流感病毒核酸检测试剂盒" in text
if code == "ch1_4_application_form":
assert "星河医疗科技有限公司" in text
assert "{{" not in text
assert "}}" not in text
today = timezone.localdate().strftime("%Y年%m月%d")
for code in ["ch1_11_1_standards", "ch1_11_5_authenticity", "ch1_11_6_conformity"]:
result = next(item for item in results if item.template_code == code)
text = _document_text(Document(result.path))
assert today in text
assert "xxxx年xx月xx日" not in text
assert "星河医疗科技有限公司" not in text
product_list = next(item for item in results if item.template_code == "ch1_5_product_list")
product_text = _document_text(Document(product_list.path))
assert "24人份/盒" in product_text
assert "48人份/盒" in product_text
def test_product_list_uses_component_table_from_instruction(django_user_model, tmp_path):
user = django_user_model.objects.create_user(username="owner", password="pass")
conversation = Conversation.objects.create(user=user, title="会话")
batch = RegulatoryInfoPackageBatch.objects.create(
conversation=conversation,
user=user,
batch_no="RIP-20260610154400-abcdef",
work_dir=str(tmp_path),
)
component_payload = {
"header": ["组分", "主要组成成分", "规格24人份/盒)", "规格48人份/盒)"],
"rows": [
["PCR反应液 I", "逆转录酶、Taq酶", "840μL/管×1管", "840μL/管×2管"],
["阳性对照品", "含目的片段的假病毒", "600μL/管×2管", "1200μL/管×2管"],
],
}
merged, _summary = merge_fields(
{
"product_name": {"value": "新型冠状病毒核酸检测试剂盒", "label": "产品名称"},
"package_specification": {"value": "24人份/盒48人份/盒", "label": "包装规格"},
"component_table": {
"value": json.dumps(component_payload, ensure_ascii=False),
"label": "主要组成成分",
},
"component_notes": {
"value": "注:不同批号试剂盒中各组分不得互换使用。",
"label": "主要组成成分备注",
},
},
{},
)
results = generate_package_documents(batch, load_template_config(), merged)
product_list = next(result for result in results if result.template_code == "ch1_5_product_list")
document = Document(product_list.path)
text = _document_text(document)
assert "PCR反应液 I" in text
assert "840μL/管×1管" in text
assert "840μL/管×2管" in text
assert "注:不同批号试剂盒中各组分不得互换使用。" in text
assert "RSV&MP" not in text
assert "6018003102" not in text
def _document_text(document: Document) -> str:
text = "\n".join(paragraph.text for paragraph in document.paragraphs)
for table in document.tables:
for row in table.rows:
text += "\n" + "\t".join(cell.text for cell in row.cells)
return text
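`test_directory_is_generated_last_with_real_page_counts` expects the CH1.2 directory to be the last result. Its page column is filled from `count_document_pages` run on the other five documents, and the directory row itself reads `1`. The final mapping step is sketched below. It assumes the chapter code is the text before the first space in each file name, which holds for every name in the test:

```python
DIRECTORY_CODE = "CH1.2"


def directory_page_map(page_counts: dict[str, int], directory_pages: int = 1) -> dict[str, str]:
    """Map chapter codes to page-count strings for the directory table.

    The directory is generated last, so its own page count is passed in separately
    instead of being counted from a file that does not exist yet.
    """
    pages = {DIRECTORY_CODE: str(directory_pages)}
    for file_name, count in page_counts.items():
        # "CH1.11.1 符合标准的清单.docx" -> "CH1.11.1"
        code = file_name.split(" ", 1)[0]
        pages[code] = str(count)
    return pages
```

The directory is generated last because its page column depends on the finished files. The other five documents therefore have to be written and counted before the directory table can be filled.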

@@ -0,0 +1,13 @@
from review_agent.regulatory_info_package.services.summary import build_assistant_summary
def test_build_assistant_summary_puts_zip_first():
exports = [
{"file_name": "CH1.4 申请表.docx", "download_url": "/docx"},
{"file_name": "第1章 监管信息(预生成版).zip", "download_url": "/zip", "export_type": "zip"},
]
summary = build_assistant_summary(batch_no="RIP-1", exports=exports, failed_files=[])
assert summary.index("第1章 监管信息(预生成版).zip") < summary.index("CH1.4 申请表.docx")
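`test_build_assistant_summary_puts_zip_first` only requires the ZIP link to appear before the individual documents. The sketch below shows one way to get that ordering with a stable sort. The summary text format is hypothetical. The "is this a ZIP" check looks at `export_type` or the file extension, which is also an assumption:

```python
def is_zip(item: dict) -> bool:
    return item.get("export_type") == "zip" or str(item.get("file_name", "")).lower().endswith(".zip")


def order_exports(exports: list[dict]) -> list[dict]:
    # sorted() is stable, so non-zip files keep their original relative order.
    return sorted(exports, key=lambda item: 0 if is_zip(item) else 1)


def build_summary(batch_no: str, exports: list[dict], failed_files: list[str]) -> str:
    # Hypothetical wording; the project's real summary text is not shown in this diff.
    lines = [f"Package {batch_no} is ready:"]
    for item in order_exports(exports):
        lines.append(f"- [{item['file_name']}]({item['download_url']})")
    if failed_files:
        lines.append("Not generated: " + ", ".join(failed_files))
    return "\n".join(lines)
```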

@@ -0,0 +1,46 @@
from pathlib import Path
import pytest
from review_agent.regulatory_info_package.constants import DEFAULT_ZIP_NAME
from review_agent.regulatory_info_package.services.template_config import (
compute_config_hash,
load_template_config,
validate_template_config,
)
def test_template_config_loads_six_templates():
config = load_template_config()
assert config["version"] == "regulatory_info_package_templates_v1"
assert config["zip_name"] == DEFAULT_ZIP_NAME
assert len(config["templates"]) == 6
assert {template["code"] for template in config["templates"]} == {
"ch1_2_directory",
"ch1_4_application_form",
"ch1_5_product_list",
"ch1_11_1_standards",
"ch1_11_5_authenticity",
"ch1_11_6_conformity",
}
assert validate_template_config(config) == []
assert compute_config_hash()
def test_template_config_rejects_duplicate_codes():
config = load_template_config()
config["templates"].append(dict(config["templates"][0]))
errors = validate_template_config(config)
assert any("重复" in error for error in errors)
def test_template_config_sources_exist():
config = load_template_config()
source_dir = Path(config["source_dir"])
assert source_dir.exists()
for template in config["templates"]:
assert (source_dir / template["source_file"]).exists()
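The config tests require `validate_template_config` to report duplicate template codes with a message containing `重复`. They require `compute_config_hash()` only to return something non-empty. The sketch below is a minimal version of both checks. The SHA-256-over-canonical-JSON hashing scheme and the exact error wording are assumptions, not the project's confirmed implementation:

```python
import hashlib
import json


def validate_template_codes(config: dict) -> list[str]:
    errors: list[str] = []
    seen: set[str] = set()
    for index, item in enumerate(config.get("templates", [])):
        code = item.get("code")
        if not code:
            errors.append(f"templates[{index}] 缺少 code")
        elif code in seen:
            errors.append(f"模板编码重复: {code}")
        else:
            seen.add(code)
    return errors


def config_hash(config: dict) -> str:
    # Canonical JSON (sorted keys, compact separators) so key order does not change the hash.
    canonical = json.dumps(config, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Hashing a canonical form rather than the raw file bytes means that reordering keys or reformatting the config file does not register as a template change.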

@@ -0,0 +1,28 @@
from pathlib import Path
from openpyxl import load_workbook
from review_agent.regulatory_info_package.schemas import MergedField
from review_agent.regulatory_info_package.services.traceability_export import save_traceability_exports
def test_save_traceability_exports_writes_excel_and_json(tmp_path):
fields = {
"product_name": MergedField(
key="product_name",
label="产品名称",
value="测试产品",
source="rule",
evidence="说明书",
confidence=0.9,
)
}
excel_path, json_path = save_traceability_exports(tmp_path, fields)
assert excel_path.name == "traceability.xlsx"
assert json_path.name == "traceability.json"
assert json_path.exists()
workbook = load_workbook(excel_path)
assert workbook.active["A1"].value == "target_file"

@@ -0,0 +1,19 @@
import pytest
from review_agent.models import Conversation
from review_agent.skill_router import route_message_intent
pytestmark = pytest.mark.django_db
def test_fixed_keyword_routes_to_regulatory_info_package(django_user_model):
user = django_user_model.objects.create_user(username="owner", password="pass")
conversation = Conversation.objects.create(user=user, title="会话")
route = route_message_intent(conversation, "请根据说明书生成第1章监管信息")
assert route.action == "regulatory_info_package"
assert route.workflow_type == "regulatory_info_package"
assert route.starts_regulatory_info_package is True

@@ -0,0 +1,140 @@
from pathlib import Path

import pytest

from review_agent.models import (
    Conversation,
    ExportedSummaryFile,
    RegulatoryInfoPackageBatch,
    WorkflowNodeRun,
)

pytestmark = pytest.mark.django_db


def test_regulatory_info_package_export_download_checks_owner(client, django_user_model, tmp_path):
    owner = django_user_model.objects.create_user(username="owner", password="pass")
    other = django_user_model.objects.create_user(username="other", password="pass")
    conversation = Conversation.objects.create(user=owner, title="会话")
    batch = RegulatoryInfoPackageBatch.objects.create(
        conversation=conversation,
        user=owner,
        batch_no="RIP-20260610153300-abcdef",
    )
    # "Chapter 1 Regulatory Information (pre-generated version).zip"
    path = tmp_path / "第1章 监管信息(预生成版).zip"
    path.write_bytes(b"zip-content")
    exported = ExportedSummaryFile.objects.create(
        batch=None,
        workflow_type="regulatory_info_package",
        workflow_batch_id=batch.pk,
        export_category="regulatory_info_package",
        export_type=ExportedSummaryFile.ExportType.ZIP,
        file_name=path.name,
        storage_path=str(path),
    )

    # A user who does not own the batch gets 404 for the download.
    client.force_login(other)
    denied = client.get(f"/api/review-agent/file-summary/exports/{exported.pk}/download/")
    assert denied.status_code == 404

    # The owner receives the file with a ZIP content type.
    client.force_login(owner)
    allowed = client.get(f"/api/review-agent/file-summary/exports/{exported.pk}/download/")
    assert allowed.status_code == 200
    assert allowed["Content-Type"] == "application/zip"


# The Content-Type must follow the file extension: legacy .doc, OOXML .docx and .zip.
@pytest.mark.parametrize(
    ("file_name", "export_type", "expected"),
    [
        # "CH1.9 Statement on pre-submission communication for the product"
        ("CH1.9 产品申报前沟通的说明.doc", ExportedSummaryFile.ExportType.WORD, "application/msword"),
        (
            # "CH1.4 Application form"
            "CH1.4 申请表.docx",
            ExportedSummaryFile.ExportType.WORD,
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        ),
        ("第1章 监管信息(预生成版).zip", ExportedSummaryFile.ExportType.ZIP, "application/zip"),
    ],
)
def test_regulatory_info_package_download_mime_by_extension(
    client,
    django_user_model,
    tmp_path,
    file_name,
    export_type,
    expected,
):
    user = django_user_model.objects.create_user(username="owner", password="pass")
    conversation = Conversation.objects.create(user=user, title="会话")
    batch = RegulatoryInfoPackageBatch.objects.create(
        conversation=conversation,
        user=user,
        # Derived from the extension, e.g. "RIP-20260610153400-docx".
        batch_no=f"RIP-20260610153400-{Path(file_name).suffix[1:] or 'zip'}",
    )
    path = tmp_path / file_name
    path.write_bytes(b"content")
    exported = ExportedSummaryFile.objects.create(
        batch=None,
        workflow_type="regulatory_info_package",
        workflow_batch_id=batch.pk,
        export_category="generated_document",
        export_type=export_type,
        file_name=file_name,
        storage_path=str(path),
    )
    client.force_login(user)
    response = client.get(f"/api/review-agent/file-summary/exports/{exported.pk}/download/")
    assert response.status_code == 200
    assert response["Content-Type"] == expected


def test_regulatory_info_package_status_returns_nodes_and_zip_first(client, django_user_model, tmp_path):
    user = django_user_model.objects.create_user(username="owner", password="pass")
    conversation = Conversation.objects.create(user=user, title="会话")
    batch = RegulatoryInfoPackageBatch.objects.create(
        conversation=conversation,
        user=user,
        batch_no="RIP-20260610153500-abcdef",
        status=RegulatoryInfoPackageBatch.Status.SUCCESS,
    )
    WorkflowNodeRun.objects.create(
        workflow_type="regulatory_info_package",
        workflow_batch_id=batch.pk,
        node_group="regulatory_info_package",
        node_code="zip_export",
        node_name="打包下载",  # "Package for download"
        status=WorkflowNodeRun.Status.SUCCESS,
        progress=100,
    )
    doc = tmp_path / "CH1.4 申请表.docx"
    zip_file = tmp_path / "第1章 监管信息(预生成版).zip"
    doc.write_bytes(b"doc")
    zip_file.write_bytes(b"zip")
    # The Word export is registered before the ZIP, yet the status payload must list the ZIP first.
    ExportedSummaryFile.objects.create(
        batch=None,
        workflow_type="regulatory_info_package",
        workflow_batch_id=batch.pk,
        export_category="generated_document",
        export_type=ExportedSummaryFile.ExportType.WORD,
        file_name=doc.name,
        storage_path=str(doc),
    )
    ExportedSummaryFile.objects.create(
        batch=None,
        workflow_type="regulatory_info_package",
        workflow_batch_id=batch.pk,
        export_category="regulatory_info_package",
        export_type=ExportedSummaryFile.ExportType.ZIP,
        file_name=zip_file.name,
        storage_path=str(zip_file),
    )
    client.force_login(user)
    response = client.get(f"/api/review-agent/regulatory-info-package/{batch.pk}/status/")
    payload = response.json()
    assert payload["batch"]["workflow_type"] == "regulatory_info_package"
    assert payload["nodes"][0]["node_code"] == "zip_export"
    assert payload["exports"][0]["export_type"] == "zip"
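The download and status tests pin down two behaviours:

- The `Content-Type` follows the file extension (`.doc`, `.docx`, `.zip`).
- The status payload lists the ZIP export first.

Below is a sketch of both under stated assumptions. It uses an explicit suffix table, because Python's `mimetypes` also consults the host's MIME database, so results can vary by machine. It also represents exports as plain dicts with an `export_type` key, using a hypothetical `"word"` value for Word files. `MIME_BY_SUFFIX`, `content_type_for` and `order_exports_zip_first` are hypothetical helpers, not the project's view code.

```python
from pathlib import PurePath
from typing import List

# Explicit table so the result does not depend on the host's MIME database.
MIME_BY_SUFFIX = {
    ".doc": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".zip": "application/zip",
}


def content_type_for(file_name: str, default: str = "application/octet-stream") -> str:
    # PurePath.suffix is the last ".xxx" part, so "CH1.4 申请表.docx" -> ".docx".
    return MIME_BY_SUFFIX.get(PurePath(file_name).suffix.lower(), default)


def order_exports_zip_first(exports: List[dict]) -> List[dict]:
    # False sorts before True, so ZIP entries come first; sorted() is stable,
    # so the remaining exports keep their original relative order.
    return sorted(exports, key=lambda item: item["export_type"] != "zip")
```

Keying the MIME type on the suffix rather than on `export_type` matters here. Both `.doc` and `.docx` files share the `WORD` export type in the tests, yet they need different content types.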

@@ -0,0 +1,92 @@
from pathlib import Path

import pytest

from review_agent.models import Conversation, FileAttachment, Message, RegulatoryInfoPackageBatch, WorkflowNodeRun
from review_agent.regulatory_info_package.constants import (
    REGULATORY_INFO_PACKAGE_NODE_DEFINITIONS,
    WORKFLOW_TYPE,
)
from review_agent.regulatory_info_package.workflow import (
    create_regulatory_info_package_batch,
    start_regulatory_info_package_workflow,
)

pytestmark = pytest.mark.django_db


def test_create_regulatory_info_package_batch_initializes_nodes(django_user_model):
    user = django_user_model.objects.create_user(username="owner", password="pass")
    conversation = Conversation.objects.create(user=user, title="会话")
    batch = create_regulatory_info_package_batch(conversation=conversation, user=user)
    assert batch.batch_no.startswith("RIP-")
    assert batch.work_dir
    nodes = WorkflowNodeRun.objects.filter(
        workflow_type=WORKFLOW_TYPE,
        workflow_batch_id=batch.pk,
    ).order_by("id")
    # Node rows are created in the same order as the (code, name, group) definitions.
    assert [node.node_code for node in nodes] == [
        code for code, _name, _group in REGULATORY_INFO_PACKAGE_NODE_DEFINITIONS
    ]


def test_create_regulatory_info_package_batch_is_node_idempotent(django_user_model):
    user = django_user_model.objects.create_user(username="owner", password="pass")
    conversation = Conversation.objects.create(user=user, title="会话")
    batch = create_regulatory_info_package_batch(conversation=conversation, user=user)
    # Re-initialising an existing batch must not duplicate its node rows.
    create_regulatory_info_package_batch(conversation=conversation, user=user, existing_batch=batch)
    assert WorkflowNodeRun.objects.filter(
        workflow_type=WORKFLOW_TYPE,
        workflow_batch_id=batch.pk,
    ).count() == len(REGULATORY_INFO_PACKAGE_NODE_DEFINITIONS)


def test_empty_workflow_skeleton_completes(django_user_model, settings):
    settings.REGULATORY_INFO_PACKAGE_ASYNC = False
    user = django_user_model.objects.create_user(username="owner", password="pass")
    conversation = Conversation.objects.create(user=user, title="会话")
    batch = create_regulatory_info_package_batch(conversation=conversation, user=user)
    start_regulatory_info_package_workflow(batch, async_run=False)
    batch.refresh_from_db()
    assert batch.status == RegulatoryInfoPackageBatch.Status.SUCCESS
    assert WorkflowNodeRun.objects.filter(
        workflow_type=WORKFLOW_TYPE,
        workflow_batch_id=batch.pk,
        status=WorkflowNodeRun.Status.SUCCESS,
    ).count() == len(REGULATORY_INFO_PACKAGE_NODE_DEFINITIONS)


def test_completed_workflow_appends_download_summary_message(django_user_model, settings):
    settings.REGULATORY_INFO_PACKAGE_ASYNC = False
    user = django_user_model.objects.create_user(username="owner", password="pass")
    conversation = Conversation.objects.create(user=user, title="会话")
    # "Generate Chapter 1 (Regulatory Information) from the instructions for use"
    trigger = Message.objects.create(conversation=conversation, role=Message.Role.USER, content="根据说明书生成第1章监管信息")
    # "docs/0. Original materials/Target product instructions for use.docx"; resolved
    # relative to the working directory, so pytest must run from the repository root.
    source = Path("docs/0.原始材料/目标产品说明书.docx").resolve()
    attachment = FileAttachment.objects.create(
        conversation=conversation,
        user=user,
        original_name="目标产品说明书.docx",
        storage_path=str(source),
        file_size=source.stat().st_size,
    )
    batch = create_regulatory_info_package_batch(
        conversation=conversation,
        user=user,
        trigger_message=trigger,
        source_attachment=attachment,
        source_file_name=attachment.original_name,
        source_storage_path=attachment.storage_path,
    )
    start_regulatory_info_package_workflow(batch, async_run=False)
    # The assistant's summary message mentions the batch number, the ZIP name and a download URL.
    message = conversation.messages.filter(role=Message.Role.ASSISTANT, content__contains=batch.batch_no).latest("id")
    assert "第1章 监管信息(预生成版).zip" in message.content
    assert "/api/review-agent/file-summary/exports/" in message.content
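The workflow tests require two things:

- Node initialisation must be idempotent. Re-running `create_regulatory_info_package_batch` with `existing_batch` must not add rows.
- A finished run must post an assistant message containing the batch number, the ZIP name and a download URL.

Below is a framework-free sketch of both. It assumes an in-memory dict stands in for the `WorkflowNodeRun` table. In Django the same effect is commonly achieved with `QuerySet.get_or_create()`, keyed on workflow type, batch id and node code. `ensure_nodes`, `build_download_summary`, the `example_step` node and the message wording are hypothetical.

```python
from typing import Dict, List, Tuple

# (code, name, group) triples shaped like REGULATORY_INFO_PACKAGE_NODE_DEFINITIONS;
# "example_step" is hypothetical, "zip_export"/"打包下载" appear in the tests.
NODE_DEFINITIONS = (
    ("example_step", "Example step", "regulatory_info_package"),
    ("zip_export", "打包下载", "regulatory_info_package"),
)

DOWNLOAD_URL = "/api/review-agent/file-summary/exports/{export_id}/download/"


def ensure_nodes(store: Dict[tuple, dict], workflow_type: str, batch_id: int,
                 definitions=NODE_DEFINITIONS) -> List[str]:
    """Create missing node rows for a batch; existing rows are left untouched."""
    for code, name, group in definitions:
        key = (workflow_type, batch_id, code)
        # setdefault only inserts when the key is absent, much like get_or_create.
        store.setdefault(key, {"node_name": name, "node_group": group, "status": "pending"})
    # Dicts keep insertion order, so codes come back in definition order.
    return [key[2] for key in store if key[:2] == (workflow_type, batch_id)]


def build_download_summary(batch_no: str, exports: List[Tuple[int, str]]) -> str:
    """Render a plain-text summary with one download link per (export_id, file_name)."""
    lines = [f"Batch {batch_no} finished. Downloads:"]
    for export_id, file_name in exports:
        lines.append(f"- {file_name}: {DOWNLOAD_URL.format(export_id=export_id)}")
    return "\n".join(lines)
```

Because `ensure_nodes` never overwrites an existing key, calling it again for a retried batch preserves each node's recorded status.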

@@ -0,0 +1,22 @@
import zipfile

from review_agent.regulatory_info_package.schemas import GeneratedFileResult
from review_agent.regulatory_info_package.services.zip_export import create_zip_package


def test_create_zip_package_includes_only_success_files(tmp_path):
    success = tmp_path / "ok.docx"
    failed = tmp_path / "bad.docx"
    success.write_bytes(b"ok")
    failed.write_bytes(b"bad")
    zip_path = create_zip_package(
        tmp_path,
        [
            GeneratedFileResult("ok", "ok.docx", "docx", "docx", "success", path=str(success)),
            GeneratedFileResult("bad", "bad.docx", "docx", "docx", "failed", path=str(failed)),
        ],
    )
    # Both files exist on disk; only the one whose status is "success" is archived.
    with zipfile.ZipFile(zip_path) as archive:
        assert archive.namelist() == ["ok.docx"]
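The last test pins down the filtering rule of `create_zip_package`: only results whose status is `"success"` end up in the archive. Below is a standard-library sketch of that rule. It assumes a simplified result type with just `status` and `path`; the real `GeneratedFileResult` takes more positional fields whose names are not shown here. This sketch stores each file under its bare name. `FileResult`, `zip_successful_files` and the default archive name are hypothetical.

```python
import zipfile
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Optional


@dataclass
class FileResult:
    """Hypothetical stand-in for GeneratedFileResult: only the fields the filter needs."""
    status: str
    path: Optional[str] = None


def zip_successful_files(output_dir: Path, results: Iterable[FileResult],
                         zip_name: str = "package.zip") -> Path:
    zip_path = Path(output_dir) / zip_name
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for result in results:
            # Skip failed results and results without a file path.
            if result.status != "success" or not result.path:
                continue
            source = Path(result.path)
            # Guard against a "success" entry whose file has gone missing.
            if source.is_file():
                archive.write(source, arcname=source.name)
    return zip_path
```

Filtering at packaging time means a partially failed run still produces a downloadable archive of the documents that did succeed.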