DEMO-AGENT/docs/详细设计/1.自动汇总.md

# Detailed Design: Automatically Summarizing a Folder's File Directory and Page Counts

## Document Information

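| Item | Content |
| --- | --- |
| Requirements analysis document | docs/需求分析/1.自动汇总.md |
| Functional design document | docs/功能设计/1.自动汇总.md |
| Feature name | Automatically summarize folder file directory and page counts |
| Module | Review agent (review_agent) |
| Design date | 2026-06-05 |
| Design version | V1.0 |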

---

## 1. Detailed Design Goals

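This detailed design guides the implementation of the "automatically summarize folder file directory and page counts" feature. It covers the code layout, data models, API contracts, background workflow, Skill breakdown, lightweight dependencies, the three-column frontend layout, real-time SSE status, error retries, and test cases.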

Core constraints:

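| Constraint | Description |
| --- | --- |
| Conversation binding | Uploaded files are bound to the current Conversation; each conversation has its own set of files, and files must never leak across conversations |
| Store on upload | Files are saved as soon as the user drags in or selects them, but the workflow is not started |
| Prompt trigger | After the user sends a message, the prompt determines whether the auto-summary workflow starts |
| Background execution | The workflow runs asynchronously in the background, and its card in the third (right-hand) column updates in real time |
| Lightweight dependencies | Prefer the Python standard library and lightweight third-party libraries; no hard dependency on LibreOffice |
| Legacy format support | doc, xls, and ppt files enter the pipeline; the page count is recorded when it can be read, otherwise the problem is recorded on the file |
| Result archiving | Batches, files, nodes, events, per-file details, and exported files are all persisted to the database |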

---

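## 2. Code Structure Design

### 2.1 Directory Structure

File-processing capabilities are reorganized into modules inside the existing `review_agent` app. Django models stay centralized in `review_agent/models.py`; all other code goes under `review_agent/file_summary/`.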

```text
review_agent/
  models.py
  urls.py
  views.py
  services.py
  file_summary/
    __init__.py
    constants.py
    schemas.py
    storage.py
    workflow.py
    events.py
    urls.py
    views.py
    services/
      __init__.py
      archive.py
      inventory.py
      page_count.py
      product_detect.py
      report.py
      export_excel.py
      workflow_trigger.py
    skills/
      __init__.py
      base.py
      registry.py
      upload_intake.py
      archive_extract.py
      file_inventory.py
      document_page_count.py
      product_detect.py
      summary_report.py
      excel_export.py
```

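### 2.2 File Responsibilities

| File | Responsibility |
| --- | --- |
| review_agent/models.py | Centrally defines Conversation, Message, and the file-summary models |
| file_summary/constants.py | Constants for statuses, nodes, file types, and event types |
| file_summary/schemas.py | Dataclass input/output structures, so the business layer does not pass loose dicts around |
| file_summary/storage.py | Path generation and saving for uploaded files, working directories, and exported files |
| file_summary/workflow.py | WorkflowExecutor, which runs the node graph serially |
| file_summary/events.py | Workflow event persistence and SSE formatting |
| file_summary/views.py | Upload staging, workflow start, status query, SSE, and download endpoints |
| services/archive.py | Archive detection and zip/7z/rar extraction |
| services/inventory.py | File traversal and inventory generation |
| services/page_count.py | Per-file page counting with up to 3 retries |
| services/product_detect.py | Product name detection |
| services/report.py | Markdown report and in-chat summary table generation |
| services/export_excel.py | Excel export |
| services/workflow_trigger.py | Decides from the prompt whether to trigger the auto-summary workflow |
| skills/base.py | Skill base class and unified result structure |
| skills/registry.py | Skill registration and on-demand loading |
| skills/*.py | One Skill per workflow node |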

---

## 3. Dependency Design

### 3.1 Suggested requirements

```text
Django==5.2.14
pypdf
python-docx
python-pptx
openpyxl
xlrd
olefile
py7zr
```

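### 3.2 Format Handling Strategy

| Format | Library | Counting basis | Failure policy |
| --- | --- | --- | --- |
| pdf | pypdf | Number of PDF pages | Retry 3 times; if it still fails, record the error |
| docx | zipfile + xml.etree (standard library) | `<Pages>` statistic in `docProps/app.xml` (python-docx core properties contain no page count) | If unreadable, mark "page count indeterminate" (`uncertain`) |
| doc | olefile | Page count from OLE metadata | If unreadable, mark "page count indeterminate" (`uncertain`) |
| pptx | python-pptx | Number of slides | Retry 3 times; if it still fails, record the error |
| ppt | olefile | Page or slide count from OLE metadata | If unreadable, mark "page count indeterminate" (`uncertain`) |
| xlsx | openpyxl | Number of worksheets | Retry 3 times; if it still fails, record the error |
| xls | xlrd | Number of worksheets | Retry 3 times; if it still fails, record the error |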
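python-docx's `core_properties` cover fields such as title and author but carry no page count; Word keeps its page statistic in the extended-properties part `docProps/app.xml`. A minimal standard-library sketch of `count_docx` under that assumption follows. `PageCountResult` is repeated from 6.4 so the block runs on its own, and a missing `<Pages>` element (common in files not last saved by Word) maps to `uncertain` rather than `failed`.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
import zipfile
from dataclasses import dataclass
from pathlib import Path

# Namespace of the Office Open XML extended-properties part (docProps/app.xml).
EXT_PROPS_NS = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"


@dataclass
class PageCountResult:  # same shape as in 6.4, repeated so the sketch is self-contained
    status: str
    page_count: int | None = None
    error_message: str = ""


def count_docx(path: Path) -> PageCountResult:
    """Return the <Pages> statistic stored in docProps/app.xml, or 'uncertain'."""
    # zipfile.BadZipFile propagates: a corrupt file is a real failure and gets retried.
    with zipfile.ZipFile(path) as zf:
        try:
            xml_bytes = zf.read("docProps/app.xml")
        except KeyError:  # part absent: readable file, but no page metadata
            return PageCountResult("uncertain", error_message="docProps/app.xml not found")
    pages = ET.fromstring(xml_bytes).find(f"{{{EXT_PROPS_NS}}}Pages")
    text = (pages.text or "").strip() if pages is not None else ""
    if not text.isdigit():
        return PageCountResult("uncertain", error_message="no <Pages> value in docProps/app.xml")
    return PageCountResult("success", page_count=int(text))
```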

### 3.3 Archive Handling Strategy

| Format | Method | Notes |
| --- | --- | --- |
| zip | Standard library zipfile | Required |
| 7z | py7zr | Required |
| rar | System `7z` command preferred | The Docker image must install 7-Zip/p7zip |

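### 3.4 Docker Deployment Notes

The demo does not hard-depend on LibreOffice. If doc/docx/ppt/pptx page counts later need to match Office's pagination exactly, LibreOffice can be installed in headless mode in the Docker image to enable an enhanced strategy: convert each file to PDF, then count the PDF pages.

For reliable RAR extraction, the Docker image needs 7-Zip/p7zip installed, with the `7z` command callable from PATH. Confirm that the installed build can actually decode RAR archives, since some distributions package RAR support separately.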
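A minimal sketch of `extract_rar` calling the system `7z`, assuming 7-Zip's standard `x` (extract with full paths), `-y` (assume yes) and `-o<dir>` (output directory, no space) switches. `ArchiveExtractError` is a local stand-in, and the 600-second timeout is an arbitrary illustrative value. After extraction the sketch re-checks that every file resolves inside the target directory, as a defense-in-depth measure.

```python
import shutil
import subprocess
from pathlib import Path


class ArchiveExtractError(Exception):
    """Local stand-in for the error type named in 6.2."""


def build_7z_command(exe: str, source: Path, target_dir: Path) -> list[str]:
    # x = extract with full paths, -y = assume yes on prompts, -o<dir> = output directory
    return [exe, "x", "-y", f"-o{target_dir}", str(source)]


def extract_rar(source: Path, target_dir: Path, timeout: int = 600) -> list[Path]:
    exe = shutil.which("7z")  # must be on PATH inside the Docker image
    if exe is None:
        raise ArchiveExtractError("7z command not found in PATH")
    target_dir.mkdir(parents=True, exist_ok=True)
    # Argument list, not a shell string, so file names are never shell-interpreted.
    proc = subprocess.run(
        build_7z_command(exe, source, target_dir),
        capture_output=True, text=True, errors="replace", timeout=timeout,
    )
    if proc.returncode != 0:
        raise ArchiveExtractError(proc.stderr.strip() or f"7z exited with code {proc.returncode}")
    root = target_dir.resolve()
    files = [p for p in target_dir.rglob("*") if p.is_file()]
    for p in files:
        # Defense in depth: reject anything whose real path is outside target_dir.
        if not p.resolve().is_relative_to(root):
            raise ArchiveExtractError(f"path escapes target dir: {p}")
    return files
```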

---

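## 4. Data Model Detailed Design

Models are centralized in `review_agent/models.py`, split into a "conversation models" section and a "file summary models" section.

### 4.1 FileAttachment

A file record saved as soon as the user uploads it. No workflow has been started at that point.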

| Field | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | BigAutoField | PK | Primary key |
| conversation | ForeignKey(Conversation) | CASCADE, db_index | Bound conversation |
| user | ForeignKey(User) | CASCADE, db_index | Uploading user |
| original_name | CharField(255) | required | Original file name |
| storage_path | CharField(500) | required | Local storage path |
| file_size | BigIntegerField | default=0 | File size (bytes) |
| content_type | CharField(120) | blank | MIME type |
| upload_status | CharField(20) | choices | uploaded, bound, deleted |
| created_at | DateTimeField | auto_now_add | Upload time |

Indexes:

```text
(conversation, created_at)
(user, created_at)
```

### 4.2 FileSummaryBatch

One run (batch) of the auto-summary workflow.

| Field | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | BigAutoField | PK | Primary key |
| conversation | ForeignKey(Conversation) | CASCADE, db_index | Bound conversation |
| user | ForeignKey(User) | CASCADE, db_index | User who ran the batch |
| trigger_message | ForeignKey(Message) | SET_NULL, null | User message that triggered the workflow |
| batch_no | CharField(64) | unique | Batch number |
| product_name | CharField(200) | blank | Product name |
| status | CharField(20) | choices | pending, running, success, failed |
| total_files | IntegerField | default=0 | Total number of files |
| supported_files | IntegerField | default=0 | Files of a supported type |
| success_files | IntegerField | default=0 | Files counted successfully |
| failed_files | IntegerField | default=0 | Files that failed |
| unsupported_files | IntegerField | default=0 | Files of an unsupported type |
| uncertain_files | IntegerField | default=0 | Files whose page count is indeterminate |
| total_pages | IntegerField | default=0 | Total pages |
| work_dir | CharField(500) | blank | Working directory |
| error_message | TextField | blank | Batch-level error |
| created_at | DateTimeField | auto_now_add | Creation time |
| started_at | DateTimeField | null | Start time |
| finished_at | DateTimeField | null | End time |
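The counters on FileSummaryBatch can be derived from the `statistics_status` values of its FileSummaryItem rows. A minimal ORM-free sketch; the function name is hypothetical, and it assumes that `supported_files` counts files whose page count was attempted (success, failed, or uncertain) and that only `success` rows add to `total_pages`.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def compute_batch_counters(items: Iterable[Tuple[str, Optional[int]]]) -> dict:
    """items: (statistics_status, page_count) pairs for one batch's FileSummaryItem rows."""
    rows = list(items)
    by_status = Counter(status for status, _ in rows)
    return {
        "total_files": len(rows),
        # Assumption: "supported" means a page count was attempted.
        "supported_files": by_status["success"] + by_status["failed"] + by_status["uncertain"],
        "success_files": by_status["success"],
        "failed_files": by_status["failed"],
        "unsupported_files": by_status["unsupported"],
        "uncertain_files": by_status["uncertain"],
        # Only successful counts add pages; None (uncertain/failed) is excluded.
        "total_pages": sum(pages for status, pages in rows if status == "success" and pages),
    }
```

Inside Django the same numbers could come from a single `aggregate()` call using `Count("id", filter=Q(statistics_status="success"))`-style expressions plus `Sum("page_count", filter=...)`.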

### 4.3 FileSummaryBatchAttachment

Join table between batches and uploaded files, so that the workflow reads only the files bound to its own batch.

| Field | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | BigAutoField | PK | Primary key |
| batch | ForeignKey(FileSummaryBatch) | CASCADE | Batch |
| attachment | ForeignKey(FileAttachment) | CASCADE | Uploaded file |
| created_at | DateTimeField | auto_now_add | Binding time |

Unique constraint:

```text
unique(batch, attachment)
```

### 4.4 FileSummaryItem

Per-file detail record.

| Field | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | BigAutoField | PK | Primary key |
| batch | ForeignKey(FileSummaryBatch) | CASCADE, db_index | Owning batch |
| file_index | IntegerField | required | File sequence number |
| directory_level | CharField(300) | blank | Directory level |
| file_name | CharField(255) | required | File name |
| file_type | CharField(20) | required | Extension |
| relative_path | CharField(500) | required | Relative path |
| storage_path | CharField(500) | required | Path actually processed |
| page_count | IntegerField | null | Page count |
| statistics_status | CharField(20) | choices | success, failed, unsupported, uncertain, skipped |
| retry_count | IntegerField | default=0 | Number of retries |
| error_message | TextField | blank | Error description |
| created_at | DateTimeField | auto_now_add | Creation time |
| updated_at | DateTimeField | auto_now | Update time |

Unique constraint:

```text
unique(batch, relative_path)
```

### 4.5 WorkflowNodeRun

Workflow node status record.

| Field | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | BigAutoField | PK | Primary key |
| batch | ForeignKey(FileSummaryBatch) | CASCADE, db_index | Batch |
| node_code | CharField(40) | required | Node code |
| node_name | CharField(80) | required | Node name |
| status | CharField(20) | choices | pending, running, retrying, success, failed, skipped |
| progress | IntegerField | default=0 | Progress percentage (0-100) |
| message | TextField | blank | Node message |
| started_at | DateTimeField | null | Start time |
| finished_at | DateTimeField | null | Finish time |

Unique constraint:

```text
unique(batch, node_code)
```

### 4.6 WorkflowEvent

Persisted SSE events, used to restore state after a page refresh and for debugging.

| Field | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | BigAutoField | PK | Primary key |
| batch | ForeignKey(FileSummaryBatch) | CASCADE, db_index | Batch |
| event_type | CharField(40) | required | Event type |
| payload | JSONField | default=dict | Event payload |
| created_at | DateTimeField | auto_now_add | Creation time |

### 4.7 ExportedSummaryFile

Exported file record.

| Field | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | BigAutoField | PK | Primary key |
| batch | ForeignKey(FileSummaryBatch) | CASCADE, db_index | Batch |
| export_type | CharField(20) | choices | markdown, excel |
| file_name | CharField(255) | required | File name |
| storage_path | CharField(500) | required | Storage path |
| status | CharField(20) | choices | success, failed |
| error_message | TextField | blank | Error |
| created_at | DateTimeField | auto_now_add | Generation time |

Download links are generated at runtime from `export_id`; storing static URLs long-term is not recommended.

---

## 5. Constants and Status Design

### 5.1 Supported Formats

```python
SUPPORTED_PAGE_TYPES = {"pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx"}
ARCHIVE_TYPES = {"zip", "7z", "rar"}
```

### 5.2 Workflow Nodes

```python
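# (node_code, display label shown on the workflow card; the UI language is Chinese)
WORKFLOW_NODES = [
    ("upload", "上传中"),  # Uploading
    ("extract", "解压中"),  # Extracting
    ("inventory", "扫描中"),  # Scanning
    ("page_count", "解析页数中"),  # Counting pages
    ("product_detect", "识别产品名中"),  # Detecting product name
    ("report", "输出 Markdown 中"),  # Writing Markdown
    ("excel_export", "输出 Excel 中"),  # Writing Excel
    ("completed", "已完成"),  # Completed
]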
```

### 5.3 Trigger Keyword Rules

`workflow_trigger.py` starts with rule-based matching; it can later be upgraded to LLM intent recognition.

```python
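# Matched against the user's prompt, so the keywords stay in Chinese.
SUMMARY_TRIGGER_KEYWORDS = [
    "自动汇总",  # automatic summary
    "文件目录",  # file directory
    "页数",  # page count
    "统计文件",  # count files
    "汇总目录",  # summarize directory
    "目录与页数",  # directory and page count
]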
```

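Rules:

| Condition | Result |
| --- | --- |
| The current conversation has unbound or recently uploaded files, and the prompt matches a keyword | Start the auto-summary workflow |
| No keyword matches | Fall through to normal LLM chat |
| A keyword matches but no files have been uploaded | The AI replies "Please upload files or an archive first" |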
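The three rules map onto a small pure function that `stream_chat` could call before branching. `TriggerDecision`, `decide_trigger`, and the `has_pending_attachments` flag are hypothetical names, and plain substring matching is assumed.

```python
from enum import Enum

SUMMARY_TRIGGER_KEYWORDS = ["自动汇总", "文件目录", "页数", "统计文件", "汇总目录", "目录与页数"]


class TriggerDecision(Enum):
    START_WORKFLOW = "start_workflow"  # create a FileSummaryBatch and start it
    NORMAL_CHAT = "normal_chat"        # fall through to the streaming LLM reply
    ASK_FOR_UPLOAD = "ask_for_upload"  # reply "please upload files or an archive first"


def matches_summary_intent(prompt: str) -> bool:
    # Substring match; the keywords are Chinese because user prompts are Chinese.
    return any(keyword in prompt for keyword in SUMMARY_TRIGGER_KEYWORDS)


def decide_trigger(prompt: str, has_pending_attachments: bool) -> TriggerDecision:
    if not matches_summary_intent(prompt):
        return TriggerDecision.NORMAL_CHAT
    if not has_pending_attachments:
        return TriggerDecision.ASK_FOR_UPLOAD
    return TriggerDecision.START_WORKFLOW
```

A broad keyword such as `页数` ("page count") also matches ordinary questions about page counts, so `matches_summary_intent` is the natural place to swap in the LLM intent check mentioned above.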

---

## 6. Services and Method Signatures

### 6.1 storage.py

```python
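from pathlib import Path

from review_agent.models import FileAttachment, FileSummaryBatch


def save_attachment(conversation, user, uploaded_file) -> FileAttachment:
    """Save the uploaded file and bind it to the current conversation."""

def build_batch_work_dir(batch: FileSummaryBatch) -> Path:
    """Build the batch working directory."""

def build_export_path(batch: FileSummaryBatch, suffix: str) -> Path:
    """Build the path for an exported file."""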
```

Storage directory layout:

```text
media/review_agent/
  user_{user_id}/
    conversation_{conversation_id}/
      attachments/
      batches/
        batch_{batch_id}/
          input/
          extracted/
          exports/
```
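A minimal sketch of path builders for this layout. To run without Django it takes plain IDs instead of model instances; `MEDIA_ROOT` stands in for `settings.MEDIA_ROOT`, and naming exports after `batch_no` is an assumption.

```python
from pathlib import Path

MEDIA_ROOT = Path("media")  # assumption: stands in for settings.MEDIA_ROOT


def conversation_root(user_id: int, conversation_id: int) -> Path:
    return MEDIA_ROOT / "review_agent" / f"user_{user_id}" / f"conversation_{conversation_id}"


def attachment_dir(user_id: int, conversation_id: int) -> Path:
    return conversation_root(user_id, conversation_id) / "attachments"


def batch_work_dir(user_id: int, conversation_id: int, batch_id: int) -> Path:
    # Contains input/, extracted/ and exports/ for this batch only.
    return conversation_root(user_id, conversation_id) / "batches" / f"batch_{batch_id}"


def export_path(user_id: int, conversation_id: int, batch_id: int, batch_no: str, suffix: str) -> Path:
    # Hypothetical naming: one file per export type, named after the batch number.
    return batch_work_dir(user_id, conversation_id, batch_id) / "exports" / f"{batch_no}{suffix}"
```

Keying every path by user and conversation keeps the conversation-binding constraint visible on disk: a batch only ever reads from its own conversation directory.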

### 6.2 archive.py

```python
from pathlib import Path


def is_archive(path: Path) -> bool:
    """Return True if the file is an archive (zip, 7z, or rar)."""

def extract_archive(source: Path, target_dir: Path) -> list[Path]:
    """Extract a zip, 7z, or rar archive and return the extracted file paths."""

def extract_zip(source: Path, target_dir: Path) -> list[Path]:
    """Extract with zipfile."""

def extract_7z(source: Path, target_dir: Path) -> list[Path]:
    """Extract with py7zr."""

def extract_rar(source: Path, target_dir: Path) -> list[Path]:
    """Extract a rar archive, preferring the system 7z command."""
```

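Security rules:

| Rule | Description |
| --- | --- |
| Path traversal check | The final path of every extracted entry must stay inside target_dir |
| File name sanitization | Keep the original names, but reject absolute paths and parent-directory (`..`) traversal |
| Extraction failure | Raise ArchiveExtractError; the batch fails |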
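These rules can be enforced for zip files by validating each member name before writing anything. A minimal standard-library sketch of `extract_zip`, with `ArchiveExtractError` as a local stand-in. `ZipFile.extract` already strips absolute prefixes and `..` components silently, whereas the rules here call for rejecting such entries, so the sketch copies members out manually after the check.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


class ArchiveExtractError(Exception):
    """Local stand-in for the ArchiveExtractError named in the security rules."""


def safe_member_path(target_dir: Path, member_name: str) -> Path:
    """Map an archive member name to a destination inside target_dir, or raise."""
    name = PurePosixPath(member_name.replace("\\", "/"))
    # Reject absolute paths, drive letters and ".." hops instead of silently rewriting them.
    if name.is_absolute() or ".." in name.parts or (name.parts and ":" in name.parts[0]):
        raise ArchiveExtractError(f"unsafe path in archive: {member_name}")
    root = target_dir.resolve()
    dest = (root / name).resolve()
    if not dest.is_relative_to(root):  # final containment check on the resolved path
        raise ArchiveExtractError(f"path escapes target dir: {member_name}")
    return dest


def extract_zip(source: Path, target_dir: Path) -> list[Path]:
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted: list[Path] = []
    try:
        with zipfile.ZipFile(source) as zf:
            for info in zf.infolist():
                dest = safe_member_path(target_dir, info.filename)  # check every entry, dirs too
                if info.is_dir():
                    continue
                dest.parent.mkdir(parents=True, exist_ok=True)
                with zf.open(info) as src, open(dest, "wb") as out:
                    shutil.copyfileobj(src, out)
                extracted.append(dest)
    except zipfile.BadZipFile as exc:
        raise ArchiveExtractError(f"invalid zip: {exc}") from exc
    return extracted
```

Zip files created by some Windows tools store Chinese names in a legacy code page without the UTF-8 flag, so they decode as mojibake; on Python 3.11+ `zipfile.ZipFile` accepts a `metadata_encoding` argument that may be worth evaluating for such uploads.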

### 6.3 inventory.py

```python
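from pathlib import Path

from review_agent.models import FileSummaryBatch, FileSummaryItem


def scan_files(batch: FileSummaryBatch, roots: list[Path]) -> list[FileSummaryItem]:
    """Scan directories or loose files and create FileSummaryItem records."""

def build_directory_level(relative_path: Path) -> str:
    """Build the directory level from the relative path."""

def normalize_file_type(path: Path) -> str:
    """Return the lower-case extension without the dot."""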
```
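A sketch of the inventory helpers. The display format of `directory_level` is not specified above, so joining parent folders with " / " (and an empty string for root-level files) is an assumption, as is sorting the scan so that `file_index` stays stable across runs.

```python
from pathlib import Path, PurePosixPath


def build_directory_level(relative_path: PurePosixPath) -> str:
    # Assumed format: parent folders joined by " / "; files at the root get "".
    return " / ".join(relative_path.parts[:-1])


def normalize_file_type(path: PurePosixPath) -> str:
    # Lower-case extension without the dot; "" when there is none.
    return path.suffix.lower().lstrip(".")


def list_relative_files(root: Path) -> list[PurePosixPath]:
    # Sorted so file_index is stable between runs over the same input.
    return sorted(
        PurePosixPath(p.relative_to(root).as_posix()) for p in root.rglob("*") if p.is_file()
    )
```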

### 6.4 page_count.py

```python
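from pathlib import Path

from review_agent.file_summary.schemas import PageCountResult  # assumed location
from review_agent.models import FileSummaryItem


def count_pages(item: FileSummaryItem) -> PageCountResult:
    """Dispatch page counting by file type."""

def count_pages_with_retry(item: FileSummaryItem, max_retry: int = 3) -> PageCountResult:
    """Retry a failed count at most max_retry (default 3) times."""

def count_pdf(path: Path) -> int:
    """Count PDF pages with pypdf."""

def count_docx(path: Path) -> PageCountResult:
    """Read <Pages> from docProps/app.xml; python-docx core properties have no page count."""

def count_doc(path: Path) -> PageCountResult:
    """Read a legacy doc's page count from its OLE metadata with olefile (e.g. num_pages)."""

def count_xlsx(path: Path) -> int:
    """Count worksheets with openpyxl."""

def count_xls(path: Path) -> int:
    """Count worksheets with xlrd."""

def count_pptx(path: Path) -> int:
    """Count slides with python-pptx."""

def count_ppt(path: Path) -> PageCountResult:
    """Read a legacy ppt's page or slide count from its OLE metadata with olefile (e.g. slides)."""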
```

`PageCountResult`：

```python
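from dataclasses import dataclass


@dataclass
class PageCountResult:
    status: str  # success, unsupported, uncertain, or failed (see the status rules below)
    page_count: int | None = None
    error_message: str = ""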
```

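Status rules:

| Case | status | page_count |
| --- | --- | --- |
| Page count read successfully | success | Integer |
| Unsupported type | unsupported | None |
| File is readable but has no page-count metadata | uncertain | None |
| Parse error persists after all retries | failed | None |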

### 6.5 product_detect.py

```python
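from review_agent.file_summary.schemas import ProductDetectResult  # assumed location
from review_agent.models import FileSummaryBatch


def detect_product_name(batch: FileSummaryBatch) -> ProductDetectResult:
    """Detect the product name from directory names, file names, and a little metadata."""

def update_conversation_title(batch: FileSummaryBatch, product_name: str) -> None:
    """Update the conversation title according to the rules."""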
```

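Product name detection priority:

| Priority | Source |
| --- | --- |
| 1 | Top-level directory name |
| 2 | Segments of file names containing keywords such as "产品" (product), "试剂盒" (kit), or "说明书" (instructions for use) |
| 3 | docx document property title |
| 4 | PDF metadata title |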

### 6.6 report.py

```python
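from review_agent.models import ExportedSummaryFile, FileSummaryBatch


def build_summary_stats(batch: FileSummaryBatch) -> dict:
    """Aggregate the summary statistics."""

def build_chat_markdown(batch: FileSummaryBatch) -> str:
    """Build the compact Markdown table shown in the chat."""

def build_full_markdown_report(batch: FileSummaryBatch) -> str:
    """Build the full Markdown report."""

def save_markdown_report(batch: FileSummaryBatch) -> ExportedSummaryFile:
    """Save the Markdown report and create the export record."""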
```
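A sketch of the compact table `build_chat_markdown` could emit, using plain tuples instead of model rows. The columns, the Chinese labels, and the summary line are assumptions (the chat UI is Chinese). Pipes and line breaks in file names are escaped because they would otherwise break the Markdown table.

```python
from typing import Optional, Sequence, Tuple

# Assumed Chinese display labels for statistics_status.
STATUS_LABELS = {
    "success": "成功",
    "failed": "失败",
    "unsupported": "不支持",
    "uncertain": "不可确定",
    "skipped": "跳过",
}

Row = Tuple[int, str, str, Optional[int], str]  # (file_index, relative_path, file_type, page_count, status)


def md_cell(value: object) -> str:
    # "|" would end the cell and a newline would end the row, so neutralize both.
    return str(value).replace("|", "\\|").replace("\r", " ").replace("\n", " ")


def build_chat_markdown(rows: Sequence[Row]) -> str:
    # Columns: No., file, type, pages, status
    lines = ["| 序号 | 文件 | 类型 | 页数 | 状态 |", "| --- | --- | --- | --- | --- |"]
    for index, path, file_type, pages, status in rows:
        page_text = "-" if pages is None else str(pages)
        label = STATUS_LABELS.get(status, status)
        lines.append(f"| {index} | {md_cell(path)} | {md_cell(file_type)} | {page_text} | {label} |")
    total_pages = sum(row[3] for row in rows if row[4] == "success" and row[3])
    lines.append("")
    # Summary line: "N files in total; M pages counted successfully."
    lines.append(f"共 {len(rows)} 个文件，成功统计总页数 {total_pages} 页。")
    return "\n".join(lines)
```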

### 6.7 export_excel.py

```python
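from openpyxl import Workbook

from review_agent.models import ExportedSummaryFile, FileSummaryBatch


def build_excel_workbook(batch: FileSummaryBatch) -> Workbook:
    """Build the Excel workbook."""

def save_excel(batch: FileSummaryBatch) -> ExportedSummaryFile:
    """Save the Excel file and create the export record."""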
```

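Worksheets:

| Sheet | Columns |
| --- | --- |
| 汇总信息 (Summary) | Batch number, product name, total files, succeeded, failed, indeterminate, total pages |
| 文件明细 (File details) | No., directory level, file name, type, page count, relative path, status, retry count, error description |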

---

## 7. Skill Detailed Design

### 7.1 BaseSkill

```python
class BaseSkill:
    name: str
    node_code: str

    def run(self, context: WorkflowContext) -> SkillResult:
        raise NotImplementedError
```

`WorkflowContext`：

```python
from dataclasses import dataclass


@dataclass
class WorkflowContext:
    batch_id: int
    conversation_id: int
    user_id: int
    message_id: int | None = None  # triggering Message, if any
```

`SkillResult`：

```python
from dataclasses import dataclass, field


@dataclass
class SkillResult:
    success: bool
    message: str = ""
    data: dict = field(default_factory=dict)  # a fresh dict per result, never shared
```

### 7.2 Skill List

| Skill class | Node | Service called |
| --- | --- | --- |
| UploadIntakeSkill | upload | storage.py |
| ArchiveExtractSkill | extract | archive.py |
| FileInventorySkill | inventory | inventory.py |
| DocumentPageCountSkill | page_count | page_count.py |
| ProductDetectSkill | product_detect | product_detect.py |
| SummaryReportSkill | report | report.py |
| ExcelExportSkill | excel_export | export_excel.py |
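One reading of "Skill registration and on-demand loading" in `skills/registry.py`, sketched with the standard library: a decorator records each Skill class under its `node_code`, and the executor instantiates a Skill only when its node runs. `register_skill` and `get_skill` are hypothetical names, and `ArchiveExtractSkill.run` is a stub.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorkflowContext:
    batch_id: int
    conversation_id: int
    user_id: int
    message_id: int | None = None


@dataclass
class SkillResult:
    success: bool
    message: str = ""
    data: dict = field(default_factory=dict)


class BaseSkill:
    name: str
    node_code: str

    def run(self, context: WorkflowContext) -> SkillResult:
        raise NotImplementedError


_REGISTRY: dict[str, type[BaseSkill]] = {}


def register_skill(cls: type[BaseSkill]) -> type[BaseSkill]:
    # One Skill per node; a duplicate is a programming error caught at import time.
    if cls.node_code in _REGISTRY:
        raise ValueError(f"duplicate skill for node {cls.node_code!r}")
    _REGISTRY[cls.node_code] = cls
    return cls


def get_skill(node_code: str) -> BaseSkill:
    # A new instance per run, so skills never share state between batches.
    try:
        return _REGISTRY[node_code]()
    except KeyError:
        raise LookupError(f"no skill registered for node {node_code!r}") from None


@register_skill
class ArchiveExtractSkill(BaseSkill):
    name = "ArchiveExtractSkill"
    node_code = "extract"

    def run(self, context: WorkflowContext) -> SkillResult:
        # Stub: the real Skill would call archive.extract_archive for the batch's archives.
        return SkillResult(success=True, message=f"batch {context.batch_id}: nothing to extract")
```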

---

## 8. Workflow Executor Detailed Design

### 8.1 Entry Point

```python
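import threading

from django.db import connection, transaction


def start_file_summary_workflow(batch_id: int) -> None:
    def _run() -> None:
        try:
            WorkflowExecutor().run(batch_id)
        finally:
            connection.close()  # the thread's own DB connection is not closed by Django

    # Start only after the batch row commits, otherwise the thread may not see it yet.
    transaction.on_commit(lambda: threading.Thread(target=_run, daemon=True).start())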
```

### 8.2 Execution Pseudocode

```python
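import logging

logger = logging.getLogger(__name__)


class WorkflowExecutor:
    def run(self, batch_id: int) -> None:
        batch = FileSummaryBatch.objects.get(pk=batch_id)
        self.mark_batch_running(batch)  # status=running, started_at=now
        self.emit("workflow_started", batch, {"batch_id": batch.id})

        try:
            # Nodes run serially; resolve_nodes applies the skip rules in 8.3.
            for node_code in self.resolve_nodes(batch):
                self.run_node(batch, node_code)  # updates WorkflowNodeRun, emits node events
            self.mark_batch_success(batch)
            self.emit("workflow_completed", batch, self.build_completed_payload(batch))
        except Exception as exc:
            # Per-file page-count failures are stored on FileSummaryItem and never reach
            # here; only node-level errors (e.g. ArchiveExtractError) fail the batch.
            logger.exception("file summary batch %s failed", batch_id)
            self.mark_batch_failed(batch, str(exc))
            self.emit("workflow_failed", batch, {"message": str(exc)})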
```

### 8.3 Node Skip Rules

| Node | Skip condition |
| --- | --- |
| extract | The current batch contains no archive |
| product_detect | No file name, directory name, or metadata is available for detection |
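A sketch of how `resolve_nodes` could apply these rules while still returning every node, so skipped nodes can be recorded as `skipped` on the card. The boolean inputs are assumptions about what the executor knows; in practice the `product_detect` condition is only known after `inventory` has run, so a real executor may evaluate it just before that node.

```python
ARCHIVE_TYPES = {"zip", "7z", "rar"}
WORKFLOW_NODE_CODES = [
    "upload", "extract", "inventory", "page_count",
    "product_detect", "report", "excel_export", "completed",
]


def batch_has_archive(file_names: list[str]) -> bool:
    # Extension check only, mirroring ARCHIVE_TYPES; names without a dot never match.
    return any(
        name.rsplit(".", 1)[-1].lower() in ARCHIVE_TYPES for name in file_names if "." in name
    )


def resolve_nodes(has_archive: bool, has_product_sources: bool) -> list[tuple[str, bool]]:
    """Return (node_code, should_run) in execution order; False means record it as skipped."""
    skip = {
        "extract": not has_archive,
        "product_detect": not has_product_sources,
    }
    return [(code, not skip.get(code, False)) for code in WORKFLOW_NODE_CODES]
```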

---

## 9. API Detailed Design

### 9.1 Upload Staging Endpoint

```text
POST /api/review-agent/conversations/{conversation_id}/attachments/
Content-Type: multipart/form-data
```

Request:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| files[] | File[] | Yes | One or more files |

Response:

```json
{
  "attachments": [
    {
      "id": 101,
      "original_name": "注册资料.zip",
      "file_size": 204800,
      "upload_status": "uploaded"
    }
  ]
}
```

Permissions:

```text
conversation.user must equal request.user
```

### 9.2 Send a Message and Trigger the Workflow on Demand

The existing `POST /chat/stream/` SSE endpoint is reused, with a new check added in `stream_chat`:

```text
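User sends a prompt
-> Save the Message
-> Check whether the prompt triggers the auto-summary workflow
-> If it does, create a FileSummaryBatch and start the background workflow
-> Return workflow_meta over SSE
-> Otherwise, fall through to the original streaming LLM reply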
```

New SSE meta payload:

```json
{
  "conversation_id": 1,
  "title": "新对话",
  "workflow": {
    "type": "file_summary",
    "batch_id": 12,
    "status": "running"
  }
}
```

### 9.3 Query Batch Status

```text
GET /api/review-agent/file-summary/{batch_id}/
```

Response:

```json
{
  "batch": {
    "id": 12,
    "batch_no": "FS202606050001",
    "status": "running",
    "product_name": "",
    "total_files": 24,
    "success_files": 10,
    "failed_files": 1,
    "uncertain_files": 2,
    "total_pages": 180
  },
  "nodes": [
    {
      "node_code": "page_count",
      "node_name": "解析页数中",
      "status": "running",
      "progress": 45,
      "message": "正在解析 11/24"
    }
  ],
  "exports": []
}
```

### 9.4 Workflow Event Stream

```text
GET /api/review-agent/file-summary/{batch_id}/events/?after={event_id}
```

Response type: `text/event-stream`

Events:

```text
event: node_progress
data: {"event_id": 301, "batch_id": 12, "node_code": "page_count", "status": "running", "progress": 45, "message": "正在解析 11/24"}
```
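Since events are persisted, the stream can send each WorkflowEvent id as the SSE `id:` field. Browsers' `EventSource` resends the last received id in the `Last-Event-ID` request header when it reconnects, so the view can treat that header like `after`. A minimal sketch of the formatting and replay logic for `events.py`; the function names are hypothetical, and event rows are plain tuples.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator, Tuple

EventRow = Tuple[int, str, dict]  # (WorkflowEvent.id, event_type, payload)


def format_sse(event_id: int, event_type: str, payload: dict) -> str:
    # json.dumps escapes newlines inside strings, so the payload stays on one "data:" line;
    # ensure_ascii=False keeps Chinese messages readable in the stream.
    data = json.dumps({"event_id": event_id, **payload}, ensure_ascii=False)
    return f"id: {event_id}\nevent: {event_type}\ndata: {data}\n\n"


def events_after(rows: Iterable[EventRow], after: int) -> Iterator[str]:
    # Replay only events the client has not seen yet, in id order.
    for event_id, event_type, payload in sorted(rows, key=lambda row: row[0]):
        if event_id > after:
            yield format_sse(event_id, event_type, payload)


def resolve_after(query_after: str | None, last_event_id: str | None) -> int:
    # Prefer the Last-Event-ID header (sent by EventSource on reconnect), then ?after=.
    for raw in (last_event_id, query_after):
        if raw and raw.isdigit():
            return int(raw)
    return 0
```

In Django the generator could be wrapped in `StreamingHttpResponse(..., content_type="text/event-stream")`, polling for new rows until the batch reaches `success` or `failed`.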

### 9.5 Download an Exported File

```text
GET /api/review-agent/file-summary/exports/{export_id}/download/
```

Permissions:

```text
ExportedSummaryFile -> batch -> conversation -> user must be the current user
```

---

## 10. Frontend Detailed Design

### 10.1 Three-Column Layout

The page is reorganized into three columns:

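| Area | Content |
| --- | --- |
| Left sidebar | Conversation history |
| Middle column | Chat messages and input box |
| Right column, top half | Drag-and-drop file import area |
| Right column, bottom half | Workflow card list |

Suggested HTML structure: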

```html
<main class="workspace three-column">
  <aside class="sidebar"></aside>
  <section class="chat-shell"></section>
  <aside class="workflow-panel">
    <section class="upload-dropzone" id="uploadDropzone"></section>
    <section class="workflow-card-list" id="workflowCardList"></section>
  </aside>
</main>
```

### 10.2 Upload Interaction

JS functions:

```javascript
function bindUploadDropzone()
function uploadConversationFiles(files)
function renderAttachmentList(attachments)
```

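Flow:

```text
User drags in or selects files
-> POST to the attachments endpoint
-> On success, the right-hand upload area shows the file names
-> The workflow is not started
-> User sends a prompt
-> If the prompt triggers the workflow, a workflow card is created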
```

### 10.3 Workflow Card

JS functions:

```javascript
function createWorkflowCard(batch)
function updateWorkflowNode(batchId, nodePayload)
function markWorkflowCompleted(batchId, payload)
function markWorkflowFailed(batchId, payload)
function connectWorkflowEvents(batchId)
function restoreWorkflowCards()
```

Card structure (labels are the Chinese UI text from 5.2):

```html
<article class="workflow-card" data-batch-id="12">
  <header>
    <strong>文件目录与页数汇总</strong>
    <span class="workflow-status">运行中</span>
  </header>
  <ol class="workflow-nodes">
    <li data-node-code="upload">上传中</li>
    <li data-node-code="extract">解压中</li>
    <li data-node-code="inventory">扫描中</li>
    <li data-node-code="page_count">解析页数中</li>
    <li data-node-code="product_detect">识别产品名中</li>
    <li data-node-code="report">输出 Markdown 中</li>
    <li data-node-code="excel_export">输出 Excel 中</li>
  </ol>
</article>
```

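### 10.4 Markdown Rendering

Messages are currently rendered with `nl2br`, which cannot render Markdown tables. Rendering needs to change as follows:

| Message type | Rendering strategy |
| --- | --- |
| Regular user message | escapeHtml + nl2br |
| Regular assistant message | Sanitized Markdown rendering |
| File summary result | Sanitized Markdown rendering that allows table, a, strong, and code |

Options:

| Option | Notes |
| --- | --- |
| Frontend marked + DOMPurify | Better rendering experience, but adds frontend dependencies |
| Backend markdown + bleach | The backend emits sanitized HTML and the frontend displays it directly; note that bleach is deprecated upstream |

For the demo, the frontend `marked` + `DOMPurify` approach is recommended, loaded from a CDN or served as local static files.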

---

## 11. Conversation Title Update Design

After a product name is detected, the title is updated:

```python
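DEFAULT_TITLE_PREFIX = "新对话"  # default title of a new conversation ("New conversation")


def update_conversation_title(batch, product_name: str) -> None:
    if not product_name:  # rule: empty product name, no update
        return
    conversation = batch.conversation
    if conversation.title.startswith(DEFAULT_TITLE_PREFIX):  # never overwrite a custom title
        conversation.title = f"{product_name}-文件汇总"[:120]  # "-file summary", cut to 120 chars
        conversation.save(update_fields=["title", "updated_at"])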
```

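Rules:

| Scenario | Handling |
| --- | --- |
| New conversation with the default title | Replace it with `<product name>-文件汇总` ("-file summary") |
| User has already set a custom title | Do not overwrite |
| Product name is empty | Do not update |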

---

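## 12. Test Design

### 12.1 Unit Tests

| Test case | Goal |
| --- | --- |
| test_trigger_keywords | A prompt that matches a keyword triggers the auto summary |
| test_save_attachment_binds_conversation | Uploaded files are bound to the current conversation |
| test_zip_extract_safe_path | Zip extraction rejects path traversal |
| test_scan_files_builds_relative_path | Scanning produces correct relative paths |
| test_count_pdf_pages | PDF page counting |
| test_count_xlsx_sheets | xlsx worksheet counting |
| test_count_pptx_slides | pptx slide counting |
| test_retry_three_times | A failing file is retried 3 times |
| test_uncertain_old_doc | A legacy doc without page metadata is marked uncertain |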

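### 12.2 API Tests

| Test case | Goal |
| --- | --- |
| test_upload_attachment_api | The upload endpoint returns the attachment `id` |
| test_upload_permission_denied | Files cannot be uploaded to another user's conversation |
| test_stream_triggers_workflow | Sending a triggering prompt returns the workflow meta |
| test_batch_status_permission | Another user's batch cannot be queried |
| test_export_download_permission | Another user's exported file cannot be downloaded |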

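### 12.3 Integration Tests

| Test case | Goal |
| --- | --- |
| test_file_summary_zip_workflow | The full workflow succeeds after a zip upload |
| test_file_summary_multi_file_workflow | The full workflow succeeds after a multi-file upload |
| test_single_file_failure_not_blocking | A single-file failure does not block the batch |
| test_workflow_events_created | Node events are written to the database in order |
| test_markdown_and_excel_exports | Markdown and Excel files are generated successfully |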

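### 12.4 Frontend Verification

| Case | Expected result |
| --- | --- |
| Drag-and-drop upload | The right-hand upload area shows the file list |
| Prompt trigger | Sending "自动汇总文件目录与页数" ("auto-summarize the file directory and page counts") creates a workflow card |
| Real-time status updates | SSE events drive node status changes |
| Recovery after page refresh | After a refresh, the right-hand cards restore the current batch status |
| Markdown tables | Tables and download links in chat messages render correctly |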

---

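## 13. Development Order

1. Add dependencies and model fields, and generate migrations.
2. Implement the upload staging endpoint and the storage directory layout.
3. Implement workflow_trigger to decide from the prompt whether to start the workflow.
4. Implement SkillRegistry, WorkflowExecutor, and WorkflowEvent.
5. Implement the archive extraction, file scanning, and page counting services.
6. Implement the Markdown report and Excel export.
7. Rework the frontend into the three-column layout with the drag-and-drop upload area and workflow cards.
8. Add Markdown rendering.
9. Complete the permission and workflow tests, and verify the frontend manually.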

---

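## 14. References

This design follows a lightweight-Python-libraries-first approach, based on the following:

| Capability | Basis |
| --- | --- |
| PDF page count | pypdf's PdfReader exposes `pages` |
| docx metadata | python-docx supports core properties (e.g. title); these do not include a page count, which lives in `docProps/app.xml` |
| pptx slides | python-pptx exposes the presentation's `slides` |
| xlsx worksheets | openpyxl exposes the workbook's worksheets |
| xls worksheets | xlrd supports reading legacy xls workbooks |
| Legacy Office metadata | olefile can read the OLE2 compound document structure |
| 7z extraction | py7zr supports the 7z archive format |
| rar extraction | rarfile usually depends on an external unrar/unar/7z tool, so this design prefers the system 7z command |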