bruce b7a3d512c0 docs(materials): reorganize the document directory and add regulatory materials

2026-06-05 23:39:38 +08:00

Detailed Design: Workflow for Automatically Summarizing a Folder's File Directory and Page Counts

Document Information

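Item	Content
Requirements analysis document	docs/需求分析/1.自动汇总.md
Functional design document	docs/功能设计/1.自动汇总.md
Feature name	Automatic summary of a folder's file directory and page counts
Module	Review agent (review_agent)
Design date	2026-06-05
Design version	V1.0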

1. Detailed Design Goals

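This detailed design guides the implementation of the "automatic summary of a folder's file directory and page counts" feature. It covers the code layout, data models, API contracts, background workflow, Skill decomposition, lightweight dependencies, the three-column front-end layout, real-time status over SSE, error retries, and test cases.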

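Core constraints:

Constraint	Description
Conversation binding	Uploaded files are bound to the current Conversation; each conversation has its own set of files, and files must never leak across conversations
Store on upload	Files are saved as soon as the user drags them in or selects them, but the workflow is not started
Prompt trigger	After the user sends a message, the prompt determines whether the automatic summary workflow starts
Background execution	The workflow runs asynchronously in the background, and the workflow card in the right-hand (third) column updates in real time
Lightweight dependencies	Prefer the Python standard library and lightweight third-party libraries; LibreOffice is not a hard dependency
Legacy format support	doc, xls and ppt files enter the pipeline; the page count is recorded when it can be read, otherwise an error is recorded
Result archiving	Batches, files, nodes, events, detail rows and exported files are all persisted to the database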

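2. Code Structure Design

2.1 Directory Layout

Within the existing review_agent app, file processing is reorganized into its own module. Django models stay centralized in review_agent/models.py; all other code goes under review_agent/file_summary/.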

review_agent/
  models.py
  urls.py
  views.py
  services.py
  file_summary/
    __init__.py
    constants.py
    schemas.py
    storage.py
    workflow.py
    events.py
    urls.py
    views.py
    services/
      __init__.py
      archive.py
      inventory.py
      page_count.py
      product_detect.py
      report.py
      export_excel.py
      workflow_trigger.py
    skills/
      __init__.py
      base.py
      registry.py
      upload_intake.py
      archive_extract.py
      file_inventory.py
      document_page_count.py
      product_detect.py
      summary_report.py
      excel_export.py

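2.2 File Responsibilities

File	Responsibility
review_agent/models.py	Defines Conversation, Message and the file-summary models in one place
file_summary/constants.py	Constants for statuses, nodes, file types and event types
file_summary/schemas.py	dataclass input/output structures, so the business layer does not pass loose dicts around
file_summary/storage.py	Path generation and saving for uploaded files, work directories and export files
file_summary/workflow.py	WorkflowExecutor, which runs the node graph sequentially
file_summary/events.py	Workflow event persistence and SSE formatting
file_summary/views.py	Endpoints for upload staging, workflow start, status query, SSE and download
services/archive.py	Archive detection and zip/7z/rar extraction
services/inventory.py	File traversal and inventory generation
services/page_count.py	Per-file page counting with up to 3 retries
services/product_detect.py	Product name detection
services/report.py	Markdown report and in-chat summary table generation
services/export_excel.py	Excel export
services/workflow_trigger.py	Decides from the prompt whether to trigger the automatic summary workflow
skills/base.py	Skill base class and unified result structure
skills/registry.py	Skill registration and on-demand loading
skills/*.py	One Skill per workflow node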

3. Dependency Design

3.1 Suggested requirements

Django==5.2.14
pypdf
python-docx
python-pptx
openpyxl
xlrd
olefile
py7zr

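3.2 Format Handling Strategy

Format	Library	What is counted	Failure policy
pdf	pypdf	Number of PDF pages	Retry 3 times; if still failing, record an error
docx	python-docx	Built-in page-count property, if present	If unreadable, record "page count undetermined"
doc	olefile	Page count from OLE metadata	If unreadable, record "page count undetermined"
pptx	python-pptx	Number of slides	Retry 3 times; if still failing, record an error
ppt	olefile	Page/slide count from OLE metadata	If unreadable, record "page count undetermined"
xlsx	openpyxl	Number of worksheets	Retry 3 times; if still failing, record an error
xls	xlrd	Number of worksheets	Retry 3 times; if still failing, record an error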

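3.3 Archive Handling Strategy

Format	Handling	Notes
zip	Python standard library zipfile	Required
7z	py7zr	Required
rar	System 7z command preferred	The Docker image must include 7-Zip/p7zip

3.4 Docker Deployment Notes

The demo does not depend on LibreOffice. If doc/docx/ppt/pptx page counts must later match Office's on-screen pagination exactly, LibreOffice headless can be added to the Docker image as an enhancement: convert each file to PDF, then count the PDF pages.

For reliable RAR extraction, install 7-Zip/p7zip in the Docker image and make sure the 7z command is on PATH.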

4. Data Model Design

All models live in review_agent/models.py, split into a "conversation models" section and a "file summary models" section.

4.1 FileAttachment

A file record created as soon as the user uploads. No workflow has been started at this point.

Field	Type	Constraints	Description
id	BigAutoField	PK	Primary key
conversation	ForeignKey(Conversation)	CASCADE, db_index	Bound conversation
user	ForeignKey(User)	CASCADE, db_index	Uploading user
original_name	CharField(255)	required	Original file name
storage_path	CharField(500)	required	Local storage path
file_size	BigIntegerField	default=0	File size
content_type	CharField(120)	blank	MIME type
upload_status	CharField(20)	choices	uploaded, bound, deleted
created_at	DateTimeField	auto_now_add	Upload time

Indexes:

(conversation, created_at)
(user, created_at)

4.2 FileSummaryBatch

One run (batch) of the automatic summary workflow.

Field	Type	Constraints	Description
id	BigAutoField	PK	Primary key
conversation	ForeignKey(Conversation)	CASCADE, db_index	Bound conversation
user	ForeignKey(User)	CASCADE, db_index	User who ran the workflow
trigger_message	ForeignKey(Message)	SET_NULL, null	User message that triggered the workflow
batch_no	CharField(64)	unique	Batch number
product_name	CharField(200)	blank	Product name
status	CharField(20)	choices	pending, running, success, failed
total_files	IntegerField	default=0	Total files
supported_files	IntegerField	default=0	Files of a supported type
success_files	IntegerField	default=0	Files counted successfully
failed_files	IntegerField	default=0	Failed files
unsupported_files	IntegerField	default=0	Unsupported files
uncertain_files	IntegerField	default=0	Files whose page count is undetermined
total_pages	IntegerField	default=0	Total pages
work_dir	CharField(500)	blank	Work directory
error_message	TextField	blank	Batch-level error
created_at	DateTimeField	auto_now_add	Creation time
started_at	DateTimeField	null	Start time
finished_at	DateTimeField	null	End time
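
The batch_no example in 9.3 (FS202606050001) suggests the following format: the prefix FS, the date as YYYYMMDD, and a 4-digit daily sequence. That format is inferred from a single example, not specified. The sketch below is a generator under that assumption. `make_batch_no` is hypothetical. In practice the daily sequence would come from counting that day's batches.

```python
from datetime import date


def make_batch_no(day: date, seq: int, prefix: str = "FS") -> str:
    """Hypothetical: build a batch number such as FS202606050001 (assumed format)."""
    if not 1 <= seq <= 9999:
        raise ValueError("daily sequence must fit in 4 digits")
    return f"{prefix}{day:%Y%m%d}{seq:04d}"
```

Because batch_no is unique, two concurrent requests that pick the same sequence would collide on insert. The caller can catch that and retry with the next number.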

4.3 FileSummaryBatchAttachment

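Join table between a batch and its uploaded files, ensuring the workflow reads only the files of this batch.

Field	Type	Constraints	Description
id	BigAutoField	PK	Primary key
batch	ForeignKey(FileSummaryBatch)	CASCADE	Batch
attachment	ForeignKey(FileAttachment)	CASCADE	Uploaded file
created_at	DateTimeField	auto_now_add	Binding time

Unique constraint: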

unique(batch, attachment)

4.4 FileSummaryItem

Per-file detail record.

Field	Type	Constraints	Description
id	BigAutoField	PK	Primary key
batch	ForeignKey(FileSummaryBatch)	CASCADE, db_index	Owning batch
file_index	IntegerField	required	File sequence number
directory_level	CharField(300)	blank	Directory level
file_name	CharField(255)	required	File name
file_type	CharField(20)	required	Extension
relative_path	CharField(500)	required	Relative path
storage_path	CharField(500)	required	Path actually processed
page_count	IntegerField	null	Page count
statistics_status	CharField(20)	choices	success, failed, unsupported, uncertain, skipped
retry_count	IntegerField	default=0	Retry count
error_message	TextField	blank	Error description
created_at	DateTimeField	auto_now_add	Creation time
updated_at	DateTimeField	auto_now	Update time

Unique constraint:

unique(batch, relative_path)

4.5 WorkflowNodeRun

Workflow node status record.

Field	Type	Constraints	Description
id	BigAutoField	PK	Primary key
batch	ForeignKey(FileSummaryBatch)	CASCADE, db_index	Batch
node_code	CharField(40)	required	Node code
node_name	CharField(80)	required	Node name
status	CharField(20)	choices	pending, running, retrying, success, failed, skipped
progress	IntegerField	default=0	Progress percentage
message	TextField	blank	Node message
started_at	DateTimeField	null	Start time
finished_at	DateTimeField	null	Finish time

Unique constraint:

unique(batch, node_code)

4.6 WorkflowEvent

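Persisted SSE events, used to restore state after a page refresh and for debugging.

Field	Type	Constraints	Description
id	BigAutoField	PK	Primary key
batch	ForeignKey(FileSummaryBatch)	CASCADE, db_index	Batch
event_type	CharField(40)	required	Event type
payload	JSONField	default=dict	Event payload
created_at	DateTimeField	auto_now_add	Creation time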

4.7 ExportedSummaryFile

Exported file record.

Field	Type	Constraints	Description
id	BigAutoField	PK	Primary key
batch	ForeignKey(FileSummaryBatch)	CASCADE, db_index	Batch
export_type	CharField(20)	choices	markdown, excel
file_name	CharField(255)	required	File name
storage_path	CharField(500)	required	Storage path
status	CharField(20)	choices	success, failed
error_message	TextField	blank	Error
created_at	DateTimeField	auto_now_add	Generation time

Download links are generated at request time from export_id; storing static URLs long-term is not recommended.

5. Constants and Status Design

5.1 Supported Formats

SUPPORTED_PAGE_TYPES = {"pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx"}
ARCHIVE_TYPES = {"zip", "7z", "rar"}

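5.2 Workflow Nodes

# (node_code, label shown on the workflow card)
WORKFLOW_NODES = [
    ("upload", "上传中"),  # Uploading
    ("extract", "解压中"),  # Extracting
    ("inventory", "扫描中"),  # Scanning
    ("page_count", "解析页数中"),  # Counting pages
    ("product_detect", "识别产品名中"),  # Detecting product name
    ("report", "输出 Markdown 中"),  # Writing Markdown
    ("excel_export", "输出 Excel 中"),  # Writing Excel
    ("completed", "已完成"),  # Completed
]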

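5.3 Trigger Keyword Rules

workflow_trigger.py starts with rule-based matching and can later be upgraded to LLM intent recognition.

SUMMARY_TRIGGER_KEYWORDS = [
    "自动汇总",  # "automatic summary"
    "文件目录",  # "file directory"
    "页数",  # "page count"
    "统计文件",  # "count files"
    "汇总目录",  # "summarize directory"
    "目录与页数",  # "directory and page count"
]

Rules:

Condition	Result
The current conversation has unbound or recently uploaded files, and the prompt contains a keyword	Start the automatic summary workflow
No keyword matched	Handle as a normal LLM chat
Keyword matched but no files uploaded	The AI replies "请先上传文件或压缩包" ("Please upload files or an archive first")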
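
The rule table above can be sketched as a pure function. This form is easy to cover with test_trigger_keywords. `decide_trigger` and `TriggerDecision` are hypothetical names. Keyword matching is assumed to be a plain substring test. The has_pending_files flag stands for "the conversation has unbound or recently uploaded files".

```python
from enum import Enum

SUMMARY_TRIGGER_KEYWORDS = ["自动汇总", "文件目录", "页数", "统计文件", "汇总目录", "目录与页数"]


class TriggerDecision(Enum):
    START_WORKFLOW = "start_workflow"   # keyword matched and files are available
    NORMAL_CHAT = "normal_chat"         # no keyword: normal streaming LLM reply
    ASK_FOR_UPLOAD = "ask_for_upload"   # keyword matched but nothing uploaded


def decide_trigger(prompt: str, has_pending_files: bool) -> TriggerDecision:
    """Hypothetical rule-based trigger following the table in 5.3."""
    matched = any(keyword in prompt for keyword in SUMMARY_TRIGGER_KEYWORDS)
    if not matched:
        return TriggerDecision.NORMAL_CHAT
    if has_pending_files:
        return TriggerDecision.START_WORKFLOW
    return TriggerDecision.ASK_FOR_UPLOAD
```

Plain substring matching is permissive: "页数" alone fires on any prompt that mentions page counts. This is one reason to keep the LLM intent-recognition upgrade path open.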

6. Services and Method Signatures

6.1 storage.py

def save_attachment(conversation, user, uploaded_file) -> FileAttachment:
    """Save the uploaded file and bind it to the current conversation."""

def build_batch_work_dir(batch: FileSummaryBatch) -> Path:
    """Build the batch work directory."""

def build_export_path(batch: FileSummaryBatch, suffix: str) -> Path:
    """Build the path of an export file."""

Storage layout:

media/review_agent/
  user_{user_id}/
    conversation_{conversation_id}/
      attachments/
      batches/
        batch_{batch_id}/
          input/
          extracted/
          exports/
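
The layout above can be built with pathlib, as in the minimal sketch below. It takes plain ids instead of model instances so it runs without Django. `media_root` stands in for settings.MEDIA_ROOT, and the three function names are hypothetical companions to build_batch_work_dir.

```python
from pathlib import Path

BATCH_SUBDIRS = ("input", "extracted", "exports")


def conversation_root(media_root: Path, user_id: int, conversation_id: int) -> Path:
    """media/review_agent/user_{user_id}/conversation_{conversation_id}/"""
    return media_root / "review_agent" / f"user_{user_id}" / f"conversation_{conversation_id}"


def attachments_dir(media_root: Path, user_id: int, conversation_id: int) -> Path:
    """Where files are stored on upload, before any batch exists."""
    return conversation_root(media_root, user_id, conversation_id) / "attachments"


def batch_work_dir(media_root: Path, user_id: int, conversation_id: int,
                   batch_id: int, create: bool = False) -> Path:
    """Return batches/batch_{id}/, optionally creating input/, extracted/ and exports/."""
    work_dir = conversation_root(media_root, user_id, conversation_id) / "batches" / f"batch_{batch_id}"
    if create:
        for name in BATCH_SUBDIRS:
            (work_dir / name).mkdir(parents=True, exist_ok=True)
    return work_dir
```

Because every path embeds both the user id and the conversation id, files from different conversations never share a directory. This supports the conversation-binding constraint at the filesystem level as well.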

6.2 archive.py

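def is_archive(path: Path) -> bool:
    """Return whether the file is an archive."""

def extract_archive(source: Path, target_dir: Path) -> list[Path]:
    """Extract a zip, 7z or rar archive and return the extracted file paths."""

def extract_zip(source: Path, target_dir: Path) -> list[Path]:
    """Extract with zipfile."""

def extract_7z(source: Path, target_dir: Path) -> list[Path]:
    """Extract with py7zr."""

def extract_rar(source: Path, target_dir: Path) -> list[Path]:
    """Extract a rar archive, preferring the system 7z command."""

Security rules:

Rule	Description
Path traversal check	The final path of every extracted entry must stay inside target_dir
File name sanitization	Keep original names, but reject absolute paths and parent-directory (..) segments
Extraction failure	Raise ArchiveExtractError; the batch fails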
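
The sketch below is a minimal extract_zip that applies both rules with the standard library. A member is rejected if its name is absolute or contains `..`. The resolved destination is then checked to be inside target_dir (Path.is_relative_to needs Python 3.9+). ArchiveExtractError is defined locally, and its shape is an assumption. Size limits against zip bombs are not handled here.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


class ArchiveExtractError(Exception):
    """Assumed shape of the error named in the security rules."""


def extract_zip(source: Path, target_dir: Path) -> list[Path]:
    """Extract a zip archive, rejecting members that would land outside target_dir."""
    target_root = target_dir.resolve()
    extracted: list[Path] = []
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            # Normalise Windows separators so the checks see every path segment.
            name = PurePosixPath(info.filename.replace("\\", "/"))
            if name.is_absolute() or ".." in name.parts:
                raise ArchiveExtractError(f"unsafe member name: {info.filename}")
            dest = target_root.joinpath(*name.parts).resolve()
            # Defence in depth: the resolved path must still be under target_dir.
            if not dest.is_relative_to(target_root):
                raise ArchiveExtractError(f"path escapes target_dir: {info.filename}")
            if info.is_dir():
                dest.mkdir(parents=True, exist_ok=True)
                continue
            dest.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as src, open(dest, "wb") as out:
                shutil.copyfileobj(src, out)  # stream instead of reading into memory
            extracted.append(dest)
    return extracted
```

Archives created on Chinese-locale Windows often store GBK-encoded names without the UTF-8 flag, so names decode as garbage by default. From Python 3.11, `zipfile.ZipFile(source, metadata_encoding="gbk")` can decode them.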

6.3 inventory.py

def scan_files(batch: FileSummaryBatch, roots: list[Path]) -> list[FileSummaryItem]:
    """Scan directories or loose files and create FileSummaryItem records."""

def build_directory_level(relative_path: Path) -> str:
    """Build the directory level from the relative path."""

def normalize_file_type(path: Path) -> str:
    """Return the lowercase extension without the leading dot."""

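The document does not fix the format of directory_level. The sketch below assumes it is the parent directories of the relative path joined with " / ", and an empty string for files at the root. Both helpers take PurePath values, so they can be tested without Django or real files.

```python
from pathlib import PurePath


def normalize_file_type(path: PurePath) -> str:
    """Lowercase extension without the dot, e.g. 'A/B/Report.PDF' -> 'pdf'."""
    return path.suffix.lower().lstrip(".")


def build_directory_level(relative_path: PurePath, sep: str = " / ") -> str:
    """Assumed format: parent directories joined by sep; '' for top-level files."""
    return sep.join(relative_path.parent.parts)
```
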
6.4 page_count.py

def count_pages(item: FileSummaryItem) -> PageCountResult:
    """Dispatch page counting by file type."""

def count_pages_with_retry(item: FileSummaryItem, max_retry: int = 3) -> PageCountResult:
    """On failure, retry at most max_retry (default 3) times."""

def count_pdf(path: Path) -> int:
    """Count PDF pages with pypdf."""

def count_docx(path: Path) -> PageCountResult:
    """Read the built-in page-count property with python-docx."""

def count_doc(path: Path) -> PageCountResult:
    """Read the page count of a legacy .doc file from OLE metadata with olefile."""

def count_xlsx(path: Path) -> int:
    """Count worksheets with openpyxl."""

def count_xls(path: Path) -> int:
    """Count worksheets with xlrd."""

def count_pptx(path: Path) -> int:
    """Count slides with python-pptx."""

def count_ppt(path: Path) -> PageCountResult:
    """Read the page or slide count of a legacy .ppt file from OLE metadata with olefile."""

PageCountResult:

from dataclasses import dataclass

@dataclass
class PageCountResult:
    status: str
    page_count: int | None = None
    error_message: str = ""

Status rules:

Case	status	page_count
Page count read successfully	success	integer
Unsupported file type	unsupported	None
File readable but no page-count metadata	uncertain	None
Parse error that persists after all retries	failed	None

6.5 product_detect.py

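def detect_product_name(batch: FileSummaryBatch) -> ProductDetectResult:
    """Detect the product name from directory names, file names and a small amount of metadata."""

def update_conversation_title(batch: FileSummaryBatch, product_name: str) -> None:
    """Update the conversation title according to the rules."""

Product name detection priority:

Priority	Source
1	Top-level directory name
2	File-name fragments containing keywords such as "产品" (product), "试剂盒" (reagent kit) or "说明书" (instructions for use)
3	docx document property title
4	PDF metadata title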
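
The sketch below covers the first two priority levels using only relative paths. Levels 3 and 4 need document metadata and are left to the caller. The keyword tuple reproduces the three examples from the table. How a "fragment" is cut out of a file name is not specified. This sketch assumes it is the file stem with leading numbering such as "01-" removed. It also assumes level 1 applies only when all files share exactly one top-level directory. `detect_from_paths` is hypothetical.

```python
import re
from pathlib import PurePath

PRODUCT_KEYWORDS = ("产品", "试剂盒", "说明书")  # product, reagent kit, instructions for use


def detect_from_paths(relative_paths: list[PurePath]) -> str:
    """Priority 1: one shared top-level directory; priority 2: keyword file stems."""
    top_dirs = {p.parts[0] for p in relative_paths if len(p.parts) > 1}
    if len(top_dirs) == 1:
        return top_dirs.pop()
    for path in relative_paths:
        stem = re.sub(r"^[\d\s._-]+", "", path.stem)  # drop leading numbering such as "01-"
        if any(keyword in stem for keyword in PRODUCT_KEYWORDS):
            return stem
    return ""  # fall through to docx/PDF title metadata
```
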

6.6 report.py

def build_summary_stats(batch: FileSummaryBatch) -> dict:
    """Aggregate the summary statistics."""

def build_chat_markdown(batch: FileSummaryBatch) -> str:
    """Build the short Markdown table shown in the chat."""

def build_full_markdown_report(batch: FileSummaryBatch) -> str:
    """Build the full Markdown report."""

def save_markdown_report(batch: FileSummaryBatch) -> ExportedSummaryFile:
    """Save the Markdown report and create an export record."""

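The in-chat summary is a Markdown table, so cell text has to be escaped. Without escaping, a "|" inside a file name would split the cell, and a newline would break the row. The sketch below is a minimal, dependency-free version. `to_markdown_table` is hypothetical, and the choice of columns is left to build_chat_markdown.

```python
def _cell(value: object) -> str:
    """Render one cell: None -> '-', escape backslashes and pipes, flatten newlines."""
    if value is None:
        return "-"
    text = str(value).replace("\\", "\\\\").replace("|", "\\|")
    return " ".join(text.splitlines())


def to_markdown_table(headers: list[str], rows: list[list[object]]) -> str:
    """Build a GitHub-style Markdown table from a header row and data rows."""
    lines = [
        "| " + " | ".join(_cell(h) for h in headers) + " |",
        "|" + "|".join(" --- " for _ in headers) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(_cell(v) for v in row) + " |")
    return "\n".join(lines)
```
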
6.7 export_excel.py

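def build_excel_workbook(batch: FileSummaryBatch) -> Workbook:
    """Build the Excel Workbook."""

def save_excel(batch: FileSummaryBatch) -> ExportedSummaryFile:
    """Save the Excel file and create an export record."""

Worksheets:

Sheet	Columns
汇总信息 (Summary)	Batch number, product name, total files, successful, failed, undetermined, total pages
文件明细 (File details)	Index, directory level, file name, type, page count, relative path, status, retry count, error description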

7. Skill Design

7.1 BaseSkill

class BaseSkill:
    name: str
    node_code: str

    def run(self, context: WorkflowContext) -> SkillResult:
        raise NotImplementedError

WorkflowContext:

from dataclasses import dataclass

@dataclass
class WorkflowContext:
    batch_id: int
    conversation_id: int
    user_id: int
    message_id: int | None = None

SkillResult:

from dataclasses import dataclass, field

@dataclass
class SkillResult:
    success: bool
    message: str = ""
    data: dict = field(default_factory=dict)

7.2 Skill List

Skill class	Node	Service called
UploadIntakeSkill	upload	storage.py
ArchiveExtractSkill	extract	archive.py
FileInventorySkill	inventory	inventory.py
DocumentPageCountSkill	page_count	page_count.py
ProductDetectSkill	product_detect	product_detect.py
SummaryReportSkill	report	report.py
ExcelExportSkill	excel_export	export_excel.py

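8. Workflow Executor Design

8.1 Entry Point

import threading


def start_file_summary_workflow(batch_id: int) -> None:
    # Run the executor in a daemon thread so the request returns immediately.
    # A daemon thread dies with the process, so a batch interrupted by a restart stays "running".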
    thread = threading.Thread(
        target=WorkflowExecutor().run,
        args=(batch_id,),
        daemon=True,
    )
    thread.start()

8.2 Execution Pseudocode

class WorkflowExecutor:
    def run(self, batch_id: int) -> None:
        batch = FileSummaryBatch.objects.get(pk=batch_id)
        self.mark_batch_running(batch)
        self.emit("workflow_started", batch, {"batch_id": batch.id})

        try:
            for node_code in self.resolve_nodes(batch):
                self.run_node(batch, node_code)
            self.mark_batch_success(batch)
            self.emit("workflow_completed", batch, self.build_completed_payload(batch))
        except Exception as exc:
            self.mark_batch_failed(batch, str(exc))
            self.emit("workflow_failed", batch, {"message": str(exc)})

8.3 Node Skip Rules

Node	Skip condition
extract	The current batch contains no archive
product_detect	No file name, directory name or metadata is available for detection
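
resolve_nodes in the pseudocode in 8.2 can apply these rules up front. The sketch below uses plain inputs (uploaded file names and a flag for "a detection source exists") instead of the batch model. Treating "completed" as a terminal marker rather than a runnable node is an assumption. Returning the skipped nodes separately lets the executor still write a skipped WorkflowNodeRun for each. In a real executor the product_detect condition may only be known after inventory runs, so it might be evaluated lazily.

```python
from pathlib import PurePath

NODE_ORDER = ["upload", "extract", "inventory", "page_count",
              "product_detect", "report", "excel_export"]  # "completed" treated as a terminal marker
ARCHIVE_TYPES = {"zip", "7z", "rar"}


def resolve_nodes(file_names: list[str], has_detect_source: bool) -> tuple[list[str], list[str]]:
    """Return (nodes_to_run, nodes_to_skip) following the skip rules in 8.3."""
    has_archive = any(PurePath(n).suffix.lower().lstrip(".") in ARCHIVE_TYPES for n in file_names)
    skip = set()
    if not has_archive:
        skip.add("extract")
    if not has_detect_source:
        skip.add("product_detect")
    run = [code for code in NODE_ORDER if code not in skip]
    skipped = [code for code in NODE_ORDER if code in skip]
    return run, skipped
```
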

9. API Design

9.1 Upload Staging Endpoint

POST /api/review-agent/conversations/{conversation_id}/attachments/
Content-Type: multipart/form-data

Request:

Parameter	Type	Required	Description
files[]	File[]	Yes	One or more files

Response:

{
  "attachments": [
    {
      "id": 101,
      "original_name": "注册资料.zip",
      "file_size": 204800,
      "upload_status": "uploaded"
    }
  ]
}

Permissions:

conversation.user must equal request.user

9.2 Send a Message and Trigger the Workflow on Demand

Reuse the existing POST /chat/stream/ SSE endpoint and add a check in stream_chat:

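User sends a prompt
-> Save the Message
-> Check whether the prompt triggers the automatic summary workflow
-> If it does, create a FileSummaryBatch and start the background workflow
-> Return workflow_meta over SSE
-> Otherwise, fall through to the original streaming LLM reply

New SSE meta payload: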

{
  "conversation_id": 1,
  "title": "新对话",
  "workflow": {
    "type": "file_summary",
    "batch_id": 12,
    "status": "running"
  }
}

9.3 Query Batch Status

GET /api/review-agent/file-summary/{batch_id}/

Response:

{
  "batch": {
    "id": 12,
    "batch_no": "FS202606050001",
    "status": "running",
    "product_name": "",
    "total_files": 24,
    "success_files": 10,
    "failed_files": 1,
    "uncertain_files": 2,
    "total_pages": 180
  },
  "nodes": [
    {
      "node_code": "page_count",
      "node_name": "解析页数中",
      "status": "running",
      "progress": 45,
      "message": "正在解析 11/24"
    }
  ],
  "exports": []
}

9.4 Workflow Event Stream

GET /api/review-agent/file-summary/{batch_id}/events/?after={event_id}

Response type: text/event-stream

Example event:

event: node_progress
data: {"event_id": 301, "batch_id": 12, "node_code": "page_count", "status": "running", "progress": 45, "message": "正在解析 11/24"}
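
events.py turns each persisted WorkflowEvent into an SSE frame. The sketch below makes one assumption beyond the example above: the event id is also sent in the standard `id:` field. With that field set, a browser EventSource resends the last id as the Last-Event-ID header when it reconnects on its own. A freshly loaded page starts a new EventSource without that header, which is where the `after` query parameter comes in. `format_sse` and `replay_after` are hypothetical helpers.

```python
import json


def format_sse(event_id: int, event_type: str, payload: dict) -> str:
    """Serialize one event as a text/event-stream frame (terminated by a blank line)."""
    data = json.dumps({"event_id": event_id, **payload}, ensure_ascii=False)
    # json.dumps without indent emits no raw newlines, so one data: line is enough.
    return f"id: {event_id}\nevent: {event_type}\ndata: {data}\n\n"


def replay_after(events: list[tuple[int, str, dict]], after: int):
    """Yield frames for stored (id, type, payload) events with id > after, in id order."""
    for event_id, event_type, payload in sorted(events, key=lambda e: e[0]):
        if event_id > after:
            yield format_sse(event_id, event_type, payload)
```
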

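9.5 Download an Exported File

GET /api/review-agent/file-summary/exports/{export_id}/download/

Permissions:

ExportedSummaryFile -> batch -> conversation -> user must be the current user

10. Front-End Design

10.1 Three-Column Layout

The page is reorganized into three columns:

Area	Content
Left column	Conversation history
Middle column	Chat messages and input box
Right column, upper half	Drag-and-drop file import area
Right column, lower half	Workflow card list

Suggested HTML structure: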

<main class="workspace three-column">
  <aside class="sidebar"></aside>
  <section class="chat-shell"></section>
  <aside class="workflow-panel">
    <section class="upload-dropzone" id="uploadDropzone"></section>
    <section class="workflow-card-list" id="workflowCardList"></section>
  </aside>
</main>

10.2 Upload Interaction

JS functions:

function bindUploadDropzone()
function uploadConversationFiles(files)
function renderAttachmentList(attachments)

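Flow:

User drags in or selects files
-> POST to the attachments endpoint
-> On success, the upload area on the right lists the file names
-> The workflow is not started
-> User sends a prompt
-> If the prompt triggers the workflow, a workflow card is created

10.3 Workflow Card

JS functions: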

function createWorkflowCard(batch)
function updateWorkflowNode(batchId, nodePayload)
function markWorkflowCompleted(batchId, payload)
function markWorkflowFailed(batchId, payload)
function connectWorkflowEvents(batchId)
function restoreWorkflowCards()

Card structure:

<article class="workflow-card" data-batch-id="12">
  <header>
    <strong>文件目录与页数汇总</strong> <!-- File directory and page count summary -->
    <span class="workflow-status">运行中</span> <!-- Running -->
  </header>
  <ol class="workflow-nodes">
    <li data-node-code="upload">上传中</li> <!-- Uploading -->
    <li data-node-code="extract">解压中</li> <!-- Extracting -->
    <li data-node-code="inventory">扫描中</li> <!-- Scanning -->
    <li data-node-code="page_count">解析页数中</li> <!-- Counting pages -->
    <li data-node-code="product_detect">识别产品名中</li> <!-- Detecting product name -->
    <li data-node-code="report">输出 Markdown 中</li> <!-- Writing Markdown -->
    <li data-node-code="excel_export">输出 Excel 中</li> <!-- Writing Excel -->
  </ol>
</article>

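10.4 Markdown Rendering

Messages are currently rendered with nl2br, which cannot render Markdown tables. Rendering changes as follows:

Message type	Rendering strategy
Regular user message	escapeHtml + nl2br
Regular assistant message	Sanitized Markdown rendering
File summary result	Sanitized Markdown rendering that allows table, a, strong and code

Options:

Option	Notes
Front-end marked + DOMPurify	Better rendering experience, but adds front-end dependencies
Back-end markdown + bleach	The back end outputs sanitized HTML and the front end displays it as-is; note that bleach has been deprecated upstream since 2023

For the demo, use front-end marked + DOMPurify, loaded from a CDN or served as local static files.

11. Conversation Title Update Design

Once the product name is detected, the title is updated: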

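def update_conversation_title(batch, product_name):
    # An empty product name never changes the title.
    if not product_name:
        return
    conversation = batch.conversation
    # Only replace the default "新对话" ("New conversation") title; user-defined titles are kept.
    if conversation.title.startswith("新对话"):
        # "<product name>-文件汇总" ("<product name>-file summary"), truncated to 120 characters
        conversation.title = f"{product_name}-文件汇总"[:120]
        conversation.save(update_fields=["title", "updated_at"])

Rules:

Scenario	Handling
Default new-conversation title	Replaced with "<product name>-文件汇总"
User-defined title	Not overwritten
Empty product name	Not updated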

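12. Test Design

12.1 Unit Tests

Test case	Goal
test_trigger_keywords	A matching prompt triggers the automatic summary
test_save_attachment_binds_conversation	An uploaded file is bound to the current conversation
test_zip_extract_safe_path	zip extraction blocks path traversal
test_scan_files_builds_relative_path	Scanning produces correct relative paths
test_count_pdf_pages	PDF page counting
test_count_xlsx_sheets	xlsx worksheet counting
test_count_pptx_slides	pptx slide counting
test_retry_three_times	A failing file is retried 3 times
test_uncertain_old_doc	A legacy doc without page metadata is marked uncertain

12.2 API Tests

Test case	Goal
test_upload_attachment_api	The upload endpoint returns the attachment id
test_upload_permission_denied	Files cannot be uploaded to another user's conversation
test_stream_triggers_workflow	Sending a matching prompt returns the workflow meta
test_batch_status_permission	Another user's batch cannot be queried
test_export_download_permission	Another user's export cannot be downloaded

12.3 Integration Tests

Test case	Goal
test_file_summary_zip_workflow	The full workflow succeeds after a zip upload
test_file_summary_multi_file_workflow	The full workflow succeeds after a multi-file upload
test_single_file_failure_not_blocking	A single file failure does not block the batch
test_workflow_events_created	Node events are written to the database in order
test_markdown_and_excel_exports	Markdown and Excel files are generated successfully

12.4 Front-End Verification

Case	Expected result
Drag-and-drop upload	The upload area on the right shows the file list
Prompt trigger	Sending "自动汇总文件目录与页数" ("automatically summarize the file directory and page counts") creates a workflow card
Real-time status	SSE events drive node status changes
Recovery after refresh	After a page refresh, the card on the right restores the current batch status
Markdown tables	Tables and download links in chat messages display correctly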

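13. Development Order

Add the dependencies and model fields; generate migrations.
Implement the upload staging endpoint and the storage directory strategy.
Implement workflow_trigger to decide from the prompt whether to start the workflow.
Implement SkillRegistry, WorkflowExecutor and WorkflowEvent.
Implement the archive extraction, file scanning and page counting services.
Implement the Markdown report and Excel export.
Rework the front end: three-column layout, drag-and-drop upload area and workflow cards.
Add Markdown rendering.
Complete the permission tests, workflow tests and manual front-end verification.

14. References

This design favors lightweight Python libraries, on the following grounds:

Capability	Basis
PDF page count	pypdf's PdfReader exposes pages
docx metadata	python-docx supports core properties
pptx slides	python-pptx can read presentation slides
xlsx worksheets	openpyxl can read workbook worksheets
xls worksheets	xlrd supports reading legacy xls workbooks
Legacy Office metadata	olefile can read the OLE2 compound document structure
7z extraction	py7zr handles the 7z format
rar extraction	rarfile usually relies on external unrar/unar/7z tools, so this design prefers the system 7z command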
