Docx Cn Skill Usage Guide

2026-03-28 Source: 网淘吧

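Creating, Editing, and Analyzing DOCX Files

Overview

A .docx file is a ZIP archive containing XML files.

Quick Reference

Task	Method
Read/analyze content	Use `pandoc`, or unpack to get the raw XML
Create a new document	Use `docx-js` - see "Creating New Documents" below
Edit an existing document	Unpack → edit XML → repack - see "Editing Existing Documents" below

Converting .doc to .docx

Legacy .doc files must be converted before editing: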

python scripts/office/soffice.py --headless --convert-to docx document.doc

Reading Content

# Text extraction with tracked changes
pandoc --track-changes=all document.docx -o output.md

# Raw XML access
python scripts/office/unpack.py document.docx unpacked/

Converting to Images

python scripts/office/soffice.py --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page

Accepting Tracked Changes

To produce a clean document with all tracked changes accepted (requires LibreOffice):

python scripts/accept_changes.py input.docx output.docx

Creating New Documents

Generate .docx files with JavaScript, then validate them. Install: npm install -g docx

Setup

const fs = require('fs'); // needed for fs.writeFileSync below
const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
        Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
        TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
        VerticalAlign, PageNumber, PageBreak } = require('docx');

const doc = new Document({ sections: [{ children: [/* content */] }] });
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));

Validation

After creating the file, validate it. If validation fails, unpack the file, fix the XML, and repack.

python scripts/office/validate.py doc.docx

Page Size

// CRITICAL: docx-js defaults to A4, not US Letter
// Always set page size explicitly for consistent results
sections: [{
  properties: {
    page: {
      size: {
        width: 12240,   // 8.5 inches in DXA
        height: 15840   // 11 inches in DXA
      },
      margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } // 1 inch margins
    }
  },
  children: [/* content */]
}]

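Common page sizes (DXA units, 1440 DXA = 1 inch):

Paper size	Width	Height	Content width (1-inch margins)
US Letter	12,240	15,840	9,360
A4 (default)	11,906	16,838	9,026

Landscape orientation: docx-js swaps width and height internally, so pass the portrait dimensions and let it handle the swap: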

size: {
  width: 12240,   // Pass SHORT edge as width
  height: 15840,  // Pass LONG edge as height
  orientation: PageOrientation.LANDSCAPE  // docx-js swaps them in the XML
},
// Content width = 15840 - left margin - right margin (uses the long edge)
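The DXA arithmetic behind the page-size table and the landscape note can be checked with a few lines of plain JavaScript. This is a minimal sketch: `inchesToDxa` and `contentWidth` are hypothetical helper names, not part of docx-js, and the `size` object only mirrors the short-edge/long-edge convention described above.

```javascript
// Sketch only: these helpers are illustrative and not part of docx-js.
const DXA_PER_INCH = 1440;

// Convert inches to DXA, rounded to a whole unit.
function inchesToDxa(inches) {
  return Math.round(inches * DXA_PER_INCH);
}

// Usable width between the left and right margins.
// `size` follows the convention above: short edge as width, long edge as height.
// In landscape the long edge becomes the horizontal dimension.
function contentWidth(size, margin, landscape = false) {
  const pageWidth = landscape ? size.height : size.width;
  return pageWidth - margin.left - margin.right;
}

const letter = { width: inchesToDxa(8.5), height: inchesToDxa(11) };
const a4 = { width: 11906, height: 16838 };
const oneInch = { left: 1440, right: 1440 };

console.log(contentWidth(letter, oneInch));       // 9360
console.log(contentWidth(a4, oneInch));           // 9026
console.log(contentWidth(letter, oneInch, true)); // 12960
```

The landscape result (12960) is the value to use as the table width on a landscape US Letter page with 1-inch margins.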

Styles (Overriding Built-in Heading Styles)

Use Arial as the default font (universally supported). Keep headings black for readability.

const doc = new Document({
  styles: {
    default: { document: { run: { font: "Arial", size: 24 } } }, // 12pt default
    paragraphStyles: [
      // IMPORTANT: Use exact IDs to override built-in styles
      { id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 32, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } }, // outlineLevel required for TOC
      { id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 28, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } },
    ]
  },
  sections: [{
    children: [
      new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Title")] }),
    ]
  }]
});

Lists (Never Use Unicode Bullets)

// ❌ WRONG - never manually insert bullet characters
new Paragraph({ children: [new TextRun("• Item")] })  // BAD
new Paragraph({ children: [new TextRun("\u2022 Item")] })  // BAD

// ✅ CORRECT - use numbering config with LevelFormat.BULLET
const doc = new Document({
  numbering: {
    config: [
      { reference: "bullets",
        levels: [{ level: 0, format: LevelFormat.BULLET, text: "•", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
      { reference: "numbers",
        levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
    ]
  },
  sections: [{
    children: [
      new Paragraph({ numbering: { reference: "bullets", level: 0 },
        children: [new TextRun("Bullet item")] }),
      new Paragraph({ numbering: { reference: "numbers", level: 0 },
        children: [new TextRun("Numbered item")] }),
    ]
  }]
});

// ⚠️ Each reference creates INDEPENDENT numbering
// Same reference = continues (1,2,3 then 4,5,6)
// Different reference = restarts (1,2,3 then 1,2,3)

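Tables

Critical: tables need dual width settings. Set both `columnWidths` on the table and `width` on each cell. If either is missing, the table may render incorrectly on some platforms.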

// CRITICAL: Always set table width for consistent rendering
// CRITICAL: Use ShadingType.CLEAR (not SOLID) to prevent black backgrounds
const border = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" };
const borders = { top: border, bottom: border, left: border, right: border };

new Table({
  width: { size: 9360, type: WidthType.DXA }, // Always use DXA (percentages break in Google Docs)
  columnWidths: [4680, 4680], // Must sum to table width (DXA: 1440 = 1 inch)
  rows: [
    new TableRow({
      children: [
        new TableCell({
          borders,
          width: { size: 4680, type: WidthType.DXA }, // Also set on each cell
          shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, // CLEAR not SOLID
          margins: { top: 80, bottom: 80, left: 120, right: 120 }, // Cell padding (internal, not added to width)
          children: [new Paragraph({ children: [new TextRun("Cell")] })]
        })
      ]
    })
  ]
})

Table width calculation:

Always use `WidthType.DXA`; `WidthType.PERCENTAGE` breaks in Google Docs.

// Table width = sum of columnWidths = content width
// US Letter with 1" margins: 12240 - 2880 = 9360 DXA
width: { size: 9360, type: WidthType.DXA },
columnWidths: [7000, 2360]  // Must sum to table width

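Width rules:

Always use `WidthType.DXA`; never use `WidthType.PERCENTAGE` (incompatible with Google Docs)
Table width must equal the sum of `columnWidths`
Each cell width must match its corresponding entry in `columnWidths`
Cell margins are internal padding; they shrink the content area rather than adding to the cell width
For full-width tables, use the content width (page width minus left and right margins)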
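The width rules above are easy to break when columns do not divide evenly. The following sketch shows one way to derive column widths that sum exactly to the table width and to check the cell widths against them before building the table. `splitColumns` and `checkTableWidths` are hypothetical helpers, not docx-js functions, and the check assumes no merged cells.

```javascript
// Sketch only: illustrative helpers, not part of docx-js.

// Split a table width (DXA) into columns by relative weight.
// The last column absorbs rounding so the widths sum exactly to the total.
function splitColumns(totalDxa, weights) {
  const weightSum = weights.reduce((a, b) => a + b, 0);
  const widths = weights.map(w => Math.floor((totalDxa * w) / weightSum));
  const used = widths.reduce((a, b) => a + b, 0);
  widths[widths.length - 1] += totalDxa - used;
  return widths;
}

// Return a list of rule violations; an empty list means the widths are consistent.
// `rowCellWidths` holds one array of cell widths per row (no merged cells assumed).
function checkTableWidths(tableWidth, columnWidths, rowCellWidths) {
  const problems = [];
  const sum = columnWidths.reduce((a, b) => a + b, 0);
  if (sum !== tableWidth) {
    problems.push(`columnWidths sum to ${sum}, expected ${tableWidth}`);
  }
  rowCellWidths.forEach((cells, r) => {
    cells.forEach((width, c) => {
      if (width !== columnWidths[c]) {
        problems.push(`row ${r}, cell ${c}: ${width} != ${columnWidths[c]}`);
      }
    });
  });
  return problems;
}

console.log(splitColumns(9360, [1, 1, 1])); // [ 3120, 3120, 3120 ]
console.log(splitColumns(9026, [1, 1, 1])); // [ 3008, 3008, 3010 ]
console.log(checkTableWidths(9360, [7000, 2360], [[7000, 2360]])); // []
```

The resulting array can be passed as `columnWidths`, with each entry reused as the `size` of the matching cell's `width`.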

Images

// CRITICAL: type parameter is REQUIRED
new Paragraph({
  children: [new ImageRun({
    type: "png", // Required: png, jpg, jpeg, gif, bmp, svg
    data: fs.readFileSync("image.png"),
    transformation: { width: 200, height: 150 },
    altText: { title: "Title", description: "Desc", name: "Name" } // All three required
  })]
})

Page Breaks

// CRITICAL: PageBreak must be inside a Paragraph
new Paragraph({ children: [new PageBreak()] })

// Or use pageBreakBefore
new Paragraph({ pageBreakBefore: true, children: [new TextRun("New page")] })

Headers/Footers

sections: [{
  properties: {
    page: { margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } } // 1440 = 1 inch
  },
  headers: {
    default: new Header({ children: [new Paragraph({ children: [new TextRun("Header")] })] })
  },
  footers: {
    default: new Footer({ children: [new Paragraph({
      children: [new TextRun("Page "), new TextRun({ children: [PageNumber.CURRENT] })]
    })] })
  },
  children: [/* content */]
}]

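Critical Rules for docx-js

Set the page size explicitly - docx-js defaults to A4; use US Letter (12240 x 15840 DXA) for US documents
Landscape: pass portrait dimensions - docx-js swaps width and height internally; pass the short edge as width and the long edge as height, and set orientation: PageOrientation.LANDSCAPE
Never use \n - use separate Paragraph elements
Never use Unicode bullets - use LevelFormat.BULLET with a numbering config
PageBreak must be inside a Paragraph - a standalone PageBreak produces invalid XML
ImageRun requires type - always specify png/jpg/etc.
Always set table width in DXA - never use WidthType.PERCENTAGE (breaks in Google Docs)
Tables need dual widths - the columnWidths array and the cell widths must both be set and must match
Table width = sum of columnWidths - in DXA, make sure they add up exactly
Always add cell margins - use margins: { top: 80, bottom: 80, left: 120, right: 120 } for readable padding
Use ShadingType.CLEAR - never SOLID for table shading
TOC requires HeadingLevel only - do not apply custom styles to heading paragraphs
Override built-in styles - use the exact IDs: "Heading1", "Heading2", etc.
Include outlineLevel - required for the TOC (0 for H1, 1 for H2, and so on)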

Editing Existing Documents

Follow all 3 steps in order.

Step 1: Unpack

python scripts/office/unpack.py document.docx unpacked/

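Extracts the XML, pretty-prints it, merges adjacent runs, and converts smart quotes to XML entities (&#x201C; etc.) so they survive editing. Use --merge-runs false to skip run merging.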

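Step 2: Edit XML

Edit the files in unpacked/word/. See the XML Reference below for specific patterns.

For tracked changes and comments, use "Claude" as the author unless the user explicitly requests a different name.

Use the Edit tool directly for string replacement. Do not write Python scripts; they add unnecessary complexity, whereas the Edit tool shows exactly what is being replaced.

Critical: use smart quotes in new content. When adding text that contains apostrophes or quotation marks, use XML entities to produce smart quotes: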

<!-- Use these entities for professional typography -->
<w:t>Here&#x2019;s a quote: &#x201C;Hello&#x201D;</w:t>

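Entity	Character
`&#x2018;`	‘ (left single quote)
`&#x2019;`	’ (right single quote / apostrophe)
`&#x201C;`	“ (left double quote)
`&#x201D;`	” (right double quote)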
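The entity mapping in the table can be written as a small function, for example to prepare the pre-escaped text that comment.py expects. This is a sketch under assumptions: `toWordText` is a hypothetical name, and it only converts typographic quotes that are already present; it does not guess which straight quotes should become curly.

```javascript
// Sketch only: illustrative helper, not one of the skill's scripts.
const SMART_QUOTE_ENTITIES = {
  "\u2018": "&#x2018;", // left single quote
  "\u2019": "&#x2019;", // right single quote / apostrophe
  "\u201C": "&#x201C;", // left double quote
  "\u201D": "&#x201D;", // right double quote
};

// Escape XML special characters first, then replace smart quotes with
// numeric entities so the "&" of each entity is not escaped again.
function toWordText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/[\u2018\u2019\u201C\u201D]/g, ch => SMART_QUOTE_ENTITIES[ch]);
}

console.log(toWordText("Here\u2019s a quote: \u201CHello\u201D"));
// Here&#x2019;s a quote: &#x201C;Hello&#x201D;
```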

Adding comments: use comment.py to handle the boilerplate across multiple XML files (the text must be pre-escaped XML):

python scripts/comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
python scripts/comment.py unpacked/ 1 "Reply text" --parent 0  # reply to comment 0
python scripts/comment.py unpacked/ 0 "Text" --author "Custom Author"  # custom author name

Then add the markers to document.xml (see the Comments section of the XML Reference).

Step 3: Pack

python scripts/office/pack.py unpacked/ output.docx --original document.docx

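Validates with auto-repair, condenses the XML, and creates the DOCX file. Use --validate false to skip validation.

Auto-repair fixes:

durableId >= 0x7FFFFFFF (regenerates a valid ID)
Missing xml:space="preserve" on <w:t> elements that contain whitespace

Auto-repair does not fix:

Malformed XML, invalid element nesting, missing relationships, schema violations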
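To illustrate the first auto-repair rule, the sketch below checks a durableId against the 0x7FFFFFFF limit and generates a replacement when it is out of range. It is not the logic of pack.py, which is not shown here; the function names are hypothetical, and it assumes the ID is written as a hexadecimal string of up to 8 digits.

```javascript
// Sketch only: assumes durableId is stored as a hex string; not pack.py's code.
const DURABLE_ID_LIMIT = 0x7fffffff;

// Valid when the value parses as hex and is strictly below the limit.
function isValidDurableId(hex) {
  if (!/^[0-9A-Fa-f]{1,8}$/.test(hex)) return false;
  return parseInt(hex, 16) < DURABLE_ID_LIMIT;
}

// Pick an integer in [1, 0x7FFFFFFE] and format it as 8 uppercase hex digits.
// `random` is injectable so the function can be tested deterministically.
function newDurableId(random = Math.random) {
  const value = 1 + Math.floor(random() * (DURABLE_ID_LIMIT - 1));
  return value.toString(16).toUpperCase().padStart(8, "0");
}

// Keep a valid ID as-is; otherwise replace it.
function repairDurableId(hex) {
  return isValidDurableId(hex) ? hex : newDurableId();
}

console.log(isValidDurableId("7FFFFFFF")); // false
console.log(isValidDurableId("1A2B3C4D")); // true
console.log(newDurableId(() => 0));        // 00000001
```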

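Common Pitfalls

Replace the entire <w:r> element: when adding tracked changes, replace the whole <w:r>...</w:r> block with <w:del>...<w:ins>... as siblings. Never insert tracked-change tags inside a run.
Preserve <w:rPr> formatting: copy the original run's <w:rPr> block into the tracked-change runs to keep bold, font size, and other formatting.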

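XML Reference

Schema Compliance

Element order in <w:pPr>: <w:pStyle>, <w:numPr>, <w:spacing>, <w:ind>, <w:jc>, with <w:rPr> last
Whitespace: add xml:space="preserve" to any <w:t> with leading or trailing spaces
RSIDs (revision session identifiers): must be 8-digit hexadecimal (e.g., 00AB1234)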
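Two of the compliance rules above are mechanical enough to express as checks. The sketch below is illustrative only, with hypothetical helper names: it decides when a text element needs xml:space="preserve" and tests whether an RSID has the 8-digit hexadecimal form.

```javascript
// Sketch only: illustrative checks, not part of the skill's scripts.

// True when the text starts or ends with whitespace.
function needsSpacePreserve(text) {
  return /^\s|\s$/.test(text);
}

// Build a text element; `escapedText` must already be XML-escaped.
// Pass "w:delText" as the tag for text inside <w:del>.
function textElement(escapedText, tag = "w:t") {
  const attr = needsSpacePreserve(escapedText) ? ' xml:space="preserve"' : "";
  return `<${tag}${attr}>${escapedText}</${tag}>`;
}

// RSIDs must be exactly 8 hexadecimal digits.
function isValidRsid(value) {
  return /^[0-9A-Fa-f]{8}$/.test(value);
}

console.log(textElement("The term is ")); // <w:t xml:space="preserve">The term is </w:t>
console.log(textElement("60"));           // <w:t>60</w:t>
console.log(isValidRsid("00AB1234"));     // true
console.log(isValidRsid("AB1234"));       // false
```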

Tracked Changes

Insertion:

<w:ins w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:t>inserted text</w:t></w:r>
</w:ins>

Deletion:

<w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>

Inside <w:del>: use <w:delText> instead of <w:t>, and <w:delInstrText> instead of <w:instrText>.

Minimal edits - mark only what changes:

<!-- Change "30 days" to "60 days" -->
<w:r><w:t>The term is </w:t></w:r>
<w:del w:id="1" w:author="Claude" w:date="...">
  <w:r><w:delText>30</w:delText></w:r>
</w:del>
<w:ins w:id="2" w:author="Claude" w:date="...">
  <w:r><w:t>60</w:t></w:r>
</w:ins>
<w:r><w:t> days.</w:t></w:r>
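To make the shape of a minimal edit concrete, the sketch below assembles the same four sibling elements from the parts of the original run, copying the run's <w:rPr> into each piece. It is meant for understanding or checking a hand edit rather than replacing one; `trackedReplace` and its parameters are hypothetical, and all text (including the author name) is assumed to be XML-escaped already.

```javascript
// Sketch only: illustrative builder, not one of the skill's scripts.

// One run with optional formatting; adds xml:space="preserve" when needed.
function run(rPr, tag, text) {
  const space = /^\s|\s$/.test(text) ? ' xml:space="preserve"' : "";
  return `<w:r>${rPr}<${tag}${space}>${text}</${tag}></w:r>`;
}

// Replace `oldText` with `newText`, leaving `before` and `after` unmarked.
// `rPr` is the original run's <w:rPr> block ("" if none); `ids` holds two
// unique change IDs, the first for the deletion and the second for the insertion.
function trackedReplace({ before, oldText, newText, after, rPr = "", ids, author = "Claude", date }) {
  const attrs = id => `w:id="${id}" w:author="${author}" w:date="${date}"`;
  const parts = [];
  if (before) parts.push(run(rPr, "w:t", before));
  parts.push(`<w:del ${attrs(ids[0])}>${run(rPr, "w:delText", oldText)}</w:del>`);
  parts.push(`<w:ins ${attrs(ids[1])}>${run(rPr, "w:t", newText)}</w:ins>`);
  if (after) parts.push(run(rPr, "w:t", after));
  return parts.join("\n");
}

console.log(trackedReplace({
  before: "The term is ", oldText: "30", newText: "60", after: " days.",
  ids: [1, 2], date: "2025-01-01T00:00:00Z",
}));
```

The deletion and insertion come out as siblings of the unchanged runs, never nested inside a run, which matches the pitfall noted earlier.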

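Deleting an entire paragraph or list item - when removing all content from a paragraph, also mark the paragraph mark as deleted so it merges with the next paragraph. Add <w:del/> inside <w:pPr><w:rPr>: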

<w:p>
  <w:pPr>
    <w:numPr>...</w:numPr>  <!-- list numbering if present -->
    <w:rPr>
      <w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z"/>
    </w:rPr>
  </w:pPr>
  <w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
    <w:r><w:delText>Entire paragraph content being deleted...</w:delText></w:r>
  </w:del>
</w:p>

Without the <w:del/> inside <w:pPr><w:rPr>, accepting the changes leaves an empty paragraph or list item.

Rejecting another author's insertion - nest the deletion inside their insertion:

<w:ins w:author="Jane" w:id="5">
  <w:del w:author="Claude" w:id="10">
    <w:r><w:delText>their inserted text</w:delText></w:r>
  </w:del>
</w:ins>

Restoring another author's deletion - add an insertion after their deletion (do not modify their deletion):

<w:del w:author="Jane" w:id="5">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
<w:ins w:author="Claude" w:id="10">
  <w:r><w:t>deleted text</w:t></w:r>
</w:ins>

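Comments

After running comment.py (see Step 2), add the markers to document.xml. For replies, use the --parent flag and nest the reply's markers inside the parent's markers.

Important: <w:commentRangeStart> and <w:commentRangeEnd> are siblings of <w:r>, never inside <w:r>.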

<!-- Comment markers are direct children of w:p, never inside w:r -->
<w:commentRangeStart w:id="0"/>
<w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:delText>deleted</w:delText></w:r>
</w:del>
<w:r><w:t> more text</w:t></w:r>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>

<!-- Comment 0 with reply 1 nested inside -->
<w:commentRangeStart w:id="0"/>
  <w:commentRangeStart w:id="1"/>
  <w:r><w:t>text</w:t></w:r>
  <w:commentRangeEnd w:id="1"/>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="1"/></w:r>
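The nesting order in the reply example can be captured in a short builder, shown here as a sketch for checking marker placement. `wrapWithComment` is a hypothetical helper; it assumes the comment IDs were already created by comment.py and that `runsXml` consists of complete sibling elements.

```javascript
// Sketch only: illustrative builder, not one of the skill's scripts.

// The reference run that follows the range end for a given comment ID.
function commentReference(id) {
  return `<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="${id}"/></w:r>`;
}

// Wrap sibling runs in range markers for comment `id`.
// Reply markers open after the parent's start and close before the parent's end.
function wrapWithComment(runsXml, id, replyIds = []) {
  const ids = [id, ...replyIds];
  const starts = ids.map(i => `<w:commentRangeStart w:id="${i}"/>`).join("");
  const ends = [...ids].reverse().map(i => `<w:commentRangeEnd w:id="${i}"/>`).join("");
  const refs = ids.map(i => commentReference(i)).join("");
  return starts + runsXml + ends + refs;
}

console.log(wrapWithComment("<w:r><w:t>text</w:t></w:r>", 0, [1]));
```

The markers are emitted as siblings of the runs they wrap, so the output can be placed directly inside a <w:p>.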

Images

Add the image file to word/media/
Add a relationship in word/_rels/document.xml.rels:

<Relationship Id="rId5" Type=".../image" Target="media/image1.png"/>

Add the content type in [Content_Types].xml:

<Default Extension="png" ContentType="image/png"/>

Reference it in document.xml:

<w:drawing>
  <wp:inline>
    <wp:extent cx="914400" cy="914400"/>  <!-- EMUs: 914400 = 1 inch -->
    <a:graphic>
      <a:graphicData uri=".../picture">
        <pic:pic>
          <pic:blipFill><a:blip r:embed="rId5"/></pic:blipFill>
        </pic:pic>
      </a:graphicData>
    </a:graphic>
  </wp:inline>
</w:drawing>
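The cx and cy values in <wp:extent> are in EMUs, so image dimensions usually need converting. The sketch below shows the arithmetic; the helper names are hypothetical, and the pixel conversion assumes 96 pixels per inch unless the image's real DPI is passed in.

```javascript
// Sketch only: illustrative helpers for EMU arithmetic.
const EMU_PER_INCH = 914400;

function inchesToEmu(inches) {
  return Math.round(inches * EMU_PER_INCH);
}

// Assumes 96 DPI by default; pass the image's actual DPI when it is known.
function pixelsToEmu(pixels, dpi = 96) {
  return Math.round((pixels * EMU_PER_INCH) / dpi);
}

// Scale an image to a target width in inches while keeping its aspect ratio.
function extentForWidth(pixelWidth, pixelHeight, targetInches) {
  const cx = inchesToEmu(targetInches);
  const cy = Math.round((cx * pixelHeight) / pixelWidth);
  return `<wp:extent cx="${cx}" cy="${cy}"/>`;
}

console.log(pixelsToEmu(96));             // 914400
console.log(extentForWidth(800, 600, 1)); // <wp:extent cx="914400" cy="685800"/>
```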

Dependencies

pandoc: text extraction
docx: npm install -g docx (new documents)
LibreOffice: PDF conversion (auto-configured for sandboxed environments via scripts/office/soffice.py)
Poppler: pdftoppm for images
