Developer Guide

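Aspose.Note FOSS for Python is a free, open-source library for reading Microsoft OneNote .one section files without any dependency on Microsoft Office. It exposes a concise public API under the aspose.note package, modeled on the Aspose.Note for .NET interface. The library is suited to document automation, content indexing, data extraction pipelines, and archival workflows.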

This developer guide covers the complete public API surface available in version 26.3.1 and provides a runnable code example for each major feature.

Document Loading

Load a .one file from a file path or a binary stream. The Document class is the entry point for all operations.

Loading from a File Path

from aspose.note import Document

doc = Document("MyNotes.one")

Loading from a Binary Stream

This is useful when reading from cloud storage, an HTTP response, or an in-memory buffer:

from pathlib import Path
from aspose.note import Document

with Path("MyNotes.one").open("rb") as f:
    doc = Document(f)
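
For data that is already in memory (for example, a downloaded response body), the bytes can be wrapped in a stream before loading. The following is a minimal sketch: the helper name open_from_bytes and the stand-in loader class are hypothetical, and the loader is passed in as a parameter so the snippet runs without the library installed. Whether Document needs a seekable stream is not stated in this guide; io.BytesIO is seekable, so it covers either case.

```python
import io


def open_from_bytes(data: bytes, document_factory):
    """Wrap raw bytes in a seekable binary stream and pass it to the loader.

    document_factory is any callable that accepts a binary stream; in real
    code this would be aspose.note.Document (assumption based on the
    stream-loading example above).
    """
    return document_factory(io.BytesIO(data))


class _EchoLoader:
    """Hypothetical stand-in for Document: records how many bytes it read."""

    def __init__(self, stream):
        self.size = len(stream.read())


loaded = open_from_bytes(b"\x00\x01\x02", _EchoLoader)
print(loaded.size)
```

With the library installed, the call would presumably look like open_from_bytes(payload, Document).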

Load Options

Use LoadOptions to set optional parameters at load time:

from aspose.note import Document, LoadOptions

opts = LoadOptions()
opts.LoadHistory = True   # Include page history in the DOM
doc = Document("MyNotes.one", opts)

Note: DocumentPassword exists on LoadOptions for API compatibility, but encrypted documents are not supported. Attempting to load an encrypted file raises IncorrectPasswordException.


Document Structure (DOM)

The OneNote document model is a tree:

Document
  └── Page (0..n)
        ├── Title
        │     ├── TitleText (RichText)
        │     ├── TitleDate (RichText)
        │     └── TitleTime (RichText)
        └── Outline (0..n)
              └── OutlineElement (0..n)
                    ├── RichText
                    ├── Image
                    ├── Table
                    │     └── TableRow
                    │           └── TableCell
                    │                 └── RichText / Image
                    └── AttachedFile

Every node exposes ParentNode and a Document property that walks up to the root node. Composite nodes support child iteration as well as FirstChild, LastChild, AppendChildLast, InsertChild, RemoveChild, and GetChildNodes(Type).
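
As an illustration of upward navigation, the sketch below follows ParentNode links to build a path such as "Document > Page > Outline". The helpers ancestors and node_path are hypothetical, and the three small classes are stand-ins that only mimic the ParentNode attribute so the snippet runs on its own. It assumes that the root node's ParentNode is None, which this guide does not state explicitly.

```python
def ancestors(node):
    """Yield node.ParentNode, then its parent, and so on up to the root."""
    current = getattr(node, "ParentNode", None)
    while current is not None:  # assumption: the root has ParentNode = None
        yield current
        current = getattr(current, "ParentNode", None)


def node_path(node) -> str:
    """Describe a node's position as 'Root > ... > Node' using class names."""
    chain = [node, *ancestors(node)]
    return " > ".join(type(n).__name__ for n in reversed(chain))


# Stand-in classes for demonstration only; they are NOT the library's types.
class _Node:
    def __init__(self, parent=None):
        self.ParentNode = parent


class Document(_Node):
    pass


class Page(_Node):
    pass


class Outline(_Node):
    pass


doc = Document()
page = Page(parent=doc)
outline = Outline(parent=page)
path = node_path(outline)
print(path)
```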


Iterating Over Pages

Pages are direct children of the Document. Iterate over the document directly or use GetChildNodes:

from aspose.note import Document, Page

doc = Document("MyNotes.one")

for page in doc:
    title = page.Title.TitleText.Text if page.Title and page.Title.TitleText else "(untitled)"
    author = page.Author or "(unknown)"
    print(f"  {title}  [by {author}]")

Page metadata:

Property          Type              Description
Title             Title | None      Page title block
Author            str | None        Author string
CreationTime      datetime | None   Page creation time
LastModifiedTime  datetime | None   Last modification time
Level             int | None        Subpage indentation level
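
As a rough illustration of how these fields might be combined, the sketch below renders an indented page outline from Title and Level. The helpers page_title and format_outline are hypothetical, and the SimpleNamespace objects are stand-ins for Page so the snippet runs without a .one file. Because this guide does not say whether Level starts at 0 or 1, indentation is computed relative to the smallest Level present, and a missing Level is treated as top level (an assumption).

```python
from types import SimpleNamespace


def page_title(page) -> str:
    """Return the page's title text, or a placeholder when it is missing."""
    title = getattr(page, "Title", None)
    title_text = getattr(title, "TitleText", None) if title else None
    text = getattr(title_text, "Text", None) if title_text else None
    return text or "(untitled)"


def format_outline(pages) -> list[str]:
    """Indent each page title relative to the shallowest Level present."""
    pages = list(pages)
    levels = [p.Level for p in pages if p.Level is not None]
    base = min(levels) if levels else 0
    lines = []
    for p in pages:
        # Assumption: a page without a Level is shown at the top level.
        level = p.Level if p.Level is not None else base
        lines.append("  " * (level - base) + page_title(p))
    return lines


def _fake_page(text, level):
    """Build a stand-in object exposing only Title and Level."""
    title = SimpleNamespace(TitleText=SimpleNamespace(Text=text)) if text else None
    return SimpleNamespace(Title=title, Level=level)


demo = [_fake_page("Project", 1), _fake_page("Meeting notes", 2), _fake_page(None, None)]
outline = format_outline(demo)
print("\n".join(outline))
```

With a loaded document, format_outline(doc) should work the same way, since iterating a Document yields its pages.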

Text Extraction

Extracting All Plain Text

from aspose.note import Document, RichText

doc = Document("MyNotes.one")
all_text = [rt.Text for rt in doc.GetChildNodes(RichText) if rt.Text]
print("\n".join(all_text))

Inspecting Formatted Runs

Each RichText contains a list of TextRun segments. Each run has its own TextStyle:

from aspose.note import Document, RichText

doc = Document("FormattedNotes.one")
for rt in doc.GetChildNodes(RichText):
    for run in rt.TextRuns:
        style = run.Style
        flags = []
        if style.IsBold: flags.append("bold")
        if style.IsItalic: flags.append("italic")
        if style.IsHyperlink: flags.append(f"link={style.HyperlinkAddress}")
        print(f"{run.Text!r:40s} [{', '.join(flags)}]")
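
The same style flags can drive a simple format conversion. The sketch below turns a sequence of runs into Markdown-like text using only the attributes shown above (Text, Style.IsBold, Style.IsItalic, Style.IsHyperlink, Style.HyperlinkAddress). The helper names are hypothetical, the SimpleNamespace objects are stand-ins for TextRun, and special characters are not escaped, so treat it as a starting point rather than a complete converter.

```python
from types import SimpleNamespace


def run_to_markdown(run) -> str:
    """Wrap one run's text in Markdown markers according to its style flags."""
    text = run.Text or ""
    if not text.strip():
        return text  # leave whitespace-only runs unmarked
    style = run.Style
    if style.IsBold:
        text = f"**{text}**"
    if style.IsItalic:
        text = f"*{text}*"
    if style.IsHyperlink and style.HyperlinkAddress:
        text = f"[{text}]({style.HyperlinkAddress})"
    return text


def runs_to_markdown(runs) -> str:
    """Concatenate the Markdown form of every run in order."""
    return "".join(run_to_markdown(r) for r in runs)


def _run(text, bold=False, italic=False, link=None):
    """Build a stand-in run exposing only Text and Style."""
    style = SimpleNamespace(
        IsBold=bold, IsItalic=italic,
        IsHyperlink=link is not None, HyperlinkAddress=link,
    )
    return SimpleNamespace(Text=text, Style=style)


demo_runs = [
    _run("See "),
    _run("the docs", bold=True, link="https://example.com"),
    _run(" for details."),
]
markdown = runs_to_markdown(demo_runs)
print(markdown)
```

For a loaded document, runs_to_markdown(rt.TextRuns) could be called for each RichText node.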

Extracting Hyperlinks

from aspose.note import Document, RichText

doc = Document("MyNotes.one")
for rt in doc.GetChildNodes(RichText):
    for run in rt.TextRuns:
        if run.Style.IsHyperlink and run.Style.HyperlinkAddress:
            print(run.Text, "->", run.Style.HyperlinkAddress)

Image Extraction

from aspose.note import Document, Image

doc = Document("MyNotes.one")
for i, img in enumerate(doc.GetChildNodes(Image), start=1):
    name = img.FileName or f"image_{i}.bin"
    with open(name, "wb") as f:
        f.write(img.Bytes)
    print(f"Saved {name}  ({img.Width}x{img.Height})")

Image properties: FileName, Bytes, Width, Height, AlternativeTextTitle, AlternativeTextDescription, HyperlinkUrl, Tags
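
When writing extracted files to disk, two images may share a FileName, and a stored name could in principle contain directory components. This guide does not say whether either case occurs in practice, so the sketch below is purely defensive. The helper safe_output_name is hypothetical and uses only the standard library.

```python
from pathlib import PurePosixPath


def safe_output_name(file_name, index, used, prefix="image"):
    """Build a collision-free, directory-free output name.

    file_name may be None or empty; `used` is a set of names already taken.
    """
    # Keep only the final path component, handling both "/" and "\\".
    base = PurePosixPath(str(file_name).replace("\\", "/")).name if file_name else ""
    if not base or base in {".", ".."}:
        base = f"{prefix}_{index}.bin"
    stem, suffix = PurePosixPath(base).stem, PurePosixPath(base).suffix
    candidate = base
    counter = 1
    while candidate in used:
        # Append _1, _2, ... before the extension until the name is unique.
        candidate = f"{stem}_{counter}{suffix}"
        counter += 1
    used.add(candidate)
    return candidate


used = set()
names = [
    safe_output_name("photo.png", 1, used),
    safe_output_name("photo.png", 2, used),
    safe_output_name("../secret/evil.png", 3, used),
    safe_output_name(None, 4, used),
]
print(names)
```

In the extraction loop, the result would replace the name variable, for example name = safe_output_name(img.FileName, i, used). The same helper could serve attached files with prefix="attachment".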


Table Parsing

from aspose.note import Document, Table, TableRow, TableCell, RichText

doc = Document("MyNotes.one")
for table in doc.GetChildNodes(Table):
    print("Column widths:", [col.Width for col in table.Columns])
    for r, row in enumerate(table.GetChildNodes(TableRow), start=1):
        cells = row.GetChildNodes(TableCell)
        row_text = [
            " ".join(rt.Text for rt in cell.GetChildNodes(RichText)).strip()
            for cell in cells
        ]
        print(f"Row {r}:", row_text)
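
Once each row has been reduced to a list of cell strings, as row_text is above, exporting a table takes only the standard library. The sketch below serializes such rows to CSV; the helper rows_to_csv is hypothetical and the sample rows are made-up data, not output from a real notebook.

```python
import csv
import io


def rows_to_csv(rows) -> str:
    """Serialize a list of rows (each a list of cell strings) to CSV text."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow(row)  # cells containing commas or quotes are quoted
    return buf.getvalue()


sample_rows = [["Name", "Qty"], ["Widget, large", "2"], ["Bolt", ""]]
csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

Collecting row_text into a list inside the table loop and passing that list to rows_to_csv would give one CSV document per table.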

Attached Files

from aspose.note import Document, AttachedFile

doc = Document("NotesWithAttachments.one")
for i, af in enumerate(doc.GetChildNodes(AttachedFile), start=1):
    name = af.FileName or f"attachment_{i}.bin"
    with open(name, "wb") as f:
        f.write(af.Bytes)
    print(f"Saved: {name}")

Tags and Numbered Lists

Inspecting NoteTag Items

from aspose.note import Document, RichText, Image, Table

doc = Document("TaggedNotes.one")
for rt in doc.GetChildNodes(RichText):
    for tag in rt.Tags:
        print(f"RichText tag: {tag.Label} icon={tag.Icon}")
for img in doc.GetChildNodes(Image):
    for tag in img.Tags:
        print(f"Image tag: {tag.Label}")

Inspecting Numbered Lists

from aspose.note import Document, OutlineElement

doc = Document("NumberedNotes.one")
for oe in doc.GetChildNodes(OutlineElement):
    nl = oe.NumberList
    if nl:
        print(f"format={nl.Format!r}")

DocumentVisitor Pattern

Use DocumentVisitor to implement a visitor that walks the entire document tree:

from aspose.note import Document, DocumentVisitor, Page, RichText, Image

class ContentCounter(DocumentVisitor):
    def __init__(self):
        self.pages = 0
        self.texts = 0
        self.images = 0

    def VisitPageStart(self, page: Page) -> None:
        self.pages += 1

    def VisitRichTextStart(self, rt: RichText) -> None:
        self.texts += 1

    def VisitImageStart(self, img: Image) -> None:
        self.images += 1

doc = Document("MyNotes.one")
counter = ContentCounter()
doc.Accept(counter)
print(f"Pages: {counter.pages}, Texts: {counter.texts}, Images: {counter.images}")

PDF Export

PDF export requires the optional ReportLab dependency. Install it with:

pip install "aspose-note[pdf]"

Basic PDF Export

from aspose.note import Document, SaveFormat

doc = Document("MyNotes.one")
doc.Save("output.pdf", SaveFormat.Pdf)

PDF Export Options

import io
from aspose.note import Document, SaveFormat
from aspose.note.saving import PdfSaveOptions

doc = Document("MyNotes.one")

# With save options
opts = PdfSaveOptions()
doc.Save("output.pdf", opts)

# Save to an in-memory stream
buf = io.BytesIO()
doc.Save(buf, PdfSaveOptions())
pdf_bytes = buf.getvalue()

Note: The PdfSaveOptions.PageIndex and PageCount fields exist but are not forwarded to the PDF exporter in v26.3.1. The entire document is always exported.


Current Limitations

Area                              Status
Reading .one files                Fully supported
PDF export (via ReportLab)        Supported
Writing back to .one              Not implemented
Encrypted documents               Not supported (raises IncorrectPasswordException)
HTML / image / ONE save formats   Declared for API compatibility; raise UnsupportedSaveFormatException
