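Developer Guide
Aspose.Note FOSS for Python is a free, open-source library for reading Microsoft OneNote .one section files without any dependency on Microsoft Office. It exposes a concise public API under the aspose.note package, modeled on the Aspose.Note for .NET interface. The library is suited to document automation, content indexing, data extraction pipelines, and archival workflows.
This developer guide covers the complete public API surface available in version 26.3.1 and provides a runnable code example for each major feature.
Document loading
Load a .one file from a file path or a binary stream. The Document class is the entry point for all operations.
Loading from a file path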
from aspose.note import Document
doc = Document("MyNotes.one")
Loading from a binary stream
Useful when reading from cloud storage, an HTTP response, or an in-memory buffer:
from pathlib import Path
from aspose.note import Document
with Path("MyNotes.one").open("rb") as f:
    doc = Document(f)
Load options
Use LoadOptions to set optional parameters at load time:
from aspose.note import Document, LoadOptions
opts = LoadOptions()
opts.LoadHistory = True # Include page history in the DOM
doc = Document("MyNotes.one", opts)
Note:
DocumentPassword exists on LoadOptions for API compatibility, but encrypted documents are not supported. Attempting to load an encrypted file raises IncorrectPasswordException.
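Because encrypted files raise an exception at load time, a batch job may want to skip them rather than abort. The following is a minimal sketch of that idea, not part of the library: load_all, fake_loader, and FakeEncryptedError are hypothetical names, and the loader and exception type are passed in as parameters so the sketch runs without the package installed.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple, Type


def load_all(
    paths: Iterable[str],
    loader: Callable[[str], Any],
    skip_on: Type[BaseException],
) -> Tuple[Dict[str, Any], List[str]]:
    """Load every path with `loader`; collect paths that raise `skip_on`."""
    loaded: Dict[str, Any] = {}
    skipped: List[str] = []
    for path in paths:
        try:
            loaded[path] = loader(path)
        except skip_on:
            # Remember the file and keep going instead of aborting the batch.
            skipped.append(path)
    return loaded, skipped


class FakeEncryptedError(Exception):
    """Demo-only stand-in for the library's IncorrectPasswordException."""


def fake_loader(path: str) -> str:
    """Demo-only stand-in for Document(path)."""
    if "secret" in path:
        raise FakeEncryptedError(path)
    return f"<document {path}>"


loaded, skipped = load_all(
    ["a.one", "secret.one", "b.one"], fake_loader, FakeEncryptedError
)
print("loaded:", sorted(loaded))
print("skipped:", skipped)
```

With the real library, loader would be Document and skip_on would be IncorrectPasswordException. This guide does not state where that exception is imported from, which is why the sketch takes it as an argument.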
Document structure (DOM)
The OneNote document model is a tree:
Document
└── Page (0..n)
    ├── Title
    │   ├── TitleText (RichText)
    │   ├── TitleDate (RichText)
    │   └── TitleTime (RichText)
    └── Outline (0..n)
        └── OutlineElement (0..n)
            ├── RichText
            ├── Image
            ├── Table
            │   └── TableRow
            │       └── TableCell
            │           └── RichText / Image
            └── AttachedFile
Every node exposes ParentNode and a Document property that walks up to the root. Composite nodes support child iteration, FirstChild, LastChild, AppendChildLast, InsertChild, RemoveChild, and GetChildNodes(Type).
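Since composite nodes support child iteration, a tree like the one above can be dumped with a short recursive walk. The sketch below is illustrative only: outline is a hypothetical helper, the classes are local stand-ins that merely share names with the library types, and it assumes leaf nodes are simply not iterable.

```python
from typing import Any, List


def outline(node: Any, depth: int = 0) -> List[str]:
    """Return one indented type name per node, parents before children."""
    lines = ["  " * depth + type(node).__name__]
    try:
        children = list(node)  # composite nodes are iterable
    except TypeError:
        children = []  # assumption: leaf nodes are not iterable
    for child in children:
        lines.extend(outline(child, depth + 1))
    return lines


# Demo-only stand-ins, NOT the aspose.note classes.
class Document(list): ...
class Page(list): ...
class Outline(list): ...
class RichText: ...


tree = Document([Page([Outline([RichText()])])])
print("\n".join(outline(tree)))
```

With a real document, the same function could be called on the object returned by Document("MyNotes.one"), provided the iteration assumption holds.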
Iterating pages
Pages are direct children of Document. Iterate over them directly or use GetChildNodes:
from aspose.note import Document, Page
doc = Document("MyNotes.one")
for page in doc:
    title = page.Title.TitleText.Text if page.Title and page.Title.TitleText else "(untitled)"
    author = page.Author or "(unknown)"
    print(f" {title} [by {author}]")
Page metadata:
| Property | Type | Description |
|---|---|---|
| Title | Title or None | Page title block |
| Author | str or None | Author string |
| CreationTime | datetime or None | Page creation time |
| LastModifiedTime | datetime or None | Last modification time |
| Level | int or None | Subpage indentation level |
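Every metadata property can be None, so code that indexes pages needs a fallback for each field. A minimal sketch of one way to flatten a page into a plain dictionary follows. page_summary is a hypothetical helper, and the demo uses a SimpleNamespace stand-in rather than a real Page so it runs without the package.

```python
from datetime import datetime
from types import SimpleNamespace
from typing import Any, Dict


def page_summary(page: Any) -> Dict[str, Any]:
    """Collect the page metadata properties, tolerating None everywhere."""
    has_title = page.Title and page.Title.TitleText
    title = page.Title.TitleText.Text if has_title else None
    created = page.CreationTime
    modified = page.LastModifiedTime
    return {
        "title": title or "(untitled)",
        "author": page.Author or "(unknown)",
        "created": created.isoformat() if created else None,
        "modified": modified.isoformat() if modified else None,
        "level": page.Level,
    }


# Demo-only stand-in exposing the same attribute names as a Page.
fake_page = SimpleNamespace(
    Title=SimpleNamespace(TitleText=SimpleNamespace(Text="Meeting notes")),
    Author=None,
    CreationTime=datetime(2024, 5, 1, 9, 30),
    LastModifiedTime=None,
    Level=1,
)
summary = page_summary(fake_page)
print(summary)
```

Timestamps are converted to ISO 8601 strings so the result can be serialized directly, for example with json.dumps.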
Text extraction
Extracting all plain text
from aspose.note import Document, RichText
doc = Document("MyNotes.one")
all_text = [rt.Text for rt in doc.GetChildNodes(RichText) if rt.Text]
print("\n".join(all_text))
Inspecting formatted runs
Each RichText contains a list of TextRun segments. Each run carries its own TextStyle:
from aspose.note import Document, RichText
doc = Document("FormattedNotes.one")
for rt in doc.GetChildNodes(RichText):
    for run in rt.TextRuns:
        style = run.Style
        flags = []
        if style.IsBold:
            flags.append("bold")
        if style.IsItalic:
            flags.append("italic")
        if style.IsHyperlink:
            flags.append(f"link={style.HyperlinkAddress}")
        print(f"{run.Text!r:40s} [{', '.join(flags)}]")
Extracting hyperlinks
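Because the link target lives on each run's style alongside the bold and italic flags, the runs of a RichText can be reassembled into Markdown with links preserved. The sketch below is one possible approach under simplifying assumptions: run_to_markdown, runs_to_markdown, and fake_run are hypothetical helpers, Markdown special characters are not escaped, and runs with leading or trailing spaces are not trimmed before emphasis markers are applied.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def run_to_markdown(run: Any) -> str:
    """Wrap a single run's text according to its style flags."""
    text = run.Text
    if not text:
        return ""
    style = run.Style
    if style.IsBold:
        text = f"**{text}**"
    if style.IsItalic:
        text = f"*{text}*"
    if style.IsHyperlink and style.HyperlinkAddress:
        text = f"[{text}]({style.HyperlinkAddress})"
    return text


def runs_to_markdown(runs: Iterable[Any]) -> str:
    """Concatenate the Markdown form of every run, in order."""
    return "".join(run_to_markdown(run) for run in runs)


def fake_run(text: str, bold: bool = False, italic: bool = False,
             link: Optional[str] = None) -> SimpleNamespace:
    """Demo-only stand-in with the attribute names of a TextRun."""
    style = SimpleNamespace(IsBold=bold, IsItalic=italic,
                            IsHyperlink=link is not None,
                            HyperlinkAddress=link)
    return SimpleNamespace(Text=text, Style=style)


runs = [
    fake_run("See "),
    fake_run("the guide", bold=True, link="https://example.com"),
    fake_run(" for "),
    fake_run("details", italic=True),
]
markdown = runs_to_markdown(runs)
print(markdown)
```

With a real document, runs_to_markdown(rt.TextRuns) would be called for each RichText. The snippet that follows lists only the link targets.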
from aspose.note import Document, RichText
doc = Document("MyNotes.one")
for rt in doc.GetChildNodes(RichText):
    for run in rt.TextRuns:
        if run.Style.IsHyperlink and run.Style.HyperlinkAddress:
            print(run.Text, "->", run.Style.HyperlinkAddress)
Image extraction
from aspose.note import Document, Image
doc = Document("MyNotes.one")
for i, img in enumerate(doc.GetChildNodes(Image), start=1):
    name = img.FileName or f"image_{i}.bin"
    with open(name, "wb") as f:
        f.write(img.Bytes)
    print(f"Saved {name} ({img.Width}x{img.Height})")
Image properties: FileName, Bytes, Width, Height, AlternativeTextTitle, AlternativeTextDescription, HyperlinkUrl, Tags.
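Writing files under their stored FileName can overwrite earlier output when two images share a name, and a stored name could in principle contain path separators. The following is a hedged sketch of a helper that guards against both. safe_unique_name is a hypothetical function, not part of the library, and it treats names as case-insensitive to stay safe on Windows and macOS file systems.

```python
from pathlib import PurePosixPath
from typing import Optional, Set


def safe_unique_name(file_name: Optional[str], fallback: str,
                     used: Set[str]) -> str:
    """Return a bare, not-yet-used file name derived from `file_name`."""
    # Keep only the last path component so output stays in one folder.
    normalized = (file_name or "").replace("\\", "/")
    base = PurePosixPath(normalized).name
    if base in ("", ".", ".."):
        base = fallback
    stem, suffix = PurePosixPath(base).stem, PurePosixPath(base).suffix
    candidate, counter = base, 1
    while candidate.lower() in used:
        counter += 1
        candidate = f"{stem}_{counter}{suffix}"
    used.add(candidate.lower())
    return candidate


used_names: Set[str] = set()
stored = ["photo.png", "Photo.png", None, "..\\..\\evil.png"]
names = [
    safe_unique_name(name, f"image_{i}.bin", used_names)
    for i, name in enumerate(stored, start=1)
]
print(names)
```

In the extraction loop above, the name passed to open() would come from safe_unique_name(img.FileName, f"image_{i}.bin", used_names). The same helper would apply to any node that exposes FileName and Bytes.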
Table parsing
from aspose.note import Document, Table, TableRow, TableCell, RichText
doc = Document("MyNotes.one")
for table in doc.GetChildNodes(Table):
    print("Column widths:", [col.Width for col in table.Columns])
    for r, row in enumerate(table.GetChildNodes(TableRow), start=1):
        cells = row.GetChildNodes(TableCell)
        row_text = [
            " ".join(rt.Text for rt in cell.GetChildNodes(RichText)).strip()
            for cell in cells
        ]
        print(f"Row {r}:", row_text)
Attached files
from aspose.note import Document, AttachedFile
doc = Document("NotesWithAttachments.one")
for i, af in enumerate(doc.GetChildNodes(AttachedFile), start=1):
    name = af.FileName or f"attachment_{i}.bin"
    with open(name, "wb") as f:
        f.write(af.Bytes)
    print(f"Saved: {name}")
Tags and numbered lists
Inspecting NoteTag items
from aspose.note import Document, RichText, Image, Table
doc = Document("TaggedNotes.one")
for rt in doc.GetChildNodes(RichText):
    for tag in rt.Tags:
        print(f"RichText tag: {tag.Label} icon={tag.Icon}")
for img in doc.GetChildNodes(Image):
    for tag in img.Tags:
        print(f"Image tag: {tag.Label}")
Inspecting numbered lists
from aspose.note import Document, OutlineElement
doc = Document("NumberedNotes.one")
for oe in doc.GetChildNodes(OutlineElement):
    nl = oe.NumberList
    if nl:
        print(f"format={nl.Format!r}")
DocumentVisitor pattern
Use DocumentVisitor to implement a visitor that traverses the entire document tree:
from aspose.note import Document, DocumentVisitor, Page, RichText, Image
class ContentCounter(DocumentVisitor):
    def __init__(self):
        self.pages = 0
        self.texts = 0
        self.images = 0

    def VisitPageStart(self, page: Page) -> None:
        self.pages += 1

    def VisitRichTextStart(self, rt: RichText) -> None:
        self.texts += 1

    def VisitImageStart(self, img: Image) -> None:
        self.images += 1

doc = Document("MyNotes.one")
counter = ContentCounter()
doc.Accept(counter)
print(f"Pages: {counter.pages}, Texts: {counter.texts}, Images: {counter.images}")
PDF export
PDF export requires the optional ReportLab dependency. Install it with:
pip install "aspose-note[pdf]"
Basic PDF export
from aspose.note import Document, SaveFormat
doc = Document("MyNotes.one")
doc.Save("output.pdf", SaveFormat.Pdf)
PDF export options
import io
from aspose.note import Document, SaveFormat
from aspose.note.saving import PdfSaveOptions
doc = Document("MyNotes.one")
# With save options
opts = PdfSaveOptions()
doc.Save("output.pdf", opts)
# Save to an in-memory stream
buf = io.BytesIO()
doc.Save(buf, PdfSaveOptions())
pdf_bytes = buf.getvalue()
Note:
The PdfSaveOptions.PageIndex and PageCount fields exist, but in v26.3.1 they are not forwarded to the PDF exporter. The entire document is always exported.
Current limitations
| Area | Status |
|---|---|
| Reading .one files | Fully supported |
| PDF export (via ReportLab) | Supported |
| Writing back to .one | Not implemented |
| Encrypted documents | Not supported (raises IncorrectPasswordException) |
| HTML / image / ONE save formats | Declared for API compatibility; raise UnsupportedSaveFormatException |
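Available guides
- Features Overview: complete feature list with evidence
- Getting Started: prerequisites, installation, and first steps
- Installation: pip installation and optional dependencies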