Conversion and Optimization
Aspose.PDF FOSS for .NET provides converters for transforming PDFs to HTML,
Markdown, SVG, and plain text. The optimization subsystem reduces file size
and ensures PDF/A compliance through PdfFormatConversionOptions.
PDF to HTML
PdfToHtmlConverter exports PDF pages as HTML documents.
var converter = new PdfToHtmlConverter();
converter.SaveAsHtml("input.pdf", "output.html");
// Or save each page separately
converter.SaveAllPagesAsHtml("input.pdf", "output_dir");
HtmlSaveOptions provides control over image handling, font embedding, and
layout strategy.
PDF to Markdown
PdfToMarkdownConverter exports PDF content as Markdown text.
var converter = new PdfToMarkdownConverter("input.pdf");
converter.SaveAsMarkdown("output.md");
// Single page
converter.SavePageAsMarkdown(1, "page1.md");
PDF to SVG
PdfToSvgConverter renders each page as a scalable vector graphic.
var converter = new PdfToSvgConverter();
converter.SaveAllPagesAsSvg("input.pdf", "output_dir");
PDF to text
PdfToTextConverter extracts plain text from PDF pages.
var converter = new PdfToTextConverter();
converter.SaveAsText("input.pdf", "output.txt");
PDF/A compliance
PdfFormatConversionOptions validates and converts documents to PDF/A
standards.
using var doc = Document.Open(pdfBytes);
var options = new PdfFormatConversionOptions(
    "log.xml",
    PdfFormat.PDF_A_1B,
    ConvertErrorAction.Delete);
doc.Convert(options);
doc.Save("pdfa.pdf");
Heading-level control
HeadingLevels configures which heading levels are recognized during
HTML or Markdown conversion.
var levels = new HeadingLevels();
levels.AddLevels(1, 3); // Recognize H1 through H3
Tips and Best Practices
- Use PdfToHtmlConverter for web publishing and PdfToMarkdownConverter for documentation workflows.
- PDF/A conversion may remove features (JavaScript, encryption) that violate the standard. Pass ConvertErrorAction.Delete or ConvertErrorAction.None to choose how non-conforming elements are handled.
- For large documents, convert page-by-page to manage memory.
- HtmlSaveOptions controls whether images are embedded inline or saved as external files.
- SVG output is ideal for high-resolution display of individual pages.
Common Issues
| Issue | Cause | Fix |
|---|---|---|
| HTML output missing images | Images not embedded; external paths incorrect | Configure HtmlSaveOptions for embedded images |
| PDF/A conversion removes annotations | Some annotation types are not permitted in the target PDF/A profile | Try a PDF/A-2 or PDF/A-3 profile, which permit more annotation types |
| Text extraction loses formatting | Plain-text output has no formatting by design | Use HTML or Markdown conversion instead |
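When HTML output is missing images, it can help to confirm which external image files the exported page actually references. The following is a minimal sketch that uses only the .NET base class library, not the Aspose.PDF API. The class name HtmlImageCheck and the method FindMissingImages are hypothetical names chosen for illustration. The sketch assumes images are referenced through ordinary img src attributes with paths relative to the HTML file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Aspose.PDF): lists image files that an
// exported HTML page references but that are absent on disk.
public static class HtmlImageCheck
{
    // Matches src="..." or src='...' inside <img> tags.
    private static readonly Regex ImgSrc = new Regex(
        "<img\\b[^>]*?\\ssrc\\s*=\\s*[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Returns the referenced image paths that do not exist under baseDir.
    public static List<string> FindMissingImages(string html, string baseDir)
    {
        var missing = new List<string>();
        foreach (Match m in ImgSrc.Matches(html))
        {
            string src = m.Groups[1].Value;

            // Embedded (data:) and remote images are not local files; skip them.
            if (src.StartsWith("data:", StringComparison.OrdinalIgnoreCase) ||
                src.Contains("://"))
            {
                continue;
            }

            // Decode percent-escapes such as %20 before touching the file system.
            string path = Path.Combine(baseDir, Uri.UnescapeDataString(src));
            if (!File.Exists(path))
            {
                missing.Add(src);
            }
        }
        return missing;
    }
}
```

To use it, read the exported file with File.ReadAllText("output.html") and pass the directory that contains it as baseDir. Any path returned points to an image the browser will fail to load, which suggests either fixing the external paths or configuring HtmlSaveOptions for embedded images.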
FAQ
Which PDF/A profiles are supported?
PDF/A-1A, PDF/A-1B, PDF/A-2A, PDF/A-2B, PDF/A-3A, and PDF/A-3B profiles
are supported through PdfFormat enumeration values.
Can I convert a single page to HTML?
Yes. Use PdfToHtmlConverter.SavePageAsHtml.
Does Markdown conversion preserve tables?
The converter attempts to render table structures as Markdown tables, but complex layouts may require post-processing.
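One possible post-processing step is to flag table rows whose cell count does not match the header row, since those are the rows most likely to need manual repair. This is a rough sketch using only the .NET base class library. MarkdownTableCheck and FindRaggedRows are hypothetical names, and the sketch assumes every table row in the exported Markdown starts with a pipe character.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Aspose.PDF): reports Markdown table rows
// whose cell count differs from the first row of the same table.
public static class MarkdownTableCheck
{
    // Counts cells in a pipe-delimited row, ignoring escaped pipes (\|).
    private static int CountCells(string row)
    {
        string t = row.Trim();
        if (t.StartsWith("|")) t = t.Substring(1);
        if (t.EndsWith("|") && !t.EndsWith("\\|")) t = t.Substring(0, t.Length - 1);

        int cells = 1;
        for (int i = 0; i < t.Length; i++)
        {
            if (t[i] == '|' && (i == 0 || t[i - 1] != '\\')) cells++;
        }
        return cells;
    }

    // Returns 1-based line numbers of rows that disagree with their table's header.
    public static List<int> FindRaggedRows(string markdown)
    {
        var ragged = new List<int>();
        string[] lines = markdown.Replace("\r\n", "\n").Split('\n');
        int expected = -1; // -1 means "not currently inside a table"

        for (int i = 0; i < lines.Length; i++)
        {
            string line = lines[i].Trim();
            if (!line.StartsWith("|"))
            {
                expected = -1; // any non-table line ends the current table
                continue;
            }

            int cells = CountCells(line);
            if (expected < 0) expected = cells;          // first row acts as the header
            else if (cells != expected) ragged.Add(i + 1);
        }
        return ragged;
    }
}
```

Running FindRaggedRows over the text of output.md gives a short list of line numbers to inspect by hand. The check is deliberately conservative: it only reports rows and never rewrites the converter's output.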
API Reference Summary
| Class / Method | Description |
|---|---|
| PdfToHtmlConverter | Convert PDF to HTML |
| PdfToHtmlConverter.SaveAsHtml | Export entire document as HTML |
| PdfToMarkdownConverter | Convert PDF to Markdown |
| PdfToSvgConverter | Convert PDF pages to SVG |
| PdfToTextConverter | Extract plain text from PDF |
| HtmlSaveOptions | Options for HTML export (images, fonts, layout) |
| HeadingLevels | Configure recognized heading levels |
| PdfFormatConversionOptions | PDF/A validation and conversion options |