Conversion and Optimization

Aspose.PDF FOSS for .NET provides converters that transform PDFs into HTML, Markdown, SVG, and plain text. Its optimization subsystem reduces file size and handles PDF/A validation and conversion through PdfFormatConversionOptions.


PDF to HTML

PdfToHtmlConverter exports PDF pages as HTML documents.

var converter = new PdfToHtmlConverter();

// Save the whole document as a single HTML file
converter.SaveAsHtml("input.pdf", "output.html");

// Or save each page separately into output_dir
converter.SaveAllPagesAsHtml("input.pdf", "output_dir");

HtmlSaveOptions provides control over image handling, font embedding, and layout strategy.


PDF to Markdown

PdfToMarkdownConverter exports PDF content as Markdown text.

var converter = new PdfToMarkdownConverter("input.pdf");
converter.SaveAsMarkdown("output.md");

// Single page
converter.SavePageAsMarkdown(1, "page1.md");

PDF to SVG

PdfToSvgConverter renders each page as a scalable vector graphic.

var converter = new PdfToSvgConverter();

// Write one SVG file per page into output_dir
converter.SaveAllPagesAsSvg("input.pdf", "output_dir");

PDF to text

PdfToTextConverter extracts plain text from PDF pages.

var converter = new PdfToTextConverter();
converter.SaveAsText("input.pdf", "output.txt");
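The whole-document calls above follow an input-path, output-path pattern, except PdfToMarkdownConverter, which takes the input path in its constructor. A minimal batch sketch, assuming every *.pdf file in one folder should be converted, can keep the folder walk separate from the converter by taking the conversion step as a delegate. ConvertFolder and its parameters are hypothetical names for illustration, not part of Aspose.PDF FOSS, and the sketch creates the output folder itself because this section does not say whether the converters do.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of Aspose.PDF FOSS): converts every *.pdf file
// in inputDir by calling convert(inputPath, outputPath), writing each result
// into outputDir with the given extension. Returns the output paths in order.
static List<string> ConvertFolder(
    string inputDir, string outputDir, string extension,
    Action<string, string> convert)
{
    // Create the output folder up front; this is a no-op if it already exists.
    Directory.CreateDirectory(outputDir);

    // Sort so files are processed in a predictable order.
    var inputs = Directory.GetFiles(inputDir, "*.pdf");
    Array.Sort(inputs, StringComparer.Ordinal);

    var outputs = new List<string>();
    foreach (var input in inputs)
    {
        // "report.pdf" becomes "report" + extension inside outputDir.
        var name = Path.GetFileNameWithoutExtension(input) + extension;
        var output = Path.Combine(outputDir, name);
        convert(input, output);
        outputs.Add(output);
    }
    return outputs;
}
```

To reuse the HTML call shown earlier, pass (i, o) => new PdfToHtmlConverter().SaveAsHtml(i, o); for Markdown, pass (i, o) => new PdfToMarkdownConverter(i).SaveAsMarkdown(o). Taking a delegate keeps the helper free of any library dependency, so the same loop serves every converter.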

PDF/A compliance

PdfFormatConversionOptions validates and converts documents to PDF/A standards.

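using System.IO;

// Load the source PDF into memory
var pdfBytes = File.ReadAllBytes("input.pdf");
using var doc = Document.Open(pdfBytes);

var options = new PdfFormatConversionOptions(
    "log.xml",                  // conversion log file
    PdfFormat.PDF_A_1B,         // target PDF/A profile
    ConvertErrorAction.Delete); // remove elements that violate the profile

doc.Convert(options);
doc.Save("pdfa.pdf");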

Heading-level control

HeadingLevels configures which heading levels are recognized during HTML or Markdown conversion.

var levels = new HeadingLevels();
levels.AddLevels(1, 3);  // Recognize H1 through H3

Tips and Best Practices

  • Use PdfToHtmlConverter for web publishing and PdfToMarkdownConverter for documentation workflows.
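  • PDF/A conversion may remove features that the standard prohibits, such as JavaScript and encryption. Choose ConvertErrorAction.Delete or ConvertErrorAction.None depending on how non-compliant elements should be handled.
  • For large documents, consider converting one page at a time (for example with SavePageAsMarkdown) to keep each output small and limit memory use.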
  • HtmlSaveOptions controls whether images are embedded inline or saved as external files.
  • SVG output is ideal for high-resolution display of individual pages.
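The page-by-page tip can be sketched with the SavePageAsMarkdown call shown earlier. This sketch assumes page numbers start at 1, as the SavePageAsMarkdown(1, "page1.md") example suggests, and that the caller already knows the page count, because this section does not show an API for reading it. ExportPages is a hypothetical helper, not a library method. Whether per-page export actually lowers peak memory depends on how the converter loads the document, which this section does not specify.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of Aspose.PDF FOSS): calls
// savePage(pageNumber, outputPath) once per page, so each page is written to
// its own file. Assumes 1-based page numbers and a caller-supplied page count.
static List<string> ExportPages(
    int pageCount, string outputDir, string extension,
    Action<int, string> savePage)
{
    if (pageCount < 0)
        throw new ArgumentOutOfRangeException(nameof(pageCount));

    // Create the output folder; this is a no-op if it already exists.
    Directory.CreateDirectory(outputDir);

    var written = new List<string>();
    for (int page = 1; page <= pageCount; page++)
    {
        // Name files page1, page2, ... to match the single-page example.
        var path = Path.Combine(outputDir, $"page{page}{extension}");
        savePage(page, path);
        written.Add(path);
    }
    return written;
}
```

For example, with md = new PdfToMarkdownConverter("input.pdf"), calling ExportPages(pageCount, "pages", ".md", (p, o) => md.SavePageAsMarkdown(p, o)) writes page1.md, page2.md, and so on into the pages folder.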

Common Issues

| Issue | Cause | Fix |
|---|---|---|
| HTML output missing images | Images not embedded, or external image paths incorrect | Configure HtmlSaveOptions to embed images |
| PDF/A conversion removes annotations | Some annotation types, or annotations lacking required properties, are not permitted by the target PDF/A profile | Target PDF/A-2 or PDF/A-3, which permit a broader set of annotation types |
| Text extraction loses formatting | Plain-text output has no formatting by design | Use HTML or Markdown conversion instead |

FAQ

Which PDF/A profiles are supported?

PDF/A-1A, PDF/A-1B, PDF/A-2A, PDF/A-2B, PDF/A-3A, and PDF/A-3B profiles are supported through PdfFormat enumeration values.

Can I convert a single page to HTML?

Yes. Use PdfToHtmlConverter.SavePageAsHtml.

Does Markdown conversion preserve tables?

The converter attempts to render table structures as Markdown tables, but complex layouts may require post-processing.


API Reference Summary

| Class / Method | Description |
|---|---|
| PdfToHtmlConverter | Convert PDF to HTML |
| PdfToHtmlConverter.SaveAsHtml | Export the entire document as HTML |
| PdfToMarkdownConverter | Convert PDF to Markdown |
| PdfToSvgConverter | Convert PDF pages to SVG |
| PdfToTextConverter | Extract plain text from PDF |
| HtmlSaveOptions | Options for HTML export (images, fonts, layout) |
| HeadingLevels | Configure recognized heading levels |
| PdfFormatConversionOptions | PDF/A validation and conversion options |
