A few things that might help here:

HTML consistency — Docling's export_to_html() produces a structured HTML output based on the document element types it detects (headings, paragraphs, tables, lists, etc.) [1]. For documents that follow the same layout, the output structure should be consistent, but it ultimately depends on how well Docling's layout model recognizes each element. If your legal documents have a uniform format (e.g., Word templates), you should get fairly reliable results.
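To check consistency across a batch, one rough approach is to compare the sequence of tags in each export while ignoring text and attributes. This is a hypothetical check built on Python's standard `html.parser`, not a Docling feature. The sample strings stand in for what `export_to_html()` would return for two documents built from the same template.

```python
from html.parser import HTMLParser


class TagSkeleton(HTMLParser):
    """Collects opening tag names in document order, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def skeleton(html: str) -> list[str]:
    """Return the ordered list of opening tags found in an HTML string."""
    parser = TagSkeleton()
    parser.feed(html)
    parser.close()
    return parser.tags


def same_structure(html_a: str, html_b: str) -> bool:
    """True when both HTML strings open the same tags in the same order."""
    return skeleton(html_a) == skeleton(html_b)


if __name__ == "__main__":
    # Hypothetical exports: same layout, different text.
    doc_a = "<h2>Parties</h2><p>Alice and Bob.</p><table><tr><td>1</td></tr></table>"
    doc_b = "<h2>Parties</h2><p>Carol and Dave.</p><table><tr><td>2</td></tr></table>"
    # A document where a paragraph was detected as a list instead.
    doc_c = "<h2>Parties</h2><ul><li>Carol</li></ul>"
    print(same_structure(doc_a, doc_b))  # True
    print(same_structure(doc_a, doc_c))  # False
```

A mismatch flags documents where the layout model classified some element differently, which are the ones worth reviewing by hand.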

Custom tags like <clause> — Docling does not natively support custom HTML tags. The HTML exporter uses a fixed set of standard HTML tags [1]. For your use case, the recommended approach would be to post-…
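Because the exporter only emits standard tags, one way to get `<clause>`-style markup is to transform the exported HTML string yourself. The sketch below uses a hypothetical helper, `wrap_sections`, which is not part of Docling. It assumes each clause begins at an `<h2>` heading and that you pass it the body fragment rather than a full document with `<html>`/`<body>` wrappers. A regex split like this only handles flat, sibling headings; for nested sections a real HTML parser is safer.

```python
import re


def wrap_sections(html_body: str, heading_tag: str = "h2", wrapper: str = "clause") -> str:
    """Wrap each run of content starting at <heading_tag> in <wrapper>...</wrapper>.

    Hypothetical helper: assumes headings are flat siblings in an HTML fragment.
    Content before the first heading is left unwrapped.
    """
    # Zero-width lookahead splits right before every opening heading tag
    # (closing tags like </h2> do not match because they start with "</").
    parts = re.split(rf"(?=<{heading_tag}[\s>])", html_body)
    preamble, sections = parts[0], parts[1:]
    return preamble + "".join(f"<{wrapper}>{s}</{wrapper}>" for s in sections)


if __name__ == "__main__":
    body = "<p>Preamble.</p><h2>1. Term</h2><p>One year.</p><h2>2. Fees</h2><p>Monthly.</p>"
    print(wrap_sections(body))
```

If the result is meant for browsers, note that the HTML spec reserves hyphenated names (for example `legal-clause`) for custom elements; a bare `<clause>` is parsed as an unknown element.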

Answer selected by gustavotrapp
Category: Q&A
1 participant