Hi, we're scanning documents to JPG images and converting them to PDFs for OCR and long-term storage. My question is: what's the drawback of generating PDF/A instead of PDF? PDF/A is not the default with OCRmyPDF, so I thought there must be a reason. It would be nice to learn more about it so I can make a good decision on how to archive our documents. Thanks,
Replies: 1 comment
Since you're archiving, you do want PDF/A. The biggest issue with standard PDFs is that they can reference fonts on the user's system that may or may not be installed, and the OS may or may not substitute another font. As a result, standard PDFs are not guaranteed to display correctly on all systems. There are other issues with PDF too, but all of them have the general shape of "some kind of external dependency", and PDF/A prevents them.

OCRmyPDF's behavior has always been to try to produce PDF/A if it can, and to give you a standard PDF when it can't. If you insist on PDF/A, there's an error code issued to signal "succeeded except for PDF/A conversion".

A drawback of PDF/A is that it displays a banner in Acrobat and some other PDF software, which can annoy or confuse unfamiliar users. Another drawback is that PDF/A cannot be encrypted, but that usually makes sense for archiving.

For all versions up to v17, we would run the PDF through Ghostscript, asking it to produce a PDF/A. That usually succeeds, but Ghostscript cannot perform the conversion in all cases. In v17 Ghostscript is being made optional, although most standard installations use it. OCRmyPDF now tries to produce PDF/A without Ghostscript if it can "prove" the file is a PDF/A, and if Ghostscript is not installed, then it just has fewer options.
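
If you want an archiving script to notice the "succeeded except for PDF/A conversion" case mentioned above, something like the following sketch might help. It assumes the `ocrmypdf` command is on your PATH, that `--output-type pdfa` is the option you want, and that exit code 10 is the one meaning "valid PDF, but PDF/A conversion failed" (that is how I read the documented exit codes; please check against your installed version). The function names and the `"pdfa"` / `"pdf-only"` / `"failed"` labels are made up for illustration.

```python
import subprocess

# Exit codes as I understand them from the OCRmyPDF documentation.
# These are assumptions: verify them against the version you have installed.
EXIT_OK = 0
EXIT_PDFA_CONVERSION_FAILED = 10


def describe_exit_code(code: int) -> str:
    """Map an ocrmypdf exit code to a coarse, illustrative outcome label."""
    if code == EXIT_OK:
        return "pdfa"      # finished normally with PDF/A requested
    if code == EXIT_PDFA_CONVERSION_FAILED:
        return "pdf-only"  # output exists, but it is a standard PDF
    return "failed"        # any other code: treat as an error


def archive_scan(src: str, dst: str) -> str:
    """Run ocrmypdf asking for PDF/A and report what kind of file came out."""
    result = subprocess.run(
        ["ocrmypdf", "--output-type", "pdfa", src, dst],
        check=False,  # inspect the exit code ourselves instead of raising
    )
    return describe_exit_code(result.returncode)
```

A call such as `archive_scan("scan.pdf", "archive.pdf")` would then return `"pdf-only"` for files that were OCRed but could not be converted, so you can log them or retry them separately instead of silently archiving a non-PDF/A file.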