My use case is to perform OCR on a PDF. I’m converting each PDF page to an image using Pixmap and then passing it to Tesseract.js for OCR. Tesseract outputs a PDF page with the recognized text placed as an invisible layer. How can I extract this text along with its exact positions and overlay it onto the original PDF?
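Part of the problem is mapping a word box from the rendered image back to page coordinates. Below is a minimal TypeScript sketch. It assumes three things. First, each OCR word arrives as a pixel bounding box with a top-left origin; the `OcrWord` type is a hypothetical shape modeled on the `bbox` fields Tesseract.js reports. Second, the page was rendered at `scale` pixels per point (dpi / 72). Third, the page is unrotated and its page box starts at (0, 0). `pixelBoxToPdf` is a hypothetical helper, not part of Tesseract.js or any PDF library.

```typescript
// Hypothetical shape of one OCR word: a pixel-space box with a top-left origin,
// modeled on the bbox fields Tesseract.js reports for words (assumption).
interface OcrWord {
  text: string;
  bbox: { x0: number; y0: number; x1: number; y1: number };
}

// A rectangle in PDF points, ready to receive an invisible text run.
interface PdfRect {
  text: string;
  x: number;      // left edge in points
  y: number;      // bottom edge (bottom-left origin) or top edge (top-left origin)
  width: number;  // box width in points
  height: number; // box height in points, a rough font-size candidate
}

type Origin = "bottom-left" | "top-left";

// Convert a word box from rendered-image pixels to page points.
// `scale`: pixels per point used when rendering (dpi / 72).
// `pageHeight`: page height in points (assumes no rotation, page box at (0, 0)).
function pixelBoxToPdf(
  word: OcrWord,
  scale: number,
  pageHeight: number,
  origin: Origin = "bottom-left",
): PdfRect {
  const { x0, y0, x1, y1 } = word.bbox;
  const x = x0 / scale;
  const width = (x1 - x0) / scale;
  const height = (y1 - y0) / scale;
  // Image rows grow downward, while PDF user space grows upward,
  // so the bottom-left variant flips y against the page height.
  const y = origin === "bottom-left" ? pageHeight - y1 / scale : y0 / scale;
  return { text: word.text, x, y, width, height };
}

// Example: a page rendered at 144 dpi (scale 2) on a 792 pt tall page.
const rect = pixelBoxToPdf(
  { text: "Invoice", bbox: { x0: 100, y0: 200, x1: 300, y1: 240 } },
  2,
  792,
);
console.log(rect); // { text: 'Invoice', x: 50, y: 672, width: 100, height: 20 }
```

Use the bottom-left variant if the overlay is drawn with a library that uses PDF's native coordinate system, such as pdf-lib. Use the top-left variant for MuPDF-based APIs, which report page coordinates with y growing downward. With pdf-lib, one option is `page.drawText(text, { x, y, size, opacity: 0 })`. Note that this hides the text through transparency, not through the invisible text render mode that OCR layers normally use.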