Common Issues
- PyMuPDF not installed: Run `pip install pymupdf`. The skill raises `RuntimeError` with install instructions if it is missing.
- Password-protected PDFs: Not currently supported. Decrypt with `qpdf --decrypt input.pdf output.pdf` first.
- Scanned-only PDFs (no text layer): OCR fallback uses `pytesseract` if installed (`pip install pytesseract`). Without it, extraction returns `[OCR not available]`.
- Tables not detected: PyMuPDF’s `find_tables()` requires structured table layouts. Hand-drawn or image-based tables won’t extract. Consider exporting from the source app instead.
- OCR disabled / slow: OCR is only triggered when a page has fewer than 20 characters of extractable text. For large scanned docs, expect slower extraction. Set `page_start`/`page_end` to limit scope.
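
The OCR trigger and page scoping described in the last item can be sketched roughly as below. This is an illustration only, not the skill's actual code: the names `needs_ocr`, `clamp_page_range`, and `OCR_CHAR_THRESHOLD` are hypothetical, and two details are assumptions: that whitespace is ignored when counting characters, and that `page_start`/`page_end` are 1-based and inclusive.

```python
from typing import Optional

# From the text above: OCR is triggered below 20 extractable characters.
OCR_CHAR_THRESHOLD = 20


def needs_ocr(page_text: str, threshold: int = OCR_CHAR_THRESHOLD) -> bool:
    """Return True when a page has too little extractable text.

    Assumption: leading/trailing whitespace does not count as text.
    """
    return len(page_text.strip()) < threshold


def clamp_page_range(
    page_count: int,
    page_start: Optional[int] = None,
    page_end: Optional[int] = None,
) -> range:
    """Return 0-based page indices for an assumed 1-based, inclusive range.

    Out-of-bounds values are clamped to the document; a start beyond
    the end yields an empty range.
    """
    start = 1 if page_start is None else max(1, page_start)
    end = page_count if page_end is None else min(page_count, page_end)
    return range(start - 1, end)


if __name__ == "__main__":
    # Hypothetical page texts standing in for per-page extraction results.
    pages = ["", "Short", "This page has plenty of extractable text."]
    for index in clamp_page_range(len(pages), page_start=1, page_end=3):
        label = "OCR" if needs_ocr(pages[index]) else "text layer"
        print(f"page {index + 1}: {label}")
```

With PyMuPDF, the text passed to a check like `needs_ocr` would typically come from `page.get_text()`. Narrowing the range before extraction avoids running OCR on pages that are not needed, which is where most of the time goes for large scanned documents.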
