Describe the proposed feature
Analogous to pdftk, it would be nice if the syntax `--pages 3-end` were allowed, for cases where you know that the first 2 pages are not needed.
As far as I can tell, the changes would need to be made in this function:
```python
def _pages_from_ranges(ranges: str) -> set[int]:
    pages: list[int] = []
    page_groups = ranges.replace(' ', '').split(',')
    for group in page_groups:
        if not group:
            continue
        try:
            start, end = group.split('-')
        except ValueError:
            pages.append(int(group) - 1)
        else:
            try:
                new_pages = list(range(int(start) - 1, int(end)))
                if not new_pages:
                    raise BadArgsError(
                        f"invalid page subrange '{start}-{end}'"
                    ) from None
                pages.extend(new_pages)
            except ValueError:
                raise BadArgsError(f"invalid page subrange '{group}'") from None

    if not pages:
        raise BadArgsError(
            f"The string of page ranges '{ranges}' did not contain any recognizable "
            f"page ranges."
        )

    if not monotonic(pages):
        log.warning(
            "List of pages to process contains duplicate pages, or pages that are "
            "out of order"
        )
    if any(page < 0 for page in pages):
        raise BadArgsError("pages refers to a page number less than 1")

    log.debug("OCRing only these pages: %s", pages)
    return set(pages)
```
(We would need to figure out the last page number of the document and substitute it for `end` :D)
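To make the idea concrete, here is a rough, standalone sketch of how the parsing could handle `end`. It is not the actual OCRmyPDF code: the function name `pages_from_ranges_with_end`, the extra `last_page` parameter (the 1-based page count, which the current function does not receive), and the local `BadArgsError` stand-in are all assumptions made for illustration.

```python
class BadArgsError(ValueError):
    """Stand-in for ocrmypdf's BadArgsError so this sketch runs on its own."""


def pages_from_ranges_with_end(ranges: str, last_page: int) -> set[int]:
    """Parse a string like '1, 3-end' into zero-based page indices.

    `last_page` is a hypothetical parameter: the number of pages in the
    document, which the caller would have to look up beforehand.
    """

    def resolve(token: str) -> int:
        # 'end' (case-insensitive) stands for the last page; otherwise
        # the token must be an integer. int() raises ValueError if not.
        return last_page if token.lower() == 'end' else int(token)

    pages: list[int] = []
    for group in ranges.replace(' ', '').split(','):
        if not group:
            continue
        start, sep, end = group.partition('-')
        if not sep:
            end = start  # a single page such as '4' or 'end'
        try:
            first, last = resolve(start), resolve(end)
        except ValueError:
            raise BadArgsError(f"invalid page subrange '{group}'") from None
        if first < 1:
            raise BadArgsError("pages refers to a page number less than 1")
        if last < first:
            raise BadArgsError(f"invalid page subrange '{group}'")
        # Convert the 1-based inclusive range to zero-based indices.
        pages.extend(range(first - 1, last))

    if not pages:
        raise BadArgsError(
            f"The string of page ranges '{ranges}' did not contain any "
            f"recognizable page ranges."
        )
    return set(pages)
```

With a 5-page document, `pages_from_ranges_with_end('3-end', 5)` would select the zero-based pages 2, 3 and 4. The open question remains where in the real code path the page count becomes available, since the validation shown above appears to work on the argument string alone.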
Ideas, thoughts on this topic?
Referenced code: OCRmyPDF/src/ocrmypdf/_validation.py, lines 155 to 191 in 8930efe