Spans bbox have 0 instead of real values #418

Closed
sshustov opened this issue Dec 19, 2019 · 3 comments

@sshustov


Describe the bug (mandatory)

I noticed several documents for which some bbox values are wrong: they come back as 0 instead of real coordinates.

To Reproduce (mandatory)

Call getText('dict') on the attached document. I also attached a screenshot: one 'bbox' value is correct, but the other is not.

Expected behavior (optional)

Both bboxes should contain real coordinates, not 0.

Screenshots (optional)

example.pdf
example_from_debug

Your configuration (mandatory)

  • ubuntu 16.04 x64
  • Python version 2.7
  • PyMuPDF version 1.14.21

@JorjMcKie
Collaborator

I cannot reproduce your problem. See the attached logfile:

logfile.zip

@sshustov
Author

sshustov commented Dec 19, 2019

I see. Can you please check the line bbox?
For all my other files these tuples are not equal to (0, 0, 0, 0).
Try this:

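import fitz  # PyMuPDF

doc = fitz.open('/home/sshustov/Downloads/example.pdf')
page = doc[0]
blocks = page.getText('dict')['blocks']
for block in blocks:
    print(block['bbox'])  # block bbox: reported as correct
    # image blocks have no 'lines' key, so fall back to an empty list
    for line in block.get('lines', []):
        print(line['bbox'])  # line bbox: reported as (0, 0, 0, 0) in 1.14.21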
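
To check many files for the same symptom, the loop above could be wrapped in a small helper. This is only a sketch: the function name find_zero_line_bboxes and the sample data are made up for illustration, and the helper assumes it is given a dictionary shaped like the getText('dict') result (a 'blocks' list whose text blocks contain 'lines', each with a 'bbox'). It does not call PyMuPDF itself.

```python
def find_zero_line_bboxes(page_dict):
    """Return (block_index, line_index) pairs whose line bbox is all zeros."""
    hits = []
    for b_idx, block in enumerate(page_dict.get("blocks", [])):
        # Image blocks carry no "lines" key, so fall back to an empty list.
        for l_idx, line in enumerate(block.get("lines", [])):
            if all(value == 0 for value in line["bbox"]):
                hits.append((b_idx, l_idx))
    return hits


# Hand-made sample mimicking the shape of the 'dict' output.
# All coordinates here are invented, not taken from example.pdf.
sample = {
    "blocks": [
        {
            "type": 0,
            "bbox": (10, 10, 100, 40),
            "lines": [
                {"bbox": (10, 10, 100, 20)},
                {"bbox": (0, 0, 0, 0)},  # the symptom described in this issue
            ],
        },
        {"type": 1, "bbox": (10, 50, 100, 90)},  # image block, no "lines"
    ]
}

print(find_zero_line_bboxes(sample))  # [(0, 1)]
```

With a real document, the argument would be page.getText('dict') for each page.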

@JorjMcKie
Collaborator

You are right, there is an error in this version.
It works correctly in version 1.16.x. I just tested with 1.16.9 (current on PyPI).
Please upgrade.
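
For anyone unsure which side of the fix they are on, a version comparison along these lines may help. It is a minimal sketch under stated assumptions: the helper names parse_version and needs_upgrade are hypothetical, and the 1.16.0 threshold is an assumption based on the "1.16.x" remark above (only 1.16.9 was actually tested in this thread). The installed version string can usually be read from fitz.VersionBind.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.16.9' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def needs_upgrade(installed, minimum="1.16.0"):
    """True if the installed version is older than the assumed fixed version."""
    # Assumption: 1.16.0 is used as the cut-off; the thread only confirms 1.16.9.
    return parse_version(installed) < parse_version(minimum)


print(needs_upgrade("1.14.21"))  # True
print(needs_upgrade("1.16.9"))   # False
```

Comparing integer tuples avoids the trap of comparing version strings directly, where '1.9' would sort after '1.16'.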
