All questions about EDocman extension

Empty pdf's

2 days 13 hours ago #175565 by bestcons
Empty pdf's was created by bestcons
We have a large number of pdfs, handled via Edocman. They are indexed and we see them in Joomla Smart Search if they contain the searched phrase.
However, when we view the search results, a small number of PDFs show 'empty pages': the page count is correct, but the content of the PDF is not displayed. Searching within these 'empty pages' only vaguely indicates where the results are located. The download works fine.
I hope you have an explanation and can offer a solution.

2 days 3 hours ago #175569 by Dang Thuc Dam
Replied by Dang Thuc Dam on topic Empty pdf's
Hi,
We would like to clarify that if a PDF file contains only images and does not have any embedded text, the system is unable to read or index the content of the file for search purposes. As a result, these files may appear as “empty pages” in the search results because there is no searchable text available.
Additionally, if the PDF files are secured, restricted from reading, or encrypted, the system may also be unable to access and display their content. This can further result in blank or empty pages when viewing search results, even though the files can still be downloaded.
We recommend checking whether the affected PDF files contain selectable text or are protected in any way.
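As a rough first pass, a check like this can be scripted. The sketch below is only a heuristic and not part of Edocman: the function name `quick_pdf_check` is hypothetical, and it simply scans the raw bytes of a file for the `/Encrypt` and `/Font` markers. A PDF without any `/Font` marker is probably image-only, but PDFs that use compressed object streams can hide these markers, so opening the file in a viewer and trying to select text remains the more reliable test.

```python
import re


def quick_pdf_check(data: bytes) -> dict:
    """Byte-level heuristics for a PDF file (hypothetical helper, not an Edocman API).

    Compressed object streams can hide font dictionaries, so a False
    value for "has_fonts" is a hint, not proof, that the file is image-only.
    """
    return {
        # Every PDF file starts with a "%PDF-" header.
        "is_pdf": data.startswith(b"%PDF-"),
        # An /Encrypt entry in the trailer marks a protected file.
        "encrypted": re.search(rb"/Encrypt\b", data) is not None,
        # A text layer needs at least one font resource; \b keeps
        # /FontDescriptor and /FontFile from matching on their own.
        "has_fonts": re.search(rb"/Font\b", data) is not None,
    }
```

To check a file on disk, read it in binary mode and pass the bytes in, for example `quick_pdf_check(open("scan.pdf", "rb").read())`, where `scan.pdf` is a placeholder name.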
Thanks
Dam

1 day 23 hours ago #175575 by bestcons
Replied by bestcons on topic Empty pdf's
They certainly are. These are all scanned newspapers, including text and images. After scanning, all of them are processed the same way, according to a strict protocol. The pages are not empty; it seems as if they contain white-coloured text. Searching within these pages does work, as the result jumps to the right page.

1 day 2 hours ago - 1 day 2 hours ago #175617 by Dang Thuc Dam
Replied by Dang Thuc Dam on topic Empty pdf's
Hi,
If the PDF files contain scanned images, it is not possible to read or extract the text content directly, as the text is part of the image. Currently, there is no tool available in our system that can read the contents of scanned image files.

Solution: Converting Scanned PDFs to Searchable Text PDFs
Before uploading your scanned PDF files to Edocman, you will need to use Optical Character Recognition (OCR) software to convert the images into searchable text. This process creates a text layer within the PDF, allowing Edocman’s PDF Indexer to read and index the content.

Recommended OCR Tools:
  • Adobe Acrobat (Paid, highly accurate)
  • ABBYY FineReader (Paid, professional-grade OCR)
  • Online OCR services (Free or paid, such as onlineocr.net)
Steps:
  1. Open your scanned PDF file in your chosen OCR software.
  2. Run the OCR function to recognize and convert the images into text.
  3. Save the resulting PDF file. It should now contain selectable and searchable text.
  4. Upload the processed PDF to Edocman as usual.
Once you follow these steps, Edocman’s PDF Indexer will be able to read and index the text content for efficient searching.
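For a large archive, the steps above can also be scripted. The sketch below is an assumption on our side rather than a tested Edocman workflow: it uses the open-source `ocrmypdf` command-line tool instead of the products listed above, and the helper names `build_ocr_command` and `ocr_folder`, the folder arguments, and the default language `eng` are all hypothetical. It requires `ocrmypdf` to be installed separately.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src: Path, dst: Path, language: str = "eng") -> list[str]:
    """Build the ocrmypdf call for one file (hypothetical helper)."""
    # --skip-text leaves pages that already have a text layer untouched.
    return ["ocrmypdf", "--skip-text", "-l", language, str(src), str(dst)]


def ocr_folder(in_dir: Path, out_dir: Path, language: str = "eng") -> list[Path]:
    """Run OCR on every PDF in in_dir and write the results to out_dir."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for src in sorted(in_dir.glob("*.pdf")):
        dst = out_dir / src.name
        # check=True stops the batch at the first file that fails.
        subprocess.run(build_ocr_command(src, dst, language), check=True)
        processed.append(dst)
    return processed
```

The files written to the output folder can then be uploaded to Edocman as usual.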

Thanks
Dam
Last edit: 1 day 2 hours ago by Dang Thuc Dam.

22 hours 49 minutes ago #175623 by bestcons
Replied by bestcons on topic Empty pdf's
They are all text-searchable. We used ABBYY FineReader. All documents followed the same process.
You are familiar with our website: www.dexxxxxxxx.yy . If you search for CVO07, you will find 2 results. Using the green View button, you can see the empty/blank results. If you download the documents, they are searchable.
Searching for CVO01 delivers numerous correct examples.

21 hours 23 minutes ago #175628 by Dang Thuc Dam
Replied by Dang Thuc Dam on topic Empty pdf's
Hi,
Please submit a ticket in the Edocman category and send us some of the affected PDF files for debugging.
Thanks
Dam

Moderators: Dang Thuc Dam