All questions about EDocman extension

EDocman indexing PDF

  • Greg Ellis
  • Topic Author
  • New Member
10 years 11 months ago #29638 by Greg Ellis
EDocman indexing PDF was created by Greg Ellis
I have spent the last day trying to work out how to implement indexing of PDF documents in EDocman, and trawling the forum has left me confused. What is the recommended approach to indexing the content of PDFs? I have a number of existing PDF files to load into categories in EDocman and so far have loaded one. I have downloaded and installed the indexer plugin and enabled it. I then deleted the document, reloaded it manually, and created the EDocman document via load from server. How do I now check whether it is indexed? I have enabled the search module. Will this search the document content, or is it just a keyword search? I am also unsure whether I need to install pdf2text and where to put it. Do I have to execute it separately on all documents? This really needs a tutorial for the process, as it seems several people want to do this. I have supplied a website and login details for you previously for my previous topic.

  • Tuan Pham Ngoc
  • Administrator
10 years 11 months ago #29664 by Tuan Pham Ngoc
Replied by Tuan Pham Ngoc on topic Re: EDocman indexing PDF
Hi Greg

At the moment, the extension only indexes documents that are uploaded; it doesn't index documents that are loaded from the server (i.e. files you uploaded via FTP). I think I will improve it for this case. Please give me this weekend to work on this improvement.

Regards,

Tuan

  • Greg Ellis
  • Topic Author
  • New Member
10 years 11 months ago #30199 by Greg Ellis
Replied by Greg Ellis on topic Re: EDocman indexing PDF
Hello Tuan,

I am now trying to create an index for a PDF document again. I uploaded the document again via the upload link, but it still has not indexed the PDF file. I have installed the indexer plugin and enabled it, and I am using the EDocman search module to search. I do have JiFile installed, but the search module is disabled.

I have also changed the details for the document and saved them.

Where is the index stored for the documents? This should be visible if you create an index.

The forum has several entries but they are very confusing as to what components are required to be able to search the content of the files.

Greg Ellis

  • Thimothe Lamoureux (wCube Media)
  • New Member
10 years 11 months ago #30214 by Thimothe Lamoureux (wCube Media)
Replied by Thimothe Lamoureux (wCube Media) on topic Re: EDocman indexing PDF
Hi Greg,

I resolved this issue last week, so I can point out a few things to look for.

The indexed text is stored in the edocman_documents table of your database. Have a look to see if there is anything in there.

My issue was with my host, which didn't have the poppler library installed. Even though the pdftext lib is included with the indexer, it wouldn't work on my host without poppler installed on the server (it took me a day to figure this one out). Also, make sure your host allows the popen functions, which call external scripts.
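
The two host-side checks mentioned above (is a PDF-to-text tool installed, and can external programs be launched) can be sketched in a few lines of Python. This is only an illustration meant to be run from a shell on the server, not part of EDocman or its indexer plugin. The helper names `find_tool` and `can_spawn_processes` are made up here, and `pdftotext` is assumed to be the command-line tool that poppler provides. It does not inspect PHP's own configuration, where popen may still be disabled separately.

```python
import shutil
import subprocess
import sys


def find_tool(name):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


def can_spawn_processes():
    """Try to start a trivial child process (roughly what popen must be allowed to do)."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", "print('ok')"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and result.stdout.strip() == "ok"


if __name__ == "__main__":
    # "pdftotext" is an assumption: the tool name shipped with poppler.
    path = find_tool("pdftotext")
    print("pdftotext:", path if path else "not found on PATH")
    print("can spawn child processes:", can_spawn_processes())
```

If the tool is reported as missing, installing poppler on the server is the first thing to try. If it is present and indexing still fails, the PHP side (for example a `disable_functions` entry covering popen) is the next place to look.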

I'm not using JiFile because apparently it doesn't handle edocman's ACL restrictions and the indexer plugin alone works well.

These are a few good areas to start looking into. Hope it fixes your issue.

  • Greg Ellis
  • Topic Author
  • New Member
10 years 11 months ago #30428 by Greg Ellis
Replied by Greg Ellis on topic Re: EDocman indexing PDF
Thanks for the pointers. My testing ISP (free hosting) has disabled popen, so I cannot test on that server. I have moved to my own localhost Ubuntu server. I installed xpdf, which I believe includes the poppler libraries. I have not disabled popen. I then loaded two documents: one via the admin backend in the EDocman component and one using the upload feature in the front end. These appear in the database, with the indexed content as NULL. When I search on their content I get no results; this is using the EDocman search module.
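
One way to narrow this down is to run the extraction step by hand, outside Joomla. The sketch below assumes the poppler `pdftotext` command is what does the extraction (on Ubuntu it normally comes from the poppler-utils package) and that the indexer calls something similar; neither is confirmed in this thread. The function name `extract_pdf_text` is hypothetical.

```python
import shutil
import subprocess
import sys


def extract_pdf_text(pdf_path, tool="pdftotext"):
    """Return the text extracted from pdf_path, or None if the tool is missing or fails."""
    if shutil.which(tool) is None:
        return None  # tool not installed or not on PATH
    try:
        result = subprocess.run(
            [tool, pdf_path, "-"],  # "-" asks pdftotext to write to stdout
            capture_output=True,
            timeout=60,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    if result.returncode != 0:
        return None  # e.g. file not found or not a valid PDF
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__" and len(sys.argv) > 1:
    text = extract_pdf_text(sys.argv[1])
    if text is None:
        print("extraction failed (tool missing or error)")
    else:
        print(f"extracted {len(text)} characters")
```

If this returns no text for a given PDF, the indexed content would plausibly stay NULL as well. An empty result can also mean the PDF is a scanned image with no text layer.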

When does the indexing occur and is there another file created?

  • Tuan Pham Ngoc
  • Administrator
10 years 11 months ago #30514 by Tuan Pham Ngoc
Replied by Tuan Pham Ngoc on topic Re: EDocman indexing PDF
That means the documents were not indexed. Did you upload the file when you created the document, or did you choose a file from an existing folder?

Tuan

  • Greg Ellis
  • Topic Author
  • New Member
10 years 11 months ago #30565 by Greg Ellis
Replied by Greg Ellis on topic Re: EDocman indexing PDF
As in the last post I uploaded the file twice with different names - once from the front end using the UPLOAD command button and once from the admin backend. Both were uploads. I had previously deleted the file from the edocman folder.

  • Greg Ellis
  • Topic Author
  • New Member
10 years 11 months ago #30817 by Greg Ellis
Replied by Greg Ellis on topic Re: EDocman indexing PDF
I have now installed JiFile and it is indexing files. I still cannot get your indexer to work. Is there any possibility of integrating the two easily?

  • Tuan Pham Ngoc
  • Administrator
10 years 11 months ago #30858 by Tuan Pham Ngoc
Replied by Tuan Pham Ngoc on topic Re: EDocman indexing PDF
Hi Greg

At the moment, there is no integration between JiFile and EDocman, so I am afraid it is not possible. However, one of our customers attempted to integrate the two. You can see it here: joomdonation.com/74-edocman/21043-edocma...er/Page-2.html#26877

So maybe you can try to contact that customer and ask him about it? I will try to integrate EDocman with JiFile in the future. At the moment, I don't have enough time!

Tuan

10 years 7 months ago - 10 years 7 months ago #34982 by wuegi
Replied by wuegi on topic Re: EDocman indexing PDF
Hi

At my site, the EDocman indexer works fine if I upload a single file. In the database table "jos_edocman_documents" I can see the field "indexed_content", which holds the indexed content.

But a few small things are missing.

If I modify the document in the EDocman back end, I can't see the field "indexed_content". Administrative users need this. It would be very nice if the visibility of this field could be changed under "Configuration - Fields".

In the back-end table view for documents, the admin user should be able to see whether the field "indexed_content" has content or is NULL. Can you please add a column for that?

The search plugin also needs to be extended to search inside the field "indexed_content".

I think implementing these features would not be time-intensive. What do you think?
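
A quick way to list documents that have no indexed text is a query on that field. The sketch below uses an in-memory SQLite database purely as a stand-in so that it runs anywhere; on a real site the same SELECT would be run against the Joomla MySQL database (for example in phpMyAdmin). The table and field names are the ones quoted above. The `id` and `title` columns, the sample rows, and the function names are assumptions for illustration.

```python
import sqlite3


def make_demo_db():
    """Build a tiny stand-in table; the id/title columns and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jos_edocman_documents ("
        "id INTEGER PRIMARY KEY, title TEXT, indexed_content TEXT)"
    )
    conn.executemany(
        "INSERT INTO jos_edocman_documents (id, title, indexed_content) "
        "VALUES (?, ?, ?)",
        [
            (1, "Annual report", "Revenue grew in the third quarter."),
            (2, "Meeting minutes", None),   # never indexed
            (3, "Scanned letter", ""),      # indexed, but no text was extracted
        ],
    )
    return conn


def unindexed_documents(conn):
    """Return (id, title) for rows whose indexed_content is NULL or empty."""
    rows = conn.execute(
        "SELECT id, title FROM jos_edocman_documents "
        "WHERE indexed_content IS NULL OR indexed_content = '' "
        "ORDER BY id"
    )
    return rows.fetchall()


if __name__ == "__main__":
    for doc_id, title in unindexed_documents(make_demo_db()):
        print(doc_id, title)
```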
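
What such a search amounts to can be sketched as a LIKE match on the field. This is not how the EDocman search plugin is implemented (its code is not shown in this thread); it is a minimal illustration against an in-memory SQLite stand-in, with the `id` and `title` columns, the sample data, and the `search_documents` name all assumed.

```python
import sqlite3


def make_demo_db():
    """Build a tiny stand-in table; the id/title columns and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jos_edocman_documents ("
        "id INTEGER PRIMARY KEY, title TEXT, indexed_content TEXT)"
    )
    conn.executemany(
        "INSERT INTO jos_edocman_documents (id, title, indexed_content) "
        "VALUES (?, ?, ?)",
        [
            (1, "Annual report", "Revenue grew in the third quarter."),
            (2, "Meeting minutes", None),
            (3, "Safety manual", "Wear a helmet on site."),
        ],
    )
    return conn


def search_documents(conn, term):
    """Return (id, title) for rows whose title or indexed_content contains term."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT id, title FROM jos_edocman_documents "
        "WHERE title LIKE ? OR indexed_content LIKE ? "
        "ORDER BY id",
        (pattern, pattern),  # bound parameters, so the term is never pasted into the SQL
    )
    return rows.fetchall()


if __name__ == "__main__":
    print(search_documents(make_demo_db(), "helmet"))
```

Rows whose indexed_content is NULL can still match on the title, but never on content, which is consistent with a content search returning nothing for documents that were not indexed.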

Kindest regards

Rolf
Last edit: 10 years 7 months ago by wuegi.

Moderators: Mr. Dam