We use FTSearch within a web agent. The result on a Windows server is fine, but on a Linux server nothing is found. This occurs when the search string contains special characters such as - or +.
In this example I search for “oral-b” within a specific form.
Result on Windows-Server = 31 documents
Result on LINUX-Server = 0 documents
The database runs in a cluster and the FT index is fine. The function used is:
Set searchCollection = db.FTSearch(searchQuery, maxdoc, FT_Scores, extend)
OK, I figured out some more details. If a word contains a - and there is only one character before or after it, the search on the Linux server does not find it.
Samples
oral-b could not be found with “oral-b”
oral-b could not be found with “oral-”
a-silikone could not be found with “a-silikone”
occlu-print could be found with
“occlu-print”
“clu-pri”
“occlu-p”
and more…
As mentioned, this only happens on a Linux-based server with db.FTSearch; on a Windows server the error does not occur.
So the problem could lie in FTSearch itself or in how the database is indexed on Linux servers.
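For anyone who wants to check which terms in their own data fit the pattern described above, here is a rough sketch in Python. It is not Domino code and does not touch the index; it only encodes the rule as I observed it (a hyphenated word with exactly one character on one side of the hyphen). The function name is made up for illustration.

```python
def has_single_char_hyphen_part(word: str) -> bool:
    """Return True if a hyphen-separated part of the word is exactly
    one character long, e.g. "oral-b" or "a-silikone".

    This only mirrors the observed pattern; it says nothing about
    why the Linux server misses these words.
    """
    parts = word.split("-")
    if len(parts) < 2:
        # No hyphen at all, so the pattern cannot apply.
        return False
    return any(len(part) == 1 for part in parts)


# The indexed words from the samples above.
for word in ["oral-b", "a-silikone", "occlu-print"]:
    print(word, has_single_char_hyphen_part(word))
```

Running this over a list of product names from the database should give a quick idea of how many documents are likely to be affected.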
Summary of the above posting: searching a full-text index of a database on a Linux server fails to find search terms in .docx and .pdf files. Searching an indexed local replica of the server database finds terms in .docx and .pdf files.
Since the above post I’ve done additional testing.
I tried moving the full-text index directories and files that work on the local XP machine (Notes client 8.5.2) over to the Linux server (Domino 8.5.2) and changed ownership appropriately.
I then searched on the server and found that terms in .pdf or .docx attachments are not found, while terms in .doc and .txt attachments are found.
Next, I deleted and then re-created the FTI on the Linux server and moved it over to the local machine. When I searched on the local machine, terms were found in .doc and .txt files but not in .docx and .pdf files.
I understand that the KeyView filters added support for MS Office 2007 documents starting in 8.0.2. What’s concerning is that, in addition to not finding terms in Office 2007 documents, I’m not finding them in any PDF version either.
This seems to me like it’s related to Linux somehow. It would be nice to know if anyone else is having this issue or if it’s just me. I know little about Linux and I get the feeling that I may be missing something simple.
Subject: Fix full text search results within PDF and DocX attachments
We found that several installations of Lotus Domino 8.5.1 and 8.5.2 had issues when searching a full-text indexed database with attachment conversion filters turned on. When searching for content within the attachment, no results were displayed.
This is due to a wrong character set in the KeyView settings. It can be fixed by adding the following notes.ini entries:
FT_BINARY_FILTER_OFF=0
OS400_KEYVIEW_CSID=0052
PLATFORM_CSID=052
Here 0052 stands for code page 1252 (Western European Latin). After changing the ini settings, completely remove the FT index and re-create it.
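If you have several servers to check, a small script can confirm whether the three entries are present before you rebuild the index. The following is a minimal sketch in Python, not an official tool. It assumes notes.ini is a plain list of KEY=value lines, compares the values as literal strings, and treats key names as case-insensitive (an assumption on my part). The function name and the example content, including the directory path, are hypothetical.

```python
# The entries listed above, with the values exactly as given.
REQUIRED = {
    "FT_BINARY_FILTER_OFF": "0",
    "OS400_KEYVIEW_CSID": "0052",
    "PLATFORM_CSID": "052",
}


def check_notes_ini(ini_text: str, required: dict = REQUIRED) -> dict:
    """Return the required entries that are missing or differ.

    Keys map to the value found in the file, or None if the entry
    is absent. An empty dict means all entries match.
    """
    found = {}
    for line in ini_text.splitlines():
        line = line.strip()
        # Skip blank lines, the [Notes] header and anything without "=".
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        found[key.strip().upper()] = value.strip()
    return {k: found.get(k) for k, v in required.items() if found.get(k) != v}


# Hypothetical notes.ini content for illustration only.
example = """[Notes]
Directory=/local/notesdata
FT_BINARY_FILTER_OFF=1
"""
print(check_notes_ini(example))
```

In real use you would read the server's notes.ini from disk and pass its text to `check_notes_ini`. The script only reports differences; the index still has to be removed and re-created by hand afterwards.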
Hi, thanks for sharing. On the subject of PDF processing, I have another question. I wonder whether extracting text from PDF files is much simpler than a full PDF-to-text conversion. There’s something wrong with my PDF viewer, so I am looking for a tool to help with this. It would be better if it also offered a free trial package for users to check.
If so, I will try it later and send you feedback soon. Any suggestions will be appreciated. Thanks in advance.