FTSearch on Windows and Linux: different results

We use FTSearch within a web agent. The result on a Windows server is fine, but on a Linux server it does not find anything. This happens if the search string contains special characters such as - or +.

In this example I search for “oral-b” within a specific form.

Result on Windows-Server = 31 documents

Result on LINUX-Server = 0 documents

The database runs in a cluster and the full-text index is fine. The function we use is:

Set searchCollection = db.FTSearch(searchQuery, maxdoc, FT_Scores, extend)

lngSearchCount = searchCollection.Count

maxdoc is 50

FT_Scores is 8

extend is nothing

We are using FP4.

Any ideas?

Subject: Use quotes around your search terms.

Also, are you sure both databases are full-text indexed? This doesn't sound like an OS difference: the search code is the same on all platforms.
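A minimal LotusScript sketch of both suggestions, assuming the agent receives the raw term (here the thread's example, oral-b) and builds the query itself. `QuotedTerm` is a hypothetical helper, not part of the Domino API: it strips any double quotes the user typed and wraps the term in quotes, so that the full-text engine is meant to treat it as one phrase rather than reading - or + as query syntax. The FTSearch arguments mirror the original post (maxdocs 50, sort option 8 = FT_SCORES, no other options). Whether quoting changes the Linux result is not confirmed in this thread.

```lotusscript
Option Declare

' Hypothetical helper: removes embedded double quotes and wraps the term
' in double quotes so it is submitted to FTSearch as a single phrase
Function QuotedTerm(term As String) As String
	Dim cleaned As String
	' Replace is the built-in LotusScript string replacement function
	cleaned = Replace(term, {"}, "")
	QuotedTerm = {"} & cleaned & {"}
End Function

Sub Initialize
	Dim session As New NotesSession
	Dim db As NotesDatabase
	Dim searchCollection As NotesDocumentCollection
	Dim searchQuery As String
	Dim lngSearchCount As Long

	Set db = session.CurrentDatabase

	' The reply asks whether the database is indexed, so check that first
	If Not db.IsFTIndexed Then
		Print "Database is not full-text indexed"
		Exit Sub
	End If

	' Example term from the thread; a real agent would read it from the request
	searchQuery = QuotedTerm("oral-b")

	' maxdocs = 50, sort option 8 (FT_SCORES, by relevance), other options 0
	Set searchCollection = db.FTSearch(searchQuery, 50, 8, 0)
	lngSearchCount = searchCollection.Count
	Print "Query " & searchQuery & " returned " & CStr(lngSearchCount) & " document(s)"
End Sub
```

Keeping the quoting in one function means every search path in the agent builds its query the same way, which makes it easier to compare results between the Windows and Linux servers.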

Subject: More details on this problem

OK, I figured out some more details. If a word contains a hyphen and there is only one character before or after it, the search on the Linux server does not find it.

Samples

oral-b could not be found with “oral-b

oral-b could not be found with “oral-

a-silikone could not be found with “a-silikone

occlu-print could be found with

occlu-print

clu-pri

occlu-p

and more…

As mentioned, this only happens with db.FTSearch on a Linux-based server; on a Windows server the error does not occur.

So the problem could lie in FTSearch itself or in the database being indexed differently on Linux servers.

Subject: I’m having issues with full text searches on Linux too

Post: http://www-10.lotus.com/ldd/nd85forum.nsf/DateAllThreadedWeb/979fb170cc4de6ac852577b3004b1513?OpenDocument

Summary of the above posting: searching the full-text index of a database on a Linux server fails to find search terms in .docx and .pdf files, while searching an indexed local replica of the same database does find them.

Since the above post I’ve done additional testing.

I tried moving the full-text index directories and files that work on the local XP machine (Notes client 8.5.2) over to the Linux server (Domino 8.5.2) and changed their ownership appropriately.

I then searched on the server and found that terms in PDF or .docx attachments are not found, while terms in .doc and .txt attachments are.

Next, I deleted and re-created the FTI on the Linux server and moved it over to the local machine. When I searched on the local machine, terms were found in .doc and .txt files but not in .docx and .pdf files.

I understand that the KeyView filters added support for Microsoft Office 2007 documents starting in 8.0.2. What's concerning is that, in addition to not finding terms in Office 2007 documents, I'm not finding them in PDFs of any version either.

This seems to me like it’s related to Linux somehow. It would be nice to know if anyone else is having this issue or if it’s just me. I know little about Linux and I get the feeling that I may be missing something simple.

Subject: Same Result with 8.5.1 FP5 and 8.5.2

Found that the problem is still there in these versions. I opened a PMR.

Subject: Fix full text search results within PDF and DocX attachments

We found that several installations of Lotus Domino 8.5.1 and 8.5.2 had issues when searching a full-text indexed database with attachment conversion filters turned on. When searching for content within the attachment, no results were displayed.

This is caused by a wrong character set in the KeyView settings. It can be fixed by adding the following notes.ini entries:

FT_BINARY_FILTER_OFF=0

OS400_KEYVIEW_CSID=0052

PLATFORM_CSID=052

Here, 0052 stands for code page 1252 (Western European Latin). After changing the notes.ini settings, completely remove the full-text index and re-create it.
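The remove-and-re-create step can also be scripted. A minimal LotusScript sketch, assuming it runs as an agent on the Linux server against the database that contains it, and that the notes.ini changes are already in effect (which may require restarting the server). `RebuildFTIndex` is a hypothetical helper; `IsFTIndexed`, `RemoveFTIndex` and `CreateFTIndex` are NotesDatabase members, and FTINDEX_ATTACHED_FILES asks the indexer to include attachments, which the PDF and .docx searches depend on. Deleting and re-creating the index from the database properties works just as well.

```lotusscript
Option Declare

' Hypothetical helper: drops the existing full-text index and builds a new one
Sub RebuildFTIndex(db As NotesDatabase)
	' Remove the old index completely first, as the fix recommends
	If db.IsFTIndexed Then
		Call db.RemoveFTIndex()
	End If

	' Re-create the index including attached files;
	' True = recreate even if an index still exists
	Call db.CreateFTIndex(FTINDEX_ATTACHED_FILES, True)
End Sub

Sub Initialize
	Dim session As New NotesSession
	Dim db As NotesDatabase

	Set db = session.CurrentDatabase
	Call RebuildFTIndex(db)
	Print "Full-text index re-created for " & db.FilePath
End Sub
```

Rebuilding a large index with attachments can take a while, so running this as a scheduled or manually triggered server agent is probably preferable to running it from a web request.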

See also:

Subject: How about?

Hi, thanks for sharing. When it comes to PDF processing, I have another question. I wonder whether text extraction from PDF files is much simpler than a PDF-to-text conversion process. There's something wrong with my PDF viewer, and I'm looking for a tool to help with this. It would be better if it also offered a free trial package for users to check.

If so, I will try it later and send you feedback soon. Any suggestion will be appreciated. Thanks in advance.

Best regards,

Pan