Search Index for Files cannot keep up

We are seeing this message in the search indexer logs:

[5/18/22 8:31:20:150 CEST] 0000049d FilesPostProc I com.ibm.connections.search.index.process.postprocessors.FilesPostProcessor index CLFRW1081I: Files content indexing has completed in 19,370 ms. From an initial 2,594 documents waiting to be processed, 1,974 were processed and 20 have completed content indexing.

This happens constantly, and users are complaining that they cannot find their recently uploaded documents. We increased the number of index threads to 3, but it does not seem to help. What else can be done to speed up indexing?
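To see whether the backlog shrinks over time, the counters in the CLFRW1081I message can be pulled out of the log. The following is a minimal Python sketch, assuming the message always has the exact wording quoted above; the function name `parse_clfrw1081i` and the dictionary keys are made up for illustration and are not part of Connections.

```python
import re

# Pattern for the CLFRW1081I message, based only on the log line quoted above.
# Assumption: other releases or locales may word the message differently.
CLFRW1081I = re.compile(
    r"CLFRW1081I: Files content indexing has completed in ([\d,]+) ms\. "
    r"From an initial ([\d,]+) documents waiting to be processed, "
    r"([\d,]+) were processed and ([\d,]+) have completed content indexing\."
)


def parse_clfrw1081i(line):
    """Return the four counters from a CLFRW1081I line, or None if absent."""
    match = CLFRW1081I.search(line)
    if match is None:
        return None
    # The log prints thousands separators ("2,594"), so strip the commas.
    duration_ms, initial, processed, completed = (
        int(group.replace(",", "")) for group in match.groups()
    )
    return {
        "duration_ms": duration_ms,
        "initial": initial,
        "processed": processed,
        "completed": completed,
    }


sample = (
    "[5/18/22 8:31:20:150 CEST] 0000049d FilesPostProc I "
    "com.ibm.connections.search.index.process.postprocessors."
    "FilesPostProcessor index CLFRW1081I: Files content indexing has "
    "completed in 19,370 ms. From an initial 2,594 documents waiting to be "
    "processed, 1,974 were processed and 20 have completed content indexing."
)

stats = parse_clfrw1081i(sample)
print(stats)
```

Running this over every CLFRW1081I line in the log and plotting the "initial" value per run should show whether the queue is growing or shrinking.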

Hi Wannes,

We have the same issue, but what might help you is to increase the timeout of the indexing task.

Thanks, any tips on where to do so?

Hi Wannes,

It is a setting in the scheduled tasks (the Time Limit value in the listing below):

wsadmin>SearchService.listFileContentIndexingTasks()
File Content Indexing Task Name: bld-f1-file-indexing-task, Schedule: 0 11,31,51 7-20 ? * MON,TUE,WED,THU,FRI, Start-By Schedule: 0 13,33,54 7-20 ? * MON,TUE,WED,THU,FRI, Services all_configured , Time Limit 300 , Enabled true
File Content Indexing Task Name: bld-f2-file-indexing-task, Schedule: 0 11,41 0,2,3,4,5,6,21,22,23 ? * MON,TUE,WED,THU,FRI, Start-By Schedule: 0 15,45 0,2,3,4,5,6,21,22,23 ? * MON,TUE,WED,THU,FRI, Services all_configured , Time Limit 900 , Enabled true
File Content Indexing Task Name: bld-f3-file-indexing-task, Schedule: 0 15 0,2-23 ? * SAT,SUN, Start-By Schedule: 0 20 0,2-23 ? * SAT,SUN, Services all_configured , Time Limit 1,800 , Enabled true

We have 3 tasks: one for office hours, one for after hours, and one for the weekend.
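For comparing the time limits across tasks, the listing can be reduced to name, limit, and enabled flag. This is only a rough Python sketch that parses the text output shown above; it assumes the output format is exactly as pasted, and the helper name `parse_task_listing` is hypothetical.

```python
import re

# Matches one line of the listFileContentIndexingTasks() output quoted above.
# Assumption: field order and spacing are the same on other deployments.
TASK_LINE = re.compile(
    r"Task Name: (?P<name>[^,]+), Schedule: .*"
    r"Time Limit (?P<limit>[\d,]+) , Enabled (?P<enabled>true|false)"
)


def parse_task_listing(text):
    """Return a list of (task name, time limit, enabled) tuples."""
    tasks = []
    for line in text.splitlines():
        match = TASK_LINE.search(line)
        if match:
            # The listing prints "1,800", so drop the thousands separator.
            limit = int(match.group("limit").replace(",", ""))
            tasks.append((match.group("name"), limit, match.group("enabled") == "true"))
    return tasks


listing = """\
File Content Indexing Task Name: bld-f1-file-indexing-task, Schedule: 0 11,31,51 7-20 ? * MON,TUE,WED,THU,FRI, Start-By Schedule: 0 13,33,54 7-20 ? * MON,TUE,WED,THU,FRI, Services all_configured , Time Limit 300 , Enabled true
File Content Indexing Task Name: bld-f2-file-indexing-task, Schedule: 0 11,41 0,2,3,4,5,6,21,22,23 ? * MON,TUE,WED,THU,FRI, Start-By Schedule: 0 15,45 0,2,3,4,5,6,21,22,23 ? * MON,TUE,WED,THU,FRI, Services all_configured , Time Limit 900 , Enabled true
File Content Indexing Task Name: bld-f3-file-indexing-task, Schedule: 0 15 0,2-23 ? * SAT,SUN, Start-By Schedule: 0 20 0,2-23 ? * SAT,SUN, Services all_configured , Time Limit 1,800 , Enabled true
"""

tasks = parse_task_listing(listing)
for name, limit, enabled in tasks:
    print(name, limit, enabled)
```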

Just implemented this, let's see if it catches up tonight and over the weekend, thanks!

Hi Wannes,

In one of my cases, the customer had a lot of documents to index, some of them with a large file size. The index could get stuck at some point and stop processing further, and there could also be high CPU usage.

They fixed the issue by changing pageSize to a smaller number in search-config.xml, as follows:

From

<crawlerSettings persistenceLocation="${CRAWLER_PAGE_PERSISTENCE_DIR}"
pageSize="500" maxCrawlerThreads="3" />

To

<crawlerSettings persistenceLocation="${CRAWLER_PAGE_PERSISTENCE_DIR}"
pageSize="100" maxCrawlerThreads="3" />

Hope my suggestion can help you a little bit.

Thanks

Rock
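After changing search-config.xml, it can be worth checking that the file still parses and that the new value is the one in place. Below is a small Python sketch using the standard library; it works on the fragment quoted above, and it assumes that in the full file the crawlerSettings element may be nested and may carry an XML namespace. The function name `crawler_settings` is made up for this example.

```python
import xml.etree.ElementTree as ET


def crawler_settings(xml_text):
    """Return pageSize and maxCrawlerThreads from the first crawlerSettings element."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # ElementTree writes namespaced tags as "{uri}name"; compare the local name only.
        if element.tag.rsplit("}", 1)[-1] == "crawlerSettings":
            return {
                "pageSize": int(element.get("pageSize")),
                "maxCrawlerThreads": int(element.get("maxCrawlerThreads")),
            }
    return None


# The "To" fragment from the post above.
fragment = """<crawlerSettings persistenceLocation="${CRAWLER_PAGE_PERSISTENCE_DIR}"
pageSize="100" maxCrawlerThreads="3" />"""

settings = crawler_settings(fragment)
print(settings)
```

This only reads the values. The edit itself should still go through the normal procedure for changing search-config.xml in your environment.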