Domino hangs on system databases being locked

Hi

The problem is that administration processes such as compacting databases produce deadlocks. For example, compact is scheduled in program documents to compact databases. When compact starts, the server hangs because other processes want to write to log.nsf and are blocked by the lock compact holds. The server responds neither to client requests nor to console commands.

Killing Domino and restarting it from scratch solves the problem until the program document starts compact again :frowning:

12/27/2010 09:10:24 PM Compacting log.nsf (xxxxLT01’s Log), -S 5 -B log.nsf

Clearing DBIID F1825326 for DB D:\Lotus\Domino\Data\log.nsf

LkMgr BEGIN Long Held Lock Dump ------------------

Lock(Mode=X * LockID(DB DB=D:\Lotus\Domino\Data\log.nsf)) Waiters countNonIntentLocks = 2 countIntentLocks = 0, queuLength = 45

Req(Status=Granted Mode=X Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [122C:0004-0A34])

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:169

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:01AF-08B4] Delay=2min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0125-0C78] Delay=2min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0291-0CA8] Delay=2min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0460-0C98] Delay=2min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:04BD-0B34] Delay=2min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:01CE-0C5C] Delay=2min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0630-08C8] Delay=2min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:026E-0CB0] Delay=2min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0194-0308] Delay=2min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0124-08FC] Delay=2min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:04DB-0C88] Delay=2min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:011C-0B04] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:007C-0DA0] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0077-0AE0] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0660-0C80] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:016D-0C44] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0187-0C04] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:055E-0C64] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0287-0C20] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:04A3-098C] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0687-0894] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0263-0C2C] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0F80:0002-12EC] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0476-0C40] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:027C-0904] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0225-0C7C] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:04DF-08C0] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:04EE-08BC] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:01B9-0A14] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:00E1-0C4C] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0130-0C14] Delay=1min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0635-0B98] Delay=0min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0676-08B8] Delay=0min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0566-0768] Delay=0min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:00CA-0A08] Delay=0min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:068A-0698] Delay=0min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:01C5-0CA0] Delay=0min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0143-0C74] Delay=0min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:064A-0B40] Delay=0min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:00F4-099C] Delay=0min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:054A-0B2C] Delay=0min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:046E-0C94] Delay=0min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:0542-0C18] Delay=0min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:04A0-0CAC] Delay=0min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:01CB-06E8] Delay=0min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Granted Mode=S Class=Manual Nest=1 Cnt=1

   Tran=0 Func=N/A  [122C:0004-0A34])

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

index_buf_c:2008

index_dbslot_c:733

index_ehashr6_c:5728

index_ehashr6_c:5658

dbunid_c:1783

dbfixup_c:653

LkMgr END Long Held Lock Dump ------------------

Lock(Mode=X * LockID(DB DB=D:\Lotus\Domino\Data\log.nsf)) Waiters countNonIntentLocks = 2 countIntentLocks = 0, queuLength = 55

Lock(Mode=X * LockID(DB DB=D:\Lotus\Domino\Data\log.nsf)) Waiters countNonIntentLocks = 2 countIntentLocks = 0, queuLength = 65

Lock(Mode=X * LockID(DB DB=D:\Lotus\Domino\Data\log.nsf)) Waiters countNonIntentLocks = 2 countIntentLocks = 0, queuLength = 74

Thus queuLength keeps growing, and the server responds neither to clients nor to console commands.
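To confirm the growth from a saved console log rather than by eye, something like the following could help. It is only a sketch: the regular expression is derived from the "Lock(Mode=... queuLength = N" lines shown above, and the function names queue_lengths and is_growing are made up for illustration. Other Domino releases may format these lines differently.

```python
import re
from collections import defaultdict

# Matches lock summary lines like the ones pasted above, e.g.
#   Lock(Mode=X * LockID(DB DB=D:\...\log.nsf)) Waiters ... queuLength = 45
# The pattern is an assumption based on those samples only.
LOCK_RE = re.compile(
    r"Lock\(Mode=(\S+)\s*\*\s*LockID\(DB DB=(.+?)\)\).*?queuLength\s*=\s*(\d+)"
)

def queue_lengths(lines):
    """Return {database path: [queuLength values in order of appearance]}."""
    result = defaultdict(list)
    for line in lines:
        m = LOCK_RE.search(line)
        if m:
            result[m.group(2)].append(int(m.group(3)))
    return dict(result)

def is_growing(values):
    """True if there are at least two samples and each exceeds the previous one."""
    return len(values) >= 2 and all(b > a for a, b in zip(values, values[1:]))

if __name__ == "__main__":
    # The four summary lines from the post above.
    sample = [
        r"Lock(Mode=X * LockID(DB DB=D:\Lotus\Domino\Data\log.nsf)) Waiters countNonIntentLocks = 2 countIntentLocks = 0, queuLength = 45",
        r"Lock(Mode=X * LockID(DB DB=D:\Lotus\Domino\Data\log.nsf)) Waiters countNonIntentLocks = 2 countIntentLocks = 0, queuLength = 55",
        r"Lock(Mode=X * LockID(DB DB=D:\Lotus\Domino\Data\log.nsf)) Waiters countNonIntentLocks = 2 countIntentLocks = 0, queuLength = 65",
        r"Lock(Mode=X * LockID(DB DB=D:\Lotus\Domino\Data\log.nsf)) Waiters countNonIntentLocks = 2 countIntentLocks = 0, queuLength = 74",
    ]
    for db, values in queue_lengths(sample).items():
        print(db, values, "growing" if is_growing(values) else "not growing")
```

Feeding it the whole console log (for example, the lines of a saved console.log file) would give one list per locked database, which makes it easier to see which databases are piling up waiters.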

Other databases, such as admin4.nsf, are also affected:

LkMgr BEGIN Long Held Lock Dump ------------------

Lock(Mode=X * LockID(DB DB=D:\Lotus\Domino\Data\admin4.nsf)) Waiters countNonIntentLocks = 2 countIntentLocks = 0, queuLength = 3

Req(Status=Granted Mode=X Class=Manual Nest=0 Cnt=2

   Tran=0 Func=N/A  [0D8C:0004-0E60])

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:169

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0D14:0002-0000] Delay=2min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0FBC:0002-0FC0] Delay=2min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

nsfsem1_c:1139

Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1

   Tran=0 Func=N/A  [0BB0:009A-0000] Delay=2min)

rm_lkmgr_cpp:2029

rm_lkmgr_cpp:1281

nsfsem1_c:533

inplace_c:121

srstart_c:214
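For long dumps like the ones above, it can be easier to let a script pick out which request holds the lock and how many are waiting. The following is a rough sketch, assuming the layout seen in these dumps: a "Req(Status=... Mode=..." line followed by a "Tran=..." line that carries the bracketed ID and, for waiters, a "Delay=Nmin" field. The function name parse_requests is hypothetical, and the bracketed ID is assumed to identify the process and thread.

```python
import re

# Patterns inferred from the dumps pasted in this thread (an assumption,
# not a documented format).
REQ_RE = re.compile(r"Req\(Status=(\w+)\s+Mode=(\w+)")
ID_RE = re.compile(r"\[([0-9A-Fa-f]+:[0-9A-Fa-f]+-[0-9A-Fa-f]+)\]")
DELAY_RE = re.compile(r"Delay=(\d+)min")

def parse_requests(lines):
    """Return one dict per lock request: status, mode, thread, delay_min."""
    requests = []
    pending = None
    for line in lines:
        m = REQ_RE.search(line)
        if m:
            # Start of a request; the ID follows on the next "Tran=" line.
            pending = {"status": m.group(1), "mode": m.group(2)}
            continue
        if pending is not None and "Tran=" in line:
            ids = ID_RE.findall(line)
            delay = DELAY_RE.search(line)
            # Take the last bracketed ID on the line, since newer logs
            # also prefix each line with the ID of the logging thread.
            pending["thread"] = ids[-1] if ids else None
            pending["delay_min"] = int(delay.group(1)) if delay else None
            requests.append(pending)
            pending = None
    return requests

if __name__ == "__main__":
    # Request lines from the admin4.nsf dump above (stack lines omitted).
    sample = [
        "Req(Status=Granted Mode=X Class=Manual Nest=0 Cnt=2",
        "   Tran=0 Func=N/A  [0D8C:0004-0E60])",
        "Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1",
        "   Tran=0 Func=N/A  [0D14:0002-0000] Delay=2min)",
        "Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=1",
        "   Tran=0 Func=N/A  [0FBC:0002-0FC0] Delay=2min)",
    ]
    reqs = parse_requests(sample)
    holders = [r["thread"] for r in reqs if r["status"] == "Granted"]
    waiters = [r for r in reqs if r["status"] == "Waiting"]
    print("holders:", holders, "waiting:", len(waiters))
```

In the log.nsf dump above, both granted requests (Mode=X and the nested Mode=S) carry the same ID, 122C:0004-0A34, while almost all waiters belong to 0BB0, so a summary like this points straight at the thread to look up in an NSD.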

It seems that IBM has several things to fix in the 8.5.2 release…

8.5.1 FP4 had the same problems, even mail databases were affected.

Regards

Ramunas

Subject: take a look at this technote

Your log details seem similar to the bug mentioned in this technote: Domino server hangs due to long held lock

http://www-01.ibm.com/support/docview.wss?uid=swg21449337

The bug should be fixed in 8.5.2 FP1, which is available.

good luck !

Giannandrea

Subject: 8.5.2 FP1 has the same errors

I have applied FP1 to all servers. All of them have the same problem when compact is running.

Unfortunately, compact is not the only process that provokes this error. I have had the issue when neither compact, nor DAOS resync, nor any other admin task was running, and “LkMgr BEGIN Long Held Lock Dump” still appeared. As always, the server died after some hours, only printing those messages and not responding.

Subject: 8.5.2 FP2 has the same errors

LkMgr BEGIN Long Held Lock Dump ------------------

Lock(Mode=SIX* LockID(DB DB=D:\Lotus\Domino\Data\daoscat.nsf)) Waiters countNonIntentLocks = 1 countIntentLocks = 2, queuLength = 27

Req(Status=Granted Mode=SIX Class=Manual Nest=1 Cnt=1

   Tran=161199388 Func=N/A nsfsem1.c:1314 [1280:0027-109C])

Subject: I’m seeing this too in 8.5.2fp2

The symptom is that the server ‘stalls’ for about two minutes while this dump occurs, then it resumes accepting requests as if it never happened.

We have a PMR open, but IBM can’t seem to solve it. Any ideas?

[123C:0011-1284] 08/31/2011 12:47:45 PM Router: Transferred 1 messages to SERVERNAME.COM (host SERVERNAME.COM) via SMTP

[0BEC:0129-0E1C] LkMgr BEGIN Long Held Lock Dump ------------------

[0BEC:0129-0E1C] Lock(Mode=S * LockID(DB DB=F:\Lotus\Domino\DATA\mail.box)) Waiters countNonIntentLocks = 1 countIntentLocks = 0, queuLength = 2

[0BEC:0129-0E1C] Req(Status=Granted Mode=S Class=Manual Nest=0 Cnt=1

  Tran=0 Func=N/A dbbitmap.c:1215 [0BEC:0119-0D84]) 

[0BEC:0129-0E1C] rm_lkmgr_cpp:2043

rm_lkmgr_cpp:1293

nsfsem1_c:533

nsfsem1_c:1139

dbbitmap_c:1202

[0BEC:0129-0E1C] Req(Status=Waiting Mode=SIX Class=Manual Nest=0 Cnt=0

  Tran=0 Func=N/A ntupwrap.c:843 [1278:0045-16F4] Delay=0min) 

[0BEC:0129-0E1C] rm_lkmgr_cpp:2043

rm_lkmgr_cpp:1293

nsfsem1_c:169

nsfsem1_c:1018

ntupwrap_c:835

[0BEC:0129-0E1C] Req(Status=Waiting Mode=S Class=Manual Nest=0 Cnt=0

  Tran=0 Func=N/A nsfsem4.c:532 [1278:0022-1624] Delay=0min) 

[0BEC:0129-0E1C] rm_lkmgr_cpp:2043

rm_lkmgr_cpp:1293

nsfsem1_c:533

nsfsem1_c:1139

[0BEC:0129-0E1C] LkMgr END Long Held Lock Dump ------------------

[123C:0028-1324] Deferring DbClose of F:\Lotus\Domino\DATA\mail.box until transaction end, err=0 [123C:0028-1324]

[0BEC:10ED-12DC] 08/31/2011 12:49:00 PM Failing over from SERVER/DOMAIN!!catalog.nsf, directing open to CLUSTERPARTNER/DOMAIN

Subject: I’m seeing this too on 8.5.3FP3

I am also facing the same issue on Domino 8.5.3 FP3 on Linux for System z.

Subject: Same issue … got rid of it after creating a new log.nsf

Not a solution, but the server runs after creating a new log.nsf.

Subject: LkMgr in 9.0.1 FP1

I have the same problems in 9.0.1 FP1; other databases are affected too, mail.box included. I have added some memory and CPU. It's better, but not a good solution. I can fix some databases with load compact -D <database.nsf> and load updall -C <database.nsf>, but it takes a very long time.

Subject: Happening in 7.0.2 as well

Just this week we are noticing the same thing. Compact job starts at 3 am and compacts the log.nsf file. Long held lock dumps occur for 4 hours with queue lengths reaching > 40. By 7 am our users start to come in and say the system is slow and unresponsive.

Restarting always helps. We had the issue on Monday, then again on Thursday. Both times it was compacting the log.nsf file. On the days in between it never tried to compact the log.nsf file.

As a countermeasure, today we shut down Domino, renamed the log.nsf file, added Log_DisableTXNLogging=1 to the notes.ini file, and started Domino. Keeping fingers crossed.

http://www-10.lotus.com/ldd/dominowiki.nsf/dx/log_disabletxnlogging
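If the same notes.ini change has to go onto several servers, a small helper could apply it consistently. This is only a sketch under a few assumptions: the server is stopped while the file is edited, notes.ini is a plain list of Name=value lines, and setting names are matched case-insensitively. The function name set_ini_value and the sample file contents are made up for illustration.

```python
def set_ini_value(text, key, value):
    """Return notes.ini text with key set to value.

    An existing entry (matched case-insensitively, an assumption) is
    replaced in place; otherwise the entry is appended at the end.
    """
    prefix = key.lower() + "="
    out = []
    found = False
    for line in text.splitlines():
        if line.strip().lower().startswith(prefix):
            if not found:
                out.append(f"{key}={value}")
                found = True
            # Any further duplicate entries for the same key are dropped.
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"

if __name__ == "__main__":
    # Hypothetical minimal notes.ini content, for demonstration only.
    original = "[Notes]\nDirectory=D:\\Lotus\\Domino\\Data\n"
    print(set_ini_value(original, "Log_DisableTXNLogging", "1"))
```

Keeping a backup copy of notes.ini before writing the result back is a sensible precaution.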

Good luck.