Server Crash

We have experienced a number of server crashes over the last week. When examining the NSD logs, we see a number of errors such as the following:

Invalid stack frame detected: Invalid frame pointer (BP): 0

or

Invalid stack frame detected: Unable to read process memory for frame

But for all the highlighted errors we are seeing the same message, shown below:

High number of handles (15309) for shared memory block 0x824b - BLK_OPENED_NOTE

Searching the web turns up a number of articles covering the Notes.ini setting SERVER_MAX_NOTEOPEN_MEMORY_MB as a way to avoid server crashes.

The only problem is that none of them recommend a value for this setting. For example, we have 4 GB installed in our server but don’t have a clue what the setting should be.

I would really appreciate it if anybody could help with this.

Subject: Server Crash

Hi,

Post the FATAL THREAD section of the NSD files, along with the few lines under it.

For example:

############################################################

FATAL THREAD 1/2 [ domdsm:107a8:67076]

FP=0x0012a34c, PC=0x60069a86, SP=0x0012a2f0, stksize=92

EAX=0x01a3088c, EBX=0x01a48650, ECX=0x0000bffe, EDX=0x01a325e6

ESI=0x00004000, EDI=0x0032ff78, CS=0x0000001b, SS=0x00000023

DS=0x00000023, ES=0x00000023, FS=0x0000003b, GS=0x00000000 Flags=0x00010206

Exception code: c0000005 (ACCESS_VIOLATION)

############################################################

@[ 1] 0x60069a86 nnotes._fdDelete@4+118 (4000,12a368,608770ef,4000)

@[ 2] 0x60069a0c nnotes._OSFileClose@4+12 (4000,12adc0,12a380,60872ea7)

@[ 3] 0x608770ef nnotes._sqloclose+15 (4000,12b59c,5010,0)

@[ 4] 0x60872ea7 nnotes._sqlpgclf+71 (12adc0,1aaa2cb,70,1aaa2f3)

@[ 5] 0x60885458 nnotes._sqlpgfdl+616 (12b9a4,1aaa2cb,41,12be44)

@[ 6] 0x60876a6f nnotes._sqlpgcdl+927 (12bf44,12b9a4,0,12be44).
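
If you have several NSD files to go through, pulling out the FATAL THREAD sections by hand gets tedious. Below is a minimal Python sketch that does it, assuming the NSD layout matches the excerpts posted in this thread (a "FATAL THREAD" header, an "Exception code:" line, then "@[ n]" stack frames). The function name fatal_threads and the dictionary keys are made up for illustration; they are not part of NSD or any Lotus tool, and other NSD versions may need different patterns.

```python
import re

# Header line, e.g. "FATAL THREAD 1/2 [ domdsm:107a8:67076]"
FATAL_HEADER = re.compile(r"FATAL THREAD\s+(\d+)/(\d+)\s+\[\s*([^:\]]+):")
# Stack frame line, e.g. "@[ 1] 0x60069a86 nnotes._fdDelete@4+118 (4000,...)"
FRAME = re.compile(r"@\[\s*(\d+)\]\s+0x[0-9a-fA-F]+\s+(\S+)")


def fatal_threads(nsd_text):
    """Return one dict per FATAL THREAD block: process, exception, frames."""
    threads = []
    current = None
    for raw in nsd_text.splitlines():
        line = raw.strip()
        header = FATAL_HEADER.search(line)
        if header:
            current = {
                "process": header.group(3).strip(),
                "exception": None,
                "frames": [],
            }
            threads.append(current)
            continue
        if current is None:
            continue
        if line.startswith("Exception code:"):
            current["exception"] = line[len("Exception code:"):].strip()
            continue
        frame = FRAME.match(line)
        if frame:
            # Drop the "+offset" suffix so frames compare by function name.
            current["frames"].append(frame.group(2).rsplit("+", 1)[0])
        elif line and current["frames"]:
            # Assumption: the first non-frame line after the stack ends the block.
            current = None
    return threads


# Sample text copied from the example stack above.
sample = """
FATAL THREAD 1/2 [ domdsm:107a8:67076]
FP=0x0012a34c, PC=0x60069a86, SP=0x0012a2f0, stksize=92
Exception code: c0000005 (ACCESS_VIOLATION)
############################################################
@[ 1] 0x60069a86 nnotes._fdDelete@4+118 (4000,12a368,608770ef,4000)
@[ 2] 0x60069a0c nnotes._OSFileClose@4+12 (4000,12adc0,12a380,60872ea7)
"""

for thread in fatal_threads(sample):
    print(thread["process"], thread["exception"], thread["frames"])
```

To use it on a real file, read the NSD into a string first and pass that string to fatal_threads.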

Frequently asked question - How to analyze Notes/Domino NSDs? (Jean-Yves Riverin)

Or call Lotus support

JYR

Subject: RE: Server Crash

Hi,

Below are a few examples of the Fatal Thread log:

1st one:

############################################################

FATAL THREAD 33/57 [ nhttp: 0500: 0f00]

FP=0x1135fdc4, PC=0x601913e7, SP=0x1135f24c

stkbase=11360000, total stksize=262144, used stksize=3508

EAX=0x010f088c, EBX=0x00000000, ECX=0x00bc0000, EDX=0x00bc0000

ESI=0x1135f958, EDI=0x00000000, CS=0x0000001b, SS=0x00000023

DS=0x00000023, ES=0x00000023, FS=0x0000003b, GS=0x00000000 Flags=0x00010202

Exception code: c0000005 (ACCESS_VIOLATION)

############################################################

@[ 1] 0x601913e7 nnotes._Panic@4+631 (60a30013)

@[ 2] 0x6019111c nnotes._Halt@4+28 (113501b1)

@[ 3] 0x6011396d nnotes._AccessAllProtected@0+77 ()

@[ 4] 0x6004e19f nnotes._AccessAll@8+47 (1,1)

@[ 5] 0x6004f095 nnotes._ProcessGlobalEvent@4+21 (1002cfc)

@[ 6] 0x6004efa1 nnotes._OSProcessShouldQuit@0+49 ()

@[ 7] 0x600d531d nnotes._OSWaitEvent@8+29 (f2edcf0,bb8)

@[ 8] 0x1000f087 nhttpstack.HTEvent::Wait+23 (bb8,3)

@[ 9] 0x1002b8e9 nhttpstack.HTWorkerThread::ThreadMain+105 (f2edcd4,0)

@[10] 0x60114e64 nnotes._ThreadWrapper@4+212 (0)

@[11] 0x77e6608b KERNEL32.GetModuleFileNameA+235

Invalid stack frame detected: Unable to read process memory for frame

2nd one:

############################################################

FATAL THREAD 59/88 [ nserver: 03d8: 0f0c]

FP=0x1579fd50, PC=0x601913e7, SP=0x1579f1d8

stkbase=157a0000, total stksize=262144, used stksize=3624

EAX=0x00d5088c, EBX=0x00000000, ECX=0x00820000, EDX=0x00820000

ESI=0x1579f8e4, EDI=0x00000000, CS=0x0000001b, SS=0x00000023

DS=0x00000023, ES=0x00000023, FS=0x0000003b, GS=0x00000000 Flags=0x00010202

Exception code: c0000005 (ACCESS_VIOLATION)

############################################################

@[ 1] 0x601913e7 nnotes._Panic@4+631 (60a30013)

@[ 2] 0x6019111c nnotes._Halt@4+28 (157901b1)

@[ 3] 0x6011396d nnotes._AccessAllProtected@0+77 ()

@[ 4] 0x6004e19f nnotes._AccessAll@8+47 (1,1)

@[ 5] 0x6004f095 nnotes._ProcessGlobalEvent@4+21 (c62cfc)

@[ 6] 0x6004efa1 nnotes._OSProcessShouldQuit@0+49 ()

@[ 7] 0x100016cb nserverl._Scheduler@4+763 (0)

@[ 8] 0x60114e64 nnotes._ThreadWrapper@4+212 (0)

@[ 9] 0x77e6608b KERNEL32.GetModuleFileNameA+235

Invalid stack frame detected: Unable to read process memory for frame

3rd one

############################################################

FATAL THREAD 3/11 [ namgr: 0f3c: 14c8]

FP=0x61e6fe38, PC=0x601913e7, SP=0x61e6f2c0

stkbase=61e70000, total stksize=262144, used stksize=3392

EAX=0x010e088c, EBX=0x00000000, ECX=0x008c0000, EDX=0x008c0000

ESI=0x61e6f9cc, EDI=0x00000000, CS=0x0000001b, SS=0x00000023

DS=0x00000023, ES=0x00000023, FS=0x0000003b, GS=0x00000000 Flags=0x00010202

Exception code: c0000005 (ACCESS_VIOLATION)

############################################################

@[ 1] 0x601913e7 nnotes._Panic@4+631 (60a30013)

@[ 2] 0x6019111c nnotes._Halt@4+28 (61e601b1)

@[ 3] 0x6011396d nnotes._AccessAllProtected@0+77 ()

@[ 4] 0x6004e19f nnotes._AccessAll@8+47 (1,1)

@[ 5] 0x6004f095 nnotes._ProcessGlobalEvent@4+21 (d02cfc)

@[ 6] 0x6004efa1 nnotes._OSProcessShouldQuit@0+49 ()

@[ 7] 0x77e6608b KERNEL32.GetModuleFileNameA+235 ()

Invalid stack frame detected: Invalid frame pointer (BP): 0

regards

Kevin

Subject: RE: Server Crash

First, JYR’s NSD FAQ is darn good.

Second, I have linked to an NSD analyzer tool in this post:

http://www.ns-tech.com/blog/geldred.nsf/d6plinks/GELD-77E2B7

Finally, as we all recommend (and JYR did as well, in his reply), call Lotus Support if you can’t get your answers either in this thread or in the Support pages (link is on the right).

Gregg

Subject: RE: Server Crash

JYR’s NSD FAQ is darn good.

As many have said :-), you should always call IBM even if you have a (not perfect) solution. That way a PMR will be opened and IBM will be aware of the problem, and can hopefully provide a definitive solution.

JYR

Subject: RE: Server Crash

Thanks for your help. Unfortunately our company has cancelled its IBM support contract, so I can’t call them.

I have already loaded the NSD files into the NSD analyser tool but cannot yet see which database is causing the issue.

Subject: RE: Server Crash

You should have a section called Open database listing the names of the databases that were opened by the process at the moment of your crash.

JYR

Subject: RE: Server Crash

Hi JYR,

I have already checked the open database by process section and unfortunately the only consistent database is names.nsf.

Kevin

Subject: RE: Server Crash

If you want, send me your NSD and I could give it a try.

jriverin

at

g mail dot com

JYR

Subject: RE: Server Crash

JYR, that is extremely nice of you. Good luck with your analysis.

Gregg

Subject: RE: Server Crash

Bah, it costs nothing to try

I’m doing overtime, it’s gonna change my mind

JYR

Subject: RE: Server Crash

Domino crash or high shared memory usage may occur when dealing with large document

Solution: This issue has been reported to Quality Engineering as SPR# RGET6MS6WB. Improvements have been made in Domino 8.0 which will help reduce the observation of this issue.

Workaround: Attempt to identify the document in question and modify it as applicable or remove it from being routed.

Normally, you should be able, with the NSD file, to identify the database and the document.

JYR

Subject: Some observations

The fatal stacks you posted are almost identical, though they come from 3 different processes: server, http and amgr. The critical call, just before the halt/panic, is AccessAll/AccessAllProtected. The purpose of this function is to map all Domino shared memory into the process. For example, some process allocates some shared memory. It could be a large or small allocation, but there is not enough free memory in the Domino pool, so Domino has to go get another hunk of shared memory from the operating system (note that a large allocation is more likely to cause this to happen than a small allocation). Now, it is the responsibility of all server tasks to check at least once every 5 seconds to see if it is time to quit (by calling OSProcessShouldQuit). This is also a convenient time to see if any process has created a new shared memory hunk and, if so, to map it into the process.
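
To check the "almost identical" observation mechanically rather than by eye, one rough approach is to compare the stacks frame by frame from the top. The sketch below is plain Python; the function name common_top_frames is made up for illustration, and the frame lists are typed in by hand from the three stacks posted earlier in this thread, with the "+offset" suffixes dropped.

```python
def common_top_frames(stacks):
    """Return the leading frames (innermost first) shared by every stack."""
    shared = []
    # zip stops at the shortest stack, so no index checks are needed.
    for frames in zip(*stacks):
        if len(set(frames)) != 1:
            break
        shared.append(frames[0])
    return shared


# The six frames that open each of the three posted stacks.
panic_path = [
    "nnotes._Panic@4",
    "nnotes._Halt@4",
    "nnotes._AccessAllProtected@0",
    "nnotes._AccessAll@8",
    "nnotes._ProcessGlobalEvent@4",
    "nnotes._OSProcessShouldQuit@0",
]
nhttp = panic_path + ["nnotes._OSWaitEvent@8", "nhttpstack.HTEvent::Wait"]
nserver = panic_path + ["nserverl._Scheduler@4", "nnotes._ThreadWrapper@4"]
namgr = panic_path + ["KERNEL32.GetModuleFileNameA"]

for name in common_top_frames([nhttp, nserver, namgr]):
    print(name)
```

All three stacks share the same path from OSProcessShouldQuit up to Panic and only differ below it, which is what you would expect if each process hit the same failure during its periodic quit check.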

Now as you know, 32-bit Windows operating systems allow 2 GB of address space for user data in a process. In order to map in the new shared memory hunk, Windows has to find an available slice of that 2 GB space at least as big as the hunk. If it can’t do so then the map call fails and the process, by design, panics. The fact that 3 distinct processes panicked in this way tells you that this is exactly what happened, and that the allocator was some process other than those 3.
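
The point worth stressing is that the map call needs one contiguous free range, so total free address space is not the number that matters. The toy Python model below illustrates that. It is not how Windows or Domino manage memory internally, and the region layout and the 256 MB hunk size are invented numbers chosen only to show the effect.

```python
def address_space_summary(space_size, mapped):
    """Return (total_free, largest_gap) for a list of (start, length) regions."""
    total_free = 0
    largest_gap = 0
    cursor = 0  # end of the highest region seen so far
    for start, length in sorted(mapped):
        if start > cursor:
            gap = start - cursor
            total_free += gap
            largest_gap = max(largest_gap, gap)
        cursor = max(cursor, start + length)
    if space_size > cursor:
        gap = space_size - cursor
        total_free += gap
        largest_gap = max(largest_gap, gap)
    return total_free, largest_gap


# Hypothetical layout of a 2 GB user address space, in MB: (start, length).
SPACE_MB = 2048
regions_mb = [(0, 300), (500, 300), (1000, 400), (1600, 300)]

free_mb, gap_mb = address_space_summary(SPACE_MB, regions_mb)
hunk_mb = 256  # hypothetical size of the new shared memory hunk
print("total free MB:", free_mb)
print("largest contiguous gap MB:", gap_mb)
print("hunk can be mapped:", hunk_mb <= gap_mb)
```

In this made-up layout there is far more free space in total than the hunk needs, yet no single gap is big enough, so the mapping would fail.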

So now what you’ve got to do is examine the NSD log to find out where all your memory is going, so that you can see why there isn’t enough left to do the mapping. The other possibility is that some application caused an enormous allocation which no process is ever going to succeed in mapping. It is the job of the analyst (in this case you, since you can’t call support) to figure it out. You seem to be off to a good start by noticing the large number of BLK_OPENED_NOTE handles. The question is, though, how much memory do they consume? The top 10 section may tell you. You can also look for a line like this:

   P-g-- 0x824b count= 95, size=  3450462, BLK_OPENED_NOTE
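
If the NSD has many lines of this shape, a short script can rank them by size. This is a minimal Python sketch that assumes every line of interest follows exactly the format of the sample line above. The function name top_blocks is made up, and the second sample line (BLK_EXAMPLE_ONLY) is invented purely so there is something to sort; it is not a real Domino block type.

```python
import re

# Matches e.g. "   P-g-- 0x824b count= 95, size=  3450462, BLK_OPENED_NOTE"
BLOCK_LINE = re.compile(
    r"(\S+)\s+(0x[0-9a-fA-F]+)\s+count=\s*(\d+),\s*size=\s*(\d+),\s*(\S+)"
)


def top_blocks(nsd_text, limit=10):
    """Return (name, block_id, count, size_bytes) tuples, largest size first."""
    found = []
    for line in nsd_text.splitlines():
        match = BLOCK_LINE.search(line)
        if match:
            _flags, block_id, count, size, name = match.groups()
            found.append((name, block_id, int(count), int(size)))
    found.sort(key=lambda item: item[3], reverse=True)
    return found[:limit]


sample = """
   P-g-- 0x824b count= 95, size=  3450462, BLK_OPENED_NOTE
   P-g-- 0x0001 count= 12, size=   524288, BLK_EXAMPLE_ONLY
"""

for name, block_id, count, size in top_blocks(sample):
    print(f"{name:<20} {block_id}  count={count:<6} {size / (1024 * 1024):.2f} MB")
```

Comparing the output across NSDs taken at different times may show which block type is growing, which is usually more telling than a single snapshot.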

Once you determine and validate the cause of high memory usage you can then take steps to correct the problem. Note that the number of possible causes likely approaches infinity. The SPR mentioned, RGET6MS6WB, is only one of the possible causes. The memcheck section of the NSD log contains lots of information about what is in your memory space and should help you track down the cause.

Two additional blurbs. First, if you run under 64-bit Windows 2003 then you will automatically get (nearly) the full 4 GB user address space if you are running Domino 7.01 or later. This will help a lot, but if you have a memory leak it will only delay the inevitable crash.

Second, ignore the messages at the end of the stacks that read “Invalid stack frame detected: Invalid frame pointer (BP): 0” or “Invalid stack frame detected: Unable to read process memory for frame”. Windows 32 doesn’t provide a deterministic method for indicating you’ve reached the top of the stack, and these messages were intended to indicate why NSD decided to stop. They ended up causing more confusion than being helpful, so they were removed in later releases.

Hope this helps