Hello, I am just wondering: if a server task or extension manager causes a Domino server to crash, can IBM technicians gather enough information to troubleshoot the problem?
I am not saying that a problem can or cannot exist within a third party product. But when Domino decides it needs to crash because it ran out of memory, handles, or disk space, or because a program did something wrong, etc., ultimately Domino decides that enough is enough and then calls the NSD program. Can the information about why the Domino server decided to crash be obtained? Wouldn’t this information be useful to the third party vendor?
When a fatal_error occurs in a third party product, we are frequently told to contact that vendor. But if we knew more details as to why Domino decided to crash, it could point exactly, or at least more closely, to where in the third party product the problem exists.
There should be something beneficial from the Domino side that can help speed up the troubleshooting efforts of the third party vendor. I am just wondering what other people’s thoughts on this are.
An example of where this information would be very useful is something like this: a particular task has a memory leak and does not release handles properly. Let’s say it uses all of the handles, and then another task comes along and says it needs a handle. Domino goes past its limit, and when the crash occurs it looks like the second task caused the crash, even though the first task actually caused the problem.
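To make that scenario concrete, here is a minimal sketch in Python of a shared handle pool with a fixed limit. It is only a toy model: HandlePool, leaky_task and innocent_task are made-up names, and none of this is the Domino API. It just shows that the task that hits the limit is not necessarily the task holding most of the handles, which is why a per-task handle count is more telling than the name of the task that failed.

```python
# Toy model of a shared, fixed-size handle pool.
# All names here are hypothetical; this is not the Domino API.
from collections import Counter


class HandleExhausted(Exception):
    """Raised when the pool has no free handles left."""


class HandlePool:
    def __init__(self, limit):
        self.limit = limit
        self.owners = {}   # handle id -> name of the task holding it
        self.next_id = 0

    def alloc(self, task):
        # The request that arrives when the pool is full is the one that fails,
        # regardless of which task used up the handles.
        if len(self.owners) >= self.limit:
            raise HandleExhausted(f"{task} asked for a handle, none left")
        handle = self.next_id
        self.next_id += 1
        self.owners[handle] = task
        return handle

    def free(self, handle):
        self.owners.pop(handle, None)

    def usage_by_task(self):
        # Count of currently open handles per owning task.
        return Counter(self.owners.values())


def simulate(limit=10):
    pool = HandlePool(limit)
    # The leaky task allocates handles and never frees them.
    for _ in range(limit):
        pool.alloc("leaky_task")
    # A well-behaved task now asks for a single handle and is the one that fails.
    try:
        pool.alloc("innocent_task")
        failing_task = None
    except HandleExhausted:
        failing_task = "innocent_task"
    top_holder = pool.usage_by_task().most_common(1)[0][0]
    return failing_task, top_holder


if __name__ == "__main__":
    failing, holder = simulate()
    print("task that hit the limit:", failing)
    print("task holding the most handles:", holder)
```

In this model the failure is reported against innocent_task, while the per-task count points at leaky_task.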
We have a problem where a product causes a crash, but this clearly only started when using 6.5.2, 6.5.3, and 6.5.3FP1. Is there anything that should be looked at from the migration standpoint, like whether a server has been upgraded from R5 to R6, databases changed, user IDs not matching the public key, etc.? Or could it simply be that something changed within the 6.5.x versions that causes the problem?
Subject: Can IBM troubleshoot any Domino Server crash?
NSD reports give you the chance of telling what causes a crash.
given your example of an extension manager:
having some experience as a programmer you will be able to tell from looking at the nsd file that extmgr was involved. there’s for sure a chance that extmgr just called a notes function that was e.g. buggy, but you’ve got the call stack, which helps a bit in deciding who’s to blame here. lotus support for sure can go deeper into such an issue, because they have the source code and the exact line where the crash occurred.
next thing: i don’t know if default nsd reports contain these values, but you can set up nsd to include a complete list of open handles and resources, so you (or some developer) should be able to tell if e.g. an extmgr doesn’t free handles to domino objects properly.
since server tasks are not running independently of each other it’s not unlikely that bad things in one can cause another to crash.
but i really think that since you have that suspicion it’s maybe a good idea to discuss this issue with the third party vendor.
or you could add in someone with experience in troubleshooting such things (aix, nsd, addins) like Daniel Nashed…
Subject: RE: Can IBM troubleshoot any Domino Server crash?
Thanks for the response.
I just don’t know. In the case I am looking at, the crashes happen all over the place. If the crashes were more consistent it might make more sense, and it would be easier to get developers to put extra debugging code in one or maybe two areas. But the crashes occur in many different areas of the code and at different times, and the fact that all cases are coming back with 6.5.2, 6.5.3, and 6.5.3FP1 is hard to ignore. I also realize that these versions may not be at fault, because something could be related to the migration. But since I have never done a migration from R5.0.12 to R6.5.2, I am not sure what things might need to be done for certain problems that may occur.
For example, I saw an Access_Violation and the task crashed because of it, and in the console we saw a message saying the public key did not match the user.id. We could simply determine where the problem is and add a check for access, but the warning in the console should also be dealt with by an administrator to get that problem resolved as well.
Subject: RE: Can IBM troubleshoot any Domino Server crash?
I would like to be able to work with someone who could look at the memory dumps, NSDs, log files, etc. and help me get to the bottom of this problem, but I am not really sure how. Our developer opened a PMR with IBM about this problem, but later I found out that she specifically only asked a couple of questions as to why something was in the NSD, to see if it might lead to why the crash occurred. I think it would be better to take a sample set of files for one specific crash and have an IBM technician use the proprietary tools necessary to give, hopefully, some positive feedback other than "the fatal_error happened in your server task and so it is your problem". Plus this will get data from two directions.