Data Directory Size

We have two servers in a cluster that keep crashing on a regular basis (maximum uptime of 10 days).

We believe the cause is load on the server, so my main question is: what would be the maximum size of the data directory?

We have some very large mail files (>10GB)

Some 2400 NSF files (mail and roaming databases)

Total data size is over 1.4TB

Transaction logging enabled

Servers have been rebuilt several times (OS/Domino Install)

The crashes happened when the servers were physical and still happen now that they are virtual

Any advice is welcome

Subject: Data Directory Size

I do not know which OS you are running, but we have seen issues when the data drive is 500+ GB in size, due to the OS Page Pool limitations. The Page Pool size has been increased significantly in the 64-bit version, which will allow for larger environments in the future.

http://www-10.lotus.com/ldd/dominowiki.nsf/dx/Knowledge_Collection__Hardware_or_Operating_System_error-Insufficient_system_resources_exist_to_complete_the_requested_service

http://www-01.ibm.com/support/docview.wss?uid=swg21093511

Given the size of your environment, one general suggestion is to ensure there is sufficient free disk space on the server (15-20%) and that there are no issues with your disks.

If the crashes continue, I strongly suggest opening a ticket with IBM to review the NSD files.
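As a quick way to check the free-space suggestion above, here is a minimal Python sketch. It only uses the standard library; the path passed in and the 15% default threshold are assumptions for illustration, so point it at your own data drive and adjust the threshold as needed.

```python
import shutil


def free_fraction(path):
    """Return the free share (0.0 to 1.0) of the volume that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def has_enough_free_space(path, minimum=0.15):
    """True when at least `minimum` of the volume is free (15% by default)."""
    return free_fraction(path) >= minimum


if __name__ == "__main__":
    data_dir = "."  # hypothetical: replace with the path of your data directory
    print(f"{free_fraction(data_dir):.1%} free, OK: {has_enough_free_space(data_dir)}")
```

This only reports free space on the volume; it does not check the health of the disk itself.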

You can also post the Fatal Thread from the NSD to see if it is a known issue.

Open the NSD and Search for “Fatal”.
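If the NSD is too large to open comfortably in an editor, the same search can be done with a few lines of Python. This is only a sketch: the file name in the usage block is hypothetical, and the match is a plain case-insensitive text search, nothing specific to the NSD format.

```python
def find_fatal_lines(lines):
    """Return (line_number, text) pairs for every line containing 'fatal', in any case."""
    return [
        (number, text.rstrip("\n"))
        for number, text in enumerate(lines, start=1)
        if "fatal" in text.lower()
    ]


if __name__ == "__main__":
    nsd_path = "nsd_example.log"  # hypothetical file name, use your own NSD file
    with open(nsd_path, errors="replace") as handle:
        for number, text in find_fatal_lines(handle):
            print(f"{number}: {text}")
```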

Here is a template I use to step through NSDs on Windows servers:

  1. The first thing that I do when analyzing an NSD is look at the Name of the Server, Date & Time, OS Version and Notes Version.

  2. Once we have verified the build, I search for “OS Process” to see what is running on the server and when the OS was last rebooted; the time the server crashed is outlined in RED.

  3. Now that we have reviewed the OS processes, we want to see which task the server crashed on, so we go to the top of the NSD and search for the word “FATAL”, which brings us down to the fatal thread. We then review the fatal thread to see if it is a known issue, and continue our search, this time using the 8 digits that follow the failing task, in the form “xxxx: xxxx”; an example would be “16e0: 16ac”.

Example:

############################################################

thread 10/175: [ nSERVER: 16e0: 16ac]

FP=0x211cf0a8, PC=0x7c82860c, SP=0x211cf038

stkbase=0x211d0000, total stksize=262144, used stksize=4040

############################################################

  4. After a couple of passes through the fatal thread's stack frames, we continue searching on “xxxx: xxxx” (in the example, “16e0: 16ac”) until we reach the MM/OS section.

  5. Once we reach the MM/OS section we can see when the server was started, and again we see the reference to the fatal stack, “xxxx: xxxx” (in the example, “16e0: 16ac”). We continue to search the NSD until we hit the Vthread mapped to the Pthread.

  6. Now that we have reached the Vthread mapped to the Pthread, we can see which databases, if any, were open. One might be xxx.nsf, showing some number of documents open (class=0001), or agents open (class=0200), or a specific form (class=0004) in this database at the time of the crash; this is the database I would focus on. I would run maintenance on this database if you continue to see crashes involving it.

  7. To review the shared memory, I go back to the top of the NSD and search for “-dpool”.
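The repeated search on “xxxx: xxxx” in the steps above can also be scripted. The sketch below is based only on the thread header shown in the example (“thread 10/175: [ nSERVER: 16e0: 16ac]”); treating the two hex values as a process ID and a thread ID is my assumption, and the function names are made up for illustration.

```python
import re

# Matches a header like: thread 10/175: [ nSERVER: 16e0: 16ac]
THREAD_HEADER = re.compile(
    r"thread\s+\d+/\d+:\s*\[\s*(?P<task>\w+):\s*"
    r"(?P<proc_id>[0-9a-fA-F]+):\s*(?P<thread_id>[0-9a-fA-F]+)\s*\]"
)


def parse_thread_header(line):
    """Return (task, proc_id, thread_id) from a thread header line, or None if no match."""
    match = THREAD_HEADER.search(line)
    if match is None:
        return None
    return match.group("task"), match.group("proc_id"), match.group("thread_id")


def find_thread_references(lines, proc_id, thread_id):
    """Return the 1-based numbers of lines that mention 'proc_id: thread_id'."""
    pattern = re.compile(
        rf"\b{re.escape(proc_id)}:\s*{re.escape(thread_id)}\b", re.IGNORECASE
    )
    return [number for number, text in enumerate(lines, start=1) if pattern.search(text)]
```

Parse the header of the fatal thread once, then pass the two IDs to `find_thread_references` to list every line worth visiting on the way to the MM/OS section and the Vthread to Pthread mapping.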