DDM Health.MemoryUtil.Value and Delete

I am using DDM to monitor Health statistics on several of our servers at Dennys. I have been using the Health.MemoryUtil.Value statistic to track memory utilization. It appears to have a direct impact on the Health.Overall.Value. If MemoryUtil.Value goes up to 97%, then the Health.Overall.Value follows.

Our main Mail server and Application server climbed to 97% Health.MemoryUtil.Value this morning and have stayed there all day. There is a corresponding 97% Health.Overall.Value (which is bad!).

The problem is that the server appears to be responding well at these times. Is Health.Overall.Value incorrect? Is the memory truly locked up and unavailable, or has the “Garbage Collector” simply not returned it to the shared memory pool?

I just ran a test on our Development server and monitored with DDM. It copied 150,000 documents to another database, and even though I used Delete to release document memory before setting the document variable to the next document, memory utilization still climbed to 97%. Why?

I just read in a thread dated Mar 2008 that Andre Guirard said that memory management is handled differently in later versions of Notes (we’re running 7.0.3) and that Delete is not important anymore.

How is memory managed now when we run scheduled LotusScript agents? Are there any newer articles that explain what is going on? Does DDM show reliable statistics? Which DDM memory statistics should I be watching?

Subject: DDM Health.MemoryUtil.Value and Delete

Hi Furman

We have the exact same experience here in our server park. Most of our servers report 97% memory. Have you found a solution to configure the monitoring to reflect the correct numbers, or have you just disabled memory monitoring? (I think I’ll do that soon.)

Regards

Claus

Subject: re: DDM Health.MemoryUtil.Value

The Health statistics are generated by the Domino Administrator. Quite a bit of documentation can be found in the administrator help under Monitoring / Server Health Monitoring (SHM). This client-side feature was introduced with N/D 6 as a separate product named IBM Tivoli Analyzer for Lotus Domino. For N/D 7 this feature was rolled into the core product. See http://publib.boulder.ibm.com/infocenter/domhelp/v8r0/index.jsp

Specific details about how the SHM memory assessment is generated can be found at the bottom of this post. The algorithm sensitivity can be adjusted via Configuration / Index Thresholds in the client database dommon.nsf. SHM algorithms have not been updated since they were first introduced. If the SHM memory reports are misleading for a particular server, the memory component can be disabled for that server in dommon.nsf under Configuration / Server Components.

DDM is a completely separate server-side feature introduced in N/D 7. The DDM memory probes and their results are easier to understand than the SHM indices. Look in the administrator help under Monitoring / Domino Domain Monitoring for details.

Memory Utilization

When Memory Utilization is included in the Health Report

This component appears if ALL of the following are true:

	1. Domino version is R5.0.2 or greater

	2. Platform Stats are Enabled 

	3. OS = Windows NT/2000

		OS/400

		Solaris (D6 only)

		AIX (D6 only)

Note: 	For Solaris version 5.8, the Memory component may always = 0 because the Scan Rate metric used in Memory analysis appears to always = 0.

Windows NT and Windows 2000

Statistics used:

Amount of Free/Available Memory 

	Platform.Memory.KBFree			R5.x

	Platform.Memory.RAM.AvailMBytes  	R6

Amount of Installed Memory

	Mem.PhysicalRAM				R5.x

	Platform.Memory.RAM.TotalMBytes	R6

Note: For Win32 platforms, the Memory Utilization component of the Server Health Monitor

is based on available physical memory. For the sake of simplicity, call the Free Memory statistic

“RAM.Available” (in MB) and the Installed Memory stat “RAM.Total” (also in MB).

For a system with RAM.Total > 2 GB, the maximum usable amount of Memory is actually about 2.1 GB,

in which case, the reported RAM.Usable is misleading. For example, a system with 8 GB RAM and 1.9 GB used

will report RAM.Usable = 8 GB - 1.9 GB = 6.1 GB, but only a small amount of that 6.1 GB (~200 MB) is really usable.

So, if the reported RAM.Total > 2.1 GB, the SHM adjusts RAM.Available as follows:

	RAM.Available = 2150 - (RAM.Total - RAM.Available)



Memory Utilization Rating = 

	0 				if RAM.Available >= 100 MB 

	100 - RAM.Available 		if RAM.Available < 100 MB 
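To make the Win32 calculation concrete, here is a minimal Python sketch of the rating above (the function name is mine, not an actual SHM API; it only illustrates the published formula):

```python
def win32_memory_rating(ram_total_mb, ram_avail_mb):
    """SHM Memory Utilization rating for Win32, per the formulas above.

    ram_total_mb -- installed RAM (Platform.Memory.RAM.TotalMBytes)
    ram_avail_mb -- free RAM (Platform.Memory.RAM.AvailMBytes)
    """
    # Above ~2.1 GB installed, Domino can only address ~2.1 GB, so SHM
    # re-expresses availability relative to that cap.
    if ram_total_mb > 2150:
        ram_avail_mb = 2150 - (ram_total_mb - ram_avail_mb)
    return 0 if ram_avail_mb >= 100 else 100 - ram_avail_mb
```

So a 4 GB server with 2.1 GB allocated (1996 MB free by the OS's count) gets an adjusted availability of 2150 - 2100 = 50 MB and a rating of 50, even though Windows still reports nearly 2 GB free. This may explain the persistent high readings described in the question.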



Server health component thresholds are the values at which a component reading is considered 

Significant (Yellow) and Critical (Red).  The Memory Utilization thresholds are defined in the Server 

Health Profile documents.  These values are initially set to platform-specific defaults, but are 

modifiable (per-platform) by the Administrator.  For the purpose of this document,  let us identify 

the Memory Utilization thresholds as YellowU and RedU.  So, given threshold Memory Utilization

values of 50 and 90, which translate to 50 MB available/usable and 10 MB available/usable respectively, we have

	0 MB Usable <= Critical < 10 MB Usable <= Warning < 50 MB Usable

Solaris

For Solaris, a more useful metric for Memory analysis may be the “Scan Rate”, which

is provided in the Rnext Domino Platform Statistics for Solaris under the name

Platform.Memory.ScanRatePagesPerSec.

The threshold values for Scan Rate are YellowS(Significant) = 200, RedS(Critical) = 400. These

values are based on the experience of running performance tests, and examining Scan Rate

values as the load on the server is increased.

In order to normalize the Scan Rate to a 0 - 100 based value that is compatible with the

threshold settings for Memory Utilization, this metric must undergo a number of adjustments:

Memory Utilization = 

	

	ScanRate * (YellowU / YellowS)	 				

				if ScanRate <= YellowS  (GREEN)



YellowU + ((RedU - YellowU) * (ScanRate - YellowS) / (RedS - YellowS))

				if YellowS < ScanRate < RedS (YELLOW)



	MIN(97, RedU + ((ScanRate - RedS) * RedU) / RedS)	

				if ScanRate >= RedS (RED)	



	examples

		Scan Rate		Memory Utilization			Condition

		0			0 * (50/200) = 0				Healthy

		100			100 * (50/200) = 25 			Healthy

		200			200 * (50/200) = 50			Warning

		300			50 + (90-50)*(300-200) / (400 - 200) = 70	Warning

		400			min(97, 90 + (400-400)*90/400) = 90	Critical

		500			min(97, 90+(500-400)*90/400) = 97	Critical
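The same piecewise normalization is used for both the Solaris Scan Rate (YellowS = 200, RedS = 400) and, in the next section, the AIX Scan Ratio (YellowS = 5, RedS = 9), so it can be sketched once in Python with the thresholds as parameters (function name is mine, for illustration only):

```python
def normalize_scan(value, yellow_s, red_s, yellow_u=50, red_u=90):
    """Map a raw Scan Rate (Solaris) or Scan Ratio (AIX) onto the
    0-100 Memory Utilization scale, per the piecewise formula above."""
    if value <= yellow_s:                                        # GREEN
        return value * (yellow_u / yellow_s)
    if value < red_s:                                            # YELLOW
        return yellow_u + (red_u - yellow_u) * (value - yellow_s) / (red_s - yellow_s)
    return min(97, red_u + (value - red_s) * red_u / red_s)      # RED, capped at 97
```

For example, `normalize_scan(300, 200, 400)` reproduces the 70 in the table above, and `normalize_scan(8, 5, 9)` reproduces the 80 in the AIX table below.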

AIX

For AIX, a more useful metric for Memory analysis may be the ratio of “Scan Rate” to “PagesFreedRate”,

both of which are provided in the Rnext Domino Platform Statistics for AIX.

Platform.Memory.ScanRatePagesPerSec and Platform.Memory.PagesFreedRatePerSec.

For simplicity, call this ratio the “Scan Ratio”.

The threshold values for Scan Ratio are YellowS(Significant) = 5, RedS(Critical) = 9.

In order to normalize the Scan Ratio to a 0 - 100 based value that is compatible with the

threshold settings for Memory Utilization, this metric must undergo a number of adjustments:

	Memory Utilization = 

	

	ScanRatio * (YellowU / YellowS)	 				

				if ScanRatio <= YellowS  (GREEN)



YellowU + ((RedU - YellowU) * (ScanRatio - YellowS) / (RedS - YellowS))

				if YellowS < ScanRatio < RedS (YELLOW)



	MIN(97, RedU + ((ScanRatio - RedS) * RedU) / RedS)	

				if ScanRatio >= RedS (RED)	



	examples

		Scan Ratio		Memory Utilization			Condition

		0			0 * (50/5) = 0				Healthy

		2			2 * (50/5) = 20				Healthy

		4			4 * (50/5) = 40				Healthy

		6			50 + (90-50)*(6-5)/(9-5) = 60		Warning

		8			50 + (90-50)*(8-5) /(9-5) = 80		Warning

		9			min(97, 90 + (9-9)*90/9) = 90		Critical

		9.5			min(97, 90 + (9.5-9)*90/9) = 95		Critical

		10			min(97, 90+(10-9)*90/9) = 97		Critical

OS400

	Calculate MemUtil = 

			10000 * Platform.Memory.FaultRate / 

				(Server.CPUCount * Platform.System.PctCombinedCpuUtil * (100 - Platform.LogicalDisk.Total.PctUtil))



		MemUtil Threshold:	Warning = 250, 		Critical = 350



	then Memory Rating = 

		0.2 * MemUtil 					if MemUtil <= 250 

		0.4 * MemUtil - 50				if 250 < MemUtil < 350 

		min(97, (3250 + MemUtil) / 40)	 		if MemUtil >= 350 
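As a sketch, the OS/400 calculation reads as follows in Python (function and parameter names are mine; the statistics they stand for are noted in the docstring):

```python
def os400_memory_rating(fault_rate, cpu_count, pct_cpu_util, pct_disk_util):
    """OS/400 Memory Rating per the formulas above.

    fault_rate    -- Platform.Memory.FaultRate
    cpu_count     -- Server.CPUCount
    pct_cpu_util  -- Platform.System.PctCombinedCpuUtil
    pct_disk_util -- Platform.LogicalDisk.Total.PctUtil
    """
    mem_util = 10000 * fault_rate / (cpu_count * pct_cpu_util * (100 - pct_disk_util))
    if mem_util <= 250:
        return 0.2 * mem_util
    if mem_util < 350:
        return 0.4 * mem_util - 50
    return min(97, (3250 + mem_util) / 40)
```

Note that the three pieces meet at the thresholds: at MemUtil = 250 both the first and second pieces give 50, and at MemUtil = 350 both the second and third give 90.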

z/OS

First Supported in Domino 6

Statistics used:

							Warning		Critical

Platform.Memory.AvailableFrameCount		4192			819

Platform.Memory.OutReadyQueue			1			6

Platform.Memory.PagesPerSec			50			90



AvailFrameCount rating = 

	0							if Platform.Memory.AvailFrameCount >= 8192 

	100 - (100 * Platform.Memory.AvailFrameCount / 8192)  	if Platform.Memory.AvailFrameCount < 8192 



OutReadyQueue rating = 

	50 * Platform.Memory.OutReadyQueue			if Platform.Memory.OutReadyQueue <= 1 

	50 + 8 * (Platform.Memory.OutReadyQueue - 1)		if Platform.Memory.OutReadyQueue > 1 



PagesPerSec rating = Platform.Memory.PagesPerSec

	

	The Memory Component Rating for z/OS is essentially the worst (highest) of the three calculated ratings.
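Taking “worst” to mean the highest of the three sub-ratings, the z/OS assessment can be sketched in Python (function name is mine, for illustration only):

```python
def zos_memory_rating(avail_frame_count, out_ready_queue, pages_per_sec):
    """z/OS Memory Component Rating: worst (highest) of three sub-ratings."""
    # Platform.Memory.AvailableFrameCount sub-rating
    afc_rating = (0 if avail_frame_count >= 8192
                  else 100 - 100 * avail_frame_count / 8192)
    # Platform.Memory.OutReadyQueue sub-rating
    orq_rating = (50 * out_ready_queue if out_ready_queue <= 1
                  else 50 + 8 * (out_ready_queue - 1))
    # Platform.Memory.PagesPerSec is used directly
    pps_rating = pages_per_sec
    return max(afc_rating, orq_rating, pps_rating)
```

For instance, a server with 4096 free frames and everything else idle rates 50, right at the AvailableFrameCount warning threshold in the table above.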

Sliding Scale

The sliding scale defines the conditions under which the designated weighting is applied to 

the statistic, as calculated by the formula defined above, and conditions under which the 

weighting mechanism is abandoned in favor of another method to "escalate" the metric.



Define: w = weighting

	tamber = amber trigger 

	tred = red trigger



Sliding Scale Memory Utilization Rating 



	w * MemoryUtil   (if MemoryUtil < tamber) 		GREEN



	(w * tamber) + ((100 - (w * tamber)) * (MemoryUtil - tamber))/(tred - tamber)	

		(if tamber <= MemoryUtil <= tred) 		AMBER



	100	(if MemoryUtil > tred) 			RED

example: MemoryUtilization has 15% weight toward the blended stat

		AMBER threshold = 50, RED threshold = 90:



Sliding Scale MemoryUtil Rating = 



	0.15 * MemoryUtil   			if MemoryUtil <= 50		GREEN

		As MemoryUtil varies from 0 to 50, Sliding Scale varies from 0 to 7.5



	7.5 + (92.5 * (MemoryUtil - 50) / 40)	if 50 < MemoryUtil < 90 		AMBER

		As MemoryUtil varies from 50 to 90, Sliding Scale varies from 7.5 to 100



	100   					if MemoryUtil >= 90 		RED



For NT we would have



MemoryFree (MB)	MemoryUtil	Sliding-Scale MemoryUtil

	500			0		0

	250			0		0

	100			0		0

	75			25		25 * .15 = 3.75

	50			50		50 * .15 = 7.5

	40			60		7.5 + 92.5*(60 - 50)/40 = 30.625

	30			70		7.5 + 92.5*(70 - 50)/40 = 53.75

	20			80		7.5 + 92.5*(80 - 50)/40 = 76.875

	10 			90		100

	5			100		100



For Solaris we would have



Scan Rate		MemoryUtil	Sliding-Scale MemoryUtil

	40			10		1.5

	80			20		3.0

	120			30		4.5

	160			40		6.0

	200			50		7.5

	250			60		7.5 + 92.5*(60 - 50)/40 = 30.625

	300			70		7.5 + 92.5*(70 - 50)/40 = 53.75

	350			80		7.5 + 92.5*(80 - 50)/40 = 76.875

	400 			90		100

	500			100		100
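The sliding-scale mapping used in both tables can be sketched in Python (function name is mine), with the example's 15% weight and 50/90 thresholds as defaults:

```python
def sliding_scale(mem_util, w=0.15, t_amber=50, t_red=90):
    """Apply the sliding-scale weighting to a 0-100 Memory Utilization value."""
    if mem_util <= t_amber:                                  # GREEN
        return w * mem_util
    if mem_util < t_red:                                     # AMBER: escalate from
        # w * t_amber up to 100 as mem_util approaches the red trigger
        return (w * t_amber
                + (100 - w * t_amber) * (mem_util - t_amber) / (t_red - t_amber))
    return 100                                               # RED
```

For example, `sliding_scale(60)` reproduces the 30.625 shown in both tables.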

Subject: RE: re: DDM Health.MemoryUtil.Value

Thanks for the detailed response, it really sheds some light on the issue.

Our servers at Dennys are all Windows 2000 or 2003 servers running 7.0.3 with 4 GB of RAM.

I have been using the Server\Performance tab’s “Real-Time Statistics” graphs including the “Health” statistics to display the statistics over a period of several hours.

Here is your formula for Windows 2000 (not counting the sliding scale):

So, if the reported RAM.Total > 2.1 GB, the SHM adjusts RAM.Available as follows

RAM.Available = 2150 - (RAM.Total - RAM.Available)

Memory Utilization Rating =

0 if RAM.Available >= 100 MB

100 - RAM.Available if RAM.Available < 100 MB

Based on the formula you gave for Memory Utilization on Windows 2000/2003 servers, it seems that I should ignore these statistics on my servers with heavy peak loads, where Memory Utilization hits ~100%.

Here’s what I think is happening - please tell me if I understand properly. Domino grabs the maximum 2.1GB during the peak loads (but never returns any memory to the OS that has been allocated), so the Health.MemoryUtil.Value statistic continues to display 100% utilization. This only means that Domino has allocated as much RAM from Windows as it can, and from now on it will only be able to utilize RAM from the shared memory pool.

So it seems that via SHM or Health.XXX.Value statistics there is no insight into the shared memory pool.

DDM probes, on the other hand, can tell me if a specific agent consumes too much memory, and I’m assuming that is in regard to the shared memory pool and the use of swap.

I would also like to know more about Health.ServerResponse.Value. On my main application server, Health.ServerResponse.Value reports at a warning or critical level, while real-time plots of all the other health statistics look good (below 5%). How is Health.ServerResponse.Value computed? The server is currently on a 100 Mbps network and the Health.NetworkUtilization.Value statistic is always very low (<5%). Could it just be that the network is congested?