Tips for Using a Reverse Proxy to Manage HCL DX Cache with Apache Disk Cache Settings

Guidance from an article posted by HCL Digital Experience architect Alex Lang:

Using a Reverse Proxy to Manage HCL DX Cache with Apache Disk Cache Settings

As I've discussed in the past, rendering an HCL DX Portal page requires a lot of "statics": images, CSS files, JavaScript files, and so on.

Because CPU cycles can be considered "expensive" on the HCL DX Portal/WebSphere Application Server (WAS) servers and cheaper on the IBM HTTP Server (IHS)/Apache servers, it is very desirable to cache these statics at the IHS reverse proxy using the mod_cache facilities.

Prior to IHS version 7, the only choice available was the mod_mem_cache module, which provides an RFC 2616-compliant reverse proxy cache. IHS version 7 added support for mod_disk_cache in addition to mod_mem_cache.
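To make "RFC 2616-compliant" a little more concrete: a compliant shared cache decides, from the response headers, how long it may serve a stored response without going back to the origin (Portal/WAS). The Python sketch below is a simplified, hypothetical illustration of the freshness-lifetime order described in RFC 2616 section 13.2.4 (s-maxage, then max-age, then Expires minus Date, then a heuristic of 10% of the time since Last-Modified). It is not mod_cache's actual code; the function names are invented for this example, and the handling of no-store, no-cache, and private is deliberately simplified to "lifetime 0".

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Mapping, Optional

HEURISTIC_FRACTION = 0.10  # RFC 2616: "a typical setting of this fraction might be 10%"


def _parse_date(value: str):
    """Parse an HTTP date; return None if it is invalid (e.g. "Expires: 0")."""
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None


def _cache_control(value: str) -> Dict[str, str]:
    """Split a Cache-Control header into {directive: argument} pairs."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def freshness_lifetime(headers: Mapping[str, str], shared_cache: bool = True) -> Optional[float]:
    """Seconds a cached response may be served without revalidation.

    Returns 0.0 when the response must be revalidated (simplified handling of
    no-store/no-cache/private) and None when no explicit or heuristic
    freshness information is available.
    """
    h = {k.lower(): v for k, v in headers.items()}
    cc = _cache_control(h.get("cache-control", ""))

    if "no-store" in cc or "no-cache" in cc or (shared_cache and "private" in cc):
        return 0.0
    try:
        if shared_cache and "s-maxage" in cc:
            return float(int(cc["s-maxage"]))
        if "max-age" in cc:
            return float(int(cc["max-age"]))
    except ValueError:
        return 0.0  # malformed delta-seconds: treat as stale (a conservative choice)

    date = _parse_date(h["date"]) if "date" in h else None
    if "expires" in h:
        expires = _parse_date(h["expires"])
        if expires is None:
            return 0.0  # an invalid Expires value (such as "0") means "already expired"
        if date is not None:
            return max(0.0, (expires - date).total_seconds())

    if "last-modified" in h and date is not None:
        last_modified = _parse_date(h["last-modified"])
        if last_modified is not None:
            age_since_change = (date - last_modified).total_seconds()
            return max(0.0, age_since_change * HEURISTIC_FRACTION)
    return None
```

When Portal/WAS sends neither Cache-Control nor Expires, a cache is left with heuristics like the last branch, which is one way a proxy can end up serving an out-of-date version of a static resource.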

Choosing the best type of cache is neither intuitive nor obvious. The Apache Caching Guide provides some guidance.

In short, the correct answer is to use mod_disk_cache. Let's look at some attributes of each type of cache.

mod_mem_cache:

  1. Cache is "per process": Apache spawns processes to handle inbound HTTP(S) requests, and a separate instance of the mem_cache is created for each process. The same cache entries are therefore duplicated across processes (i.e., wasted main memory).
  2. Cache size limitations: Because of "1", each cache instance must necessarily be limited in size so as not to exhaust main memory. Several mod_mem_cache directives help limit the size of the responses that can be stored in the cache.
  3. Occasionally inefficient replacement: Because of "1" and "2" together, storing a response that is near the size limit allowed in the cache can force the removal of responses that would be better left in the cache.
  4. Limited risk of stale pages: Because the cache is limited in size and is regenerated with each new process instantiation, there is very little chance of stale responses remaining in the cache.
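Item 3 can be illustrated with a toy cache bounded by total size, loosely analogous to limits set by mod_mem_cache directives such as MCacheSize and MCacheMaxObjectSize. The Python sketch below is hypothetical: the class, the sizes in KB, and the pure LRU policy are illustrative assumptions, not mod_mem_cache internals. It shows one large response, still under the per-object limit, pushing out several small, recently used responses.

```python
from collections import OrderedDict
from typing import List


class SizeBoundedLRU:
    """Toy per-process cache limited by total size, with LRU eviction."""

    def __init__(self, max_size_kb: int, max_object_kb: int):
        self.max_size_kb = max_size_kb      # analogous to a total cache size limit
        self.max_object_kb = max_object_kb  # analogous to a per-object size limit
        self.used_kb = 0
        self.entries: "OrderedDict[str, int]" = OrderedDict()  # key -> size, LRU first
        self.evicted: List[str] = []

    def get(self, key: str) -> bool:
        """Return True on a hit and mark the entry as most recently used."""
        if key not in self.entries:
            return False
        self.entries.move_to_end(key)
        return True

    def put(self, key: str, size_kb: int) -> bool:
        """Store a response, evicting least recently used entries until it fits."""
        if size_kb > self.max_object_kb or size_kb > self.max_size_kb:
            return False  # too large to cache at all
        if key in self.entries:
            self.used_kb -= self.entries.pop(key)
        while self.used_kb + size_kb > self.max_size_kb:
            old_key, old_size = self.entries.popitem(last=False)
            self.used_kb -= old_size
            self.evicted.append(old_key)
        self.entries[key] = size_kb
        self.used_kb += size_kb
        return True


if __name__ == "__main__":
    cache = SizeBoundedLRU(max_size_kb=100, max_object_kb=90)
    for i in range(10):  # ten small, popular statics of 8 KB each
        cache.put(f"/static/icon{i}.png", 8)
    for i in range(10):  # all of them are requested again (cache hits)
        cache.get(f"/static/icon{i}.png")
    cache.put("/static/big-bundle.js", 60)  # one large response near the limit
    print(len(cache.evicted), "small entries evicted:", cache.evicted)
```

In this run the 60 KB response evicts five of the ten 8 KB icons, even though every icon was just requested.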

mod_disk_cache:

  1. Cached responses are shared among all processes: There is one instance of the cache system-wide, so less memory is wasted.
  2. Takes advantage of Unix/Linux file buffering: See the commentary below for a discussion of this item.
  3. Need to use a cleanup utility (htcacheclean): mod_disk_cache does not automatically remove stale items from the cache, which can waste disk space. Relatedly, some responses from the response owner (i.e., Portal/WAS) may not have proper Cache-Control headers indicating how long they are allowed to live in proxy caches, so the cache can potentially return a wrong, stale version of a response. The htcacheclean utility therefore needs to be run periodically (via cron, for example) to ensure stale responses are removed from the cache.
  4. Need to allocate disk space: Since responses are stored on disk, there is always the potential to exhaust disk space. As on any production Unix machine, monitoring policies need to be in place to ensure this doesn't happen.
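In practice you would schedule the real htcacheclean from cron, pointing its -p option at the mod_disk_cache CacheRoot directory and its -l option at the desired size limit. To show the basic idea of size-based pruning only, here is a hypothetical Python sketch; the function name and the "oldest modification time first" policy are assumptions for illustration. The real utility understands mod_disk_cache's on-disk format, which this sketch ignores, so it should not be pointed at a live cache directory.

```python
import os
from typing import List, Tuple


def prune_to_limit(root: str, limit_bytes: int) -> List[str]:
    """Delete the least recently modified files under root until the total
    size of the remaining files is at or below limit_bytes.

    Returns the paths that were removed, oldest first.
    """
    files: List[Tuple[float, int, str]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            files.append((st.st_mtime, st.st_size, path))

    total = sum(size for _mtime, size, _path in files)
    removed: List[str] = []
    for _mtime, size, path in sorted(files):  # oldest modification time first
        if total <= limit_bytes:
            break
        os.remove(path)
        total -= size
        removed.append(path)
    return removed
```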

When first considering which type of caching to use, most people would immediately suggest mem_cache as the better option. From a performance perspective, serving from memory is obviously faster than serving from disk. In reality, though, once you understand how Unix/Linux buffers file I/O, the benefits of disk_cache become apparent. Unix allocates unused portions of memory to buffer files as they are read (the page cache). The initial read request loads the file into memory, and subsequent read requests for the same file are served from memory without touching the disk, as long as the kernel has not reclaimed that memory for other uses. So, apart from the initial load, disk_cache performs essentially as well as mem_cache, and there is only one copy of each response in memory rather than the per-process duplication of mem_cache. Because memory is used more efficiently, cache hit ratios can be much higher with disk_cache.
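The hit-ratio argument can be checked with a small simulation. The Python sketch below rests on illustrative assumptions that are not from the article: a fixed memory budget counted in cached responses, requests dispatched round-robin across processes, equally popular URLs, and plain LRU eviction. It compares one shared cache (roughly what the page cache gives mod_disk_cache) with the same budget split evenly into per-process caches (the mod_mem_cache model).

```python
import random
from collections import OrderedDict
from typing import List


class LRUCache:
    """Fixed-capacity LRU cache that counts entries, not bytes."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: "OrderedDict[int, None]" = OrderedDict()

    def access(self, key: int) -> bool:
        """Return True on a hit; on a miss, insert key, evicting the LRU entry if full."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        if self.capacity > 0:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)
            self.entries[key] = None
        return False


def hit_ratio(requests: List[int], total_capacity: int, processes: int, shared: bool) -> float:
    """Hit ratio when total_capacity is one shared cache or is split evenly per process."""
    if shared:
        caches = [LRUCache(total_capacity)]
    else:
        caches = [LRUCache(total_capacity // processes) for _ in range(processes)]
    hits = 0
    for i, key in enumerate(requests):
        cache = caches[i % len(caches)]  # round-robin dispatch (an assumption)
        hits += cache.access(key)
    return hits / len(requests)


if __name__ == "__main__":
    rng = random.Random(42)
    urls = 100  # distinct static responses
    requests = [rng.randrange(urls) for _ in range(10_000)]
    budget = 100  # memory budget: room for 100 cached responses in total
    print("shared      :", hit_ratio(requests, budget, processes=4, shared=True))
    print("per-process :", hit_ratio(requests, budget, processes=4, shared=False))
```

Under these assumptions the shared cache holds every response after warm-up, while each per-process cache has room for only a quarter of them, so its hit ratio settles near 25%.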

The article "Some Tuning Tips For Apache Mod_Cache (mod_disk_cache)" offers additional interesting commentary on optimizing disk_cache on Linux.