Solving our CMS Downtime with NGINX
How we used NGINX's reverse proxy features to mitigate errors caused by downtime of our headless CMS, whilst always serving the most recent available content.
A while back my team faced a challenging issue: our third-party headless CMS was sometimes unavailable, leading to missing product descriptions and other content issues on our websites. After exploring several options, including application-level caching and using Redis, we decided on a simpler, yet effective approach using NGINX’s reverse proxy capabilities.
With a headless CMS, our content management is decoupled from the website’s presentation layer, offering flexibility and scalability. However, this setup introduced a serious vulnerability: whenever the CMS went down, our site’s dynamic content, particularly product descriptions, disappeared, impacting user experience and sales.
However, it was essential that if content was available from the source, we prioritised delivering it over cached entries.
Brainstorming Solutions
The team considered various strategies to mitigate these disruptions:
- Application-level caching: Implementing logic within our application to cache responses and serve them during CMS downtimes.
- Using Redis: Setting up a Redis cache to store and serve content independently of the CMS’s availability.
Both options had merit and would work, but required significant development effort and introduced complexity into our system.
Implementation with NGINX
We chose to leverage NGINX, already in use as a reverse proxy for our Docker containers and other services, to address our CMS reliability issues with the following setup:
Configuring NGINX as a Reverse Proxy: We configured NGINX to act as an intermediary for requests to our CMS. This not only improves security by shielding backend servers from direct external access but also adds a robust caching layer.
Using proxy_cache_use_stale Directive: We employed the proxy_cache_use_stale directive in NGINX, enabling the server to serve stale content if the CMS is unreachable.
Reconfiguring the Application: Instead of querying the CMS directly, our application now queries an internal NGINX URL configured to proxy and cache the CMS responses. This change was simple to implement and seamless for our end users.
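The application-side change amounted to little more than swapping the base URL used for content requests. A minimal sketch of the idea (the origins and helper function here are illustrative, not our actual code):

```python
# Hypothetical sketch: the application builds content URLs against an
# internal NGINX proxy instead of hitting the CMS origin directly.

CMS_ORIGIN = "https://cms.example.com"   # old: direct CMS requests
PROXY_ORIGIN = "http://127.0.0.1:8080"   # new: internal NGINX caching proxy

def content_url(path: str, origin: str = PROXY_ORIGIN) -> str:
    """Build the URL for a CMS resource, routed through the caching proxy."""
    return f"{origin}/{path.lstrip('/')}"
```

Because only the origin changes, the proxy can be bypassed again with a one-line revert if ever needed.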
Below is a simplified version of our NGINX configuration:
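(The listing here is a representative sketch; the cache path, port, and upstream hostname are illustrative. The line numbers referenced in the notes below correspond to this listing.)

```nginx
http {
    proxy_cache_path /var/cache/nginx/cms keys_zone=cms_cache:8m;  # cache name and size

    server {
        listen 8080;
        proxy_cache cms_cache;  # use the zone defined above

        location / {
            proxy_cache_valid any 1s;
            proxy_cache_use_stale error timeout http_500 http_502 http_503 http_504;
            proxy_pass https://cms.example.com;
        }
    }
}
```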
- On line 2 we initialise the cache and set its parameters, most importantly the `keys_zone`, which defines the cache's name (cms_cache) and size (8m).
- Line 6 references the name set in the keys_zone.
- On line 9 we set the cache validity to 1 second. This is low enough to feel “live” as far as the CMS is concerned, but it means the content is cached and available as “stale” content in the case of an error with the source.
- Line 10 is the magic: this directs NGINX to serve stale content under the listed conditions (in this case error, timeout, and 5xx responses).
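To verify that stale content is actually being served during an outage, it can help to expose NGINX's cache decision in a response header. This is a common debugging addition, not part of the configuration above:

```nginx
# Inside the location block: report NGINX's cache decision to the client.
# $upstream_cache_status takes values such as MISS, HIT, and STALE.
add_header X-Cache-Status $upstream_cache_status;
```

A response carrying `X-Cache-Status: STALE` while the CMS is down confirms the fallback is working as intended.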
Conclusion
This solution allowed us to use a battle-tested piece of software, NGINX, to address a significant reliability issue with minimal changes to our existing infrastructure. By using NGINX's caching and stale-content serving capabilities, we ensured that our site remains functional and content-rich even when our CMS is unreachable, without introducing added complexity into our code.
About James Babington
A cloud architect and engineer with a wealth of experience across AWS, web development, and security, James enjoys writing about the technical challenges and solutions he's encountered, but most of all he loves it when a plan comes together and it all just works.