Jesse's Software Engineering Blog
Website Performance with Browser Caching: Part 1 – Static Files
Controlling how a browser caches your website is an important tool for delivering high-performance sites. Many developers rely on default browser caching behavior and don’t use client-side caching to its full capabilities. Modern browsers will cache static files (CSS, JS, images, etc.) by default, but will not cache dynamic responses such as HTML pages or PHP script output. Integrating client-side caching into dynamically driven websites can substantially reduce page load time and, if done properly, will not have negative side effects on dynamic data needed in real time.
All browsers have some sort of “Developer Tools” interface that allows a user to monitor both the requests and responses of every interaction between the client and web server. These tools are extremely useful for monitoring website caching, and all of the following data is taken from Google Chrome’s Developer Tools. I will be analyzing my portfolio, jessesnet.com, for this article.
To start, we will look at an example of a first visit to a web page. You may have to manually clear your browser’s cache to simulate a first visit to a website. Before loading the page, open up the Developer Tools, click on the Network tab, and clear any data that is in there. On page load you will find the following data:
| Name | Status | Type | Size | Time |
|---|---|---|---|---|
| jessesnet.com | 200 | text/html | 4.3 KB | 143 ms |
| bootstrap.min.css | 200 | text/css | 16.1 KB | 192 ms |
| slide1.png | 200 | image/png | 60.5 KB | 478 ms |
| TOTALS: | 27 requests | - | 368 KB | 1.0 s |
The five columns we will be looking at are: Name, Status, Type, Size, and Time. A general overview of the columns:
- Name – The file name that is being requested
- Status – The status code returned from the server
- Type – The type of file being returned
- Size – The size of the file
- Time – How long the request took
On our first page view, all the files have a status of 200 (OK), which means the request succeeded. The requests vary in type, size, and time, with a total of 27 requests, 368 KB transferred, and a load time of 1.0 seconds. One thing to note: the text/html row with the root domain name is the PHP script that fires for the page. Now that the page has been loaded, I’m going to navigate away from the page and return. We get the following load data:
| Name | Status | Type | Size | Time |
|---|---|---|---|---|
| jessesnet.com | 200 | text/html | 4.3 KB | 177 ms |
| bootstrap.min.css | 200 | text/css | (from cache) | 0 ms |
| slide1.png | 200 | image/png | (from cache) | 0 ms |
| TOTALS: | 20 requests | - | 4.4 KB | 333 ms |
When we returned to the page, all of the requests for static files had a 0 ms load time and were pulled from the cache. This means the client isn’t even making a request to the server for those files. This is the best-case scenario, as the files are loaded directly from the browser’s cache. Notice that the text/html response was not cached, as it is considered dynamic and is not cached by default. This applies to your static HTML files as well as script files, such as PHP. If you look at the load times, the total is about one third of the first page load (333 ms versus 1.0 s), and we didn’t have to do anything. This is the default browser behavior.
One more scenario to consider is hitting refresh in the browser, referred to as a “soft refresh”. The data we now see for the page:
| Name | Status | Type | Size | Time |
|---|---|---|---|---|
| jessesnet.com | 200 | text/html | 4.3 KB | 188 ms |
| bootstrap.min.css | 304 | text/css | 0 B | 94 ms |
| slide1.png | 304 | image/png | 0 B | 92 ms |
| TOTALS: | 24 requests | - | 5.2 KB | 472 ms |
Now all but the HTML file have a status code of 304, Not Modified. In Chrome, the soft refresh forces the browser to re-validate the files before pulling them from its cache. When the server returns the 304 status, it is telling the browser that the requested file has not changed, so the browser can go ahead and load the contents of that file from its local cache. The files that returned the 304 status still have a size of 0 B, as they are loaded from the cache, but they now have a small load time due to the validation round trip to the server. If you click on the name of one of the 304 requests (in Developer Tools), you will get more information about the request and the response headers. The two request headers we’re looking at are:
If-Modified-Since: Tue, 31 Dec 2013 20:41:24 GMT
If-None-Match: "40e46-81b-4eeda9577a0e8"
And the response headers:
Last-Modified: Tue, 31 Dec 2013 20:41:24 GMT
ETag: "40e49-6cae-4eeda9577a0e8"
The ETag value is an identifier for a specific version of the requested file. By default, Apache generates a file’s ETag from its i-node, modification time, and size (Apache 2.4 leaves the i-node out by default). When a browser requests a file it has already cached, it looks up the file’s ETag and Last-Modified values in its cache and sends them with the request as If-None-Match and If-Modified-Since. This allows the server to compare the browser’s cached values against the current values of the file on the server and determine whether the two copies match. If you need to change how the ETag values are determined on your server, you can update the httpd.conf file with the FileETag directive, and you can also send custom ETag response headers. However, for most use cases, I just let Apache generate those values.
NOTE: For systems that run through a load-balancer this won’t work properly, as files on different servers will have different ETag values. But that discussion is outside of the scope of this article.
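As a rough sketch of the FileETag option mentioned above, the fragment below tells Apache to build ETags from the modification time and size only. Leaving the i-node out is a common way to keep ETags consistent across servers behind a load balancer, on the assumption that the files are deployed with identical sizes and modification times on every machine; that assumption would need to be checked for a given deployment.

```apache
# httpd.conf (also usable in .htaccess when overrides permit it)
# Generate ETags from modification time and size; omit the i-node,
# which differs between machines even for identical files.
FileETag MTime Size
```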
So now that we’ve seen a couple of different scenarios when loading data from a web server, let’s take a look at manipulating the browser’s default caching behavior.
In the previous examples I demonstrated how browsers cache websites by default. The defaults offer substantial gains in load times and will cache all of your static files. So why worry about customizing your headers? The biggest reason is that you don’t know how long your files will be cached. In the examples, files were pulled from the cache without being re-validated via the ETag and Last-Modified values until we did a soft refresh. Once a browser starts loading your site from its cache, and no custom behavior has been defined, it is up to the browser’s caching algorithms (expiration calculations) to determine how long to serve files from its cache. So if clients are viewing your site from their browser cache and you upload new files, they may not see them for quite some time; or, more likely, your files will only be cached for a short period of time. This can lead to unpredictable, or outdated, site functionality.
There are various headers you can manipulate on your web server to customize the way browsers and proxy servers cache your website. I’m just going to touch on the ones that I use, but you should take the time to familiarize yourself with all of the headers that are available, so that you can determine which are best for your purposes; see the Header Field Definitions in the HTTP/1.1 specification.
Expires – This is the most basic header for cache control and tells a browser how long the response is good for. The header value itself is an absolute date/time, but on the server it can be configured relative to the time of the request, e.g. one week after access. This is considered an older directive, and is usually included for broader support in older browsers.
Cache-Control – This was introduced with HTTP 1.1 and is more versatile than the Expires header. It accepts several directives that give you fine-grained control over your caching.
I control these headers using Apache. To do this, you need the Apache module mod_expires. Newer versions of Apache have this enabled by default, but if you are unsure, open up your httpd.conf file and verify that the following module is being loaded:
LoadModule expires_module modules/mod_expires.so
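By itself, mod_expires only emits an Expires header and a Cache-Control max-age value. The additional Cache-Control directives that appear in the response headers later in this article (public, s-maxage, must-revalidate, proxy-revalidate) are typically set with mod_headers, so it is probably worth checking for that module as well. The line below assumes the same default module path as the one above:

```apache
# Assumed to be needed for setting custom Cache-Control directives
LoadModule headers_module modules/mod_headers.so
```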
Remember, for this example I am talking about controlling the caching of static files, CSS, JS, images, etc. Updates to the .htaccess file will not affect the caching of your HTML files or scripts, just as setting headers in your scripts will not affect the caching of your static files.
After you reload Apache, create an .htaccess file for a project you are working on. You can make these changes at a global level in httpd.conf, but I choose to make them at the project level for maximum control. Here’s an example of a .htaccess file:
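A minimal sketch of such a file, assuming both mod_expires and mod_headers are loaded, might look like the following. The one-week lifetime matches the max-age=604800 value discussed in this article; the list of file extensions is illustrative, and how the two modules combine is worth confirming against the actual response headers in Developer Tools.

```apache
# Apply the rules only to static assets; the extension list is illustrative
<FilesMatch "\.(css|js|png|jpe?g|gif|ico)$">
    <IfModule mod_expires.c>
        # Adds an Expires header one week after the time of the request
        ExpiresActive On
        ExpiresDefault "access plus 1 week"
    </IfModule>
    <IfModule mod_headers.c>
        # 604800 seconds = 7 days; max-age applies to browsers,
        # s-maxage to shared caches such as proxies
        Header set Cache-Control "public, max-age=604800, s-maxage=604800, must-revalidate, proxy-revalidate"
    </IfModule>
</FilesMatch>
```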
With a one-week lifetime configured, the response headers for a static file look like this:
Cache-Control: public, max-age=604800, s-maxage=604800, must-revalidate, proxy-revalidate
Expires: Sun, 19 Jan 2014 02:48:18 GMT
I now know when this file will expire and that, after it has expired, browsers and proxies will re-validate the file to verify that it is not stale. This introduces a very defined level of control over how my files are cached and can be tailored to a specific project’s needs. Depending on how often you update, or what you update most, you can set the header values that will maximize site efficiency, while maintaining a consistent experience for your users.
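To tailor lifetimes to how often each kind of file changes, mod_expires also offers ExpiresByType. The sketch below is only an illustration: the lifetimes are made-up values, and the MIME types must match what your server actually sends for those files.

```apache
<IfModule mod_expires.c>
    ExpiresActive On
    # Images that rarely change can be cached for longer
    ExpiresByType image/png "access plus 1 month"
    # Stylesheets that change with each release get a shorter lifetime
    ExpiresByType text/css "access plus 1 week"
    # Fallback for any other file type covered by this .htaccess
    ExpiresDefault "access plus 1 day"
</IfModule>
```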
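One last thing I would like to point out is how you can force the user’s browser to ignore its cache and pull your most recent files. To do this, you simply rename the file or append a version number to its URL, for example referencing style.css?v=2 in place of style.css (hypothetical names). The browser treats the new URL as a file it has never seen and requests it from the server.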
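Renaming files on disk for every release can get tedious. One common alternative, not part of the setup described in this article, is to put the version in the file name used in your HTML and let mod_rewrite map it back to the real file. The sketch below assumes mod_rewrite is available and uses a hypothetical naming scheme in which style.v2.css is served from style.css.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Strip the ".v<number>" segment before the extension,
    # e.g. css/style.v2.css is served from css/style.css
    RewriteRule ^(.+)\.v[0-9]+\.(css|js|png|jpe?g|gif)$ $1.$2 [L]
</IfModule>
```

With a rule like this, bumping the number in your markup is enough to make browsers fetch the file again, while the file on disk keeps its original name.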
The next article will examine how you can cache your HTML files and dynamic content from your website.