7.7 Offline Web applications

7.7.1 Introduction

In order to enable users to continue interacting with Web applications and documents even when their network connection is unavailable — for instance, because they are traveling outside of their ISP's coverage area — authors can provide a manifest which lists the files that are needed for the Web application to work offline and which causes the user's browser to keep a copy of the files for use offline.

To illustrate this, consider a simple clock applet consisting of an HTML page "clock.html", a CSS style sheet "clock.css", and a JavaScript script "clock.js".

Before adding the manifest, these three files might look like this:

<!-- clock.html -->
<!DOCTYPE HTML>
<html>
 <head>
  <title>Clock</title>
  <script src="clock.js"></script>
  <link rel="stylesheet" href="clock.css">
 </head>
 <body>
  <p>The time is: <output id="clock"></output></p>
 </body>
</html>
/* clock.css */
output { font: 2em sans-serif; }
/* clock.js */
setInterval(function () {
    document.getElementById('clock').value = new Date();
}, 1000);

If the user tries to open the "clock.html" page while offline, though, the user agent (unless it happens to have it still in the local cache) will fail with an error.

The author can instead provide a manifest of the three files, say "clock.appcache":

CACHE MANIFEST
clock.html
clock.css
clock.js

With a small change to the HTML file, the manifest (served as text/cache-manifest) is linked to the application:

<!-- clock.html -->
<!DOCTYPE HTML>
<html manifest="clock.appcache">
 <head>
  <title>Clock</title>
  <script src="clock.js"></script>
  <link rel="stylesheet" href="clock.css">
 </head>
 <body>
  <p>The time is: <output id="clock"></output></p>
 </body>
</html>

Now, if the user goes to the page, the browser will cache the files and make them available even when the user is offline.

Authors are encouraged to include the main page in the manifest also, but in practice the page that referenced the manifest is automatically cached even if it isn't explicitly mentioned.

With the exception of "no-store" directive, HTTP cache headers and restrictions on caching pages served over TLS (encrypted, using https:) are overridden by manifests. Thus, pages will not expire from an application cache before the user agent has updated it, and even applications served over TLS can be made to work offline.

View this example online.

7.7.1.1 Supporting offline caching for legacy applications

The application cache feature works best if the application logic is separate from the application and user data, with the logic (markup, scripts, style sheets, images, etc) listed in the manifest and stored in the application cache, with a finite number of static HTML pages for the application, and with the application and user data stored in Web Storage or a client-side Indexed Database, updated dynamically using Web Sockets, XMLHttpRequest, server-sent events, or some other similar mechanism.

This model results in a fast experience for the user: the application immediately loads, and fresh data is obtained as fast as the network will allow it (possibly while stale data shows).

Legacy applications, however, tend to be designed so that the user data and the logic are mixed together in the HTML, with each operation resulting in a new HTML page from the server.

For example, consider a news application. The typical architecture of such an application, when not using the application cache feature, is that the user fetches the main page, and the server returns a dynamically-generated page with the current headlines and the user interface logic mixed together.

A news application designed for the application cache feature, however, would instead have the main page just consist of the logic, and would then have the main page fetch the data separately from the server, e.g. using XMLHttpRequest.

The mixed-content model does not work well with the application cache feature: since the content is cached, it would result in the user always seeing the stale data from the previous time the cache was updated.

While there is no way to make the legacy model work as fast as the separated model, it can at least be retrofitted for offline use using the prefer-online application cache mode. To do so, list all the static resources used by the HTML page you want to have work offline in an application cache manifest, use the manifest attribute to select that manifest from the HTML file, and then add the following line at the bottom of the manifest:

SETTINGS:
prefer-online
NETWORK:
*

This causes the application cache to only be used for master entries when the user is offline, and causes the application cache to be used as an atomic HTTP cache (essentially pinning resources listed in the manifest), while allowing all resources not listed in the manifest to be accessed normally when the user is online.

7.7.1.2 Event summary

When the user visits a page that declares a manifest, the browser will try to update the cache. It does this by fetching a copy of the manifest and, if the manifest has changed since the user agent last saw it, redownloading all the resources it mentions and caching them anew.

As this is going on, a number of events get fired on the ApplicationCache object to keep the script updated as to the state of the cache update, so that the user can be notified appropriately. The events are as follows:

Event name Interface Fired when... Next events
checking Event The user agent is checking for an update, or attempting to download the manifest for the first time. This is always the first event in the sequence. noupdate, downloading, obsolete, error
noupdate Event The manifest hadn't changed. Last event in sequence.
downloading Event The user agent has found an update and is fetching it, or is downloading the resources listed by the manifest for the first time. progress, error, cached, updateready
progress ProgressEvent The user agent is downloading resources listed by the manifest. The event object's total attribute returns the total number of files to be downloaded. The event object's loaded attribute returns the number of files processed so far. progress, error, cached, updateready
cached Event The resources listed in the manifest have been downloaded, and the application is now cached. Last event in sequence.
updateready Event The resources listed in the manifest have been newly redownloaded, and the script can use swapCache() to switch to the new cache. Last event in sequence.
obsolete Event The manifest was found to have become a 404 or 410 page, so the application cache is being deleted. Last event in sequence.
error Event The manifest was a 404 or 410 page, so the attempt to cache the application has been aborted. Last event in sequence.
The manifest hadn't changed, but the page referencing the manifest failed to download properly.
A fatal error occurred while fetching the resources listed in the manifest.
The manifest changed while the update was being run. The user agent will try fetching the files again momentarily.

These events are cancelable; their default action is for the user agent to show download progress information. If the page shows its own update UI, canceling the events will prevent the user agent from showing redundant progress information.

7.7.2 The cache manifest syntax

7.7.2.1 Some sample manifests

This example manifest requires two images and a style sheet to be cached and whitelists a CGI script.

CACHE MANIFEST
# the above line is required

# this is a comment
# there can be as many of these anywhere in the file
# they are all ignored
  # comments can have spaces before them
  # but must be alone on the line

# blank lines are ignored too

# these are files that need to be cached they can either be listed
# first, or a "CACHE:" header could be put before them, as is done
# lower down.
images/sound-icon.png
images/background.png
# note that each file has to be put on its own line

# here is a file for the online whitelist -- it isn't cached, and
# references to this file will bypass the cache, always hitting the
# network (or trying to, if the user is offline).
NETWORK:
comm.cgi

# here is another set of files to cache, this time just the CSS file.
CACHE:
style/default.css

It could equally well be written as follows:

CACHE MANIFEST
NETWORK:
comm.cgi
CACHE:
style/default.css
images/sound-icon.png
images/background.png

Offline application cache manifests can use absolute paths or even absolute URLs:

CACHE MANIFEST

/main/home
/main/app.js
/settings/home
/settings/app.js
http://img.example.com/logo.png
http://img.example.com/check.png
http://img.example.com/cross.png

The following manifest defines a catch-all error page that is displayed for any page on the site while the user is offline. It also specifies that the online whitelist wildcard flag is open, meaning that accesses to resources on other sites will not be blocked. (Resources on the same site are already not blocked because of the catch-all fallback namespace.)

So long as all pages on the site reference this manifest, they will get cached locally as they are fetched, so that subsequent hits to the same page will load the page immediately from the cache. Until the manifest is changed, those pages will not be fetched from the server again. When the manifest changes, then all the files will be redownloaded.

Subresources, such as style sheets, images, etc, would only be cached using the regular HTTP caching semantics, however.

CACHE MANIFEST
FALLBACK:
/ /offline.html
NETWORK:
*
7.7.2.2 Writing cache manifests

Manifests must be served using the text/cache-manifest MIME type. All resources served using the text/cache-manifest MIME type must follow the syntax of application cache manifests, as described in this section.

An application cache manifest is a text file, whose text is encoded using UTF-8. Data in application cache manifests is line-based. Newlines must be represented by U+000A LINE FEED (LF) characters, U+000D CARRIAGE RETURN (CR) characters, or U+000D CARRIAGE RETURN (CR) U+000A LINE FEED (LF) pairs. [ENCODING]

This is a willful violation of RFC 2046, which requires all text/* types to only allow CRLF line breaks. This requirement, however, is outdated; the use of CR, LF, and CRLF line breaks is commonly supported and indeed sometimes CRLF is not supported by text editors. [RFC2046]

The first line of an application cache manifest must consist of the string "CACHE", a single U+0020 SPACE character, the string "MANIFEST", and either a U+0020 SPACE character, a U+0009 CHARACTER TABULATION (tab) character, a U+000A LINE FEED (LF) character, or a U+000D CARRIAGE RETURN (CR) character. The first line may optionally be preceded by a U+FEFF BYTE ORDER MARK (BOM) character. If any other text is found on the first line, it is ignored.

Subsequent lines, if any, must all be one of the following:

A blank line

Blank lines must consist of zero or more U+0020 SPACE and U+0009 CHARACTER TABULATION (tab) characters only.

A comment

Comment lines must consist of zero or more U+0020 SPACE and U+0009 CHARACTER TABULATION (tab) characters, followed by a single U+0023 NUMBER SIGN character (#), followed by zero or more characters other than U+000A LINE FEED (LF) and U+000D CARRIAGE RETURN (CR) characters.

Comments must be on a line on their own. If they were to be included on a line with a URL, the "#" would be mistaken for part of a fragment identifier.

A section header

Section headers change the current section. There are four possible section headers:

CACHE:
Switches to the explicit section.
FALLBACK:
Switches to the fallback section.
NETWORK:
Switches to the online whitelist section.
SETTINGS:
Switches to the settings section.

Section header lines must consist of zero or more U+0020 SPACE and U+0009 CHARACTER TABULATION (tab) characters, followed by one of the names above (including the U+003A COLON character (:)) followed by zero or more U+0020 SPACE and U+0009 CHARACTER TABULATION (tab) characters.

Ironically, by default, the current section is the explicit section.

Data for the current section

The format that data lines must take depends on the current section.

When the current section is the explicit section, data lines must consist of zero or more U+0020 SPACE and U+0009 CHARACTER TABULATION (tab) characters, a valid URL identifying a resource other than the manifest itself, and then zero or more U+0020 SPACE and U+0009 CHARACTER TABULATION (tab) characters.

When the current section is the fallback section, data lines must consist of zero or more U+0020 SPACE and U+0009 CHARACTER TABULATION (tab) characters, a valid URL identifying a resource other than the manifest itself, one or more U+0020 SPACE and U+0009 CHARACTER TABULATION (tab) characters, another valid URL identifying a resource other than the manifest itself, and then zero or more U+0020 SPACE and U+0009 CHARACTER TABULATION (tab) characters.

When the current section is the online whitelist section, data lines must consist of zero or more U+0020 SPACE and U+0009 CHARACTER TABULATION (tab) characters, either a single U+002A ASTERISK character (*) or a valid URL identifying a resource other than the manifest itself, and then zero or more U+0020 SPACE and U+0009 CHARACTER TABULATION (tab) characters.

When the current section is the settings section, data lines must consist of zero or more U+0020 SPACE and U+0009 CHARACTER TABULATION (tab) characters, a setting, and then zero or more U+0020 SPACE and U+0009 CHARACTER TABULATION (tab) characters.

Currently only one setting is defined:

The cache mode setting
This consists of the string "prefer-online". It sets the cache mode to prefer-online. (The cache mode defaults to fast.)

Within a settings section, each setting must occur no more than once.

Manifests may contain sections more than once. Sections may be empty.

URLs that are to be fallback pages associated with fallback namespaces, and those namespaces themselves, must be given in fallback sections, with the namespace being the first URL of the data line, and the corresponding fallback page being the second URL. All the other pages to be cached must be listed in explicit sections.

Fallback namespaces and fallback entries must have the same origin as the manifest itself.

A fallback namespace must not be listed more than once.

Namespaces that the user agent is to put into the online whitelist must all be specified in online whitelist sections. (This is needed for any URL that the page is intending to use to communicate back to the server.) To specify that all URLs are automatically whitelisted in this way, a U+002A ASTERISK character (*) may be specified as one of the URLs.

Authors should not include namespaces in the online whitelist for which another namespace in the online whitelist is a prefix match.

Relative URLs must be given relative to the manifest's own URL. All URLs in the manifest must have the same scheme as the manifest itself (either explicitly or implicitly, through the use of relative URLs). [URL]

URLs in manifests must not have fragment identifiers (i.e. the U+0023 NUMBER SIGN character isn't allowed in URLs in manifests).

Fallback namespaces and namespaces in the online whitelist are matched by prefix match.

7.7.3 Application cache API

[Exposed=Window,SharedWorker]
interface ApplicationCache : EventTarget {

  // update status
  const unsigned short UNCACHED = 0;
  const unsigned short IDLE = 1;
  const unsigned short CHECKING = 2;
  const unsigned short DOWNLOADING = 3;
  const unsigned short UPDATEREADY = 4;
  const unsigned short OBSOLETE = 5;
  readonly attribute unsigned short status;

  // updates
  void update();
  void abort();
  void swapCache();

  // events
           attribute EventHandler onchecking;
           attribute EventHandler onerror;
           attribute EventHandler onnoupdate;
           attribute EventHandler ondownloading;
           attribute EventHandler onprogress;
           attribute EventHandler onupdateready;
           attribute EventHandler oncached;
           attribute EventHandler onobsolete;
};
cache = window . applicationCache

(In a window.) Returns the ApplicationCache object that applies to the active document of that Window.

cache = self . applicationCache

(In a shared worker.) Returns the ApplicationCache object that applies to the current shared worker.

cache . status

Returns the current status of the application cache, as given by the constants defined below.

cache . update()

Invokes the application cache download process.

Throws an InvalidStateError exception if there is no application cache to update.

Calling this method is not usually necessary, as user agents will generally take care of updating application caches automatically.

The method can be useful in situations such as long-lived applications. For example, a Web mail application might stay open in a browser tab for weeks at a time. Such an application could want to test for updates each day.

cache . abort()

Cancels the application cache download process.

This method is intended to be used by Web application showing their own caching progress UI, in case the user wants to stop the update (e.g. because bandwidth is limited).

cache . swapCache()

Switches to the most recent application cache, if there is a newer one. If there isn't, throws an InvalidStateError exception.

This does not cause previously-loaded resources to be reloaded; for example, images do not suddenly get reloaded and style sheets and scripts do not get reparsed or reevaluated. The only change is that subsequent requests for cached resources will obtain the newer copies.

The updateready event will fire before this method can be called. Once it fires, the Web application can, at its leisure, call this method to switch the underlying cache to the one with the more recent updates. To make proper use of this, applications have to be able to bring the new features into play; for example, reloading scripts to enable new features.

An easier alternative to swapCache() is just to reload the entire page at a time suitable for the user, using location.reload().

UNCACHED (numeric value 0)

The ApplicationCache object's cache host is not associated with an application cache at this time.

IDLE (numeric value 1)

The ApplicationCache object's cache host is associated with an application cache whose application cache group's update status is idle, and that application cache is the newest cache in its application cache group, and the application cache group is not marked as obsolete.

CHECKING (numeric value 2)

The ApplicationCache object's cache host is associated with an application cache whose application cache group's update status is checking.

DOWNLOADING (numeric value 3)

The ApplicationCache object's cache host is associated with an application cache whose application cache group's update status is downloading.

UPDATEREADY (numeric value 4)

The ApplicationCache object's cache host is associated with an application cache whose application cache group's update status is idle, and whose application cache group is not marked as obsolete, but that application cache is not the newest cache in its group.

OBSOLETE (numeric value 5)

The ApplicationCache object's cache host is associated with an application cache whose application cache group is marked as obsolete.

7.7.4 Browser state

[NoInterfaceObject, Exposed=Window,Worker]
interface NavigatorOnLine {
  readonly attribute boolean onLine;
};
window . navigator . onLine

Returns false if the user agent is definitely offline (disconnected from the network). Returns true if the user agent might be online.

The events online and offline are fired when the value of this attribute changes.

This attribute is inherently unreliable. A computer can be connected to a network without having Internet access.

In this example, an indicator is updated as the browser goes online and offline.

<!DOCTYPE HTML>
<html>
 <head>
  <title>Online status</title>
  <script>
   function updateIndicator() {
     document.getElementById('indicator').textContent = navigator.onLine ? 'online' : 'offline';
   }
  </script>
 </head>
 <body onload="updateIndicator()" ononline="updateIndicator()" onoffline="updateIndicator()">
  <p>The network is: <span id="indicator">(state unknown)</span>
 </body>
</html>