2013-02-04

Web App Example Using IndexedDB

This article explains how to build a note-taking application with Web technologies that works as a standalone application, with no need for an active internet connection after the initial load. It uses IndexedDB for saving notes and AppCache to make sure that all resources are available for offline use. In short, it behaves like a desktop application but does not require installation. To show some less trivial uses, the application includes features such as saving documents while typing, using the Memento design pattern to keep the serialized state (the state that is stored in IndexedDB) separate, binding data to HTML with Knockout.js, and live updates from the database to HTML, also with Knockout.js. All the code for this example application can be found on GitHub.

What is IndexedDB

IndexedDB is a fairly new technology on the HTML5 scene that allows applications to store data locally. It is a NoSQL database engine built right into your browser. Currently Firefox, Chrome and IE10 support IndexedDB. For browsers that support only WebSQL, like Safari, there is a shim implementation. When I started writing the example application for this article, Chrome implemented a slightly older version of the IndexedDB specification, but this seems to have been fixed in Chrome version 24. I have tested the example with Chrome and Firefox.

The basic idea of IndexedDB is that you can store JavaScript objects in the database and access them quickly. Everything happens inside your browser, without any need for an active internet connection. If you are familiar with some NoSQL database, you should have no problems getting started with IndexedDB. If your background is in SQL databases, you probably need to adjust your way of thinking a little when it comes to structuring your data and finding and accessing objects in the database. NoSQL does not mean that you do not need to think about the structure of your data; it just allows more flexibility.

Coming from the SQL world, IndexedDB can feel a bit strange in the beginning. You do not need to define the complete structure of your data and there is no query language. Instead you define key-value pairs where the key is one of the properties of your object and the value is the object itself. Additionally you can define indices, which can be seen as alternative key-value pairs that map some other properties to your objects. When you get data from the database, you always get a complete object or objects; there is no support for views, filtering or combining data. The important thing to understand is that accessing objects is a really fast operation, as long as you do not store big blobs of data in your objects.

Using IndexedDB

This part goes through the basic use of IndexedDB quite quickly. There are many more thorough articles about how to use IndexedDB; this article tries to focus on where to use it and what the challenges are. For more detail, you can always read the source code of the example application.

As mentioned earlier, IndexedDB is a key-value store. Each object either has a predefined property that is used as the key identifying it, or the key is generated automatically by a key generator when you save a new object. Stored objects are pure-data JavaScript objects; functions cannot be stored. To access these objects, you either get them by key or you build indices, which are alternative properties or property combinations for finding your objects.

The database itself is divided into object stores, where each object store is expected to hold a certain type of object. Indices are defined per object store, each object store belongs to one database, and each database belongs to one origin (origins are discussed in the same-origin policy part). An object store also puts constraints on the data by expecting to find the properties that are used as the key or in indices. No other structural definitions or constraints can be made.

One really nice feature of IndexedDB is that it has transactions. In IndexedDB's case you can have multiple readonly transactions at any given time, but when an object store is used in a readwrite transaction, all other transactions on that store need to wait. This is quite basic, but good to keep in mind when writing your transactions. For example, when the example application saves data, all other operations on that object store are blocked. That is also why the example application does not call the IndexedDB put operation every time you type something, even though it saves while you type; instead it marks documents as modified, or dirty.

Creating Database and Handling Changes in Its Structure

To create a database you just open it; if the database does not exist, it will be created. Object stores and indices are created inside the upgradeneeded event handler. The same event is emitted when you open an existing database with a higher version number and need to update its structure to the new version.

The example application creates a database called "Notes" that has a single object store, "notes", for storing Note objects. The code for initializing and opening the database is divided between main.js and notes.js so that App (a type defined in main.js) takes care of opening the database and calls NotesProvider (a type defined in notes.js) to initialize the object store "notes" for Note objects. In case of an upgradeneeded event, App calls NotesProvider.initStorage() to create its object store and update indices. Database initialization and opening are split in two so that you can easily add new components that handle the initialization and upgrading of their own object stores.

App.init() takes care of opening the database, and if the database version has changed it asks NotesProvider to update the object store. The actual object store is created and updated in NotesProvider.initStorage(). This method handles the initial creation (you can think of it as an upgrade from version 0 to 1) and the upgrade from version 1 to 2. Later, in NotesProvider.init(), the application expects the database to be in good order and gets all records to populate the list of notes on the HTML side through the ViewModel that is used for the Knockout.js bindings.
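
The open-and-upgrade flow described above could look roughly like the sketch below. This is not the code from main.js or notes.js: the function names, the "id" key path and the assumption that the "modified" index arrives in version 2 are all made up for illustration.

```javascript
// Sketch only. Assumptions (not from the example application):
// - notes are keyed by an "id" property
// - the "modified" index is what version 2 adds
function upgradeNotesStore(db, transaction, oldVersion) {
    var store;
    if (oldVersion < 1) {
        // Fresh database (upgrade from version 0): create the object store.
        store = db.createObjectStore("notes", { keyPath: "id" });
    } else {
        // The store already exists; reach it through the upgrade transaction.
        store = transaction.objectStore("notes");
    }
    if (oldVersion < 2) {
        store.createIndex("modified", "modified", { unique: false });
    }
    return store;
}

// factory is window.indexedDB in a browser; passing it in keeps the sketch testable.
function openNotesDb(factory, callback) {
    var request = factory.open("Notes", 2);
    request.addEventListener("upgradeneeded", function(event) {
        // Fired for a new database and whenever the requested version is higher.
        upgradeNotesStore(request.result, request.transaction, event.oldVersion);
    }, false);
    request.addEventListener("success", function() {
        callback(null, request.result);
    }, false);
    request.addEventListener("error", function() {
        callback(request.error);
    }, false);
}
```

Checking oldVersion step by step means the same handler can serve both a first-time visitor and someone upgrading from an older version.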

Finding and Accessing Your Objects

There are two approaches to finding or querying objects in IndexedDB: get by key or key range, and find through indices. The initial step for any operation is to start a transaction that defines which object stores it uses and in which mode: readonly or readwrite. The default mode is readonly.

For example, to fetch all notes when initializing NotesProvider, I create a readonly transaction that uses the object store "notes" and I use the index on the property called modified. I use the modified property because I want to populate the list of notes in order of modification, so that the last modified note ends up top-most on the HTML side. See NotesProvider.init() for the complete code; below is only a part of the init function, just to show how to get all notes:
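
The first approach, getting a single object by its key, might be sketched like this. The function name getNote and the callback shape are my own choices for illustration, not something from the example application.

```javascript
// Sketch only: getNote and its (error, note) callback are hypothetical.
function getNote(db, key, callback) {
    // No mode is passed, so the transaction is readonly.
    var request = db.transaction(["notes"]).objectStore("notes").get(key);
    request.addEventListener("success", function() {
        // result is undefined when no record has this key.
        callback(null, request.result);
    }, false);
    request.addEventListener("error", function() {
        callback(request.error);
    }, false);
}
```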


var store;
var index;
var cursorRequest;

// No mode given, so this is a readonly transaction.
store = self.app.db.transaction(["notes"]).objectStore("notes");
index = store.index("modified");

// Iterates the notes in ascending order of the modified property.
cursorRequest = index.openCursor();

cursorRequest.addEventListener("success", function(e) {
    console.log("NotesProvider.init, cursorRequest success");
    var result = e.target.result;
    if (!result) {
        // No cursor means there are no more records.
        callback();
    } else {
        console.log(JSON.stringify(result.value, null, '\t'));
        self.app.viewModel.addNote(new Note(self, result.value));
        result.continue();
    }
}, false);

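When using a cursor like above, event.target.result is the cursor itself. Calling continue() moves it to the next record in the cursor's direction. There is no method for stepping backwards; the direction (for example "next" or "prev") is chosen when you open the cursor. By reading result.value, you can access the object the cursor is currently pointing to.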
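
A cursor's direction is fixed when it is opened, so walking an index from the highest key to the lowest means passing "prev" as the second argument of openCursor(). A minimal sketch, where readAllValues is a hypothetical helper and not part of the example application:

```javascript
// Sketch only: readAllValues is a made-up helper name.
function readAllValues(index, direction, callback) {
    var values = [];
    // null means "no key range"; direction is e.g. "next" or "prev".
    var request = index.openCursor(null, direction);
    request.addEventListener("success", function(event) {
        var cursor = event.target.result;
        if (!cursor) {
            // Past the last record in the chosen direction.
            callback(values);
            return;
        }
        values.push(cursor.value);
        cursor.continue();
    }, false);
}
```

With the "modified" index, readAllValues(index, "prev", callback) would hand over the most recently modified note first.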

Add, Modify and Delete Objects

To modify data, you start a transaction just like when getting data, but this time you set the mode by passing "readwrite" as the second parameter when creating the transaction.

To see how new documents are added, look at NotesProvider.addNote() in notes.js. Basically I start a transaction and call add to save the new note into the object store:


var store;
var request;
var transaction = this.app.db.transaction(["notes"], "readwrite");

store = transaction.objectStore("notes");

request = store.add(note._memento);

request.addEventListener("success", function(event) {
    console.log("NotesProvider.addNote, succeed.");
    console.log("Add new note to top of the list and make it current.");
    self.app.viewModel.notes.splice(0, 0, note);
    self.app.viewModel.currentNote(note);
}, false);

To modify documents, you again need to start with a readwrite transaction. Inside the transaction you can either save the object with the object store's put method or call update when accessing the object through a cursor. In the example application I use only the put method. If you need to update multiple objects while iterating with a cursor, you may want to use the update method. Another solution is to find all the objects inside a readonly transaction and then modify them separately inside a readwrite transaction. The advantage of this approach is that iterating through all the items does not block other read operations.

Below is a snippet from the example application where I save a note. There should be nothing too special in it apart from the use of the properties _saving and _dirty. These are needed because the example saves notes automatically when they are modified, which could generate a lot of sequential readwrite transactions. To prevent database operations from starving the system, there needs to be some logic that decides when to call put to save a note. In the example I make sure that only one transaction for saving a note is running at a time and that no other save operations are queued. This is achieved so that if we are in the middle of a save transaction, new save calls are skipped and we just set the _dirty flag to indicate that there are unsaved changes. Later, when the transaction completes, we check whether the note was modified during the save operation, and if it was, we launch the save operation again. In theory this example can lose changes when multiple notes are saved at the same time, but since a save requires user interaction and the user can interact with only a single note at a time, in practice this is not going to happen. The only possible situation might be when the user modifies a document and very quickly clicks to create a new note, but this is only possible in the desktop or tablet layouts, where machines are most probably fast enough to make this case practically impossible.


NotesProvider.prototype.saveNote = function(note) {
    var self = this;
    var store;
    var request;
    var transaction;

    // Mark the note as being saved so that overlapping saves can be skipped.
    note._saving = true;

    transaction = this.app.db.transaction(["notes"], "readwrite");
    transaction.addEventListener("complete", function(event) {
        console.log("NotesProvider.saveNote, transaction completed.");
        note._saving = false;
        if (note._dirty) {
            console.log("NotesProvider.saveNote, re-save since note was updated during the operation.");
            self.saveNote(note);
        }
    }, false);
    transaction.addEventListener("error", function(event) {
        console.log("NotesProvider.saveNote, error: ", event);
        // "complete" does not fire for a failed transaction, so reset the flag here too.
        note._saving = false;
        alert("Failed to save note.");
    }, false);

    store = transaction.objectStore("notes");

    // Cleared before put; edits made during the save are expected to set it again.
    note._dirty = false;

    request = store.put(note._memento);

    request.addEventListener("success", function(event) {
        console.log("NotesProvider.saveNote, saved.");
    }, false);
};
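
The skip-and-re-save idea is independent of IndexedDB, so it can be shown on its own. The sketch below is a simplified stand-in for the _saving and _dirty flags, not the application's code: createCoalescingSaver is a hypothetical helper, and startSave stands for whatever starts the real transaction and calls back when it completes.

```javascript
// Sketch only: createCoalescingSaver and startSave are made-up names.
function createCoalescingSaver(startSave) {
    var saving = false;
    var dirty = false;

    function save() {
        if (saving) {
            // A save is already running: remember the change, do not queue.
            dirty = true;
            return;
        }
        saving = true;
        dirty = false;
        startSave(function onComplete() {
            saving = false;
            if (dirty) {
                // Something changed while saving, so save once more.
                save();
            }
        });
    }

    return save;
}
```

However many times save() is called while a save is running, at most one follow-up save is started after it completes.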

Objects are deleted by calling the object store's delete method inside a readwrite transaction. As a parameter, you pass the value of the key that identifies your object. The other option is to use a cursor and delete the object it is pointing at through the cursor. My example application does not use delete, but below is an example of deleting with a cursor. Note that this snippet deletes every note the cursor visits.


var store;
var index;
var cursorRequest;

// The mode is the second argument of transaction(), not of objectStore().
store = self.app.db.transaction(["notes"], "readwrite").objectStore("notes");
index = store.index("modified");
cursorRequest = index.openCursor();
cursorRequest.addEventListener("success", function(event) {
    var result = event.target.result;
    if (!result) {
        callback();
    } else {
        // Deletes the record the cursor currently points at.
        result.delete();
        result.continue();
    }
}, false);
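
Deleting a single object by key, the first option mentioned above, could be sketched as follows. deleteNoteByKey is a hypothetical function; the example application has no delete feature.

```javascript
// Sketch only: deleteNoteByKey and its callback are hypothetical.
function deleteNoteByKey(db, key, callback) {
    var transaction = db.transaction(["notes"], "readwrite");
    transaction.addEventListener("complete", function() {
        // The delete is durable once the transaction completes.
        callback(null);
    }, false);
    transaction.addEventListener("error", function(event) {
        callback(event.target.error);
    }, false);
    // Deleting a key that does not exist is not an error.
    transaction.objectStore("notes").delete(key);
}
```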

Making Application to Work Offline

Using IndexedDB does not make much sense unless you make sure that your application can be used completely offline. To allow that, you need to create an AppCache manifest that lists the resources needed for offline use.

The location of the AppCache manifest is defined in your HTML file as an attribute of the html tag. In my example application it is in index.html: <html manifest="notes.appcache">.

Since my application is fairly simple, the manifest just lists all the files that are part of the application:


CACHE MANIFEST

index.html
style.css
layout.css
layout-mobile.css
main.js
notes.js
libs/uuid-v4.min.js
libs/knockout-2.2.0.js

To react to cache changes, the application just notifies the user, and when the user clicks OK, it calls window.location.reload() to take the new version into use. This is done in main.js:


window.applicationCache.addEventListener('updateready', function(e) {
    console.log("Appcache update ready.");
    if (window.applicationCache.status == window.applicationCache.UPDATEREADY) {
        // Browser downloaded a new app cache.
        // Swap it in and reload the page to get the new hotness.
        window.applicationCache.swapCache();
        if (confirm('A new version of this site is available. Load it?')) {
            window.location.reload();
        }
    } else {
        // Manifest didn't change. Nothing new available.
    }
}, false);

There are some caveats when it comes to AppCache. One that you will learn sooner or later is that when you modify files without changing the content of the manifest file, your application will not get refreshed. There are multiple ways of handling this problem (a common one is to keep a version comment in the manifest and change it with every release), but it is good to keep in mind, and you probably want to disable caching while developing the application. Dive Into HTML5 has a really good chapter about AppCache that you probably want to read: Dive into HTML5, offline.

Same-Origin Policy and Security

How about security? Web browsers have a simple security model called the same-origin policy. Slightly simplifying, this means that all resources an application or web page stores can be accessed only by other applications or pages that share the same origin. The origin itself is defined by scheme, host and port. In other words, if you have http://example.com/app1 and http://example.com/app2, these applications have the same origin and both can access all resources saved by the other. If you have http://example.com, https://example.com and http://example.com:8080, all three have different origins and cannot see each other's resources, even if the application is actually the same one. All applications and pages that have the same origin can access each other's resources and there is no way to restrict this. Vice versa, if the origin is different, there is no way to let applications or web pages from other origins access your data, at least not without additional help from something like CORS or some other method, but these are not in the scope of this article.

When it comes to IndexedDB, for better or worse, the same-origin policy simplifies how to think about security: there is only a single way to protect your data from other applications and sites. If this is not enough, you probably do not want to store your data on the client side. On the other hand, this simplicity can bring some annoying constraints. Sometimes it would be nice to expose part of the data to applications from another origin, but this is not possible. Think of an example where you have contacts and email applications and you want to pick contacts as recipients of an email. The email application cannot use the contacts database directly, but with web intents, postMessage or web widgets you could get around the issue. With web intents and postMessage in particular, you could pop up a window that loads the contacts application, pick the contacts there and then return the data to the email application. As an alternative approach, web widgets could allow you to embed a contact picker widget that actually runs inside another origin and passes the selected contacts to the email application. I have not tried this approach and I am just guessing that it might be possible. I might try it and write another article around this topic in the future.
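
The origin comparison can be tried out with the standard URL object, whose origin property combines scheme, host and port. sameOrigin below is just an illustrative helper, not part of the example application.

```javascript
// Sketch only: sameOrigin is a made-up helper name.
function sameOrigin(a, b) {
    // URL.origin is "scheme://host[:port]"; the path is not part of it.
    return new URL(a).origin === new URL(b).origin;
}
```

For the addresses used above, sameOrigin("http://example.com/app1", "http://example.com/app2") is true, while comparing http://example.com with https://example.com or with http://example.com:8080 gives false.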

Where to Use IndexedDB

When and how should you use IndexedDB? When writing web applications, you have a few possibilities for storing your data: use server-side storage or a database, use the File API and write your data to the filesystem, use WebStorage, or use IndexedDB. If you aim for a Web application that works offline, the first one is obviously out. The second is not necessarily optimal and might require quite a lot of logic. So in the end, only the last two are truly viable choices.

WebStorage has been around for quite some time. It is a very simple key-value store where you can list your keys and get and save values identified by key. Values are stored as strings, so objects need to be serialized first, for example as JSON. Simple as it is, it is often good enough for simple use cases. For example, it would be good enough for my notes application.

IndexedDB requires a bit more learning, since you need to get a bit deeper into the NoSQL database world and you need to start using transactions. On the other hand, it is more capable and is faster when searching a larger dataset. Another gain over WebStorage is that you can divide your data into silos by saving it into type-specific object stores.

When you think about desktop applications that store data on your machine and do not synchronize it with external services or between multiple devices, IndexedDB has all the capabilities to match such needs. Additionally, since you can write your application in HTML, CSS and JavaScript, development is fairly fast and easy, at least for simple applications. Distribution and updating are also really easy, since all you need for running the application is a modern Web browser. When correctly combined with AppCache, you can use the application offline, and when you are connected again, the browser checks whether a new version is available. What would be an easier distribution mechanism than this?

Unfortunately the world is moving towards having data synchronized between all your devices and letting the cloud keep a replica of the data. This is where the shortcomings of IndexedDB start to bite.
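
To give an idea of how little code the WebStorage route takes, here is a sketch of saving and loading a note. The "note:" key prefix and the shape of the note object are assumptions made for illustration; in a browser you would pass window.localStorage as storage.

```javascript
// Sketch only: the "note:" prefix and the note shape are assumptions.
function saveNoteToStorage(storage, note) {
    // WebStorage stores strings, so the object is serialized as JSON.
    storage.setItem("note:" + note.id, JSON.stringify(note));
}

function loadNoteFromStorage(storage, id) {
    var raw = storage.getItem("note:" + id);
    // getItem returns null for a missing key.
    return raw === null ? null : JSON.parse(raw);
}
```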

Shortcomings of IndexedDB

IndexedDB is a fairly new technology and its specification is still in its first version. Being new, it has some shortcomings that limit its usefulness. It is natural that such shortcomings exist with new technologies, and I hope they will go away as the technology matures and users point out what kind of new features are needed.

Searching by Using Partial Match

Coming from the SQL world, one thing you will definitely miss is searching by partial match of a text field. In SQL you can use LIKE in a WHERE clause with % to match strings starting with (LIKE "<search term>%"), including (LIKE "%<search term>%") or ending with (LIKE "%<search term>") the search term.

If you are familiar with the NoSQL world, it should not be too surprising that IndexedDB lacks this kind of capability. NoSQL databases are built around getting objects and building views or indices that point to these objects. Keys and indices are often stored in a tree-like data structure, which makes partial matches on field values very hard. On the other hand, traversing all items in the database and doing a partial match on certain fields can be quite expensive. Playing nicely with this shortcoming requires some changes in the way you think about and design your application and your database. If you need this capability, there are some ways to achieve search, or at least part of what you would use LIKE for in SQL.

Of course one possible solution is to walk through all your items and match the field you want against the search term. But this is quite inefficient, and especially on mobile devices it can take a while and eat your battery.

An alternative solution is to create an index on your text field and use a key range to search by the start of the string. Often this can be enough. If you take this further, you can create a dictionary that maps words to objects and uses the words as keys. This way you can search by the beginning of any word and find the objects that contain words starting with your search term, pretty much like SQL LIKE "<search term>%" would give you. The idea is to use a key range that starts with your search term and ends with the string you get by replacing the last character of the search term with the character that follows it, with the upper bound excluded. As an example, to match all words starting with "rep" you would create a key range starting with "rep" and ending with "req", where the upper bound "req" is not included in the results. This gives you all items whose key starts with "rep", like "reptile" or "representative", but not those starting with "req", like "request". The only challenge with this approach is that you need to build your dictionary.

To build this dictionary you need to split your text fields into separate words and then insert those into an object store that provides the needed index and mapping. In theory a generic component that does this for you could be created quite easily. In practice it is a bit challenging, because there is no proper event mechanism for listening to database change notifications, which such a component would need in order to harvest the words.
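
The prefix range can be computed with a few lines of code. In the sketch below, prefixBounds and prefixKeyRange are hypothetical helpers; they assume a non-empty search term whose last character is not the highest possible character code. In a browser you would pass window.IDBKeyRange as keyRangeFactory.

```javascript
// Sketch only: helper names are made up; assumes a non-empty prefix.
function prefixBounds(prefix) {
    var last = prefix.length - 1;
    // Replace the last character with the one that follows it: "rep" -> "req".
    var upper = prefix.slice(0, last) +
        String.fromCharCode(prefix.charCodeAt(last) + 1);
    return { lower: prefix, upper: upper };
}

function prefixKeyRange(keyRangeFactory, prefix) {
    var bounds = prefixBounds(prefix);
    // bound(lower, upper, lowerOpen, upperOpen): include lower, exclude upper.
    return keyRangeFactory.bound(bounds.lower, bounds.upper, false, true);
}
```

The resulting range can be passed to openCursor() on the index that holds the words, for example index.openCursor(prefixKeyRange(window.IDBKeyRange, "rep")).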
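
The word-splitting step could be sketched like this. wordsOf and dictionaryEntries are hypothetical names, the entry shape { word, noteId } is an assumption, and the simple regular expression only keeps ASCII letters and digits, so it is a starting point rather than a complete tokenizer.

```javascript
// Sketch only: names and entry shape are assumptions; ASCII words only.
function wordsOf(text) {
    var seen = {};
    return text.toLowerCase().split(/[^a-z0-9]+/).filter(function(word) {
        if (word === "" || Object.prototype.hasOwnProperty.call(seen, word)) {
            return false; // skip empty pieces and repeated words
        }
        seen[word] = true;
        return true;
    });
}

// One entry per distinct word, ready to be put into a dictionary object store
// that has an index on "word".
function dictionaryEntries(noteId, text) {
    return wordsOf(text).map(function(word) {
        return { word: word, noteId: noteId };
    });
}
```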

Lack of Event Mechanism for Changes

IndexedDB provides no means of listening to changes in the database. This makes it very complicated to create applications that have multiple windows open and keep their views up to date while other windows are modifying the data. The same goes for multiple clients that synchronize data through a server or in a peer-to-peer manner. Another challenge is creating generic components that manipulate or harvest information from new records, or from existing ones when they are modified.

Think about the first case, where you have two windows or tabs accessing the same data. One view shows a list of the items in your database and another has a single item open for editing. There is no way for the list view to get an event telling that one of the items in the list was modified. To be honest, there are ways, but you need to use or create an event mechanism that crosses window boundaries, and implementing one is a bit of an annoying task.

An even more useful target for this event mechanism would be a generic server-client data synchronization library that can easily be included in any application. Without the mechanism you can still create this kind of library, but you need to do lots of plumbing to notify about changes on the client side or to get changes from the server side propagated to connected clients. As one use case, think of a CouchDB connector and how easy it would be to create a generic connector that reads your object stores, follows changes in them and synchronizes everything with CouchDB.

The third use case is generic data workers. It would be useful if you could create generic workers that listen for changes and then do something with new and modified objects. As an example, think about the example application, where you would create a dictionary of all the words found in your notes and then search notes by the beginning of any word. Now you need to hook into your own save and update functions and launch your indexer to harvest and update all the words found in your notes. Instead of doing this, it would be very handy if you could listen for change notifications and do the processing separately.

To solve this shortcoming, we need a way to emit events when objects are created, modified or deleted in an object store. It should not require much to have this kind of mechanism, especially when the browser already has an event mechanism. Having this would allow the creation of generic libraries that hook into object stores and process items, keep data in sync between client and server, or synchronize views across multiple windows.
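
One way to build such a cross-window mechanism is to piggyback on WebStorage: writing to localStorage fires a storage event in the other windows of the same origin. The sketch below is an assumption about how this could be wired up, not something the example application does; the key name and both function names are made up. In a browser you would pass window.localStorage and window.

```javascript
// Sketch only: the key name and function names are hypothetical.
var CHANGE_KEY = "notes-changed";

// Call this after a note has been saved to IndexedDB.
function announceChange(storage, noteId) {
    // The timestamp makes repeated changes to the same note normally produce
    // a new value; a storage event fires only when the value changes.
    storage.setItem(CHANGE_KEY, JSON.stringify({ id: noteId, at: Date.now() }));
}

// Call this in windows that want to hear about changes made elsewhere.
function onRemoteChange(target, handler) {
    target.addEventListener("storage", function(event) {
        if (event.key === CHANGE_KEY && event.newValue) {
            handler(JSON.parse(event.newValue).id);
        }
    }, false);
}
```

The window that receives the event still has to read the changed note from IndexedDB itself; only the notification travels through WebStorage.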

Lack of Globally/Universally Unique Identifiers

It would be very nice if you could use UUIDs as IndexedDB identifiers, since this would make it easier to synchronize your data between server and client. Currently generated keys are just numbers that are incremented automatically per object store, and if you need UUIDs, you need to insert them yourself into your objects and define that field as the key path when creating your object store. The only problem is that this requires you to include yet another library or to write your own UUID generator. Also, if you follow the UUID version 4 specification you need a proper random number generator, and as far as I know JavaScript's Math.random() has some issues which may lead to UUID collisions.

Having UUIDs or similar unique identifiers as object identifiers is pretty common in the NoSQL world, and it makes me wonder why IndexedDB does not have this in the specification. Instead it uses the auto-increment key field approach that is more common in the SQL world.
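
If you want to generate version 4 UUIDs without Math.random(), one option is to take 16 random bytes from the browser's crypto.getRandomValues() and format them. The sketch below shows only the formatting step; formatUuidV4 is a made-up helper and not the library the example application ships with.

```javascript
// Sketch only: formatUuidV4 is a hypothetical helper.
// bytes: an array-like of at least 16 values in the range 0-255.
function formatUuidV4(bytes) {
    var b = Array.prototype.slice.call(bytes, 0, 16);
    b[6] = (b[6] & 0x0f) | 0x40; // version field: 4
    b[8] = (b[8] & 0x3f) | 0x80; // variant field: binary 10xxxxxx
    var hex = b.map(function(value) {
        // Adding 0x100 keeps the leading zero of small values.
        return (value + 0x100).toString(16).slice(1);
    }).join("");
    return hex.slice(0, 8) + "-" + hex.slice(8, 12) + "-" +
        hex.slice(12, 16) + "-" + hex.slice(16, 20) + "-" + hex.slice(20);
}
```

In a browser that supports it, the random bytes could come from window.crypto.getRandomValues(new Uint8Array(16)).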

Conclusions

IndexedDB is very nice and easy to get started with, and I am really happy to see it included in the HTML5 family of standards. Having this kind of real database engine as part of the browser allows you to do more and more with HTML applications and reduces the need for native applications.

Comparing IndexedDB's features to some other NoSQL databases, I am glad to see that IndexedDB is fairly easy to use. The inclusion of transactions also makes me happy; I have been missing them when using CouchDB.

The only major problem is the lack of an event mechanism that notifies about changes in the database. I really cannot understand how this kind of feature was left out of the specification when there are so many obvious use cases that absolutely scream for it.

Accepting the shortcomings and admitting that this is really the first version of the specification, I have to say IndexedDB is fairly good. If you think of it as a Web replacement for an embedded database that can be used only from a single application process at a time and does not have any synchronization support, then IndexedDB actually matches the needs of Web applications. If you would like to use it as a client-side cache for a big server-side database, then IndexedDB will give you some headaches.

Luckily the shortcomings do not cause any insurmountable obstacles, and there are projects like PouchDB, which matches the CouchDB API and is built on IndexedDB. The nice thing about PouchDB is that it allows you to keep in sync with a remote CouchDB. The only thing I would wish from them is an API that matches IndexedDB instead of CouchDB. That would make it even more interesting, if you ask me.

I hope you enjoyed reading this fairly long article and found it useful. If you have any questions or comments about the example code, do not hesitate to contact me.