Developer – How to Sync Multiple Clients to a Server

At some point in your app-building career you’ll be faced with the notorious data synchronisation problem. How do you back up your clients’ data onto a server securely and in the correct order?

Syncing the Correct Order

Security is a topic for another day, so let’s tackle “correct order”. All data that your user generates is temporal (time ordered). For example, he generates data blocks A -> B -> C, then modifies A again. Options for getting that data onto the server might be:

  1. Send it client -> server, immediately as it’s generated.
  2. Send it client -> server, in a periodic batch sync job (cron anyone?).
  3. The server requests a sync from the client (a rare case, but I’m sure it happens). This would actually make the server the locus of control, which is a good thing. But we could run into connectivity issues, or the app might not be open (shockingly, as good as my apps are, people don’t have them open all day 🙁).
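To make options 1 and 2 concrete, here’s a minimal TypeScript sketch of a client-side “outbox” for the batch approach. `DataBlock`, `Outbox` and their fields are hypothetical names, not code from my app, and the sketch assumes every change already carries a synchronised `updatedAt` time in milliseconds. It keeps only the newest pending version of each block, so the A -> B -> C -> A example uploads A once, in its latest form.

```typescript
// Hypothetical shape of one block of user data. updatedAt is assumed to be a
// synchronised UTC time in milliseconds.
interface DataBlock {
  id: string;
  payload: string;
  updatedAt: number;
}

// Option 2 (periodic batch): queue changes locally and flush them together.
// Only the newest pending version of each block is kept.
class Outbox {
  private pending = new Map<string, DataBlock>();

  record(block: DataBlock): void {
    const existing = this.pending.get(block.id);
    if (existing === undefined || block.updatedAt > existing.updatedAt) {
      this.pending.set(block.id, block);
    }
  }

  // Returns the queued blocks in time order and empties the queue.
  flush(): DataBlock[] {
    const batch = Array.from(this.pending.values()).sort((a, b) => a.updatedAt - b.updatedAt);
    this.pending.clear();
    return batch;
  }
}

// A -> B -> C, then A is modified again before the batch job runs.
const outbox = new Outbox();
outbox.record({ id: "A", payload: "a1", updatedAt: 1000 });
outbox.record({ id: "B", payload: "b1", updatedAt: 2000 });
outbox.record({ id: "C", payload: "c1", updatedAt: 3000 });
outbox.record({ id: "A", payload: "a2", updatedAt: 4000 });
const batch = outbox.flush();
console.log(batch.map((b) => `${b.id}:${b.payload}`).join(" ")); // B:b1 C:c1 A:a2
```

With option 1 you’d skip the queue and POST each change straight away; the trade-off is more requests versus staler server data between flushes.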

All of the above are straightforward when you have a single client. In that case the client is THE source of truth for user-generated data.

The Problem

What if we add a second device? Which one now has the most relevant data? If both send similar data to the server, which batch is the latest and greatest? The whole mess boils down to this:

Who’s in charge here???

These days there’s a handy third-party solution for this: Firebase. But it’s expensive, you’re reliant on their code base, and Google owns all the data (or something like that).

When developing No Comment Podcasts, I needed a solution that avoided horrendous bills, outages and unauthorised use of my users’ data. Not to mention: what if Firebase gets shut down? Hey, it happened to Parse, which I used in the past.

My Usage Requirements

A crucial feature of the podcast app was the ability to start a podcast on one device, pause playback, then continue playing on a second or third device, a web browser, etc. As you can imagine, the sync needs to stay up to date, because you can’t predict when a user will switch devices. Therefore the mandate was:

The data needs to always be in sync.

The tech stack specifics (for your interest) were:
User data on Android (Room DB) -> synced via POST requests in JSON -> Node.js server (MongoDB)

My Solution

All generated data is time ordered. So if we attach a time-synchronised “updatedAt” tag to each chunk of data, we know when it was generated or last modified. The key is to make sure that all clients use the same synchronised time source. That rules out the device’s system clock, which can drift and can be changed by the user, so two devices may disagree about the current time.
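Once every record carries an `updatedAt` tag, deciding between two copies becomes a simple comparison. The sketch below assumes a last-write-wins rule (the newer tag wins), which is one common way to use such a tag rather than necessarily the exact rule in my app; `SyncedRecord`, `newerOf` and `mergeRecords` are hypothetical names, and `positionMs` stands in for whatever your data actually holds.

```typescript
// Hypothetical record shape: each chunk of user data carries an "updatedAt"
// UTC timestamp in milliseconds, taken from the synchronised (NTP) clock.
interface SyncedRecord {
  id: string;
  positionMs: number; // e.g. playback position within an episode
  updatedAt: number;
}

// Last-write-wins: of two copies of the same record, keep the later one.
// On a tie the first argument is kept (arbitrary, but deterministic).
function newerOf(a: SyncedRecord, b: SyncedRecord): SyncedRecord {
  return b.updatedAt > a.updatedAt ? b : a;
}

// Merge two collections record by record, keyed by id.
function mergeRecords(local: SyncedRecord[], remote: SyncedRecord[]): SyncedRecord[] {
  const byId = new Map<string, SyncedRecord>();
  for (const r of local.concat(remote)) {
    const existing = byId.get(r.id);
    byId.set(r.id, existing === undefined ? r : newerOf(existing, r));
  }
  return Array.from(byId.values());
}

// The phone paused at 2:00, ten seconds after the web player paused at 1:35.
const phone: SyncedRecord[] = [{ id: "ep1", positionMs: 120000, updatedAt: 1700000010000 }];
const web: SyncedRecord[] = [{ id: "ep1", positionMs: 95000, updatedAt: 1700000000000 }];
const merged = mergeRecords(phone, web);
console.log(merged[0].positionMs); // 120000
```

This only works if both devices’ timestamps come from the same clock, which is exactly why the system clock is off the table.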

The solution is to get time from an external source: an NTP server (it’s how most devices set their clocks anyway). NTP (Network Time Protocol) returns UTC time, so all our clients can query the same NTP server and produce timestamps that agree with each other. Shout out to the Lyft engineers behind the Kronos library, which does the heavy lifting for me.
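For the curious, the heart of an NTP/SNTP query is a small calculation over four timestamps from a single request/response exchange (the standard formulas from the NTP specification, RFC 5905). A library like Kronos takes care of this for you, so the sketch below only illustrates the idea; it is not Kronos’s API, and `NtpSample`, `clockOffset`, `roundTripDelay` and `syncedNow` are hypothetical names.

```typescript
// The four timestamps of one NTP/SNTP exchange, in milliseconds:
// t0 = client send (client clock), t1 = server receive (server clock),
// t2 = server send (server clock), t3 = client receive (client clock).
interface NtpSample {
  t0: number;
  t1: number;
  t2: number;
  t3: number;
}

// Standard NTP estimate of the offset: the amount to add to the local clock
// to match the server's clock.
function clockOffset(s: NtpSample): number {
  return (s.t1 - s.t0 + (s.t2 - s.t3)) / 2;
}

// Round-trip network delay, excluding the server's processing time.
function roundTripDelay(s: NtpSample): number {
  return s.t3 - s.t0 - (s.t2 - s.t1);
}

// "Synchronised now": local clock corrected by the measured offset.
function syncedNow(offsetMs: number, localNowMs: number = Date.now()): number {
  return localNowMs + offsetMs;
}

// Local clock is 5 s behind the server; 100 ms each way; 10 ms server processing.
const exchange: NtpSample = { t0: 1000, t1: 6100, t2: 6110, t3: 1210 };
console.log(clockOffset(exchange)); // 5000
console.log(roundTripDelay(exchange)); // 200
console.log(syncedNow(clockOffset(exchange), 1210)); // 6210
```

The offset estimate assumes the network delay is roughly symmetric in both directions, which is why real clients take several samples and prefer the ones with the smallest delay.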

My data sync flow is outlined below. When the requests are made depends on your needs, so I’ve left those details out. The client-initiated request flow looks like this:

  1. The client POSTs to the server with a single parameter:
     { timeLatestData: 196836833 }
     The server checks whether it has newer or older data than this timestamp.
  2. If the server has older data, it responds:
     { syncToServer: true,
       serverLatestTimeSync: 196811112 }
     If the server has newer data than the client, it responds:
     { syncToClient: true,
       data: [array of synced data],
       serverMoreDataAvailable: true or false }
  3. The client then uploads data to, or downloads it from, the server according to the response.
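Here’s a minimal sketch of the server’s decision in step 2 as a plain TypeScript function, with an in-memory array standing in for MongoDB. The request and response field names come from the flow above; everything else is an assumption: that `timeLatestData` is the newest `updatedAt` the client holds, the `inSync` reply for equal timestamps (the flow above doesn’t say what happens then), the page size behind `serverMoreDataAvailable`, and the hypothetical helpers `handleSyncRequest` and `recordsToUpload`.

```typescript
// Hypothetical stored record: only the fields the decision needs.
interface SyncedRecord {
  id: string;
  updatedAt: number; // synchronised UTC time in ms
}

interface SyncRequest {
  timeLatestData: number; // newest updatedAt the client holds (assumed meaning)
}

type SyncResponse =
  | { syncToServer: true; serverLatestTimeSync: number }
  | { syncToClient: true; data: SyncedRecord[]; serverMoreDataAvailable: boolean }
  | { inSync: true }; // assumption: reply when both sides are equally up to date

const PAGE_SIZE = 50; // hypothetical page size for downloads

function handleSyncRequest(req: SyncRequest, serverRecords: SyncedRecord[]): SyncResponse {
  const serverLatest = serverRecords.reduce((max, r) => Math.max(max, r.updatedAt), 0);

  if (serverLatest < req.timeLatestData) {
    // Server is behind: ask the client to upload anything newer than this.
    return { syncToServer: true, serverLatestTimeSync: serverLatest };
  }
  if (serverLatest > req.timeLatestData) {
    // Client is behind: send newer records oldest first, one page at a time.
    const newer = serverRecords
      .filter((r) => r.updatedAt > req.timeLatestData)
      .sort((a, b) => a.updatedAt - b.updatedAt);
    return {
      syncToClient: true,
      data: newer.slice(0, PAGE_SIZE),
      serverMoreDataAvailable: newer.length > PAGE_SIZE,
    };
  }
  return { inSync: true };
}

// Client side of step 3 after a syncToServer reply: upload only what's newer.
function recordsToUpload(local: SyncedRecord[], serverLatestTimeSync: number): SyncedRecord[] {
  return local.filter((r) => r.updatedAt > serverLatestTimeSync);
}
```

Because the decision rests on a single “latest” timestamp per side, edits made on two different devices between syncs can slip through, which is one reason the sync interval matters so much.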

To reiterate, this solution relies on two key factors:

  1. Correct time sync across devices. Watch out for time zones, although NTP time is UTC and therefore time-zone free! Time zones are probably an edge case anyway, unless your user flies across them often with iPads stored at all his safe houses 🙂
  2. A suitable sync interval, so that data from different devices doesn’t get mixed up. This was crucial in my case.
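On factor 1: as long as timestamps are stored as UTC epoch milliseconds (which is what an NTP-derived time gives you), time zones only matter for display. This tiny TypeScript check, using made-up example times, shows the same instant written with two different zone offsets comparing equal.

```typescript
// The same moment written with a +01:00 offset and in UTC.
const atPlusOne = Date.parse("2024-06-01T12:00:00+01:00");
const atUtc = Date.parse("2024-06-01T11:00:00Z");

// Epoch milliseconds identify an instant, not a wall-clock reading,
// so the two values are identical and compare correctly.
console.log(atPlusOne === atUtc); // true
```

Trouble only starts if a client stores local wall-clock strings (such as "12:00") without an offset, because those can’t be compared across devices.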

The beauty is that the server doesn’t need to determine time. That’s a big bonus, because our own server isn’t loaded with time requests; we simply rely on each client device to retrieve the correct time and form the data correctly. (I’m a big fan of client-side CPU usage, subject to data integrity requirements of course.)
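Here’s what “forming the data correctly” might look like on the client, as a minimal TypeScript sketch: each edit is stamped with the local clock plus the NTP offset, and the body for step 1 of the flow is just the newest of those stamps. `stampEdit` and `buildSyncRequest` are hypothetical helpers, and the offset is assumed to come from an NTP/SNTP query (for example via a library such as Kronos).

```typescript
// Hypothetical record shape with a synchronised updatedAt (UTC ms).
interface SyncedRecord {
  id: string;
  positionMs: number;
  updatedAt: number;
}

// Stamp an edit with synchronised time: local clock + NTP offset.
// No request to our own server is needed for this.
function stampEdit(id: string, positionMs: number, ntpOffsetMs: number, localNowMs: number): SyncedRecord {
  return { id, positionMs, updatedAt: localNowMs + ntpOffsetMs };
}

// Body for step 1 of the sync flow: the newest updatedAt this client holds.
function buildSyncRequest(local: SyncedRecord[]): { timeLatestData: number } {
  const timeLatestData = local.reduce((max, r) => Math.max(max, r.updatedAt), 0);
  return { timeLatestData };
}

// Two edits, with the local clock 5 s behind NTP time.
const edits = [stampEdit("ep1", 30000, 5000, 1000), stampEdit("ep2", 0, 5000, 2000)];
console.log(JSON.stringify(buildSyncRequest(edits))); // {"timeLatestData":7000}
```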

The other bonus relates to sync interval timing. You can have a variable that increases or decreases the time between syncs, and code it according to each user’s behaviour. For example, if they never use a secondary device, you can sync once a day. But if they listen to a podcast on their phone and then load it up on the web at work, you can sync when they arrive at work (assuming you have access to their location)! This adaptive approach tailors the sync schedule, and so the user experience, to each and every user.
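One way to code that variable, sketched under my own assumptions rather than taken from the app: double the interval after each quiet period, cap it at one day, and drop back to a short interval as soon as activity shows up on another device. `nextInterval` and both bounds are hypothetical; a location trigger like “arrived at work” would simply call the sync directly, outside this schedule.

```typescript
// Hypothetical bounds for the adaptive sync interval.
const MIN_INTERVAL_MS = 60 * 1000; // 1 minute (assumed)
const MAX_INTERVAL_MS = 24 * 60 * 60 * 1000; // 1 day, i.e. "sync once a day"

// Back off while the user sticks to one device; reset when another device is active.
function nextInterval(currentMs: number, sawOtherDeviceActivity: boolean): number {
  if (sawOtherDeviceActivity) {
    return MIN_INTERVAL_MS;
  }
  return Math.min(currentMs * 2, MAX_INTERVAL_MS);
}

// A quiet user drifts towards daily syncs...
let interval = MIN_INTERVAL_MS;
for (let i = 0; i < 12; i++) {
  interval = nextInterval(interval, false);
}
console.log(interval); // 86400000
// ...and snaps back to frequent syncs once a second device shows up.
console.log(nextInterval(interval, true)); // 60000
```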

Anyway, I hope this has been informative, and I’ll let you know if I have a disaster in production!