These days, many chat services allow users to use multiple devices in parallel, so one might assume that multi-device functionality is easy enough to implement. Provided that security and privacy protection are not a major priority, this notion isn’t far off. If, however, the multi-device protocol is required to meet Threema’s standards, things get complicated.
In order to fulfill Threema’s requirements, a multi-device solution must, of course, provide full end-to-end encryption. That’s a given. But on top of that, it also has to rule out any possibility for the server to alter key material, and the amount of transferred metadata must be kept to the absolute minimum.
Even though these requirements pose a true technological challenge, we believe we have found a capable and elegant solution, which is currently in development and will become available in the course of 2021. For technically inclined readers, we outline our approach below.
Keep the Chat Server Simple
According to Occam’s razor, it’s advisable to prefer simple solutions over complex ones, and as far as security is concerned, this principle certainly holds true. The fewer moving parts there are in key components, the lower the risk that vulnerabilities go unnoticed when a system’s security is assessed.
Threema’s chat server, which is probably the most optimized part of our infrastructure, adheres to this “Keep it simple” motto, and there are reasons beyond the security benefit for keeping it that way.
For one thing, not everyone will use multiple devices, and there’s no need to change anything as far as the classic single-device use case is concerned. For another thing, the chat server was never designed to handle multiple devices that use the same Threema ID in parallel.
Instead of complicating a crucial part of our infrastructure by extending the chat server’s range of duty, we decided to take a different route.
Introducing the Mediator Server
A new “mediator server” will act as an exchange hub for devices that share a Threema ID. The main tasks of this mediator server are:
- Coordinate access to the chat server
- Reflect messages to other devices
- Synchronize data and settings between devices
Let’s take a closer look at these tasks one by one. We’ll begin with the authentication process.
Anonymous Device Grouping
One fundamental requirement for server-based multi-device functionality is some kind of grouping mechanism that combines multiple devices into a single entity. Of course, the Threema ID could serve as an identifier to establish this tie. However, the mediator server doesn’t need to know which specific Threema ID a group of devices belongs to.
Instead of using the Threema ID itself, we derive a bit of cryptographic key material from the private key associated with the ID:
First, we derive the “Device Group Key” from the Threema ID’s private key. This Device Group Key will be used to encrypt the data that’s exchanged between devices that share an ID.
Next, we derive the “Mediator Path Key” from the Device Group Key. The Mediator Path Key will be used for authentication towards the mediator server. The server uses the public part of this key as an identifier for the user’s devices, which is why that public key is called the “Device Group ID.”
A device authenticates itself towards the mediator server by solving an authentication challenge based on the Mediator Path Key. Each device is able to accomplish the cryptographic key derivation independently. Since there is no way for the mediator server to determine which Threema ID the keys were derived from, we have established an anonymous device-grouping mechanism. And because all key material is derived from the Threema ID’s private key, which only the clients have access to, it’s impossible for the server to alter key material.
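The derivation chain and the authentication challenge can be sketched as follows. This is a minimal sketch under stated assumptions, not Threema’s actual scheme: keyed BLAKE2b from Python’s `hashlib` stands in for the key derivation function (the labels `dgk` and `dg-path` are made up), the Mediator Path Key is assumed to be an X25519 key pair (implemented in pure Python after RFC 7748 so the example has no dependencies), and the challenge is assumed to be an HMAC over a random server nonce, keyed with an X25519 shared secret. All function names are hypothetical.

```python
import hashlib
import hmac
import os

P = 2**255 - 19
A24 = 121665
BASE_POINT = (9).to_bytes(32, "little")


def x25519(scalar: bytes, u: bytes) -> bytes:
    """X25519 per RFC 7748 (Montgomery ladder). Not constant-time: illustration only."""
    k = bytearray(scalar)
    k[0] &= 248
    k[31] &= 127
    k[31] |= 64
    k_int = int.from_bytes(k, "little")
    x1 = int.from_bytes(u, "little") & ((1 << 255) - 1)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (k_int >> t) & 1
        swap ^= bit
        if swap:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def kdf(key: bytes, label: bytes) -> bytes:
    """Hypothetical KDF: keyed BLAKE2b with one personalization label per purpose."""
    return hashlib.blake2b(key=key, person=label, digest_size=32).digest()


def device_group_key(identity_private_key: bytes) -> bytes:
    return kdf(identity_private_key, b"dgk")


def mediator_path_keypair(dgk: bytes) -> tuple[bytes, bytes]:
    secret = kdf(dgk, b"dg-path")
    # The public part doubles as the Device Group ID the mediator sees.
    return secret, x25519(secret, BASE_POINT)


def make_challenge() -> tuple[bytes, bytes, bytes]:
    """Server side: a fresh ephemeral key pair plus a random nonce."""
    eph_secret = os.urandom(32)
    return eph_secret, x25519(eph_secret, BASE_POINT), os.urandom(32)


def answer_challenge(path_secret: bytes, eph_public: bytes, nonce: bytes) -> bytes:
    """Device side: MAC the nonce with the X25519 shared secret."""
    shared = x25519(path_secret, eph_public)
    return hmac.new(shared, nonce, hashlib.sha256).digest()


def verify(device_group_id: bytes, eph_secret: bytes, nonce: bytes, answer: bytes) -> bool:
    """Server side: recompute the MAC from its half of the exchange."""
    shared = x25519(eph_secret, device_group_id)
    expected = hmac.new(shared, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, answer)
```

Two devices holding the same identity private key arrive at the same Device Group ID without talking to each other, while the server only ever handles the public value and never the key it was derived from.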
After the authentication process is completed, the mediator server designates one of the devices currently connected to it as the “leader.” This device is assigned a vital task: Whenever it receives a message from the chat server, it must process the message and “reflect” (i.e., forward) it to the other devices via the mediator server. Similarly, any device that sends out a message has to reflect that message to the other devices. Reflected messages are end-to-end encrypted using the Device Group Key.
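The routing rules above can be modeled on the mediator side roughly like this. The text doesn’t say how the leader is chosen, so this sketch assumes the longest-connected device leads; the `DeviceGroup` class and its methods are hypothetical. Payloads are treated as opaque bytes because clients encrypt them with the Device Group Key before they reach the mediator.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeviceGroup:
    """Hypothetical mediator-side state for one Device Group ID."""

    connected: list[str] = field(default_factory=list)  # device IDs, oldest first
    inboxes: dict[str, list[bytes]] = field(default_factory=dict)

    @property
    def leader(self) -> Optional[str]:
        # Assumed policy: the longest-connected device is the leader.
        return self.connected[0] if self.connected else None

    def connect(self, device_id: str) -> None:
        self.connected.append(device_id)
        self.inboxes.setdefault(device_id, [])

    def disconnect(self, device_id: str) -> None:
        # If the leader leaves, the next-oldest device takes over.
        self.connected.remove(device_id)

    def from_chat_server(self, payload: bytes) -> Optional[str]:
        """Hand a chat-server message to the leader only; return the leader's ID."""
        leader = self.leader
        if leader is not None:
            self.inboxes[leader].append(payload)
        return leader

    def reflect(self, sender: str, encrypted: bytes) -> None:
        """Forward an encrypted payload to every connected device except the sender."""
        for device_id in self.connected:
            if device_id != sender:
                self.inboxes[device_id].append(encrypted)
```

In use, the leader returned by `from_chat_server` processes the message and then calls `reflect` with its own ID, so every other device receives exactly one encrypted copy.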
A mechanism that reflects messages to every other device of a group is the most fundamental component for multi-device functionality. As trivial as this might sound, the devil is, as always, in the details.
Most of Threema’s message types trigger some sort of reaction on the recipient’s device. Here are two examples:
- If a message is received and read receipts are enabled, at least one device has to return a read receipt to the sender.
- If a message is received from an unknown contact, this contact must be added to the contact list, where a name and other properties are assigned. These properties must be identical on all devices.
The context is always relevant as far as incoming messages are concerned. In general, a reaction must be triggered on the leading device when a message is received, whereas devices other than the leader are not supposed to react to incoming messages.
Given this rule of thumb, a problem arises in the second example above: Without a reaction on devices other than the leader, a new contact will only be added to the leading device’s contact list, not to those of the other devices. That’s where device synchronization comes into play.
If multiple devices share a single Threema ID, it’s not only required for new messages to appear on each device, it’s also expected that contacts, group chats, distribution lists, and user settings are synchronized across all devices. Therefore, the mediator server also distributes data that never passes the chat server.
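The interplay between leader-only reactions and synchronization can be illustrated with the unknown-contact example. In this hypothetical sketch, only the leader reacts to a chat-server message (by creating the contact) and then reflects both a contact-sync item and the message itself; other devices apply reflected items verbatim and trigger no reactions of their own. The item format and the `Device` class are assumptions, and encryption is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical client state; real clients would also encrypt reflected items."""

    name: str
    is_leader: bool
    contacts: dict[str, str] = field(default_factory=dict)  # Threema ID -> name
    messages: list[str] = field(default_factory=list)

    def handle_incoming(self, sender: str, nickname: str, text: str) -> list[dict]:
        """Leader only: process a chat-server message and return items to reflect."""
        if not self.is_leader:
            raise RuntimeError("only the leader processes chat-server messages")
        reflected = []
        if sender not in self.contacts:
            # Reaction: add the unknown sender, then sync the exact properties chosen.
            self.contacts[sender] = nickname
            reflected.append({"type": "contact-sync", "id": sender, "name": nickname})
        self.messages.append(text)
        reflected.append({"type": "message", "from": sender, "text": text})
        return reflected

    def handle_reflected(self, item: dict) -> None:
        """Apply a reflected item as-is; no reactions are triggered here."""
        if item["type"] == "contact-sync":
            self.contacts[item["id"]] = item["name"]
        elif item["type"] == "message":
            self.messages.append(item["text"])
```

Because the leader decides the contact’s properties once and ships them as data, the other devices end up with an identical contact entry instead of each inventing their own.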
With synchronization comes the possibility of conflicts. If, for example, a different profile picture is set on two devices simultaneously, both devices could, in the worst case, end up with the profile picture set by the opposite device. To prevent undesirable outcomes like this, a mechanism to temporarily grant a device exclusive access to the mediator server’s “Reflect Message Queue” is required. However, explaining the transaction mechanism is beyond the scope of this brief overview.
Threema’s chat server uses TCP as its transport protocol. For technical reasons we leave to the reader’s imagination for now, the mediator server requires WebSocket support. Instead of equipping the chat server with the WebSocket protocol, the mediator server will support this protocol, and communication between the chat server and multi-device clients will be proxied via the mediator server.
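Since the actual transaction mechanism isn’t described here, the following is only a generic illustration of the idea of granting one device temporary exclusive access, assuming a simple single-holder lock on the reflect queue. The `ReflectQueue` class and its `begin`/`reflect`/`commit` methods are hypothetical and are not Threema’s protocol.

```python
from typing import Optional


class ReflectQueue:
    """Hypothetical reflect queue with a single exclusive-access slot."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None
        self.log: list[tuple[str, str]] = []  # (device, new picture) in reflection order

    def begin(self, device: str) -> bool:
        """Try to take exclusive access; False means another device holds it."""
        if self.holder is None:
            self.holder = device
            return True
        return False

    def reflect(self, device: str, update: str) -> None:
        # While a device holds the queue, nobody else may reflect into it.
        if self.holder is not None and self.holder != device:
            raise PermissionError(f"{self.holder} currently has exclusive access")
        self.log.append((device, update))

    def commit(self, device: str) -> None:
        if self.holder != device:
            raise PermissionError(f"{device} does not hold exclusive access")
        self.holder = None


def profile_picture(log: list[tuple[str, str]], initial: str) -> str:
    """Every device replays the same ordered log, so all converge on one picture."""
    picture = initial
    for _device, update in log:
        picture = update
    return picture
```

With updates serialized this way, two simultaneous changes can no longer cross over: the second device has to wait until the first commits, and all devices replay the same order.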
A welcome side effect of this proxying technique is the fact that the chat server is unable to correlate IP addresses with Threema IDs. And thanks to anonymous device grouping, the same holds true for the mediator server.
While Threema will, of course, provide an official mediator server, users will also be able to self-host their own mediator server if they so desire. In doing so, users will only disclose their mediator servers’ IP addresses to Threema’s chat server but not their devices’ IP addresses, reducing the amount of metadata on Threema’s end once again.
As we have seen, there are quite a few obstacles to overcome if security and privacy protection are major requirements for multi-device functionality.
Threema’s solution provides full end-to-end encryption. Our protocol’s cryptographic properties render it impossible for servers to alter key material. And, in true Threema fashion, the amount of metadata is kept to the absolute minimum thanks to anonymous device grouping and a self-hostable mediator server.
Of course, there are many other facets we haven’t touched upon in this rough outline. While we’re in a stage of development where technical details are still subject to change, the basic framework has been established.