The Threema web client communicates directly with the Threema app over WebRTC data channels. In addition to the standard WebRTC channel encryption, all packets are end-to-end encrypted using NaCl. The WebRTC signaling is also implemented with end-to-end encryption using the SaltyRTC protocol. The following paragraphs explain the general architecture of the Threema web client implementation as well as some aspects of the signaling protocol. If you want to learn more about the signaling protocol, please refer to the main specification as well as the WebRTC task specification.
The web client and the app form a client-server relationship. All data about new events and old conversations is exchanged directly between the two via the shortest possible route.
When the web client is started, it requests the initial data (conversations, contacts, avatars, etc.) from the app. The app responds with the requested data.
To send a message, the web client sends a message request to the app, which will then send the message to the recipient. When a new message arrives, the app notifies the web client. No messages are exchanged directly between the web client and the Threema chat server. The user's private key never leaves the device.
The full connection buildup happens in three stages.
1. Server Handshake
First, the web client generates a new permanent key-pair and a random authentication token. It then connects to the SaltyRTC signaling server as initiator. The hex-encoded public permanent key is used as the WebSocket path.
The public permanent key of the web client as well as the authentication token are transferred to the app through a QR code.
The app first generates its own permanent key-pair and then connects to the same WebSocket path as responder. Both peers conduct the server handshake according to the SaltyRTC protocol.
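The initiator's part of this stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual client code: the random bytes stand in for the public half of a real NaCl key-pair, the host name `saltyrtc.example.org` and both helper names are hypothetical, and the 32-byte token length is assumed.

```python
import secrets

KEY_LENGTH = 32  # NaCl public keys are 32 bytes long


def signaling_url(host: str, public_permanent_key: bytes) -> str:
    """Build a WebSocket URL whose path is the hex-encoded public key."""
    if len(public_permanent_key) != KEY_LENGTH:
        raise ValueError("public permanent key must be 32 bytes")
    return f"wss://{host}/{public_permanent_key.hex()}"


def new_auth_token() -> bytes:
    """Generate a random authentication token (length is an assumption)."""
    return secrets.token_bytes(32)


# Placeholder: random bytes instead of a real Curve25519 public key.
public_key = secrets.token_bytes(KEY_LENGTH)
url = signaling_url("saltyrtc.example.org", public_key)
token = new_auth_token()
```

Because the responder learns the public key from the QR code, it can derive the same path and reach the same signaling session without any further lookup.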
2. Client Handshake and Signaling
As soon as both peers have successfully finished the server handshake, they can start exchanging client handshake messages. Because the server has established a trusted connection with both the initiator and the responder, it can relay encrypted messages without knowing the content. This is how the session keys and the signaling data are transferred.
The two peers each generate a random session key-pair independent of the permanent key-pair. The public keys are exchanged. Additionally, the peers exchange information on how they intend to communicate after the client handshake (the so-called SaltyRTC Task). In the case of the Threema web client, they both request to communicate over WebRTC data channels.
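How the peers settle on a task is defined by the SaltyRTC specification. The sketch below only illustrates the idea, assuming a first-match rule in the initiator's order of preference; the function name and the task identifiers are hypothetical.

```python
from typing import Optional, Sequence


def choose_task(initiator_tasks: Sequence[str],
                responder_tasks: Sequence[str]) -> Optional[str]:
    """Return the first task in the initiator's list that the responder offers."""
    offered = set(responder_tasks)
    for task in initiator_tasks:
        if task in offered:
            return task
    return None  # no common task: the handshake cannot continue


# Hypothetical task identifiers, for illustration only.
WEBRTC_TASK = "example.webrtc.v1"
LEGACY_TASK = "example.webrtc.v0"

chosen = choose_task([WEBRTC_TASK, LEGACY_TASK], [LEGACY_TASK, WEBRTC_TASK])
```

Since both Threema peers request the WebRTC task, the negotiation in this sketch would always resolve to it.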
After the client handshake is successful and both peers know each other's public session key, they initiate a WebRTC PeerConnection. The signaling information needed to build up the connection (such as offer, answer and ICE candidate messages) is exchanged via the WebSocket connection, encrypted with the session keys.
As soon as the WebRTC PeerConnection is established, the two peers initiate the handover of the signaling channel. Future signaling messages (like additional ICE candidates when the network configuration changes) are exchanged over a secure signaling data channel. Once the handover is complete, the WebSocket connection to the server is closed. The two peers now open a second WebRTC data channel in order to exchange application data. All packets transferred through that data channel are encrypted using the session keys.
3. Web Client Data
All required web client data (conversations, contacts, messages, etc.) can now be freely exchanged through the encrypted data channel.
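The order of the stages described above can be summarised as a small state machine. This is a sketch for orientation only; the stage names and helper functions are hypothetical and do not mirror the actual implementation.

```python
from enum import Enum, auto


class Stage(Enum):
    SERVER_HANDSHAKE = auto()   # both peers authenticate towards the server
    CLIENT_HANDSHAKE = auto()   # session keys and task are exchanged
    PEER_CONNECTION = auto()    # offer, answer and ICE candidates via WebSocket
    HANDOVER = auto()           # signaling moves to a secure data channel
    DATA_EXCHANGE = auto()      # application data flows over a data channel


_ORDER = list(Stage)  # Enum iteration follows definition order


def advance(current: Stage) -> Stage:
    """Return the stage that follows `current`."""
    index = _ORDER.index(current)
    if index + 1 >= len(_ORDER):
        raise ValueError("already in the final stage")
    return _ORDER[index + 1]


def websocket_open(stage: Stage) -> bool:
    """The WebSocket stays open until the handover has completed."""
    return stage is not Stage.DATA_EXCHANGE
```

Modelling the stages explicitly makes it clear that the signaling server is only involved until the handover has completed.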
The direct communication channel between the app and the web client is established using a WebRTC PeerConnection. Establishing such a peer-to-peer connection requires a signaling channel. Regular signaling server implementations often use WebSockets without any end-to-end encryption, meaning that the server can read all (potentially sensitive) network information of the connecting peers. There is also the risk of a server manipulating the data being transmitted, opening up the possibility of MITM attacks.
To mitigate this risk and to minimize metadata exposure in general, we participated in the design and implementation of the SaltyRTC protocol, which offers end-to-end encryption of signaling data and does not require the clients to trust the server at all.
WebRTC Connection Buildup
Because real-world networks do not always allow a direct connection between two peers due to obstacles like firewalls, NATs, CGNs and the like, WebRTC connection buildup may have to resort to mechanisms like STUN and TURN. STUN (Session Traversal Utilities for NAT) is a protocol that allows peers to discover each other's public IP addresses despite the existence of NATs. TURN (Traversal Using Relays around NAT) is a protocol that relays data packets in case the connection buildup using STUN alone fails. Note, however, that even though a TURN server relays packets between two peers, it cannot learn anything about the content of these packets, as they are end-to-end encrypted both by WebRTC (using DTLS) and by SaltyRTC.
Threema runs its own STUN and TURN servers.
To avoid having to scan the QR code each time a connection needs to be established, the public permanent key of the peer can optionally be stored as a trusted key.
To be able to securely store this information in the browser, users must provide a password if they want to be able to restore the session at a later point in time. The public permanent key of the app as well as the private permanent key of the web client are then encrypted with the provided user password using authenticated NaCl secret key encryption (XSalsa20 + Poly1305) and stored in local browser storage.
On the app side, the public permanent key of the web client and the private permanent key of the app are stored in the encrypted SQLite database.
When reconnecting to an existing session, instead of creating a new permanent key-pair, the peers restore the trusted key-pair before initiating the server and client handshake. Note, however, that new session keys are still created, thus offering perfect forward secrecy on a session level.
If Google Play Services are installed on the device of the user, the FCM push token (an opaque string provided by Google) is transferred to the web client together with the initial data. If the user requests to persist a session, the token is then stored in the local browser storage together with the trusted keys, encrypted with the user-provided password.
When reconnecting a web client session (e.g. when the user requests to restore a previous session, or when automatically reconnecting due to connection loss) and if a push token is available, the browser sends this token via HTTPS to a push relay server provided by Threema, together with the SHA-256 hash of the public permanent key of the initiator. The push relay server then sends the hash to the app as an FCM push notification. When the app receives such a notification, it first checks whether the web client is enabled at all. If it is enabled and if a web client session with the specified public key exists, that session is enabled and connects to the SaltyRTC server for the handshake and reconnection procedure.
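The hash comparison on the app side can be sketched as follows. This is a minimal illustration, assuming that the hash is transported hex-encoded and that the app keeps a list of the public permanent keys of its stored sessions; the function names are hypothetical.

```python
import hashlib
from typing import Iterable


def session_hash(initiator_public_key: bytes) -> str:
    """SHA-256 hash of the initiator's public permanent key (hex is assumed)."""
    return hashlib.sha256(initiator_public_key).hexdigest()


def should_wake_session(received_hash: str,
                        web_client_enabled: bool,
                        stored_public_keys: Iterable[bytes]) -> bool:
    """Decide whether a push notification should trigger a reconnect."""
    if not web_client_enabled:
        return False  # the web client is disabled: ignore the push
    # Wake up only if the hash matches a session known to the app.
    return any(session_hash(key) == received_hash
               for key in stored_public_keys)
```

Because only a hash of the public key is sent in this sketch, the notification identifies the session to the app without carrying the key itself.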