The Mobile Shell, or Mosh, has been released on GitHub as a replacement for SSH on mobile devices. The principles behind Mosh are due to be presented in a technical paper at the 2012 USENIX Annual Technical Conference next month.
There are two significant features that distinguish Mosh from alternative mechanisms:
- Firstly, the connection is IP address agnostic: rather than using a TCP/IP connection, data is sent via UDP/IP. As a result, if the signal drops or the device's IP address changes (because it is roaming between a WiFi and a cellular network, for example), the session survives, because it does not rely on the transport layer maintaining connection state.
- Secondly, rather than providing a transparent encrypted byte stream from client to server, and having the server respond with the screen redraw mechanisms, the Mosh client provides a variation of local echo. Thus, when the user types an 'X' on the keyboard, instead of waiting for a round trip from client to server and back before the 'X' appears on screen, the Mosh client can display the 'X' immediately.
These two changes are a significant departure from the layered architecture of connection streams like SSH, which provide only an encrypted byte stream between two end points and then let the remote server program redraw the screen. Because SSH knows nothing about how either end of the connection uses the data, every keypress takes at least one round trip to display.
Mosh works by creating a server process, running in user space, that listens for UDP packets. The Mosh client is then given the address of the server (and its UDP port) and initiates the connection by sending data in packets. Since the connection itself is stateless at the network layer, if the client moves to a different IP address then its UDP packets can arrive from that new address (still sent to the same server) without a loss of connectivity.
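The roaming behaviour described here can be illustrated with a minimal sketch using plain UDP sockets on the loopback interface. This is not Mosh's code: the `RoamingServer` class, the `ack:` reply format, and the use of two local sockets to stand in for a client's old and new addresses are all assumptions made for illustration.

```python
import socket


def open_udp_socket():
    """Bind a UDP socket to an ephemeral loopback port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))
    sock.settimeout(2.0)
    return sock


class RoamingServer:
    """Hypothetical server that replies to whichever address spoke last."""

    def __init__(self):
        self.sock = open_udp_socket()
        self.peer = None  # most recent client address seen

    def address(self):
        return self.sock.getsockname()

    def handle_one(self):
        data, source = self.sock.recvfrom(2048)
        # A real server would have to authenticate the datagram before
        # trusting its source address; this sketch skips that step.
        self.peer = source
        self.sock.sendto(b"ack:" + data, self.peer)
        return source


def demo():
    server = RoamingServer()
    wifi = open_udp_socket()      # stands in for the client's first address
    cellular = open_udp_socket()  # stands in for the address after roaming
    try:
        wifi.sendto(b"ls", server.address())
        first_peer = server.handle_one()
        first_reply = wifi.recv(2048)

        cellular.sendto(b"pwd", server.address())
        second_peer = server.handle_one()
        second_reply = cellular.recv(2048)
    finally:
        for sock in (wifi, cellular, server.sock):
            sock.close()
    return first_peer, first_reply, second_peer, second_reply
```

Because the server keeps no per-connection transport state, the only thing that changes when the client "roams" is the address the next reply is sent to.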
The use of UDP, rather than TCP, also means the Mosh client/server connection has to do its own state management. Each packet in the connection has an incrementing number, and the client and server both build up a list of known packets, with the Mosh libraries initiating retransmission where necessary. (This is roughly how TCP works under the covers in any case.) It is also this property which lets the Mosh client roam between different IP addresses or (in the future) between IPv4 and IPv6. The server's IP address must remain fixed, however.
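The numbering and retransmission idea can be sketched in a few lines. The sketch below follows the description in this article rather than Mosh's actual wire protocol; the `Sender` and `Receiver` classes and the simulated packet loss are hypothetical.

```python
class Sender:
    """Numbers outgoing packets and remembers those not yet acknowledged."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = {}  # sequence number -> payload

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        return (seq, payload)

    def on_ack(self, seq):
        self.unacked.pop(seq, None)

    def pending(self):
        """Packets that still need retransmitting, oldest first."""
        return [(seq, self.unacked[seq]) for seq in sorted(self.unacked)]


class Receiver:
    """Records each sequence number once and acknowledges every arrival."""

    def __init__(self):
        self.seen = {}  # sequence number -> payload

    def receive(self, packet):
        seq, payload = packet
        if seq not in self.seen:  # ignore duplicates from retransmission
            self.seen[seq] = payload
        return seq  # the acknowledgement is just the sequence number

    def in_order(self):
        return [self.seen[seq] for seq in sorted(self.seen)]


def demo():
    sender, receiver = Sender(), Receiver()
    packets = [sender.send(word) for word in ("echo", "hello", "world")]
    # Simulate a lossy link: the packet numbered 1 never arrives.
    for packet in packets:
        if packet[0] != 1:
            sender.on_ack(receiver.receive(packet))
    lost = sender.pending()
    for packet in lost:  # retransmit whatever was never acknowledged
        sender.on_ack(receiver.receive(packet))
    return lost, receiver.in_order(), sender.pending()
```

Nothing in this bookkeeping refers to the sender's IP address, which is why the same numbering keeps working when the client's address changes.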
Packets sent via the Mosh connection are encrypted with AES-128 in OCB mode. This provides an encrypted endpoint with a fixed key, which is generated at server startup and printed along with the connection information:
$ mosh-server
MOSH CONNECT 60004 4NeCCgvZFe2RnPgrcU1PQw
…
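As a rough illustration of what that line carries, the sketch below splits it into a port and a key. It assumes the last field is a base64 string with its trailing padding removed, which would make the sample above decode to 16 bytes (128 bits); `parse_connect_line` is a hypothetical helper, not part of Mosh.

```python
import base64


def parse_connect_line(line):
    """Split a 'MOSH CONNECT <port> <key>' line into (port, key_bytes)."""
    fields = line.split()
    if len(fields) != 4 or fields[:2] != ["MOSH", "CONNECT"]:
        raise ValueError("not a MOSH CONNECT line: %r" % line)
    port = int(fields[2])
    key_text = fields[3]
    # Assumption: the key is base64 with padding stripped, so restore it.
    padding = "=" * (-len(key_text) % 4)
    key = base64.b64decode(key_text + padding)
    return port, key


port, key = parse_connect_line("MOSH CONNECT 60004 4NeCCgvZFe2RnPgrcU1PQw")
print(port, len(key) * 8)  # UDP port and key length in bits
```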
Although this approach is new, AES-128 OCB has been around for a while and the authors note that they invite further scrutiny:
Has your secure datagram protocol been audited by experts?
No. Mosh is actively used and has been read over by security-minded crypto nerds who think its design is reasonable, but any novel datagram protocol is going to have to prove itself, and SSP is no exception. We use the reference implementations of AES-128 and OCB, and we welcome your eyes on the code. We think the radical simplicity of the design is an advantage, but of course others have thought that and have been wrong. We don't doubt it will (properly!) take time for the security community to get comfortable with mosh.
The second significant change is the synchronisation of screen state between client and server processes. Almost all of the time, when a user types a character (like 'X') they want the X to be displayed on screen; and when they type backspace, they want the character deleted. Similarly, the up/down/left/right arrow keys almost always move one character in that direction.
To achieve this, the mosh-server stores an emulated view of the client's screen (and vice versa), and the client predicts whether the keys typed will change the screen in a 'normal' way. If its predictions are off, the client waits for the round trip to the server to confirm the state of the screen (essentially degrading to the old SSH mechanism), but where it has good confidence that a keystroke will cause a predictable screen update it shows the result locally. In shell-like environments (where most keystrokes are echoed as-is) this reduces the perceived lag to the server, and it can also reduce network traffic, because keystrokes can be sent in batched packets rather than the one-at-a-time packets used by most interactive connections.
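A heavily simplified sketch of this prediction scheme is shown below. It models a single line of text rather than a full terminal, and the `PredictiveLine` class, its rule for what counts as predictable, and the acknowledgement count passed to `on_server_update` are all assumptions for illustration, not Mosh's actual algorithm.

```python
BACKSPACE = "\x7f"


class PredictiveLine:
    """Shows typed keys immediately when their effect is easy to guess."""

    def __init__(self):
        self.confirmed = ""  # line contents last reported by the server
        self.pending = []    # keys sent to the server but not yet confirmed

    @staticmethod
    def predictable(key):
        # Assumption: only printable characters and backspace are guessed.
        return key == BACKSPACE or key.isprintable()

    def type_key(self, key):
        self.pending.append(key)

    def display(self):
        """Confirmed text plus predictions, stopping at the first key
        whose effect cannot be guessed (for example, tab completion)."""
        line = self.confirmed
        for key in self.pending:
            if not self.predictable(key):
                break
            line = line[:-1] if key == BACKSPACE else line + key
        return line

    def on_server_update(self, line, acked_keys):
        """The server's view always wins over local predictions."""
        self.confirmed = line
        del self.pending[:acked_keys]


screen = PredictiveLine()
for key in "ls":
    screen.type_key(key)
print(screen.display())       # predicted locally, before any round trip
screen.on_server_update("ls", 2)
screen.type_key("\t")         # completion result is unknown: wait for server
print(screen.display())
```

The key design point is that predictions are only a display overlay: the confirmed state is replaced wholesale whenever the server reports, so a wrong guess is corrected after one round trip.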
One restriction is that input is only supported via UTF-8, because multiple keypresses can be converted into a single character (and vice versa). Rather than try to tackle all encodings and input methods, Mosh only focusses on UTF-8 and ensures it gets that right. More details are available on the technical information page.
Mosh brings a new approach to connections between mobile devices and servers. Each client connection takes a single UDP port (and UDP must be routable between the end points), and the channel is encrypted with the key generated at server startup. Since the connection can move to other IP addresses, concerns will naturally be raised about whether it can be broken into, and the fact that the 'connection' stays alive even when the client disconnects may be a concern for some.
However, its advantages, in particular its improved perceived response time, may be enough to encourage people to look at the system. And since the processes on both sides are user processes, Mosh can easily be installed into existing environments without elevated privileges.