Over the last year I have been brainstorming ways to implement a platform for redundant and scalable services.
My prototypical application allows users to play the board game "go". Players will use the service to play a game of go online, in real-time.
Aside from the gameplay facilities of "go servers" it is natural for them to:
- manage user 'presence'
- provide a general "room style" chat service as well as user to user chat
- allow users to watch and comment on running games
Because of this overlap with the functionality provided by existing chat technologies such as IRC, I also intend my platform to be suitable for implementing a "next-generation" chat/conferencing service.
Existing "go servers" are monolithic applications that manage user presence, game events, chat events and record game results on a single server. This approach has two major flaws:
Running everything on one machine means the service is limited by the CPU and memory resources of that machine. The service is also limited by the bandwidth and latency of the local network connection.
Scalability is very important for this type of application because the demand on resources does not scale linearly. When managing presence (users joining and leaving rooms) the required number of messages sent by the server scales quadratically:
If 100 users generate 1 presence message per second, the server has to generate 100 notifications per second.
If 200 users generate 2 presence messages per second, this increases to 400 notifications per second.
If 300 users generate 3 messages and 400 users generate 4 messages per second we get 900 and 1600 notifications per second respectively.
The best way to deal with this problem is to separate users into rooms or channels. The danger is that one channel is joined by many or all the users because
Generally, most games have few observers and the number of notifications (game events, presence events, chat events) do not scale uncontrollably.
- the client software joins this room by default
- it is the de facto meeting place for starting new games or some other activity
When a game of special interest is played (or replayed live from a tournament), the number of observers can easily reach several hundred or even a thousand. Sending the game events is manageable, but the presence and chat messages generated by the users watching the game can easily stress the network connection and cause unacceptable lag.
Chat services (IRC, MSN, Jabber etc)
For services that were not designed for group or conference chat, implementing this functionality can result in scalability problems. Although IRC load-balances connections over multiple servers, it faces similar problems when large numbers of users join a channel (greater than linear increase in number of presence and chat messages sent out by servers) or when channel participants generate a lot of chat events.
A benefit of the design of IRC means that these messages will "fan-out" at the edge of the network, just before they are sent to the users. However, IRC suffers from the fact that it must replicate presence information across all servers in the network, which means it does not scale well as new servers are added or the number of participants in each channel grows large. As far as I know, modern IRC networks are generally limited to about 100 000 simultaneous users.
Availability in the event of failure
A service running on a single machine or in a single process becomes unavailable immediately in the event of a configuration error or a network, software or hardware failure. In the case of a chat service the error might go unnoticed by the users if the server restarts rapidly and the client program is capable of preserving the previous chat state and reconnecting in the background.
For a game server the crucial question is whether the game state has been preserved by the server. If not this results in a very bad experience for the user, all the concentration and effort he has invested is wiped out without a satisfactory conclusion.
On large IRC networks it is fairly common to experience a "netsplit" where one server becomes disconnected from the rest. When this happens all the clients belonging to that server appear to leave and rejoin the channel. This "transient failure" is very noticeable to other users.
By designing protocols that assume distributed implementation of a service and implement redundancy from the outset, I hope to build a platform that allows the types of services mentioned above to be implemented in a distributed, scalable and redundant manner.
The network layer
Since redundancy implies that servers will share (or duplicate) state, we need a mechanism for servers to send messages to one another. We have seen that even though users can be spread out over different rooms or channels, there may still be an extraordinary demand for a certain room or game. To cope with this I would like to use a "fan-out" technique where servers responsible for that game/channel are able to send events to other servers, which in turn fan them out to the clients.
To achieve scalability I looked at the techniques used by peer to peer systems. What these typically do is to create a "network overlay" among all the participating nodes. These self-organising networks are scalable because routing state is spread out across the network. Messages can usually be routed between nodes within log N hops. Another feature of their design is the ability to route a message to multiple destinations. The internet we use today does not implement this kind of multicast behaviour. For redundancy it's great to be able to send a message to multiple nodes and implementation is easier if we can do this without specifying the address of each recipient. This can be achieved by carefully designing the routing mechanisms and network layout to allow replication of a message to more than one node.
The lowest layer of my platform will be a network layer consisting of server nodes. I decided that since clients are not under the control of the service administrators, may be insecure or have very high latency or intermittent network connections, clients should connect directly to server nodes. Connections should be load-balanced over the server nodes and each client will probably connect to more than one server node (for redundancy).
By organising the server nodes into this overlay, we enable them to:
To achieve perfect fault-tolerance we need the service to continue seamlessly in the event of software, network or hardware failure at a specific node (I usually assume that configuration errors and easily triggered software errors will cause all nodes to fail at the same time). To do this we need to replicate all the state needed at one or more nodes. For my go server application this means that several nodes will receive the same game events from the clients, and duplicate the application logic that produces output to the users.
- scale as more server nodes are added
- scale with increasing numbers of clients
- scale with increasing demand for a specific resource (by forming a tree of "forwarding" nodes that fan out the outgoing messages)
The application layer
For a game like go, or for a chatroom, it's pretty easy to ensure that the application logic at all the nodes produces the same results. The biggest problem is that nodes will not have synchronised clocks and that messages will take different lengths of the time to reach the server nodes. For chat this is not too important, we can make the server and the client tolerant of such discrepancies. For a game of go, on the other hand, it's important that the service provides predictable and fair management of time allowed to each player. The protocol has to be very carefully designed so that the client is able to pick an authoritative response from one of the servers, and for a different server receiving a game event from the client to be able to adopt to a different game state.
I have not worked out all the details but I plan to have servers create a cryptographic signature for changes in game state, such that the client can send these along with its game event to prove that it isn't cheating.
Although it's conceptually neat to clearly separate the network and application layer, I realised that the problem as a whole is really about replicating and modifying the state of various "objects".
In the case of go, a game's state changes when the player or server generates game events. These must be propagated to the players and viewers. Usually this kind of replication is either a "pull" or a "push" mechanism. A pull mechanism is not suited to our applications because of their interactive nature. To have clients poll frequently for chat or presence messages would quickly swamp the system. For these applications a "push" mechanism coupled with subscription seems appropriate. Should we manage this subscription at the network or the application layer? If we do it at the application layer we forgo opportunities to spread out the burden of propagating messages back to the clients (at the edge of the network). It seems that the network must be aware of these subscriptions, so that we can create an efficient fan out mechanism for high-traffic objects.
If the network is aware of subscriptions, it has to be aware of the objects being subscribed to. This means our state objects are present both in the network and in the application layer.
To identify each object a consistent naming mechanism is required. For a go game this could be a concatenation of:
For a chat service the chat room could be identified by:
- The handle of the white player
- The handle of the black player
- The time the game was created (if this is part of a hash, we have to make it tolerant of time differences between server nodes, perhaps truncated to 30 min intervals)
- A random value generated by W's client
- A random value generated by B's client
- A sequence number counting the N'th game played between these two players in the last period (has to be sychronised across all server nodes)
These could have subscribeable subitems, eg:
- the name of the chat service
- the name of the room
A subscription would take the form of a tuple ( address of subscribing server node or client ID, id of subscribed object, expiry time ). Clients are responsible for renewing their subscriptions. The server nodes connected to clients will in turn renew their subscriptions by routing a message towards nodes handling this object.
- presence events
- game events
- chat events
For certain types of messages (eg subscription renewals), the network layer will process the subscription renewal at each hop.
A simplified client API:
Most communication should only transmit a representation of the changed state. When a message gets lost or there is a sequence violation, the client can request a full snapshot of the object and reset its state.
- subscribe to object O
- unsubscribe from object O
- get snapshot of state of object O updated within time T (allows cached copies)
- send state change C for object O
- receive next state change (the client will block while waiting for the next message)
A complete solution will look something like:
For each application we have to implement:
- Network layer (routing, subscriptions, caching of state)
- Application layer (decoding of objects from network layer into high level language object representation)
- Client API (for connecting to network layer, communicating with application)
- application logic
- web front end or client program
I have implemented a basic network overlay in Erlang based on the Skipnet design by Microsoft Research and the University of Washington.
Routing by name and numeric id (hashes)
Node departure and failure
TCP/IP transmission of messages
For my go server I intend to use Common Lisp. This application layer will be written in CL.
Code to communicate with network layer
Write code to deserialize objects coming from the network into native Lisp objects
Code to generate difference representations from application objects
Code to manage subscriptions
Code to manage object identities
I have implemented the basic logic for representing a game of go and the logic to determine legal moves and which stones have been captured.
Set up machine representation of allowed game state changes
Describe protocol for client events
Presence / Chat events
API for communicating with network layer / application
Object serialization for target language (Common Lisp)
I have implemented a basic AJAX client with Araneida and cl-ajax. Clients will connect via web browser to an araneida server which will connect to the network layer.