Altair® Panopticon

 

Server Cluster Configuration

When you have multiple servers running, you can set them up to synchronize content between them. The servers use an internal protocol over http(s) to propagate changes and keep their content identical.

The cluster component discovers the other servers and the topology that connects them, and can use various methods to do so. It also identifies one of the running servers as the leader; the others are followers.

The leader-follower relationship determines how content is synchronized. A follower immediately pushes any local change to the leader, for example when you save a workbook after editing it. In the other direction, a follower periodically polls the leader for changes. This means the leader has the "latest" version of the content, whereas a follower may lag behind by a few seconds. The leader is also the arbiter when there are conflicting changes, for example if two users edit and save the same dashboard. In that case, the leader's version always wins.
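The division of work can be pictured with a small sketch. The following is illustrative Java pseudocode, not the actual server implementation; the type names, method names, and the five-second polling interval are assumptions made for the example.

import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: not the real Panopticon synchronization code.
public class FollowerSyncSketch {

    record Change(String path, String content) {}

    // Stand-in for the REST calls a follower makes to the leader.
    interface LeaderClient {
        void push(String path, String content);
        List<Change> changesSince(Instant since);
    }

    private final LeaderClient leader;
    private final Map<String, String> localContent = new ConcurrentHashMap<>();
    private volatile Instant lastPoll = Instant.EPOCH;

    public FollowerSyncSketch(LeaderClient leader) {
        this.leader = leader;
    }

    // A user saves a workbook on this follower: store it locally and push it to the leader immediately.
    public void onLocalSave(String path, String content) {
        localContent.put(path, content);
        leader.push(path, content);
    }

    // Poll the leader on a fixed interval. Because the leader's copy always wins,
    // anything fetched here simply overwrites the follower's local version.
    public void startPolling() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            Instant pollStart = Instant.now();
            for (Change change : leader.changesSince(lastPoll)) {
                localContent.put(change.path(), change.content());
            }
            lastPoll = pollStart;
        }, 0, 5, TimeUnit.SECONDS);
    }
}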

The REST services that the servers call to synchronize content expose potentially sensitive information, such as data tables and data source settings. They are protected by token validation just like other services on the server, and only accept special "server" tokens that are never issued to users. A server can only get a token from another server if both have been configured with the same shared secret. The calls themselves are not encrypted, however, so if you connect two servers over the internet, you will want to use https.

Even though content synchronization makes it easier to run a set of servers as a cluster behind a load balancer, you still need to use sticky sessions (session affinity): the server requires that each user stays with the same server instance for the duration of a session.
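How you enable session affinity depends on your load balancer. As one example, if the cluster runs behind the Kubernetes ingress-nginx controller (an assumption for this sketch, as are the service name and port), cookie-based stickiness can be turned on with annotations along these lines:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: panopticon
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "panopticon-route"
spec:
  rules:
    - http:
        paths:
          - path: /panopticon
            pathType: Prefix
            backend:
              service:
                name: panopticon
                port:
                  number: 8080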

There are four different cluster modes:

•  None - Each server is completely stand-alone, and nothing will be synchronized. This is the default, and no further configuration is needed.

•  Fixed - One server is the permanent leader. The other servers will synchronize with it if it is up. If the leader goes down, the followers will log the problem, but will continue to run basically as stand-alone servers. When the leader comes back up, they will start synchronizing again.

In practice, the fixed mode has a single point of failure. Because the followers synchronize through the leader, their content will no longer be synchronized while the leader is down, even though they keep running, and conflicts become more likely the more their content diverges.

To configure fixed mode, set cluster.shared.secret to the same non-empty string on all servers, set cluster.mode to FIXED on all servers, and set cluster.fixed.leader to the URL of the leader on the followers only (leave it blank on the leader).

The leader URL should be the path to the web application, for example http://panoserver:8080/panopticon/. It needs to identify the leader server and be resolvable on the network that the followers run on. If you use a load balancer, you cannot use the externally exposed URL, because the leader URL always needs to resolve to the leader server itself. If the leader server is dynamically assigned an IP address, you need to take extra steps to give it a URL that does not change.
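For example, a minimal fixed-mode configuration with one leader and one follower could look as follows; the shared secret is a placeholder, and the leader URL must be replaced with one that fits your network:

On the leader:

cluster.shared.secret=supersecretpassword
cluster.mode=FIXED
cluster.fixed.leader=

On each follower:

cluster.shared.secret=supersecretpassword
cluster.mode=FIXED
cluster.fixed.leader=http://panoserver:8080/panopticon/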

•  Bully - The running server with the lexicographically lowest ID is chosen as leader, and if it goes down, a new leader is automatically appointed.

When a new server joins a bully cluster, it needs to discover the current list of members and their IDs. To do this, it tries to contact the running servers from a list of known servers, called the boot servers. If one of them answers, the reply contains the current members and the current leader. If none of them answers, the new server starts as the single member of the cluster if it is itself one of the boot servers, or refuses to start if it is not.
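The discovery rule can be summarized with a short sketch. This is illustrative Java pseudocode, not the actual implementation; the type and method names are assumptions made for the example.

import java.util.List;
import java.util.Optional;

// Illustrative sketch only: not the real Panopticon cluster code.
public class BullyBootSketch {

    record ClusterState(List<String> memberIds, String leaderId) {}

    // Stand-in for the HTTP call that asks a running server for the current cluster state.
    interface PeerClient {
        Optional<ClusterState> tryJoin(String bootUrl, String myId, String myBindUrl);
    }

    static ClusterState start(PeerClient client, List<String> bootUrls,
                              String myId, String myBindUrl) {
        // Try each boot server in turn; the first one that answers replies with
        // the current members and the current leader.
        for (String bootUrl : bootUrls) {
            Optional<ClusterState> state = client.tryJoin(bootUrl, myId, myBindUrl);
            if (state.isPresent()) {
                return state.get();
            }
        }
        // Nobody answered. A boot server (one whose own URL is in the boot list) may
        // start a new single-member cluster; any other server must refuse to start.
        if (bootUrls.contains(myBindUrl)) {
            return new ClusterState(List.of(myId), myId);
        }
        throw new IllegalStateException("No boot server reachable; refusing to start");
    }
}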

In a sense, the bully mode is more flexible than the fixed mode, since it eliminates the single point of failure: as long as one server is still running, there will be a leader, and synchronization will continue. In another sense, it is less flexible, since you need to provide more non-changing URLs, one for each server.

To configure the bully mode, set cluster.shared.secret (see above), set cluster.mode to BULLY on all servers, set cluster.bully.id to a unique ID string on each server (a lower ID means higher leader priority), set cluster.bully.bind on each server to the URL on which the other servers can reach it, and set cluster.bully.boot to a comma-separated list of the boot server URLs (the same list on all servers).

An example bully configuration with three servers:

On server #1:

cluster.shared.secret=supersecretpassword
cluster.mode=BULLY
cluster.bully.id=panopticon-1
cluster.bully.bind=http://192.168.0.10/panopticon
cluster.bully.boot=\
http://192.168.0.10/panopticon,\
http://192.168.0.11/panopticon

On server #2:

cluster.shared.secret=supersecretpassword
cluster.mode=BULLY
cluster.bully.id=panopticon-2
cluster.bully.bind=http://192.168.0.11/panopticon
cluster.bully.boot=\
http://192.168.0.10/panopticon,\
http://192.168.0.11/panopticon

On server #3:

cluster.shared.secret=supersecretpassword
cluster.mode=BULLY
cluster.bully.id=panopticon-3
cluster.bully.bind=http://192.168.0.12/panopticon
cluster.bully.boot=\
http://192.168.0.10/panopticon,\
http://192.168.0.11/panopticon

 

Note that only servers #1 and #2 are boot servers, and that only id and bind differ between servers. With this configuration, servers #1 and #2 can be started in any order, but at least one of them must be up before #3 starts. On the other hand, you can add server #3 without #1 and #2 knowing about it up front, so non-boot servers can be useful in auto-scaling scenarios.

One caveat with non-boot servers is that if all the boot servers go down, a non-boot server will become the leader. If a new server then joins, or a boot server rejoins, there is no way for it to see this, and you will end up with two separate clusters.

•  Kubernetes - The servers discover each other through the Kubernetes API Server, and the one whose pod has the lexicographically lowest name is chosen as leader. Each server periodically refreshes this information, so if the list of available pods changes, the servers adapt.

To call the Kubernetes API, the server needs to know the address of the API Server and have valid credentials. By default, Kubernetes passes the address into the pod as the environment variables KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT and mounts the service account credentials at /var/run/secrets/kubernetes.io/serviceaccount/. The server uses these automatically, so no extra configuration is needed.

The server discovers the other servers (pods) with a Kubernetes label selector. You can use any label and any selector for this, e.g., give each pod the metadata label "app" with value "panopticon" and use the selector "app=panopticon". The server will assume that all pods returned by the query are standard Panopticon servers.

You also need to tell each server what its own pod name is, so it can tell whether it is supposed to be the leader or a follower, and avoid calling itself. You can use the Kubernetes downward API to pass this in: use valueFrom with fieldRef and fieldPath "metadata.name" (see the example below).

To configure the Kubernetes mode, set cluster.shared.secret (see above), set cluster.mode to KUBERNETES, set cluster.kubernetes.id to the pod's name, set cluster.kubernetes.label_selector to the pod selector, and cluster.kubernetes.peer_path to the web application path.

If the pod that runs the Panopticon server container also runs other containers, the first container in the pod will be used. If the Panopticon server does not run in the first container, set cluster.kubernetes.container_name to the name of the container that runs it.

Example YAML snippet:

template:
  metadata:
    labels:
      app: panopticon
  spec:
    containers:
      ...
      env:
        - name: CLUSTER_SHARED_SECRET
          value: supersecretpassword
        - name: CLUSTER_MODE
          value: KUBERNETES
        - name: CLUSTER_KUBERNETES_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: CLUSTER_KUBERNETES_LABEL_SELECTOR
          value: app=panopticon
        - name: CLUSTER_KUBERNETES_PEER_PATH
          value: panopticon/
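If the Panopticon container is not the first container in the pod, the container name can be passed in the same way. The environment variable name below follows the naming pattern of the example above and is an assumption, as is the container name; check how properties map to environment variables in your deployment:

        - name: CLUSTER_KUBERNETES_CONTAINER_NAME
          value: panopticon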