Cloud Bursting Startup Script
Create a script that runs when a cloud node is burst.
Introduction
Your site will typically need to configure its cloud nodes after they boot. For example, you may want to install packages, add users, or start services. You can add a startup script to a bursting scenario; it runs when the instance boots and performs these tasks automatically. Startup scripts can install software, apply updates, turn on services, and carry out any other task defined in the script, so you can customize your cloud instances programmatically.
Startup Script on Windows Platforms
On Windows platforms, the startup script must be a PowerShell script. Enclose the content of the PowerShell script in <powershell> and </powershell> tags. For more information about PowerShell, see PowerShell Scripting.
Startup Script on Linux Platforms
On Linux platforms, cloud-init is a utility designed specifically for cloud instance initialization. It is a bootstrapping utility for pre-provisioned disk images that run in virtualized environments, usually cloud-oriented services. It sets up the server instance so that it is usable once it has finished booting. You must install cloud-init on your cloud provider VM so that your instances can be configured on boot. For more information, see cloud-init. cloud-init accepts startup configuration in these forms:
- Shell scripts
- Cloud config files
The simplest way to configure an instance on boot is to use a shell script. The shell script must begin with #! in order for cloud-init to recognize it as a shell script.
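Because cloud-init relies on the leading #! to recognize a shell script, it can be worth checking a script before attaching it to a bursting scenario. The following is a minimal sketch of such a check. The function name has_shebang and the file name startup.sh are illustrative assumptions, not part of cloud-init or PBS.

```shell
# has_shebang FILE
# Succeeds (exit status 0) when the first line of FILE starts with "#!".
# Hypothetical pre-upload helper; not part of cloud-init or PBS.
has_shebang() {
    first_line=
    # read returns non-zero at end of file, but still stores a partial line
    IFS= read -r first_line < "$1" || true
    case $first_line in
        '#!'*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Example use on a workstation before uploading the script:
#   has_shebang startup.sh || echo "startup.sh: first line must start with #!" >&2
```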
Example of a cloud-init Script for a Linux Virtual Machine
Below are examples of the configuration that should be done by the startup script after a node has been burst in the cloud. These examples are not intended to be copied and pasted as is; you must adapt the startup script to your site's needs.
#!/bin/sh

# Map IP address to hostnames via /etc/hosts
echo "/etc/hosts setup"
rm -f /etc/hosts
echo "127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4" > /etc/hosts
echo "PBS_SERVER_IP_ADDR headnode headnode.DOMAINNAME" >> /etc/hosts

# Disable NetworkManager so that it does not overwrite the /etc/resolv.conf file
systemctl disable NetworkManager
systemctl stop NetworkManager
systemctl enable network
systemctl start network

# Configure PBS via /etc/pbs.conf
echo "pbs setup"
systemctl stop pbs
rm -f /etc/pbs.conf
echo "PBS_EXEC=/opt/pbs/default" > /etc/pbs.conf
echo "PBS_HOME=/var/spool/pbs" >> /etc/pbs.conf
echo "PBS_START_SERVER=0" >> /etc/pbs.conf
echo "PBS_START_MOM=1" >> /etc/pbs.conf
echo "PBS_START_SCHED=0" >> /etc/pbs.conf
echo "PBS_START_COMM=0" >> /etc/pbs.conf
echo "PBS_SERVER=PBS_SERVER_HOSTNAME" >> /etc/pbs.conf
echo "PBS_CORE_LIMIT=unlimited" >> /etc/pbs.conf
echo "PBS_SCP=/bin/scp" >> /etc/pbs.conf
echo "PBS_LEAF_ROUTERS=HOSTNAME,HOSTNAME" >> /etc/pbs.conf
# Since Control 2019.1, DNS is no longer used for registering cloud nodes. Therefore,
# pbs.conf must be updated with the cloud node's IP address.
IP=$(ip addr show eth0 | grep "inet\b" | awk '{print $2}' | cut -d/ -f1)
echo "PBS_MOM_NODE_NAME=$IP" >> /etc/pbs.conf

# Configure the MoM
echo "mom config setup"
. /etc/pbs.conf
echo "\$clienthost $PBS_SERVER" >> /var/spool/pbs/mom_priv/config
# Also allow the server's short hostname (domain stripped)
echo "\$clienthost ${PBS_SERVER%%.*}" >> /var/spool/pbs/mom_priv/config
echo "\$restrict_user_maxsysid 999" >> /var/spool/pbs/mom_priv/config

# Restart pbs
systemctl start pbs
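The IP lookup in the script assumes that the interface is named eth0 and chains grep, awk, and cut to extract the address. As a hedged alternative, the parsing step can be done in a single awk call. The helper name first_ipv4 is hypothetical, and the interface name is still something you must confirm for your image.

```shell
# first_ipv4
# Reads `ip addr show <interface>` output on stdin and prints the first
# IPv4 address without its prefix length. Lines starting with "inet6"
# are ignored because the first field must be exactly "inet".
first_ipv4() {
    awk '$1 == "inet" { sub(/\/.*/, "", $2); print $2; exit }'
}

# In the startup script this could replace the grep/awk/cut pipeline
# (eth0 is an assumption carried over from the example):
#   IP=$(ip addr show eth0 | first_ipv4)
#   [ -n "$IP" ] || { echo "no IPv4 address found on eth0" >&2; exit 1; }
```

Stopping when no address is found avoids writing an empty PBS_MOM_NODE_NAME to /etc/pbs.conf.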
Each section of the startup script is explained below. The examples assume the following:
Fully qualified domain name (FQDN) of the PBS Server = pbs.altair.com
NIC address of the PBS Server is 10.0.0.5 on the 10.0.0.0/24 network.
Configure the Host File
Map hostnames to the PBS Server IP address by updating the /etc/hosts file.
# Map IP address to hostnames via /etc/hosts
echo "/etc/hosts setup"
rm -f /etc/hosts
echo "127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4" > /etc/hosts
echo "10.0.0.5 headnode headnode.pbs.altair.com" >> /etc/hosts
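In the lines above, the PBS_SERVER_IP_ADDR and DOMAINNAME placeholders from the full script have been replaced by hand with the assumed values. If you keep the script as a template, the substitution could be scripted instead. This is only a sketch: fill_placeholders and the template file name are hypothetical.

```shell
# fill_placeholders IP DOMAIN
# Copies stdin to stdout, replacing the PBS_SERVER_IP_ADDR and DOMAINNAME
# placeholders used in the example startup script.
# Assumes IP and DOMAIN contain no "/" or "&", which sed would interpret.
fill_placeholders() {
    sed -e "s/PBS_SERVER_IP_ADDR/$1/g" -e "s/DOMAINNAME/$2/g"
}

# Example, using the values assumed in this section:
#   fill_placeholders 10.0.0.5 pbs.altair.com < startup.sh.template > startup.sh
```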
Disable NetworkManager and Use Network Interface
# Disable NetworkManager so that it does not overwrite the /etc/resolv.conf file
systemctl disable NetworkManager
systemctl stop NetworkManager
systemctl enable network
systemctl start network
Configure PBS
# Configure pbs.conf
echo "pbs setup"
systemctl stop pbs
rm -f /etc/pbs.conf
echo "PBS_EXEC=/opt/pbs" > /etc/pbs.conf
echo "PBS_HOME=/var/spool/pbs" >> /etc/pbs.conf
echo "PBS_START_SERVER=0" >> /etc/pbs.conf
echo "PBS_START_MOM=1" >> /etc/pbs.conf
echo "PBS_START_SCHED=0" >> /etc/pbs.conf
echo "PBS_START_COMM=0" >> /etc/pbs.conf
echo "PBS_SERVER=PBS_SERVER_HOSTNAME" >> /etc/pbs.conf
echo "PBS_CORE_LIMIT=unlimited" >> /etc/pbs.conf
echo "PBS_SCP=/bin/scp" >> /etc/pbs.conf
echo "PBS_LEAF_ROUTERS=HOSTNAME,HOSTNAME" >> /etc/pbs.conf
# Since Control 2019.1, DNS is no longer used for registering cloud nodes. Therefore,
# pbs.conf must be updated with the cloud node's IP address.
IP=$(ip addr show eth0 | grep "inet\b" | awk '{print $2}' | cut -d/ -f1)
echo "PBS_MOM_NODE_NAME=$IP" >> /etc/pbs.conf
Here, PBS_SERVER_HOSTNAME is the hostname of the machine where the PBS Server is installed, and HOSTNAME tells each endpoint which communication daemon it should talk to.
Configure the PBS MoM and Restart PBS
Update the PBS_HOME/mom_priv/config file to configure the MoM:
# Configure /var/spool/pbs/mom_priv/config
echo "mom config setup"
. /etc/pbs.conf
echo "\$clienthost $PBS_SERVER" >> /var/spool/pbs/mom_priv/config
# Also allow the server's short hostname (domain stripped)
echo "\$clienthost ${PBS_SERVER%%.*}" >> /var/spool/pbs/mom_priv/config
echo "\$restrict_user_maxsysid 999" >> /var/spool/pbs/mom_priv/config
systemctl start pbs
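Because the lines above append to the MoM configuration file with >>, running them a second time on the same node would duplicate the directives. One possible guard is sketched below. The helper append_once is hypothetical and not a PBS command.

```shell
# append_once FILE LINE
# Appends LINE to FILE unless an identical line is already present.
append_once() {
    # -q quiet, -x match the whole line, -F fixed string (no regex),
    # -- ends option parsing so LINE may start with "-"
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example use in the startup script:
#   append_once /var/spool/pbs/mom_priv/config "\$restrict_user_maxsysid 999"
```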
Optional Configuration
Use the startup script to configure filesystems (/etc/fstab), configure NIS (/etc/yp.conf), mount necessary filesystems, and perform any other configuration that your site requires.
Below are a few examples:
Creating Local Scratch Space
Create local scratch space on a fast local disk and use it as the default location to run jobs (jobs use the PBS sandbox feature to place their data there):
mkdir /scratch
chmod 1777 /scratch
echo "\$jobdir_root /scratch" >> /var/spool/pbs/mom_priv/config
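If /scratch might already exist on the image, mkdir reports an error. A slightly more defensive variant could look like the sketch below. The helper make_scratch is hypothetical, and the mode 1777 (world-writable with the sticky bit) is taken from the commands above.

```shell
# make_scratch DIR
# Creates DIR if it does not exist and makes it world-writable with the
# sticky bit set, so users can create files but only remove their own.
make_scratch() {
    mkdir -p "$1" && chmod 1777 "$1"
}

# Example use in the startup script:
#   make_scratch /scratch
```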
Example of Setting Location for Creation of Staging and Execution Directories
To make it so that jobs with sandbox=PRIVATE have their staging and execution directories created under /scratch, as /scratch/<job-specific_dir_name>, put the following line in MoM’s configuration file:
$jobdir_root /scratch
Mount a Directory for PBS Data Transfer
echo "PBS_SERVER_IP_ADDR headnode headnode.DOMAINNAME" >> /etc/hosts
…
…
yum install -y nfs-utils
mount -t nfs headnode:/home /home
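A mount made with the mount command does not persist across a reboot. Since the startup script can also edit /etc/fstab, one possible way to make the mount persistent is to generate an fstab entry. The helper name nfs_fstab_line and the mount options shown are assumptions to adapt to your site.

```shell
# nfs_fstab_line SERVER EXPORT MOUNTPOINT
# Prints an /etc/fstab entry for an NFS mount.
# The options are an assumption: "defaults" plus "_netdev", which marks
# the filesystem as requiring the network.
nfs_fstab_line() {
    printf '%s:%s %s nfs defaults,_netdev 0 0\n' "$1" "$2" "$3"
}

# Example use in the startup script:
#   nfs_fstab_line headnode /home /home >> /etc/fstab
```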
Configuring the MoM for Local Copy
echo "PBS_SERVER_IP_ADDR headnode headnode.DOMAINNAME" >> /etc/hosts
…
…
echo "\$usecp headnode:/home/ /home/" >> /var/spool/pbs/mom_priv/config
Example: Add a Custom Resource to a Cloud Node
Use the cloud-init script in conjunction with a PBS MoM Version 2 Configuration file to add PBS host level resources to burst compute nodes.
At the beginning of the cloud-init script, place the following line:
HOST=$(uname -n)
Adding the following lines to the end of the cloud-init script adds a custom resource ngpu to the cloud node. The custom resource must already be defined to PBS.
#Create a v2config file to add accelerators and custom resources to PBS
#Note: Use $HOST not $IP as mom will create a second vnode from IP but add
#to the natural node via HOST
echo "\$configversion 2" > /root/v2config
echo "$HOST: resources_available.ngpu = 2" >> /root/v2config
/opt/pbs/sbin/pbs_mom -s insert v2config /root/v2config
systemctl restart pbs
With HOST set to computea000000, the resulting /root/v2config file contains:
$configversion 2
computea000000: resources_available.ngpu = 2
The node definition in PBS then includes the custom resource ngpu:
create node 172.17.0.4
set node 172.17.0.4 state = free
set node 172.17.0.4 resources_available.arch = linux
set node 172.17.0.4 resources_available.cloud_node_image = <IMAGENAME>
set node 172.17.0.4 resources_available.cloud_node_instance_type = Standard_A4
set node 172.17.0.4 resources_available.cloud_provisioned_time = 1589376781
set node 172.17.0.4 resources_available.cloud_scenario = <SCENARIO>
set node 172.17.0.4 resources_available.host = 172.17.0.4
set node 172.17.0.4 resources_available.mem = 14352224kb
set node 172.17.0.4 resources_available.ncpus = 8
set node 172.17.0.4 resources_available.ngpu = 2
set node 172.17.0.4 resources_available.vnode = 172.17.0.4
set node 172.17.0.4 resv_enable = False
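When several custom resources are added, the Version 2 configuration file for this example could be generated by a small function rather than by individual echo commands. This is a sketch only: v2config_for is a hypothetical helper name, and every resource it writes must already be defined to PBS.

```shell
# v2config_for HOST NAME=VALUE...
# Prints a MoM Version 2 configuration file that sets each named
# resources_available entry on HOST.
v2config_for() {
    host=$1
    shift
    printf '$configversion 2\n'
    for pair in "$@"; do
        # ${pair%%=*} is the text before the first "=", ${pair#*=} the text after it
        printf '%s: resources_available.%s = %s\n' "$host" "${pair%%=*}" "${pair#*=}"
    done
}

# Example use in the cloud-init script, with HOST set by HOST=$(uname -n):
#   v2config_for "$HOST" ngpu=2 > /root/v2config
```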