This is the sequence to fire a job using vovtaskerroot running on UNIX.
- The tasker is running as root
- The tasker receives from vovserver a request to start a
job. The request contains the information about the job and about all the
properties attached to the job, including PRECMD and POSTCMD.
- If the -E flag of vovtasker is used, the command line to execute is changed to "vovfire $JOBID -l SOME_LABEL > /tmp/vovfire.$JOBID.log".
- If a PTY is requested, create the PTY and connect to the process on the
submission host.
- Compute affinity mask if requested (NUMA control).
- Make sure all groups for the executing user have been cached.
- fork() a subtasker. The parent process goes back to the main loop. The child, that is, the subtasker, will be used to shepherd the job.
- The subtasker sets its own affinity mask (if required).
- The subtasker creates the PTY and connects it to the
submission process.
- If VOV_DEBUG_TASKER is set, the subtasker sleeps 10 seconds (to allow connection of a debugger).
- The subtasker tries 3 times to switch user identity (uid and gid). In each attempt:
  - it switches gid with setgid()
  - it switches uid with setuid()
  - it rebuilds the environment for the user (HOME, USER, LOGNAME, SHELL)
  If the switch fails, the subtasker waits 10 seconds before the next attempt.
- The subtasker sets signals XCPU XFSZ PIPE USR1 USR2 to their default
behavior.
- The subtasker calls nice(8 - execPriority), based on the value of the execution priority for the job.
- If the switch of user identity fails, the system calls the diagnostics script vov_diagnostics_setuid (not as root, but as the owner of Accelerator).
- Now the subtasker is running as the user that owns the job.
- The subtasker tries to change directory with chdir(). If the first attempt fails, it retries a few more times, as controlled by VOV_RETRY_CHDIR and VOV_RETRY_CHDIR_SLEEP.
- If the directory cannot be changed, the subtasker calls vov_diagnostics_chdir with arguments ID and DIR (the directory that could not be accessed).
- The subtasker tries to switch environment.
- The subtasker executes the .pre. scripts of the environment (obsolete, but still there).
- The subtasker executes the precmd script with a system() call. If the precommand exits with a status that is not 0 (zero), the job is done and failed.
- The subtasker executes the job and waits for it to finish.
- The subtasker executes the postcmd script with another system() call. The exit status of the postcmd is used as the exit status of the job.
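
The retry and exit-status rules in the sequence above can be illustrated with a small Python sketch. This is not the vovtasker implementation. The names `retry`, `switch_identity`, and `run_job` are hypothetical helpers introduced here for illustration. The sketch also assumes that a failing precmd prevents the job from running and that its non-zero status is reported as the result, which the description above does not state explicitly.

```python
import os
import subprocess
import time


def retry(action, attempts, delay, sleep=time.sleep):
    """Call action() up to `attempts` times, sleeping `delay` seconds
    between failed attempts. Returns True on the first success and
    False if every attempt raised OSError. (Hypothetical helper.)"""
    for attempt in range(attempts):
        try:
            action()
            return True
        except OSError:
            if attempt < attempts - 1:
                sleep(delay)
    return False


def switch_identity(uid, gid):
    """Switch gid first, then uid, in the order listed above.
    Requires root privileges. (Hypothetical helper.)"""
    os.setgid(gid)
    os.setuid(uid)


def run_job(precmd, command, postcmd):
    """Run precmd, the job, and postcmd through the shell and return
    the resulting exit status. (Hypothetical helper.)"""
    if precmd:
        pre_status = subprocess.call(precmd, shell=True)
        if pre_status != 0:
            # Assumption: the job is not started when precmd fails.
            return pre_status
    status = subprocess.call(command, shell=True)
    if postcmd:
        # As described above, the postcmd status becomes the job status.
        status = subprocess.call(postcmd, shell=True)
    return status
```

With these helpers, the identity switch described above would correspond to `retry(lambda: switch_identity(uid, gid), 3, 10)`, and the directory change to `retry(lambda: os.chdir(job_dir), n, pause)`, where `n` and `pause` stand in for whatever VOV_RETRY_CHDIR and VOV_RETRY_CHDIR_SLEEP specify.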