Sanity Check for vovserver

The sanity command checks the consistency of the trace and of other internal data structures, and repairs what it can.

Run sanity when the server appears confused about the status of the trace.
% vovproject sanity

Use the reread command to re-read the server configuration. The files read are policy.tcl, security.tcl, equiv.tcl, setup.tcl, and exclude.tcl.

You do not need to run reread after changing taskers.tcl. That file is not a vovserver configuration file; it is read by vovtaskermgr.
% vovproject reread
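
As a minimal sketch of the rule above, the helper below decides whether an edited file is one of the five files that vovserver reads on reread. The function name needs_reread is hypothetical and is not part of the product; only the file names come from the list above.

```shell
# Hypothetical helper (not part of the product).
# Returns 0 if the given file is one of the configuration files that
# vovserver reads on reread, and 1 otherwise (for example taskers.tcl,
# which is read by vovtaskermgr instead).
needs_reread() {
    case "$(basename "$1")" in
        policy.tcl|security.tcl|equiv.tcl|setup.tcl|exclude.tcl)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Possible usage, assuming $changed_file holds the path you just edited:
#   if needs_reread "$changed_file"; then vovproject reread; fi
```

The check looks only at the file name, so it assumes the edited file belongs to the project's configuration directory.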

The sanity command performs a wide variety of checks, cleanups, and rebuilds of internal data structures. To see what it did, check the vovserver log file for messages that include the word sanity. Its main actions are:
  • Clears all alerts
  • Flushes journal and crash recovery files
  • Clears IP/Host caches
  • Stops and restarts resource daemon (vovresourced)
  • Checks and cleans internal object attachments
  • Verifies all places and jobs have sensible status
  • Resets user statistics and average service time
  • Checks the contents of system sets like System:jobs
  • Removes older jobs from recent jobs set
  • Makes sure all jobs in the running jobs set are actually running
  • Verifies all sets have the correct size
  • Clears the barrier-invalid flag on all nodes and recomputes it
  • Clears empty retrace sets
  • Checks preemption rules
  • Checks all tasker machines, marking them sick if they are not responding
  • Checks for rebooted tasker machines and terminates jobs attached to them
  • Checks filesystems on tasker machines and verifies mount points
  • Clears resource list caches from jobs
  • Clears and rebuilds job class sets
  • Creates any limit resources that are missing
  • Verifies grabbed resources (non-running jobs should not have any)
  • Makes sure only running jobs have stolen resources
  • Reserves resources for all running jobs
  • Creates any missing resource maps for groups and priorities
  • For each job with I/O, makes sure outputs are newer than inputs
  • Makes sure any file with running status has an input job with running status
  • Verifies the status of all nodes
  • Checks for stuck primary inputs (primary inputs should only be VALID or MISSING)
  • Turns a job INVALID if one of its files is invalid or missing while the job itself is VALID
  • Finds running jobs without a tasker and changes their status to SLEEPING
  • Makes sure all input files of a VALID job are also VALID
  • Makes sure all output files of a job have the same status as the job
  • Recomputes waitreason counts
  • Checks job queue buckets
  • Verifies link between job queue buckets and resource maps
  • Makes sure all queued jobs have job queue buckets
  • Checks FairShare groups
  • Checks for a license
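
To review what a sanity run reported, you can filter the vovserver log for lines that mention sanity. The following is a minimal sketch, not a product command: the function name sanity_log_lines is hypothetical, the log file path must be supplied by you, and the case-insensitive match is an assumption about how the messages are written.

```shell
# Hypothetical helper (not part of the product).
# Prints every line of the given log file that mentions "sanity".
# The caller passes the path of the vovserver log file; no default
# location is assumed here.
sanity_log_lines() {
    logfile="$1"
    if [ ! -r "$logfile" ]; then
        echo "cannot read log file: $logfile" >&2
        return 1
    fi
    # -i: match "sanity", "Sanity", and "SANITY" alike (an assumption).
    grep -i 'sanity' "$logfile"
}

# Possible usage, with a placeholder path:
#   sanity_log_lines /path/to/vovserver.log
```

The function returns a non-zero status both when the file cannot be read and when no matching line is found, so a script can tell that there is nothing to review.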