2016.09 Update 12

New Features and Enhancements

Products Internal Number Case Number Description
NetworkComputer 8174   Improved performance of vtk_resourcemap_set_limit.
NetworkComputer 8125   Added timing control for preemption with two new parameters: preemption.max.time.overall, which limits the time spent by the preemption code in any given preemption iteration (normally 0.3 seconds at most once every 3 seconds), and preemption.max.time.rule, which limits the maximum time for each rule. Improved performance of RESERVE_SLAVES rules. RESERVE_SLAVES lists can now be specified with SlaveList:NAMEOFLIST. Slaves can now be reserved to a bucket.
NetworkComputer 6022 21107 Removed the -q option to nc forget, introduced in 2016.09 Update 9, because it could be confused with the -q (queue) option. Replaced it with the -quiet option to nc forget. Note that the command still prints messages in case of errors. To make it completely silent, use: nc forget ... >& /dev/null.
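
The two preemption time limits described for item 8125 can be pictured as nested time budgets. The following Python sketch is only an illustration under stated assumptions, not the product's implementation: the function name, the generator-based rule convention, and the 0.1 second per-rule default are all hypothetical; only the 0.3 second overall figure comes from the note above.

```python
import time


def run_preemption_iteration(rules, max_time_overall=0.3, max_time_rule=0.1,
                             clock=time.monotonic):
    """Hypothetical sketch of one time-budgeted preemption iteration.

    `rules` is a list of (name, rule) pairs. Each rule is a generator
    function (an assumed convention) that yields after every unit of work,
    which gives the loop a chance to stop it when a budget is exhausted.
    Returns the names of the rules that ran to completion.
    """
    start = clock()
    completed = []
    for name, rule in rules:
        # Overall budget: stop starting new rules once it is spent.
        if clock() - start >= max_time_overall:
            break
        rule_start = clock()
        finished = True
        for _ in rule():
            now = clock()
            # Per-rule budget and overall budget are both checked
            # after every unit of work.
            if (now - rule_start >= max_time_rule
                    or now - start >= max_time_overall):
                finished = False
                break
        if finished:
            completed.append(name)
    return completed
```

A rule that exceeds its own budget is abandoned for this iteration while later rules still get a turn; once the overall budget is spent, no further rules are started.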

Resolved Issues

Products Internal Number Case Number Description
All 8184   Fixed an error case in which vovconsole printed unknown color name "" when there was no visible error.
FlowTracer 8052 21619 Fixed bugs with barriers. Jobs above a valid barrier are no longer retraced when a job below the barrier is retraced with the aggressive retrace flag. A valid barrier that is the output of an invalid job is no longer invalidated during vovbuild. Added protection from illegal status changes (for example, a job cannot be MISSING). Only INVALID is allowed to propagate to the entire downcone of a node.
FlowTracer 8083 21752 vtk_set_get_elements now accepts -selrule as a synonym for -rule. Error handling for invalid options is improved.
FlowTracer 8161   Fixed bug that sometimes caused deleted sets to not be removed from the set browser in FlowTracer after being forgotten. This was most visible when doing vovforget -allsets and depended on the order that sets were forgotten.
LicenseAllocator 8123 21866 LA now checks allocations against the min restriction at every step of the allocation calculation.
LicenseAllocator 8088   LA now checks out at least 1 token of jobs_la, even if no jobs are running.
LicenseMonitor 8113 21882 Fixed an issue where the override timezone was not honored for log parsing jobs.
LicenseMonitor 8176 21952 In some cases, the LM 'convert to batch report' link generated command lines that produced reports different from those in the browser UI. This is now fixed.
NetworkComputer 7958 21591 Modified both nc info and node.cgi to show the CHOSENSLAVEID if it is set. Also added the -sameslave option to vovresreq to control this behavior.
NetworkComputer 8027 21723 Prevent changes to a job while running if the changes would invalidate the job. This fixes a bug when resources are modified with nc modify on a running job, causing it to turn INVALID/Idle even though the processes are still running.
NetworkComputer 8126 21884 A new debug environment variable, VOV_DEBUG_NO_START, has been added. The script vov_diagnostic_no_start is run only if this environment variable is set to a non-zero value on the slave. The script contains a vovselect query, which can increase server load significantly under certain conditions. The query is run only if VOV_DEBUG_NO_START is set to 2 on the slave.
NetworkComputer 5624 20532 Changed the behavior when reconciling a resource R: the previous resource in the grabbed list is now examined to check whether it is a summary resource for R.
NetworkComputer 7962 21605 Fixed calculation of most recent job in the nc info ! command.
NetworkComputer 8043 21762 There are minor changes to the wording of the output of vsy, vovwhy, nc why, wx why and vtk_explain_status.
NetworkComputer 8090 21778 Corrected wording in the why-waiting analysis that misreported the job's bucket rank in FairShare as the job's order in the bucket.
NetworkComputer 8093 21849 Fixed an incorrect reference to NC:AlsoRemovePreviousSummaryResources in a recursive call.
NetworkComputer 8171   Reduced the rate at which vovresourced checks for LM to be up when using a hard-coded LM location and LM is currently down.
NetworkComputer 6939 21653 Fixed nc gui timeout restart so you do not have to go through the additional steps of clicking in the set bar and pressing enter.
NetworkComputer 8044 21764 If a slave becomes a "stopped" slave, the symlink slave.log is not updated by the slave.
NetworkComputer 8050 21772 The behavior of the nc info command has been fixed to provide better information in cases where a FAILED or INVALID job has an invalid return code.
NetworkComputer 8159   The nc who command was nonfunctional starting in 2016.09u9. This has been fixed.
NetworkComputer 7944 21576 Fixed an issue where the NC wrapper failed to exit when the job was complete.
NetworkComputer 7978 21596 There is a new optional policy.tcl parameter on Linux platforms named slave.childProcessCleanup. Setting this parameter to 1 causes slaves to kill all child processes when a job exits. This parameter implicitly uses cgroups. Additionally, the old method of using vovprocessmgr has been fixed.
NetworkComputer 8098 21828 The time window was missing from the URL of the FairShare web page, which prevented the page from being shared. The URL again has name=value pairs, which should allow sharing once more.
NetworkComputer 8136 21918 There was a race condition which could result in a deadlock when a signal handler was called. This manifested variously in the traceback as a hung call to futex() or readSocket(), and possibly others. This problem has been fixed.
NetworkComputer 8156 21915 When reading environments, an environment variable with an empty value is now unset with unsetenv instead of causing an error with a backtrace to be logged in the slave log.
NetworkComputer 8157 21928 The emulated bsub command now correctly uses the job placement policy defined in the NC jobclass that is mapped to the bsub "-q" option.
WorkloadXcelerator 8085 21803 WX now correctly handles SlaveList requests by not attempting to process them in the front-end, passing them on to the NC back-end for processing instead.
WorkloadXcelerator 7772 21266 Failover slaves now use the original vovserver's VOVDIR as opposed to their own when starting a failover server.
WorkloadXcelerator 8097   The error caused by the removal of a non-existent slave is now caught.
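
The empty-value handling described for NetworkComputer item 8156 above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions rather than the slave's actual code: the function name `apply_environment` and its return value are hypothetical. In Python, removing a key from `os.environ` also calls `unsetenv` for the process.

```python
import os


def apply_environment(variables, environ=None):
    """Hypothetical sketch: apply name/value pairs to an environment.

    A variable with an empty value is removed (unset) rather than being
    treated as an error. Returns the list of names that were unset.
    """
    if environ is None:
        environ = os.environ  # deleting a key here also calls unsetenv()
    unset = []
    for name, value in variables.items():
        if value == "":
            # Empty value: unset the variable; ignore it if not present.
            environ.pop(name, None)
            unset.append(name)
        else:
            environ[name] = value
    return unset
```

Passing a plain dictionary as `environ` makes the behavior easy to try without touching the real process environment.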