perfSONAR

Version 4.0rc2

Release Date

November 1, 2016

Features/Changes Highlights

  • NOTE: Changes are since 4.0 RC1. For a full list of changes from 3.5.1 to 4.0 RC1 see the RC1 release notes
  • NOTE: If you are updating from RC1, we highly recommend that you run the command “pscheduler internal reset” and then reboot the host. This command can take a while, depending on the number of tests on the schedule. It is not strictly required, but it gives you a clean slate for testing and ensures your host is not affected by lingering bugs that were fixed between RC1 and RC2.

pScheduler

  • Numerous bug fixes and stability fixes (see bottom of this document for more details)
  • The psc command alias has been removed and you must now use the pscheduler command.
  • Added new pscheduler monitor command to view schedule in real time
  • Added initial support for detecting CPU socket based on selected interface for throughput tests

Graphs

  • Fixed display of testing errors
  • Updated to latest react-time-series library
  • Numerous bug fixes and visual enhancements (see bottom of this document for more details)

MeshConfig

  • Added support for all pScheduler tool options that were missing in the MeshConfig .conf file and resulting JSON
  • The agent responsible for updating the MaDDash configuration is now a daemon instead of a nightly cron job. It checks remote meshes every 60 minutes (configurable) and automatically detects changes to the local meshconfig-guiagent.conf file.
  • Added support for "address maps" which tell a host to use a different address when interacting with a particular remote host. Useful in situations such as when you have hosts communicate across different layer 2 circuits using /30 addresses or similar.

MaDDash

  • Most configuration changes now detected without a restart

Known Issues

  • On hosts with numerous traceroute and/or ping tests, you may frequently get errors such as "Unable to schedule test" when trying to create a throughput task from the pscheduler command-line. This is because throughput tasks currently want no other items on the schedule. You can generally get around this by adding the --slip PT10M option to your command (e.g. pscheduler task --slip PT10M throughput ...). This will tell the command that it will accept a start time within the next 10 minutes as opposed to the default 5 minutes. This does not affect tests managed by meshconfig-agent since by default a very large slip is requested.
  • On sufficiently loaded hosts, it is not uncommon for archiving to fall behind. The symptom is graphs whose newest data always lags behind the present: new data keeps arriving, but it never catches up to the current time. We are working on a number of ways to make this process more efficient.
  • On sufficiently loaded hosts, it is not uncommon for the pscheduler task command to exit before the first run can be scheduled. Scheduling the first run should generally be a quick process, so we are investigating this issue as well.
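The --slip workaround described above takes an ISO 8601 duration such as PT10M. The following is a minimal sketch, not part of pScheduler, of how a wrapper script might validate such a value and assemble the command line quoted in the first known issue. The helper names slip_seconds and task_command are hypothetical, and only the hours/minutes/seconds form of the duration is handled.

```python
import re

# Matches only the time part of an ISO 8601 duration (hours, minutes,
# seconds), which covers values such as PT5M and PT10M used in these notes.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def slip_seconds(duration):
    """Convert a duration such as 'PT10M' to a number of seconds."""
    match = _DURATION.match(duration)
    if match is None or not any(match.groups()):
        raise ValueError("unsupported duration: %r" % duration)
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def task_command(slip, test_type, test_args):
    """Build the argument list for 'pscheduler task' with an explicit slip.

    test_args is whatever the caller would normally pass for the test;
    nothing is assumed about its contents here.
    """
    slip_seconds(slip)  # reject malformed durations before building the command
    return ["pscheduler", "task", "--slip", slip, test_type] + list(test_args)
```

The resulting list can be handed to subprocess.run() unchanged. Keeping the duration as a string avoids any lossy conversion before it reaches the pscheduler command.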

Detailed Fixes and Changes

Bundles and Shared Libraries

  • Install a default limits file as part of testpoint bundle
  • Dependency on pscheduler-bundle-full in testpoint as opposed to individual packages
  • Fixed some error handling in pScheduler perl client
  • Fixed memory rounding errors in memory detection libraries
  • Added detection of OpenStack, QEMU and BHYVE virtualization technologies; such hosts are marked as VMs in the host info used by the toolkit web page and lsregistrationdaemon
  • Fixed issue where the toolkit page would not load because calls were made in the wrong order

BWCTL

  • No changes

Central Management

  • No changes

Esmond

  • No changes

Graphs

  • Changed opacity of tooltips and added IPv6 values to tooltips
  • Added a load indicator
  • Fixed issues when using multiple MA urls
  • Fixed some formatting issues with host details
  • Added support for URL parameters for tool, ipversion, timeframe, start and end
  • Updated react-time-series charts to version 0.10.2
  • Improved display of test errors
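As an illustration of the URL parameters listed above (tool, ipversion, timeframe, start and end), here is a small sketch that appends them to a graph page address. Only the parameter names come from these notes. The function name graph_url, the base URL and the example values are placeholders, and the accepted value formats are not specified here.

```python
from urllib.parse import urlencode

# Parameter names as listed in the release notes.
ALLOWED = ("tool", "ipversion", "timeframe", "start", "end")


def graph_url(base_url, **params):
    """Append the supported graph parameters to base_url as a query string."""
    unknown = sorted(set(params) - set(ALLOWED))
    if unknown:
        raise ValueError("unsupported parameter(s): " + ", ".join(unknown))
    # Emit parameters in a fixed order so the result is predictable.
    query = urlencode([(name, params[name]) for name in ALLOWED if name in params])
    if not query:
        return base_url
    # Use '&' if the base URL already carries a query string.
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + query


# Placeholder host and values, for illustration only.
print(graph_url("https://ps.example.org/graphs/", tool="iperf3", ipversion=4))
```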

I2Util

  • No changes

LS Cache Daemon

  • No changes

LS Registration Daemon

  • No changes

MaDDash

  • Added support for detecting most configuration changes without needing to restart

MeshConfig

  • Added support for all missing tool options that are supported by pscheduler but were not yet in meshconfig
  • Fixed issue with ping packet_interval where it should have been read as floating point not integer
  • Removed all dependencies on the Toolkit ConfigDaemon
  • GUIAgent is now a daemon instead of a cron job. It reads the mesh once an hour and detects local changes to meshconfig-guiagent.conf without needing a restart. It also no longer restarts MaDDash, since a restart is not required to pick up config changes.
  • Added support for the address_map option that allows a host to communicate with different hosts using different addresses even when in the same mesh. This is especially useful in layer 2 cases where a host communicates with each remote endpoint using a different /30 or similar.
  • Related to the previous bullet, added the exclude_unmapped option, which makes it easier to create "sparse" meshes where only a few of the hosts test to each other.
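The behaviour of address_map and exclude_unmapped described above can be pictured as a per-remote lookup. The sketch below is a conceptual model only: it does not reproduce the MeshConfig file syntax or the agent's implementation, and the function name, hostnames and addresses are invented for illustration.

```python
def local_address_for(remote, default_address, address_map, exclude_unmapped=False):
    """Pick the local address to use when testing to `remote`.

    Returns None when the pair should be skipped, which is how this
    sketch models a "sparse" mesh.
    """
    if remote in address_map:
        return address_map[remote]
    if exclude_unmapped:
        return None  # no mapping for this remote: do not test to it
    return default_address


# Hypothetical host with a separate /30 toward each of two remote endpoints.
address_map = {
    "hostB.example.org": "10.0.0.1",
    "hostC.example.org": "10.0.0.5",
}

for remote in ("hostB.example.org", "hostC.example.org", "hostD.example.org"):
    print(remote, local_address_for(remote, "192.0.2.10", address_map,
                                    exclude_unmapped=True))
```

With exclude_unmapped left off, the unmapped remote would fall back to the host's default address instead of being skipped.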

OWAMP

  • Don't call setpgid() if we are already the process group leader. This allows owampd -Z to run under OSX launchd with a plist file.
  • Fix invalid struct timeval values that caused errors on Mac OS X

pScheduler

  • Removed psc alias to pscheduler command.
  • Added jitter reporting to nuttcp
  • Fixes for hostname checking in throughput test to avoid incorrectly triggering BWCTL
  • Fixed throughput test to properly list all inputs in commands like pscheduler monitor
  • Better error checking of participant lists
  • Made task updates more efficient
  • Added status command to init.d scripts
  • Added pscheduler internal reset command to clear out database
  • Fixed packaging issue so pscheduler-server can do warmboots
  • Fixed divide by zero error in ping tool
  • Fixed error in runner where local result was missing
  • Added CLI pscheduler monitor command that shows schedule in top-like output
  • Now force REST API to use SSL
  • Improved database indexing to speed up a number of operations
  • Time for failed runs that end early shortened so resources are freed sooner
  • Added bbr as a congestion control option for throughput tools
  • Fixed issue where limit inversion did not work
  • Increased the fudge factor pScheduler gives OWAMP to complete a task to 10 seconds
  • Created new limit type localif that matches all addresses belonging to an interface on the local host
  • Fixed handling of slip in CLI. Also added support for PSCHEDULER_SLIP environment variable. Default is 5 minutes.
  • Better error handling of case where a run cannot be scheduled
  • Ping test complains when source hostname cannot be resolved
  • Better detection of ip version in ping and trace tools
  • Increased the amount of time pscheduler tasks will wait for a run to be scheduled from 30 seconds to 60 seconds
  • Fixed malformed JSON output from powstream tool when reporting an error
  • Removed -S option from latency test. Now all tests are single-participant
  • Cleaned-up error reporting when server returns a generic HTTP error with HTML
  • Fixed formatting of floating-point numbers in latency and latencybg command-line output
  • Reload limits file automatically without restart
  • Fixed deadlock in owping tool caused by output that was too large
  • Fixed issue with omit interval formatting in BWCTL test
  • Changed cpu-affinity to client-cpu-affinity and server-cpu-affinity for throughput tests
  • Throughput tests now automatically detect which CPU socket to use based on the selected interface, via numactl. This is critical for high bandwidth testing.
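To make the last item more concrete, the sketch below shows one way to find the NUMA node a network interface is attached to on Linux and turn it into a numactl prefix. It is an independent illustration, not pScheduler's code: it assumes the Linux sysfs file /sys/class/net/<interface>/device/numa_node is present, and the function names are hypothetical.

```python
import os


def interface_numa_node(interface, sysfs_root="/sys"):
    """Return the NUMA node of a network interface, or None if unknown."""
    path = os.path.join(sysfs_root, "class", "net", interface,
                        "device", "numa_node")
    try:
        with open(path) as handle:
            node = int(handle.read().strip())
    except (OSError, ValueError):
        return None  # virtual interface, non-Linux host, or unreadable file
    # The kernel reports -1 when the device has no NUMA affinity.
    return node if node >= 0 else None


def numactl_prefix(interface, sysfs_root="/sys"):
    """Build a numactl prefix that pins CPU and memory to the NIC's node."""
    node = interface_numa_node(interface, sysfs_root)
    if node is None:
        return []  # nothing to pin to; run the tool unmodified
    return ["numactl", "--cpunodebind=%d" % node, "--membind=%d" % node]
```

The returned list can be prepended to a tool's argument list. On hosts with a single socket, or when the node cannot be determined, the prefix is empty and the command runs unchanged.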

Toolkit

  • Added support for interface aliases in interface selection screen
  • Interfaces in the UNKNOWN state, such as loopback interfaces, are now listed in the Toolkit web page's interface selection menu
  • Added support for throughput "zero_copy" parameter
  • Fixed configure_esmond script that was not always properly initializing databases
  • Added /etc/pscheduler files in CentOS 6 to 7 migration scripts
  • Fixed broken links to reverse traceroute and reverse ping tools
  • Fixed malformed .bashrc scripts caused by bad updates from the toolkit package

Raw changes

Updated components: