perfSONAR 4.0rc2 Release Notes

Released November 1, 2016

Features/Changes Highlights

NOTE: Changes are since 4.0 RC1. For a full list of changes from 3.5.1 to 4.0 RC1, see the RC1 release notes.
NOTE: If you are updating from RC1, we highly recommend you run the command “pscheduler internal reset” and reboot the host. Note that this command can take a while depending on the number of tests on the schedule. While not strictly required, it will give you a clean slate for testing and help ensure your host is not suffering from any lingering bugs that existed between RC1 and RC2.

pScheduler

Numerous bug fixes and stability fixes (see bottom of this document for more details)
The psc command alias has been removed and you must now use the pscheduler command.
Added new pscheduler monitor command to view schedule in real time
Added initial support for detecting CPU socket based on selected interface for throughput tests

Graphs

Fixed display of testing errors
Updated to latest react-time-series library
Numerous bug fixes and visual enhancements (see bottom of this document for more details)

MeshConfig

Added support for all pScheduler tool options that were missing in the MeshConfig .conf file and resulting JSON
The agent responsible for updating the MaDDash configuration is now a daemon instead of a nightly cron job. It will check remote meshes every 60 minutes (configurable) and automatically detect changes to the local meshconfig-guiagent.conf file.
Added support for “address maps” which tell a host to use a different address when interacting with a particular remote host. Useful in situations such as when you have hosts communicate across different layer 2 circuits using /30 addresses or similar.

MaDDash

Most configuration changes now detected without a restart

Known Issues

On hosts with numerous traceroute and/or ping tests, you may frequently get errors such as “Unable to schedule test” when trying to create a throughput task from the pscheduler command line. This is because throughput tasks currently want no other items on the schedule. You can generally get around this by adding the --slip PT10M option to your command (e.g. pscheduler task --slip PT10M throughput …). This tells the command to accept a start time within the next 10 minutes as opposed to the default 5 minutes. This does not affect tests managed by meshconfig-agent since by default a very large slip is requested.
On sufficiently loaded hosts, it is not uncommon for archiving to fall behind. This shows up as graphs whose most recent data lags behind the present: new data keeps arriving, but it never catches up to the current time. We are working on a number of ways to make this process more efficient.
On sufficiently loaded hosts, it is not uncommon for the pscheduler task command to exit before the first run can be scheduled. Generally, scheduling the first run should be a quick process, so we are exploring this issue as well.
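
The slip workaround described in the first item above can also be scripted. The following is a minimal Python sketch and not part of perfSONAR: the helper name build_throughput_command and the host receiver.example.org are hypothetical, and --dest is assumed to be the option the throughput test uses to name its destination.

```python
import shutil
import subprocess


def build_throughput_command(dest, slip="PT10M"):
    """Return the argv for a throughput task that tolerates `slip` of start delay.

    `slip` is an ISO 8601 duration; PT10M means "up to 10 minutes".
    """
    # The slip option is given to "pscheduler task" before the test type.
    # --dest is assumed here to be the throughput test's destination option.
    return ["pscheduler", "task", "--slip", slip, "throughput", "--dest", dest]


if __name__ == "__main__":
    cmd = build_throughput_command("receiver.example.org")  # hypothetical host
    if shutil.which("pscheduler") is None:
        # No pScheduler on this machine: just show what would be run.
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=False)
```

Keeping the command as a list avoids shell quoting issues, and a larger slip can be requested by passing a different duration, for example slip="PT15M".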

Detailed Fixes and Changes

Bundles and Shared Libraries

Install a default limits file as part of testpoint bundle
Dependency on pscheduler-bundle-full in testpoint as opposed to individual packages
Fixed some error handling in pScheduler perl client
Fixed memory rounding errors in memory detection libraries
Detecting OpenStack, QEMU and BHYVE virtualization technologies and marking as VM in host info used by toolkit web page and lsregistrationdaemon
Fixed issue where toolkit page would not load because calls were made in the wrong order

BWCTL

No changes

Central Management

No changes

Esmond

No changes

Graphs

Changed opacity of tooltips and added IPv6 values to tooltips
Added a load indicator
Fixed issues when using multiple MA urls
Fixed some formatting issues with host details
Added support for URL parameters for tool, ipversion, timeframe, start and end
Updated react-time-series charts to version 0.10.2
Improved display of test errors

I2Util

No changes

LS Cache Daemon

No changes

LS Registration Daemon

No changes

MaDDash

Added support for detecting most configuration changes without needing to restart

MeshConfig

Added support for all missing tool options that are supported by pscheduler but were not yet in meshconfig
Fixed issue with ping packet_interval where it should have been read as floating point not integer
Removed all dependencies on the Toolkit ConfigDaemon
GUIAgent is now a daemon instead of a cron job. It reads the mesh once an hour and detects local config file changes to meshconfig-guiagent.conf without the need for a restart. It also no longer restarts maddash since a restart is not required to pick up config changes.
Added support for the address_map option that allows a host to communicate with different hosts using different addresses even when in the same mesh. This is especially useful in layer 2 cases where a host communicates with each remote endpoint using a different /30 or similar.
Related to the previous bullet, added the exclude_unmapped option, which makes it easier to create “sparse” meshes where only a few of the hosts test to each other.

OWAMP

Don’t call setpgid() if we are already the process group leader. This allows owampd -Z to run under OSX launchd with a plist file.
Fix invalid struct timeval values that caused errors on Mac OS X

pScheduler

Removed psc alias to pscheduler command.
Added jitter reporting to nuttcp
Fixes for hostname checking in throughput test to avoid incorrectly triggering BWCTL
Fixed throughput test to properly list all inputs in commands like pscheduler monitor
Better error checking of participant lists
Made task updates more efficient
Added status command to init.d scripts
Added pscheduler internal reset command to clear out database
Fixed packaging issue so pscheduler-server can do warmboots
Fixed divide by zero error in ping tool
Fixed error in runner where local result was missing
Added CLI pscheduler monitor command that shows schedule in top-like output
Now force REST API to use SSL
Improved database indexing to speed up a number of operations
Time for failed runs that end early shortened so resources are freed sooner
Added bbr as a congestion control option for throughput tools
Fixed issue where limit inversion did not work
Increased the fudge factor pScheduler gives OWAMP to complete a task to 10 seconds
Created new limit type localif that matches all addresses belonging to an interface on the local host
Fixed handling of slip in CLI. Also added support for PSCHEDULER_SLIP environment variable. Default is 5 minutes.
Better error handling of case where a run cannot be scheduled
Ping test complains when source hostname cannot be resolved
Better detection of ip version in ping and trace tools
Increased the amount of time pscheduler tasks will wait for a run to be scheduled from 30 seconds to 60 seconds
Fixed malformed JSON output from powstream tool when reporting an error
Removed -S option from latency test. Now all tests are single-participant
Cleaned-up error reporting when server returns a generic HTTP error with HTML
Fixed formatting of floating-point numbers in latency and latencybg command-line output
Reload limits file automatically without restart
Fixed deadlock in owping tool caused by output that was too large
Fixed issue with omit interval formatting in BWCTL test
Changed cpu-affinity to client-cpu-affinity and server-cpu-affinity for throughput tests
Automatically detecting CPU socket to use based on interface for throughput through numactl. This is critical for high bandwidth testing.
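
To illustrate the idea behind the last item, the sketch below shows one way to find the NUMA node an interface is attached to on Linux (a NUMA node typically corresponds to a CPU socket) and to prefix a command with numactl. It is an illustration under stated assumptions, not pScheduler's implementation: the function names are hypothetical, the sysfs file /sys/class/net/<interface>/device/numa_node is assumed to be available, and eth0 and iperf3 are placeholders.

```python
from pathlib import Path


def interface_numa_node(interface, sysfs_root="/sys/class/net"):
    """Return the NUMA node of `interface`, or None if it cannot be determined."""
    node_file = Path(sysfs_root) / interface / "device" / "numa_node"
    try:
        node = int(node_file.read_text().strip())
    except (OSError, ValueError):
        # Virtual interfaces such as loopback have no device directory.
        return None
    # The kernel reports -1 when the device has no NUMA affinity.
    return node if node >= 0 else None


def bind_to_interface_node(command, interface, sysfs_root="/sys/class/net"):
    """Prefix `command` with numactl so it runs on the interface's NUMA node."""
    node = interface_numa_node(interface, sysfs_root)
    if node is None:
        return list(command)  # nothing to bind to; run unchanged
    # -N (--cpunodebind) restricts execution to the CPUs of the given node.
    return ["numactl", "-N", str(node)] + list(command)


if __name__ == "__main__":
    # Placeholder tool and interface; prints the argv that would be executed.
    print(bind_to_interface_node(["iperf3", "-s"], "eth0"))
```

Running the test tool on the same node as the network interface avoids cross-socket memory traffic, which is why this matters most at high bandwidth.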

Toolkit

Added support for interface aliases in interface selection screen
List interfaces in UNKNOWN state, such as loopback interfaces, in the Toolkit web page’s interface selection menu
Added support for throughput “zero_copy” parameter
Fixed configure_esmond script that was not always properly initializing databases
Added /etc/pscheduler files in CentOS 6 to 7 migration scripts
Fixed broken links to reverse traceroute and reverse ping tools
Fixed malformed .bashrc scripts caused by bad updates from the toolkit package
