perfSONAR

Frequently Asked Questions

  1. How do I join the perfSONAR Collaboration?
  2. Where can I ask questions or report bugs?
  3. How do I use the NPAD command line client (diag-client)?
  4. How do I use BWCTL?
  5. How do I use the NDT command line client (web100clt)?
  6. How do I use OWAMP?
  7. Can I Use a Firewall?
  8. What should I enter for the Communities of interest configuration question?
  9. I do not think I am a member of a community, should I put anything?
  10. What are the hardware requirements for running the perfSONAR Toolkit?
  11. Does my machine have to meet the System Requirements?
  12. The colors on my Console Configuration do not match what I see on the web. Some are green already.
  13. The "Services On This Node" screen shows many services in the non-running state when first started, what is wrong?
  14. I do not see my service in the Global Set Of Services, where is it?
  15. What is the purpose of BWCTL Limits/OWAMP Limits?
  16. How can I set limits to prevent others from overusing BWCTL/OWAMP?
  17. How many NTP servers do I need, can I select them all?
  18. Can I boot from a USB key instead of a CD?
  19. A CVE announcement was made for the current perfSONAR Toolkit Kernel, what do I do?
  20. During the NetInstall, I see errors about a corrupt file being downloaded. What should I do?
  21. How do I change the MTU for a device?
  22. How do I change the SSL certificate used by the web server?
  23. I forgot to enable IPv6 in CentOS when I installed the toolkit. How do I enable it?
  24. Why is the static IPv6 address I assigned during the net-install process not configured when my host starts-up?
  25. The drivers that are included for my "foo" network card are old, what should I do?
  26. Where can I find more resources regarding timekeeping for VMWare Virtual Machines?
  27. What TCP congestion control algorithm is used by the perfSONAR Toolkit?
  28. Where are the relevant logs for perfSONAR services?
  29. Is it possible to change the default port for tool X?
  30. Why doesn't the perfSONAR toolkit include the most recent version of the Myricom NIC driver?
  31. My perfSONAR node is experiencing a kernel panic for a recently installed kernel, what should I do?
  32. Can I configure yum to exclude kernel packages from its update procedure?
  33. How can I configure yum to automatically update the system?
  34. When attempting to use BWCTL with an IPv6 address, the command fails: bwctl: Unable to connect to 2001:468:1:11::16:66:4823. What should I do?
  35. My host was impacted by Shellshock/Heartbleed/etc., what should I do?
  36. I would like to install and patch perfsonar boxes behind a web proxy, is it possible to specify this on the grub command line?
  37. How can I add custom rules to IPTables?
  38. When installing the perfSONAR Toolkit 3.5 netinstall image with static IP, it does not keep this when the system first boots. The BOOTPROTO option remains set to "dhcp" in the /etc/sysconfig/network-scripts/ifcfg-eth0 after the installation. Is this a bug?
  39. How can I force testing over IPv4 or IPv6 in the mesh configuration?
  40. When trying a clean install with perfSONAR Toolkit version 3.4, the system doesn't recognize any disks/doesn't see my RAID controller. Things work with other systems (CentOS 6, Scientific Linux 6). What should I do?
  41. Which repository addresses will be used to get updates to the perfSONAR software?
  42. I see that the latency/loss tests in 3.4 use bwping. Does this mean that we can run both throughput and latency/loss tests on the same interface without interference due to the way bwctl scheduling works?
  43. Where can I find documentation on interacting with perfSONAR via a custom client?
  44. When setting up a dual homed host, how can one get individual tests to use one interface or another?
  45. Which licenses do perfSONAR products use?
  46. When attempting to publish data to esmond, a "301 MOVED PERMANENTLY" is returned. What should I do?
  47. Is it possible to use non-intel SFP+ optics in the Intel X520-SR2 NIC?
  48. How can I clean up the data in my esmond instance?
  49. How can I backup the data in my esmond instance?
  50. How do I backup a perfSONAR instance or migrate the configuration and data to a new machine?
  51. How can I tune a Dell server for a low latency environment?
  52. How do I disable global registration?
  53. I have a measurement archive machine with esmond running, and there is a separate disk partition mounted on the machine where I want to store all the incoming measurement data from measurement points. What is the proper way to change the default directory location for storing the measurement archive data?
  54. How can I nuke all of the data in esmond, and start from scratch?
  55. I am running a small node, and seeing a lot of IO. What is going on?
  56. I want to operate a "Dynamic" Maddash Mesh with hosts from a lookup service. Where do I start?
  57. How to get rid of "There isn't a perfSONAR sudo user defined" message?
  58. After restarting my updated 3.5 hosts the regular testing service is shown as not running?
  59. How does version numbering work for the perfSONAR project?
  60. Is it possible to run multiple simultaneous tests for NDT?
  61. I have some machines with the old Level 2 install option. When I update to version 3.5 what will happen? Will they become equivalent to one of the 3.5 install bundles?
  62. After upgrading to 3.5 my maddash instance won't start (e.g. HTTP fails)?
  63. What is PTP?
  64. How accurate is PTP?
  65. What is required to use PTP in my network?
  66. Does perfSONAR support PTP?
  67. I am seeing a "Can't locate object method 'ssl_opts' via package 'LWP::UserAgent'" error when trying to use a Central Measurement Archive.
  68. I have a central MA for my perfSONAR data, what happens if the central MA goes down for a while, or the network is unavailable between the beacons and the MA? Are the measurements lost, or are they buffered to be delivered later? Does this have any effect on the timing of subsequent measurements?
  69. What if there are multiple MAs used for a central MA architecture, but only one is down or unreachable, what will happen?
  70. If you have made manual changes to regular_testing.conf to point to a different MA (or multiple MAs), and you subsequently change test configurations through the GUI, does this leave your MA customizations alone?
  71. How much memory is needed for a host running an MA?
  72. I am trying to run perfSONAR on low-cost hardware (e.g. raspberry pi, etc.). Where should I start?
  73. Why do I get such weird results when I test from a 10G connected host to 1G connected host?
  74. My perfSONAR results show consistent line-rate performance, but a researcher at my site is reporting really poor performance, what gives?
  75. Is there a way to visualize GridFTP results in Maddash?
  76. How can I configure my toolkit to allow a private IP?
  77. How do I change the default BWCTL tool?
  78. Cassandra seems to be using 100% CPU on my system and I can't figure out why.
  79. How do I setup a perfSONAR node to have two interfaces on the same subnet?
  80. Is there a way to re-image perfSONAR resources remotely?

Q:How do I join the perfSONAR Collaboration? A:

Please contact us at perfsonar-lead@internet2.edu.


Q:Where can I ask questions or report bugs? A:

For questions, send email to perfsonar-user at internet2 dot edu. You may also join the mailing list by visiting https://lists.internet2.edu/sympa/info/perfsonar-user. For bugs, report at https://github.com/perfsonar/project/issues.


Q:How do I use the NPAD command line client (diag-client)? A:

The diag-client is a command line version of the NPAD diagnostic client. Instead of needing a web browser, this client runs the tests from a terminal window. Its basic arguments are the server name/address and the port to connect to. The NPAD server has two sets of ports open:

  • 8000 for HTTP traffic
  • 8001-8020 for measurement traffic

Please connect to 8100:

[knoppix@Knoppix ~]$ diag-client HOSTNAME 8100
Using: rtt 10 ms and rate 20
Connected.
Control connection established.
port = 8003
Starting test.
Parameters based on 107 ms initial RTT
peakwin=27617 minpackets=3 maxpackets=83 stepsize=8
Target run length is 608 packets (or a loss rate of 0.16447368%)
Test 1a (11 seconds): Coarse Scan
Test 1b (11 seconds): ...
...

Connecting to the HTTP port will result in the following error:

[knoppix@Knoppix ~]$ diag-client HOSTNAME 8200
Using: rtt 10 ms and rate 20
Connected.
Protocol error: bad handshake.
Please make sure you have the latest client,  and you have the correct port number

Q:How do I use BWCTL? A:

BWCTL (Bandwidth Test Controller) is a client/server program developed to simplify Iperf, iperf3, and nuttcp testing between hosts. At boot time, the toolkit starts a BWCTL server process and leaves it listening on TCP port 4823. This server may then be accessed by remote BWCTL clients. Additionally, the disk contains BWCTL client applications that can be used to test to remote instances. The BWCTL server allows TCP tests with a maximum duration of 60 seconds. To run a test to a remote BWCTL server:

  • Log on to the perfSONAR Toolkit server as root or as another valid user.
  • Identify the remote server.
  • Run the bwctl -s remote-bwctl-server command to stream data for 10 seconds from the remote BWCTL server to the local instance.
  • Run the bwctl -c remote-bwctl-server command to stream data for 10 seconds from the local instance to the remote BWCTL server.
  • Results are displayed on your console or terminal window.
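The two bwctl commands above can be wrapped in a small helper. This is only a sketch: bwctl_both_ways and the BWCTL variable are hypothetical names introduced here, and remote-bwctl-server stands for the host you identified.

```shell
# BWCTL names the client binary; it can be overridden for a dry run.
BWCTL="${BWCTL:-bwctl}"

# bwctl_both_ways REMOTE
# Hypothetical helper: run one test in each direction against REMOTE.
bwctl_both_ways() {
    remote="$1"
    # -s: the remote server sends, the local instance receives
    "$BWCTL" -s "$remote" || return 1
    # -c: the local instance sends, the remote server receives
    "$BWCTL" -c "$remote"
}

# Usage:
#   bwctl_both_ways remote-bwctl-server
```

The second test is skipped if the first one fails, which avoids waiting on a server that cannot be reached.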

Q:How do I use the NDT command line client (web100clt)? A:

The web100clt is a command line version of the NDT diagnostic client. Instead of needing a web browser, this client runs the tests from a terminal window. This can be installed via the "ndt-client" RPM package (available via yum), or by downloading the NDT source and compiling only the client tools. 

Its basic options are the server name/address (-n) and the port to connect to (-p). The NDT server has two sets of ports open:

  • 7123 for HTTP traffic
  • 3001, 3002, 3003 for measurement traffic

Please connect to 3001, 3002, or 3003:

[knoppix@Knoppix ~]$ web100clt -n HOSTNAME -p 3001
Testing network path for configuration and performance problems  --  Using IPv4 address
Checking for Middleboxes . . . . . . . . . . . . . . . . . .  Done
checking for firewalls . . . . . . . . . . . . . . . . . . .  Done
running 10s outbound test (client to server) . . . . .  164.00 kb/s
running 10s inbound test (server to client) . . . . . . 13.40 Mb/s
The slowest link in the end-to-end path is a a 622 Mbps OC-12 subnet
Information [C2S]: Packet queuing detected: 16.95% (local buffers)
Information [S2C]: Packet queuing detected: 67.10% (local buffers)
Server '128.193.128.237' is not behind a firewall. [Connection to the ephemeral port was successful]
Client is not behind a firewall. [Connection to the ephemeral port was successful]
Packet size is preserved End-to-End
Server IP addresses are preserved End-to-End
Client IP addresses are preserved End-to-End

Connecting to the HTTP port (or other ports) will result in the following error:

[knoppix@Knoppix ~]$ web100clt -n HOSTNAME -p 7123
Testing network path for configuration and performance problems  --  Using IPv4 address
Information: The server 'HOSTNAME' does not support this command line client

Q:How do I use OWAMP? A:

OWAMP (One-Way Ping) is a client/server program that was developed to provide delay and jitter measurements between two target computers. At boot time, the perfSONAR Toolkit starts an OWAMP server process and leaves it listening on TCP port 861. This server may then be used by remote clients. Additionally, the disk contains an OWAMP client application that can be used to test to remote instances. By default, the OWAMP server sends a low-level data stream in each direction and measures the one-way delay and jitter between the two hosts. Separate measurements, one for each direction, are reported to the user at the end of the test. To run a test to a remote OWAMP server:

  • Log on to the perfSONAR Toolkit server using a valid userid.
  • Identify the remote server.
  • Run the owping remote-owamp-server command to make a pair of 10 second delay measurements (one in each direction) between the remote OWAMP server and the local instance.
  • Results are displayed on the console or terminal window.

Q:Can I Use a Firewall? A:

Please see the http://www.perfsonar.net/deploy/security-considerations/ page on the perfSONAR website, which discusses firewall use.


Q:What should I enter for the Communities of interest configuration question? A:

This question can be confusing to answer for new users. The goal is to associate some loosely coupled labels with the data that the perfSONAR Toolkit disk will be making available to the larger world. Think of this step as similar to assigning labels to photos or music. Some examples of valid answers are:

  • Internet2 - The data made available connects in some way to the Internet2 backbone
  • LHC (CMS, ATLAS, etc.) - The disk is part of the LHC deployment structure
  • eVLBI - The disk is a part of the larger telescope community
  • MAX - A connector or member of the MAX gigapop
  • DOE-SC-LAB - US Department of Energy Office of Science Labs

Use as many community names as necessary to properly categorize the data from the installation.


Q:I do not think I am a member of a community, should I put anything? A:

Communities are not required, but they allow other individuals and organizations to find and use your services. It is a good practice to join as many as you may think are applicable.


Q:What are the hardware requirements for running the perfSONAR Toolkit? A:

See this page. Note that the perfSONAR Toolkit development team has not created hard minimum or maximum requirements - the perfSONAR Toolkit will function on almost any form of hardware. Performance considerations do favor meeting or exceeding the minimum guidelines.


Q:Does my machine have to meet the System Requirements? A:

There is nothing on the perfSONAR Toolkit that will prevent systems that do not meet the requirements from starting. Erroneous or inaccurate behavior is possible if the hardware cannot support the measurement tools.


Q:The colors on my Console Configuration do not match what I see on the web. Some are green already. A:

If you are upgrading from a previous version of the perfSONAR Toolkit, the colors may be green already because a particular aspect was configured previously. These do not need to be configured again.


Q:The Services On This Node screen shows many services in the non-running state when first started, what is wrong? A:

Many of the services will be in this state because they are missing some key configuration items (e.g. from the Administrative Info). After following the configuration steps, check this screen again; most services should then be functional.


Q:I do not see my service in the Directory Of Services, where is it? A:

Much like DNS, the information that will populate the Lookup Service will take time to propagate. Please allow some time (e.g. a few hours) before your service will be fully visible.


Q:What is the purpose of BWCTL Limits/OWAMP Limits? A:

These allow you to limit the influence that outside users have on your system performance. For example, to prevent your machine/network from being saturated with BWCTL tests, limit the duration and maximum bandwidth available. These screens allow a fine grained way to protect resources.


Q:How can I set limits to prevent others from overusing BWCTL/OWAMP? A:

BWCTL and OWAMP each have a configuration dialog that allows the administrator to limit the resources consumed. To set the limits for BWCTL, consult this section. To set the limits for OWAMP, consult this section.


Q:How many NTP servers do I need, can I select them all? A:

It is recommended that 4 to 5 close and active servers be used. The Select Closest Servers button will help with this decision. Note that some servers may not be available due to routing (e.g. non-R&E networks vs R&E networks - a common problem for Internet2 and ESnet servers).


Q:Can I boot from a USB key instead of a CD? A:

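The perfSONAR Toolkit Netinstall and Live USB images can be written to a USB stick instead of a CD. To write an image to the media, we recommend using dd. In the example below, /dev/disk3 stands for the device name of your USB stick; dd overwrites whatever device it is given, so check the name before running it:

sudo dd if=pS-Performance_Toolkit-3.4-FullInstall-x86_64.iso of=/dev/disk3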


Q:A CVE announcement was made for the current perfSONAR Toolkit Kernel, what do I do? A:

The perfSONAR development effort subscribes to all major outlets that announce kernel CVEs. In the event that a CVE is announced that directly affects operation of the pS Performance Toolkit, the following steps will take place:

  • Announcements regarding the CVE will be posted to the perfsonar-user and perfsonar-announce mailing lists. More information on the mailing lists can be found at https://lists.internet2.edu/sympa/info/perfsonar-user and https://lists.internet2.edu/sympa/info/perfsonar-announce
  • A timeline will be relayed regarding availability of new kernels.
  • The CentOS project will make the patched kernel available first, and it will be available through the yum repositories on the toolkit before the perfSONAR project is able to apply the web100 patches.
    • It is strongly suggested that perfSONAR Toolkit users utilizing the NetInstall option upgrade immediately. Note this will break support for the NDT and NPAD tools, but use of measurement tools is secondary to security. Run the following command:
      sudo yum update
      
    • perfSONAR Toolkit users who rely on the LiveCD option, through April of 2015, can either continue using their instance or shut it down, depending on the severity of the CVE. Note that the project will announce whether a new LiveCD will be made available, depending on the severity of the CVE and the impact to users.
  • The perfSONAR project will release a web100 patched version of the most recent kernel, and make these available through the yum repo. Announcements will be made again to the mailing lists.

Q:During the NetInstall, I see errors about a corrupt file being downloaded. What should I do? A:

During the NetInstall, you may see some errors about a corrupt file being downloaded along with buttons like Reboot and Retry. This happens if it fails to download an RPM from a mirror, which can happen for numerous reasons. Usually, that error can be solved by hitting Retry. You may have to hit that multiple times depending on which mirrors the install is trying to download the RPM from.


Q:How do I change the MTU for a device? A:

N.B. Changing the MTU on your perfSONAR host should only be done if the underlying network supports the chosen size. Please work with your local network staff before making this change on any host.

You can view the MTU of your network devices by executing the /sbin/ifconfig command.

To temporarily change the MTU for a device, use the ifconfig command and specify the device and the new MTU. For example:

ifconfig eth0 mtu 9000 up

To make the change permanent, modify the configuration file for the specific device. These files are in /etc/sysconfig/network-scripts/ and have names like ifcfg-eth0 for the device eth0 and ifcfg-eth1 for eth1.

  1. For example, you could add the line MTU="9000" for IPv4 or IPV6_MTU="9000" for IPv6 to /etc/sysconfig/network-scripts/ifcfg-eth0.
  2. After making the changes, restart the network services by running the command 'service network restart' as root.
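A minimal sketch of the permanent change for the IPv4 case follows. It assumes the ifcfg file uses plain KEY="value" lines; set_ifcfg_mtu is a hypothetical helper name, not a toolkit command.

```shell
# set_ifcfg_mtu FILE MTU
# Hypothetical helper: set MTU="<value>" in an ifcfg-style file, replacing an
# existing MTU= line if there is one and appending the line otherwise.
set_ifcfg_mtu() {
    file="$1"
    mtu="$2"
    [ -f "$file" ] || return 1
    tmp="$(mktemp)" || return 1
    # Copy every line except an existing MTU= setting (grep exits non-zero
    # when nothing is left to copy, which is not an error here).
    grep -v '^MTU=' "$file" > "$tmp" || true
    printf 'MTU="%s"\n' "$mtu" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage (as root):
#   set_ifcfg_mtu /etc/sysconfig/network-scripts/ifcfg-eth0 9000
#   service network restart
```

IPV6_MTU lines are left untouched, since only lines beginning with MTU= are replaced.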

Q:How do I change the SSL certificate used by the web server? A:

The toolkit by default generates a self-signed SSL certificate that it configures for use with the Apache web server. Some users may desire to replace this certificate with a certificate signed by a certificate authority (CA).

You may also need to replace the certificate due to a problem sometimes encountered with browsers not accepting the self-signed certificate. You may see an error like the following:

 

    HOST uses an invalid security certificate.
    The certificate is not trusted because it is self-signed.
    The certificate is only valid for localhost.localdomain
    (Error code: sec_error_untrusted_issuer)

 

 

You can find instructions for installing a new certificate in Apache here.

 


Q:I forgot to enable IPv6 in CentOS when I installed the toolkit. How do I enable it? A:

It is recommended that you always enable IPv6 during the CentOS installation portion of the toolkit setup. If you did not enable it, then you can do so with the following steps:

  1. Log in to the toolkit as a user capable of running sudo
  2. Run sudo and enter your sudo password
  3. Open the file /etc/modprobe.conf in a text editor and remove the following lines:
    alias net-pf-10 off
    alias ipv6 off
    options ipv6 disable=1
    
  4. Restart the host
  5. You can now assign an IPv6 address.
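Step 3 can also be scripted. The sketch below assumes the three lines appear in the file exactly as written above; the function name and the .bak backup copy are choices made for this example.

```shell
# remove_ipv6_disable_lines FILE
# Hypothetical helper: keep a backup as FILE.bak, then drop the three lines
# that disable IPv6. Lines are matched exactly (-x) and literally (-F).
remove_ipv6_disable_lines() {
    file="$1"
    [ -f "$file" ] || return 1
    cp -p "$file" "$file.bak" || return 1
    grep -v -x -F \
        -e 'alias net-pf-10 off' \
        -e 'alias ipv6 off' \
        -e 'options ipv6 disable=1' \
        "$file.bak" > "$file" || true
}

# Usage (as root), followed by a restart of the host:
#   remove_ipv6_disable_lines /etc/modprobe.conf
```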

Q:Why is the static IPv6 address I assigned during the net-install process not configured when my host starts-up? A:

When you perform the net-install of the toolkit, you will be prompted twice to enter networking information by CentOS. The first time is to define the networking to be used for downloading required packages. The second prompt is later in the installation and defines what will be configured on the host post-installation. It is a known CentOS behavior that IPv6 information entered at the first prompt is not automatically filled in at the second prompt. This can be confusing because the IPv4 information does get automatically filled in. If you do not manually enter the IPv6 information a second time, then your host will not have the address configured post-installation. You will have to manually assign the address if this happens.


Q:The drivers that are included for my "foo" network card are old, what should I do? A:

The perfSONAR Toolkit includes drivers from the upstream CentOS distribution and a very limited number of specially built drivers for cards that are known to be widely used (e.g. Myricom). If you notice that there is a newer driver available than what the perfSONAR Toolkit provides, you can do the following:

  • Download the source or RPM from the manufacturer's web site and build it on your perfSONAR Toolkit host
  • If you are running a netinstall version of the perfSONAR Toolkit, run modprobe after installation to load the new version
  • If you are running a LiveCD, you will need to modify the local init.d script to force modprobe on each reboot to load the proper version

Q:Where can I find more resources regarding timekeeping for VMWare Virtual Machines? A:

VMWare has two resources worth reading:


Q:What TCP congestion control algorithm is used by the perfSONAR Toolkit? A:

The perfSONAR toolkit uses the CentOS 6 default TCP congestion control algorithm, which is htcp. However, if you enable NDT or NPAD, these tools require the Reno congestion control algorithm, so the default congestion control algorithm for the host is changed to reno.


Q:Where are the relevant logs for perfSONAR services? A:

Please see http://docs.perfsonar.net/manage_logs.html for more information. 


Q:Is it possible to change the default port for tool X? A:

The measurement servers use 2 kinds of ports:

  • Contact ports, i.e. a well-known location to contact the daemon to initiate a test
  • Test ports, i.e. negotiated ports that carry test or control traffic when a test is requested

Test ports are easily configured to run on a specific set of ports, and can be configured to be opened in a site firewall. The daemon is often able to negotiate these at run time. The contact port is well known, and because of that should never be changed to a different value. Doing so severely impacts the ability of the tool to interoperate on a global scale.

As an example, the OWAMP server listens on the registered port 861 (see http://tools.ietf.org/search/rfc4656 section 2). This is the standard port for the application, in the same way that port 80 is the standard port for an HTTP server. While one can run a web server on a port other than 80, it makes the web server less useful because it's not a standard config. The same is true for OWAMP. The OWAMP protocol is standardized, and has a well-known port - port 861 - associated with it. Running the OWAMP daemon on a non-standard port introduces significant interoperability challenges between deployments.

If you're going to run a measurement infrastructure inside your own organization, you are of course free to do whatever you want. If you want to integrate with the rest of the world, the measurement tools should be run on the standard port to ensure interoperability.


Q:Why doesn't the perfSONAR toolkit include the most recent version of the Myricom NIC driver? A:

We only support the default CentOS device drivers on the toolkit. However, several groups have reported 10% performance improvements using the latest driver from Myricom. If you want to update your toolkit host to the latest version of the Myricom driver, follow the instructions here.


Q:My perfSONAR node is experiencing a kernel panic for a recently installed kernel, what should I do? A:

A kernel panic might produce errors similar to this during the boot process:

 

RAMDISK: incomplete write (3107 != 32768)
write error
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
Pid: 1, comm: swapper Not tainted 2.6.32-431.11.2.el6.aufs.web100.x86_64 #1

 

 

When this occurs, reboot and, when the GRUB prompt appears, boot into a known working kernel (normally a previous version that should still be present on the machine). If this kernel succeeds in booting, evaluate the available space on the disk as a first step:

 

 

[root@host ~]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                       50G  2.5G   45G   6% /
tmpfs                 939M     0  939M   0% /dev/shm
/dev/sda1             485M  151M  309M  33% /boot
/dev/mapper/VolGroup-lv_home
                      176G  6.3G  161G   4% /home

 

 

If the /boot partition is listed as full, it will be necessary to uninstall some of the old kernel packages to make room. The /boot partition is a part of the underlying CentOS distribution, and cannot be changed. Its size is a function of the amount of disk that was available at the time of installation. Regular maintenance will be required to ensure it has enough free space. To remove old kernels, first get a listing of what is installed:

 

 

[root@host ~]$ rpm -qa | grep '^kernel' | sort
kernel-2.6.32-431.11.2.el6.x86_64
kernel-2.6.32-431.3.1.el6.aufs.web100.x86_64
kernel-2.6.32-431.5.1.el6.aufs.web100.x86_64
kernel-2.6.32-431.el6.web100.aufs.x86_64
kernel-2.6.32-431.el6.x86_64
kernel-devel-2.6.32-431.11.2.el6.x86_64
kernel-devel-2.6.32-431.3.1.el6.aufs.web100.x86_64
kernel-devel-2.6.32-431.5.1.el6.aufs.web100.x86_64
kernel-devel-2.6.32-431.el6.web100.aufs.x86_64
kernel-devel-2.6.32-431.el6.x86_64
kernel-firmware-2.6.32-431.11.2.el6.noarch
kernel-headers-2.6.32-431.11.2.el6.x86_64

 

 

It is recommended to remove the kernel that didn't work, as well as versions that are 2 and 3 back from your current version. Yum can be used to remove the kernels:

 

 

[psadmin@nettest ~]$ sudo yum remove kernel-2.6.32-431.el6 kernel-2.6.32-431.el6.web100.aufs kernel-2.6.32-431.3.1.el6.aufs.web100 kernel-devel-2.6.32-431.el6 \
kernel-devel-2.6.32-431.el6.web100.aufs kernel-devel-2.6.32-431.3.1.el6.aufs.web100
[sudo] password for psadmin: 
Loaded plugins: fastestmirror, security
Setting up Remove Process
Resolving Dependencies
--> Running transaction check
---> Package kernel.x86_64 0:2.6.32-431.el6 will be erased
---> Package kernel.x86_64 0:2.6.32-431.el6.web100.aufs will be erased
---> Package kernel.x86_64 0:2.6.32-431.3.1.el6.aufs.web100 will be erased
---> Package kernel-devel.x86_64 0:2.6.32-431.el6 will be erased
---> Package kernel-devel.x86_64 0:2.6.32-431.el6.web100.aufs will be erased
---> Package kernel-devel.x86_64 0:2.6.32-431.3.1.el6.aufs.web100 will be erased
--> Finished Dependency Resolution
Globus                                                                                                         | 1.9 kB     00:00     
Globus-Updates                                                                                                 | 1.9 kB     00:00     
Internet2                                                                                                      |  951 B     00:00     
base                                                                                                           | 3.7 kB     00:00     
epel/metalink                                                                                                  |  13 kB     00:00     
epel                                                                                                           | 4.4 kB     00:00     
epel/primary_db                                                                                                | 6.1 MB     00:33     
extras                                                                                                         | 3.4 kB     00:00     
perfsonar                                                                                                      |  951 B     00:00     
perfsonar-noarch                                                                                               |  951 B     00:00     
perfsonar-source                                                                                               |  951 B     00:00     
updates                                                                                                        | 3.4 kB     00:00     

Dependencies Resolved

======================================================================================================================================
 Package                  Arch               Version                                       Repository                            Size
======================================================================================================================================
Removing:
 kernel                   x86_64             2.6.32-431.el6                                @base                                121 M
 kernel                   x86_64             2.6.32-431.el6.web100.aufs                    @Internet2-web100_kernel             121 M
 kernel                   x86_64             2.6.32-431.3.1.el6.aufs.web100                @Internet2-web100_kernel             121 M
 kernel-devel             x86_64             2.6.32-431.el6                                @base                                 24 M
 kernel-devel             x86_64             2.6.32-431.el6.web100.aufs                    @Internet2-web100_kernel              24 M
 kernel-devel             x86_64             2.6.32-431.3.1.el6.aufs.web100                @Internet2-web100_kernel              24 M

Transaction Summary
======================================================================================================================================
Remove        6 Package(s)

Installed size: 435 M
Is this ok [y/N]: 

 

 

After removing the kernels, you can verify the space again, and run a yum update to reinstall the latest kernel. Reboot and see if things have returned to normal.
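To verify the space from a script rather than by eye, the Use% column can be extracted from df. This is a sketch that assumes the POSIX output format of df -P; use_percent_of is a hypothetical helper name.

```shell
# use_percent_of MOUNTPOINT
# Hypothetical helper: reads `df -P` output on stdin and prints the Use%
# figure (without the % sign) for the given mount point.
use_percent_of() {
    awk -v mp="$1" 'NR > 1 && $NF == mp { sub(/%$/, "", $(NF - 1)); print $(NF - 1) }'
}

# Usage:
#   df -P | use_percent_of /boot
```

With the df listing shown earlier in this answer, the helper would print 33 for /boot. Nothing is printed if the given path is not a mount point.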

 


Q:Can I configure yum to exclude kernel packages from its update procedure? A:

A detailed explanation of yum configuration can be found in the RHEL documents: https://access.redhat.com/site/solutions/10185. There are two ways to exclude kernel packages from a yum update. The first can be invoked on the command line:

 

yum update --exclude='kernel*'

 

 

To make the change permanent, edit the /etc/yum.conf file and add the following exclude entry to its [main] section:

 

 

[main]
cachedir=/var/cache/yum/$basearch/$releasever
keepcache=0
debuglevel=2
logfile=/var/log/yum.log
exclude=kernel* samba*                           <==== 

 

 

NOTE: If there are multiple packages to be excluded, separate them using a single space or comma.
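To confirm which patterns a configuration file currently excludes, a small helper along these lines may be useful. This is a minimal sketch: the function name show_yum_excludes is hypothetical, and it assumes the exclude entry sits on one line and contains only package patterns.

```shell
# Print the package patterns excluded in a yum configuration file.
# Usage: show_yum_excludes [path]   (defaults to /etc/yum.conf)
show_yum_excludes() {
    conf="${1:-/etc/yum.conf}"
    # Keep only "exclude=" lines, strip the key, and turn commas into spaces.
    sed -n 's/^[[:space:]]*exclude[[:space:]]*=[[:space:]]*//p' "$conf" | tr ',' ' '
}
```

Running show_yum_excludes with no argument reads /etc/yum.conf; an empty result means no exclusions are configured in that file.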

 


Q:How can I configure yum to automatically update the system? A:

Note that as of version 3.4, automatic updates are enabled. See this page for more detail: http://docs.perfsonar.net/manage_update.html#automatic-updates


Q:When attempting to use BWCTL with an IPv6 address, the command fails: bwctl: Unable to connect to 2001:468:1:11::16:66:4823. What should I do? A:

Wrap the IPv6 address in square brackets and double quotes (to prevent the shell from trying to interpret the brackets). For example: bwctl -T iperf3 -t 10 -i 1 -c "[2001:468:1:11::16:66]:4823"
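When the address comes from a variable or a script, the bracketing can be applied automatically. The helper below is a minimal sketch; the name bwctl_endpoint is hypothetical, and it treats any host containing a colon as an IPv6 literal.

```shell
# Format a host and port for use as a bwctl endpoint argument.
# Usage: bwctl_endpoint HOST PORT
bwctl_endpoint() {
    host="$1"
    port="$2"
    case "$host" in
        *:*) printf '[%s]:%s\n' "$host" "$port" ;;   # IPv6 literal: needs brackets
        *)   printf '%s:%s\n' "$host" "$port" ;;     # hostname or IPv4 address
    esac
}
```

It could then be used as: bwctl -T iperf3 -t 10 -i 1 -c "$(bwctl_endpoint 2001:468:1:11::16:66 4823)". The double quotes around the command substitution keep the shell from interpreting the brackets.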


Q:My host was impacted by Shellshock/Heartbleed/etc., what should I do? A:

Please check the vulnerability archive for the specific attack you are interested in. Information on mitigation for perfSONAR nodes will be posted there or in the mailing list archives.


Q:I would like to install and patch perfsonar boxes behind a web proxy, is it possible to specify this on the grub command line? A:

Anaconda documentation indicates this grub parameter should do the trick:

 

proxy=[protocol://][username[:password]@]host[:port]

 

 

Note that during a fresh network installation, Anaconda installs updates immediately (i.e. it does not use the version of an RPM from when the ISO was built) and does not run any network services before the reboot. For those rebuilding after Shellshock, using a proxy should not be necessary.

 


Q:How can I add custom rules to IPTables? A:

The rules added by the perfSONAR toolkit are contained within a special perfSONAR chain of iptables (and ip6tables). You may add rules to the other chains, such as the INPUT chain, just as you would any other firewall rule. It is NOT recommended you change the perfSONAR chain as any changes you make could be overwritten by a software update.

An example that will block access to port 8000 for all traffic is shown in the steps below (note it is not recommended you run this command as it will block access to the NPAD tool):

  1. Login to your host via SSH or the terminal as a root user
  2. View your current iptables rules with the /sbin/service iptables status command:
    # /sbin/service iptables status
    Table: filter
    Chain INPUT (policy ACCEPT)
    num  target     prot opt source               destination
    1    fail2ban-SSH  tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:22
    2    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
    3    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED
    4    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:22
    5    ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           state NEW udp spt:547 dpt:546
    6    perfSONAR  all  --  0.0.0.0/0            0.0.0.0/0
    7    REJECT     all  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
    
    Chain FORWARD (policy ACCEPT)
    num  target     prot opt source               destination
    1    REJECT     all  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
    
    Chain OUTPUT (policy ACCEPT)
    num  target     prot opt source               destination
    1    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
    
    Chain fail2ban-SSH (1 references)
    num  target     prot opt source               destination
    1    RETURN     all  --  0.0.0.0/0            0.0.0.0/0
    
    Chain perfSONAR (1 references)
    num  target     prot opt source               destination
    1    ACCEPT     icmp --  0.0.0.0/0            0.0.0.0/0           icmp type 255
    2    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:80 state NEW,ESTABLISHED
    3    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:443 state NEW,ESTABLISHED
    4    ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpt:123 udp
    5    ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:33434:33634
    6    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:8000
    7    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:8001:8020
    8    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:7123
    9    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:3001:3003
    10   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:861
    11   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:8760:9960
    12   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:4823
    13   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:6001:6200
    14   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:6001:6200
    15   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:5000:5600
    16   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:5000:5900
    17   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:10101:10300
    18   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:10101:10300
    19   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:7
    20   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpt:7
    21   RETURN     all  --  0.0.0.0/0            0.0.0.0/0
    
  3. Run the iptables command below to block port 8000. Note that we are adding this rule to the INPUT chain and leaving the perfSONAR chain untouched. We are also adding it at position 5 which ensures it is processed before the perfSONAR rules:
    iptables -I INPUT 5 -p tcp --dport 8000 -j REJECT
    
  4. Now save your configuration change:
    /sbin/service iptables save
    
  5. We can now see our new rule was added:
    # /sbin/service iptables status
    Table: filter
    Chain INPUT (policy ACCEPT)
    num  target     prot opt source               destination
    1    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
    2    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED
    3    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:22
    4    ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           state NEW udp spt:547 dpt:546
    5    REJECT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:8000 reject-with icmp-port-unreachable
    6    perfSONAR  all  --  0.0.0.0/0            0.0.0.0/0
    7    REJECT     all  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
    
    Chain FORWARD (policy ACCEPT)
    num  target     prot opt source               destination
    1    REJECT     all  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
    
    Chain OUTPUT (policy ACCEPT)
    num  target     prot opt source               destination
    1    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
    
    Chain perfSONAR (1 references)
    num  target     prot opt source               destination
    1    ACCEPT     icmp --  0.0.0.0/0            0.0.0.0/0           icmp type 255
    2    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:80 state NEW,ESTABLISHED
    3    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:443 state NEW,ESTABLISHED
    4    ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpt:123 udp
    5    ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:33434:33634
    6    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:8000
    7    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:8001:8020
    8    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:7123
    9    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:3001:3003
    10   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:861
    11   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:8760:9960
    12   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:4823
    13   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:6001:6200
    14   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:6001:6200
    15   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:5000:5600
    16   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:5000:5900
    17   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:10101:10300
    18   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:10101:10300
    19   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:7
    20   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpt:7
    21   RETURN     all  --  0.0.0.0/0            0.0.0.0/0
    

Note: Prior to version 3.4, custom firewall rules were not handled properly, so you may lose any custom rules when upgrading from a version older than 3.4. Following the steps above should ensure your rules are maintained across updates beyond 3.4 for the foreseeable future.


Q:When installing the perfSONAR Toolkit 3.4 netinstall image with static IP, it does not keep this when the system first boots. The BOOTPROTO option remains set to "dhcp" in the /etc/sysconfig/network-scripts/ifcfg-eth0 after the installation. Is this a bug? A:

This is a known problem with anaconda (the RHEL/CentOS installer); see the documentation at http://docs.perfsonar.net/install_centos_netinstall.html#step-by-step-guide for more information. The network settings entered in the installer are only expected to be used during the installation process and may be discarded once it completes, so after installation you may need to re-enter them.


Q:How can I force testing over IPv4 or IPv6 in the mesh configuration?
A:

 

There are ipv4_only and ipv6_only options that you can set in the test parameters of a mesh config. Setting both at the same time gives an error.


Q:When trying a clean install with perfSONAR Toolkit version 3.4, the system doesn't recognize any disks/doesn't see my RAID controller. Things work with other systems (CentOS 6, Scientific Linux 6). What should I do?
A:

 

If you have started with a different OS, you can attempt to install the necessary packages manually.

Once it’s installed, reboot the machine.

Alternatively, if you are upgrading from a LiveCD, use 'dban' to wipe out the data partition placed there from previous use.



Q:Which repository addresses will be used to get updates to the perfSONAR software?
A:

By default, the Internet2 repo points at a mirror list hosted by software.internet2.edu. In this mirror list is linux.mirrors.es.net. In order to use the default configuration you will need to allow access to software.internet2.edu so you can grab the mirrorlist. After that, the packages can be downloaded from any of the sites listed which includes linux.mirrors.es.net, software.internet2.edu, and a few other places. You should be able to get away with just opening up access to software.internet2.edu (so it can get the mirror list) and linux.mirrors.es.net (so you can get the packages). Those should be the only places you need as linux.mirrors.es.net also has a mirror for all the base CentOS packages.

The initial version of 3.4 contained a pointer to another repo hosted only on datastax.com, but this was dropped for release 3.4.1 and later. If you do encounter this repo, you can safely run 'yum remove datastax-repo', as the packages it provides have been copied to the Internet2 repo.



Q:I see that the latency/loss tests in 3.4 use bwping. Does this mean that we can run both throughput and latency/loss tests on the same interface without interference due to the way bwctl scheduling works?
A:

Ping tests can be scheduled on the same host as throughput tests, but OWAMP tests currently cannot (they use powstream instead of bwping).



Q:Where can I find documentation on interacting with perfSONAR via a custom client?
A:

Note: the release of perfSONAR 3.5.1 changed the location of many binaries and configuration files. See http://docs.perfsonar.net/config_files.html for more info.

Documentation is available for the REST API: http://software.es.net/esmond/perfsonar_client_rest.html, the Perl API: http://software.es.net/esmond/perfsonar_client_perl.html, and the Python API: http://software.es.net/esmond/perfsonar_client_python.html.

Setting up the environment on a perfSONAR Toolkit host can be done with these steps:

cd /opt/esmond
source /opt/rh/python27/enable
/opt/rh/python27/root/usr/bin/virtualenv --prompt="(esmond)" .
. bin/activate
pip install esmond-client


Q:When setting up a dual homed host, how can one get individual tests to use one interface or another?
A:

When configuring regular tests in the GUI, you select the interface that each test will use.

The easiest way to force bwctld and owampd to listen only on their respective interfaces is to set the ‘srcnode’ variable in /etc/bwctld/bwctld.conf and /etc/owampd/owampd.conf, and then restart bwctld and owampd. Look at owampd.conf.default and bwctld.conf.default in those same directories for an example.

http://docs.perfsonar.net/manage_dual_xface.html



Q:Which licenses do perfSONAR products use?
A:

BWCTL, OWAMP, and perfSONAR components are licensed under the Apache 2.0 license. NDT has a slightly different license: https://ndt.googlecode.com/svn/trunk/copyright.html based on an open source policy. Other products on the perfSONAR toolkit (NPAD, Web100 + other components such as Dojo) may have other restrictions worth checking into.



Q:When attempting to publish data to esmond, a "301 MOVED PERMANENTLY" is returned. What should I do?
A:

If using the examples from:

When querying the local esmond instance, the following could be returned:

HTTP/1.1 301 MOVED PERMANENTLY
Date: Sat, 18 Apr 2015 01:05:19 GMT
Server: Apache/2.2.15 (CentOS)
Location: http://localhost/esmond/perfsonar/archive/
Content-Length: 0
Connection: close
Content-Type: text/html; charset=utf-8

Check the script and make sure the URL is:

http://localhost/esmond/perfsonar/archive/

and not

http://localhost/esmond/perfsonar/archive
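The two URLs differ only in the trailing slash. A script that builds the archive URL can normalize it before sending requests. The helper below is a minimal sketch, and the name ensure_trailing_slash is hypothetical.

```shell
# Append a trailing slash to a URL unless it already ends with one.
# Usage: ensure_trailing_slash URL
ensure_trailing_slash() {
    case "$1" in
        */) printf '%s\n' "$1" ;;     # already ends with a slash
        *)  printf '%s/\n' "$1" ;;    # add the missing slash
    esac
}
```

For example, ensure_trailing_slash http://localhost/esmond/perfsonar/archive prints the form that avoids the redirect.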


Q:Is it possible to use non-Intel SFP+ optics in the Intel X520-SR2 NIC?
A:

The ixgbe driver has an option to allow alternative optics:

allow_unsupported_sfp=1

This can be tested using the following commands:

sudo modprobe -r ixgbe
sudo modprobe ixgbe allow_unsupported_sfp=1
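An option passed on the modprobe command line lasts only until the module is next reloaded. One conventional way to keep it across reboots is an options line in a file under /etc/modprobe.d. The sketch below is an assumption on top of the original answer: the function name and the file name ixgbe.conf are hypothetical, and if the driver is loaded from the initramfs on your system, further steps may be needed.

```shell
# Write a modprobe options file so the ixgbe setting is applied on every load.
# Usage: persist_ixgbe_sfp_option [directory]   (defaults to /etc/modprobe.d)
persist_ixgbe_sfp_option() {
    dir="${1:-/etc/modprobe.d}"
    # "options <module> <parameter>=<value>" is the standard modprobe.d syntax.
    printf 'options ixgbe allow_unsupported_sfp=1\n' > "$dir/ixgbe.conf"
}
```

Run it as root with no argument to write /etc/modprobe.d/ixgbe.conf, then reload the module to pick up the setting.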


Q:How can I clean up the data in my esmond instance?
A:

Information on this can be found here: http://docs.perfsonar.net/multi_ma_backups.html#multi-ma-backups-delete.



Q:How can I back up the data in my esmond instance?
A:

Information on this can be found here: http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_restore_c.html.

Additionally, to back up perfSONAR data from an MA see guidance on this page: http://docs.perfsonar.net/multi_ma_backups.html.  Note that some steps may destroy data. 

For PostgreSQL:

"This will delete any existing data and replace it with the backup"

For Cassandra and nodetool, restoring also overwrites existing data (via https://specs.openstack.org/openstack/trove-specs/specs/liberty/cassandra-backup-restore.html):

"A snapshot can be restored by moving all *.db files from a snapshot directory to the respective keyspace overwriting any existing files."



Q:How do I back up a perfSONAR instance or migrate the configuration and data to a new machine?
A:

Note: the release of perfSONAR 3.5.1 changed the location of many binaries and configuration files. See http://docs.perfsonar.net/config_files.html for more info.

To back up perfSONAR configurations and logs:

  • /opt/perfsonar_ps/ls_registration_daemon/etc/*
  • /opt/perfsonar_ps/mesh_config/etc/*
  • /opt/perfsonar_ps/regular_testing/etc/*
  • /opt/perfsonar_ps/toolkit/etc/*
  • /opt/perfsonar_ps/traceroute_ma/etc/*
  • /opt/perfsonar_ps/serviceTest/etc/*
  • /opt/perfsonar_ps/snmp_ma/etc/*
  • /opt/esmond/*
  • /etc/httpd/conf.d/apache-toolkit_web_gui.conf
  • /etc/owampd/*
  • /etc/bwctld/*
  • /etc/cassandra/*
  • /var/lib/pgsql/*
  • /var/lib/cassandra/*
  • /var/lib/perfsonar/*
  • /var/log/perfsonar/*

To back up perfSONAR data, see guidance on this page: http://docs.perfsonar.net/multi_ma_backups.html.
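The configuration and log locations listed above can be collected into a single archive. The function below is a minimal sketch: the name backup_paths is hypothetical, it skips paths that do not exist on the host, and it assumes no path contains spaces. Database directories copied while their services are running may not be consistent, so the linked page remains the reference for backing up data.

```shell
# Archive whichever of the given paths exist into a gzip-compressed tarball.
# Usage: backup_paths ARCHIVE PATH...
backup_paths() {
    archive="$1"
    shift
    existing=""
    for p in "$@"; do
        if [ -e "$p" ]; then
            existing="$existing $p"
        fi
    done
    if [ -z "$existing" ]; then
        echo "nothing to back up" >&2
        return 1
    fi
    # Word splitting on $existing is intentional (paths without spaces only).
    tar -czf "$archive" $existing
}
```

An example call, run as root: backup_paths /root/perfsonar-config.tgz /opt/perfsonar_ps/toolkit/etc /opt/perfsonar_ps/regular_testing/etc /etc/owampd /etc/bwctld. The archive name here is arbitrary.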



Q:How can I tune a Dell server for a low latency environment?
A:

Dell offers a guide on tuning: http://i.dell.com/sites/content/shared-content/data-sheets/en/Documents/configuring-low-latency-environments-on-dell-poweredge-12g-servers.pdf.



Q:How do I disable global registration?
A:

The following commands will stop, and disable, this service:

/sbin/service ls_registration_daemon stop
chkconfig ls_registration_daemon off

Q:I have a measurement archive machine with esmond running, and there is a separate disk partition mounted on the machine where I want to store all the incoming measurement data from measurement points. What is the proper way to change the default directory location for storing the measurement archive data?
A:

Change the directories where cassandra and postgres store data. This is controlled through the respective tools' configuration files and not through esmond directly.

  • For cassandra, in /etc/cassandra/conf/cassandra.yaml change the commitlog_directory, data_file_directories and saved_caches_directory to the new locations you desire. Restart cassandra: /sbin/service cassandra restart
  • For postgres, in /var/lib/pgsql/data/postgresql.conf change the data_directory to the new location. Restart postgres: /etc/init.d/postgresql restart. You will need to rebuild the esmond tables after this change is made. To do so, follow the instructions here: http://software.es.net/esmond/rpm_install.html#configuration

An alternative is to stop both postgres and cassandra, move /var/lib/cassandra and /var/lib/pgsql to the new partition, and then create symlinks at /var/lib/cassandra and /var/lib/pgsql that point to the new locations. That avoids the need to rebuild the postgres tables and preserves any existing data.
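The move-and-symlink approach can be sketched as a small function. The name relocate_dir and the /data mount point used in the usage note are hypothetical. The function does not stop or start any services and does not check ownership or SELinux labels, so treat it as an outline, not a tested procedure.

```shell
# Move a directory under a new parent and leave a symlink at the old path.
# Usage: relocate_dir SOURCE_DIR NEW_PARENT
relocate_dir() {
    src="$1"
    dest_parent="$2"
    target="$dest_parent/$(basename "$src")"
    # Refuse to act on something that is missing or already a symlink.
    if [ ! -d "$src" ] || [ -L "$src" ]; then
        echo "$src is not a real directory" >&2
        return 1
    fi
    # Refuse to overwrite anything at the destination.
    if [ -e "$target" ]; then
        echo "$target already exists" >&2
        return 1
    fi
    mv "$src" "$target" && ln -s "$target" "$src"
}
```

With postgres and cassandra stopped, and assuming the new partition is mounted at /data, the calls would be relocate_dir /var/lib/cassandra /data and relocate_dir /var/lib/pgsql /data, followed by starting both services again.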


Q:How can I nuke all of the data in esmond, and start from scratch?
A:

Note: the release of perfSONAR 3.5.1 changed the location of many binaries and configuration files. See http://docs.perfsonar.net/config_files.html for more info.

Note: many of these commands must be run as root, and they will destroy existing data and metadata collections. An additional step to load the esmond key into a regular testing file may be required (read the output of the tools to know for sure):

/etc/init.d/postgresql stop
rm -rf /var/lib/pgsql/data/*
/sbin/service postgresql initdb
/sbin/service postgresql start

Wait about 20 seconds, then continue:

sudo -u postgres psql -c "CREATE USER esmond WITH PASSWORD '7hc4m1'"
sudo -u postgres psql -c "CREATE DATABASE esmond"
sudo -u postgres psql -c "GRANT ALL ON DATABASE esmond to esmond"
cp -f /opt/perfsonar_ps/toolkit/etc/default_service_configs/pg_hba.conf /var/lib/pgsql/data/pg_hba.conf
/sbin/service postgresql restart
sed -i "s/sql_db_name = .*/sql_db_name = esmond/g" /opt/esmond/esmond.conf
sed -i "s/sql_db_user = .*/sql_db_user = esmond/g" /opt/esmond/esmond.conf
sed -i "s/sql_db_password = .*/sql_db_password = 7hc4m1/g" /opt/esmond/esmond.conf
/opt/perfsonar_ps/toolkit/scripts/system_environment/configure_esmond 

Q:I am running a small node, and seeing a lot of IO. What is going on?
A:

Some users report abnormalities on their small nodes related to I/O activity (e.g. iostat reports long w_await times - sometimes measured in multiple seconds). These coincide with intervals of testing, in particular related to OWAMP.

Deeper investigation found that there is too much I/O going on: syslogd and systemd-journald process syslog messages from owampd, bwctld, and powstream into “/var/log/messages”, sometimes up to 30-40 syslog messages per second depending on the testing configuration of a host. Given that small nodes are based on flash memory, changes should be made to ensure a more balanced approach to logging:

  • Keep the journal in memory by editing “/etc/systemd/journald.conf”.
    • Set the option “Storage=volatile” instead of the default “Storage=auto”. Make sure to limit the maximum memory used for journaling; you can do this by adjusting the “RuntimeKeepFree” and “RuntimeMaxUse” options.
    • Don’t *restart* the journaling service (i.e., don’t do “systemctl restart systemd-journald”). Do an *OS reboot* instead.
  • Turn down owampd syslog level.
    • For perfSONAR before 3.5rc2, the only way is to comment out the “verbose” option in “/etc/owampd/owampd.conf” (so that it defaults to non-verbose). For perfSONAR 3.5rc2, I believe you can set the logging level in the configuration.
    • Restart owampd service.

Q:I want to operate a "Dynamic" Maddash Mesh with hosts from a lookup service. Where do I start? A:

There is information on this method of mesh configuration available at the following link:

The server and agent each have needs regarding the definition of tests, information on each can be found via these links:


Q:How to get rid of "There isn't a perfSONAR sudo user defined" message? A:

Note: the release of perfSONAR 3.5.1 changed the location of many binaries and configuration files. See http://docs.perfsonar.net/config_files.html for more info.

Either add a non-root user to the pssudo group or remove the line /opt/perfsonar_ps/toolkit/scripts/add_pssudo_user --auto from /root/.bashrc. Note that future updates could revert the /root/.bashrc file.


Q:After restarting my updated 3.5 hosts, why is the regular testing service shown as not running? A:

Note: the release of perfSONAR 3.5.1 changed the location of many binaries and configuration files. See http://docs.perfsonar.net/config_files.html for more info.

In the ‘regular_testing.log’ file the following error may appear:

2015/09/29 09:01:36 (8074) ERROR> ConfigFile.pm:101 perfSONAR_PS::RegularTesting::Utils::ConfigFile::parse_file - Line 309 malformed
2015/09/29 09:01:36 (8074) ERROR> daemon:84 main:: - Problem parsing configuration file: Line 309 malformed

This can happen if you have an owampd install that was initially done in early 2013 or prior and you never set receive_ports in your regular_testing.conf. You just need to clean up the offending line in /opt/perfsonar_ps/regular_testing/etc/regular_testing.conf to correct the issue. The root cause should be fixed in the RPM that went out on September 29th, 2015 (version 3.5.0.1).


Q:How does version numbering work for the perfSONAR project? A:

See https://github.com/perfsonar/project/wiki/Versioning if you are interested in learning about our version numbering scheme.


Q:Is it possible to run multiple simultaneous tests for NDT? A:

Network measurement is an inherently resource intensive operation for the host that must manage a test. In particular, the CPU, the memory system, and the network interface must be working at peak efficiency to guarantee an accurate measurement for many metrics (e.g. network throughput).

The perfSONAR project would like to caution against the practice of running multiple simultaneous tests for any tool for these reasons - there is no guarantee that resources can be split evenly, and produce accurate network results.

It is possible to configure the NDT server to allow for multiple tests. Note that this is not the default behavior and is not encouraged, given the risk of producing false data. To enable this behavior, add the -m option to your ndt server command. Assuming installation via the NDT RPM, you can update the existing WEB100SRV_OPTIONS variable to include -m in the file /etc/sysconfig/ndt. Run /etc/init.d/ndt restart after the change.
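The edit to /etc/sysconfig/ndt can be scripted. The sketch below assumes that WEB100SRV_OPTIONS is written as a double-quoted assignment on a single line and that GNU sed is available (as on CentOS); the function name is hypothetical. It leaves the file unchanged if -m is already present.

```shell
# Append -m to WEB100SRV_OPTIONS unless it is already there.
# Usage: add_ndt_multi_test_flag [file]   (defaults to /etc/sysconfig/ndt)
add_ndt_multi_test_flag() {
    conf="${1:-/etc/sysconfig/ndt}"
    # Match -m only as a standalone option inside the quoted value.
    if grep -Eq '^WEB100SRV_OPTIONS="(.* )?-m( .*)?"' "$conf"; then
        return 0
    fi
    # Insert " -m" just before the closing quote of the assignment.
    sed -i 's/^WEB100SRV_OPTIONS="\(.*\)"/WEB100SRV_OPTIONS="\1 -m"/' "$conf"
}
```

After running it as root, restart the service with /etc/init.d/ndt restart as described above.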


Q: I have some machines with the old Level 2 install option. When I update to version 3.5 what will happen? Will they become equivalent to one of the 3.5 install bundles? A:

The packages installed by the Level2 installation (bwctl, owamp, regular testing, etc) will all update if a “yum update” is performed. The Level2 install is essentially the new “TestPoint” bundle. You may install perl-perfSONAR-TestPoint bundle if desired, but there is no requirement to do so. Both the “Level2” and “TestPoint” are really just meta packages with dependencies on other packages.


Q:After upgrading to 3.5 my maddash instance won't start (e.g. HTTP fails)? A:

If after running yum update you see this error:

[user@host ~]$ sudo /etc/init.d/httpd start
Starting httpd: Syntax error on line 1 of /etc/httpd/conf.d/apache-esmond.conf:
Invalid command 'WSGIScriptAlias', perhaps misspelled or defined by a module not included in the server configuration
                                                         [FAILED]

There could be a problem with the version of one of the WSGI libraries that was pulled in. You can verify it as such:

[user@host ~]$ yum list installed | grep -i wsgi
python27-mod_wsgi.x86_64             3.4-12.el6.centos.alt            @scl

Due to the specific version of python that maddash/perfSONAR requires, the workaround is to uninstall the version above, and use the version found in the perfSONAR repository:

[user@host ~]$  sudo yum erase "python27-mod_wsgi*"
[user@host ~]$  sudo yum --disablerepo="*" --enablerepo="Internet2" --enablerepo="base" install esmond

Then restart cassandra and start httpd:

[user@host ~]$  sudo /sbin/service cassandra stop
Shutdown Cassandra: OK
[user@host ~]$  sudo /sbin/service cassandra start
Starting Cassandra: OK
[user@host ~]$  sudo /sbin/service httpd start
Starting httpd: [ OK ]

Q:What is PTP? A:

PTP is the Precision Time Protocol, also known as IEEE 1588, a more accurate successor to the Network Time Protocol, which has been used for many years to discipline the clocks in general-purpose computers.


Q:How accurate is PTP? A:

Under ideal conditions, PTP can discipline a clock to within a few microseconds of UTC. Compare this with NTP, which typically has accuracy of about a millisecond when used with clocks on the Internet and 100 microseconds or less when using a stratum-1 clock in a LAN environment.


Q:What is required to use PTP in my network? A:

Unlike NTP, which provides satisfactory operation using software clients and a pool of servers usually on the Internet, running PTP requires specialized equipment:

  • Clocks. For production-grade service, PTP requires a minimum of two grandmaster clocks. These are dedicated hardware appliances that use the Global Positioning System to recover accurate time and a high-precision oscillator for holdover during periods when GPS is not available. At this writing, base model clocks cost about US$2,500 each.
  • Network Infrastructure. PTP requires that all network elements between the grandmaster and slaves be capable of functioning as a boundary clock. This is a feature typically found on high-end routers and switches designed for use in low-latency applications.
  • Network Interface Cards. Interfaces in the slave system require hardware support for the timestamping that makes PTP work accurately. While software-only PTP clients exist, they may suffer inaccuracies induced by the vagaries of running under a general-purpose operating system and provide inaccurate results when testing latency in a LAN environment.

Q:Does perfSONAR support PTP? A:

Not at this time. The prohibitive cost of deploying PTP makes it unlikely to be used widely enough to merit adding support. The current perfSONAR code contains assumptions that the clock is disciplined by NTP and would need to be modified for other protocols.


Q:I am seeing a "Can't locate object method 'ssl_opts' via package 'LWP::UserAgent'" error when trying to use a Central Measurement Archive. A:

Note: the release of perfSONAR 3.5.1 changed the location of many binaries and configuration files. See http://docs.perfsonar.net/config_files.html for more info.

In the toolkit regular_testing.conf I added the following block:

<measurement_archive>
database https://<host_fqdn>/esmond/perfsonar/archive/
password xxx
type esmond/traceroute
username perfsonar
ca_certificate_path /etc/ssl/certs
</measurement_archive>

But no measurement is being received in the MA and the regular_testing.log is full with messages similar to the following:

2015/10/15 10:55:32 (17628) ERROR> MeasurementArchiveChild.pm:125 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::__ANON__- Problem handling test results: Can't locate object method "ssl_opts" via package "LWP::UserAgent" at /opt/perfsonar_ps/regular_testing/bin/../lib/perfSONAR_PS/RegularTesting/MeasurementArchives/EsmondBase.pm line 249.

This happens because a fairly old version of perl-libwww-perl is included with CentOS 6. If you remove “ca_certificate_path” from the configuration file, things will work.


Q:I have a central MA for my perfSONAR data, what happens if the central MA goes down for a while, or the network is unavailable between the beacons and the MA? Are the measurements lost, or are they buffered to be delivered later? Does this have any effect on the timing of subsequent measurements? A:

The beacons observe a "push" architecture (i.e. the perfsonar node which made a measurement connects to the central MA to store the results) to the central MA.

If the MA goes down, the missed measurements are stored on local disk under /var/lib/regular_testing and the regular_testing daemon tries to register them when the MA comes back. If the MA is down too long, though (where “too long” varies with the system and the number of tests being run), the backlog of tests can get too big and regular_testing can’t catch up. On a Toolkit installation, we actually clean out the backlog of tests for this reason, as we have frequently seen hosts get into a state where the backlog of tests is too big.

See also the documentation for this service: http://docs.perfsonar.net/multi_overview.html


Q:What if there are multiple MAs used for a central MA architecture, but only one is down or unreachable, what will happen? A:

Each MA will be treated the same as if they were the only MA in the file. In other words, the one that is up will get the data and a backlog will be kept on disk for the one that is down. When the downed MA comes back up the daemon will try to register the old data points.


Q:If you have made manual changes to regular_testing.conf to point to a different MA (or multiple MAs), and you subsequently change test configurations through the GUI, does this leave your MA customizations alone? A:

Yes. The GUI leaves all measurement_archive blocks alone.


Q:How much memory is needed for a host running an MA? A:

Cassandra will try to use 4G of memory by default (if it's available on the system). It is possible to tweak the memory settings if you want it to use less. Read more here: http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_tune_jvm_c.html. Tuning this makes it possible to run an MA on a host with less memory.
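In Cassandra 2.0 the JVM heap is normally adjusted through two variables in cassandra-env.sh, which is itself a shell fragment. The lines below are a sketch only: the values are illustrative and are not a perfSONAR recommendation, and the location /etc/cassandra/conf/cassandra-env.sh is an assumption based on the RPM layout used elsewhere in this FAQ. Cassandra expects both variables to be set together.

```shell
# Illustrative heap limits for a small measurement archive host.
# Both values are assumptions; size them for your own memory and test load.
MAX_HEAP_SIZE="1G"      # upper bound on the JVM heap
HEAP_NEWSIZE="200M"     # size of the young generation
```

Restart cassandra after changing these values so that the new limits take effect.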


Q:I am trying to run perfSONAR on low-cost hardware (e.g. raspberry pi, etc.). Where should I start? A:

Numerous hardware platforms have emerged that are an attractive option for use in network performance measurement. The perfSONAR collaboration neither recommends nor supports the use of perfSONAR on low-end, ARM-based hardware such as the Raspberry Pi. It has been shown that it is difficult to distinguish network issues from host issues on these devices. In particular, we do not recommend these devices for testing throughput with BWCTL, NDT, or other related tools. Use of latency based tools (Ping, OWAMP) is possible provided that an accurate clock source is available.

For more information, see this web page: http://www.perfsonar.net/deploy/hardware-selection/low-cost-hardware/


Q:Why do I get such weird results when I test from a 10G-connected host to a 1G-connected host? A:

The network interface card on a host is designed to send at line rate as long as the kernel makes data available. This means that a 1G card sends at either 1G or 0G (it does not pace itself at a rate in between), and a 10G card likewise sends at 10G or not at all. When a larger-capacity machine tests to a smaller-capacity machine, the 10Gbps stream of data inevitably needs to be buffered somewhere in the path: either in one of the last-hop switches or in the device itself. If there is not enough buffering available, the data will be dropped and TCP will be forced to reduce its sending rate.

The 'choppy' behavior can be seen on this page: http://www.perfsonar.net/deploy/hardware-selection/hardware-advice/, and is challenging to mitigate unless there is a significant amount of buffer available. For these reasons, it is recommended that when designing testing scenarios, only 'like' to 'like' capacities be explored.
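
A back-of-the-envelope calculation shows why buffering matters. Assume, purely for illustration, that a burst of B bytes arrives at the faster rate while the bottleneck drains it at the slower rate. By the time the burst has fully arrived, only the fraction slow/fast of it has left, so the queue peaks at about B * (1 - slow/fast). The function name and the sample numbers below are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: estimate the peak queue (in bytes) when a burst
# arriving at fast_gbps must leave through a slow_gbps link.
# Integer arithmetic: burst * (fast - slow) / fast.
peak_queue_bytes() {
    local burst_bytes=$1 fast_gbps=$2 slow_gbps=$3
    echo $(( burst_bytes * (fast_gbps - slow_gbps) / fast_gbps ))
}

# A 1,000,000-byte burst sent at 10G into a 1G link:
peak_queue_bytes 1000000 10 1   # prints 900000
```

In this simplified model, a device with less than roughly 900 KB of free buffer on that port would drop part of the burst, which is the loss that forces TCP to back off.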


Q:My perfSONAR results show consistent line-rate performance, but a researcher at my site is reporting really poor performance, what gives? A:

perfSONAR is designed to give a "best case scenario" test result for end to end testing:

  • perfSONAR is typically installed on well-provisioned server-class hardware that contains adequate CPU, memory, and NIC support
  • By default, tunings are applied to the TCP stack: https://fasterdata.es.net/host-tuning/linux/
  • BWCTL invokes "memory to memory" test tools, i.e. the host does not read data from, or write data to, storage during testing
  • perfSONAR typically runs short, single-stream TCP tests

The user of a network may not have a machine that is as well tuned as a perfSONAR node, could be using an application that is poorly suited to data movement, and may have a bottleneck due to storage. Consider all of these factors when working with the user to identify performance issues. It is often the case that the 'network' is working fine, but the host and software infrastructure need additional attention.


Q:Is there a way to visualize GridFTP results in Maddash? A:

Please see documentation on the maddash web site.


Q:How can I configure my toolkit to allow a private IP? A:

The file resides at:

/usr/lib/perfsonar/web/root/gui/services/etc/web_admin.conf

The config option is allow_internal_addresses. Set it to 1.


Q:How do I change the default BWCTL tool? A:

The file resides at: /etc/perfsonar/regulartesting.conf


Q:Cassandra seems to be using 100% CPU on my system and I can't figure out why. A:

Cassandra features a nodetool utility that can be used to see what is going on. For instance, if you notice that Cassandra spends a lot of time 'compacting' after a restart, you can view the status like this:

[crussell@ps-bryant-bw ~]$ nodetool compactionstats
pending tasks: 1
          compaction type        keyspace               table       completed           total      unit  progress
               Compaction          esmond   rate_aggregations       140157665      2632220068     bytes     5.32%
Active compaction remaining time :        n/a

If the progress does not move after repeated runs, Cassandra may be having trouble compacting. The Cassandra log file may contain entries like the following:

 INFO [CompactionExecutor:8] 2016-04-02 12:32:18,205 CompactionController.java (line 192) Compacting large row esmond/rate_aggregations:ps:packet_loss_rate:b30f54e8df9549ceb8292278b782f05b:2015 (121215124 bytes) incrementally
 INFO [CompactionExecutor:8] 2016-04-03 04:50:45,168 CompactionController.java (line 192) Compacting large row esmond/rate_aggregations:ps:time_error_estimates:b30f54e8df9549ceb8292278b782f05b:2015 (123923983 bytes) incrementally
 INFO [CompactionExecutor:8] 2016-04-03 22:06:38,417 CompactionController.java (line 192) Compacting large row esmond/rate_aggregations:ps:packet_loss_rate:76b654c4279241f19898dcdb8cacdfb2:2015 (120871402 bytes) incrementally

It may be the case that an exceptionally large data row can't be compacted. When the size of a row exceeds 64 MB, Cassandra should try to swap to disk, and that may fail.

It is possible to edit /etc/cassandra/conf/cassandra.yaml, raise the "in_memory_compaction_limit_in_mb" value from 64 to 256, and restart Cassandra. Once the row or rows in question are compacted, you can change the value back and restart again.
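
The edit described above can be sketched as a small shell function. This is only an illustration: the function name set_compaction_limit is hypothetical, and it assumes the setting appears uncommented at the start of a line in the file.

```shell
#!/bin/bash
# Hypothetical helper: set in_memory_compaction_limit_in_mb in a
# cassandra.yaml file. Usage: set_compaction_limit FILE MEGABYTES
set_compaction_limit() {
    local file=$1 mb=$2
    # Keep a .bak copy, then rewrite the value on the matching line only.
    sed -i.bak -E "s/^(in_memory_compaction_limit_in_mb:).*/\1 ${mb}/" "$file"
}
```

For example, run set_compaction_limit /etc/cassandra/conf/cassandra.yaml 256 as root and restart Cassandra. Run it again with 64 once the compaction has completed.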


Q:How do I set up a perfSONAR node to have two interfaces on the same subnet? A:

This can be accomplished by setting the following items in sysctl:

net.ipv4.conf.default.arp_filter = 2
net.ipv4.conf.all.arp_filter = 2

More information available here:

http://z-issue.com/wp/linux-rhel-6-centos-6-two-nics-in-the-same-subnet-but-secondary-doesnt-ping/
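
One way to make the settings persistent is to append them to a sysctl configuration file. The sketch below is an assumption about how you might do that; the function name add_arp_filter_settings is hypothetical. The values are copied from the answer above, so check them against the linked article and your kernel's documentation before relying on them.

```shell
#!/bin/bash
# Hypothetical helper: append the two settings to a sysctl configuration
# file unless a line for that key already exists there.
add_arp_filter_settings() {
    local file=$1 key
    for key in net.ipv4.conf.default.arp_filter net.ipv4.conf.all.arp_filter; do
        grep -q "^${key}[[:space:]]*=" "$file" 2>/dev/null \
            || echo "${key} = 2" >> "$file"
    done
}
```

For example, run add_arp_filter_settings /etc/sysctl.conf as root, then load the file with sysctl -p.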


Q:Is there a way to re-image perfSONAR resources remotely? A:

If the intention is to use the perfSONAR ISO as the base, the installer just needs to see the installation medium as if it were a mounted CD or USB device.

As for the specifics of a remote installation mechanism, consult the documentation for your server. For instance, some servers support "virtual media" if they contain a DRAC with the enterprise feature set enabled.

For a more general solution, assuming that remote console access to the servers is available, consider a package called iPXE. iPXE can attach an ISO via iSCSI or HTTP, so all that is needed is to put up a server the remote machines can reach. The commands to do it are:

    set net0/ip 10.9.8.7
    set net0/netmask 255.255.255.0
    set net0/gateway 10.9.8.1
    set dns 10.9.8.2
    sanboot http://server.kinber.org/toolkit.iso

If there is DHCP available, the four set commands can be removed and a single dhcp command put in their place. Any HTTP server used to serve the ISO must support range requests. The standard Apache on most systems will.
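
The DHCP variant described above can be kept as a small iPXE script file. The following sketch writes one. The function name write_ipxe_script and the output file name are hypothetical, and the ISO URL is the example from above.

```shell
#!/bin/bash
# Hypothetical helper: write an iPXE script that configures the network
# via DHCP and then boots the ISO at the given URL.
write_ipxe_script() {
    local out=$1 iso_url=$2
    cat > "$out" <<EOF
#!ipxe
dhcp
sanboot ${iso_url}
EOF
}
```

For example, write_ipxe_script boot.ipxe http://server.kinber.org/toolkit.iso produces a three-line script in which the single dhcp command replaces the four set commands.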

Note that iPXE needs to be on a bootable medium, and it’s operationally better when separate from the disk in the machine. This means that remote locations will need to have something like a USB stick installed. Once in place, set the BIOS to ignore it and boot it explicitly when needed. Since it’s a regular USB device, it can be updated remotely while the main OS is running.