perfSONAR

Version 3.4 FAQ

Frequently Asked Questions

  1. How do I join the perfSONAR Collaboration?
  2. Where can I ask questions or report bugs?
  3. How do I use the NPAD command line client (diag-client)?
  4. How do I use BWCTL?
  5. How do I use the NDT command line client (web100clt)?
  6. How do I use OWAMP?
  7. Can I Use a Firewall?
  8. What should I enter for the Communities of interest configuration question?
  9. I do not think I am a member of a community, should I put anything?
  10. What are the hardware requirements for running the perfSONAR Toolkit?
  11. Does my machine have to meet the System Requirements?
  12. The colors on my Console Configuration do not match what I see on the web. Some are green already.
  13. The "Services On This Node" screen shows many services in the non-running state when first started, what is wrong?
  14. I do not see my service in the Global Set Of Services, where is it?
  15. When looking at the data display for tests I have configured, I do not see results, but I filled out information in the Scheduled Testing area. Where is my data?
  16. What is the purpose of BWCTL Limits/OWAMP Limits?
  17. How can I set limits to prevent others from overusing BWCTL/OWAMP?
  18. How many NTP servers do I need, can I select them all?
  19. Service X is listed as Not Running on the web interface. How can I restart?
  20. NTPD has exited/is not running on my machine, why did this happen and how can I fix it?
  21. BWCTL seems to exit immediately after starting/restarting. Why won't it stay in a running state?
  22. Can I boot from a USB key instead of a CD?
  23. How can I backup historical OWAMP/BWCTL data?
  24. A CVE announcement was made for the current perfSONAR Toolkit Kernel, what do I do?
  25. During the NetInstall, I see errors about a corrupt file being downloaded. What should I do?
  26. How do I change the MTU for a device?
  27. How do I change the SSL certificate used by the web server?
  28. I forgot to enable IPv6 in CentOS when I installed the toolkit. How do I enable it?
  29. Why is the static IPv6 address I assigned during the net-install process not configured when my host starts-up?
  30. The drivers that are included for my "foo" network card are old, what should I do?
  31. I setup a server from the latest ISO, but when I do a NDT test to the server, nothing happens. Is there a problem with the NDT tool?
  32. What can I do in the case of a catastrophic hardware failure?
  33. Where can I find more resources regarding timekeeping for VMWare Virtual Machines?
  34. What TCP congestion control algorithm is used by the perfSONAR Toolkit?
  35. How do I migrate an existing LiveCD installation to a NetInstall installation?
  36. My perfSONAR Toolkit instance keeps running out of space allocated via the default partitions, particularly /home is assigned more room than root (/). What can we do to correct this?
  37. Where are the relevant logs for perfSONAR services?
  38. Is it possible to change the default port for tool X?
  39. Why doesn't the perfSONAR toolkit include the most recent version of the Myricom NIC driver?
  40. My perfSONAR node is experiencing a kernel panic for a recently installed kernel, what should I do?
  41. Can I configure yum to exclude kernel packages from its update procedure?
  42. How can I configure yum to automatically update the system?
  43. How can I convert NDT/NPAD into signed JAVA applets?
  44. When attempting to use BWCTL with an IPv6 address, the command fails: bwctl: Unable to connect to 2001:468:1:11::16:66:4823. What should I do?
  45. OWPING/POWSTREAM tests are failing to certain hosts with a "Server denied test" and "Session Failed!" message. How can I debug this?
  46. How can I edit the thresholds for maddash?
  47. My host was impacted by Shellshock/Heartbleed/etc., what should I do?
  48. Are there instructions on how to grab all the necessary files for perfsonar and put them on a new install?
  49. I would like to install and patch perfsonar boxes behind a web proxy, is it possible to specify this on the grub command line?
  50. How can I add custom rules to IPTables?
  51. When installing the perfSONAR Toolkit 3.4 netinstall image with static IP, it does not keep this when the system first boots. The BOOTPROTO option remains set to "dhcp" in the /etc/sysconfig/network-scripts/ifcfg-eth0 after the installation. Is this a bug?
  52. I have yum updated from 3.3.x to 3.4 and rebooted. It is asking for a user to configure when log on again ("Once you set passwords, you can login to the web interface and finish configuration."). Is this expected? Why is it not respecting the configuration set in 3.3.2?
  53. There is a new repository in version 3.4 for Datastax, what is this for? Can it be reached over R&E networking connections?
  54. I don't see a way to control SSH via the 'Enabled Services' screen on version 3.4 of my toolkit, where did it go?
  55. When accessing graphs on my 3.4 host, I see this error: Error retrieving test data: undefined
  56. How can I write the fullinstall or netinstall ISO to a USB key for installation?
  57. I've just updated from 3.3.2 to 3.4. However, the host is indicating there is no address configured. What should I do?
  58. After upgrading from 3.3 to 3.4, I see some of the data for my Throughput/Latency tests, but not all of it is shown. Specifically the three columns of "Throughput Latency(ms) Loss(%)" are missing, only the Source and Destination nodes are displayed. If I click on a line where tests are running, a graph with the results is displayed. Only the main page seems to have lost the three columns. Is the data gone?
  59. After upgrading to 3.4 from a 3.4 RC, the web interface is not running and the following error is seen:
    /etc/init.d/httpd start
    Starting httpd: Syntax error on line 1 of /etc/httpd/conf.d/apache-esmond.conf:
    Invalid command 'WSGIScriptAlias', perhaps misspelled or defined by a module not included in the server configuration
                                                               [FAILED]
    
    What should we do?
  60. We are seeing the following error retrieving test data, this is from the django log:
      File "/opt/esmond/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 182, in _cursor
        self.connection = Database.connect(**conn_params)
      File "/opt/esmond/lib/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect
        conn = _connect(dsn, connection_factory=connection_factory, async=async)
    
    OperationalError: FATAL:  Ident authentication failed for user "esmond"
    
  61. How can I force testing over IPv4 or IPv6 in the mesh configuration?
  62. Our dashboard completely stopped working after upgrading to 3.4. All boxes in the grid are now showing the orange "Unable to retrieve data" color. What should we do?
  63. The "Enabled Services" screen on 3.4 is different than on 3.3, why were things changed?
  64. When trying a clean install with perfSONAR Toolkit version 3.4, the system doesn't recognize any disks/doesn't see my RAID controller. Things work with other systems (CentOS 6, Scientific Linux 6). What should I do?
  65. I am using Java Version 8 Update 25 and NDT/NPAD no longer work - what should I do?
  66. When trying to save administrative information, I see "Problem saving configuration: Problem saving administrative information". What should I do?
  67. When running OWAMP 3.3 on OS X Yosemite, the tests stall. What should I do?
  68. Which repository addresses will be used to get updates to the perfSONAR software?
  69. My toolkit web page (http://HOST/toolkit) won't load when visiting in a web browser. How can I debug?
  70. I see that the latency/loss tests in 3.4 use bwping. Does this mean that we can run both throughput and latency/loss tests on the same interface without interference due to the way bwctl scheduling works?
  71. Where can I find documentation on interacting with perfSONAR via a custom client?
  72. When setting up a dual homed host, how can one get individual tests to use one interface or another?
  73. We have some perfsonar servers having multiple NICs/IPs/Hostnames, the web interface appears to choose the “Primary Address” at its own will, is there a way that I can FIX the “Primary Address” manually, rather than changing from time to time?
  74. Which licenses do perfSONAR products use?
  75. When attempting to publish data to esmond, a "301 MOVED PERMANENTLY" is returned. What should I do?
  76. Is it possible to use non-intel SFP+ optics in the Intel X520-SR2 NIC?
  77. How can I clean up the data in my esmond instance?
  78. How can I backup the data in my esmond instance?
  79. How can I backup a perfSONAR instance?
  80. How can I tune a Dell server for a low latency environment?
  81. How do I disable global registration?
  82. Regular testing is running on my host, but results are not being stored into the database?
  83. A maddash that was working suddenly stopped.
  84. I have a measurement archive machine with esmond running, and there is a separate disk partition mounted on the machine where I want to store all the incoming measurement data from measurement points. What is the proper way to change the default directory location for storing the measurement archive data?
  85. I have an Intel X520 10Gbps NIC, and the load on my machine is very high. What should I do?
  86. How can I nuke all of the data in esmond, and start from scratch?
  87. I am running a small node, and seeing a lot of IO. What is going on?

Q:How do I join the perfSONAR Collaboration? A:

Please contact us at perfsonar-lead@internet2.edu.


Q:Where can I ask questions or report bugs? A:

For questions, send email to performance-node-users at internet2 dot edu. You may also join the mailing list by visiting https://lists.internet2.edu/sympa/info/perfsonar-user. For bugs, report at https://github.com/perfsonar/project/issues.


Q:How do I use the NPAD command line client (diag-client)? A:

The diag-client is a command line version of the NPAD diagnostic client. Instead of needing a web browser, this client runs the tests from a terminal window. The basic options are to provide a server name/address and the connection port. The NPAD server has two sets of ports open:

  • 8000 for HTTP traffic
  • 8001-8020 for measurement traffic

Please connect to 8100:

[knoppix@Knoppix ~]$ diag-client HOSTNAME 8100
Using: rtt 10 ms and rate 20
Connected.
Control connection established.
port = 8003
Starting test.
Parameters based on 107 ms initial RTT
peakwin=27617 minpackets=3 maxpackets=83 stepsize=8
Target run length is 608 packets (or a loss rate of 0.16447368%)
Test 1a (11 seconds): Coarse Scan
Test 1b (11 seconds): ...
...

Connecting to the HTTP port will result in the following error:

[knoppix@Knoppix ~]$ diag-client HOSTNAME 8200
Using: rtt 10 ms and rate 20
Connected.
Protocol error: bad handshake.
Please make sure you have the latest client,  and you have the correct port number

Q:How do I use BWCTL? A:

BWCTL (Bandwidth Test Controller) is a client/server program developed to simplify Iperf, iperf3, and nuttcp testing between hosts. At boot time, the toolkit starts a BWCTL server process and leaves it listening on TCP port 4823. This server may then be accessed by remote BWCTL clients. Additionally, the disk contains BWCTL client applications that can be used to test to remote instances. The BWCTL server allows TCP tests with a maximum duration of 60 seconds. To run a test to a remote BWCTL server:

  • Log on to the perfSONAR Toolkit server using the root user, or another valid userid.
  • Identify the remote server.
  • Run the bwctl -s remote-bwctl-server command to stream data for 10 seconds from the remote BWCTL server to the local instance.
  • Run the bwctl -c remote-bwctl-server command to stream data for 10 seconds from the local instance to the remote BWCTL server.
  • Results are displayed on the console or terminal window.
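
The two test directions above can be wrapped in a small helper. This is a minimal sketch and not part of the toolkit: the function name bwctl_cmd is hypothetical. It assumes only bwctl's -c (remote host receives), -s (remote host sends), and -t (duration in seconds) options, and it refuses durations above the 60 second server limit mentioned above. It prints the command instead of running it.

```shell
#!/bin/bash
# bwctl_cmd DIRECTION HOST [SECONDS]   (hypothetical helper, not shipped with BWCTL)
# Prints the bwctl command line for a test, without running it.
#   DIRECTION: "send" (local -> remote, uses -c) or "recv" (remote -> local, uses -s)
#   SECONDS:   test duration, default 10, refused above the 60 second limit
bwctl_cmd() {
    local direction="$1" host="$2" seconds="${3:-10}"
    local flag

    case "$direction" in
        send) flag="-c" ;;   # remote host is the receiver
        recv) flag="-s" ;;   # remote host is the sender
        *) echo "direction must be 'send' or 'recv'" >&2; return 1 ;;
    esac

    if [ -z "$host" ]; then
        echo "a remote host is required" >&2
        return 1
    fi

    # Accept only a whole number of seconds between 1 and 60.
    case "$seconds" in
        ''|*[!0-9]*) echo "duration must be a whole number" >&2; return 1 ;;
    esac
    if [ "$seconds" -lt 1 ] || [ "$seconds" -gt 60 ]; then
        echo "duration must be between 1 and 60 seconds" >&2
        return 1
    fi

    echo "bwctl $flag $host -t $seconds"
}
```

Review the printed command first, then run it by hand or with eval "$(bwctl_cmd recv remote-bwctl-server 20)".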

Q:How do I use the NDT command line client (web100clt)? A:

The web100clt is a command line version of the NDT diagnostic client. Instead of needing a web browser, this client runs the tests from a terminal window. This can be installed via the "ndt-client" RPM package (available via yum), or by downloading the NDT source and compiling only the client tools. 

The basic options are to provide a server name/address and the connection port. The NDT server has two sets of ports open:

  • 7123 for HTTP traffic
  • 3001, 3002, 3003 for measurement traffic

Please connect to 3001, 3002, or 3003:

[knoppix@Knoppix ~]$ web100clt -n HOSTNAME -p 3001
Testing network path for configuration and performance problems  --  Using IPv4 address
Checking for Middleboxes . . . . . . . . . . . . . . . . . .  Done
checking for firewalls . . . . . . . . . . . . . . . . . . .  Done
running 10s outbound test (client to server) . . . . .  164.00 kb/s
running 10s inbound test (server to client) . . . . . . 13.40 Mb/s
The slowest link in the end-to-end path is a a 622 Mbps OC-12 subnet
Information [C2S]: Packet queuing detected: 16.95% (local buffers)
Information [S2C]: Packet queuing detected: 67.10% (local buffers)
Server '128.193.128.237' is not behind a firewall. [Connection to the ephemeral port was successful]
Client is not behind a firewall. [Connection to the ephemeral port was successful]
Packet size is preserved End-to-End
Server IP addresses are preserved End-to-End
Client IP addresses are preserved End-to-End

Connecting to the HTTP port (or other ports) will result in the following error:

[knoppix@Knoppix ~]$ web100clt -n HOSTNAME -p 7123
Testing network path for configuration and performance problems  --  Using IPv4 address
Information: The server 'HOSTNAME' does not support this command line client

Q:How do I use OWAMP? A:

OWAMP (One-Way Active Measurement Protocol) is a client/server program that was developed to provide delay and jitter measurements between two target computers. At boot time, the perfSONAR Toolkit starts an OWAMP server process and leaves it listening on TCP port 861. This server may then be used by remote clients. Additionally, the disk contains an OWAMP client application (owping) that can be used to test to remote instances. By default, the OWAMP server sends a low-level data stream in each direction and measures the one-way delay and jitter between the two hosts. Separate measurements, one for each direction, are reported to the user at the end of the test. To run a test to a remote OWAMP server:

  • Log on to the perfSONAR Toolkit server using a valid userid.
  • Identify the remote server.
  • Run the owping remote-owamp-server command to make a pair of 10 second delay measurements (one in each direction) between the remote OWAMP server and the local instance.
  • Results are displayed on the console or terminal window.

Q:Can I Use a Firewall? A:

Please see the page on the perfSONAR website that discusses firewall use.


Q:What should I enter for the Communities of interest configuration question? A:

This question can be confusing for new users. The goal is to associate some loosely coupled labels with the data that the perfSONAR Toolkit disk will be making available to the larger world. Think of this step as similar to assigning labels to photos or music. Some examples of valid answers are:

  • Internet2 - The data made available connects to the Internet2 backbone
  • LHC (CMS, ATLAS, etc.) - The disk is part of the LHC deployment structure
  • eVLBI - The disk is a part of the larger telescope community
  • MAX - A connector or member of the MAX gigapop
  • DOE-SC-LAB - US Department of Energy Office of Science Labs

Use as many community names as necessary to properly categorize the data from the installation.


Q:I do not think I am a member of a community, should I put anything? A:

Communities are not required, but they allow other individuals and organizations to find and use your services. It is a good practice to join as many as you may think are applicable.


Q:What are the hardware requirements for running the perfSONAR Toolkit? A:

See this page. Note that the perfSONAR Toolkit development team has not created hard minimum or maximum requirements - the perfSONAR Toolkit will function on almost any form of hardware. Performance considerations do favor meeting or exceeding the minimum guidelines.


Q:Does my machine have to meet the System Requirements? A:

There is nothing on the perfSONAR Toolkit that will prevent systems that do not meet the requirements from starting. Erroneous or inaccurate behavior is possible if the hardware cannot support the measurement tools.


Q:The colors on my Console Configuration do not match what I see on the web. Some are green already. A:

If you are upgrading from a previous version of the perfSONAR Toolkit, the colors may be green already because a particular aspect was configured previously. These do not need to be configured again.


Q:The Services On This Node screen shows many services in the non-running state when first started, what is wrong? A:

Many of the services will be in this state because they are missing some key configuration items (e.g. from the Administrative Info). After following the configuration steps, check this screen again; most services should then be functional.


Q:I do not see my service in the Directory Of Services, where is it? A:

Much like DNS, the information that will populate the Lookup Service will take time to propagate. Please allow some time (e.g. a few hours) before your service will be fully visible.


Q:When looking at the data display for tests I have configured, I do not see results, but I filled out information in the Scheduled Testing area. Where is my data? A:

There are several possible reasons for not seeing data:

  • Data from regular tests will take time to collect. In the case of a 4 hour testing interval, data may arrive anytime in a 4 hour window.
  • Database corruption may prevent the measurement tools from storing the results. Check the MySQL daemon (e.g. ensure it is running and can be accessed via the mysql -u root command).
  • The measurement tools could fail to complete due to:
    • Firewalls blocking access to key ports at the source, destination, or middle connectivity of the test hosts. Attempt to run a manual test with the target tool to further explore this potential problem.
    • NTP has not fully synchronized on the source or destination host. BWCTL and OWAMP rely on stable clocks. If the clock is not stable due to system restart, lack of the NTP daemon running, or extreme clock skew (e.g. caused by using NTP servers that are not close enough to the target system), the measurement tools will fail to complete.
    • Version mismatch between source and destination systems. This occurrence is rare, and running a manual test with the target tool is the best way to further debug this problem.
    • Changes were made to the /etc/hosts file to re-direct traffic from the system's DNS name to the loopback (127.0.0.1) address. The following shows a modified version of /etc/hosts that would cause measurements to fail:
      127.0.0.1	DNS_HOSTNAME localhost.localdomain localhost
      ::1		localhost6.localdomain6 localhost6
      
      It is suggested that /etc/hosts be similar to the following:
      127.0.0.1	localhost.localdomain localhost
      HOST_IP		DNS_HOSTNAME
      ::1		localhost6.localdomain6 localhost6 
      
      The errors that may result from this modification, and will show up in the syslog output, are as follows for BWCTL:
      bwctl: Unable to connect to HOSTNAME:4823 
      
      And as follows for OWAMP:
      powstream[PID]: OWPControlOpen([HOSTNAME]:861): Couldn't open 'control' connection to server: Invalid argument 
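
The /etc/hosts problem described above can be checked with a short script. This is a minimal sketch under stated assumptions: the function name hostname_on_loopback is hypothetical, and it only looks for the host's name on a 127.x.x.x or ::1 line of the file it is given.

```shell
#!/bin/bash
# hostname_on_loopback HOSTNAME [HOSTS_FILE]   (hypothetical helper)
# Succeeds (exit 0) if HOSTNAME appears on a loopback line (127.x.x.x or ::1)
# of the hosts file, which is the misconfiguration described above.
# Exits 1 if the name is not mapped to loopback.
hostname_on_loopback() {
    local name="$1" file="${2:-/etc/hosts}"
    awk -v name="$name" '
        { sub(/#.*/, "") }                    # drop comments
        $1 ~ /^127\./ || $1 == "::1" {        # loopback entries only
            for (i = 2; i <= NF; i++)
                if (tolower($i) == tolower(name)) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$file"
}
```

For example, hostname_on_loopback "$(hostname -f)" && echo "DNS name points at loopback, fix /etc/hosts" reports the condition on the local host.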
      

Q:What is the purpose of BWCTL Limits/OWAMP Limits? A:

These allow you to limit the influence that outside users have on your system performance. For example, to prevent your machine/network from being saturated with BWCTL tests, limit the duration and maximum bandwidth available. These screens allow a fine grained way to protect resources.


Q:How can I set limits to prevent others from overusing BWCTL/OWAMP? A:

BWCTL and OWAMP each have a configuration dialog that allows the administrator to limit the resources consumed. To set the limits for BWCTL, consult this section. To set the limits for OWAMP, consult this section.


Q:How many NTP servers do I need, can I select them all? A:

It is recommended that 4 to 5 close and active servers be used. The Select Closest Servers button will help with this decision. Note that some servers may not be available due to routing (e.g. non-R&E networks vs R&E networks - a common problem for Internet2 and ESnet servers).


Q:Service X is listed as Not Running on the web interface. How can I restart? A:


Q:NTPD has exited/is not running on my machine, why did this happen and how can I fix it? A:

NTPD may exit if the hardware clock on the host is too far from the true time for NTPD to correct. To step the clock to the correct time, try these commands:

 

[knoppix@Knoppix init.d]$ sudo /etc/init.d/ntp stop
Stopping NTP server: ntpd.
[knoppix@Knoppix init.d]$ sudo ntpdate owamp.newy.net.internet2.edu owamp.wash.net.internet2.edu
Looking for host owamp.newy.net.internet2.edu and service ntp
host found : eth-0.nms-rlat.newy32aoa.net.internet2.edu
Looking for host owamp.wash.net.internet2.edu and service ntp
host found : eth-1.nms-rlat.wash.net.internet2.edu
27 Jan 13:42:51 ntpdate[14891]: adjust time server 2001:468:9:12::16:34 offset -0.001660 sec
[knoppix@Knoppix init.d]$ sudo /etc/init.d/ntp restart
Stopping NTP server: ntpd.
Starting NTP server: ntpd.

 

 

If NTPD continues to exit on a periodic basis, there may be a hardware failure. Consult the machine's BIOS to see if there may be problems with the hardware clock or internal battery.

 


Q:BWCTL seems to exit immediately after starting/restarting. Why won't it stay in a running state? A:

BWCTL relies on NTP (Network Time Protocol) to have an accurate representation of time for measurements. BWCTL (and ntpd) will simply exit if the system clock is too far from the recognized time. To check the status of the ntpd daemon:

 

[knoppix@Knoppix ~]$ ps axw | grep ntpd
 5146 ?        Ss     0:00 /usr/sbin/ntpd -p /var/run/ntpd.pid -u 115:121 -g

 

 

If ntpd is not running, only the grep process itself is listed:

 

 

[knoppix@Knoppix ~]$ ps axw | grep ntpd
 5140 pts/0    R+     0:00 grep ntpd

 

 

To bring your system clock back up to date, try the following steps:

 

 

sudo /etc/init.d/ntp stop 
sudo ntpdate owamp.newy.net.internet2.edu
sudo /etc/init.d/ntp start

 

 

To check your system clock on the perfSONAR Toolkit, try the following command (after restarting ntpd):

 

 

[knoppix@Knoppix ~]$ ntpq -p -c rv
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*chronos.es.net  .PPS.            1 u   30   64    1   25.016    7.335   0.170
+navobs1.oar.net .USNO.           1 u   29   64    1    7.299    5.607   0.065
+tick.usno.navy. .USNO.           1 u   28   64    1   40.144    7.690   2.046
-2001:468:1:12:: 130.207.244.240  2 u   27   64    1   27.212    5.913   0.052
-2001:468:2:12:: 64.57.16.34      2 u   26   64    1   30.335    5.501   0.047
assID=0 status=0644 leap_none, sync_ntp, 4 events, event_peer/strat_chg,
version="ntpd 4.2.2p4@1.1585-o Sun Nov 22 16:42:02 UTC 2009 (1)",
processor="i686", system="Linux/2.6.27.37-web100", leap=00, stratum=2,
precision=-20, rootdelay=25.016, rootdispersion=946.806, peer=21397,
refid=198.124.252.90,
reftime=cf7c5bc9.8f99a127  Fri, Apr 23 2010 13:47:53.560, poll=6,
clock=cf7c5bf1.8e0ffeba  Fri, Apr 23 2010 13:48:33.554, state=4,
offset=7.335, frequency=-47.691, jitter=1.113, noise=2.593,
stability=0.020, tai=0

 

 

If you find that your clock is stopping on a regular basis, the internal battery of your server may be failing. Consult your server user's manual or on-line references for more information.
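
The peer table printed by ntpq -p can also be checked from a script. The following is a minimal sketch, assuming the column layout shown in the output above; the function name ntp_sync_peer is hypothetical. It reads the table on standard input and reports the peer marked with "*", which is the one ntpd has selected for synchronization.

```shell
#!/bin/bash
# ntp_sync_peer   (hypothetical helper)
# Reads `ntpq -p` output on stdin and prints "PEER OFFSET_MS" for the peer
# marked with "*" (the one ntpd is currently synchronized to).
# Exits non-zero if no peer is marked, i.e. the clock is not yet synchronized.
ntp_sync_peer() {
    awk '
        /^\*/ {
            print substr($1, 2), $9   # strip the "*" tally; field 9 is the offset
            found = 1
            exit
        }
        END { exit(found ? 0 : 1) }
    '
}
```

A typical call is ntpq -p | ntp_sync_peer || echo "ntpd has not selected a peer yet". Shortly after a restart no peer is marked, so allow a few polling intervals before treating that as a failure.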

 


Q:Can I boot from a USB key instead of a CD? A:

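The perfSONAR Toolkit Netinstall and Live USB images can be written to a USB stick instead of a CD. To write an image to the media, we recommend using dd. In the example below, replace /dev/disk3 with the device name of your USB stick; dd overwrites everything on the target device: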

 

sudo dd if=pS-Performance_Toolkit-3.4-FullInstall-x86_64.iso of=/dev/disk3

 


Q:How can I backup historical OWAMP/BWCTL data? A:

These steps will get rid of all data older than 3 months (adjust the 'maxmonths' if you want to allow more to stay in the db), and store them as SQL dumps on the host. Here are the steps to clean the database on the throughput machine:

 

sudo mkdir -p /var/log/BACKUP/bwctl
sudo /opt/perfsonar_ps/perfsonarbuoy_ma/bin/clean_pSB_db.pl \
--mysqldump-opts="--skip-lock-tables" --dbtype=bwctl --maxmonths=3 \
--owmesh-dir=/opt/perfsonar_ps/perfsonarbuoy_ma/etc/ \
--dumpdir=/var/log/BACKUP/bwctl

 

 

Run the following command on the latency machine to archive data older than 30 days (adjust the 'maxdays' to keep more):

 

 

sudo mkdir -p /var/log/BACKUP/owamp
sudo /opt/perfsonar_ps/perfsonarbuoy_ma/bin/clean_pSB_db.pl \
--mysqldump-opts="--skip-lock-tables" --dbtype=owamp --maxdays=30 \
--owmesh-dir=/opt/perfsonar_ps/perfsonarbuoy_ma/etc/ \
--dumpdir=/var/log/BACKUP/owamp

 


Q:A CVE announcement was made for the current perfSONAR Toolkit Kernel, what do I do? A:

The perfSONAR development effort subscribes to all major outlets that will announce kernel CVEs. In the event that a CVE is announced that directly affects operation of the pS Performance Toolkit, the following steps will take place:

  • Announcements regarding the CVE will be posted to the perfsonar-user and perfsonar-announce mailing lists. More information on the mailing lists can be found at https://lists.internet2.edu/sympa/info/perfsonar-user and https://lists.internet2.edu/sympa/info/perfsonar-announce
  • A timeline will be relayed regarding availability of new kernels.
  • The CentOS project will make the patched kernel available first, and it will be available through the yum repositories on the toolkit before the perfSONAR project is able to apply the web100 patches.
    • It is strongly suggested that perfSONAR Toolkit users utilizing the NetInstall option upgrade immediately. Note this will break support for the NDT and NPAD tools, but use of measurement tools is secondary to security. Run the following command:
      sudo yum update
      
    • perfSONAR Toolkit users who rely on the LiveCD option, through April of 2015, can either continue using their instance, or shut it down depending on the severity of the CVE. Note that the project will announce if a new LiveCD will be available, depending on the severity of the CVE and impact to users.
  • The perfSONAR project will release a web100-patched version of the most recent kernel and make it available through the yum repo. Announcements will be made again to the mailing lists.

Q:During the NetInstall, I see errors about a corrupt file being downloaded. What should I do? A:

During the NetInstall, you may see some errors about a corrupt file being downloaded along with buttons like Reboot and Retry. This happens if it fails to download an RPM from a mirror, which can happen for numerous reasons. Usually, that error can be solved by hitting Retry. You may have to hit that multiple times depending on which mirrors the install is trying to download the RPM from.


Q:How do I change the MTU for a device? A:

You can view the MTU of your network devices by executing the /sbin/ifconfig command.

To temporarily change the MTU for a device, you use the ifconfig command and specify the device and the new MTU. For example:

 

ifconfig eth0 mtu 9000 up

 

 

To make these changes permanent you need to modify the specific device's configuration file. These files are in /etc/sysconfig/network-scripts/ and have names like ifcfg-eth0 for the device eth0 and ifcfg-eth1 for eth1.

 

  1. For example you could add the line MTU="9000" for IPv4 or IPV6_MTU="9000" for IPv6 to /etc/sysconfig/network-scripts/ifcfg-eth0.
  2. After making the changes you need to restart the network services by running the command 'service network restart' as root.
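
The permanent change in step 1 can be scripted so that repeated runs do not add duplicate MTU lines. This is a minimal sketch, not a toolkit utility: the function name set_ifcfg_mtu is hypothetical, it handles only the IPv4 MTU setting, and it assumes the file uses the plain KEY="value" layout of the ifcfg files described above.

```shell
#!/bin/bash
# set_ifcfg_mtu FILE MTU   (hypothetical helper)
# Sets MTU="<MTU>" in an ifcfg-style file: an existing MTU= line is replaced,
# otherwise the setting is appended. IPV6_MTU is left untouched.
set_ifcfg_mtu() {
    local file="$1" mtu="$2"

    case "$mtu" in
        ''|*[!0-9]*) echo "MTU must be a whole number" >&2; return 1 ;;
    esac
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }

    if grep -q '^MTU=' "$file"; then
        # Rewrite the existing line in place, keeping a .bak copy of the original.
        sed -i.bak "s/^MTU=.*/MTU=\"$mtu\"/" "$file"
    else
        # Make sure the file ends with a newline before appending.
        if [ -n "$(tail -c 1 "$file")" ]; then
            echo >> "$file"
        fi
        echo "MTU=\"$mtu\"" >> "$file"
    fi
}
```

Run as root, for example set_ifcfg_mtu /etc/sysconfig/network-scripts/ifcfg-eth0 9000, and then restart the network service as described in step 2.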

Q:How do I change the SSL certificate used by the web server? A:

The toolkit by default generates a self-signed SSL certificate that it configures for use with the Apache web server. Some users may desire to replace this certificate with a certificate signed by a certificate authority (CA).

You may also need to replace the certificate due to a problem sometimes encountered with browsers not accepting the self-signed certificate. You may see an error like the following:

 

    HOST uses an invalid security certificate.
    The certificate is not trusted because it is self-signed.
    The certificate is only valid for localhost.localdomain
    (Error code: sec_error_untrusted_issuer)

 

 

You can find instructions for installing a new certificate in Apache here.

 


Q:I forgot to enable IPv6 in CentOS when I installed the toolkit. How do I enable it? A:

It is recommended that you always enable IPv6 during the CentOS installation portion of the toolkit setup. If you did not enable it, then you can do so with the following steps:

  1. Login to the toolkit as a user capable of running sudo
  2. Run sudo and enter your sudo password
  3. Open the file /etc/modprobe.conf in a text editor and remove the following lines:
    alias net-pf-10 off
    alias ipv6 off
    options ipv6 disable=1
    
  4. Restart the host
  5. You can now assign an IPv6 address.
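
Step 3 can be done without a text editor. The following is a minimal sketch, assuming the three lines appear in the file exactly as shown above (no extra whitespace); the function name enable_ipv6_in_modprobe is hypothetical. It keeps a backup copy of the original file.

```shell
#!/bin/bash
# enable_ipv6_in_modprobe [FILE]   (hypothetical helper)
# Removes the three lines that disable IPv6 from a modprobe configuration
# file, keeping the original as FILE.bak. A reboot is still needed afterwards.
enable_ipv6_in_modprobe() {
    local file="${1:-/etc/modprobe.conf}"
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }

    cp -p "$file" "$file.bak" || return 1
    # -x matches whole lines only, -F treats the patterns as fixed strings,
    # -v keeps every line that does not match. grep exits 1 when it prints
    # nothing (every line was removed), which is not an error here.
    grep -vxF \
        -e 'alias net-pf-10 off' \
        -e 'alias ipv6 off' \
        -e 'options ipv6 disable=1' \
        "$file.bak" > "$file" || [ $? -eq 1 ]
}
```

Run it as root with no argument to edit /etc/modprobe.conf, then continue with step 4.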

Q:Why is the static IPv6 address I assigned during the net-install process not configured when my host starts-up? A:

When you perform the net-install of the toolkit, you will be prompted twice to enter networking information by CentOS. The first time is to define the networking to be used for downloading required packages. The second prompt is later in the installation and defines what will be configured on the host post-installation. It is a known CentOS behavior that IPv6 information entered at the first prompt is not automatically filled-in at the second prompt. This can be confusing because the IPv4 information does get automatically filled-in. If you do not manually enter the IPv6 information a second time, then your host will not have the address configured post-installation. You will have to manually assign the address if this happens.


Q:The drivers that are included for my "foo" network card are old, what should I do? A:

The perfSONAR Toolkit includes drivers from the upstream CentOS distribution and a very limited number of specially built drivers for cards that are known to be widely used (e.g. Myricom). If you notice that there is a newer driver available than what the perfSONAR Toolkit provides, you can do the following:

  • Download the source or RPM from the manufacturer's web site and build it on your perfSONAR Toolkit host
  • If you are running a netinstall version of the perfSONAR Toolkit, run modprobe after installation to load the new version
  • If you are running a LiveCD, you will need to modify the local init.d script to force modprobe on each reboot to load the proper version

Q:I setup a server from the latest ISO, but when I do a NDT test to the server, nothing happens. Is there a problem with the NDT tool? A:

First check to be sure that the NDT server is running:

 

/etc/init.d/ndt status
web100srv (pid 2430) is running...
fakewww (pid 12737) is running...

 

 

Also check to be sure you are running a web100 enabled kernel:

 

 

uname -a
Linux HOSTNAME 2.6.18-274.7.1.el5.web100 #1 SMP Wed Nov 2 08:35:05 PDT 2011 i686 i686 i386 GNU/Linux

 

 

If you are not running a web100 related kernel, use

 

 

yum update

to be sure you are running the latest version, and restart NDT if necessary.
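
The kernel check can be scripted for use in monitoring. This is a minimal sketch that assumes, as in the uname output above, that web100 kernels carry "web100" in their release string; the function name is_web100_kernel is hypothetical.

```shell
#!/bin/bash
# is_web100_kernel [RELEASE]   (hypothetical helper)
# Succeeds if the kernel release string contains "web100".
# With no argument, the running kernel (uname -r) is checked.
is_web100_kernel() {
    local release="${1:-$(uname -r)}"
    case "$release" in
        *web100*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

For example, is_web100_kernel || echo "not running a web100 kernel" prints a warning when the running kernel lacks the patch.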

 


Q:What can I do in the case of a catastrophic hardware failure? A:

In the case of a catastrophic hardware failure you may be able to recover some or all of your data files if you have a recent backup. This section contains information on backing up your system.

If you don’t have a recent backup and you can read your disk, you may still be able to recover your data. These scripts (number 1 and number 2) copy your data files and toolkit configuration to a tar archive. You can unpack this archive in the root directory of a fresh toolkit installation to restore your data and configuration. You may also want to copy your /home and /root directories to the new installation.

Note that there is no guarantee of a full recovery. Depending on the condition of your system, you may or may not have success.


Q:Where can I find more resources regarding timekeeping for VMWare Virtual Machines? A:

VMWare has two resources worth reading:


Q:What TCP congestion control algorithm is used by the perfSONAR Toolkit? A:

The perfSONAR toolkit uses the CentOS 6 default TCP congestion control algorithm, which is htcp. However, if you enable NDT or NPAD, these tools require the Reno congestion control algorithm, so the default congestion control algorithm for the host is changed to reno.
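To see which algorithm a host is actually using, the kernel exposes the current setting under /proc. The sketch below is a minimal illustration: the function name current_congestion_control is hypothetical, and the optional file argument exists only so the helper can be pointed at a test file.

```shell
#!/bin/sh
# Hypothetical helper: print the congestion control algorithm named in a
# sysctl-style file. Defaults to the live kernel setting under /proc.
current_congestion_control() {
    cc_file="${1:-/proc/sys/net/ipv4/tcp_congestion_control}"
    # The file holds a single word such as "htcp" or "reno".
    read -r algo < "$cc_file"
    printf '%s\n' "$algo"
}
```

Running `current_congestion_control` with no argument on a toolkit host should print htcp, or reno if NDT or NPAD has been enabled, according to the description above.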


Q:How do I migrate an existing LiveCD installation to a NetInstall installation? A:

See this guide: http://docs.perfsonar.net/manage_update.html#migrating-from-a-3-3-2-livecd-liveusb


Q:My perfSONAR Toolkit instance keeps running out of space allocated via the default partitions, particularly /home is assigned more room than root (/). What can we do to correct this? A:

Anaconda (the CentOS 6 installer) by default splits up the logical volumes into / and /home. We actually added some lines in the install to merge the two logical volumes, but due to variations in volume naming we have found recently that they don't always work. You can manually merge the volumes with a set of commands below. You'll need to be careful when following these steps though as it could mess up your filesystem if you get them wrong:

  1. Run "df -H" and note the name of the home and root partitions under the far left column labelled "File System". They will be something like "/dev/mapper/VolGroup-lv_home" and "/dev/mapper/VolGroup-lv_root". It won't be exactly those (or else you likely wouldn't be seeing this issue) so take note of the actual values for passing to the commands later.
  2. Run the command "lvdisplay". Take note of the LV Path field for the home and root partitions for later commands. They will be something like /dev/VolGroup/lv_home and /dev/VolGroup/lv_root.
  3. Back up your current home directory:
    	mv /home /home.bak
    
  4. Unmount the home volume (replace "/dev/mapper/VolGroup-lv_home" with the name of the /home volume from step 1):
    	umount /dev/mapper/VolGroup-lv_home
    
  5. Remove the home volume (Replace /dev/VolGroup/lv_home with the LV path for the /home volume from step 2)
    	lvremove -f /dev/VolGroup/lv_home
    
  6. Extend your root volume (Replace "/dev/VolGroup/lv_root" with the LV path for the root volume from Step 2)
    	lvextend -r -l +100%FREE /dev/VolGroup/lv_root
    
  7. Open /etc/fstab in a text editor and remove the line referencing the old /home volume.
  8. Restore your home directory:
    	mv /home.bak/ /home/
    

Q:Where are the relevant logs for perfSONAR services? A:

Please see http://docs.perfsonar.net/manage_logs.html for more information. 


Q:Is it possible to change the default port for tool X? A:

The measurement servers use 2 kinds of ports:

  • Contact ports, e.g. a well known location to contact the daemon to initiate a test
  • Test ports, e.g. negotiated ports to flow test or control traffic when a test is requested

Test ports are easily configured to run on a specific set of ports, and can be configured to be opened in a site firewall. The daemon is often able to negotiate these at run time. The contact port is well known, and because of that should never be changed to a different value. Doing so severely impacts the ability of the tool to interoperate on a global scale.

As an example, the OWAMP server listens on the registered port 861 (see http://tools.ietf.org/search/rfc4656 section 2). This is the standard port for the application, in the same way that port 80 is the standard port for an HTTP server. While one can run a web server on a port other than 80, it makes the web server less useful because it's not a standard config. The same is true for OWAMP. The OWAMP protocol is standardized, and has a well-known port - port 861 - associated with it. Running the OWAMP daemon on a non-standard port introduces significant interoperability challenges between deployments.

If you're going to run a measurement infrastructure inside your own organization, you are of course free to do whatever you want. If you want to integrate with the rest of the world, the measurement tools should be run on the standard port to ensure interoperability.


Q:Why doesn't the perfSONAR toolkit include the most recent version of the Myricom NIC driver? A:

We only support the default CentOS device drivers on the toolkit. However, several groups have reported 10% performance improvements using the latest driver from Myricom. If you want to update your toolkit host to the latest version of the Myricom driver, follow the instructions here


Q:My perfSONAR node is experiencing a kernel panic for a recently installed kernel, what should I do? A:

A kernel panic might produce errors similar to this during the boot process:

 

RAMDISK: incomplete write (3107 != 32768)
write error
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
Pid: 1, comm: swapper Not tainted 2.6.32-431.11.2.el6.aufs.web100.x86_64 #1

 

 

When this occurs, it is suggested to reboot, and when the GRUB prompt appears, boot into a known working kernel (e.g. normally a previous iteration that should be present on the machine). If this kernel succeeds in booting, it is recommended to evaluate the available space on the disk as a first step:

 

 

[root@host ~]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                       50G  2.5G   45G   6% /
tmpfs                 939M     0  939M   0% /dev/shm
/dev/sda1             485M  151M  309M  33% /boot
/dev/mapper/VolGroup-lv_home
                      176G  6.3G  161G   4% /home

 

 

If the /boot partition is listed as full, it will be necessary to uninstall some of the old kernel packages to make room. The /boot partition is a part of the underlying CentOS distribution, and cannot be changed. Its size will be a function of the amount of disk that was available at the time of installation. Regular maintenance will be required to ensure it has enough operating room. To remove old kernels, first get a listing of what is installed:

 

 

[root@host ~]$ rpm -qa | grep '^kernel' | sort
kernel-2.6.32-431.11.2.el6.x86_64
kernel-2.6.32-431.3.1.el6.aufs.web100.x86_64
kernel-2.6.32-431.5.1.el6.aufs.web100.x86_64
kernel-2.6.32-431.el6.web100.aufs.x86_64
kernel-2.6.32-431.el6.x86_64
kernel-devel-2.6.32-431.11.2.el6.x86_64
kernel-devel-2.6.32-431.3.1.el6.aufs.web100.x86_64
kernel-devel-2.6.32-431.5.1.el6.aufs.web100.x86_64
kernel-devel-2.6.32-431.el6.web100.aufs.x86_64
kernel-devel-2.6.32-431.el6.x86_64
kernel-firmware-2.6.32-431.11.2.el6.noarch
kernel-headers-2.6.32-431.11.2.el6.x86_64

 

 

It is recommended to remove the kernel that didn't work, as well as versions that are 2 and 3 back from your current version. Yum can be used to remove the kernels:

 

 

[psadmin@nettest ~]$ sudo yum remove kernel-2.6.32-431.el6 kernel-2.6.32-431.el6.web100.aufs kernel-2.6.32-431.3.1.el6.aufs.web100 kernel-devel-2.6.32-431.el6 kernel-devel-2.6.32-431.el6.web100.aufs kernel-devel-2.6.32-431.3.1.el6.aufs.web100
[sudo] password for psadmin: 
Loaded plugins: fastestmirror, security
Setting up Remove Process
Resolving Dependencies
--> Running transaction check
---> Package kernel.x86_64 0:2.6.32-431.el6 will be erased
---> Package kernel.x86_64 0:2.6.32-431.el6.web100.aufs will be erased
---> Package kernel.x86_64 0:2.6.32-431.3.1.el6.aufs.web100 will be erased
---> Package kernel-devel.x86_64 0:2.6.32-431.el6 will be erased
---> Package kernel-devel.x86_64 0:2.6.32-431.el6.web100.aufs will be erased
---> Package kernel-devel.x86_64 0:2.6.32-431.3.1.el6.aufs.web100 will be erased
--> Finished Dependency Resolution
Globus                                                                                                         | 1.9 kB     00:00     
Globus-Updates                                                                                                 | 1.9 kB     00:00     
Internet2                                                                                                      |  951 B     00:00     
base                                                                                                           | 3.7 kB     00:00     
epel/metalink                                                                                                  |  13 kB     00:00     
epel                                                                                                           | 4.4 kB     00:00     
epel/primary_db                                                                                                | 6.1 MB     00:33     
extras                                                                                                         | 3.4 kB     00:00     
perfsonar                                                                                                      |  951 B     00:00     
perfsonar-noarch                                                                                               |  951 B     00:00     
perfsonar-source                                                                                               |  951 B     00:00     
updates                                                                                                        | 3.4 kB     00:00     

Dependencies Resolved

======================================================================================================================================
 Package                  Arch               Version                                       Repository                            Size
======================================================================================================================================
Removing:
 kernel                   x86_64             2.6.32-431.el6                                @base                                121 M
 kernel                   x86_64             2.6.32-431.el6.web100.aufs                    @Internet2-web100_kernel             121 M
 kernel                   x86_64             2.6.32-431.3.1.el6.aufs.web100                @Internet2-web100_kernel             121 M
 kernel-devel             x86_64             2.6.32-431.el6                                @base                                 24 M
 kernel-devel             x86_64             2.6.32-431.el6.web100.aufs                    @Internet2-web100_kernel              24 M
 kernel-devel             x86_64             2.6.32-431.3.1.el6.aufs.web100                @Internet2-web100_kernel              24 M

Transaction Summary
======================================================================================================================================
Remove        6 Package(s)

Installed size: 435 M
Is this ok [y/N]: 

 

 

After removing the kernels, you can verify the space again, and run a yum update to reinstall the latest kernel. Reboot and see if things have returned to normal.
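Because /boot needs regular attention, the space check can be turned into a small helper. This is a sketch only: the function name mount_usage_pct is hypothetical, and it expects the POSIX output format of `df -P`, which keeps each filesystem on a single line (unlike the wrapped `df -h` output shown earlier).

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` output on stdin and print the usage
# percentage (without the % sign) for the given mount point.
mount_usage_pct() {
    awk -v mnt="$1" 'NR > 1 && $6 == mnt { sub(/%/, "", $5); print $5 }'
}
```

A possible use is `df -P /boot | mount_usage_pct /boot`, comparing the result against a limit of your choosing before running a yum update.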

 


Q:Can I configure yum to exclude kernel packages from its update procedure? A:

A detailed explanation of yum configuration can be found in the RHEL documents: https://access.redhat.com/site/solutions/10185. There are two ways to exclude kernel packages from a yum update. The first can be invoked on the command line:

 

yum update --exclude=kernel*

 

 

To make the change permanent, edit the /etc/yum.conf file and add the marked exclude entry to the [main] section:

 

 

[main]
cachedir=/var/cache/yum/$basearch/$releasever
keepcache=0
debuglevel=2
logfile=/var/log/yum.log
exclude=kernel* samba*                           <==== 

 

 

NOTE: If there are multiple packages to be excluded, separate them with a single space or comma.
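To confirm that the permanent exclusion is in place, the configuration file can be searched for the exclude entry. The sketch below is an illustration under assumptions: the function name yum_excludes_kernel is hypothetical, and it only looks for an exclude= line that mentions "kernel", without validating the rest of the file.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the given yum configuration file
# (default /etc/yum.conf) has an exclude= line mentioning a kernel pattern.
yum_excludes_kernel() {
    conf="${1:-/etc/yum.conf}"
    grep -Eq '^[[:space:]]*exclude[[:space:]]*=.*kernel' "$conf"
}
```

For example, `yum_excludes_kernel && echo "kernel updates are excluded"`.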

 


Q:How can I configure yum to automatically update the system? A:

Note that as of version 3.4, this is enabled; see this page for more detail: http://docs.perfsonar.net/manage_update.html#automatic-updates

There is a package that is available that will run yum via a regularly occurring cron job. To install this package, do the following:

 

sudo yum install yum-cron

 

 

The configuration file for this service is located in /etc/sysconfig/yum-cron. It is recommended that the administrator change the MAILTO= directive to be an email address (or simply root). After this package is installed and configured, it will need to be enabled in the chkconfig system:

 

 

sudo /sbin/chkconfig --level 2345 yum-cron on

 

 

Note that this process will follow whatever directives are specified in the global yum.conf file, and will exclude any packages you choose not to be updated (e.g. the kernel).

 

To exclude kernel packages, edit /etc/sysconfig/yum-cron and change the YUM_PARAMETER line:

YUM_PARAMETER="-x kernel*"


Q:How can I convert NDT/NPAD into signed JAVA applets? A:

Note that as of version 3.4, the NDT applet is signed by default.

The following steps can be used to install signed Java applets for NDT and NPAD and are courtesy of PNNL.

  1. Modify the NDT and NPAD source RPMs (http://software.internet2.edu/rpms/el6/SRPMS/) to incorporate a "Permissions: sandbox" line in MANIFEST.MF. (The NPAD tarball in the source RPM includes a precompiled DiagClient.jar file so by default it is never recompiled; fixing this takes another small change to the .spec file %prep section.)
  2. Have the resulting .jar files signed with an Extended Validation Java code signing certificate from a vendor that can provide one (e.g. Entrust)
  3. Copy over /usr/ndt/Tcpbw100.jar and /var/lib/npad/DiagClient.jar with the signed .jar files.
  4. Stop and restart the NDT and NPAD services.

Note that after these steps are complete, Oracle Java with default security settings should allow these applets to run.


Q:When attempting to use BWCTL with an IPv6 address, the command fails: bwctl: Unable to connect to 2001:468:1:11::16:66:4823. What should I do? A:

Wrap the IPv6 address in square brackets and double quotes (to prevent the shell from trying to interpret the brackets). For example: bwctl -T iperf3 -t 10 -i 1 -c "[2001:468:1:11::16:66]:4823"
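When the address comes from a script variable, the bracketing can be applied automatically. The sketch below is only an illustration: the function name format_endpoint is hypothetical, and it treats any host containing a colon as an IPv6 literal.

```shell
#!/bin/sh
# Hypothetical helper: print host:port, wrapping the host in square
# brackets when it contains a colon (i.e. looks like an IPv6 literal).
format_endpoint() {
    case "$1" in
        *:*) printf '[%s]:%s\n' "$1" "$2" ;;
        *)   printf '%s:%s\n' "$1" "$2" ;;
    esac
}
```

It could then be used as `bwctl -T iperf3 -t 10 -i 1 -c "$(format_endpoint 2001:468:1:11::16:66 4823)"`; the double quotes around the substitution keep the shell from interpreting the brackets.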


Q:OWPING/POWSTREAM tests are failing to certain hosts with a "Server denied test" and "Session Failed!" message. How can I debug this? A:

We have received some recent reports from users related to the OWAMP, OWPING, and POWSTREAM tools that note tests are being denied. An example message from the command line (similar messages may appear in /var/log/messages) indicates a policy constraint is preventing tests from completing:

 

[user@host ~]$ owping host2
owping: FILE=capi.c, LINE=709, Server denied test: host2
owping: FILE=owping.c, LINE=195, Session Failed!

 

 

We believe this is caused by the 'bandwidth' setting in the owampd.limits file, which caps the bandwidth a daemon may consume and is triggered when a host participates in a medium to large number of simultaneous tests. This setting was originally 100 Kbps, and the project has taken steps to raise it to 1 Mbps in the next release of the software. In the meantime, users can make this change manually:

 

 

# setup the root node with unlimited disk and bandwidth
limit root with disk=0,\
		bandwidth=0,\
               delete_on_fetch=on

# Limit non-authorized users to 10G of disk storage, and 1M of bandwidth
# (overall for all non-authorized users). Unspecified values default to
# parent permissions
limit regular with parent=root,\
		disk=10G,\
		bandwidth=1M

# Default everything to regular user access
assign default regular

 


Q:How can I edit the thresholds for maddash? A:

The /etc/maddash/maddash-server/maddash.yaml file contains the threshold definitions. For example, checks for the OWAMP Latency/Loss dashboards are performed by a script named check_owdelay.pl; search for this name in the file. The following represents a check that may be performed:

 

/opt/perfsonar_ps/nagios/bin/check_owdelay.pl -u %maUrl -w 0.001 -c 0.1 -r 2700 -l -p -s %row -d %col

 

 

The -w and -c are the warning and critical thresholds respectively. These values can be adjusted to thresholds that are more suitable. Note that the legend (for display purposes) should also be changed. If you are using the meshconfig agent, this file is re-written if there are configuration changes.
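Before editing, it can help to list the thresholds currently in use. The sketch below is an illustration under assumptions: the function name owdelay_thresholds is hypothetical, and it only matches command lines where -w is immediately followed by -c, as in the example check above.

```shell
#!/bin/sh
# Hypothetical helper: read maddash check command lines on stdin and print
# the -w (warning) and -c (critical) values of each check_owdelay.pl call.
# Assumes the "-w VALUE -c VALUE" ordering used in the example above.
owdelay_thresholds() {
    sed -n '/check_owdelay\.pl/ s/.* -w \([0-9.]*\) -c \([0-9.]*\).*/warning=\1 critical=\2/p'
}
```

For example, `owdelay_thresholds < /etc/maddash/maddash-server/maddash.yaml` prints one line per matching check.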

 


Q:My host was impacted by Shellshock/Heartbleed/etc., what should I do? A:

Please check the vulnerability archive for the specific attack you are interested in. Information on mitigation for perfSONAR nodes will be posted there or in the mailing list archives.


Q:Are there instructions on how to grab all the necessary files for perfsonar and put them on a new install? A:

The scripts below exist to backup and restore files used by perfsonar:

 

/opt/perfsonar_ps/toolkit/scripts/ps-toolkit-migrate-backup.sh 
/opt/perfsonar_ps/toolkit/scripts/ps-toolkit-migrate-restore.sh 

 

 

Give the first command the name of a tarball file you would like the script to create. Give the second command the name of the file generated by the first command.

 

Note that this method will restore things like the /etc/passwd file. If you are doing the rebuild as the result of a security breach, it is safer to just start from scratch to make sure compromised things aren't inadvertently copied.


Q:I would like to install and patch perfsonar boxes behind a web proxy, is it possible to specify this on the grub command line? A:

Anaconda documentation indicates this grub parameter should do the trick:

 

proxy=[protocol://][username[:password]@]host[:port]

 

 

Note that during a fresh network installation, Anaconda does install updates immediately (e.g. it wouldn't use a version of an RPM from when the ISO was built), and doesn't actually run any network services before the reboot. For those that are rebuilding after shellshock - using a proxy should not be necessary.
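The optional parts of the proxy parameter can be assembled mechanically. The sketch below is only an illustration of the syntax quoted above: the function name build_proxy_arg and the PROXY_* variable names are hypothetical, and the example host proxy.example.org is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: assemble
#   proxy=[protocol://][username[:password]@]host[:port]
# from PROXY_* variables. Only PROXY_HOST is required.
build_proxy_arg() {
    auth=""
    if [ -n "${PROXY_USER:-}" ]; then
        auth="${PROXY_USER}${PROXY_PASS:+:$PROXY_PASS}@"
    fi
    printf 'proxy=%s%s%s%s\n' \
        "${PROXY_PROTO:+$PROXY_PROTO://}" \
        "$auth" \
        "${PROXY_HOST:-}" \
        "${PROXY_PORT:+:$PROXY_PORT}"
}
```

With PROXY_PROTO=http, PROXY_HOST=proxy.example.org and PROXY_PORT=3128 set, the function prints a string that can be appended to the boot line.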

 


Q:How can I add custom rules to IPTables? A:

The rules added by the perfSONAR toolkit are contained within a special perfSONAR chain of iptables (and ip6tables). You may add rules to the other chains, such as the INPUT chain, just as you would any other firewall rule. It is NOT recommended you change the perfSONAR chain as any changes you make could be overwritten by a software update.

An example that will block access to port 8000 for all traffic is shown in the steps below (note it is not recommended you run this command as it will block access to the NPAD tool):

  1. Login to your host via SSH or the terminal as a root user
  2. View your current iptables rules with the following /sbin/service iptables status command:
    # /sbin/service iptables status
    Table: filter
    Chain INPUT (policy ACCEPT)
    num  target     prot opt source               destination
    1    fail2ban-SSH  tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:22
    2    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
    3    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED
    4    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:22
    5    ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           state NEW udp spt:547 dpt:546
    6    perfSONAR  all  --  0.0.0.0/0            0.0.0.0/0
    7    REJECT     all  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
    
    Chain FORWARD (policy ACCEPT)
    num  target     prot opt source               destination
    1    REJECT     all  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
    
    Chain OUTPUT (policy ACCEPT)
    num  target     prot opt source               destination
    1    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
    
    Chain fail2ban-SSH (1 references)
    num  target     prot opt source               destination
    1    RETURN     all  --  0.0.0.0/0            0.0.0.0/0
    
    Chain perfSONAR (1 references)
    num  target     prot opt source               destination
    1    ACCEPT     icmp --  0.0.0.0/0            0.0.0.0/0           icmp type 255
    2    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:80 state NEW,ESTABLISHED
    3    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:443 state NEW,ESTABLISHED
    4    ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpt:123 udp
    5    ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:33434:33634
    6    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:8000
    7    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:8001:8020
    8    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:7123
    9    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:3001:3003
    10   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:861
    11   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:8760:9960
    12   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:4823
    13   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:6001:6200
    14   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:6001:6200
    15   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:5000:5600
    16   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:5000:5900
    17   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:10101:10300
    18   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:10101:10300
    19   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:7
    20   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpt:7
    21   RETURN     all  --  0.0.0.0/0            0.0.0.0/0
    
  3. Run the iptables command below to block port 8000. Note that we are adding this rule to the INPUT chain and leaving the perfSONAR chain untouched. We are also adding it at position 5 which ensures it is processed before the perfSONAR rules:
    iptables -I INPUT 5 -p tcp --dport 8000 -j REJECT
    
  4. Now save your configuration change:
    /sbin/service iptables save
    
  5. We can now see our new rule was added:
    # /sbin/service iptables status
    Table: filter
    Chain INPUT (policy ACCEPT)
    num  target     prot opt source               destination
    1    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
    2    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED
    3    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:22
    4    ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           state NEW udp spt:547 dpt:546
    5    REJECT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:8000 reject-with icmp-port-unreachable
    6    perfSONAR  all  --  0.0.0.0/0            0.0.0.0/0
    7    REJECT     all  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
    
    Chain FORWARD (policy ACCEPT)
    num  target     prot opt source               destination
    1    REJECT     all  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
    
    Chain OUTPUT (policy ACCEPT)
    num  target     prot opt source               destination
    1    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
    
    Chain perfSONAR (1 references)
    num  target     prot opt source               destination
    1    ACCEPT     icmp --  0.0.0.0/0            0.0.0.0/0           icmp type 255
    2    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:80 state NEW,ESTABLISHED
    3    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:443 state NEW,ESTABLISHED
    4    ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpt:123 udp
    5    ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:33434:33634
    6    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:8000
    7    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:8001:8020
    8    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:7123
    9    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:3001:3003
    10   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:861
    11   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:8760:9960
    12   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:4823
    13   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:6001:6200
    14   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:6001:6200
    15   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:5000:5600
    16   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:5000:5900
    17   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpts:10101:10300
    18   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpts:10101:10300
    19   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED tcp dpt:7
    20   ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpt:7
    21   RETURN     all  --  0.0.0.0/0            0.0.0.0/0
    

Note: Prior to version 3.4, custom firewall rules were not handled properly. As such you may find that when upgrading from versions older than 3.4 that you will lose any custom rules. Following the steps above should ensure your rules are maintained for updates beyond 3.4 in the foreseeable future.
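Since the position of the perfSONAR jump in the INPUT chain can differ between hosts, it may be safer to look it up than to hard-code a rule number. The sketch below is an illustration: the function name perfsonar_jump_position is hypothetical, and it assumes the status output format shown in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: read `/sbin/service iptables status` output on stdin
# and print the rule number of the jump to the perfSONAR chain within the
# INPUT chain.
perfsonar_jump_position() {
    awk '
        $1 == "Chain" { in_input = ($2 == "INPUT") }
        in_input && $2 == "perfSONAR" { print $1; exit }
    '
}
```

Inserting a custom rule at the printed position, for example with `iptables -I INPUT "$pos" ...` where pos holds the result, places it directly ahead of the perfSONAR chain.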


Q:When installing the perfSONAR Toolkit 3.4 netinstall image with static IP, it does not keep this when the system first boots. The BOOTPROTO option remains set to "dhcp" in the /etc/sysconfig/network-scripts/ifcfg-eth0 after the installation. Is this a bug? A:

This is a known problem with Anaconda (the RHEL/CentOS installer); see the documentation at http://docs.perfsonar.net/install_centos_netinstall.html#step-by-step-guide for more information. The network settings entered at the start are only used during the installation process and may be discarded once it completes, so after installation you may need to re-enter them.


Q:I have yum updated from 3.3.x to 3.4 and rebooted. It is asking for a user to configure when log on again ("Once you set passwords, you can login to the web interface and finish configuration."). Is this expected? Why is it not respecting the configuration set in 3.3.2? A:

Users in the wheel (e.g. root) group are no longer allowed to login to the web interface. If after looking at all your users there are none that can login to the web interface but aren't in wheel, then it asks you to create a new one.


Q:There is a new repository in version 3.4 for Datastax, what is this for? Can it be reached over R&E networking connections? A:

The datastax repo is used to provide the "cassandra" package for version 3.4 of the perfSONAR Toolkit software. The datastax repo is not currently reachable from nodes that lack commodity connectivity. Starting with version 3.4.1 of the perfSONAR toolkit, the main repository will contain the same build of cassandra that datastax contains. Those wishing to delete the datastax repo completely can do so.


Q:I don't see a way to control SSH via the 'Enabled Services' screen on version 3.4 of my toolkit, where did it go? A:

To clean up the operational behavior of the perfSONAR toolkit, some services were removed from 'Enabled Services', and users are encouraged to use the chkconfig utility instead. This can be done by first establishing when the service will start:

 

# sudo chkconfig --list sshd
sshd            0:off   1:off   2:off    3:off    4:off    5:off    6:off

 

 

Using something like this, the service can be modified to start for different runlevels (and started so it is running now):

 

 

sudo chkconfig sshd on --level 2345
sudo /etc/init.d/sshd start

 

 

Finally, a service can be completely disabled in this manner:

 

 

# sudo chkconfig sshd off
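The chkconfig listing can also be checked from a script. The sketch below is an illustration only: the function name enabled_at_runlevel is hypothetical, and it expects a single line in the `chkconfig --list` format shown above.

```shell
#!/bin/sh
# Hypothetical helper: read one line of `chkconfig --list <service>` output
# on stdin and succeed when the service is "on" at the given runlevel
# (default 3).
enabled_at_runlevel() {
    level="${1:-3}"
    grep -Eq "(^|[[:space:]])${level}:on([[:space:]]|\$)"
}
```

For example, `chkconfig --list sshd | enabled_at_runlevel 3 || echo "sshd will not start at boot"`.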

 


Q:When accessing graphs on my 3.4 host, I see this error: Error retrieving test data: undefined A:

Check the /var/log/esmond/django.log logs, if something like this is seen:

 

2014-10-07 14:56:36,173 [ERROR] /opt/esmond/lib/python2.7/site-packages/django/core/handlers/base.py: Internal Server Error: /esmond/perfsonar/archive/
Traceback (most recent call last):
  File "/opt/esmond/lib/python2.7/site-packages/django/core/handlers/base.py", line 103, in get_response
    resolver_match = resolver.resolve(request.path_info)
  File "/opt/esmond/lib/python2.7/site-packages/django/core/urlresolvers.py", line 319, in resolve
    for pattern in self.url_patterns:
  File "/opt/esmond/lib/python2.7/site-packages/django/core/urlresolvers.py", line 347, in url_patterns
    patterns = getattr(self.urlconf_module, "urlpatterns", self.urlconf_module)
  File "/opt/esmond/lib/python2.7/site-packages/django/core/urlresolvers.py", line 342, in urlconf_module
    self._urlconf_module = import_module(self.urlconf_name)
  File "/opt/esmond/lib/python2.7/site-packages/django/utils/importlib.py", line 35, in import_module
    __import__(name)
  File "/opt/esmond/esmond/urls.py", line 8, in <module>
    from esmond.api.perfsonar.api import perfsonar_api
  File "/opt/esmond/esmond/api/perfsonar/api.py", line 47, in <module>
    'histogram': db.raw_cf,
AttributeError: 'NoneType' object has no attribute 'raw_cf'

 

 

Try to restart the service:

 

 

/sbin/service cassandra restart
/sbin/service httpd restart

 

 

If this fails to restart the service, check the cassandra logs. If the following message is seen:

 

 

Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

 

 

Check the settings in /etc/cassandra/default.conf/cassandra-env.sh. It may be the case that they should be reduced, e.g.:

 

 

MAX_HEAP_SIZE="2048M"
HEAP_NEWSIZE="400M"

 

 

By default, cassandra will set the heap size to 1/4 of system memory (more information: http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_tune_jvm_c.html). If this is not available (e.g. if the system is running as a 32 bit machine, or has limited system memory) the message above will appear.
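To judge whether the default is likely to fit, the one-quarter figure can be estimated from the system memory. The sketch below simply applies the rule of thumb stated above; the function name quarter_of_memory_mb is hypothetical, and the actual calculation performed by cassandra-env.sh may differ.

```shell
#!/bin/sh
# Hypothetical helper: print one quarter of MemTotal, in megabytes, from a
# meminfo-style file (default /proc/meminfo, where MemTotal is in kB).
quarter_of_memory_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 4 }' "${1:-/proc/meminfo}"
}
```

If the printed value is larger than what the JVM can reserve on the host, setting MAX_HEAP_SIZE to a smaller value as described above is a reasonable next step.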

 


Q:How can I write the fullinstall or netinstall ISO to a USB key for installation? A:

For Linux and Mac OS X, download the ISO and use the dd tool as in the example below. Note that dd will overwrite any existing partitions (it treats the target as a raw disk). Replace the ISO name and the name of the USB drive with your own:

 

sudo dd if=pS-Performance_Toolkit-3.4-FullInstall-x86_64.iso of=/dev/disk3

 


Q:I've just updated from 3.3.2 to 3.4, but the host is indicating there is no address configured. What should I do? A:

The host may show this on the web interface:

 

Primary Address shows "No External Address Configured"
MTU "Could not retrieve MTU"

 

 

As of 3.4, the /opt/perfsonar_ps/toolkit/etc/external_addresses file is no longer used.

 

If your host has a private address, you will need to edit /opt/perfsonar_ps/toolkit/web/root/gui/services/etc/web_admin.conf and set allow_internal_addresses to 1. After doing this, refresh the page (no service restarts are required).
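A minimal sketch of making that edit from the shell, assuming the file uses whitespace-separated "key value" lines (as the primary_interface example elsewhere in this FAQ suggests). The function name set_conf_key is illustrative; it leaves a .bak copy next to the file when it replaces an existing line.

```shell
#!/bin/sh
# Illustrative helper: set "key value" in a whitespace-separated config file.
# Replaces an existing line for the key, or appends one if none exists.
set_conf_key() {
    file=$1
    key=$2
    value=$3
    if grep -Eq "^[[:space:]]*${key}[[:space:]]" "$file"; then
        # -i.bak edits in place and keeps the original as FILE.bak
        sed -i.bak -E "s|^[[:space:]]*${key}[[:space:]].*|${key}    ${value}|" "$file"
    else
        printf '%s    %s\n' "$key" "$value" >> "$file"
    fi
}

# Usage (run as root on the toolkit host):
# set_conf_key /opt/perfsonar_ps/toolkit/web/root/gui/services/etc/web_admin.conf allow_internal_addresses 1
```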


Q:After upgrading from 3.3 to 3.4, I see some of the data for my Throughput/Latency tests, but not all of it is shown. Specifically the three columns of "Throughput Latency(ms) Loss(%)" are missing, only the Source and Destination nodes are displayed. If I click on a line where tests are running, a graph with the results is displayed. Only the main page seems to have lost the three columns. Is the data gone? A:

This is expected behavior in 3.4. If there are more than a certain number of tests (>10), the data in the other columns is not shown by default, to reduce the page's loading time.


Q:After upgrading to 3.4 from a 3.4 RC, the web interface is not running and the following error is seen. What should we do?

/etc/init.d/httpd start
Starting httpd: Syntax error on line 1 of /etc/httpd/conf.d/apache-esmond.conf:
Invalid command 'WSGIScriptAlias', perhaps misspelled or defined by a module not included in the server configuration
                                                           [FAILED]

A:

Prior to RC3 (and again when we released RC3) there were special upgrade instructions for RC2 users to prevent this. RC2 points at the CentOS SCL repo, which uses a different version of python27-mod_wsgi which does not work with the standard apache and overwrites the version we provided. To fix this, do the following:

 

yum remove centos-release-SCL python27-mod_wsgi python27 python27-python httpd24-httpd httpd24-apr httpd24-apr-util httpd24-httpd-tools httpd24-runtime scl-utils
yum clean all
rm -rf /var/lib/pgsql/data/*
yum install perl-perfSONAR_PS-Toolkit perl-perfSONAR_PS-Toolkit-SystemEnvironment
reboot

 

 

If errors still persist, the pgsql database can be reset in this manner (N.B. this will delete all data):

 

 

rm -rf /var/lib/pgsql/data/*
/opt/perfsonar_ps/toolkit/scripts/system_environment/configure_esmond 
/opt/perfsonar_ps/toolkit/scripts/system_environment/configure_regular_testing
/etc/init.d/httpd restart
/etc/init.d/regular_testing start
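Since the reset removes all stored data, one option is to archive the directory before deleting it. The sketch below is an example under stated assumptions rather than an official procedure: the function name backup_dir, the destination /root/backups, and the service name postgresql are illustrative, and the database should be stopped before its files are copied.

```shell
#!/bin/sh
# Illustrative helper: archive a directory into a timestamped tarball
# and print the path of the archive that was written.
backup_dir() {
    src=$1
    dest_dir=$2
    archive="$dest_dir/$(basename "$src")-$(date +%Y%m%d%H%M%S).tar.gz"
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" && echo "$archive"
}

# Usage (run as root; destination and service name are assumptions):
# /sbin/service postgresql stop
# backup_dir /var/lib/pgsql/data /root/backups
```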

 


Q:We are seeing the following error when retrieving test data; this is from the django log:

  File "/opt/esmond/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 182, in _cursor
    self.connection = Database.connect(**conn_params)
  File "/opt/esmond/lib/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect
    conn = _connect(dsn, connection_factory=connection_factory, async=async)

OperationalError: FATAL:  Ident authentication failed for user "esmond"


A:

The following series of commands will reset the database (N.B. this will delete all data):

 

rm -rf /var/lib/pgsql/data/*
/opt/perfsonar_ps/toolkit/scripts/system_environment/configure_esmond
/opt/perfsonar_ps/toolkit/scripts/system_environment/configure_regular_testing
/etc/init.d/httpd restart
/etc/init.d/regular_testing start

 

 

 



Q:How can I force testing over IPv4 or IPv6 in the mesh configuration?
A:

 

 

Both an ipv4_only and an ipv6_only option can be set in the test parameters of a mesh config. Setting both at the same time gives an error.

 

 



Q:Our dashboard completely stopped working after upgrading to 3.4. All boxes in the grid are now showing the orange "Unable to retrieve data" color. What should we do?
A:

 

 

Checking in the maddash-server.netlogger.log file may reveal errors that look like the following:

 

level=INFO ts=2014-10-14T08:20:06.313974Z event=maddash.RunCheckJob.execute.runCheck.end guid=0e7c7ee5-840b-487f-8e26-c662648f45ad resultMsg=" Unable to contact MA. Please check that the MA is running and the URL is correct."

 

 

The maUrl entry in the maddash.yaml file may look like this:

 

 

maUrl:
    default: "http://%row:8085/perfSONAR_PS/services/pSB"

This will need to be changed to:

 

 

maUrl:
    default: "http://%row/esmond/perfsonar/archive/"

Additionally, make this change for the graphUrl entry (replacing SERVERNAME with the name of the server):

 

graphUrl: "SERVERNAME/serviceTest/graphWidget.cgi?url=%maUrl&dest=%col&source=%row"

 

Then restart the MaDDash server so that the changes take effect.
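The maUrl change can also be scripted. This sketch rewrites the old pSB URL shown above to the esmond URL using sed and keeps a .bak copy of the file. The function name update_ma_url is illustrative, and the location of maddash.yaml on your system is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# Illustrative helper: replace the pre-3.4 measurement archive URL with the
# esmond archive URL in the given maddash.yaml file.
update_ma_url() {
    file=$1
    old='http://%row:8085/perfSONAR_PS/services/pSB'
    new='http://%row/esmond/perfsonar/archive/'
    # -i.bak edits in place and keeps the original as FILE.bak
    sed -i.bak "s|$old|$new|g" "$file"
}

# Usage: update_ma_url /path/to/maddash.yaml
```

The graphUrl entry still needs to be edited by hand, since it depends on the server name.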

 

 

 



Q:The "Enabled Services" screen on 3.4 is different than on 3.3, why were things changed?
A:

 

 

Version 3.4 offers a more streamlined control view, featuring 5 controllable services. More information on each can be found in the documentation: http://docs.perfsonar.net/manage_services.html.

As a mapping of old to new, please see this list:

  • PingER - Removed as stand-alone service. Functionality is delivered via regular testing interface.
  • perfSONAR-BUOY Throughput Testing - Now controlled via the current "BWCTL" service.
  • perfSONAR-BUOY Latency Testing - Now controlled via the current "OWAMP" service.
  • perfSONAR-BUOY Measurement Archive - Now controlled via the current "BWCTL" and "OWAMP" services.
  • NDT - Daemon is controlled in same manner.
  • NPAD - Daemon is controlled in same manner.
  • BWCTL - Daemon is controlled in same manner.
  • OWAMP - Daemon is controlled in same manner.
  • SSH - Removed from control in the interface, use chkconfig to control service stop/start.
  • SNMP MA - Removed as stand-alone service, along with the CACTI tool. No replacement is available.
  • Traceroute MA - Removed as stand-alone service. Functionality is delivered via regular testing interface.
  • Traceroute Scheduler - Removed as stand-alone service. Functionality is delivered via regular testing interface.
  • Lookup Service - Removed as stand-alone service. Functionality is automatic after system configuration.

 

 



Q:When trying a clean install with perfSONAR Toolkit version 3.4, the system doesn't recognize any disks/doesn't see my RAID controller. Things work with other systems (CentOS 6, Scientific Linux 6). What should I do?
A:

 

 

If you have started with a different OS, you can attempt to install the necessary packages manually.

Once it’s installed, reboot the machine.

Alternatively, if you are upgrading from a LiveCD, use 'dban' to wipe out the data partition placed there from previous use.

 

 



Q:I am using Java Version 8 Update 25 and NDT/NPAD no longer work - what should I do?
A:

 

 

Some users have reported that Java Version 7 Update 71 has the same feature and security set as Version 8 Update 25, and that NDT/NPAD do work after adding them to the whitelist. Java security is important, so please evaluate the choice to run older or newer instances.

 

 



Q:When trying to save administrative information, I see "Problem saving configuration: Problem saving administrative information". What should I do?
A:

 

 

Run service config_daemon start. This service should normally be started before httpd (along with cassandra).

 

 



Q:When running OWAMP 3.3 on OS X Yosemite, the test stalls. What should I do?
A:

 

 

The error may look like this:

 

owping: FILE=time.c, LINE=124, NTP syscalls unavail: Status UNSYNC (clock offset issues likely)
owping: FILE=endpoint.c, LINE=2198, setitimer(wake=6,1582366) seq=140733193388032: Invalid argument
Approximately 12.9 seconds until results available
owping: FILE=endpoint.c, LINE=1527, run_sender: Exiting from signal
owping: FILE=api.c, LINE=671, I2Readn(skiprecfd): Interrupted system call

 

 

A temporary patch is available that can be applied to the source (and then the code should be recompiled):

 

 

cd owamp-3.3-directory
wget -c http://ndb1.internet2.edu/~aaron/owamp_time_fix.patch
patch -p1 -i /path/to/owamp_time_fix.patch

 

 

 



Q:Which repository addresses will be used to get updates to the perfSONAR software?
A:

 

 

By default, the Internet2 repo points at a mirror list hosted by software.internet2.edu. In this mirror list is linux.mirrors.es.net. In order to use the default configuration you will need to allow access to software.internet2.edu so you can grab the mirrorlist. After that, the packages can be downloaded from any of the sites listed which includes linux.mirrors.es.net, software.internet2.edu, and a few other places. You should be able to get away with just opening up access to software.internet2.edu (so it can get the mirror list) and linux.mirrors.es.net (so you can get the packages). Those should be the only places you need as linux.mirrors.es.net also has a mirror for all the base CentOS packages.

The initial version of 3.4 contained a pointer to another repo hosted only on datastax.com, but this was dropped for release 3.4.1 and later. If you do encounter this repo, you can safely run 'yum remove datastax-repo', as the packages it provides have been copied to the Internet2 repo.

 

 



Q:My toolkit web page (http://HOST/toolkit) won't load when visiting in a web browser. How can I debug?
A:

 

 

Check the Apache logs for errors like the following:

 

[Fri Dec 19 14:22:47 2014] [error]
[Fri Dec 19 14:22:47 2014] [error] Unable to connect - presuming stand-alone testing mode...
[Fri Dec 19 14:22:47 2014] [error] Traceback (most recent call last):
[Fri Dec 19 14:22:47 2014] [error]   File "/opt/rh/python27/root/usr/lib64/python2.7/logging/handlers.py", line 77, in emit
[Fri Dec 19 14:22:47 2014] [error]     self.doRollover()
[Fri Dec 19 14:22:47 2014] [error]   File "/opt/rh/python27/root/usr/lib64/python2.7/logging/handlers.py", line 140, in doRollover
[Fri Dec 19 14:22:47 2014] [error] os.rename(self.baseFilename, dfn)
[Fri Dec 19 14:22:47 2014] [error] OSError: [Errno 13] Permission denied
[Fri Dec 19 14:22:47 2014] [error] Logged from file base.py, line 212
[Fri Dec 19 14:22:47 2014] [error]
[Fri Dec 19 14:22:47 2014] [error] Unable to connect - presuming stand-alone testing mode...
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161] mod_wsgi (pid=15710): Exception occurred processing WSGI script '/opt/esmond/esmond/wsgi.py'.
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161] Traceback (most recent call last):
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161]   File "/opt/esmond/esmond/wsgi.py", line 31, in application
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161] return _application(environ, start_response)
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161]   File "/opt/esmond/lib/python2.7/site-packages/django/core/handlers/wsgi.py", line 255, in __call__
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161] response = self.get_response(request)
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161]   File "/opt/esmond/lib/python2.7/site-packages/django/core/handlers/base.py", line 178, in get_response
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161] response = self.handle_uncaught_exception(request, resolver, sys.exc_info())
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161]   File "/opt/esmond/lib/python2.7/site-packages/django/core/handlers/base.py", line 220, in handle_uncaught_exception
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161]     if resolver.urlconf_module is None:
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161]   File "/opt/esmond/lib/python2.7/site-packages/django/core/urlresolvers.py", line 342, in urlconf_module
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161] self._urlconf_module = import_module(self.urlconf_name)
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161]   File "/opt/esmond/lib/python2.7/site-packages/django/utils/importlib.py", line 35, in import_module
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161] __import__(name)
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161]   File "/opt/esmond/esmond/urls.py", line 8, in 
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161]     from esmond.api.perfsonar.api import perfsonar_api
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161]   File "/opt/esmond/esmond/api/perfsonar/api.py", line 47, in 
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161] 'histogram': db.raw_cf,
[Fri Dec 19 14:22:47 2014] [error] [client 129.79.53.161] AttributeError: 'NoneType' object has no attribute 'raw_cf'
[Fri Dec 19 14:22:56 2014] [error] [client 129.79.53.159] (13)Permission denied: exec of '/opt/perfsonar_ps/toolkit/web/root/index.cgi' failed
[Fri Dec 19 14:22:56 2014] [error] [client 129.79.53.159] Premature end of script headers: index.cgi
[Fri Dec 19 14:23:39 2014] [error] [client 192.170.227.156] (13)Permission denied: exec of '/opt/perfsonar_ps/toolkit/web/root/gui/reverse_traceroute.cgi' failed
[Fri Dec 19 14:23:39 2014] [error] [client 192.170.227.156] Premature end of script headers: reverse_traceroute.cgi

 

 

If these errors appear, check the SELinux setting in /etc/sysconfig/selinux. If it is set to enforcing, consider setting it to permissive and rebooting.
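A short sketch of checking both the configured and the running mode from the shell. The function name selinux_config_mode is illustrative; getenforce is the standard SELinux utility and is only called if it is installed.

```shell
#!/bin/sh
# Illustrative helper: print the SELINUX= value from an SELinux config file.
# The pattern does not match the separate SELINUXTYPE= line.
selinux_config_mode() {
    file=${1:-/etc/sysconfig/selinux}
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$file" | head -n 1
}

# getenforce reports the mode the running system is using.
if command -v getenforce >/dev/null 2>&1; then
    echo "Running mode: $(getenforce)"
fi
if [ -r /etc/sysconfig/selinux ]; then
    echo "Configured mode: $(selinux_config_mode)"
fi
```

Running setenforce 0 as root switches the running system to permissive mode without a reboot, which can be a quick way to confirm that SELinux is the cause; that change does not persist across reboots.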

 

 

 



Q:I see that the latency/loss tests in 3.4 use bwping. Does this mean that we can run both throughput and latency/loss tests on the same interface without interference due to the way bwctl scheduling works?
A:

 

 

Ping tests can be scheduled on the same host as throughput tests, but owamp tests can’t currently (they use powstream instead of bwping).

 

 



Q:Where can I find documentation on interacting with perfSONAR via a custom client?
A:

 

 

There is a web page which outlines the REST API: http://software.es.net/esmond/perfsonar_client_rest.html, the perl API: http://software.es.net/esmond/perfsonar_client_perl.html and the python API: http://software.es.net/esmond/perfsonar_client_python.html.

Setting up the environment on a perfSONAR Toolkit host can be done with these steps:

 

cd /opt/esmond
source /opt/rh/python27/enable
/opt/rh/python27/root/usr/bin/virtualenv --prompt="(esmond)" .
. bin/activate
pip install esmond-client

 

 

 



Q:When setting up a dual homed host, how can one get individual tests to use one interface or another?
A:

 

 

When configuring the regular tests in the GUI, you select the interface that each test will use.

The easiest way to force bwctld and owampd to listen only on their respective interfaces is to set the ‘srcnode’ variable in /etc/bwctld/bwctld.conf and /etc/owampd/owampd.conf, and then restart bwctld and owampd. Look at owampd.conf.default and bwctld.conf.default in those same directories for an example.

More information: http://docs.perfsonar.net/manage_dual_xface.html

 

 



Q:We have some perfSONAR servers with multiple NICs/IPs/hostnames, and the web interface appears to choose the “Primary Address” on its own. Is there a way to fix the “Primary Address” manually, so that it does not change from time to time?
A:

 

 

This is configured in:

 

/opt/perfsonar_ps/toolkit/web/root/gui/services/etc/web_admin.conf

 

 

Edit the file and add:

 

 

primary_interface       YOUR_PRIMARY_INTERFACE_NAME

 

 

For proper LS registration you’ll also want to make a similar change to:

 

 

/opt/perfsonar_ps/ls_registration_daemon/etc/ls_registration_daemon.conf

 

 

adding/editing:

 

 

external_address_if_name YOUR_PRIMARY_INTERFACE_NAME
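
Both files use whitespace-separated "key value" lines, so the same small helper can apply either change. This is a minimal sketch, assuming the key sits at the top level of the file (not inside a nested block); set_conf_key is a hypothetical name, and eth1 is a placeholder interface.

```shell
# set_conf_key FILE KEY VALUE
# Replace an existing "KEY value" line, or append one if KEY is absent.
# Hypothetical helper; assumes KEY is a top-level, whitespace-separated entry.
set_conf_key() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        $1 == k { print k, v; found = 1; next }   # rewrite the existing line
        { print }                                 # keep every other line
        END { if (!found) print k, v }            # append when the key is missing
    ' "$file" > "$tmp" && cat "$tmp" > "$file"    # cat keeps the file mode and owner
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (as root; eth1 is a placeholder):
#   set_conf_key /opt/perfsonar_ps/toolkit/web/root/gui/services/etc/web_admin.conf primary_interface eth1
#   set_conf_key /opt/perfsonar_ps/ls_registration_daemon/etc/ls_registration_daemon.conf external_address_if_name eth1
```

Keeping a copy of each file before editing makes it easy to revert if the web interface or registration daemon does not behave as expected afterwards.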

 

 

 



Q:Which licenses do perfSONAR products use?
A:

 

 

BWCTL, OWAMP, and perfSONAR components are licensed under the Apache 2.0 license. NDT has a slightly different license, based on an open source policy: https://ndt.googlecode.com/svn/trunk/copyright.html. Other products on the perfSONAR Toolkit (NPAD, Web100, and other components such as Dojo) may have other restrictions worth checking.

 

 



Q:When attempting to publish data to esmond, a "301 MOVED PERMANENTLY" is returned. What should I do?
A:

 

 

If using the examples from:

When querying the local esmond instance, the following could be returned:

 

HTTP/1.1 301 MOVED PERMANENTLY
Date: Sat, 18 Apr 2015 01:05:19 GMT
Server: Apache/2.2.15 (CentOS)
Location: http://localhost/esmond/perfsonar/archive/
Content-Length: 0
Connection: close
Content-Type: text/html; charset=utf-8

 

 

Check the script and make sure the URL is:

 

 

http://localhost/esmond/perfsonar/archive/

and not

 

 

http://localhost/esmond/perfsonar/archive
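
A script that builds the archive URL from a variable can normalize it before use. The helper below is a small sketch of that idea; ensure_trailing_slash is a hypothetical name, and it assumes the URL carries no query string.

```shell
# ensure_trailing_slash URL
# Print URL unchanged if it already ends in "/", otherwise with "/" appended.
# Hypothetical helper; assumes URL has no query string or fragment.
ensure_trailing_slash() {
    case $1 in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# Example:
#   url=$(ensure_trailing_slash "http://localhost/esmond/perfsonar/archive")
```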

 

 

 



Q:Is it possible to use non-intel SFP+ optics in the Intel X520-SR2 NIC?
A:

 

 

The ixgbe driver has an option to allow alternative optics:

 

allow_unsupported_sfp=1

 

 

This can be tested using the following commands:

 

 

sudo modprobe -r ixgbe
sudo modprobe ixgbe allow_unsupported_sfp=1
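
A module option passed on the modprobe command line is lost when the driver is next reloaded. One common way to keep allow_unsupported_sfp=1 across reboots is a modprobe.d fragment, sketched below. The file name ixgbe.conf and the helper name write_ixgbe_option are assumptions, and some driver builds may expect one value per port (check the README of the installed driver).

```shell
# write_ixgbe_option [FILE]
# Write a modprobe.d fragment so the option is applied whenever ixgbe loads.
# FILE defaults to /etc/modprobe.d/ixgbe.conf (an assumed file name).
# Note: this overwrites FILE.
write_ixgbe_option() {
    printf 'options ixgbe allow_unsupported_sfp=1\n' > "${1:-/etc/modprobe.d/ixgbe.conf}"
}

# Example (as root):
#   write_ixgbe_option
```

If the driver is loaded from the initial ramdisk on your system, the ramdisk may also need to be rebuilt before the setting takes effect at boot.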

 

 

 



Q:How can I clean up the data in my esmond instance?
A:

 

 

Information on this can be found here: http://docs.perfsonar.net/multi_ma_backups.html#multi-ma-backups-delete.

 

 



Q:How can I backup the data in my esmond instance?
A:

 

 

Information on this can be found here: http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_restore_c.html.

 

 



Q:How do I backup a perfSONAR instance?
A:

 

 

To back up perfSONAR configurations and logs:

  • /opt/perfsonar_ps/ls_registration_daemon/etc/*
  • /opt/perfsonar_ps/mesh_config/etc/*
  • /opt/perfsonar_ps/regular_testing/etc/*
  • /opt/perfsonar_ps/toolkit/etc/*
  • /opt/perfsonar_ps/traceroute_ma/etc/*
  • /opt/perfsonar_ps/serviceTest/etc/*
  • /opt/perfsonar_ps/snmp_ma/etc/*
  • /etc/httpd/conf.d/apache-toolkit_web_gui.conf
  • /etc/owampd/*
  • /etc/bwctld/*
  • /etc/cassandra/*
  • /var/lib/cassandra/*
  • /var/lib/perfsonar/*
  • /var/log/perfsonar/*
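
The list above can be captured in a single archive with tar. The following is a minimal sketch: backup_paths is a hypothetical helper, the archive location is an assumption, and paths that do not exist on a given host are skipped.

```shell
# backup_paths ARCHIVE PATH...
# Create a gzip-compressed tar archive of every listed path that exists.
# Hypothetical helper; paths missing on this host are skipped.
backup_paths() {
    archive=$1
    shift
    n=$#
    # Rotate through the arguments, keeping only the paths that exist.
    while [ "$n" -gt 0 ]; do
        p=$1
        shift
        if [ -e "$p" ]; then
            set -- "$@" "$p"
        fi
        n=$((n - 1))
    done
    if [ "$#" -eq 0 ]; then
        echo "backup_paths: nothing to back up" >&2
        return 1
    fi
    tar -czf "$archive" "$@"
}

# Example (as root; /root/ps-backup.tar.gz is an assumed destination):
#   backup_paths /root/ps-backup.tar.gz \
#       /opt/perfsonar_ps/toolkit/etc /opt/perfsonar_ps/regular_testing/etc \
#       /etc/owampd /etc/bwctld /etc/cassandra /var/log/perfsonar
```

A file-level copy of /var/lib/cassandra taken while cassandra is running may not be consistent, so it is safer to stop the service first or to use the snapshot tooling described in the Cassandra documentation.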

To back up perfSONAR data:

 

 



Q:How can I tune a Dell server for a low latency environment?
A:

 

 

Dell offers a guide on tuning: http://i.dell.com/sites/content/shared-content/data-sheets/en/Documents/configuring-low-latency-environments-on-dell-poweredge-12g-servers.pdf.

 

 



Q:How do I disable global registration?
A:

 

 

The following commands stop the service and disable it at boot:

/sbin/service ls_registration_daemon stop
chkconfig ls_registration_daemon off

 

 



Q:Regular testing is running on my host, but results are not being stored into the database?
A:

 

 

Check the /var/log/perfsonar/regular_testing.log file for messages similar to this:

2015/07/17 10:33:13 (21220) WARN> daemon:103 main::__ANON__ - Warned: Use of uninitialized value in pattern match (m//) at /opt/perfsonar_ps/regular_testing/bin/../lib/perfSONAR_PS/RegularTesting/MeasurementArchives/EsmondBase.pm line 334.
2015/07/17 10:33:13 (21220) ERROR> EsmondBase.pm:53 perfSONAR_PS::RegularTesting::MeasurementArchives::EsmondBase::__ANON__ - Error writing metadata (401) 401 UNAUTHORIZED
2015/07/17 10:33:13 (21220) ERROR> MeasurementArchiveChild.pm:209 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::handle_results - Problem storing results: Error writing metadata: 401 UNAUTHORIZED
2015/07/17 10:33:13 (21220) ERROR> MeasurementArchiveChild.pm:125 perfSONAR_PS::RegularTesting::Master::MeasurementArchiveChild::__ANON__ - Problem handling test results: Problem storing results: Error writing metadata: 401 UNAUTHORIZED at /opt/perfsonar_ps/regular_testing/bin/../lib/perfSONAR_PS/RegularTesting/Master/MeasurementArchiveChild.pm line 122.

The username and password in /opt/perfsonar_ps/regular_testing/etc/regular_testing.conf may be incorrect. On the measurement archive server, run the commands described on the following page to create or retrieve the username and password:

http://docs.perfsonar.net/multi_ma_install.html
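
To confirm whether the fix worked, it can help to count the authorization failures in the log before and after updating the credentials. A minimal sketch follows; count_auth_failures is a hypothetical helper name.

```shell
# count_auth_failures [LOGFILE]
# Print the number of log lines that contain "401 UNAUTHORIZED".
# LOGFILE defaults to the regular testing log named above.
count_auth_failures() {
    grep -c '401 UNAUTHORIZED' "${1:-/var/log/perfsonar/regular_testing.log}"
}

# Example:
#   count_auth_failures
```

If the count stops growing after the credentials are corrected and regular testing is restarted, results should be reaching the archive again.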

 

 


 

 

Q:A MaDDash instance that was working suddenly stopped. What should I do?
A:

 

 

 

The following errors may appear in maddash-server.log:

ERROR 2015-08-03 00:04:23,872 Error executing CheckSchedulerJob: The conglomerate (6,161) requested does not exist.
ERROR 2015-08-03 00:04:44,105 Error scheduling job The conglomerate (6,161) requested does not exist.
ERROR 2015-08-03 00:04:45,641 Error executing CheckSchedulerJob: The conglomerate (6,161) requested does not exist.
ERROR 2015-08-03 00:05:05,837 Error scheduling job The conglomerate (6,161) requested does not exist.
ERROR 2015-08-03 00:05:07,371 Error executing CheckSchedulerJob: The conglomerate (6,161) requested does not exist.
ERROR 2015-08-03 00:05:17,261 Error cleaning database The conglomerate (6,353) requested does not exist. 

And the following errors may appear in maddash-server.out:

java.sql.SQLException: The conglomerate (6,161) requested does not exist.
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement20.<init>(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement30.<init>(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement40.<init>(Unknown Source)
at org.apache.derby.jdbc.Driver40.newEmbedPreparedStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(Unknown Source)
at com.mchange.v2.c3p0.impl.NewProxyConnection.prepareStatement(NewProxyConnection.java:213)
at net.es.maddash.jobs.CheckSchedulerJob.execute(CheckSchedulerJob.java:75)
at net.es.maddash.jobs.CheckSchedulerJob.run(CheckSchedulerJob.java:44)
Caused by: java.sql.SQLException: The conglomerate (6,161) requested does not exist.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
... 16 more
Caused by: ERROR XSAI2: The conglomerate (6,161) requested does not exist.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.store.access.btree.index.B2IFactory.readConglomerate(Unknown Source)
at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(Unknown Source)
at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(Unknown Source)
at org.apache.derby.impl.store.access.RAMTransaction.openStoreCost(Unknown Source)
at org.apache.derby.impl.sql.compile.CompilerContextImpl.getStoreCostController(Unknown Source)
at org.apache.derby.impl.sql.compile.FromBaseTable.getStoreCostController(Unknown Source)
at org.apache.derby.impl.sql.compile.FromBaseTable.estimateCost(Unknown Source)
at org.apache.derby.impl.sql.compile.OptimizerImpl.estimateTotalCost(Unknown Source)
at org.apache.derby.impl.sql.compile.OptimizerImpl.costBasedCostOptimizable(Unknown Source)
at org.apache.derby.impl.sql.compile.OptimizerImpl.costOptimizable(Unknown Source)
at org.apache.derby.impl.sql.compile.FromBaseTable.optimizeIt(Unknown Source)
at org.apache.derby.impl.sql.compile.ProjectRestrictNode.optimizeIt(Unknown Source)
at org.apache.derby.impl.sql.compile.OptimizerImpl.costPermutation(Unknown Source)
at org.apache.derby.impl.sql.compile.SelectNode.optimize(Unknown Source)
at org.apache.derby.impl.sql.compile.DMLStatementNode.optimizeStatement(Unknown Source)
at org.apache.derby.impl.sql.compile.CursorNode.optimizeStatement(Unknown Source)
at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown Source) 

The database has most likely become corrupted. The easiest fix is to clear out the last few days of alert history (whether something was red, yellow, green, or orange) and restart maddash. You will NOT lose any of your measurements, since they are stored in the measurement archives; the database in question holds only the alert history.

Run the following:

rm -rf /var/lib/maddash/* 
/sbin/service maddash-server restart

When it comes back up and you load the page, your boxes will likely all be gray. That will change as maddash has a chance to run and fill in the status of each check.

 

 


 

 

Q:I have a measurement archive machine with esmond running, and there is a separate disk partition mounted on the machine where I want to store all the incoming measurement data from measurement points. What is the proper way to change the default directory location for storing the measurement archive data?
A:

 

 

 

Change the directories where cassandra and postgres store data. This is controlled through each tool's own configuration file, not through esmond directly.

  • For cassandra, in /etc/cassandra/conf/cassandra.yaml change commitlog_directory, data_file_directories and saved_caches_directory to the new locations. Restart cassandra: /sbin/service cassandra restart
  • For postgres, in /var/lib/pgsql/data/postgresql.conf change data_directory to the new location. Restart postgres: /etc/init.d/postgresql restart. You will need to rebuild the esmond tables after this change is made. To do so, follow the instructions here: http://software.es.net/esmond/rpm_install.html#configuration

Alternatively, stop both postgres and cassandra, move /var/lib/cassandra and /var/lib/pgsql to the new partition, and then create symlinks at /var/lib/cassandra and /var/lib/pgsql that point to the new locations. This avoids rebuilding the postgres tables and preserves any existing data.
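
The move-and-symlink approach can be sketched as a small helper. relocate_dir is a hypothetical name and /data is an assumed mount point for the new partition; the services must be stopped before the move.

```shell
# relocate_dir SRC DEST
# Move directory SRC to DEST and leave a symlink at SRC pointing to DEST.
# DEST should be an absolute path that does not exist yet.
# Hypothetical helper; stop the services that use SRC before calling it.
relocate_dir() {
    src=$1
    dest=$2
    if [ ! -d "$src" ] || [ -L "$src" ]; then
        echo "relocate_dir: $src is not a plain directory" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "relocate_dir: $dest already exists" >&2
        return 1
    fi
    mv "$src" "$dest" && ln -s "$dest" "$src"
}

# Example (as root; /data is an assumed mount point):
#   /sbin/service cassandra stop
#   relocate_dir /var/lib/cassandra /data/cassandra
#   /sbin/service cassandra start
```

The helper refuses to run when the source is already a symlink, so calling it a second time by mistake does no harm.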

 

 


 

 

Q:I have an Intel X520 10Gbps NIC, and the load on my machine is very high. What should I do?
A:

The issue may have the following symptoms:

  • Very choppy CLI response, with frequent pauses of about 1 second. This happens over ssh, and on one box it was even seen on the console.
  • Load average goes to 1.00 even when the machine is completely idle.
  • A process (events/1) is stuck in uninterruptible sleep (D).

The problem could occur under these circumstances:

  • An Intel X520-DA2 network card (a dual 10G NIC) is installed.
  • The host runs perfSONAR 3.4.2.
  • One of the 10G interfaces is brought up. Note that perfSONAR 3.4.2 shipped with NetworkManager installed and enabled, which periodically brings up all the interfaces on the box in order to look for link.
  • An SFP is *not* present in the slot.

You can turn off NetworkManager:

service NetworkManager stop
chkconfig NetworkManager off

After that, you can reproduce the problem like this:

ifconfig eth1 1.1.1.1/30 up

And the problem goes away as soon as you reverse it:

ifconfig eth1 0.0.0.0 down

This appears related to the ixgbe driver:

# modinfo ixgbe
filename: /lib/modules/2.6.32-573.3.1.el6.web100.x86_64/kernel/drivers/net/ixgbe/ixgbe.ko
version: 4.0.1-k
license: GPL
description: Intel(R) 10 Gigabit PCI Express Network Driver

Removing NetworkManager or disabling it by default addresses this issue, and it will not be present on perfSONAR 3.5.

Additionally, with both driver versions 4.0.3 and 4.1.2 there were no issues with load or jerkiness on the box with SFP interfaces removed, inserted, with NIC interfaces configured or not configured. Recent README files with these new drivers do note this:

WARNING: The ixgbe driver compiles by default with the Large Receive Offload
(LRO) feature enabled. This option offers the lowest CPU utilization for
receives but is completely incompatible with *routing/ip forwarding* and
*bridging*. If enabling ip forwarding or bridging is a requirement, it is
necessary to disable LRO using compile time options as noted in the LRO
section later in this document. The result of not disabling LRO when combined
with ip forwarding or bridging can be low throughput or even a kernel panic.

It has not been confirmed that this is the problem, but it is provided as a note.


 

 

Q:How can I nuke all of the data in esmond, and start from scratch?
A:

Note: many of these commands must be run as root, and they will destroy existing data and metadata collections. An additional step to load the esmond key into the regular testing configuration file may be required (read the output of the tools to know for sure):

/etc/init.d/postgresql stop
rm -rf /var/lib/pgsql/data/*
/sbin/service postgresql initdb
/sbin/service postgresql start

Wait about 20 seconds for postgresql to finish starting, then continue:

sudo -u postgres psql -c "CREATE USER esmond WITH PASSWORD '7hc4m1'"
sudo -u postgres psql -c "CREATE DATABASE esmond"
sudo -u postgres psql -c "GRANT ALL ON DATABASE esmond to esmond"
cp -f /opt/perfsonar_ps/toolkit/etc/default_service_configs/pg_hba.conf /var/lib/pgsql/data/pg_hba.conf
/sbin/service postgresql restart
sed -i "s/sql_db_name = .*/sql_db_name = esmond/g" /opt/esmond/esmond.conf
sed -i "s/sql_db_user = .*/sql_db_user = esmond/g" /opt/esmond/esmond.conf
sed -i "s/sql_db_password = .*/sql_db_password = 7hc4m1/g" /opt/esmond/esmond.conf
/opt/perfsonar_ps/toolkit/scripts/system_environment/configure_esmond 

 

 

Q:I am running a small node, and seeing a lot of IO. What is going on?
A:

Some users report abnormalities on their small nodes related to I/O activity (e.g. iostat reports long w_await times - sometimes measured in multiple seconds). These coincide with intervals of testing, in particular related to OWAMP.

Deeper investigation found excessive I/O from syslogd and systemd-journald processing syslog messages from owampd, bwctld, and powstream into /var/log/messages, sometimes up to 30-40 syslog messages per second depending on the testing configuration of a host. Because small nodes are based on flash memory, the following changes are recommended for a more balanced approach to logging:

  • Journal to memory by editing /etc/systemd/journald.conf.
    • Set the option Storage=volatile instead of the default Storage=auto. Make sure to limit the maximum memory used for journaling by adjusting the RuntimeKeepFree and RuntimeMaxUse options.
    • Don’t *restart* the journaling service (i.e., don’t do “systemctl restart systemd-journald”). Do an *OS reboot* instead.
  • Turn down the owampd syslog level.
    • For perfSONAR before 3.5rc2, the only way is to comment out the “verbose” option in /etc/owampd/owampd.conf (so that it defaults to non-verbose). For perfSONAR 3.5rc2, it should be possible to set the logging level in the configuration.
    • Restart the owampd service.