Virtualized hosting environments are an attractive paradigm in modern networks. However, virtualization can call into question the accuracy and stability of measurements, particularly those that are sensitive to conditions on a host or operating system. perfSONAR was designed to "level the playing field" for network measurements by removing host performance from the equation as much as possible. Virtualized environments can introduce unforeseen complications into the act of measurement, which may undermine the accuracy of results.
The figure below shows a block diagram of a typical virtualized architecture.
Note that any measurement tool (e.g. OWAMP, BWCTL, NDT, etc.) would need to pass through two additional layers of control (the hypervisor and the virtual hardware) before touching the physical network in this example. These additional layers introduce several challenges:
- Time Keeping: Some virtualization environments implement clock management through the hypervisor and virtual machine communication channel, rather than using a stabilizing daemon such as NTP. This can cause time to skip forward or backward, making it generally unpredictable for measurement use.
- Data Path: Network packets are timed from when they leave the application on one end until they arrive in the application on the other end. The additional layers introduce latency, queuing, and potentially packet reordering and errors into the final calculation.
- Resource Management: Virtual machines share physical hardware with other resources, and may be removed ("swapped") from the physical hardware to allow other virtual machines to run as needed. This swapping of virtual resources can introduce additional measurement error that may only become visible when a long-term view of the data is established and compared against the performance of the virtualized environment.
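The timekeeping hazard above can be illustrated with a minimal Python sketch (all values are hypothetical): a hypervisor-managed clock that steps backward mid-measurement yields a nonsensical, even negative, measured interval.

```python
# Sketch (hypothetical values): how an unstable VM clock corrupts an
# interval measurement. A hypervisor-managed clock may step backward,
# producing a nonsensical (even negative) measured duration.

def measured_interval(send_clock, recv_clock, true_delay):
    """Interval as seen by a measurement tool: receive timestamp minus
    send timestamp, each read from a (possibly misbehaving) clock."""
    return recv_clock(true_delay) - send_clock(0.0)

# A stable clock reports true elapsed time.
stable = lambda t: t

# A clock that stepped 50 ms backward mid-measurement
# (e.g. after a hypervisor clock-sync event).
stepped = lambda t: t - 0.050

true_delay = 0.010  # 10 ms of real network latency

print(measured_interval(stable, stable, true_delay))   # matches the true delay
print(measured_interval(stable, stepped, true_delay))  # negative: clock stepped backward
```

A daemon such as NTP slews the clock gradually instead of stepping it, which is why the document favors it over hypervisor-driven clock management for measurement hosts.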
The perfSONAR Toolkit development team has analyzed several VM architectures (VMware, KVM, QEMU) and has found that the basic problems listed above exist to some degree in all implementations. Small improvements may be made in all implementations over time, and tuning individual systems may produce gains for some use cases. The development team has identified several use cases that can work in a virtualized environment, provided that the user applies certain configurations to mitigate the known risks:
- Measurement Storage: A virtual machine makes a good centralized home for data collected from multiple bare-metal beacon hosts. Note that this VM should be provisioned with several cores and adequate RAM and storage to function in this role.
- NDT/NPAD Measurement: Both of these tools rely on an instrumented kernel for measurement acquisition, and use inter-packet timings rather than a wall clock provided by the hosts involved in the measurement. Since there is no hard requirement for a stable clock, these tools can work in a virtualized environment, with the caveat that end-to-end measurements will traverse additional layers of software. The virtual host should be tuned to bind to specific hardware resources (CPU, network card) if possible, to minimize the impact of hypervisor resource management activities. Measurements taken with these tools should be cross-referenced against non-virtualized hardware when possible to identify error.
- Throughput Measurement: Tools such as BWCTL rely on an accurate and precise NTP configuration for measurement. This requirement is less stringent if the operator is willing to tolerate some measurement error: a stable clock shown to be accurate to within ~100ms of a reference clock is sufficient for most throughput measurements, but drifting clocks may prevent BWCTL from negotiating test windows. Caveats to this use case revolve around hypervisor resource management; it is vital that the virtual machine performing throughput tests have direct control over a network card, and preferably use the same CPU and regions of memory, to minimize the introduction of errors into the measurement. Failure to do so will produce unpredictable results when compared against tests on actual hardware.
- Passive Measurement: Passive collection tools such as SNMP, Nagios, NetFlow, and sFlow work well in virtualized environments. Considerations for hypervisor resource management should still be explored, but are not as vital as in the active testing case.
- Packet Loss/Jitter Measurement: A tool that uses active methods to measure packet loss and jitter, such as OWAMP, can function in a virtualized environment, provided care is taken to ensure that the hypervisor provides dedicated resources for this host as well (e.g. binding to a specific CPU and network card). These measurements are traditionally performed either through kernel instrumentation (in the TCP case) or by counting the results and variance in UDP packet streams, neither of which requires a stable reference clock. Factors such as congestion, particularly within the host itself, exhibit a regular pattern in practice and can be identified even if the clock has drifted from true time.
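To illustrate why jitter measurement tolerates an inaccurate (but smoothly running) clock, the following sketch, with hypothetical timestamps, computes delay variation from consecutive-packet differences; a constant receiver clock offset cancels out entirely.

```python
# Sketch (hypothetical timestamps): jitter as mean variation in one-way
# delay between consecutive packets of a UDP stream. A constant receiver
# clock offset shifts every delay equally, so it cancels out of the
# consecutive-delay differences -- the clock need only run smoothly.

def jitter(send_times, recv_times):
    """Mean absolute variation in one-way delay between consecutive packets."""
    delays = [r - s for s, r in zip(send_times, recv_times)]
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    return sum(diffs) / len(diffs)

send = [0.000, 0.020, 0.040, 0.060]
recv = [0.010, 0.032, 0.049, 0.071]  # variable network delay per packet

offset = 5.0  # receiver clock a full 5 seconds off true time
print(jitter(send, recv))
print(jitter(send, [r + offset for r in recv]))  # same jitter despite the offset
```

A clock that drifts during the stream would bleed into the differences, but slow drift contributes far less error than the absolute offset does to a one-way latency result.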
We have found that virtualized environments do not work well for other forms of measurement:
- High Speed Throughput Measurement: Virtualized environments can function well at 1Gbps network speeds, but may struggle at higher (e.g. 10Gbps) capacities. Please consult the operations guide for your specific virtualization environment before attempting this type of measurement.
- One Way Latency Measurement: Clock stability is extremely important to latency-based measurements; typically, a clock should be within 5ms of true time to be considered accurate enough to measure network latency. Virtualized environments can introduce additional latency and error into the timekeeping operation, which reduces the accuracy of the measurement as a whole. The perfSONAR-PS project does not recommend this use case for virtualized systems.
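A short Python sketch (with hypothetical numbers) shows why one-way latency is so sensitive: any clock offset between the two hosts appears directly, and undetectably, in the result.

```python
# Sketch (hypothetical values): one-way delay (OWD) is receiver timestamp
# minus sender timestamp, so any offset between the two hosts' clocks is
# added straight into the measurement.

def one_way_delay(send_ts, recv_ts):
    """OWD as computed by a tool like OWAMP: receive time minus send time."""
    return recv_ts - send_ts

true_delay = 0.008    # 8 ms of real path latency
clock_offset = 0.005  # receiver clock running 5 ms ahead of the sender's

send_ts = 100.000  # sender's clock at transmission
recv_ts = send_ts + true_delay + clock_offset  # receiver's clock at arrival

# Reports ~13 ms for an 8 ms path: a 5 ms offset is >60% error here,
# which is why the text requires clocks within a few ms of true time.
print(one_way_delay(send_ts, recv_ts))
```

Note that the same offset would subtract from measurements in the reverse direction, so asymmetric forward/reverse latencies are a classic symptom of clock offset rather than of the network path.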
To summarize, it is possible to use virtualized environments, provided that care is taken to tune them to provide a stable footing for the measurement tools. Bare-metal servers are still recommended for mission-critical applications.
Instructions are available for installing the perfSONAR Toolkit as a Virtual Machine: