The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing

The Network Weather Service (NWS) is designed to provide accurate forecasts of dynamically changing performance characteristics from a distributed set of metacomputing resources. It is implemented on Unix using TCP/IP sockets. The current NWS consists of four component processes: the Persistent State process, the Name Server process, the Sensor process and the Forecaster process. NWS sensors gather and store (time stamp, performance measurement) pairs for a specific resource; the CPU sensor, for example, provides measurements of CPU availability on timeshared Unix systems. The NWS exports a lightweight and portable C API that contacts the system via sockets so that applications can quickly retrieve short-term performance forecasts.


Dynamically Forecasting Network Performance Using the Network Weather Service

To generate a forecast, the NWS operates several different models simultaneously and computes a forecast from each. The forecasting methods of the NWS include mean-based methods, median-based methods and autoregressive methods. The paper discusses these methods in detail and compares their performance.
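
The idea of running several models side by side can be sketched as follows. This is only an illustration, not the NWS implementation: the function names, the window size, and the rule of choosing the method with the lowest cumulative absolute error are assumptions made for the example, and the autoregressive methods are left out for brevity.

```python
from statistics import mean, median


def forecast_all(history, window=5):
    """Return one forecast per method for the next measurement.

    The method names and the window size are illustrative assumptions.
    """
    recent = history[-window:]
    return {
        "running_mean": mean(history),     # mean of the whole history
        "sliding_mean": mean(recent),      # mean of the last `window` values
        "sliding_median": median(recent),  # median of the last `window` values
    }


def best_forecast(series, window=5):
    """Replay the series, score every method by cumulative absolute error,
    and return (method_name, forecast) for the next, unseen value.

    Selecting the lowest-error method is an assumption of this sketch.
    """
    if len(series) < 2:
        raise ValueError("need at least two measurements to score the methods")
    errors = {}
    for t in range(1, len(series)):
        predictions = forecast_all(series[:t], window)
        for name, value in predictions.items():
            errors[name] = errors.get(name, 0.0) + abs(value - series[t])
    winner = min(errors, key=errors.get)
    return winner, forecast_all(series, window)[winner]


if __name__ == "__main__":
    # A bandwidth-like series with one outlier; values are made up.
    print(best_forecast([10, 10, 10, 50, 10, 10, 10], window=3))
```

With a single outlier in the history, the median-based method accumulates less error than either mean, which is the kind of difference the paper's comparison is concerned with.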


Synchronizing Network Probes to avoid Measurement Intrusiveness with the Network Weather Service

The NWS uses TCP to send small, fixed-size probes, measuring in the kilobytes, at a frequency that ranges from tens of seconds to several minutes, and then applies fast statistical models to the probe histories to make performance forecasts. To avoid probe collisions as much as possible, the NWS uses a token protocol. The paper discusses this token protocol further.
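
A token protocol of this kind can be sketched as a simple round-robin schedule in which only the current token holder is allowed to probe. The ring ordering, the one-probe-per-slot rule, and the function name are assumptions made for illustration; the actual NWS protocol is described in the paper.

```python
from collections import deque


def token_schedule(hosts, slots):
    """Return a list of (slot, prober) pairs.

    Only the host holding the token probes in a given slot, so two
    probes never overlap. The ring order is an assumption of this sketch.
    """
    ring = deque(hosts)
    schedule = []
    for slot in range(slots):
        holder = ring[0]              # current token holder
        schedule.append((slot, holder))
        ring.rotate(-1)               # pass the token to the next host
    return schedule


if __name__ == "__main__":
    for slot, host in token_schedule(["hostA", "hostB", "hostC"], 4):
        print(f"slot {slot}: {host} sends its probes")
```

Because exactly one host probes per slot, measurements taken by different hosts cannot disturb each other, at the cost of a longer interval between probes from any single host.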


The Architecture of CoralReef: An Internet Traffic Monitoring Software Suite

The CoralReef architecture is organized primarily into two "stacks" of software components: the "raw traffic stack", which deals directly with individual PDUs (packets or cells) read from a network traffic stream, and the "flows stack", which deals with traffic data that have been aggregated into "flow intervals". The core of the raw traffic stack is the libcoral C library, which provides a consistent API for capturing traffic from specialized ATM and POS capture cards from multiple vendors and from pcap interfaces. One of the design goals is to provide the same API in C, C++, and Perl. The flows stack includes modules for storage and manipulation of tables of frequently collected aggregate data.
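
The step from individual packets to aggregated "flow intervals" can be illustrated with a small sketch. It does not use the CoralReef API: the packet tuple layout, the flow key, and the function name are assumptions chosen for the example.

```python
from collections import defaultdict


def aggregate_flows(packets, interval):
    """Aggregate packets into per-interval flow counters.

    Each packet is assumed to be a tuple
    (timestamp_seconds, src, dst, protocol, size_bytes).
    The result maps (interval_index, src, dst, protocol) to counters.
    """
    table = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for timestamp, src, dst, proto, size in packets:
        key = (int(timestamp // interval), src, dst, proto)
        table[key]["packets"] += 1
        table[key]["bytes"] += size
    return dict(table)


if __name__ == "__main__":
    sample = [
        (0.5, "10.0.0.1", "10.0.0.2", "tcp", 100),
        (1.0, "10.0.0.1", "10.0.0.2", "tcp", 50),
        (61.0, "10.0.0.1", "10.0.0.2", "tcp", 10),
    ]
    for key, counters in aggregate_flows(sample, interval=60).items():
        print(key, counters)
```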


CoralReef software suite as a tool for system and network administrators

CoralReef, designed by CAIDA, is a convenient set of passive data tools that provides a consistent interface for a wide range of network analysis applications, including raw capture, flows analysis and real-time report generation. CoralReef runs on Unix and supports device-independent access to network data from OCXmon hardware, native OS network interfaces, and trace files; programming APIs; a variety of bundled analysis applications; and greater flexibility in remote access and administration. The unique feature of CoralReef is that it supports a large number of features at many layers and provides APIs and hooks at every layer, making it easier for anyone to apply it in unanticipated ways and to develop new applications with minimal duplicated effort. Its core library, libpcap, is based on part of tcpdump.


Measuring the Immeasurable: Global Internet Measurement Infrastructure

This is a survey of existing public and mission-specific Internet measurement infrastructures, including CoralReef, IEPM, I2 (Abilene), NIMI, NLANR AMP, NLANR PMA, NPACI NWS, etc. The comparison across various criteria is very comprehensive.


A System for Flexible Network Performance Measurement

The National Internet Measurement Infrastructure (NIMI) is a software system for building network measurement infrastructures. A key NIMI design goal is scalability to potentially thousands of NIMI probes within a single infrastructure. Security is also a key concern of the design. By using measurement "modules", NIMI probes can support a variety of measurement tools, including traceroute, TReno, mtrace, and zing. However, this flexibility is limited to integrating simple measurement tools as probes. In contrast, our Resource Monitor design intends to provide a common interface for integrating different measurement infrastructures or architectures. We might still be interested in borrowing one of NIMI's design ideas: stand-alone third-party measurement software modules can be "plugged in" to NIMI and are executed by "exec()" calls. It is also worth mentioning that the NIMI infrastructure consists of a Measurement Client (MC), a Data Analysis Client (DAC) and NIMI probes, and that we designed the Resource Monitor in a very similar way. NIMI is being used by two different projects: MINC and Web100.
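
The "plug-in" idea, in which a stand-alone tool is registered under a name and then executed as a separate process, can be sketched as below. This is not NIMI code: the registry, the module name `echo_probe`, and the stand-in command (a Python one-liner printing a made-up value) are all hypothetical, and a real deployment would register tools such as traceroute instead.

```python
import subprocess
import sys

# Hypothetical registry: module name -> command line of a stand-alone tool.
# The stand-in command just prints a made-up measurement.
MODULES = {
    "echo_probe": [sys.executable, "-c", "print('rtt_ms=12.5')"],
}


def run_module(name, extra_args=()):
    """Execute a registered measurement tool as a child process.

    Returns the tool's standard output with surrounding whitespace removed.
    Raises KeyError for an unknown module and CalledProcessError if the
    tool exits with a non-zero status.
    """
    command = MODULES[name] + list(extra_args)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_module("echo_probe"))
```

Keeping each tool in its own process means a new measurement tool can be added by registering a command line, without changing the code that schedules and collects measurements.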


Passively Monitoring Networks at Gigabit Speeds Using Commodity Hardware and Open Source Software

Traffic monitoring tools that were in use three years ago, such as MRTG polling traffic information from router and switch interfaces via SNMP MIB-II variables, can no longer be used to monitor gigabit networks. The author presents a new passive monitoring architecture designed specifically for gigabit networks. The solution is to use a Juniper switch as the hardware, connected to a PC that runs the ntop monitoring software. Ntop performs several kinds of measurements, including RMON-like measurements, NetFlow-like measurements and security measurements. The Juniper is used for accounting "easy to count" traffic (e.g. traffic volume) and for mirroring and filtering traffic (e.g. discarding uninteresting packets), whereas the PC is deployed for complex accounting. To overcome the performance bottleneck, the author uses a two-layer architecture based on a preprocessor plus the existing monitoring application. The simplest preprocessor is a traffic sampler that discards packets according to a specific rate, similar to what sFlow-based probes do. Juniper instrumentation is performed transparently by libpcap by means of JUNOScript.
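
The simplest preprocessor mentioned above, a sampler that discards packets at a fixed rate, can be sketched as follows. The deterministic 1-in-N rule and the function name are assumptions for illustration; the paper's preprocessor and sFlow probes may select packets differently (for example, randomly).

```python
def sample_packets(packets, rate):
    """Yield one packet out of every `rate` packets and discard the rest.

    A deterministic 1-in-N sampler; `rate` must be a positive integer.
    """
    if rate < 1:
        raise ValueError("rate must be at least 1")
    for index, packet in enumerate(packets):
        if index % rate == 0:
            yield packet


if __name__ == "__main__":
    kept = list(sample_packets(range(10), rate=4))
    # The monitoring application only sees the kept packets; totals can be
    # estimated by scaling the kept count by the sampling rate.
    print(kept, "estimated total:", len(kept) * 4)
```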


The NLANR Network Analysis Infrastructure

The National Laboratory for Applied Network Research (NLANR) is developing a Network Analysis Infrastructure (NAI) that focuses mainly on passive measurement, active measurement, SNMP and BGP data analysis, and presenting the results of the analysis to the HPC community. Our Resource Monitor has a very similar goal to NAI, so our design might benefit from the NAI design.

The NLANR NAI infrastructure supports all three types of network monitoring: passive measurement, active measurement and control monitoring. It also includes a number of projects:

-Cichlid has been developed for representing and analyzing large volumes of data.

-The NLANR passive measurement project utilizes OCXmon and FDDI monitors. Its software, CoralReef, includes many data analysis programs, such as flows analysis, packet size and frequency histograms, packet size run lengths, protocol and port breakdown, host and autonomous system matrices, type of service breakdown and ASCII dumping of packets.

-AMP is NLANR's active measurement project. The focus is on making site-to-site measurements of round trip time (RTT), packet loss, topology and throughput across the National Science Foundation (NSF) approved HPC networks. The monitors send ICMP packets to each other to record the time, and the route to each other monitor is recorded using traceroute. AMP also uses an active mirror system to keep the monitoring data updated in real time. In this project, Automatic Event Detection would be useful to allow an interested party to register to receive notification of interesting events on some or all paths.


Network performance visualization: insight through animation

The National Laboratory for Applied Network Research (NLANR) has a number of network measurement projects that produce gigabytes of data and thousands of Web pages of graphs and summaries each day. These include the OCxMON Passive Monitoring and Analysis project (PMA), the Active Monitoring Project (AMP), and the collection and analysis of SNMP and BGP data. To avoid overlooking important network events, NLANR has developed the Cichlid tool for visualizing and animating real-time data sets in three dimensions.

Cichlid is free, portable C code that uses the OpenGL and GLUT graphics libraries, and is currently being used on the FreeBSD, Linux, Microsoft Windows, and IRIX platforms.

Cichlid supports graphs that show matrices of network delays between pairs of sites, traffic distribution over address blocks, traffic volume by protocol and source, latencies from a single site to others, as well as network evolution over time.

A useful lesson from Cichlid is that it provides an entire visualization infrastructure, including the rendering and data transfer functionality. The function modules we designed for the Resource Monitor are very similar to Cichlid's three primary functions: (1) Abstraction and Modeling, (2) Collection and Distribution, and (3) Visualization. How to abstract data using a good model is one of the keys to the design. Cichlid uses a DataSet to abstract data and transfer it from server to client. However, the paper does not describe the abstraction policy behind DataSet. The DataSet abstraction model is a good design approach that we might want to try. All data transport in Cichlid is implemented on top of TCP; however, the data-independence restriction prohibits Cichlid from being applied on UDP.


Integrating Active Methods and Flow Meters - an implementation using NeTraMet

The goal of this paper is to integrate active methods and performance metrics into the flow meter NeTraMet to obtain richer functionality. The approach is to use two monitoring systems that consist of passive as well as active components: traffic meters that identify traffic flows and maintain counters, and monitoring packets sent periodically or according to a statistical distribution.


Developing a Web100 Based Network Diagnostic Tool (NDT)

The Network Diagnostic Tool (NDT) was developed to perform the basic investigative function of finding and fixing network performance or configuration problems using a Web100-based approach. By combining the Web100 data with measured TCP throughput data, NDT can identify: Duplex Mismatch, Hardware Fault, Faulty Bandwidth Estimation, Network Link Speed, Full or Half Duplex link and Congestion. A command-line version of the web-based NDT was developed based on the NLANR Iperf tool.