
Introduction

PlanetFlow is PlanetLab's flow monitoring system. Over 4TB of data is transmitted over PlanetLab every day. Some of this traffic alarms users on the public Internet, who take it to be an attack or the prelude to one. Reassuring such users and resolving their complaints is the main goal of PlanetFlow. The previous version of PlanetFlow was written from scratch to resolve the specific class of complaints received by the PlanetLab operations staff. Each such complaint specified an offending flow, which had to be attributed to a slice so that the researcher running that slice could explain its purpose. The current version of PlanetFlow (PlanetFlow2) was developed with two goals:

1. To make sense of traffic even when the offending flow is specified vaguely or not at all. Such ambiguous specifications include traffic patterns such as port scans and IP scans, and volume patterns such as 'bursty traffic'.
2. To let the logging, archiving and query system scale to hundreds of nodes without a significant loss of efficiency.
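As an illustration of the first goal, the following is a minimal sketch of how port scans and IP scans might be recognized from flow records even when no single offending flow is named: group flows by slice and source, then count distinct destination ports per destination host, and distinct destination hosts per destination port. The `Flow` fields and both thresholds are hypothetical, and this is a generic heuristic rather than PlanetFlow2's own detection logic.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    """Simplified flow record (hypothetical fields, not PlanetFlow's schema)."""
    slice_id: int
    src_ip: str
    dst_ip: str
    dst_port: int


def find_scanners(flows: Iterable[Flow], port_threshold: int = 100,
                  host_threshold: int = 100) -> dict[str, set[int]]:
    """Flag slices that touch many ports on one host (port scan) or
    many hosts on one port (IP scan). Thresholds are illustrative only."""
    ports_per_host = defaultdict(set)  # (slice, src, dst) -> {dst ports}
    hosts_per_port = defaultdict(set)  # (slice, src, port) -> {dst hosts}
    for f in flows:
        ports_per_host[(f.slice_id, f.src_ip, f.dst_ip)].add(f.dst_port)
        hosts_per_port[(f.slice_id, f.src_ip, f.dst_port)].add(f.dst_ip)

    suspects: dict[str, set[int]] = {"port_scan": set(), "ip_scan": set()}
    for (slice_id, _, _), ports in ports_per_host.items():
        if len(ports) >= port_threshold:
            suspects["port_scan"].add(slice_id)
    for (slice_id, _, _), hosts in hosts_per_port.items():
        if len(hosts) >= host_threshold:
            suspects["ip_scan"].add(slice_id)
    return suspects
```

For example, 200 flows from one slice to ports 1 to 200 of a single host would place that slice in `port_scan` but not in `ip_scan`.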

PlanetFlow collects traffic logs in a slightly modified version of the Netflow v5 format. The modification adds support for slices by associating each flow record with a 32-bit slice id. These records are periodically compressed and transferred to a central location, where they are converted into the Silk data format (http://silktools.sf.net). Data in this format can be queried with Silk, a Netflow analysis suite popular for its speed and extensive querying facilities. Silk data is retained for 30 days, after which it is discarded. A web-based GUI wraps the query tools included in Silk and lets users search and analyze the data through a web form. The GUI also collects and displays high-level statistics that characterize the PlanetFlow data.
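To make the record format concrete, here is a sketch of a standard 48-byte Netflow v5 flow record extended with a 32-bit slice id. Where PlanetFlow actually stores the slice id is not described here; this sketch simply assumes it is appended after the standard fields, giving a 52-byte record. `FlowRecord`, `pack` and `unpack` are hypothetical names, and unused v5 fields (next hop, interfaces, AS numbers, masks) are written as zero.

```python
import ipaddress
import struct
from typing import NamedTuple

# Standard Netflow v5 record layout (network byte order, 48 bytes), followed
# by an ASSUMED trailing 32-bit slice id.
V5_WITH_SLICE = struct.Struct("!IIIHHIIIIHHBBBBHHBBHI")


class FlowRecord(NamedTuple):
    src: str
    dst: str
    src_port: int
    dst_port: int
    proto: int
    packets: int
    octets: int
    first_ms: int  # router uptime (ms) at first packet
    last_ms: int   # router uptime (ms) at last packet
    slice_id: int


def pack(r: FlowRecord) -> bytes:
    return V5_WITH_SLICE.pack(
        int(ipaddress.IPv4Address(r.src)),
        int(ipaddress.IPv4Address(r.dst)),
        0,                    # nexthop (unused here)
        0, 0,                 # input/output interface indexes
        r.packets, r.octets, r.first_ms, r.last_ms,
        r.src_port, r.dst_port,
        0,                    # pad1
        0,                    # tcp_flags
        r.proto,
        0,                    # tos
        0, 0,                 # src_as, dst_as
        0, 0,                 # src_mask, dst_mask
        0,                    # pad2
        r.slice_id,           # assumed slice-id extension
    )


def unpack(buf: bytes) -> FlowRecord:
    f = V5_WITH_SLICE.unpack(buf)
    return FlowRecord(
        src=str(ipaddress.IPv4Address(f[0])),
        dst=str(ipaddress.IPv4Address(f[1])),
        src_port=f[9], dst_port=f[10], proto=f[13],
        packets=f[5], octets=f[6], first_ms=f[7], last_ms=f[8],
        slice_id=f[20],
    )
```

A fixed-size binary record like this keeps per-flow overhead small, which matters when hundreds of nodes ship logs to one place.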

The figure below gives an overview of the architecture of PlanetFlow2.

Figure: High-level overview of PlanetFlow2

PlanetFlow can be broken up into the following components:

On each PlanetLab node, an iptables rule delivers the header of every incoming packet to a special Netlink socket. Netlink is a Linux-specific IPC mechanism for transferring data between the kernel and user space. Fprobe, PlanetFlow's data collector, listens on this socket and aggregates each packet into a flow record. PlanetFlow benefits from the natural compression of Netflow, which comes from combining many packets into a single flow record. The resulting records are stored in hourly data files.
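The aggregation step can be sketched as follows, assuming a hypothetical packet-header tuple rather than fprobe's real data structures. Packets that share a (slice, source, destination, source port, destination port, protocol) key within the same hour collapse into one record, which is where the compression comes from. Unlike a real collector, this sketch ignores active and inactive flow timeouts and only splits flows at hour boundaries.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical header: (timestamp_s, slice_id, src, dst, sport, dport, proto, length)
PacketHeader = tuple[float, int, str, str, int, int, int, int]
FlowKey = tuple[int, str, str, int, int, int]


@dataclass
class Flow:
    packets: int
    octets: int
    first: float
    last: float


def aggregate(headers: Iterable[PacketHeader]) -> dict[int, dict[FlowKey, Flow]]:
    """Fold packet headers into per-hour flow tables (one table per data file)."""
    hours: dict[int, dict[FlowKey, Flow]] = {}
    for ts, slice_id, src, dst, sport, dport, proto, length in headers:
        hour = int(ts // 3600)  # index of the hourly file this packet lands in
        key = (slice_id, src, dst, sport, dport, proto)
        flows = hours.setdefault(hour, {})
        flow = flows.get(key)
        if flow is None:
            flows[key] = Flow(packets=1, octets=length, first=ts, last=ts)
        else:
            flow.packets += 1
            flow.octets += length
            flow.first = min(flow.first, ts)
            flow.last = max(flow.last, ts)
    return hours
```

Ten packets of one TCP connection within an hour, for instance, become a single record with `packets == 10`.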

A special slice called Netflow accesses the flow logs and makes them available to PlanetFlow Central (PFC), the central location to which all PlanetFlow data is brought. The Netflow slice accesses the logs via Vsys (http://www.cs.princeton.edu/~sapanb/vsys). An rsync daemon runs within the slice, allowing a data-amassing process on PFC to synchronize with these flow logs.
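On the PFC side, the pull from each node's rsync daemon can be sketched as below. The rsync module name `netflow` and the archive directory `/var/pfc/incoming` are hypothetical placeholders, not PlanetFlow's actual configuration; the `rsync://host/module/` URL form is standard rsync daemon syntax.

```python
import subprocess
from pathlib import Path

# Hypothetical names, for illustration only.
RSYNC_MODULE = "netflow"
ARCHIVE_ROOT = Path("/var/pfc/incoming")


def rsync_argv(node: str, module: str = RSYNC_MODULE,
               root: Path = ARCHIVE_ROOT) -> list[str]:
    """Build an rsync command that mirrors one node's flow logs onto PFC."""
    return [
        "rsync",
        "-a",                         # archive mode: recurse, keep times/perms
        f"rsync://{node}/{module}/",  # daemon exported by the Netflow slice
        str(root / node) + "/",       # one local directory per node
    ]


def sync_all(nodes: list[str]) -> dict[str, int]:
    """Run one rsync per node and return each node's exit status."""
    return {node: subprocess.run(rsync_argv(node)).returncode for node in nodes}
```

Because rsync only transfers files that changed, repeated runs fetch just the newly written hourly logs.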

The data-amassing process runs within a component called pdelta on PFC. When pdelta retrieves new data, it converts it into Silk format, which is about two orders of magnitude more space-efficient than the raw Netflow format. The Silk data is then further compressed with gzip.
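The final gzip step might look like the following sketch, which assumes (hypothetically) that each converted hourly file is compressed individually and the uncompressed copy is removed. The Silk conversion itself is not shown, and `gzip_file` is an illustrative name.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(path: Path) -> Path:
    """Compress `path` to `path.gz`, delete the original, return the new path."""
    out = path.with_name(path.name + ".gz")
    with path.open("rb") as src, gzip.open(out, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so large files need little memory
    path.unlink()
    return out
```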

A set of tools from the Silk project (http://silktools.sf.net) is then used to query and aggregate the Silk data. Because these queries are fairly complex to formulate, a web-based GUI wraps the tools and simplifies the querying process.
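To show the kind of translation such a GUI performs, here is a sketch that turns form-style fields into an `rwfilter | rwcut` pipeline. `rwfilter` and `rwcut` are Silk tools, and the switches used (`--saddress`, `--daddress`, `--dport`, `--proto`, `--pass`, `--fields`) are taken from their documented options, but `build_query` is hypothetical and does not describe how the PlanetFlow GUI is actually implemented.

```python
import shlex
from typing import List, Optional


def build_query(files: List[str], saddress: Optional[str] = None,
                daddress: Optional[str] = None, dport: Optional[int] = None,
                proto: Optional[int] = None) -> str:
    """Translate form fields into a shell pipeline: filter flows, then print them."""
    filt = ["rwfilter"]
    if saddress:
        filt.append(f"--saddress={saddress}")
    if daddress:
        filt.append(f"--daddress={daddress}")
    if dport is not None:
        filt.append(f"--dport={dport}")
    if proto is not None:
        filt.append(f"--proto={proto}")
    if len(filt) == 1:
        # rwfilter needs at least one partitioning switch to decide pass/fail.
        raise ValueError("specify at least one filter field")
    filt += ["--pass=stdout", *files]  # matching records go to rwcut
    cut = ["rwcut", "--fields=sIP,dIP,sPort,dPort,protocol"]
    return shlex.join(filt) + " | " + shlex.join(cut)
```

Quoting each argument with `shlex.join` keeps user-supplied form values from being interpreted by the shell.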

In the remainder of this document, we describe the use of the web-based query interface and the command-line query interface. We also describe the installation of all the components of PlanetFlow2 on a MyPLC installation, and the installation of the query tools on a desktop machine, which can be used to fetch and query data manually. Finally, we give an overview of the design decisions made in building PlanetFlow and of its implementation.


2008-09-23