[netsa-tools-discuss] rwflowpack

Mark Thomas mthomas at cert.org
Fri Oct 17 12:25:27 EDT 2014


Hello John.

It is certainly good to hear that you have managed to find relief
for some of your initial concerns.

Additional comments below.

On Thu, 16 Oct 2014 15:28:40 +0000, John Green wrote:

> On Wed, 2014-10-15 at 13:36 -0400, Mark Thomas wrote:
>> If there is some way you can use either SNMP interfaces or VLAN
>> ids to categorize the flow records, you will see noticeable
>> improvement in rwflowpack performance.
>
> I can for some of my probes which are on the network edge.  This
> has helped quite a bit as this includes some of the busiest.

Your issue with having lots of CIDR blocks has prompted me to look
into what would be involved in allowing rwflowpack to use IPsets
for categorization.  Adding this support does not appear to be too
difficult.  What I am still debating is whether it should be exposed
to users or happen behind the scenes.
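
To show why a set-like lookup helps when there are many CIDR blocks,
here is a minimal Python sketch.  It is not how SiLK implements
IPsets; the function names build_ranges and contains are
hypothetical, and the approach (collapse the blocks into sorted
ranges, then binary-search them) is only one assumed way to avoid
testing an address against every block in turn.

```python
import bisect
import ipaddress


def build_ranges(cidr_blocks):
    """Collapse CIDR blocks into sorted, non-overlapping integer ranges.

    Returns (starts, ranges) where ranges is a list of (first, last)
    address integers and starts holds just the first values.
    All blocks must be the same IP version.
    """
    nets = ipaddress.collapse_addresses(
        ipaddress.ip_network(block) for block in cidr_blocks
    )
    ranges = sorted(
        (int(net.network_address), int(net.broadcast_address))
        for net in nets
    )
    starts = [first for first, _ in ranges]
    return starts, ranges


def contains(starts, ranges, address):
    """Return True if address falls inside any of the ranges."""
    value = int(ipaddress.ip_address(address))
    # Find the last range that starts at or before the address.
    i = bisect.bisect_right(starts, value) - 1
    return i >= 0 and value <= ranges[i][1]


# Hypothetical blocks, for illustration only.
starts, ranges = build_ranges(
    ["10.0.0.0/8", "192.168.1.0/24", "192.168.0.0/24"]
)
print(contains(starts, ranges, "192.168.1.200"))
print(contains(starts, ranges, "192.168.2.1"))
```

Each lookup costs a binary search over the collapsed ranges rather
than one comparison per configured block.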

>> * For rwflowappend, use the --threads switch (added in SiLK 3.8.2).
>>   Without that switch, the default is a single thread.
>
> I've added that.

I hope this is helping.  (We ought to modify the default number of
threads to something other than one.)

>> * If your configuration has a collection process (that is, flowcap
>>   or rwflowpack) near the sensor that is sending files to a central
>>   repository...
>
> My initial approach was
>
> netflow -> box1 <- box2
> box1 running rwflowpack --incremental and rwsender --server
> box2 running rwreceiver --client, rwflowappend & pipeline & rwpollexec
>
> with defaults for polling intervals and flush timeout
>
> rwflowpack produced a large number of files which slowly backed up
> in the rwsender processing directory.
>
> I am now doing
> box1 running flowcap and rwsender --server
> box2 running rwreceiver --client, rwflowpack --incremental, rwflowappend
> & pipeline & rwpollexec
>
> and now there is very little queuing on box1, which makes me think
> it was number of files rather than volume which was the issue.  I
> will try tweaking the polling and timeout values further.

I am glad to hear this is working better for you.

Setting up a file transfer is expensive compared to transferring
the file's contents, so many small files cost more to send than a
few large ones holding the same data.  Supporting simultaneous file
transfers within rwsender/rwreceiver is another item on the very
long-term wish list.

Increasing the timeouts will produce fewer files, and that should
produce better throughput for you.
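
As a rough illustration of that trade-off, here is a small Python
model.  Every number in it (per-file setup cost, link speed, file
counts) is made up for the example and is not a measurement of
rwsender; the function name transfer_seconds is hypothetical.

```python
def transfer_seconds(total_bytes, n_files, setup_seconds, bytes_per_second):
    """Estimate the time to send total_bytes split across n_files.

    Assumes files are sent one at a time and that each file pays a
    fixed setup cost before its contents are transferred.
    """
    return n_files * setup_seconds + total_bytes / bytes_per_second


TOTAL_BYTES = 10**9        # assumed: 1 GB of packed flow data
SETUP_SECONDS = 0.05       # assumed: fixed cost per file
BYTES_PER_SECOND = 10**8   # assumed: 100 MB/s link

many_small = transfer_seconds(TOTAL_BYTES, 10_000, SETUP_SECONDS, BYTES_PER_SECOND)
few_large = transfer_seconds(TOTAL_BYTES, 100, SETUP_SECONDS, BYTES_PER_SECOND)

print(f"10000 files: {many_small:.1f} s")
print(f"  100 files: {few_large:.1f} s")
```

In this model the data volume is identical in both cases; only the
number of files changes, and the per-file setup term dominates when
the files are small.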

>> * For rwsender, try adjusting the --block-size.  There is a small
>>   amount of overhead for each block.
>
> I increased this to 65535 with little noticeable difference

I did not expect it to have much of an effect.  I mentioned the
block-size switch only to ensure I was not forgetting anything.

>> > I am trying to process around 500GB/day.
>> 
>> Is that 500GB of the raw traffic, or 500GB/day of NetFlow?
>
> 500GB of uncompressed netflow.

A colleague suggested I point you to the SiLK Provisioning
Spreadsheet (available from this FAQ entry
http://tools.netsa.cert.org/silk/faq.html#baud-to-rwsender ), which
is an Excel spreadsheet designed to help you determine the amount
of SiLK flow data generated for network links of various sizes.
Additional documentation is within the spreadsheet itself.

The spreadsheet is more useful when you know you have a 10Gbit link
and you want to determine the transfer and storage requirements.  In
your situation, you already know the amount of netflow data you
have.  However, you may still find it useful.

> Is there a reason why you don't use something like inotify rather
> than polling?  Portability?

Yes, portability and simplicity are the primary reasons.  The
developer on our team who uses Linux has had inotify support for
the directory polling interface on his radar for a long time.
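
For comparison, a portable polling check needs nothing beyond a
directory listing, which is much of its appeal.  The Python sketch
below is not SiLK's directory poller; the function name poll_once
and the file name in the comments are hypothetical, and a real
poller would also want to confirm that a file is completely written
before handing it on.

```python
import os


def poll_once(directory, seen):
    """Return names of regular files that appeared since the last poll.

    seen is a set of names already reported; it is updated in place.
    """
    with os.scandir(directory) as entries:
        current = {entry.name for entry in entries if entry.is_file()}
    new_names = sorted(current - seen)
    # Forget names that have vanished so a re-created file is
    # reported again on a later poll.
    seen.intersection_update(current)
    seen.update(new_names)
    return new_names


# Typical use (not run here): call poll_once() in a loop, sleeping
# for the polling interval between calls, for example
#
#     seen = set()
#     while True:
#         for name in poll_once("/path/to/incoming", seen):
#             handle(name)          # hypothetical handler
#         time.sleep(15)            # assumed interval in seconds
```

The cost of this simplicity is latency: a new file waits up to one
polling interval before it is noticed, which inotify would avoid on
Linux.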

> Thanks
> John

Once again let me say that I am glad you are seeing improved
performance, and I appreciate you contacting us to ask for our
input.

Best of luck to you.  Please let us know if we can be of additional
assistance or if you have additional comments or suggestions.

-Mark

