[netsa-tools-discuss] Beginner's Question

Mark Thomas mthomas at cert.org
Tue Sep 16 13:35:37 EDT 2014


Dr. Leune-

Thank you for your interest in SiLK and for your question.

The remainder of my reply is in-line.

On Tue, 16 Sep 2014 10:51:06 -0400, Kees Leune wrote:

> Greetings and Salutations,
>
> We are currently in the process of evaluating if SiLK is a viable
> alternative for nfdump/nfsen. While we are not unhappy about the
> latter, but I believe that considering alternatives is good
> practice.
>
> We configured one of our smaller routers to send v9 flowdata to a
> rwflowpack instance, which, at first glance, seems to work
> well. Data is being written to directories and a simple 'cat' of
> those files filtered through rwcut does indeed produce results
> that look reasonable.

This is good to hear.

Since you are a beginner with SiLK, please let me engage in a bit of
pedantry that I hope will save you difficulty later: There is no
requirement that rwcut get its input from the standard input;
instead, you can name the files on the rwcut command line.  You can
replace:

  cat out-S0_20140916.14 |rwcut

with

  rwcut out-S0_20140916.14

This becomes particularly important when processing multiple files.
If you use ordinary UNIX cat, rwcut will treat the header of the
second file as flow record data.  Consider the file "file.rw" that
contains a single flow record:

  $ rwcut --fields=1-7 /tmp/file.rw /tmp/file.rw
              sIP|            dIP|sPort|dPort|pro|packets|bytes|
  192.168.111.201|   172.24.2.123|29617|   53| 17|      1|   56|
  192.168.111.201|   172.24.2.123|29617|   53| 17|      1|   56|
  
  $ cat /tmp/file.rw /tmp/file.rw | rwcut --fields=1-7
              sIP|            dIP|sPort|dPort|pro|packets|bytes|
  192.168.111.201|   172.24.2.123|29617|   53| 17|      1|   56|
          0.0.0.0|        0.0.0.0|13312| 1280|  0|      0|    0|
  192.168.111.201|   172.24.2.123|29617|   53| 17|      1|   56|

When you want to join SiLK flow files into a single stream, use
rwcat which will properly handle the files' headers:

  $ rwcat /tmp/file.rw /tmp/file.rw | rwcut --fields=1-7
              sIP|            dIP|sPort|dPort|pro|packets|bytes|
  192.168.111.201|   172.24.2.123|29617|   53| 17|      1|   56|
  192.168.111.201|   172.24.2.123|29617|   53| 17|      1|   56|


> If I understand the Analyst's Handbook correctly, the preferred SiLK
> workflow is to use rwfilter to narrow down results to a working set
> and then use another tool, like rwcut, to visualize the results.

rwfilter is a tool designed to make analysis easier, though we
realize the number of switches on rwfilter can make it intimidating
to newcomers.  rwfilter has two primary purposes: to encapsulate
knowledge of the repository's directory structure, and to select
only the flow records that are of interest for your analysis.

> Unfortunately, rwfilter doesn't seem to select any data; I suspect a
> simple configuration error somewhere, but I have not been able to find
> anything.
>
> For example:
>
> kees at delaware:~/opt/data/out/2014/09/16$ cat out-S0_20140916.14 |rwcut
> |head -2
>                                     sIP|
>       dIP|sPort|dPort|pro|   packets|     bytes|   flags|
>      sTime| duration|                  eTime|sen|
>                             10.73.2.243|
> 72.73.207.40| 3074| 3074| 17|         6|       468|
> |2014/09/16T14:24:42.250|    0.750|2014/09/16T14:24:43.000| S0|
>
> in total, this rwcut produces 253 lines of output. However, I would
> have expected that rwfilter with a sIP of 10.0/8 would produce similar
> results. Unfortunately, it does not:
>
> kees at delaware:~/opt/data/out/2014/09/16$ rwfilter
> --start-date=2014/09/16  --saddress 10.0.0.0/8 --print-statistics
> Files     0.  Read          0.  Pass          0. Fail           0.
>
> kees at delaware:~/opt/data/out/2014/09/16$ rwfilter
> --start-date=2014/09/16  --saddress 10.0.0.0/8 --pass stdout |rwcut
>                                     sIP|
>       dIP|sPort|dPort|pro|   packets|     bytes|   flags|
>      sTime| duration|                  eTime|sen|
>
>
> Any clues as to where to look for this discrepancy, or, even better,
> how to solve it, would be greatly appreciated! The current
> configuration is set up to do all processing on a single host.

The first common problem with rwfilter finding data is that rwfilter
is not looking in the directory where you expect it to look.
Given your example commands above, it appears that your repository
is in

  $HOME/opt/data/out/2014/09/16

To tell rwfilter the location of your repository, you can do one of
the following:

 1. include the following command line switch to rwfilter:

      --data-rootdir=$HOME/opt/data

 2. set and export the SILK_DATA_ROOTDIR environment variable:

      export SILK_DATA_ROOTDIR=$HOME/opt/data

 3. reconfigure and recompile SiLK.  When you run ./configure,
    include the switch

      --enable-data-rootdir=$HOME/opt/data

The description of the "--data-rootdir" switch in the output of
"rwfilter --help" tells you where rwfilter is looking by default.
Given the voluminous output from rwfilter --help, use sed to print
only the section of interest:

  rwfilter --help | sed -n '/^--data/,/^--/p'

This shows me:

  --data-rootdir Req Arg. Root of directory tree containing packed data.
  	Currently '/data'. Def. $SILK_DATA_ROOTDIR or '/data'
  --site-config-file Req Arg. Location of the site configuration file.

saying that rwfilter is looking in "/data".
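
If you want only the directory itself, the sed command above can be
extended to pull out the value after "Currently".  This is a rough
sketch: the function name current_rootdir is my own invention, and it
assumes the help text keeps the "Currently '...'" wording shown above.

```shell
# Hypothetical helper: read "rwfilter --help" text on standard input and
# print the directory named after "Currently" in the --data-rootdir entry.
# Assumes the help text keeps the wording shown in this message.
current_rootdir() {
    sed -n '/^--data-rootdir/,/^--/p' |
        sed -n "s/.*Currently '\([^']*\)'.*/\1/p" |
        head -n 1
}

# Usage (requires SiLK to be installed):
#   rwfilter --help | current_rootdir
```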

A second common problem with rwfilter finding data is that, by
default, rwfilter only looks at the data entering your network.
That is, rwfilter behaves as if you had specified "--type=in,inweb".

The sample commands you provide show data in the "out" subdirectory
of your repository.  If there is no data in the "in" and "inweb"
subdirectories, rwfilter may be looking in the correct location and
just not finding any data.  Try adding either --type=out or
--type=all to the rwfilter command line.
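
Putting the repository location and the flow type together, a query
like yours might be wrapped as follows.  This is only a sketch: the
function name silk_stats is hypothetical, and the switches are the
ones already discussed in this message.

```shell
# Hypothetical wrapper: rerun the query from the original message with an
# explicit repository root and with every flow type selected.
# Arguments: repository root, date (YYYY/MM/DD), source CIDR block.
silk_stats() {
    rwfilter --data-rootdir="$1" --type=all --start-date="$2" \
        --saddress="$3" --print-statistics
}

# Usage (requires SiLK to be installed):
#   silk_stats "$HOME/opt/data" 2014/09/16 10.0.0.0/8
```

If the "Files" count in the statistics is still 0, rwfilter is not
finding any files for that root, type, and date.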

A third common problem, which is not occurring in your example, is
providing either no dates to rwfilter (which causes it to look at the
current date), or specifying dates for which rwfilter does not have
data.
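
One quick way to see whether the repository holds anything for a
given day is to list that day's directory yourself.  The sketch below
assumes the ROOT/TYPE/YYYY/MM/DD layout visible in your shell prompt;
a site's silk.conf may define a different layout, and the function
name list_day_files is hypothetical.

```shell
# Hypothetical helper: list the hourly files stored for one flow type and
# one day.  Assumes the ROOT/TYPE/YYYY/MM/DD layout seen in the original
# message; a site's silk.conf may define a different layout.
# Arguments: repository root, flow type, date (YYYY/MM/DD).
list_day_files() {
    dir="$1/$2/$3"
    if [ -d "$dir" ]; then
        ls "$dir"
    else
        echo "no directory $dir" >&2
        return 1
    fi
}

# Usage:
#   list_day_files "$HOME/opt/data" out 2014/09/16
```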

A debugging aid for rwfilter is the --print-missing-files switch
(which may be shortened to --print-missing).  That switch causes
rwfilter to print to the standard error the name of every file that
it expected to find in the repository but did not.

> Thank you for any assistance that could be offered.

I hope you find my response useful.

Thank you again for evaluating SiLK.

> --
> Dr. Kees Leune
> Adelphi University
> Information Security Officer

-Mark

