[netsa-tools-discuss] Segv in super_mediator 1.5.4

Ulrik Haugen ulrik.haugen at liu.se
Thu May 17 09:44:08 EDT 2018


Hello!

I'm trying to deploy super_mediator to distribute our IPFIX streams to
multiple destinations.

Unfortunately it crashes frequently with SIGSEGV. I have examined the
core dump from one of the crashes, and it seems that the err pointer
passed to mdForwardFlow is not always populated when the function
returns false, so the g_warning call at mediator_open.c:1114
dereferences a NULL *err:

# gdb -q /usr/bin/super_mediator core.21218
Reading symbols from /usr/bin/super_mediator...Reading symbols from /usr/lib/debug/usr/bin/super_mediator.debug...done.
done.
[New LWP 21218]
[New LWP 21219]
[New LWP 21220]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/bin/super_mediator -c /etc/super_mediator.conf'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000041aef2 in mdCollectFBuf (err=0x7ffc414e9800, collector=0x11e7870, ctx=0x7ffc414e9810) at mediator_open.c:1114
1114                g_warning("Error: %s", (*err)->message);
(gdb) bt
#0  0x000000000041aef2 in mdCollectFBuf (err=0x7ffc414e9800, collector=0x11e7870, ctx=0x7ffc414e9810) at mediator_open.c:1114
#1  mdCollectorWait (ctx=ctx@entry=0x7ffc414e9810, err=err@entry=0x7ffc414e9800) at mediator_open.c:1184
#2  0x000000000040536f in main (argc=1, argv=0x7ffc414e9a88) at mediator.c:544
(gdb) p err
$1 = (GError **) 0x7ffc414e9800
(gdb) p *err
$2 = (GError *) 0x0
(gdb) p ctx
$3 = (mdContext_t *) 0x7ffc414e9810
(gdb) p *ctx
$4 = {cfg = 0x650ca0 <md_config>, stats = 0x11e9080, err = 0x0}
(gdb) p *(ctx->cfg)
$5 = {flowsrc = 0x11e7870, flowexit = 0x11e9040, maps = 0x0, log = 0x0, mdspread = 0x0, collector_name = 0x11dcf50 "C1", log_cond = {__data = {__lock = 0, __futex = 1, __total_seq = 1, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x650d00 <md_config+96>, __nwaiters = 2, __broadcast_seq = 0}, 
    __size = "\000\000\000\000\001\000\000\000\001", '\000' <repeats 24 times>, "\re\000\000\000\000\000\002\000\000\000\000\000\000", __align = 4294967296}, log_mutex = {__data = {__lock = 1, __count = 0, __owner = 21218, __nusers = 2, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, 
        __next = 0x0}}, __size = "\001\000\000\000\000\000\000\000\342R\000\000\002", '\000' <repeats 26 times>, __align = 1}, no_stats = 0, ipfixSpreadTrans = 0, lockmode = 0, dns_base64_encode = 0, dns_print_lastseen = 0, shared_filter = 0, gen_tombstone = 0, tombstone_configured_id = 0, 
  tombstone_unique_id = 28115, udp_template_timeout = 600, ctime = 1526483954688, current_domain = 524288, usec_sleep = 0, num_listeners = 0 '\000', collector_id = 1 '\001'}
(gdb) p *(ctx->stats)
$6 = {recvd_flows = 1269957, dns = 0, recvd_filtered = 0, recvd_stats = 0, nonstd_flows = 0, uniflows = 0, files = 0, restarts = 0}
(gdb) p ipfixFullFlow
$7 = {flowStartMilliseconds = 1526483922688, flowEndMilliseconds = 1526483922688, octetTotalCount = 40, reverseOctetTotalCount = 0, octetDeltaCount = 40, reverseOctetDeltaCount = 0, packetTotalCount = 1, reversePacketTotalCount = 0, packetDeltaCount = 1, reversePacketDeltaCount = 0, 
  sourceIPv6Address = '\000' <repeats 15 times>, destinationIPv6Address = '\000' <repeats 15 times>, sourceIPv4Address = 96209668, destinationIPv4Address = 2196545307, sourceTransportPort = 43134, destinationTransportPort = 1559, flowAttributes = 0, reverseFlowAttributes = 0, protocolIdentifier = 6 '\006', 
  flowEndReason = 1 '\001', silkAppLabel = 0, reverseFlowDeltaMilliseconds = 0, tcpSequenceNumber = 0, reverseTcpSequenceNumber = 0, initialTCPFlags = 0 '\000', unionTCPFlags = 0 '\000', reverseInitialTCPFlags = 0 '\000', reverseUnionTCPFlags = 0 '\000', vlanId = 10, reverseVlanId = 0, 
  ingressInterface = 523, egressInterface = 527, ipClassOfService = 0 '\000', reverseIpClassOfService = 0 '\000', mplsTopLabelStackSection = "\000\000", mplsLabelStackSection2 = "\000\000", mplsLabelStackSection3 = "\000\000", paddingOctets = 0 '\000', observationDomainId = 524288, 
  yafFlowKeyHash = 797897288, nDPIL7Protocol = 0, nDPIL7SubProtocol = 0, subTemplateMultiList = {firstEntry = 0x0, numElements = 0, semantic = 0 '\000'}}
(gdb) p tid
$8 = 512
(gdb) quit
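
To illustrate the suspected failure mode, here is a minimal sketch of a
guard around the error message. It is not the super_mediator code:
DemoError is a hypothetical stand-in for GLib's GError (only the message
field is modelled), and demo_forward_flow and demo_error_message are
made-up names. The assumption is that a callee can return false while
leaving *err NULL, as the core dump suggests.

```c
#include <stddef.h>

/* Hypothetical stand-in for GLib's GError; only `message` is modelled. */
typedef struct {
    const char *message;
} DemoError;

/* Hypothetical callee: reports failure (0) but never sets *err,
 * which is the situation the core dump above points to. */
static int demo_forward_flow(DemoError **err)
{
    (void)err; /* deliberately left untouched */
    return 0;
}

/* Return a printable message even when the callee left *err NULL,
 * instead of dereferencing (*err)->message unconditionally. */
static const char *demo_error_message(DemoError **err)
{
    if (err != NULL && *err != NULL && (*err)->message != NULL) {
        return (*err)->message;
    }
    return "(no error details provided)";
}
```

A caller would then log demo_error_message(&err) after a false return
rather than (*err)->message. A guard like this only hides the symptom;
the underlying fix would presumably be for the failing function to set
the error on every false return.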


The relevant parts of the configuration look like this:

# grep -v '^#' /etc/super_mediator.conf
COLLECTOR UDP
   PORT 9997
COLLECTOR END

EXPORTER UDP
   HOST "localhost"
   PORT 4739
EXPORTER END

EXPORTER UDP
   HOST "RE.DA.CT.ED"
   PORT 9997
EXPORTER END

LOGLEVEL DEBUG

LOG local3

PIDFILE "/srv/netsa/var/super_mediator.pid"


We are also getting lots of log messages like the one below. I don't
think they are related to the crash; more likely the NIC or some network
equipment along the way is not keeping up:

local3/warning super_mediator[30953]: IPFIX Message out of sequence (in domain 00080000, expected 5eaa66d2, got 5eaa66d6)

I will try to address this with hardware replacements, so this is just
meant as an FYI.
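
For reference, the size of the gap in such a message can be read off the
two hexadecimal values. The sketch below assumes the standard IPFIX
semantics (RFC 7011), where the sequence number counts Data Records per
observation domain modulo 2^32; ipfix_sequence_gap is a hypothetical
helper name, not part of super_mediator.

```c
#include <stdint.h>

/* Distance from the expected to the received sequence number.
 * The cast keeps the result modulo 2^32, so a counter that has
 * wrapped around still yields the right gap. */
static uint32_t ipfix_sequence_gap(uint32_t expected, uint32_t got)
{
    return (uint32_t)(got - expected);
}
```

For the message above, ipfix_sequence_gap(0x5eaa66d2, 0x5eaa66d6) gives
4, which under that assumption means four records were lost or arrived
out of order. The domain 00080000 in the message is 524288 in decimal,
the same value as observationDomainId in the gdb output.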


Please let me know if there is any other information I can supply to
help you fix this!


Best regards
/Ulrik Haugen
Linköping university incident response team

