Sunday, June 11, 2017

Troubleshooting Packet Drops in SolarFlare Onload 10G PCI Card


If you still see lots of packet drops in your Onload-accelerated application even after working through the troubleshooting steps we discussed earlier, the next things to investigate are your application design, OS context switching, and how the application reads and consumes the data packets. The potential reasons for packet drops include:
a) another task is interrupting the threads that read the traffic, or
b) you have run out of packet buffers because the socket receive buffers aren’t being emptied, which happens when read()/recv() isn’t called often enough.

It is a good idea to monitor how many context switches the application and its threads experience. If the count suddenly increases at the moment the problem occurs, that indicates another thread is competing for processing time on that core.

You can check this using the following command:

# cat /proc/<pid>/status | grep ctxt_switches
voluntary_ctxt_switches:        58
nonvoluntary_ctxt_switches:     1
Replace “<pid>” with the process ID of your application. The “voluntary” count is the number of times the application blocked and another thread was allowed to run. The “nonvoluntary” count is the number of times the kernel stopped the thread from running so that something else could run in preference.
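
Since a sudden increase matters more than the absolute numbers, it can help to sample the counters at a fixed interval and print the change between samples. The following is only a minimal sketch: the function names ctxt_counts and watch_ctxt, the one-second default interval and the output format are my own choices for illustration, not part of Onload or any standard tool.

```shell
#!/bin/sh
# Print "<voluntary> <nonvoluntary>" parsed from a /proc status file.
ctxt_counts() {
    awk '/^voluntary_ctxt_switches:/    {v = $2}
         /^nonvoluntary_ctxt_switches:/ {n = $2}
         END {print v + 0, n + 0}' "$1"
}

# Sample a status file and print the change in both counters per interval.
# Usage: watch_ctxt <status-file> [interval-seconds] [number-of-samples]
watch_ctxt() {
    file=$1
    interval=${2:-1}
    samples=${3:-5}
    counts=$(ctxt_counts "$file") || return 1
    prev_v=${counts% *}
    prev_n=${counts#* }
    i=0
    while [ "$i" -lt "$samples" ]; do
        sleep "$interval"
        # Stop if the file can no longer be read (for example, the process exited).
        counts=$(ctxt_counts "$file") || return 1
        v=${counts% *}
        n=${counts#* }
        echo "voluntary +$((v - prev_v))  nonvoluntary +$((n - prev_n))"
        prev_v=$v
        prev_n=$n
        i=$((i + 1))
    done
}

# When run as a script with a PID argument, watch that process.
if [ "$#" -ge 1 ]; then
    watch_ctxt "/proc/$1/status" "${2:-1}" "${3:-5}"
fi
```

If this is saved as, say, ctxt_watch.sh (a hypothetical file name), running "sh ctxt_watch.sh <pid> 1 60" would print one line per second for a minute. A jump in the nonvoluntary delta that lines up with the drops would point to something pre-empting the application.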

You can check the individual threads for the process by monitoring the ‘status’ files in the underlying ‘task’ directory:

# grep ctxt_switches /proc/<pid>/task/*/status
/proc/<pid>/task/<task-n1>/status:voluntary_ctxt_switches:    630
/proc/<pid>/task/<task-n1>/status:nonvoluntary_ctxt_switches: 32
/proc/<pid>/task/<task-n2>/status:voluntary_ctxt_switches:    2
/proc/<pid>/task/<task-n2>/status:nonvoluntary_ctxt_switches: 0
/proc/<pid>/task/<task-n3>/status:voluntary_ctxt_switches:    1
/proc/<pid>/task/<task-n3>/status:nonvoluntary_ctxt_switches: 0
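
With many threads the raw grep output gets hard to scan, so one option is to print one line per thread and sort by the nonvoluntary count, which puts the threads that are pre-empted most often at the top. This is a rough sketch under my own assumptions: the function name thread_ctxt and the column layout are illustrative, and only the first word of each thread name is shown.

```shell
#!/bin/sh
# Print "<tid> <name> <voluntary> <nonvoluntary>" for every thread found
# under a task directory, sorted by nonvoluntary switches (highest first).
# Usage: thread_ctxt /proc/<pid>/task
thread_ctxt() {
    task_dir=$1
    for status in "$task_dir"/*/status; do
        # Skip threads that exited between the glob and the read.
        [ -r "$status" ] || continue
        tid=$(basename "$(dirname "$status")")
        awk -v tid="$tid" '
            /^Name:/                       {name = $2}
            /^voluntary_ctxt_switches:/    {v = $2}
            /^nonvoluntary_ctxt_switches:/ {n = $2}
            END {print tid, name, v + 0, n + 0}' "$status"
    done | sort -k4,4nr
}

# When run as a script with a PID argument, report on that process.
if [ "$#" -ge 1 ]; then
    thread_ctxt "/proc/$1/task"
fi
```

The thread at the top of the list is the first candidate to check. If it is the thread that reads the traffic, something else is most likely being scheduled on its core.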
