NAME
polling - network device driver polling support
SYNOPSIS
options IFPOLL_ENABLE
DESCRIPTION
Network device polling (polling for brevity) refers to a technique that lets the operating system periodically poll network devices, instead of relying on the network devices to generate interrupts when they need attention. This might seem inefficient and counterintuitive, but when done properly, polling gives the operating system more control over when and how to handle network devices, with a number of advantages in terms of system responsiveness and performance.
In particular, polling reduces the context switch overhead incurred when servicing interrupts, and gives more control over the scheduling of a CPU between various tasks (user processes, software interrupts, device handling), which ultimately reduces the chances of livelock in the system.
Principles of Operation
In the normal, interrupt-based mode, network devices generate an interrupt whenever they need attention. This in turn causes a context switch and the execution of an interrupt handler which performs whatever processing is needed by the network device. The duration of the interrupt handler is potentially unbounded unless the network device driver has been programmed with real-time concerns in mind (which is generally not the case for DragonFly drivers). Furthermore, under heavy traffic load, the system might be persistently processing interrupts without being able to complete other work, either in the kernel or in userland.
With polling enabled, device interrupts are disabled and the network devices are polled on clock interrupts instead. This removes the context switch overhead. Furthermore, the operating system can accurately control how much time it spends handling network device events, and can thus prevent livelock by reserving some amount of CPU for other tasks.
Enabling polling also changes the way software network interrupts are scheduled, so there is never a risk of livelock caused by packets not being processed to completion.
Enabling polling
Polling is turned on and off with the ifconfig(8) command. An interface does not have to be “up” in order to turn on its polling feature.
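As an illustration of toggling polling per interface, the following is a minimal sketch. The helper name set_polling and the interface name em0 are hypothetical, and the sketch assumes that ifconfig(8) accepts the parameters polling and -polling; check ifconfig(8) on the installed release before relying on it.

```sh
#!/bin/sh
# Hypothetical helper: turn polling on or off for one interface.
# Assumes ifconfig(8) takes "polling" / "-polling" as parameters.
set_polling() {
    # $1 = interface name (for example em0), $2 = on | off
    case "$2" in
        on)  ifconfig "$1" polling ;;
        off) ifconfig "$1" -polling ;;
        *)   echo "usage: set_polling interface on|off" >&2
             return 64 ;;
    esac
}
```

With these assumptions, "set_polling em0 on" switches em0 to polling mode and "set_polling em0 off" returns it to interrupt mode.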
Loader Tunables
The following tunables can be set from loader.conf(5) (X is the CPU number):
- net.ifpoll.burst_max
- Default value for net.ifpoll.X.rx.burst_max sysctl nodes.
- net.ifpoll.each_burst
- Default value for net.ifpoll.X.rx.each_burst sysctl nodes.
- net.ifpoll.user_frac
- Default value for net.ifpoll.X.rx.user_frac sysctl nodes.
- net.ifpoll.pollhz
- Default value for net.ifpoll.X.pollhz sysctl nodes.
- net.ifpoll.status_frac
- Default value for net.ifpoll.0.status_frac sysctl node.
- net.ifpoll.tx_frac
- Default value for net.ifpoll.X.tx_frac sysctl nodes.
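A loader.conf(5) fragment using these tunables might look like the sketch below. The function name ifpoll_tunables is hypothetical and the values are illustrative assumptions, not recommendations; burst_max and each_burst simply repeat the documented defaults, and pollhz is an arbitrary value inside the documented range.

```sh
#!/bin/sh
# Hypothetical helper: print a loader.conf(5) fragment on stdout so it can
# be reviewed before being added to the loader configuration.
# Values are illustrative only.
ifpoll_tunables() {
    cat <<'EOF'
net.ifpoll.pollhz="2000"
net.ifpoll.burst_max="250"
net.ifpoll.each_burst="50"
EOF
}
```

After review, the output could be appended to /boot/loader.conf; the new defaults would then apply from the next boot.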
MIB Variables
The operation of polling is controlled by the following per-CPU sysctl(8) MIB variables (X is the CPU number):
- net.ifpoll.X.pollhz
- The polling frequency, whose range is 1 to 30000. Default is 6000.
- net.ifpoll.X.rx.user_frac
- When polling is enabled, and provided that there is some work to do, up to this percentage of the CPU cycles is reserved for userland tasks, the remaining fraction being available for polling processing. Default is 50.
- net.ifpoll.X.rx.burst
- Maximum number of packets grabbed from each network interface in each timer tick. This number is dynamically adjusted by the kernel, according to the programmed user_frac, burst_max, CPU speed, and system load.
- net.ifpoll.X.rx.each_burst
- The burst above is split into smaller chunks of this number of packets, going round-robin among all interfaces registered for polling. This prevents a large burst from a single interface from saturating the IP interrupt queue. Default is 50.
- net.ifpoll.X.rx.burst_max
- Upper bound for net.ifpoll.X.rx.burst. Note that when polling is enabled, each interface can receive at most (pollhz * burst_max) packets per second unless there are spare CPU cycles available for polling in the idle loop. This number should be tuned to match the expected load. Default is 250, which is adequate for a 1000Mbit network and pollhz=6000.
- net.ifpoll.X.rx.handlers
- How many active network devices have registered for packet reception polling.
- net.ifpoll.X.tx_frac
- Controls how often (every tx_frac / pollhz seconds) the transmission queue is checked for packet transmission done events. Increasing this value reduces the time spent checking for packet transmission done events and thus reduces bus load, but it also increases the chance that the transmission queue becomes saturated. Default is 1.
- net.ifpoll.X.tx.handlers
- How many active network devices have registered for packet transmission polling.
- net.ifpoll.0.status_frac
- Controls how often (every status_frac / pollhz seconds) the status registers of the network device are checked for error conditions and the like. Increasing this value reduces the load on the bus, but also delays the error detection. Default is 120.
- net.ifpoll.0.status.handlers
- How many active network devices have registered for status polling.
- net.ifpoll.X.rx.short_ticks
- net.ifpoll.X.rx.lost_polls
- net.ifpoll.X.rx.pending_polls
- net.ifpoll.X.rx.residual_burst
- net.ifpoll.X.rx.phase
- net.ifpoll.X.rx.suspect
- net.ifpoll.X.rx.stalled
- net.ifpoll.X.tx.short_ticks
- net.ifpoll.X.tx.lost_polls
- net.ifpoll.X.tx.pending_polls
- net.ifpoll.X.tx.residual_burst
- net.ifpoll.X.tx.phase
- net.ifpoll.X.tx.suspect
- net.ifpoll.X.tx.stalled
- Debugging variables.
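The relationships between pollhz, burst_max, tx_frac and status_frac described above can be checked with a little arithmetic. The following sketch uses only the documented default values; the shell variable names are chosen for illustration, and the intervals are rounded down to whole microseconds because shell arithmetic is integer only.

```sh
#!/bin/sh
# Documented defaults for the per-CPU polling variables.
pollhz=6000        # net.ifpoll.X.pollhz
burst_max=250      # net.ifpoll.X.rx.burst_max
tx_frac=1          # net.ifpoll.X.tx_frac
status_frac=120    # net.ifpoll.0.status_frac

# Upper bound on packets received per interface per second
# (ignoring extra polling done in the idle loop).
max_pps=$((pollhz * burst_max))

# Intervals of frac / pollhz seconds, expressed in microseconds.
tx_interval_us=$((tx_frac * 1000000 / pollhz))
status_interval_us=$((status_frac * 1000000 / pollhz))

echo "max packets/s per interface: $max_pps"           # 1500000
echo "tx done check interval:      $tx_interval_us us" # 166
echo "status check interval:       $status_interval_us us" # 20000
```

With the defaults, the receive bound is 1,500,000 packets per second, slightly above the roughly 1,488,000 minimum-size frames per second that gigabit Ethernet can carry, which is consistent with the note that 250 is adequate for a 1000Mbit network at pollhz=6000.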
SUPPORTED DEVICES
Network device polling requires explicit modifications to the network device drivers. As of this writing, the bce(4), bge(4), bnx(4), dc(4), em(4), emx(4), fwe(4), fxp(4), igb(4), ix(4), jme(4), mxge(4), nfe(4), nge(4), re(4), rl(4), sis(4), stge(4), vge(4), vr(4), and xl(4) devices are supported, with others in the works. The bce(4), bnx(4), emx(4), igb(4), ix(4), jme(4), and mxge(4) devices support polling based on multiple reception queues. The bce(4), bnx(4), certain types of emx(4), igb(4), and ix(4) support polling based on multiple transmission queues. The modifications are rather straightforward, consisting of extracting the inner part of the interrupt service routine and writing a callback function, *_npoll(), which is invoked to probe the network device for events and process them. (See the conditionally compiled sections of the network device drivers mentioned above for more details.)
In order to reduce the latency in processing packets, it is advisable to set the sysctl(8) variable net.ifpoll.X.pollhz to at least 1000.
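One way to apply this advice at runtime is sketched below. The helper name set_pollhz is hypothetical, and the CPU numbers are passed explicitly because this page does not say how to enumerate the CPUs that have a net.ifpoll.X node; the range check uses the documented limits of 1 to 30000.

```sh
#!/bin/sh
# Hypothetical helper: set net.ifpoll.X.pollhz on the listed CPUs.
# Usage: set_pollhz frequency cpu [cpu ...]
set_pollhz() {
    hz=$1
    shift
    # The documented range for pollhz is 1 to 30000.
    if [ "$hz" -lt 1 ] || [ "$hz" -gt 30000 ]; then
        echo "pollhz must be between 1 and 30000" >&2
        return 1
    fi
    for cpu in "$@"; do
        sysctl "net.ifpoll.${cpu}.pollhz=${hz}"
    done
}
```

For example, "set_pollhz 2000 0 1" would request a polling frequency of 2000 on CPUs 0 and 1, assuming both nodes exist on the system.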
HISTORY
Network device polling first appeared in FreeBSD 4.6. It was rewritten in DragonFly 1.3.
AUTHORS
The network device polling code was rewritten by Matt Dillon based on the original code by Luigi Rizzo <luigi@iet.unipi.it>. Sepherosa Ziehau made the polling frequency settable at runtime, added per-CPU polling, and added multiple reception and transmission queue polling support.