NAME
BUFFERIO, biodone, biowait, getiobuf, putiobuf, nestiobuf_setup, nestiobuf_done - block I/O buffer transfers
SYNOPSIS
#include <sys/buf.h>

void
biodone(struct buf *bp);

int
biowait(struct buf *bp);

struct buf *
getiobuf(struct vnode *vp, bool waitok);

void
putiobuf(struct buf *bp);

void
nestiobuf_setup(struct buf *mbp, struct buf *bp, int offset, size_t size);

void
nestiobuf_done(struct buf *mbp, int donebytes, int error);
DESCRIPTION
The BUFFERIO subsystem manages block I/O buffer transfers, described by the struct buf structure, which serves multiple purposes between users in BUFFERIO, users in buffercache(9), and users in block device drivers to execute transfers to physical disks.
BLOCK DEVICE USERS
Users of BUFFERIO wishing to submit a buffer for block I/O transfer must obtain a struct buf, e.g. via getiobuf(), fill its parameters, and submit it to a block device with bdev_strategy(9), usually via VOP_STRATEGY(9).
The parameters to an I/O transfer described by bp are specified by the following struct buf fields:
- bp->b_flags
  Flags specifying the type of transfer.
  B_READ
    Transfer is read from device. If not set, transfer is write to device.
  B_ASYNC
    Asynchronous I/O. Caller must not provide bp->b_iodone and must not call biowait(bp).
  A write may be indicated explicitly with B_WRITE, which is zero.
- bp->b_data
  Pointer to kernel virtual address of source/target for transfer.
- bp->b_bcount
  Nonnegative number of bytes requested for transfer.
- bp->b_blkno
  Block number at which to do transfer.
- bp->b_iodone
  I/O completion callback. If a callback is provided, B_ASYNC must not be set in bp->b_flags.
Additionally, if the I/O transfer is a write associated with a vnode(9) vp, then before the user submits it to a block device, the user must increment vp->v_numoutput. The user must not acquire vp's vnode lock between incrementing vp->v_numoutput and submitting bp to a block device; doing so will likely cause deadlock with the syncer.
Block I/O transfer completion may be notified by the bp->b_iodone callback, by signalling biowait() waiters, or not at all in the B_ASYNC case.
- If the user sets the bp->b_iodone callback to a non-NULL function pointer, it will be called in soft interrupt context when the I/O transfer is complete. The user may not call biowait(bp) in this case.
- If B_ASYNC is set, then the I/O transfer is asynchronous and the user will not be notified when it is completed. The user may not call biowait(bp) in this case.
- Otherwise, if bp->b_iodone is NULL and B_ASYNC is not specified, the user may wait for the I/O transfer to complete with biowait(bp).
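The three notification cases can be summarized as a small decision function. The following is a user-space sketch only, not kernel code: struct notify_buf, NOTIFY_B_ASYNC and the notify_* names are hypothetical stand-ins for the corresponding struct buf fields and flag, and the flag value is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for B_ASYNC; the value is not the kernel's. */
#define NOTIFY_B_ASYNC	0x1

/* Hypothetical reduced model of the struct buf fields that matter here. */
struct notify_buf {
	int b_flags;
	void (*b_iodone)(struct notify_buf *);
};

enum notify_mode {
	NOTIFY_CALLBACK,	/* b_iodone is called on completion */
	NOTIFY_NONE,		/* B_ASYNC: caller is never notified */
	NOTIFY_BIOWAIT,		/* caller may wait with biowait() */
	NOTIFY_INVALID		/* b_iodone combined with B_ASYNC */
};

/* Classify how completion of bp will be reported to its submitter. */
enum notify_mode
notify_mode(const struct notify_buf *bp)
{
	int async = (bp->b_flags & NOTIFY_B_ASYNC) != 0;

	if (bp->b_iodone != NULL)
		return async ? NOTIFY_INVALID : NOTIFY_CALLBACK;
	return async ? NOTIFY_NONE : NOTIFY_BIOWAIT;
}

/* biowait() is permitted only in the plain synchronous case. */
int
notify_may_biowait(const struct notify_buf *bp)
{
	return notify_mode(bp) == NOTIFY_BIOWAIT;
}

/* Placeholder callback used only to exercise the model. */
void
notify_noop_iodone(struct notify_buf *bp)
{
	(void)bp;
}
```

In this model, only a buffer with neither a callback nor the asynchronous flag yields NOTIFY_BIOWAIT; every other combination rules out biowait().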
Once an I/O transfer has completed, its struct buf may be reused, but the user must first clear the BO_DONE flag of bp->b_oflags before reusing it.
NESTED I/O TRANSFERS
Sometimes an I/O transfer from a single buffer in memory cannot go to a single location on a block device: it must be split up into smaller transfers for each segment of the memory buffer.
After initializing the b_flags, b_data, and b_bcount parameters of an I/O transfer for the buffer, called the master buffer, the user can issue smaller transfers for segments of the buffer using nestiobuf_setup().
When nested I/O transfers complete, in any order, they debit from the amount of work left to be done in the master buffer. If any segments of the buffer were skipped, the user can report this with nestiobuf_done() to debit the skipped part of the work.
The master buffer's I/O transfer is completed once all nested buffers' I/O transfers have completed and, if any segments were skipped, nestiobuf_done() has been called to account for them.
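The debit accounting described above can be modeled in user space as follows. This is a simplified sketch, not the kernel implementation: struct model_master and the model_master_* functions are hypothetical, and keeping only the first reported error is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced model of a master buffer. */
struct model_master {
	size_t b_bcount;	/* total bytes in the master transfer */
	size_t b_resid;		/* bytes not yet accounted for */
	int b_error;		/* first error reported, 0 if none */
	int done;		/* nonzero once the master has completed */
};

void
model_master_init(struct model_master *mbp, size_t bcount)
{
	mbp->b_bcount = bcount;
	mbp->b_resid = bcount;
	mbp->b_error = 0;
	mbp->done = 0;
}

/*
 * Debit donebytes of work from the master, whether the bytes were
 * transferred by a nested buffer or skipped by the caller.  The master
 * completes when no work is left.
 */
void
model_master_debit(struct model_master *mbp, size_t donebytes, int error)
{
	assert(!mbp->done);
	assert(donebytes <= mbp->b_resid);

	if (error != 0 && mbp->b_error == 0)
		mbp->b_error = error;	/* model assumption: keep first error */
	mbp->b_resid -= donebytes;
	if (mbp->b_resid == 0)
		mbp->done = 1;
}
```

The order of the debits does not matter in this model; the master completes only when the debits sum to b_bcount, which is why skipped segments must also be reported.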
For writes associated with a vnode vp, nestiobuf_setup() accounts for vp->v_numoutput, so the caller is not allowed to acquire vp's vnode lock before submitting the nested I/O transfer to a block device. However, the caller is responsible for accounting the master buffer in vp->v_numoutput. This must be done very carefully because after incrementing vp->v_numoutput, the caller is not allowed to acquire vp's vnode lock before either calling nestiobuf_done() or submitting the last nested I/O transfer to a block device.
For example:
	struct buf *mbp, *bp;
	struct vnode *devvp;	/* used after the loop for the last segment */
	size_t skipped = 0;
	unsigned i;
	int error = 0;

	mbp = getiobuf(vp, true);
	mbp->b_data = data;
	mbp->b_resid = mbp->b_bcount = datalen;
	mbp->b_flags = B_WRITE;

	KASSERT(0 < nsegs);
	KASSERT(datalen == nsegs*segsz);
	for (i = 0; i < nsegs; i++) {
		daddr_t blkno;

		vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
		error = VOP_BMAP(vp, i*segsz, &devvp, &blkno, NULL);
		VOP_UNLOCK(vp);
		if (error == 0 && blkno == -1)
			error = EIO;
		if (error) {
			/* Give up early, don't try to handle holes. */
			skipped += datalen - i*segsz;
			break;
		}

		bp = getiobuf(vp, true);
		/* Master buffer first, then nested buffer, as in SYNOPSIS. */
		nestiobuf_setup(mbp, bp, i*segsz, segsz);
		bp->b_blkno = blkno;
		if (i == nsegs - 1)	/* Last segment. */
			break;
		VOP_STRATEGY(devvp, bp);
	}

	/*
	 * Account v_numoutput for master write.
	 * (Must not vn_lock before last VOP_STRATEGY!)
	 */
	mutex_enter(&vp->v_interlock);
	vp->v_numoutput++;
	mutex_exit(&vp->v_interlock);

	if (skipped)
		nestiobuf_done(mbp, skipped, error);
	else
		VOP_STRATEGY(devvp, bp);
BLOCK DEVICE DRIVERS
Block device drivers implement a 'strategy' method, in the d_strategy member of struct bdevsw (driver(9)), to queue a buffer for disk I/O. The inputs to the strategy method are:
- bp->b_flags
  Flags specifying the type of transfer.
  B_READ
    Transfer is read from device. If not set, transfer is write to device.
- bp->b_data
  Pointer to kernel virtual address of source/target for transfer.
- bp->b_bcount
  Nonnegative number of bytes requested for transfer.
- bp->b_blkno
  Block number at which to do transfer, relative to partition start.
If the strategy method uses bufq(9), it must additionally initialize the following fields before queueing bp with bufq_put(9):
- bp->b_rawblkno
  Block number relative to volume start.
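Since b_blkno is relative to the partition start and b_rawblkno is relative to the volume start, the two differ by the partition's starting block. A minimal user-space sketch of that arithmetic follows; model_daddr_t, model_rawblkno() and part_start are hypothetical names, and the sketch assumes both values are expressed in the same block units.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's daddr_t. */
typedef int64_t model_daddr_t;

/*
 * Convert a partition-relative block number to a volume-relative one.
 * part_start is the (hypothetical) first block of the partition,
 * counted from the start of the volume in the same units as blkno.
 */
model_daddr_t
model_rawblkno(model_daddr_t blkno, model_daddr_t part_start)
{
	assert(blkno >= 0);
	assert(part_start >= 0);
	return part_start + blkno;
}
```

A real driver would obtain the partition offset from its own partition table or disklabel data and may also need to convert between block sizes.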
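When the I/O transfer is complete, whether it succeeded or failed, the strategy method must set bp->b_error to an error code (zero on success) and bp->b_resid to the number of bytes remaining to transfer, and then call biodone(bp).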
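The completion duties of a driver, as stated for biodone() in FUNCTIONS (record the error code and the residual byte count, then notify), can be modeled in user space. This is a sketch under assumptions, not driver code: struct model_xfer, model_biodone() and model_strategy_complete() are hypothetical stand-ins for struct buf, biodone() and a driver's completion path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced model of the struct buf completion fields. */
struct model_xfer {
	size_t b_bcount;	/* bytes requested */
	size_t b_resid;		/* bytes remaining to transfer */
	int b_error;		/* error code, 0 on success */
	int done;		/* set by model_biodone() */
};

/* Stand-in for biodone(): only records that completion was signalled. */
void
model_biodone(struct model_xfer *bp)
{
	bp->done = 1;
}

/*
 * Complete a transfer of which `transferred' bytes actually moved.
 * Both result fields are filled in before completion is signalled.
 */
void
model_strategy_complete(struct model_xfer *bp, size_t transferred, int error)
{
	assert(transferred <= bp->b_bcount);

	bp->b_error = error;
	bp->b_resid = bp->b_bcount - transferred;
	model_biodone(bp);
}
```

The ordering is the point of the sketch: a waiter woken by completion reads b_error and b_resid, so both must be final before biodone() is called.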
FUNCTIONS
biodone(bp)
  Notify that the I/O transfer described by bp has completed.
  To be called by a block device driver. Caller must first set bp->b_error to an error code and bp->b_resid to the number of bytes remaining to transfer.
biowait(bp)
  Wait for the synchronous I/O transfer described by bp to complete. Returns the value of bp->b_error.
  To be called by a user requesting the I/O transfer.
  May not be called if bp has a callback or is asynchronous, that is, if bp->b_iodone is set, or if B_ASYNC is set in bp->b_flags.
getiobuf(vp, waitok)
  Allocate a struct buf for an I/O transfer. If vp is non-NULL, the transfer is associated with it. If waitok is false, returns NULL if none can be allocated immediately.
  The resulting struct buf pointer must eventually be passed to putiobuf() to release it. Do not use brelse(9).
  The buffer may not be used for an asynchronous I/O transfer, because there is no way to know when it is completed and may be safely passed to putiobuf(). Asynchronous I/O transfers are allowed only for buffers in the buffercache(9).
  May sleep if waitok is true.
putiobuf(bp)
  Free bp, which must have been allocated by getiobuf(). Either bp must never have been submitted to a block device, or the I/O transfer must have completed.
CODE REFERENCES
The BUFFERIO subsystem is implemented in sys/kern/vfs_bio.c.
SEE ALSO
BUGS
The BUFFERIO abstraction provides no way to cancel an I/O transfer once it has been submitted to a block device.
The BUFFERIO abstraction provides no way to do I/O transfers with non-kernel pages, e.g. directly to buffers in userland without copying into the kernel first.
The struct buf type is all mixed up with the buffercache(9).
The BUFFERIO abstraction is a totally idiotic API design.
The v_numoutput accounting required of BUFFERIO callers is asinine.