To implement a multiple-block allocator based on the existing block allocation bitmap, Mingming Cao first changed ext3_new_block() to accept a new argument specifying how many contiguous blocks the function should attempt to allocate, on a best-effort basis. The function now allocates the first block in the existing way, and then, in the same call, continues allocating adjacent physical blocks, up to the requested number, for as long as they are available.
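The best-effort behavior described above can be illustrated with a minimal sketch over an in-memory bitmap. This is not the ext3 code: the names `block_bitmap`, `bitmap_test()`, `bitmap_set()`, `new_blocks_sketch()`, and the 64-block group size are all hypothetical, and the first block is found with a plain linear scan rather than with ext3's actual search.

```c
#include <assert.h>
#include <stdint.h>

#define GROUP_BLOCKS 64u /* hypothetical group size, chosen for the sketch */

/* Hypothetical in-memory bitmap: bit i set means block i is in use. */
struct block_bitmap {
    uint8_t bits[GROUP_BLOCKS / 8];
};

static int bitmap_test(const struct block_bitmap *bm, unsigned blk)
{
    return (bm->bits[blk / 8] >> (blk % 8)) & 1u;
}

static void bitmap_set(struct block_bitmap *bm, unsigned blk)
{
    bm->bits[blk / 8] |= (uint8_t)(1u << (blk % 8));
}

/*
 * Try to allocate *count contiguous blocks, searching from `goal`.
 * Returns the first allocated block, or -1 if no block is free.
 * On return, *count holds the number of blocks actually allocated,
 * which may be smaller than the number requested.
 */
static long new_blocks_sketch(struct block_bitmap *bm, unsigned goal,
                              unsigned *count)
{
    unsigned first, got;

    assert(bm != 0 && count != 0 && *count > 0);

    /* Step 1: find the first free block (simple linear scan). */
    for (first = goal; first < GROUP_BLOCKS; first++)
        if (!bitmap_test(bm, first))
            break;
    if (first >= GROUP_BLOCKS) {
        *count = 0;
        return -1;
    }
    bitmap_set(bm, first);
    got = 1;

    /* Step 2: keep taking adjacent blocks while they are free. */
    while (got < *count && first + got < GROUP_BLOCKS &&
           !bitmap_test(bm, first + got)) {
        bitmap_set(bm, first + got);
        got++;
    }

    *count = got;
    return (long)first;
}
```

A caller that asks for eight blocks but receives three simply issues another request for the remainder, so a short allocation is not an error.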
The modified ext3_new_block() function was then used to implement ext3's get_blocks() method, the standardized filesystem interface to translate a file offset and a length to a set of on-disk blocks. It does this by starting at the first file offset and translating it into a logical block number, and then taking that logical block number and mapping it to a physical block number. If the logical block has already been mapped, then it will continue mapping the next logical block until the requisite number of physical blocks have been returned, or an unallocated block is found.
If some blocks need to be allocated, ext3_get_blocks() first looks ahead to see how many adjacent blocks are needed, and then passes this allocation request to ext3_new_blocks(), which searches for the requested free blocks, marks them as used, and returns them to ext3_get_blocks(). Next, ext3_get_blocks() updates the inode's direct blocks, or a single indirect block, to point at the allocated blocks.
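The look-ahead step can be sketched as a scan over an array of block pointers, such as an inode's direct blocks or the contents of one indirect block. The sketch assumes that a pointer value of 0 means "not yet mapped"; the `map_result` type and the `lookup_run()` function are hypothetical names introduced for illustration and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of scanning a run of logical blocks. */
struct map_result {
    int need_alloc;  /* nonzero if the run consists of unmapped blocks */
    unsigned count;  /* length of the run, at most max_blocks */
};

/*
 * Starting at logical index `lblk`, count how many consecutive entries
 * share the state of the first one: either all already mapped, or all
 * unmapped and therefore in need of allocation. The scan stops at the
 * first entry in the other state, at the end of the pointer array, or
 * after `max_blocks` entries.
 */
static struct map_result lookup_run(const uint32_t *ptrs, size_t nptrs,
                                    size_t lblk, unsigned max_blocks)
{
    struct map_result r = {0, 0};

    assert(ptrs != NULL);
    if (lblk >= nptrs || max_blocks == 0)
        return r;

    r.need_alloc = (ptrs[lblk] == 0);
    while (r.count < max_blocks && lblk + r.count < nptrs &&
           (ptrs[lblk + r.count] == 0) == r.need_alloc)
        r.count++;

    return r;
}
```

When `need_alloc` is set, `count` is the size of the request that would be handed to the block allocator; otherwise it is the number of existing mappings that can be returned without allocating anything.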
Currently, this ext3_get_blocks() implementation does not allocate blocks across an indirect block boundary. There are two reasons for this. First, the JBD journaling layer requires the filesystem to reserve the maximum number of blocks that will require journaling when a new transaction handle is requested via ext3_journal_start(). If we were to allow a multiple-block allocation request to span an indirect block boundary, it would be difficult to predict how many metadata blocks may get dirtied and thus require journaling. Second, it would be difficult to place any newly allocated indirect blocks so that they are appropriately interleaved with the data blocks.
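One way to picture the boundary restriction is as a clamp applied to the requested length. The sketch below relies on the standard ext3 layout of 12 direct block pointers in the inode and 4-byte block numbers, so an indirect block holds (block size / 4) pointers. The helper names `blocks_to_boundary()` and `clamp_request()` are hypothetical here, and the code only illustrates the arithmetic, not the patch itself.

```c
#include <assert.h>
#include <stdint.h>

#define NDIR_BLOCKS 12u /* direct block pointers in an ext3 inode */

/*
 * Number of logical blocks, starting at `lblk`, whose pointers live in
 * the same place: the inode's direct array, or a single indirect block.
 * `ptrs_per_block` is the block size divided by 4.
 */
static uint32_t blocks_to_boundary(uint32_t lblk, uint32_t ptrs_per_block)
{
    assert(ptrs_per_block > 0);

    if (lblk < NDIR_BLOCKS)
        return NDIR_BLOCKS - lblk;

    /* Past the direct blocks, every run of ptrs_per_block logical
     * blocks is addressed by one (leaf) indirect block. */
    return ptrs_per_block - ((lblk - NDIR_BLOCKS) % ptrs_per_block);
}

/* Trim a multi-block request so that it never crosses a boundary. */
static uint32_t clamp_request(uint32_t lblk, uint32_t wanted,
                              uint32_t ptrs_per_block)
{
    uint32_t room = blocks_to_boundary(lblk, ptrs_per_block);

    return wanted < room ? wanted : room;
}
```

With 4 KB blocks (1024 pointers per indirect block), a request for 8 blocks starting at logical block 10 would be trimmed to 2, because logical block 12 is the first one addressed through the indirect block.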
Currently, only the Direct I/O code path uses the get_blocks() interface; the buffered write path goes through mpage_writepages(), which calls mpage_writepage(), which in turn calls get_block() one block at a time. Since only a few workloads (mainly databases) use Direct I/O, Suparna Bhattacharya has written a patch to change mpage_writepages() to use get_blocks() instead. This change should be generically helpful for any filesystem that implements an efficient get_blocks() function.
Draft patches have already been posted to the ext2-devel mailing list. As of this writing, we are trying to integrate Mingming's ext3_get_blocks() patch, Suparna Bhattacharya's mpage_writepage() patch and Badari Pulavarty's generic delayed allocation patch (discussed in Section 4.2) in order to evaluate these three patches together using benchmarks.