Now I think the main problem is having the filesystem block (and do IO) in
inode reclaim. The problem is that this doesn't get accounted well and
penalizes a random allocator with a big latency spike caused by work
generated from elsewhere.
I think the best idea would be to avoid this. By design if possible, or
by deferring the hard work to an asynchronous context. If the latter,
then the fs would probably want to throttle creation of new work with
queue size of the deferred work, but let's not get into those details.
Anyway, the other obvious thing we looked at is the iprune_mutex which is
causing the cascading blocking. We could turn this into an rwsem to
improve concurrency. It is unreasonable to totally ban all potentially
slow or blocking operations in inode reclaim, so I think this is a cheap
way to get a small improvement.
This doesn't solve the whole problem of course. The process doing inode
reclaim will still take the latency hit, and concurrent processes may end
up contending on filesystem locks. So fs developers should keep these
problems in mind.
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Jan Kara <jack@ucw.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>