The cause of the multi-filesystem NFS export problem

November 4, 2009

There is a famous irritation with managing NFS filesystems which boils down to the fact that NFS clients have to know about your filesystem boundaries. It goes like this: suppose that /home and /home/group1 are separate filesystems and you NFS export both of them. What you would like is for clients to NFS mount /home and automatically get /home/group1 too, because this lets you transparently add /home/group2 next month. However, this doesn't work (although some systems will try hard to fake it if you tell them to).

(This issue is a lot more pertinent these days in light of things like ZFS, where filesystems are cheap objects.)

Although it superficially looks like the NFS re-export problem, the problem here isn't telling NFS filehandles for the different real filesystems apart. Provided that all of the filesystems can be NFS exported normally, your NFS server can just give out the same filehandles it would if the client had explicitly mounted the filesystems separately (the filehandle is opaque to the client, after all).
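To make the filehandle point concrete, here is a minimal Python sketch of a server that records the filesystem identity inside the handle it gives out. The 20-byte layout (filesystem id, inode number, generation) and the names ServerFileId, make_filehandle and parse_filehandle are hypothetical; real NFS servers use their own formats. The point is only that two files with the same inode number on different server filesystems still get different handles, and the client never has to decode them.

```python
import struct
from typing import NamedTuple

# Hypothetical handle layout: 8-byte filesystem id, 8-byte inode number,
# 4-byte generation number, big-endian with no padding (20 bytes total).
_FH_FORMAT = ">QQI"


class ServerFileId(NamedTuple):
    fsid: int        # which server filesystem the file lives on
    ino: int         # inode number within that filesystem
    generation: int  # distinguishes reuses of the same inode number


def make_filehandle(fid: ServerFileId) -> bytes:
    """Server side: pack the file's identity, including its filesystem, into a handle."""
    return struct.pack(_FH_FORMAT, fid.fsid, fid.ino, fid.generation)


def parse_filehandle(fh: bytes) -> ServerFileId:
    """Server side only: the client treats the handle as an opaque byte string."""
    return ServerFileId(*struct.unpack(_FH_FORMAT, fh))


# Inode 1234 on /home (fsid 1) and inode 1234 on /home/group1 (fsid 2).
fh_home = make_filehandle(ServerFileId(fsid=1, ino=1234, generation=7))
fh_group1 = make_filehandle(ServerFileId(fsid=2, ino=1234, generation=7))
print(fh_home != fh_group1)  # True
```

Because the handles differ, the server can route each request to the right filesystem even if the client only mounted /home.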

The real problem is what common NFS clients expect about inode numbers; specifically, they expect inode numbers to be unique within the client's view of the filesystem, and from the client's view it only mounted one filesystem. Meanwhile, on the server there are multiple filesystems and their inode numbers are almost certain to overlap. The result is explosions in some programs on the client under some circumstances, as programs that rely on inode numbers see duplicate inode numbers for files that are not actually hardlinks to each other.
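Here is a minimal Python sketch of the check that goes wrong. Programs that care about hardlinks commonly identify a file by its (st_dev, st_ino) pair from stat(); real tools usually also consult the link count, which this simplified version skips. The client_view data is made up to mimic one client mount (so, in this scenario, one st_dev) that spans two server filesystems whose inode numbers collide; the helper names are hypothetical.

```python
import os
import stat
from collections import defaultdict


def apparent_hardlinks(entries):
    """Group (path, st_dev, st_ino) entries by their (st_dev, st_ino) pair.

    Any group with more than one path is what a program keying on these
    two numbers would treat as several names for the same file.
    """
    groups = defaultdict(list)
    for path, dev, ino in entries:
        groups[(dev, ino)].append(path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


def stat_entries(root):
    """Yield (path, st_dev, st_ino) for regular files under root, as this
    machine sees them (on an NFS client, that is the client's view)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode):
                yield path, st.st_dev, st.st_ino


# Hypothetical stat() results for a single client mount of /home (st_dev 42)
# that really covers two server filesystems with overlapping inode numbers.
client_view = [
    ("/home/fred/notes", 42, 1234),       # server filesystem /home
    ("/home/group1/jim/todo", 42, 1234),  # server filesystem /home/group1
    ("/home/fred/other", 42, 5678),
]
print(apparent_hardlinks(client_view))
# {(42, 1234): ['/home/fred/notes', '/home/group1/jim/todo']}
```

The two unrelated files end up in one group, which is exactly the false hardlink the client program trips over.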

(The client kernels generally don't care; the inode numbers that user programs see are unrelated to the NFS filehandles that the kernel uses.)

Technically this is a client-side problem, but I doubt that any NFS client implementation actually gets it right. (And it is very hard to get right, since the client has to somehow make up inode numbers that are unique and, ideally, persistent.)
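To see why this is hard, here is a Python sketch of two obvious ways a client could manufacture inode numbers from a server filesystem id plus the server's inode number (its fileid). Both are illustrations with hypothetical names, and the filesystem id is assumed to fit in one 64-bit integer. A lookup table gives unique numbers, but they depend on the order in which files are seen, so they change across remounts; hashing is stable, but can only make collisions unlikely, not impossible.

```python
import hashlib
import struct


class SequentialInodeMap:
    """Unique for as long as this object lives, but the numbers depend on
    lookup order, so they are not persistent across remounts or reboots.
    The table also grows with every file the client ever sees."""

    def __init__(self):
        self._table = {}

    def client_ino(self, fsid, fileid):
        key = (fsid, fileid)
        if key not in self._table:
            # Hand out the next number, starting at 1.
            self._table[key] = len(self._table) + 1
        return self._table[key]


def hashed_ino(fsid, fileid):
    """Deterministic, so persistent across remounts, but only probably
    unique: two different (fsid, fileid) pairs can hash to the same value.
    Assumes both values fit in 64 bits."""
    data = struct.pack(">QQ", fsid, fileid)
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big")


m = SequentialInodeMap()
print(m.client_ino(1, 1234), m.client_ino(2, 1234), m.client_ino(1, 1234))  # 1 2 1
```

Neither approach gets both properties at once, which is the trade-off the parenthetical above is pointing at.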

(This is the kind of thing that I write down in part so that I can remember the logic the next time I wonder about it.)

Sidebar: the more subtle failures

Okay, that's not quite all that goes wrong if the server lets NFS clients transparently cross filesystem boundaries. There are various operations that don't work across server filesystem boundaries even though on the client they look like they should. For example, if /home on the client is all one single NFS mount, a program is rationally entitled to believe that it can hardlink /home/fred/a to /home/group1/jim/b. In practice this is going to fail with an error, because on the server that's a cross-filesystem hardlink.
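On the server, a hardlink across filesystems fails with EXDEV, and NFSv3 has a matching NFS3ERR_XDEV error, so I would expect the client program to see EXDEV as well (an assumption about how a given client maps it). A minimal Python sketch of how a careful program might cope is to try the hardlink and fall back to a copy on EXDEV. link_or_copy is a hypothetical helper, and the fallback changes the semantics: the result is an independent copy, not a second name for the same file.

```python
import errno
import os
import shutil


def link_or_copy(src, dst):
    """Try to hardlink src to dst; if the OS reports a cross-filesystem
    link (EXDEV), fall back to copying. Returns True if a hardlink was made."""
    try:
        os.link(src, dst)
        return True
    except OSError as e:
        # Any other failure (permissions, dst exists, ...) is a real error.
        if e.errno != errno.EXDEV:
            raise
    # Cross-filesystem: copy contents and metadata instead.
    shutil.copy2(src, dst)
    return False
```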

Written on 04 November 2009.
Last modified: Wed Nov 4 00:48:11 2009