17.3. From Filesystems to Filesystem Layers

For a concrete example of filesystem layering, consider the case where you mount a remote filesystem on your computer using the NFS (Network File System) protocol. Unfortunately, in your case, the user and group identifiers on the remote system don't match those used on your computer. However, by interposing a umapfs filesystem over the actual NFS implementation, we can specify the correct user and group mappings through external files. Figure 17-3 illustrates how some operating system kernel function calls first get routed through the bypass function of umapfs, umap_bypass, before continuing their journey to the corresponding NFS client functions.

In contrast to the null_bypass function, the implementation of umap_bypass actually does some work before making a call to the underlying layer. The vop_generic_args structure passed as its argument contains a description of the actual arguments for each vnode operation:

	/*
	 * A generic structure.
	 * This can be used by bypass routines to identify generic arguments.
	 */
	struct vop_generic_args {
	       struct vnodeop_desc *a_desc;
	       /* other random data follows, presumably */
	};

	/*
	 * This structure describes the vnode operation taking place.
	 */
	struct vnodeop_desc {
	       char    *vdesc_name;            /* a readable name for debugging */
	       int      vdesc_flags;           /* VDESC_* flags */
	       vop_bypass_t    *vdesc_call;    /* Function to call */

	       /*
	        * These ops are used by bypass routines to map and locate arguments.
	        * Creds and procs are not needed in bypass routines, but sometimes
	        * they are useful to (for example) transport layers.
	        * Nameidata is useful because it has a cred in it.
	        */
	       int     *vdesc_vp_offsets;     /* list ended by VDESC_NO_OFFSET */
	       int      vdesc_vpp_offset;     /* return vpp location */
	       int      vdesc_cred_offset;    /* cred location, if any */
	       int      vdesc_thread_offset;  /* thread location, if any */
	       int      vdesc_componentname_offset; /* if any */
	};

For instance, the vnodeop_desc structure for the arguments passed to the vop_read operation is the following:

	struct vnodeop_desc vop_read_desc = {
	        "vop_read",
	        0,
	        (vop_bypass_t *)VOP_READ_AP,
	        vop_read_vp_offsets,
	        VDESC_NO_OFFSET,
	        VOPARG_OFFSETOF(struct vop_read_args,a_cred),
	        VDESC_NO_OFFSET,
	        VDESC_NO_OFFSET,
	};
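The sixth initializer in this descriptor stores an offset rather than a pointer. As a rough user-space illustration of how such an offset can be computed once and later turned back into the address of a field, here is a minimal sketch. Every demo_ and DEMO_ name is hypothetical and stands in for the kernel's own types and macros; the sketch assumes only the standard offsetof macro and is not the kernel's actual definition of VOPARG_OFFSETOF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's macros; assumptions of this sketch. */
#define DEMO_NO_OFFSET (-1)
#define DEMO_OFFSETOF(s_type, field) ((int)offsetof(s_type, field))
#define DEMO_OFFSETTO(p_type, offset, base) \
        ((p_type)((char *)(base) + (offset)))

struct demo_cred {
        unsigned cr_uid;
};

/* A cut-down descriptor: a name plus the location of the credential. */
struct demo_desc {
        const char *name;
        int cred_offset;        /* DEMO_NO_OFFSET if there is no credential */
};

/* Every argument structure starts with a pointer to its descriptor. */
struct demo_generic_args {
        const struct demo_desc *a_desc;
};

struct demo_read_args {
        const struct demo_desc *a_desc;
        int a_ioflag;
        struct demo_cred *a_cred;
};

struct demo_sync_args {
        const struct demo_desc *a_desc;        /* carries no credential */
};

static const struct demo_desc demo_read_desc = {
        "demo_read",
        DEMO_OFFSETOF(struct demo_read_args, a_cred),
};

static const struct demo_desc demo_sync_desc = {
        "demo_sync",
        DEMO_NO_OFFSET,
};

/*
 * Return the address of the credential pointer inside any argument
 * structure, or NULL when its descriptor records no credential.
 * The function never needs to know the concrete structure type.
 */
static struct demo_cred **
demo_cred_slot(void *ap)
{
        const struct demo_desc *descp = ((struct demo_generic_args *)ap)->a_desc;

        assert(descp != NULL);
        if (descp->cred_offset == DEMO_NO_OFFSET)
                return NULL;
        return DEMO_OFFSETTO(struct demo_cred **, descp->cred_offset, ap);
}
```

Called with a pointer to a demo_read_args structure, demo_cred_slot yields the address of its a_cred member; called with a demo_sync_args structure, it yields NULL. The caller handles both through the same untyped pointer.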

Importantly, apart from the name of the function (used for debugging purposes) and the underlying function to call (VOP_READ_AP), the structure contains in its vdesc_cred_offset field the location of the user credential data field (a_cred) within the read call's arguments. By using this field, umap_bypass can map the credentials of any vnode operation with the following code:

	if (descp->vdesc_cred_offset != VDESC_NO_OFFSET) {
	        credpp = VOPARG_OFFSETTO(struct ucred**,
	            descp->vdesc_cred_offset, ap);
	        /* Save old values */
	        savecredp = (*credpp);
	        if (savecredp != NOCRED)
	               (*credpp) = crdup(savecredp);
	        credp = *credpp;
	        /* Map all ids in the credential structure. */
	        umap_mapids(vp1->v_mount, credp);
	}

What we have here is a case of data describing the format of other data: a redirection in terms of data abstraction. This metadata allows the credential mapping code to manipulate the arguments of arbitrary system calls.
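To make the idea concrete outside the kernel, the following is a minimal user-space sketch of a bypass routine driven purely by such metadata. It duplicates the credential, maps its user ID, calls the lower layer, and then restores the caller's original credential. All sketch_ names, the mapping table, and the restore step are assumptions made for illustration. In particular, passing unmapped IDs through unchanged is a simplification of this sketch, not a description of how umapfs treats them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_NO_OFFSET (-1)

struct sketch_cred {
        unsigned cr_uid;
};

/* Metadata: where the credential lives and which lower function to call. */
struct sketch_desc {
        const char *name;
        int cred_offset;
        int (*lower)(void *ap);
};

struct sketch_write_args {
        const struct sketch_desc *a_desc;        /* always the first member */
        struct sketch_cred *a_cred;
        unsigned seen_uid;        /* set by the lower layer, for inspection */
};

/* Hypothetical mapping table: { local uid, remote uid }. */
static const unsigned sketch_uid_map[][2] = { { 1001, 501 }, { 1002, 502 } };

static unsigned
sketch_map_uid(unsigned uid)
{
        size_t i;

        for (i = 0; i < sizeof sketch_uid_map / sizeof sketch_uid_map[0]; i++)
                if (sketch_uid_map[i][0] == uid)
                        return sketch_uid_map[i][1];
        return uid;        /* unmapped ids pass through (sketch simplification) */
}

/* Stand-in for the lower layer: records the uid it was called with. */
static int
sketch_lower_write(void *ap)
{
        struct sketch_write_args *wa = ap;

        wa->seen_uid = wa->a_cred->cr_uid;
        return 0;
}

static const struct sketch_desc sketch_write_desc = {
        "sketch_write",
        (int)offsetof(struct sketch_write_args, a_cred),
        sketch_lower_write,
};

/* Generic bypass: works for any argument structure with a descriptor. */
static int
sketch_bypass(void *ap)
{
        const struct sketch_desc *descp = *(const struct sketch_desc *const *)ap;
        struct sketch_cred **credpp = NULL, *savecredp = NULL, *copy = NULL;
        int error;

        assert(descp != NULL);
        if (descp->cred_offset != SKETCH_NO_OFFSET) {
                credpp = (struct sketch_cred **)((char *)ap + descp->cred_offset);
                savecredp = *credpp;                /* save old value */
                if (savecredp != NULL) {
                        copy = malloc(sizeof *copy);
                        if (copy == NULL)
                                return -1;
                        *copy = *savecredp;        /* never modify the caller's copy */
                        copy->cr_uid = sketch_map_uid(copy->cr_uid);
                        *credpp = copy;
                }
        }
        error = descp->lower(ap);                /* call the lower layer */
        if (copy != NULL) {
                *credpp = savecredp;                /* restore the original */
                free(copy);
        }
        return error;
}
```

With a credential whose cr_uid is 1001, the lower layer sees 501, while the caller's structure still points at its original, unmodified credential once sketch_bypass returns. Supporting another operation requires only a new descriptor; sketch_bypass itself stays unchanged.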