MinorFs2 Design
From MinorFs Wiki
Where the old MinorFs was designed to exist next to the regular $HOME and $TEMP facilities, MinorFs2 is explicitly designed so that it can (in theory) fully replace them. The following gives a high-level idea of how the different MinorFs2 components work together. Please note that each of the MinorFs2 processes described below runs under its own exclusive uid.
temp_fs
The temp_fs filesystem would ideally be mounted under /tmp. When a process accesses temp_fs, it gets its own private /tmp/ directory. It can put its own temporary files under /tmp/, create directories under /tmp/, etc., but it cannot access, or even see, any files that other processes placed under /tmp/. Each process effectively gets its own private /tmp. temp_fs is a simple overlay file-system that maps /tmp/ to /mnt/minorfs/$TMPFSROOTCAP/$PID/. When temp_fs is accessed by a process it has not seen before, it creates an empty directory for it.
When a process with a uid that belongs to the die_notify subsystem accesses /tmp/, temp_fs behaves differently. Instead of presenting the process with a private directory, it presents a directory listing all active $PIDs it knows about, each process id represented as an empty, read-only directory. When die_notify removes such a pseudo directory, this acts as a notification that the process has died and that its directory can be (recursively) deleted.
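The per-process mapping can be sketched as a small, purely in-memory Python model. It is a sketch under stated assumptions, not MinorFs2 code: the class name `TempFsView`, its methods, and the set of die_notify uids are hypothetical, and the real temp_fs would create and delete directories on the backing store instead of just tracking PIDs in a set.

```python
import posixpath


class TempFsView:
    """Hypothetical in-memory model of temp_fs path mapping.

    root_cap_dir stands in for /mnt/minorfs/$TMPFSROOTCAP; the die_notify
    uid set is an assumption, not MinorFs2 configuration.
    """

    def __init__(self, root_cap_dir, die_notify_uids):
        self.root = root_cap_dir
        self.die_notify_uids = set(die_notify_uids)
        self.known_pids = set()

    def resolve(self, pid, uid, path):
        """Map an absolute /tmp path as seen by (pid, uid) onto the backing dir."""
        if uid in self.die_notify_uids:
            raise PermissionError("die_notify gets the PID listing, not a private /tmp")
        rel = posixpath.relpath(posixpath.normpath(path), "/tmp")
        if rel == ".." or rel.startswith("../"):
            raise ValueError("path lies outside /tmp")
        # First contact with this PID: the real temp_fs would mkdir here.
        self.known_pids.add(pid)
        base = posixpath.join(self.root, str(pid))
        return base if rel == "." else posixpath.join(base, rel)

    def pid_listing(self):
        """What die_notify sees: one (empty, read-only) entry per known PID."""
        return sorted(self.known_pids)

    def process_died(self, pid):
        """Called when die_notify removes the pseudo directory for pid."""
        # The real temp_fs would also recursively delete root/pid here.
        self.known_pids.discard(pid)
```

For example, `TempFsView("/mnt/minorfs/TMPFSROOTCAP", {999}).resolve(4242, 1001, "/tmp/build/out.o")` returns `/mnt/minorfs/TMPFSROOTCAP/4242/build/out.o`, while another PID asking for the same /tmp path lands in a different backing directory.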
home_fs
The home_fs filesystem works much like the temp_fs file-system. It is also a simple overlay file-system and would ideally be mounted under /home/. It maps /home/$USERNAME/ to /mnt/minorfs/$PERSISTENCENODECAP/. When a directory listing of /home/ is requested, it returns a single sub-directory named after the user that owns the current process's uid. When home_fs is accessed by a process it has not seen before, it asks the id_service for the $PERSISTENCEID belonging to that process's $PID. This $PERSISTENCEID is then used together with $HOMEFSROOTCAP to determine (and cache) the $PERSISTENCENODECAP for this $PID.
When a process with a uid that belongs to the die_notify subsystem accesses /home/, home_fs behaves differently. Instead of presenting the process with a private directory, it presents a directory listing all active $PIDs it knows about as empty pseudo directories. When die_notify removes such a pseudo directory, this acts as a notification that the process has died and that the cache entry holding this $PID's $PERSISTENCEID should be cleared.
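A minimal sketch of the home_fs lookup-and-cache flow follows. The names are hypothetical: `id_lookup` stands in for the id_service D-Bus call, and deriving $PERSISTENCENODECAP as an HMAC-SHA256 of the persistence-id keyed with $HOMEFSROOTCAP is an illustrative assumption; the text above only says the two are combined.

```python
import hashlib
import hmac


class HomeFsResolver:
    """Hypothetical sketch: PID -> persistence node capability, with a cache.

    id_lookup(pid, uid) stands in for the id_service D-Bus call. The
    HMAC-SHA256 derivation is an assumed stand-in, not the MinorFs2 scheme.
    """

    def __init__(self, home_root_cap, id_lookup):
        self.home_root_cap = home_root_cap.encode()
        self.id_lookup = id_lookup
        self.cache = {}  # pid -> persistence node capability

    def node_cap(self, pid, uid):
        if pid not in self.cache:
            # Only the first access by a PID reaches the id_service.
            persistence_id = self.id_lookup(pid, uid)
            mac = hmac.new(self.home_root_cap, persistence_id.encode(), hashlib.sha256)
            self.cache[pid] = mac.hexdigest()
        return self.cache[pid]

    def resolve(self, pid, uid, username, path):
        """Map /home/<username>/... onto /mnt/minorfs/<node cap>/..."""
        prefix = "/home/" + username
        if path != prefix and not path.startswith(prefix + "/"):
            raise PermissionError("only the caller's own home directory is visible")
        return "/mnt/minorfs/" + self.node_cap(pid, uid) + path[len(prefix):]

    def process_died(self, pid):
        """die_notify removed the pseudo dir for pid: drop its cache entry."""
        self.cache.pop(pid, None)
```

Two processes that the id_service maps to the same persistence-id end up with the same node capability, which is what lets a program see the same home directory across runs.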
die_notify
The die_notify monitoring process constantly watches its special /tmp/ view, its special /home/ view, and /proc. When it sees a process id that exists under /tmp/ or /home/ but not under /proc/, it deletes the corresponding pseudo directory under /tmp/ and/or /home/. This deletion signals to temp_fs/home_fs that the given process has died.
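One die_notify pass might look like the following sketch. It assumes the special views are plain directories whose entries are numeric PID names; `reap_dead` and `watch` are hypothetical names, and the one-second polling interval is an arbitrary choice.

```python
import os
import time


def reap_dead(view_dirs, proc_dir="/proc"):
    """One die_notify pass over its special views (e.g. /tmp and /home).

    Removes every numeric (PID) entry with no counterpart in proc_dir and
    returns the (view, pid) pairs that were removed.
    """
    # List the views before /proc: a PID present in a view was alive before
    # this point, so if it is missing from the later /proc listing it died.
    listed = [(view, name) for view in view_dirs
              for name in os.listdir(view) if name.isdigit()]
    alive = {name for name in os.listdir(proc_dir) if name.isdigit()}
    removed = []
    for view, name in listed:
        if name not in alive:
            # Removing the pseudo directory is the "process died" signal.
            os.rmdir(os.path.join(view, name))
            removed.append((view, name))
    return removed


def watch(view_dirs, interval=1.0):
    """Poll forever; the interval is an assumption, not a MinorFs2 setting."""
    while True:
        reap_dead(view_dirs)
        time.sleep(interval)
```

The ordering matters: reading /proc first would let a process that starts (and touches /tmp) between the two listings be reported as dead.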
id_service
The id_service D-Bus service is a rather complex look-up service that converts $PID/$UID combinations into a unique $PERSISTENCEID. It does this by combining information from a set of configuration files with information retrieved from the /proc/ pseudo file-system. The configuration allows persistence-ids to be assigned at different granularity levels. A persistence-id can convey both hierarchical and attenuation information, which home_fs uses accordingly. Multiple levels of granularity may exist on the same machine and even for the same user; a single process, however, is by definition always confined to a single granularity level. A typical persistence-id may look something like:
- rw+1001/usr.bin.python/xc-opt.foo.bar.py-d3fea8360adcebc233b8a8f3b4c21301443f30d5
- rw+1001/usr.bin.foo/mce-baaac3d0958f883f9efd29954423a4ccc9432891
- ro+1001
The topmost one could belong to the Python script /opt/foo/bar.py running under the Python interpreter /usr/bin/python. The second persistence-id could belong to the program /usr/bin/foo when it has been started with a particular set of environment variables and command-line options. The last example may belong to a special user-level admin tool such as home_du.
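The examples above suggest a `mode+uid/level1/level2` shape. The parser below is a hypothetical sketch based only on those three examples: it treats the part before the first `/` as access mode and uid, keeps the remaining levels opaque (including prefixes such as `xc-` and `mce-`, whose meaning is not described here), and shows how the hierarchy lets a uid-level id such as `ro+1001` cover the more specific ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersistenceId:
    """Hypothetical reading of a persistence-id; the format is inferred."""
    mode: str      # "rw" or "ro" (attenuation information)
    uid: int
    levels: tuple  # granularity levels below the uid, outermost first

    @classmethod
    def parse(cls, text):
        head, *levels = text.split("/")
        mode, sep, uid = head.partition("+")
        if mode not in ("rw", "ro") or not sep or not uid.isdigit():
            raise ValueError("unrecognised persistence-id: " + text)
        return cls(mode, int(uid), tuple(levels))

    def contains(self, other):
        """True if other sits at or below this id in the hierarchy."""
        return self.uid == other.uid and other.levels[:len(self.levels)] == self.levels

    def writable(self):
        return self.mode == "rw"
```

Under this reading, `PersistenceId.parse("ro+1001")` has no levels, so it contains every persistence-id of uid 1001 but is not writable, which matches the role described for home_du.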
cap_fs
Both temp_fs and home_fs run on top of cap_fs, which in turn runs on top of an underlying file-system. Access to anything in cap_fs starts with an access node that is reached using a sparse capability (a relatively long, unguessable, random-looking string). temp_fs and home_fs each have their own sparse capability for accessing cap_fs. For each (read/write accessible) file or directory, cap_fs allows two distinct sparse capabilities to be extracted using the extended attribute API:
- A read/write access-node sparse-capability.
- A read-only sparse capability.
This facility means that while the cap_fs subtrees made accessible by home_fs are initially private, cap_fs allows the owner (or holder) of a cap_fs subtree to decompose, attenuate and delegate subtrees to other processes (using communication channels such as D-Bus to pass capabilities between processes). A receiver of an access-node capability can make the delegation permanent by creating a symbolic link to it.
The owner of a read-only sparse capability for a directory can only retrieve read-only sparse capabilities (never read/write ones) for any of its sub-nodes.
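The attenuation rule can be illustrated with a toy, in-memory Python model. This is not how cap_fs stores capabilities: the class `CapFsModel`, the use of `secrets.token_urlsafe` to generate caps, and the per-node pair of random caps are assumptions chosen to show that a read/write cap yields read/write caps for sub-nodes, while a read-only cap only ever yields read-only ones.

```python
import secrets


class CapFsModel:
    """Toy model of cap_fs sparse capabilities (hypothetical, not cap_fs)."""

    def __init__(self):
        self.caps = {}   # sparse cap -> (node_id, "rw" | "ro")
        self.nodes = {}  # node_id -> {"rw": cap, "ro": cap, "children": {}}
        self.root_rw = self._new_node()

    def _new_node(self):
        node_id = len(self.nodes)
        rw, ro = secrets.token_urlsafe(32), secrets.token_urlsafe(32)
        self.nodes[node_id] = {"rw": rw, "ro": ro, "children": {}}
        self.caps[rw] = (node_id, "rw")
        self.caps[ro] = (node_id, "ro")
        return rw

    def mkdir(self, cap, name):
        node_id, mode = self.caps[cap]
        if mode != "rw":
            raise PermissionError("read-only capability")
        child_rw = self._new_node()
        self.nodes[node_id]["children"][name] = self.caps[child_rw][0]
        return child_rw

    def child_cap(self, cap, name):
        """Look up a sub-node: rw in gives rw out, ro in gives ro out."""
        node_id, mode = self.caps[cap]
        child = self.nodes[self.nodes[node_id]["children"][name]]
        return child[mode]

    def attenuate(self, cap):
        """Extract the read-only cap for the node a cap refers to."""
        node_id, _ = self.caps[cap]
        return self.nodes[node_id]["ro"]

    def mode(self, cap):
        return self.caps[cap][1]
```

Handing `fs.attenuate(fs.root_rw)` to another process delegates the whole tree read-only; handing it `fs.child_cap(fs.root_rw, "docs")` delegates only the `docs` subtree, read/write.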
It is important to note that a delegated sub tree may itself again be decomposed, attenuated and delegated by the receiver. The cap_fs file-system thus allows for an extreme form of discretionary access control.
The design of cap_fs is such that cap_fs itself has no way to access the data it stores without the sparse capability used to access it. The code is written so that knowledge of the capabilities used to access the data does not outlive the file handles held by the accessing process (provided that a crypto module is in use).
The cap_fs file-system supports crypto modules. The initial release of MinorFs2, however, will come with only a Null-Crypto module.
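A crypto module could plug into cap_fs through an interface along the lines of the sketch below. The `CryptoModule` interface, the `NullCrypto` class and the helper functions are hypothetical, as is the choice to name stored blocks by a SHA-256 hash of the capability so that the store itself never holds the capability. With a null module the stored bytes are the plaintext, so they remain readable without the capability.

```python
import hashlib
from abc import ABC, abstractmethod


class CryptoModule(ABC):
    """Hypothetical plug-in interface: keys would come from the sparse cap."""

    @abstractmethod
    def encrypt(self, cap: str, plaintext: bytes) -> bytes: ...

    @abstractmethod
    def decrypt(self, cap: str, ciphertext: bytes) -> bytes: ...


class NullCrypto(CryptoModule):
    """Pass-through module: the stored bytes are the plaintext."""

    def encrypt(self, cap, plaintext):
        return plaintext

    def decrypt(self, cap, ciphertext):
        return ciphertext


def storage_name(cap):
    # One-way name for the stored block, so the store never holds the cap.
    return hashlib.sha256(cap.encode()).hexdigest()


def write_block(crypto, cap, data, store):
    store[storage_name(cap)] = crypto.encrypt(cap, data)


def read_block(crypto, cap, store):
    return crypto.decrypt(cap, store[storage_name(cap)])
```

Swapping `NullCrypto` for a module that derives a real key from `cap` would change what is stored without changing the callers.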
home_du
The home_du tool is a user-level admin tool. While MinorFs will not let you use an arbitrary program to access the private data of another arbitrary program, it comes with a small set of admin tools that help you keep track of and manage how your disk space is used. home_du is one such tool. It works because the id_service gives it a special persistence-id that makes home_fs grant it a read-only capability to the uid-level home_fs sub-node. This capability allows home_du to give the user a 'disk usage' view of all persistence-ids that actively use disk storage.
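A disk-usage pass like the one home_du performs can be sketched as a directory walk. The sketch assumes that the read-only capability resolves to a directory whose direct sub-directories each hold the data of one persistence-id; `disk_usage_by_persistence_id` is a hypothetical name. Symbolic links are not followed, so delegated subtrees linked in from elsewhere are not counted twice.

```python
import os


def disk_usage_by_persistence_id(uid_node):
    """Sum apparent file sizes per direct sub-directory of uid_node.

    Assumption: each direct sub-directory holds one persistence-id's data.
    """
    usage = {}
    with os.scandir(uid_node) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            total = 0
            # os.walk does not descend into symlinked directories by default.
            for dirpath, _dirs, files in os.walk(entry.path):
                for name in files:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
            usage[entry.name] = total
    return usage
```

Sorting the result by value, largest first, gives the kind of "who uses my disk" overview described above.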
home_rm_all
Where home_du lets the user find out which persistence-id is using how much disk storage, the home_rm_all tool lets the user delete a whole subtree belonging to a persistence-id. It works because the id_service gives it a special persistence-id that makes home_fs grant it a read/write capability to the uid-level home_fs sub-node. This capability allows home_rm_all to give the user a way to delete all data owned by a single persistence-id at once.
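The deletion step of home_rm_all can be sketched with `shutil.rmtree`, guarded so that it cannot delete anything outside the uid-level node. The directory layout and the function name `remove_persistence_tree` are assumptions for illustration, not the tool's actual implementation.

```python
import os
import shutil


def remove_persistence_tree(uid_node, sub_path):
    """Delete everything stored for one persistence-id below uid_node.

    Assumption: sub_path is the persistence-id's location relative to the
    uid-level node that the read/write capability gives access to.
    """
    root = os.path.realpath(uid_node)
    target = os.path.realpath(os.path.join(root, sub_path))
    # Refuse targets that escape the uid-level node, or the node itself.
    if os.path.commonpath([root, target]) != root or target == root:
        raise ValueError("refusing to delete outside a persistence-id subtree")
    shutil.rmtree(target)
    return target
```

Resolving both paths with `realpath` first means a `..` component or a symbolic link pointing elsewhere is caught by the guard rather than followed into someone else's data.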
