| ========================== |
| General Filesystem Caching |
| ========================== |
| |
| ======== |
| OVERVIEW |
| ======== |
| |
| This facility is a general purpose cache for network filesystems, though it |
| could be used for caching other things such as ISO9660 filesystems too. |
| |
| FS-Cache mediates between cache backends (such as CacheFS) and network |
| filesystems: |
| |
|     +---------+ |
|     |         |                        +--------------+ |
|     |   NFS   |--+                     |              | |
|     |         |  |                 +-->|   CacheFS    | |
|     +---------+  |   +----------+  |   |  /dev/hda5   | |
|                  |   |          |  |   +--------------+ |
|     +---------+  +-->|          |  | |
|     |         |      |          |--+ |
|     |   AFS   |----->| FS-Cache | |
|     |         |      |          |--+ |
|     +---------+  +-->|          |  | |
|                  |   |          |  |   +--------------+ |
|     +---------+  |   +----------+  |   |              | |
|     |         |  |                 +-->|  CacheFiles  | |
|     |  ISOFS  |--+                     |  /var/cache  | |
|     |         |                        +--------------+ |
|     +---------+ |
| |
| Or to look at it another way, FS-Cache is a module that provides a caching |
| facility to a network filesystem such that the cache is transparent to the |
| user: |
| |
|     +---------+ |
|     |         | |
|     |  Server | |
|     |         | |
|     +---------+ |
|          |                  NETWORK |
|     ~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|          | |
|          |           +----------+ |
|          V           |          | |
|     +---------+      |          | |
|     |         |      |          | |
|     |   NFS   |----->| FS-Cache | |
|     |         |      |          |--+ |
|     +---------+      |          |  |   +--------------+   +--------------+ |
|          |           |          |  |   |              |   |              | |
|          V           +----------+  +-->|  CacheFiles  |-->|  Ext3        | |
|     +---------+                        |  /var/cache  |   |  /dev/sda6   | |
|     |         |                        +--------------+   +--------------+ |
|     |   VFS   |                                ^                     ^ |
|     |         |                                |                     | |
|     +---------+                                +--------------+      | |
|          |                  KERNEL SPACE                      |      | |
|     ~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~|~~~~ |
|          |                  USER SPACE                        |      | |
|          V                                                    |      | |
|     +---------+                                           +--------------+ |
|     |         |                                           |              | |
|     | Process |                                           | cachefilesd  | |
|     |         |                                           |              | |
|     +---------+                                           +--------------+ |
| |
| |
| FS-Cache does not load every netfs file that is opened into a cache in its |
| entirety before permitting it to be accessed, and then serve the pages out of |
| that cache rather than out of the netfs inode. This is because: |
| |
| (1) It must be practical to operate without a cache. |
| |
| (2) The size of any accessible file must not be limited to the size of the |
| cache. |
| |
| (3) The combined size of all opened files (this includes mapped libraries) |
| must not be limited to the size of the cache. |
| |
| (4) The user should not be forced to download an entire file just to do a |
| one-off access of a small portion of it (such as might be done with the |
| "file" program). |
| |
| Instead, it serves the cache out in PAGE_SIZE chunks as and when they are |
| requested by the netfs(s) using it. |
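| |
| As a rough illustration of what serving the cache "in PAGE_SIZE chunks" means, |
| the sketch below computes which page indices a byte-range read touches; only |
| those pages need to be looked up in, or fetched into, the cache. The helper |
| name and the 4096-byte page size used in the example are assumptions made for |
| illustration; this is not part of the FS-Cache API. |

```c
#include <assert.h>

/* Hypothetical helper (not an FS-Cache function): report the first and last
 * page indices touched by the byte range [offset, offset + len). Only these
 * pages would need to be looked up in, or fetched into, the cache. */
struct page_span {
	unsigned long first;
	unsigned long last;	/* inclusive */
};

static struct page_span pages_for_range(unsigned long long offset,
					unsigned long long len,
					unsigned long page_size)
{
	struct page_span s;

	assert(page_size > 0 && len > 0);
	s.first = (unsigned long)(offset / page_size);
	s.last = (unsigned long)((offset + len - 1) / page_size);
	return s;
}
```

| With a 4096-byte page, a one-off read of the first 512 bytes of a large file |
| (as the "file" program might do) touches only page 0, whereas an 8192-byte |
| read starting at offset 4000 touches pages 0 to 2. |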
| |
| |
| FS-Cache provides the following facilities: |
| |
| (1) More than one cache can be used at once. Caches can be selected |
| explicitly by use of tags. |
| |
| (2) Caches can be added / removed at any time. |
| |
| (3) The netfs is provided with an interface that allows either party to |
| withdraw caching facilities from a file (required for (2)). |
| |
| (4) The interface to the netfs returns as few errors as possible, preferring |
| rather to let the netfs remain oblivious. |
| |
| (5) Cookies are used to represent indices, files and other objects to the |
| netfs. The simplest cookie is just a NULL pointer - indicating nothing |
| cached there. |
| |
| (6) The netfs is allowed to propose - dynamically - any index hierarchy it |
| desires, though it must be aware that the index search function is |
| recursive, stack space is limited, and indices can only be children of |
| indices. |
| |
| (7) Data I/O is done directly to and from the netfs's pages. The netfs |
| indicates that page A is at index B of the data-file represented by cookie |
| C, and that it should be read or written. The cache backend may or may |
| not start I/O on that page, but if it does, a netfs callback will be |
| invoked to indicate completion. The I/O may be either synchronous or |
| asynchronous. |
| |
| (8) Cookies can be "retired" upon release. At this point FS-Cache will mark |
| them as obsolete and the index hierarchy rooted at that point will get |
| recycled. |
| |
| (9) The netfs provides a "match" function for index searches. In addition to |
| saying whether a match was made or not, this can also specify that an |
| entry should be updated or deleted. |
| |
| (10) As much as possible is done asynchronously. |
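| |
| Facility (9) can be pictured with a small sketch of a netfs-supplied match |
| function. Everything here is hypothetical: the enum, the entry layout and the |
| policy (a changed data version makes the entry obsolete, while a changed |
| attribute version only requires the stored data to be updated) are |
| assumptions for illustration. The real callback interface is described in |
| Documentation/filesystems/caching/netfs-api.txt. |

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result codes for an index match callback. */
enum match_result {
	MATCH_NONE,	/* entry belongs to a different object */
	MATCH_OK,	/* entry matches and is current */
	MATCH_UPDATE,	/* entry matches; its stored data should be updated */
	MATCH_DELETE,	/* entry matches but is stale and should be deleted */
};

/* Hypothetical index entry: a lookup key plus coherency data. */
struct index_entry {
	char key[32];			/* e.g. a file handle rendered as text */
	unsigned long data_version;	/* changes when file contents change */
	unsigned long attr_version;	/* changes when only metadata changes */
};

static enum match_result netfs_match(const struct index_entry *cached,
				     const char *key,
				     unsigned long data_version,
				     unsigned long attr_version)
{
	if (strcmp(cached->key, key) != 0)
		return MATCH_NONE;
	if (cached->data_version != data_version)
		return MATCH_DELETE;	/* cached pages are no longer valid */
	if (cached->attr_version != attr_version)
		return MATCH_UPDATE;	/* data still valid, refresh metadata */
	return MATCH_OK;
}
```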
| |
| |
| FS-Cache maintains a virtual indexing tree in which all indices, files, objects |
| and pages are kept. Bits of this tree may actually reside in one or more |
| caches. |
| |
|                                      FSDEF |
|                                       | |
|                   +---------------------------------------+ |
|                   |                                       | |
|                  NFS                                     AFS |
|                   |                                       | |
|        +----------------------+                      +----------+ |
|        |                      |                      |          | |
|     homedir                 mirror                 afs.org  redhat.com |
|        |                      |                      | |
|      +------------+           +----------+           +---------+ |
|      |            |           |          |           |         | |
|     00001       00002       00007      00125      vol00001  vol00002 |
|      |            |           |          |                     | |
|      +---+---+    +-----+     +---+      +------+------+       +-----+----+ |
|      |   |   |    |     |     |   |      |      |      |       |     |    | |
|     PG0 PG1 PG2  PG0  XATTR  PG0 PG1  DIRENT DIRENT DIRENT    R/W   R/O  Bak |
|                         |                                            | |
|                        PG0                                           +-------+ |
|                                                                      |       | |
|                                                                    00001   00003 |
|                                                                      | |
|                                                                  +---+---+ |
|                                                                  |   |   | |
|                                                                 PG0 PG1 PG2 |
| |
| In the example above, you can see two netfs's being backed: NFS and AFS. These |
| have different index hierarchies: |
| |
| (*) The NFS primary index contains per-server indices. Each server index is |
| indexed by NFS file handles to get data file objects. Each data file |
| object can have an array of pages, but may also have further child |
| objects, such as extended attributes and directory entries. Extended |
| attribute objects themselves have page-array contents. |
| |
| (*) The AFS primary index contains per-cell indices. Each cell index contains |
| per-logical-volume indices. Each volume index contains up to three |
| indices for the read-write, read-only and backup mirrors of those volumes. |
| Each of these contains vnode data file objects, each of which contains an |
| array of pages. |
| |
| The very top index is the FS-Cache master index in which individual netfs's |
| have entries. |
| |
| Any index object may reside in more than one cache, provided it only has index |
| children. Any index with non-index object children will be assumed to only |
| reside in one cache. |
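| |
| The two structural rules above (indices can only be children of indices, and |
| only an index whose children are all indices may span several caches) can be |
| modelled with a toy tree. The types and functions below are hypothetical and |
| only illustrate the rules; FS-Cache's own object representation is described |
| in Documentation/filesystems/caching/object.txt. |

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the index tree (not FS-Cache's own types). */
enum node_type { NODE_INDEX, NODE_DATA };

#define MAX_CHILDREN 8

struct node {
	const char *name;
	enum node_type type;
	struct node *children[MAX_CHILDREN];
	size_t nr_children;
};

/* An index may only be attached beneath another index; non-index objects
 * (data files, xattrs, dirents) may be attached beneath either kind. */
static bool add_child(struct node *parent, struct node *child)
{
	if (child->type == NODE_INDEX && parent->type != NODE_INDEX)
		return false;
	if (parent->nr_children >= MAX_CHILDREN)
		return false;
	parent->children[parent->nr_children++] = child;
	return true;
}

/* An index may be present in several caches only if all of its children
 * are indices; one with any non-index child is assumed to live in one cache. */
static bool may_span_caches(const struct node *n)
{
	if (n->type != NODE_INDEX)
		return false;
	for (size_t i = 0; i < n->nr_children; i++)
		if (n->children[i]->type != NODE_INDEX)
			return false;
	return true;
}
```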
| |
| |
| The netfs API to FS-Cache can be found in: |
| |
| Documentation/filesystems/caching/netfs-api.txt |
| |
| The cache backend API to FS-Cache can be found in: |
| |
| Documentation/filesystems/caching/backend-api.txt |
| |
| A description of the internal representations and object state machine can be |
| found in: |
| |
| Documentation/filesystems/caching/object.txt |
| |
| |
| ======================= |
| STATISTICAL INFORMATION |
| ======================= |
| |
| If FS-Cache is compiled with the following options enabled: |
| |
| CONFIG_FSCACHE_STATS=y |
| CONFIG_FSCACHE_HISTOGRAM=y |
| |
| then it will gather certain statistics and display them through a number of |
| proc files. |
| |
| (*) /proc/fs/fscache/stats |
| |
| This shows counts of a number of events that can happen in FS-Cache: |
| |
|     CLASS   EVENT   MEANING |
|     ======= ======= ======================================================= |
|     Cookies idx=N   Number of index cookies allocated |
|             dat=N   Number of data storage cookies allocated |
|             spc=N   Number of special cookies allocated |
|     Objects alc=N   Number of objects allocated |
|             nal=N   Number of object allocation failures |
|             avl=N   Number of objects that reached the available state |
|             ded=N   Number of objects that reached the dead state |
|     ChkAux  non=N   Number of objects that didn't have a coherency check |
|             ok=N    Number of objects that passed a coherency check |
|             upd=N   Number of objects that needed a coherency data update |
|             obs=N   Number of objects that were declared obsolete |
|     Pages   mrk=N   Number of pages marked as being cached |
|             unc=N   Number of uncache page requests seen |
|     Acquire n=N     Number of acquire cookie requests seen |
|             nul=N   Number of acq reqs given a NULL parent |
|             noc=N   Number of acq reqs rejected due to no cache available |
|             ok=N    Number of acq reqs succeeded |
|             nbf=N   Number of acq reqs rejected due to error |
|             oom=N   Number of acq reqs failed on ENOMEM |
|     Lookups n=N     Number of lookup calls made on cache backends |
|             neg=N   Number of negative lookups made |
|             pos=N   Number of positive lookups made |
|             crt=N   Number of objects created by lookup |
|             tmo=N   Number of lookups timed out and requeued |
|     Updates n=N     Number of update cookie requests seen |
|             nul=N   Number of upd reqs given a NULL parent |
|             run=N   Number of upd reqs granted CPU time |
|     Relinqs n=N     Number of relinquish cookie requests seen |
|             nul=N   Number of rlq reqs given a NULL parent |
|             wcr=N   Number of rlq reqs waited on completion of creation |
|     AttrChg n=N     Number of attribute changed requests seen |
|             ok=N    Number of attr changed requests queued |
|             nbf=N   Number of attr changed reqs rejected -ENOBUFS |
|             oom=N   Number of attr changed reqs failed -ENOMEM |
|             run=N   Number of attr changed ops given CPU time |
|     Allocs  n=N     Number of allocation requests seen |
|             ok=N    Number of successful alloc reqs |
|             wt=N    Number of alloc reqs that waited on lookup completion |
|             nbf=N   Number of alloc reqs rejected -ENOBUFS |
|             int=N   Number of alloc reqs aborted -ERESTARTSYS |
|             ops=N   Number of alloc reqs submitted |
|             owt=N   Number of alloc reqs waited for CPU time |
|             abt=N   Number of alloc reqs aborted due to object death |
|     Retrvls n=N     Number of retrieval (read) requests seen |
|             ok=N    Number of successful retr reqs |
|             wt=N    Number of retr reqs that waited on lookup completion |
|             nod=N   Number of retr reqs returned -ENODATA |
|             nbf=N   Number of retr reqs rejected -ENOBUFS |
|             int=N   Number of retr reqs aborted -ERESTARTSYS |
|             oom=N   Number of retr reqs failed -ENOMEM |
|             ops=N   Number of retr reqs submitted |
|             owt=N   Number of retr reqs waited for CPU time |
|             abt=N   Number of retr reqs aborted due to object death |
|     Stores  n=N     Number of storage (write) requests seen |
|             ok=N    Number of successful store reqs |
|             agn=N   Number of store reqs on a page already pending storage |
|             nbf=N   Number of store reqs rejected -ENOBUFS |
|             oom=N   Number of store reqs failed -ENOMEM |
|             ops=N   Number of store reqs submitted |
|             run=N   Number of store reqs granted CPU time |
|             pgs=N   Number of pages given store req processing time |
|             rxd=N   Number of store reqs deleted from tracking tree |
|             olm=N   Number of store reqs over store limit |
|     VmScan  nos=N   Number of release reqs against pages with no pending store |
|             gon=N   Number of release reqs against pages stored by time lock granted |
|             bsy=N   Number of release reqs ignored due to in-progress store |
|             can=N   Number of page stores cancelled due to release req |
|     Ops     pend=N  Number of times async ops added to pending queues |
|             run=N   Number of times async ops given CPU time |
|             enq=N   Number of times async ops queued for processing |
|             can=N   Number of async ops cancelled |
|             rej=N   Number of async ops rejected due to object lookup/create failure |
|             dfr=N   Number of async ops queued for deferred release |
|             rel=N   Number of async ops released |
|             gc=N    Number of deferred-release async ops garbage collected |
|     CacheOp alo=N   Number of in-progress alloc_object() cache ops |
|             luo=N   Number of in-progress lookup_object() cache ops |
|             luc=N   Number of in-progress lookup_complete() cache ops |
|             gro=N   Number of in-progress grab_object() cache ops |
|             upo=N   Number of in-progress update_object() cache ops |
|             dro=N   Number of in-progress drop_object() cache ops |
|             pto=N   Number of in-progress put_object() cache ops |
|             syn=N   Number of in-progress sync_cache() cache ops |
|             atc=N   Number of in-progress attr_changed() cache ops |
|             rap=N   Number of in-progress read_or_alloc_page() cache ops |
|             ras=N   Number of in-progress read_or_alloc_pages() cache ops |
|             alp=N   Number of in-progress allocate_page() cache ops |
|             als=N   Number of in-progress allocate_pages() cache ops |
|             wrp=N   Number of in-progress write_page() cache ops |
|             ucp=N   Number of in-progress uncache_page() cache ops |
|             dsp=N   Number of in-progress dissociate_pages() cache ops |
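| |
| To pull a single counter out of this file programmatically, each line can be |
| treated as a class name followed by whitespace-separated EVENT=N pairs. The |
| sketch below assumes a line shaped like "Acquire: n=12 nul=0 noc=1 ok=11" |
| (check the exact layout on your kernel before relying on it), and the |
| function name is hypothetical. Because the same event name (for example |
| "ok") appears in several classes, pass it the line for the class you want. |

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Look up one EVENT=N counter in a single stats line.
 * Returns 0 and stores the value in *out on success, -1 if not found.
 * Assumes counters are whitespace-separated and written as name=value. */
static int stats_lookup(const char *line, const char *event,
			unsigned long *out)
{
	size_t elen = strlen(event);
	const char *p = line;

	if (elen == 0)
		return -1;
	while ((p = strstr(p, event)) != NULL) {
		/* Accept only a whole token: preceded by whitespace (or the
		 * start of the line) and immediately followed by '='. */
		int at_start = (p == line || p[-1] == ' ' || p[-1] == '\t');

		if (at_start && p[elen] == '=' &&
		    sscanf(p + elen + 1, "%lu", out) == 1)
			return 0;
		p += elen;
	}
	return -1;
}
```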
| |
| |
| (*) /proc/fs/fscache/histogram |
| |
|     cat /proc/fs/fscache/histogram |
|     JIFS  SECS  OBJ INST  OP RUNS   OBJ RUNS  RETRV DLY RETRIEVLS |
|     ===== ===== ========= ========= ========= ========= ========= |
| |
| This shows, for a variety of tasks, how many times each task took a given |
| amount of time to run, for times between 0 jiffies and HZ-1 jiffies. The |
| columns are as follows: |
| |
|     COLUMN     TIME MEASUREMENT |
|     =========  ======================================================= |
|     OBJ INST   Length of time to instantiate an object |
|     OP RUNS    Length of time a call to process an operation took |
|     OBJ RUNS   Length of time a call to process an object event took |
|     RETRV DLY  Time between requesting a read and lookup completing |
|     RETRIEVLS  Time between beginning and end of a retrieval |
| |
| Each row shows the number of events that took a particular range of times. |
| Each step is 1 jiffy in size. The JIFS column indicates the particular |
| jiffy range covered, and the SECS field the equivalent number of seconds. |
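| |
| Converting a JIFS value to seconds only requires the kernel's HZ value, and |
| summing a column up to a given row gives the number of events that took at |
| most that many jiffies. The sketch below assumes the column has already been |
| read into an array indexed by jiffy, and that HZ is known (check CONFIG_HZ |
| for the running kernel; common values are 100, 250, 300 and 1000). The helper |
| names are hypothetical. |

```c
#include <assert.h>

/* Convert a histogram JIFS value to seconds, given the kernel's HZ. */
static double jiffies_to_seconds(unsigned long jiffies, unsigned long hz)
{
	assert(hz > 0);
	return (double)jiffies / (double)hz;
}

/* Sum the events recorded in rows 0..upto_jiffy of one column, i.e. the
 * number of events that took at most upto_jiffy jiffies (to one-jiffy
 * resolution). column[j] is assumed to hold the count for JIFS == j. */
static unsigned long events_within(const unsigned long *column,
				   unsigned long nrows,
				   unsigned long upto_jiffy)
{
	unsigned long total = 0;

	for (unsigned long j = 0; j < nrows && j <= upto_jiffy; j++)
		total += column[j];
	return total;
}
```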
| |
| |
| =========== |
| OBJECT LIST |
| =========== |
| |
| If CONFIG_FSCACHE_OBJECT_LIST is enabled, the FS-Cache facility will maintain a |
| list of all the objects currently allocated and allow them to be viewed |
| through: |
| |
| /proc/fs/fscache/objects |
| |
| This will look something like: |
| |
| [root@andromeda ~]# head /proc/fs/fscache/objects |
| OBJECT   PARENT   STAT CHLDN OPS OOP IPR EX READS EM EV F S | NETFS_COOKIE_DEF TY FL NETFS_DATA       OBJECT_KEY, AUX_DATA |
| ======== ======== ==== ===== === === === == ===== == == = = | ================ == == ================ ================ |
|    17e4b        2 ACTV     0   0   0   0  0     0 7b  4 0 8 | NFS.fh           DT  0 ffff88001dd82820 010006017edcf8bbc93b43298fdfbe71e50b57b13a172c0117f38472, e567634700000000000000000000000063f2404a000000000000000000000000c9030000000000000000000063f2404a |
|    1693a        2 ACTV     0   0   0   0  0     0 7b  4 0 8 | NFS.fh           DT  0 ffff88002db23380 010006017edcf8bbc93b43298fdfbe71e50b57b1e0162c01a2df0ea6, 420ebc4a000000000000000000000000420ebc4a0000000000000000000000000e1801000000000000000000420ebc4a |
| |
| where the first set of columns before the '|' describe the object: |
| |
|     COLUMN  DESCRIPTION |
|     ======= =============================================================== |
|     OBJECT  Object debugging ID (appears as OBJ%x in some debug messages) |
|     PARENT  Debugging ID of parent object |
|     STAT    Object state |
|     CHLDN   Number of child objects of this object |
|     OPS     Number of outstanding operations on this object |
|     OOP     Number of outstanding child object management operations |
|     IPR |
|     EX      Number of outstanding exclusive operations |
|     READS   Number of outstanding read operations |
|     EM      Object's event mask |
|     EV      Events raised on this object |
|     F       Object flags |
|     S       Object slow-work work item flags |
| |
| and the second set of columns describe the object's cookie, if present: |
| |
|     COLUMN           DESCRIPTION |
|     ================ ======================================================= |
|     NETFS_COOKIE_DEF Name of netfs cookie definition |
|     TY               Cookie type (IX - index, DT - data, hex - special) |
|     FL               Cookie flags |
|     NETFS_DATA       Netfs private data stored in the cookie |
|     OBJECT_KEY       Object key      } 1 column, with separating comma |
|     AUX_DATA         Object aux data } presence may be configured |
| |
| The data shown may be filtered by attaching a key to an appropriate keyring |
| before viewing the file. Something like: |
| |
| keyctl add user fscache:objlist <restrictions> @s |
| |
| where <restrictions> are a selection of the following letters: |
| |
| K Show hexdump of object key (don't show if not given) |
| A Show hexdump of object aux data (don't show if not given) |
| |
| and the following paired letters: |
| |
| C Show objects that have a cookie |
| c Show objects that don't have a cookie |
| B Show objects that are busy |
| b Show objects that aren't busy |
| W Show objects that have pending writes |
| w Show objects that don't have pending writes |
| R Show objects that have outstanding reads |
| r Show objects that don't have outstanding reads |
| S Show objects that have slow work queued |
| s Show objects that don't have slow work queued |
| |
| If neither side of a letter pair is given, then both are implied. For example: |
| |
| keyctl add user fscache:objlist KB @s |
| |
| shows objects that are busy, and lists their object keys, but does not dump |
| their auxiliary data. It also implies "CcWwRrSs", but as 'B' is given, 'b' is |
| not implied. |
| |
| By default all objects and all fields will be shown. |
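| |
| The letter rules above can be captured in a few lines. This is a |
| hypothetical decoder written to illustrate the rules described here, not the |
| kernel's own implementation: K and A are only honoured if present, each |
| letter pair defaults to showing both sides, and a missing key (modelled as |
| NULL) means everything is shown. |

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoded form of an fscache:objlist restriction string.
 * For each letter pair, index 0 is the uppercase side (object has the
 * property) and index 1 the lowercase side (object lacks it). */
struct objlist_filter {
	bool show_key, show_aux;
	bool cookie[2], busy[2], writes[2], reads[2], slow[2];
};

static void decode_pair(const char *r, char upper, char lower, bool side[2])
{
	side[0] = strchr(r, upper) != NULL;
	side[1] = strchr(r, lower) != NULL;
	if (!side[0] && !side[1])	/* neither given: both are implied */
		side[0] = side[1] = true;
}

static struct objlist_filter decode_restrictions(const char *r)
{
	struct objlist_filter f;

	if (r == NULL)			/* no key: all objects, all fields */
		r = "KA";
	f.show_key = strchr(r, 'K') != NULL;
	f.show_aux = strchr(r, 'A') != NULL;
	decode_pair(r, 'C', 'c', f.cookie);
	decode_pair(r, 'B', 'b', f.busy);
	decode_pair(r, 'W', 'w', f.writes);
	decode_pair(r, 'R', 'r', f.reads);
	decode_pair(r, 'S', 's', f.slow);
	return f;
}

/* An object is listed only if each of its properties passes its pair. */
static bool object_listed(const struct objlist_filter *f, bool has_cookie,
			  bool busy, bool writes, bool reads, bool slow)
{
	return f->cookie[!has_cookie] && f->busy[!busy] &&
	       f->writes[!writes] && f->reads[!reads] && f->slow[!slow];
}
```

| For example, decoding "KB" yields a filter that lists busy objects only, with |
| their object keys but without their aux data, matching the example above. |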
| |
| |
| ========= |
| DEBUGGING |
| ========= |
| |
| If CONFIG_FSCACHE_DEBUG is enabled, the FS-Cache facility can have runtime |
| debugging enabled by adjusting the value in: |
| |
| /sys/module/fscache/parameters/debug |
| |
| This is a bitmask of debugging streams to enable: |
| |
|     BIT     VALUE   STREAM                          POINT |
|     ======= ======= =============================== ======================= |
|     0       1       Cache management                Function entry trace |
|     1       2                                       Function exit trace |
|     2       4                                       General |
|     3       8       Cookie management               Function entry trace |
|     4       16                                      Function exit trace |
|     5       32                                      General |
|     6       64      Page handling                   Function entry trace |
|     7       128                                     Function exit trace |
|     8       256                                     General |
|     9       512     Operation management            Function entry trace |
|     10      1024                                    Function exit trace |
|     11      2048                                    General |
| |
| The appropriate set of values should be OR'd together and the result written to |
| the control file. For example: |
| |
| echo $((1|8|64)) >/sys/module/fscache/parameters/debug |
| |
| will turn on function entry debugging for the cache management, cookie |
| management and page handling streams. |
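| |
| Since each stream owns three consecutive bits (function entry, function exit, |
| general), the value for any stream and point is 1 << (3 * stream + point), |
| with streams numbered in the order of the table. The sketch below computes |
| such masks; the enum and function names are illustrative rather than kernel |
| identifiers. |

```c
#include <assert.h>

/* Stream and point numbering follows the table above; the names are
 * illustrative, not kernel identifiers. */
enum dbg_stream {
	DBG_CACHE,	/* bits 0-2 */
	DBG_COOKIE,	/* bits 3-5 */
	DBG_PAGE,	/* bits 6-8 */
	DBG_OPERATION,	/* bits 9-11 */
};

enum dbg_point {
	DBG_ENTRY,	/* function entry trace */
	DBG_EXIT,	/* function exit trace */
	DBG_GENERAL,	/* general */
};

/* Value of the single bit for one stream/point combination. */
static unsigned int dbg_bit(enum dbg_stream s, enum dbg_point p)
{
	return 1u << (3u * (unsigned int)s + (unsigned int)p);
}

/* OR together one trace point across every stream. */
static unsigned int dbg_point_everywhere(enum dbg_point p)
{
	unsigned int mask = 0;

	for (int s = DBG_CACHE; s <= DBG_OPERATION; s++)
		mask |= dbg_bit((enum dbg_stream)s, p);
	return mask;
}
```

| For instance, combining the function entry bit of all four streams gives 585 |
| (1 + 8 + 64 + 512), the value to write to the control file to enable function |
| entry tracing everywhere. |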