This is a project developed as a part of the Google Summer of Code program.

Introduction

A Log Structured File System (LFS) writes all file system data sequentially in a log-like structure. The log consists of a series of segments, where each segment contains both data and inode blocks. Traditional file systems like ext2 usually write inode blocks at a fixed place on the disk, which causes overhead due to disk seeks. A log-structured file system instead gathers a segment's worth of data in memory and appends the segment to the end of the log. This dramatically improves write performance while maintaining the same read performance. The sequential nature of the log also helps in crash recovery, as less checkpointing information needs to be stored. As the file system grows and files are deleted, holes are created in the log. A cleaner is required to fill the holes and compact the file system so that large extents of free blocks can be found. The novel aspect of this work is the addition of snapshotting capability to a log-structured file system. Currently, no Linux file system offers this capability.
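The write path and the cleaner described above can be illustrated with a small toy model. This is only a sketch in Python and not the project's kernel code: the class name ToyLog, the four-block segment size, and the (inode, block number) map are all assumptions made for illustration.

```python
SEGMENT_BLOCKS = 4  # hypothetical segment size, in blocks


class ToyLog:
    """Toy in-memory model of a log made of fixed-size segments."""

    def __init__(self):
        self.log = []        # flushed segments; each is a list of (inode, block_no, data)
        self.buffer = []     # the segment currently being gathered in memory
        self.block_map = {}  # (inode, block_no) -> (segment index, offset)

    def write(self, inode, block_no, data):
        # Writes are only buffered; nothing is updated in place.
        self.buffer.append((inode, block_no, data))
        if len(self.buffer) == SEGMENT_BLOCKS:
            self.flush()

    def flush(self):
        # Append the gathered segment to the end of the log in one step.
        if not self.buffer:
            return
        seg_index = len(self.log)
        for offset, (inode, block_no, _) in enumerate(self.buffer):
            self.block_map[(inode, block_no)] = (seg_index, offset)
        self.log.append(self.buffer)
        self.buffer = []

    def read(self, inode, block_no):
        # The newest copy may still be sitting in the in-memory segment.
        for i, b, data in reversed(self.buffer):
            if (i, b) == (inode, block_no):
                return data
        seg_index, offset = self.block_map[(inode, block_no)]
        return self.log[seg_index][offset][2]

    def live_blocks(self, seg_index):
        # A block is live only if the map still points at this exact copy.
        return [(i, b, d)
                for offset, (i, b, d) in enumerate(self.log[seg_index])
                if self.block_map.get((i, b)) == (seg_index, offset)]

    def clean(self, seg_index):
        # Copy the live blocks to the end of the log, then free the segment.
        self.flush()  # make sure the map reflects all pending writes first
        live = self.live_blocks(seg_index)
        self.log[seg_index] = []
        for i, b, d in live:
            self.write(i, b, d)
        self.flush()
```

Overwriting a block leaves the old copy behind as a hole in its segment. Calling clean() on that segment copies only the blocks that are still live to the end of the log, so the whole segment becomes free again.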

The primary objective of this work is to create a log-structured file system for Linux that supports snapshots. A snapshot is a copy of the files taken at a particular time. It is very similar to a backup of the file system at that time, except that it is maintained within the same file system and does not duplicate unchanged data. We believe that LFS is the ideal system for maintaining snapshots, because its design lends itself naturally to them.
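One way to see why a log lends itself to snapshots: since old blocks are never overwritten in place, a snapshot only needs to remember where each file's blocks were at that moment. The sketch below is a toy Python model under that assumption, not the design used by this project. The class SnapshotLog and its one-block-per-file layout are hypothetical simplifications.

```python
class SnapshotLog:
    """Toy append-only store where a snapshot is a saved copy of the file map."""

    def __init__(self):
        self.blocks = []     # append-only log of block contents
        self.current = {}    # file name -> index of its latest block in the log
        self.snapshots = {}  # snapshot label -> frozen copy of the file map

    def write(self, name, data):
        # A new version is appended; the old block is left untouched.
        self.blocks.append(data)
        self.current[name] = len(self.blocks) - 1

    def delete(self, name):
        # Only the map entry goes away; the block itself stays in the log.
        del self.current[name]

    def snapshot(self, label):
        # Copy the map, not the data.
        self.snapshots[label] = dict(self.current)

    def read(self, name, snapshot=None):
        table = self.current if snapshot is None else self.snapshots[snapshot]
        return self.blocks[table[name]]


fs = SnapshotLog()
fs.write("notes.txt", "first draft")
fs.snapshot("monday")
fs.write("notes.txt", "second draft")
fs.delete("notes.txt")
print(fs.read("notes.txt", snapshot="monday"))
```

In this model a snapshot costs one map copy and no data copies. A real cleaner would additionally have to treat every block that a snapshot still references as live.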

Motivation

Why do we need yet another file system for Linux? When LFS was originally proposed, the idea of appending to the end of a log to improve write performance was novel and produced great results on various microbenchmarks. However, later studies have shown that LFS performs poorly in transaction processing environments because of the cleaner overhead. We believe that advances in disk and memory technologies will help log-structured file systems. Over the past decade, disk and memory sizes for a typical machine have improved hugely. More memory allows LFS to gather more segments in memory, and with more disk space the cleaner need not be run as often.

Currently, no Linux file system supports snapshots. Snapshots are usually considered a special capability of network attached storage devices (NASD) developed by companies like NetApp. The cost of these NASDs is prohibitive for small businesses, and we believe that we can develop an open source file system that supports snapshots. Since LFS lends itself naturally to supporting snapshots, we propose to implement an LFS for Linux.

Status

An experimental version of LFS can be downloaded from the SourceForge website. The code can also be obtained from CVS, and instructions on compiling and using LFS are available here.

In its current state, LFS supports normal file system operations such as mkdir, rmdir, link, unlink, and so on. A working cleaner and a basic snapshotting framework are available as well. The code compiles cleanly on the 2.6.11 kernel and may or may not compile on other 2.6 kernels. Contact me if you are interested in testing it.

I maintain notes about LFS development on my blog.

Disclaimer: The file system is still experimental and may eat up your disk/memory and/or lock up your machine. I am not responsible for any damage you might incur. That said, it probably would only cause damage to the LFS partition.

Mailing List

Subscribe to the mailing list if you are interested in following LFS development. This is also the right place for feature requests, bug reports, etc.

People

Documents

FAQ

Have you checked other implementations? Why are you reinventing the wheel?

Yes. The project takes its inspiration and data structures from the NetBSD LFS implementation. There have been various attempts to implement a logfs for Linux.
  • LinLogFS: Originally developed for 2.2.x kernels as a modification to ext2's lower layers. A lot has changed since 2.2 (for example, the merging of the buffer and page caches), and a new file system that directly manipulates the buffer cache is required. The original author lists various cool additions to LFS, including snapshots, and notes:
    It's probably best to implement them from scratch (or starting with ext2 or so)
    rather than trying to port LinLogFS forward to Linux 2.6 and then add these
    ideas.
    
    The project originally did not include a cleaner (see below).
  • LinLogFS Cleaner: This was developed as part of a Master's thesis project by David Gatwood. The cleaner is pretty limited, and I wanted a modularized cleaner in order to implement new cleaning algorithms. Also, the code is not available online, and my e-mails to David have gone unanswered.
  • The Swarm Scalable Storage System: This project uses logfs concepts to implement a storage solution for the cluster. A lot of interesting ideas are discussed in their paper, but where is the code?
  • Neil Brown submitted a paper to LCA 2003 discussing various aspects of developing a log structured file system. No code has been released yet. I contacted him in 2004 and he mentioned that he is working on a user-space prototype.
No current Linux file system supports snapshots, and a file system that inherently supports snapshots will be a great addition to Linux.

What are these snapshots and why do I need them?

Some people call them versions, but I would like to call them snapshots, as they represent snapshots of a whole file system rather than of a single file. NetApp's WAFL file system provides snapshots of the file system over time. For example, if you have accidentally deleted your home directory, you can just go to the .snapshot directory and see snapshots of the directory from various points in time. This is an invaluable feature, as it provides backups within the file system without wasting space.