Linux | System Administration | Web Development

Focusing on Linux, system administration, web development and the open-source world

Tag-based Filesystem


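Over the past couple of days a colleague and I have been discussing traditional categories versus tags (labels). It reminded me of a video I saw long ago about the features of next-generation filesystems built for real desktop use. One of those features was tagging: users would no longer care about the filesystem layer, only about the logical layer.
So I started thinking about a tag-based filesystem. "Surely I'm not the first to think of this," I thought, and sure enough, a quick Google search turned up quite a few groups working on it. I downloaded some of the code, but everything is still at the prototype stage, so production use is probably still some way off.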

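The most mature of these at the moment is tagsistant (http://www.tagsistant.net/). It is built on FUSE (http://fuse.sf.net, which once again shows what a great thing FUSE is), and it is a semantic filesystem whose goal is to classify files with tags (labels) instead of a traditional directory tree.
Of course, it is still a filesystem, so underneath it still stores files in directories. But tagsistant also gives directories a second meaning: each directory is treated as a tag, and several directories can be combined with logical AND or OR.
Directories can also stand in inclusion and equality relations to one another, so perhaps it will eventually be able to express every combination of tags (the full Cartesian product).
tagsistant's documentation gives an example that illustrates this: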
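To make the "directory = tag" idea concrete, here is a minimal Python sketch of the underlying model. It is my own illustration, not tagsistant's code: the class TagIndex and its methods are hypothetical. Each tag maps to a set of file names, so AND is set intersection and OR is set union.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory model: each tag (directory) maps to a set of file names."""

    def __init__(self):
        self.tags = defaultdict(set)

    def tag(self, filename, tag):
        # "Copying" a file into a tag directory only records one more tag;
        # the file itself is not duplicated.
        self.tags[tag].add(filename)

    def files_with_all(self, *tags):
        # AND: files that carry every one of the given tags (set intersection).
        sets = [self.tags.get(t, set()) for t in tags]
        return set.intersection(*sets) if sets else set()

    def files_with_any(self, *tags):
        # OR: files that carry at least one of the given tags (set union).
        return set().union(*(self.tags.get(t, set()) for t in tags))


idx = TagIndex()
for name in ("song1.mp3", "song2.wav"):
    idx.tag(name, "music")
for name in ("img_3233.jpg", "img_3459.jpg"):
    idx.tag(name, "photo")
for name in ("song2.wav", "img_3233.jpg"):
    idx.tag(name, "jack")

print(sorted(idx.files_with_all("jack", "music")))
```

Because a file name can sit in any number of tag sets, one file can appear under several "directories" without being copied.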

In our example, we’ll assume that tags/ is a directory which is
under tagsistant control, and that, inside this directory there
are two directories called “music” and “photo”, filled as in the
following scheme:

tags/
  music/
    song1.mp3
    song2.wav
  photo/
    img_3233.jpg
    img_3459.jpg

Assuming this hierarchy is a Tagsistant managed space, the files
song1.mp3 and song2.wav are tagged as music, and the files
img_3233.jpg and img_3459.jpg are tagged as photo. Let's suppose that
image img_3233.jpg and file song2.wav came from a friend named
Jack and you want to record that information inside your Tagsistant
space. All you have to do is:

$ mkdir tags/jack/
$ cp tags/music/song2.wav tags/jack/
$ cp tags/photo/img_3233.jpg tags/jack/

Resulting hierarchy will be as follows:

tags/
  music/
    song1.mp3
    song2.wav
  photo/
    img_3233.jpg
    img_3459.jpg
  jack/
    img_3233.jpg
    song2.wav

You may ask: that can be done with a normal filesystem. And you’ll be
right. But with a difference. Tagsistant will store song2.wav and
img_3233.jpg just one time, saving space.

But having such a hierarchy would be of little or no help in searching
your files if you are not able to perform a logical query. Here enters
Tagsistant. Using the path, a concept you are already used to playing with,
you can perform queries inside a Tagsistant space. How? Look at following
example:

$ ls tags/jack/
AND/ OR/ song2.wav img_3233.jpg
$ ls tags/jack/AND/
photo/ music/
$ ls tags/jack/AND/music/
song2.wav
$

The path “jack/AND/music/” will result in all the files tagged as both
“jack” and “music”. The AND special directory can be used to create a
set of criteria which should match together to fulfill the query. By
contrast, the OR special directory allows more powerful queries, by
concatenating the results of more than one set of AND-chained criteria.

As an example:

$ ls tags/jack/AND/music/
song2.wav
$ ls tags/photo/
img_3233.jpg img_3459.jpg
$ ls tags/jack/AND/music/OR/photo/
song2.wav img_3233.jpg img_3459.jpg
$
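The AND/OR path syntax above works like a small query language. Below is a minimal sketch of how such a path might be evaluated. It rests on my own simplifying assumptions, not on tagsistant's source: "OR" starts a new group, every other component except "AND" is a tag added to the current group, and the result is the union of each group's intersection. The function name resolve and the tags dictionary are hypothetical, and the sketch ignores the AND/ and OR/ pseudo-directories that a real listing also shows.

```python
def resolve(path, tags):
    """Evaluate a tagsistant-style query path against a dict {tag: set of files}.

    Hypothetical sketch: tags inside a group are ANDed (intersection),
    groups separated by "OR" are ORed (union).
    """
    parts = [p for p in path.strip("/").split("/") if p]

    # Split the path into AND-groups at every "OR" component.
    groups, current = [], []
    for part in parts:
        if part == "OR":
            groups.append(current)
            current = []
        elif part != "AND":
            current.append(part)
    groups.append(current)

    result = set()
    for group in groups:
        if not group:
            continue  # e.g. a path ending in "OR/" contributes nothing
        sets = [tags.get(t, set()) for t in group]
        result |= set.intersection(*sets)
    return result


tags = {
    "music": {"song1.mp3", "song2.wav"},
    "photo": {"img_3233.jpg", "img_3459.jpg"},
    "jack": {"song2.wav", "img_3233.jpg"},
}

print(sorted(resolve("jack/AND/music/", tags)))
print(sorted(resolve("jack/AND/music/OR/photo/", tags)))
```

With the example data from the quote, the second call yields song2.wav, img_3233.jpg and img_3459.jpg, the same set as the last ls output.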

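I tried it out. Loading the initial data is not really a problem; the difficulty right now is that there is no optimized query structure. It still relies on the directory model, just with logical operators added, something like this: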

[Screenshot: tag-based filesystem]

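Here "AND" is a logical operator, not a directory, while develop and os are two directories, and of course also two tags. The ppt document is tagged with both os and develop, yet only one copy of it is actually stored.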
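The "stored only once" behaviour can be illustrated with content-addressed storage. This is a swapped-in technique for illustration; I have not checked how tagsistant itself avoids duplicate copies. In the hypothetical DedupStore below, each distinct file content is kept once under its SHA-256 digest, and tagging the same content again only adds another tag. The file name slides.ppt is made up.

```python
import hashlib


class DedupStore:
    """Hypothetical sketch: keep each distinct file content once, keyed by SHA-256."""

    def __init__(self):
        self.blobs = {}  # digest -> bytes (the single stored copy)
        self.names = {}  # digest -> first file name seen for that content
        self.tags = {}   # digest -> set of tags attached to that content

    def add(self, name, data, tag):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # later identical copies are not stored
        self.names.setdefault(digest, name)
        self.tags.setdefault(digest, set()).add(tag)
        return digest


store = DedupStore()
content = b"pretend this is a PowerPoint file"
d1 = store.add("slides.ppt", content, "os")
d2 = store.add("slides.ppt", content, "develop")
print(len(store.blobs), sorted(store.tags[d1]))
```

Adding the same document under os and under develop leaves one blob with two tags, which is what the screenshot describes.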

There is still some way to go before it is truly practical. Looking forward to it!

Update: a few others I also tried:

1) Tag-based file system in Python: an open-source project by Zhang Miao (章淼) of Tsinghua University. http://py-tag-fs.sourceforge.net/

2) tagfsai: an AI interface, but I couldn't figure out which programs it can be used with; perhaps the project has only just started.
http://tagfs.googlecode.com

3) tag-perl: a simple Perl program that attaches several tags to each file; command line only.
http://blueslugs.com/~sch/tag/tag-latest.pl

4) tag2find: a file browser extension for Windows, and a commercial product. I haven't tested it; if you have a Windows system, give it a try. http://www.tag2find.com
