Normally, git-annex stores annexed files in the repository, locked down, which prevents the content of the file from being modified. That's a good thing: the file might be the only copy, and you wouldn't want to lose it to a fumble-fingered mistake.
# git annex add some_file
add some_file
# echo oops > some_file
bash: some_file: Permission denied
Sometimes, though, you want to modify a file, maybe once, or maybe repeatedly. For this, git-annex also has unlocked files. They are stored in the git repository differently, and they appear as regular files in the working tree, instead of the symbolic links used for locked files.
adding unlocked files
Instead of using git annex add, use git add, and the file will be stored in git-annex, but left unlocked.
Want git add to add some file contents to the annex, but store the contents of smaller files in git itself? Configure annex.largefiles to match the former. See largefiles.
# cp ~/my_cool_big_file .
# git add my_cool_big_file
# git commit -m "added my_cool_big_file to the annex"
[master (root-commit) 92f2725] added my_cool_big_file to the annex
1 file changed, 1 insertion(+)
create mode 100644 my_cool_big_file
# git annex find
my_cool_big_file
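As a rough sketch of the annex.largefiles setup mentioned in the note above: the expression below would send files larger than 100 kilobytes to the annex and leave smaller ones in git. The 100kb threshold and the scratch repository are assumptions made for illustration; see largefiles for the full expression syntax.

```shell
# Illustrative only: the threshold and the scratch repository are assumptions.
scratch=$(mktemp -d)
cd "$scratch"
git init -q .

# Files matching this expression go to the annex; everything else
# is stored in git itself.
git config annex.largefiles "largerthan=100kb"

# Read the setting back to confirm what is configured.
largefiles=$(git config annex.largefiles)
echo "annex.largefiles = $largefiles"
```

The setting is stored in the repository's local git config, so each clone would need it set separately.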
You can make whatever modifications you want to unlocked files, and commit your changes.
# echo more stuff >> my_cool_big_file
# git mv my_cool_big_file my_cool_bigger_file
# git commit -a -m "some changes"
[master 196c0e2] some changes
2 files changed, 1 insertion(+), 1 deletion(-)
delete mode 100644 my_cool_big_file
create mode 100644 my_cool_bigger_file
Under the hood, this uses git's smudge filter interface, and git-annex converts between the content of the big file and a pointer file, which is what gets committed to git. All the regular git-annex commands (get, drop, etc) can be used on unlocked files too.
By default, git-annex commands will add files in locked mode, unless used on a filesystem that does not support symlinks, in which case unlocked mode is used. To make them always use unlocked mode, run:
git config annex.addunlocked true
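To see the pointer file that git stores for an unlocked file, you can print the committed blob instead of the working tree file. What follows is only a sketch: it assumes git-annex is installed (and skips itself otherwise), and the scratch repository, file name and commit identity are invented for the example.

```shell
# Sketch only: skipped unless git-annex is installed. The scratch repository,
# file name and commit identity are invented for the example.
pointer="skipped"
if command -v git-annex >/dev/null 2>&1; then
    scratch=$(mktemp -d)
    cd "$scratch"
    git init -q .
    git config user.name demo
    git config user.email demo@example.com
    git annex init "scratch" >/dev/null 2>&1

    # Make git annex add leave files unlocked.
    git config annex.addunlocked true

    echo "hello" > big_file
    git annex add big_file >/dev/null
    git commit -q -m "add an unlocked file"

    # The working tree holds the real content; the blob committed to git
    # is a small pointer that starts with /annex/objects/.
    pointer=$(git cat-file -p HEAD:big_file)
fi
echo "stored in git: $pointer"
```

Comparing that output with cat big_file shows the difference between what git tracks and what you edit.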
mixing locked and unlocked files
A repository can contain both locked and unlocked files. You can switch a file back and forth using the git annex lock and git annex unlock commands. This changes what's stored in git between a git-annex symlink (locked) and a git-annex pointer file (unlocked). To add a file to the repository in locked mode, use git annex add; to add a file in unlocked mode, use git add.
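The switch between the two representations can be watched in the working tree. The sketch below assumes git-annex is installed and that the scratch directory sits on a filesystem with symlink support; the repository, file name and the kind helper are made up for the example. Under those assumptions the file should show up as a symlink while locked and as a regular file while unlocked.

```shell
# Sketch only: skipped unless git-annex is installed. Names are illustrative.
kinds="skipped"
if command -v git-annex >/dev/null 2>&1; then
    scratch=$(mktemp -d)
    cd "$scratch"
    git init -q .
    git config user.name demo
    git config user.email demo@example.com
    git annex init "scratch" >/dev/null 2>&1

    # Hypothetical helper: report whether a path is a symlink or a regular file.
    kind() { if [ -L "$1" ]; then echo symlink; else echo regular; fi; }

    echo "hello" > some_file
    git annex add some_file >/dev/null
    git commit -q -m "add a locked file"
    kinds=$(kind some_file)

    # Unlocking replaces the symlink with a regular, editable file.
    git annex unlock some_file >/dev/null
    kinds="$kinds $(kind some_file)"

    # Locking an unmodified file turns it back into a symlink.
    git annex lock some_file >/dev/null
    kinds="$kinds $(kind some_file)"
fi
echo "locked, unlocked, locked again: $kinds"
```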
If you want to mostly keep files locked, but be able to locally switch to having them all unlocked, you can do so using git annex adjust --unlock. See git-annex-adjust for details. This is particularly useful when using filesystems like FAT, and OSes like Windows that don't support symlinks. Indeed, git-annex init detects such filesystems and automatically sets up a repository to use all unlocked files.
imperfections
Unlocked files mostly work very well, but there are a few imperfections which you should be aware of when using them.
- git stash, git cherry-pick and git reset --hard don't update the working tree with the content of unlocked files. The files will contain pointers, the same as if the content was not in the repository. So after running these commands, you will need to manually run git annex smudge --update.
- When git-annex is running a command that gets or drops the content of an unlocked file, git's index will briefly be locked, which might prevent you from running a git commit at the same time. Conversely, if you have a git commit in progress, running git-annex may complain that the index is locked, though this will not prevent it from working.
- When an operation such as a checkout or merge needs to update a large number of unlocked files, it can become slow. So can running git add on a large number of files (git annex add is faster).
(The technical reasons behind these imperfections are explained in detail in git smudge clean interface suboptiomal.)
using less disk space
Unlocked files are handy, but they have one significant disadvantage compared with locked files: They use more disk space.
While only one copy of a locked file has to be stored, often two copies of an unlocked file are stored on disk. One copy is in the git work tree, where you can use and modify it, and the other is stashed away in .git/annex/objects (see internals).
The reason for that second copy is to preserve the old version of the file, when you modify the unlocked file in the work tree. Being able to access old versions of files is an important part of git after all!
That's a good safe default. But there are ways to use git-annex that make the second copy not be worth keeping:
- When you're using git-annex to sync the current version of files across devices, and don't care much about previous versions.
- When you have set up a backup repository, and use git-annex to copy your files to the backup.
In situations like these, you may want to avoid the overhead of the second local copy of unlocked files. There's a config setting for that.
Note that setting annex.thin only has any effect on systems that support hard links. It is supported on Windows, but not on FAT filesystems.
git config annex.thin true
After changing annex.thin, you'll want to fix up the work tree to match the new setting:
git annex fix
When a direct mode repository is upgraded, annex.thin is automatically set, because direct mode made the same single-copy tradeoff.
Setting annex.thin can save a lot of disk space, but it's a tradeoff between disk usage and safety.
Keeping files locked is safer and also avoids using unnecessary disk space, but trades off easy modification of files.
Pick the tradeoff that's right for you.
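Since annex.thin only has an effect where hard links work, it can help to probe the filesystem before relying on it. This is a minimal sketch using plain shell rather than anything git-annex provides; the CHECK_DIR variable and the probe file names are hypothetical, and you would point CHECK_DIR at a directory on the same filesystem as your repository.

```shell
# Sketch: test whether a directory's filesystem supports hard links.
# CHECK_DIR is a hypothetical variable; it defaults to the temp directory.
check_dir=${CHECK_DIR:-${TMPDIR:-/tmp}}
probe=$(mktemp -d "$check_dir/hardlink-probe.XXXXXX")

echo "data" > "$probe/original"

# ln without -s creates a hard link; -ef is true when both names
# refer to the same underlying file.
if ln "$probe/original" "$probe/link" 2>/dev/null &&
   [ "$probe/original" -ef "$probe/link" ]; then
    hardlinks="supported"
else
    hardlinks="unsupported"
fi

rm -rf "$probe"
echo "hard links in $check_dir: $hardlinks"
```

If the probe reports that hard links are unsupported, setting annex.thin would not be expected to save any space there.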
annex.thin doesn't support FAT. What's the best option to save disk space when you are using FAT? I'm currently trying to put files that are more than 50% of a drive's size on that drive, with a v7 repository. Is that possible?
On the doc it's said that
"Note that setting annex.thin only has any effect on systems that support hard links. It is supported on Windows, but not on FAT filesystems."
Having read that, I was thinking that I'd be able to use annex.thin with NTFS but it doesn't work. I'd specify clearly that NTFS would also not work with annex.thin
Thanks
I guess adding an hourly cronjob that drops all unused files would be acceptable, maybe?
Or is there a better solution?
My problem is the following: I delete files from a directory using the normal delete functionality. I expect these files then to be really deleted, at least in that repo, so that the disk space is freed.
I thought direct mode, or now v6 with the addunlocked setting, was the solution to that. But either there is still a hard link (with thin mode), or there is a copy in the directory (without it).
I would rather not have to use dropunused to get rid of that; it would be good if git annex sync or assist could just add these changes to the history. I don't care if that would be the last copy; that does not matter to me in this use case.
I want:
Do I therefore need a special repo like web/directory/rsync, or can I do that somehow with a normal repo? As far as I understand, even if I used web with a directory as parameter, it would not save the files normally in that directory?
@wsha.code, if you opt to use annex.thin, then commit a file, and then edit the same file again and commit again, the older commit will be in git's history, but if you check it out, the old content of the file won't be available. This is very similar to what happens when not using annex.thin, but later running git-annex unused and dropping the "unused" intermediate version of the file.
Running git annex sync --content or just git annex copy --to remote will get the thin version of the file saved on a remote, and then editing it won't lose the content. But note that if you edited a file while it was being copied off to the remote, the previous version would still get lost.
If these seem like troublesome behaviors, well that's why annex.thin is not enabled by default.
syncing to a remote that does not have annex.thin set?
add and commit a file multiple times in a repo without syncing to a remote, what does the commit history look like on a remote when you do sync it? It just has several commits for which the file contents are not available?
annex.thin set, do you just have to sync manually after each commit? I guess you might want to set up a git commit hook to do that in that case.
Direct mode is not going away any time soon.
git add adds the file to the annex in unlocked mode, and git annex sync commits any such adds the same as any other changes, so all you need is git add --all; git annex sync --content
Whether a file is locked or unlocked is a property of the file, that gets committed to git, so when you commit some unlocked files, they'll be unlocked when they appear in other clones of the repository.
@ginquistador it may or may not have been the best decision, but this tip is not a good place to discuss it. A bug would be a good place.
I first have to say, I have been following and using git annex for ages (5+ years at least), and it is my trusted source for all my data. However, for the first time in all these years, I'm seeing a decision that I do not agree with or understand.
Specifically, using git add . to add a file to git annex as the default pattern just seems a fundamentally wrong design to me (at least for my usage pattern). I want to be able to use git normally, and have git-annex only get involved when I explicitly request it to, and not for all files. AFAIK, git-lfs does do it right. I understand annex.largefiles: configuring mixed content repositories can be configured to get the behavior I want. However, the default behavior should add it to vanilla git, and any other desired behavior can be obtained by the user via annex attributes, or extra command line flags to git annex add
Knowing Joey, I assume there's a strong rationale as always, and would love to hear it, but I would still like to STRONGLY REQUEST changing the default behavior.
.git/annex/objects (superthin?) and files just get installed in-place.
annex.thin without hardlinks is a tracking bug for annex.thin not working on FAT etc.
git annex sync unlocks everything again.
This sounds interesting. But OTOH I'm curious about upgrades from direct mode (which I assume will soon go away):
If currently I just use annex add; annex sync --content on a media repo, would that change to git add --all; git commit -m whatever; annex unlock *; annex sync --content? That is, will v6 require the manual commit step?
Also, when annex get or annex sync retrieve files from another repo, will there be an option to have the files unlocked by default, as in v5 direct mode?
(I'm kinda hoping for annex init --thin or something similar to the v5 annex direct, as manually setting config options is easy to forget.)