How big are my files?
Our server luv shows you how much storage space your project is using and also how much each individual project member is contributing. For technical reasons, we can only show the apparent size (see below) of a user’s files. For the entire project, however, we show the actually occupied disk space. We are not happy about this inconsistency but for now, we have to live with it.
Here we try to explain the difference between the two ways to measure the size of files.
du and --apparent-size
If you want to know how much space you are using on disk, you may have come across du. It works for individual files but also for entire folder structures and the files in them. An example would be
$ du -h core.1346782
128M core.1346782
The -h option tells du to show the result in human-readable form. Many tools know that option. Okay, so this core file occupies 128M on disk. However, if I look at it with ls, I get
$ ls -lh core.1346782
-rw-------. 1 k202009 k20200 1.5G Oct 17 11:45 core.1346782
Actually, du can give me the same answer
$ du -h --apparent-size core.1346782
1.5G core.1346782
So what is the meaning of --apparent-size and how does it relate to the output of du without that option?
The apparent size is relevant to you when you read the file. If you open that core file in a program and start reading from it until you reach the end of the file, you will have read 1.5G of data. Much of it will be zeroes but still, this is the amount of data you will get.
However, only 128M count against the quota for that file because this is what it occupies on our disks. The reason for the difference is that the file contains large holes: your program reads zeroes there, but the holes themselves are not stored on disk.
Core files are a dump of a program’s memory when it crashed. A program often occupies several regions in memory which are not contiguous in address space. The absolute location in address space is important when you want to examine a core file with a debugger, hence the holes.
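You can reproduce the effect without a crashing program. The following is a minimal sketch, assuming GNU coreutils (truncate, du) and a file system that supports holes; the file name sparse.bin is made up for illustration. It creates a file that consists of one large hole and asks du for both sizes in bytes.

```shell
# Create a 100 MiB file that consists entirely of a hole.
# truncate only sets the file length; it does not write any data.
tmpdir=$(mktemp -d)
truncate -s 100M "$tmpdir/sparse.bin"

# Apparent size in bytes: what a program gets when it reads the whole file.
apparent=$(du --apparent-size --block-size=1 "$tmpdir/sparse.bin" | cut -f1)

# Occupied size in bytes: what is actually allocated on disk.
occupied=$(du --block-size=1 "$tmpdir/sparse.bin" | cut -f1)

echo "apparent: $apparent bytes"
echo "occupied: $occupied bytes"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

The apparent size is exactly 104857600 bytes. The occupied size depends on the file system but should be far smaller, often zero.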
By the way, you probably don’t want to keep core files unless you are actively debugging a program crash. Just delete the files or suppress them altogether with ulimit -c 0.
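As a small sketch of how that looks in practice: ulimit is a shell builtin, so the setting applies only to the current shell and the programs started from it. To make it permanent you would have to put it into your shell startup file or job script, which is an assumption about your setup and not something prescribed here.

```shell
# Show the current limit for core file size
# (in blocks, or "unlimited").
ulimit -c

# Disable core files for this shell and everything started from it.
ulimit -c 0

# Verify the new limit.
core_limit=$(ulimit -c)
echo "core file size limit: $core_limit"
```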
Blocks
In the previous case, --apparent-size is larger than what du without that option shows. But it can also be the other way around. Storage on our disks is organized in blocks and each block has a size of 4K. If your file size is not a multiple of 4K, then part of the last block won’t hold any data but du and our quota mechanism will still account for that entire block. This may not seem much but it can add up, especially if you have many small files.
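To get a feeling for how quickly this adds up, here is a back-of-the-envelope calculation in shell arithmetic. It is only a simplified model based on the 4K block size stated above; the number and size of the files are invented for illustration, and the real file system may account for metadata or small files differently.

```shell
# Assumption taken from the text: space is allocated in 4K blocks.
block_size=4096

# Hypothetical workload: 10000 files of 100 bytes each.
file_count=10000
file_size=100

# Every file is rounded up to a whole number of blocks.
blocks_per_file=$(( (file_size + block_size - 1) / block_size ))

apparent_total=$(( file_count * file_size ))
occupied_total=$(( file_count * blocks_per_file * block_size ))

echo "apparent: $apparent_total bytes"
echo "occupied: $occupied_total bytes"
```

In this model the files hold 1000000 bytes of data (about 1 MB) but occupy 40960000 bytes (about 41 MB) on disk, roughly 41 times as much.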
Conclusion
Both numbers, the apparent size and the size in blocks, are useful.
You want to keep an eye on the blocks your project occupies to avoid running into the quota limit. That’s why du, lfsquota.sh, and also luv report the space your project actually occupies on disk, and the quota tools additionally show the limits for that size.
Apparent size tells you how large your buffer needs to be if you read the entire file into memory. In the case of luv, the apparent size indicates which users occupy a large share of the project space and may have to start archiving data.