How big are my files?

Our server luv shows how much storage space your project is using and how much of it each project member contributes. For technical reasons, we can only show the apparent size (see below) of a user’s files. For the project as a whole, however, we show the disk space that is actually occupied. We are not happy about this inconsistency, but for now we have to live with it.

Here we try to explain the difference between the two ways to measure the size of files.

du and --apparent-size

If you want to know how much space you are using on disk, you may have come across du. It works for individual files but also for entire folder structures and the files in them. An example would be

$ du -h core.1346782
128M core.1346782

The -h option tells du to show the result in human-readable form. Many tools know that option. Okay, so this core file occupies 128M on disk. However, if I look at it with ls, I get

$ ls -lh core.1346782
-rw-------. 1 k202009 k20200 1.5G Oct 17 11:45 core.1346782

Actually, du can give me the same answer

$ du -h --apparent-size core.1346782
1.5G core.1346782

So what is the meaning of --apparent-size and how does it relate to the output of du without that option?

The apparent size is relevant to you when you read the file. If you open that core file in a program and start reading from it until you reach the end of the file, you will have read 1.5G of data. Much of it will be zeroes but still, this is the amount of data you will get.

However, we only account for 128M for that file because this is what it occupies on our disks. The reason for the difference is that the file contains large holes: your program reads zeroes there, but the holes are not stored on disk. Such a file is called a sparse file.
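If you want to see the effect described above without waiting for a program to crash, you can create a file with a hole yourself. The following is only a sketch: it assumes GNU coreutils (truncate, stat -c, du --apparent-size) and a file system that supports holes, and the names demo_dir and sparse_file are made up for this example.

```shell
# Create a 100 MiB file that consists of one big hole.
demo_dir=$(mktemp -d)
sparse_file="$demo_dir/sparse.bin"
truncate -s 100M "$sparse_file"

# Apparent size in bytes: what a program reading the file would get.
apparent_bytes=$(stat -c %s "$sparse_file")

# Allocated size in bytes: number of allocated units times the unit size.
units=$(stat -c %b "$sparse_file")
unit_size=$(stat -c %B "$sparse_file")
allocated_bytes=$(( units * unit_size ))

echo "apparent:  $apparent_bytes bytes"
echo "allocated: $allocated_bytes bytes"

# The same comparison with du, as in the core file example.
du -h --apparent-size "$sparse_file"
du -h "$sparse_file"

# Clean up the demo directory again.
rm -r "$demo_dir"
```

On most file systems the allocated size will be zero or very close to it, while the apparent size is the full 100 MiB.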

Core files are a dump of a program’s memory when it crashed. A program often occupies several regions in memory which are not contiguous in address space. The absolute location in address space is important when you want to examine a core file with a debugger, hence the holes.

By the way, you probably don’t want to keep core files unless you are actively debugging a program crash. Just delete the files or suppress them altogether with ulimit -c 0.
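A minimal sketch of how to check and change the core file limit follows. It only affects the current shell and the programs started from it, so where exactly you put the line (for example in a job script) is up to you and is an assumption on our side.

```shell
# Show the current soft limit for core files ("unlimited" or a number).
ulimit -c

# Suppress core files for this shell and everything started from it.
ulimit -c 0

# Check the result: this now prints 0.
ulimit -c
```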

Blocks

In the previous case, the apparent size is larger than what du shows without that option. But it can also be the other way around. Storage on our disks is organized in blocks and each block has a size of 4K. If your file size is not a multiple of 4K, then part of the last block won’t hold any data but du and our quota mechanism will still account for that entire block. This may not seem like much but it can add up, especially if you have many small files.
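The rounding is easy to work out by hand. The sketch below does the arithmetic for a few file sizes, assuming the 4K block size mentioned above and ignoring any special treatment a file system may give to very small files. The function name allocated_for is made up for this example.

```shell
# Block size in bytes (4K, as stated above).
block_size=4096

# Round a file size in bytes up to a whole number of blocks.
allocated_for() {
    echo $(( ($1 + block_size - 1) / block_size * block_size ))
}

allocated_for 1        # 4096
allocated_for 4096     # 4096
allocated_for 4097     # 8192

# 100000 files of 100 bytes each: apparent total vs. accounted space.
n=100000
apparent_total=$(( n * 100 ))
allocated_total=$(( n * $(allocated_for 100) ))
echo "apparent:  $apparent_total bytes"    # 10000000
echo "allocated: $allocated_total bytes"   # 409600000
```

In this example, about 10 MB of data end up occupying roughly 410 MB on disk.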

Conclusion

Both numbers, the apparent size and the space occupied in blocks on disk, are useful.

You want to keep an eye on the blocks your project occupies to avoid running into the quota limit. That’s why du, lfsquota.sh, and also luv show you the disk space your project occupies in blocks and what the limits for it are.

Apparent size tells you how large your buffer needs to be if you read the entire file into memory. In the case of luv, the apparent size indicates which users occupy a lot of project space and may have to start archiving data.
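If you want a rough per-user number for a directory yourself, you can sum up the apparent sizes of one user's files. This is only a sketch and not how luv computes its numbers. It assumes GNU find (for -printf) and GNU truncate, and the function name apparent_size_of_user is made up for this example.

```shell
# Sum the apparent size in bytes of all regular files below directory $1
# that belong to the numeric user ID $2.
apparent_size_of_user() {
    find "$1" -type f -uid "$2" -printf '%s\n' |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Small demonstration in a temporary directory.
demo_dir=$(mktemp -d)
printf 'abc' > "$demo_dir/small.txt"     # 3 bytes
truncate -s 1M "$demo_dir/sparse.bin"    # 1 MiB apparent size, mostly a hole

user_total=$(apparent_size_of_user "$demo_dir" "$(id -u)")
echo "$user_total"                       # 1048579

rm -r "$demo_dir"
```

Note that the sparse file counts with its full apparent size here, which is exactly the difference from the block-based numbers explained above.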