Austin Group Defect Tracker

ID 0001817
Category [1003.1(2016/18)/Issue7+TC2] System Interfaces
Severity Objection
Type Omission
Date Submitted 2024-02-25 05:34
Last Update 2024-05-09 15:12
Reporter kre
View Status public
Assigned To
Priority normal
Resolution Accepted As Marked
Status Resolved
Name Robert Elz
Organization
User Reference
Section XSH 13 lseek
Page Number 1292-3
Line Number 43059, 43061-2 43070-2
Interp Status ---
Final Accepted Text Note: 0006777
Summary 0001817: lseek(2) - "size of a file" undefined
Description This came up on NetBSD mailing lists, when someone complained that

        lseek(fd, 0, SEEK_END)

where fd refers to a block device (like a disk, or a partition thereof)
did not return the size of the device.

lseek() says (lines 43070-1)

        Upon successful completion, the resulting offset, as measured in bytes
        from the beginning of the file, shall be returned.

It also says (line 43059):

        If whence is SEEK_END, the file offset shall be set to the size
        of the file plus offset.

The one caveat is (lines 43061-2)

        The behavior of lseek( ) on devices which are incapable of seeking
        is implementation-defined. The value of the file offset associated
        with such a device is undefined.

but that's not relevant here, as block devices are capable of seeking.

Hence, to determine what the above lseek() call is required to do,
we need to know what is meant by "the size of the file", so we can
add offset (here, 0) to it, to get the required return value.
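
As a minimal sketch of the call under discussion (the helper name
seek_end is hypothetical, not something from the standard or from the
NetBSD thread), the following opens a path read-only and returns
whatever lseek(fd, 0, SEEK_END) yields on the running system. For a
block device the result is exactly the value whose meaning is in
question, so no particular output is assumed.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the offset reported by
 * lseek(fd, 0, SEEK_END) for the file at 'path', or (off_t)-1 if the
 * file cannot be opened or the seek fails. */
off_t seek_end(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return (off_t)-1;

    /* "the file offset shall be set to the size of the file plus offset" */
    off_t end = lseek(fd, 0, SEEK_END);

    close(fd);
    return end;
}
```

Calling seek_end() on a regular file and then on a disk device node is
one way to see how a given implementation interprets "size" in each case.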

But, as best I can tell, there's no definition anywhere of what is
meant by the "size" of a file. We'd think it is obvious, and in
many cases it is, but it really needs to be defined, as it isn't
always obvious.
 
I can find nothing at all in XBD 3 about file sizes. In XBD 3.164 ("File")
(page 59) several attributes of files are listed, but "size" is not one of
them (not that that is particularly meaningful, as the list given is clearly
not intended to be exhaustive).
 
The closest that I found (and it isn't very close) is in XBD 13 <sys/stat.h>
(pages 392-8) where the field st_size is defined (page 392, lines 13328-34)
 
        off_t st_size For regular files, the file size in bytes.
                        [symlinks, shared & typed mem, omitted here]
                        For other file types, the use of this field is
                        unspecified.
 
"other file types" includes block & char devices, fifos, ...
 
That tells us that a "regular file" (another term I cannot find
defined anywhere, though it is used in many places) is apparently
required to have a "file size", which is to be returned here as a
value measured in bytes. But it doesn't say what that is. Most other
types are not required to do that (which doesn't necessarily mean that
they would not have a "file size", just that stat() is not required to
return it, or, if it returns something in the st_size field, what that
means is not specified).

 
It might seem obvious what a file size is meant to be, but is it
really? Is it the amount of data that has been written to a file?
The largest offset (plus 1 perhaps) of where any data was ever
written to the file? The maximum amount of data that can be written
to a file?

For "regular files" which most of us probably understand the meaning
of, the middle one of those three is generally accepted. So is that
what we're supposed to return as the file size (from lseek(SEEK_END))
as above, for a block device? That is, if I install a new disk, open
it, lseek() to some random position, and write a byte, is the size of
the disk defined to be that random position, plus 1 ? If so, that
wouldn't have satisfied the complainer, as what they really wanted
to discover was the capacity of the drive (how many bytes could be
written to it, without overwriting any, before it becomes full).
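
To make the "middle" interpretation concrete for regular files, here is
a small sketch that writes one byte at a chosen offset in a fresh, empty
file and reports st_size afterwards. The helper name and the temporary
file template are illustrative assumptions. On typical implementations
the result is the offset plus one, which is the behavior a block device
would have to mimic if "size" meant the same thing there.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an empty temporary regular file, write a
 * single byte at offset 'pos', and return the st_size that fstat()
 * reports afterwards. Returns (off_t)-1 on any failure. */
off_t size_after_write_at(off_t pos)
{
    assert(pos >= 0);

    char path[] = "/tmp/lseek-size-XXXXXX";  /* assumed writable location */
    int fd = mkstemp(path);
    if (fd == -1)
        return (off_t)-1;
    unlink(path);  /* the file disappears once fd is closed */

    off_t result = (off_t)-1;
    struct stat st;
    if (lseek(fd, pos, SEEK_SET) == pos &&
        write(fd, "x", 1) == 1 &&
        fstat(fd, &st) == 0)
        result = st.st_size;

    close(fd);
    return result;
}
```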

So perhaps that's what "file size" means - in which case what I have
seen implemented for the "file size" for a regular file in all
implementations I think, isn't correct.

Consider another case, what is the file size of /dev/null ?
It is (or should be) possible to seek on /dev/null (doesn't
achieve anything worthwhile, but there's no reason for it to
fail) so it could count as capable of seeking - which would
require lseek() using SEEK_END for whence, to determine the
size of /dev/null and return that (as modified by the offset).

This is filed against POSIX-2018 (Issue 7 TC2) as nothing relevant (and
important) has changed in Issue 8 that I can see (lseek() has gained SEEK_HOLE
and SEEK_DATA as additional "whence" options, and SEEK_HOLE also
refers to the size of the file - so that might also need some
additional work as part of fixing this).

Once Issue 8 is published, this report should be updated and moved there.
Desired Action I believe there is a need for a whole new chapter in XBD about
files, and their attributes, with definitions for what they mean
for the various different file types - along with a specification
(or at least, some kind of reasonable clue) what the various file
types represent.

That is, perhaps this issue should be filed as an omission against
XBD, rather than lseek() - but someone else can decide that and adjust
if appropriate.

As an interim measure, perhaps lseek()'s return value could be
defined only for regular files, rather than only for seekable files.
At least in the case of SEEK_END which is the one which requires
the file size to work.

If not that, then I really have no idea, but as specified at the minute
this is unimplementable in a portable way.
Tags tc1-2024
Attached Files

- Relationships

-  Notes
(0006745)
geoffclare (manager)
2024-04-11 11:12

One reason why some information about file size may seem to be missing is because the synonym "length" is sometimes used instead. For example, on the write() page there is:
On a regular file, if the position of the last byte written is greater than or equal to the length of the file, the length of the file shall be set to this position plus one.
(0006746)
kre (reporter)
2024-04-11 12:53

Re Note: 0006745

That's fine, but I cannot find a definition for what the "length" of a
file is either. When a definition for "size of a file" is added, a
definition for "length of a file" should probably be added as well, if
they are strictly identical, one can simply refer to the other.

Eg: what is the size of a terminal device, or a tape, or a disc ? A FIFO?
And for regular files, is the size of the file the number of bytes
written, the greatest offset (+1) of any byte written, ... Is it
required to be calculated in bytes, or characters, or blocks ??

As I said in the description, what we typically mean for this (at least
for regular files) is obvious (I hope) - but the standard cannot (or
should not) rely upon "we all know what we mean, you should too".

And this still begs the question, of what is the size of a disc, which
is where this question arose originally? Is it (like a file) the maximum
byte offset (+1) of any byte written to it? While I think most people
would agree (approximately, use of [f]truncate() can mangle things)
with this for regular files, I suspect that almost no-one measures the
size of a disc that way.

When considering this, also keep in mind non-writable disc-like devices
(optical media that cannot be rewritten, for example) - is the size
of such a disc the maximum capacity it can hold (which is probably what
we'd expect for a regular magnetic or ssd type disc) or is it the amount
that was recorded upon it (which can never be altered) ? Optical media
are seekable devices, so lseek(optical_fd, 0, SEEK_END) is supposed to work,
but where should it go?

If some-one were to ask what is the size of the mouse I am using, would an
appropriate answer be (approx) 12x6x4 (cm) ?

Is the actual size of a disc file 3.5 inches? (Or perhaps 2.5 if it is a
small one.)

Without a definition, how can anyone say that any of those final (absurd)
examples is wrong?
(0006775)
eblake (manager)
2024-05-02 16:00

https://github.com/util-linux/util-linux/blob/master/lib/blkdev.c#L92 is an example of an #ifdef chain of ioctl attempts that tries to get the block device size across several systems, given that neither st_size nor lseek(SEEK_END) is universally reliable for the purpose.
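
A much reduced sketch of that kind of #ifdef chain follows. It is not the util-linux code: the function name blockdev_capacity is hypothetical, only two platforms are covered (BLKGETSIZE64 on Linux and DIOCGMEDIASIZE on FreeBSD), and anything else simply fails with ENOTSUP.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#if defined(__linux__)
#include <linux/fs.h>   /* BLKGETSIZE64 */
#elif defined(__FreeBSD__)
#include <sys/disk.h>   /* DIOCGMEDIASIZE */
#endif

/* Hypothetical helper: capacity in bytes of the block device at 'path',
 * or -1 with errno set if it is not a block device or no known ioctl
 * is available on this platform. */
int64_t blockdev_capacity(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    int64_t result = -1;
    struct stat st;
    if (fstat(fd, &st) == 0) {
        if (!S_ISBLK(st.st_mode)) {
            errno = ENOTBLK;            /* only block special files apply */
        } else {
#if defined(BLKGETSIZE64)
            uint64_t bytes = 0;
            if (ioctl(fd, BLKGETSIZE64, &bytes) == 0)
                result = (int64_t)bytes;
#elif defined(DIOCGMEDIASIZE)
            off_t bytes = 0;
            if (ioctl(fd, DIOCGMEDIASIZE, &bytes) == 0)
                result = (int64_t)bytes;
#else
            errno = ENOTSUP;            /* no ioctl known to this sketch */
#endif
        }
    }

    int saved = errno;                  /* keep errno across close() */
    close(fd);
    errno = saved;
    return result;
}
```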
(0006777)
geoffclare (manager)
2024-05-09 15:10

On page 965 line 32823 section fstatat(), change:
The value of the st_size member shall be set to the length of the pathname contained in the symbolic link

to:
The value of the st_size member shall be set to the length of the contents of the symbolic link


On page 1292 line 43059 section lseek(), change:
the file offset shall be set to the size of the file plus offset.

to:
the file offset shall be set to the size of the file (as would be returned in st_size by the fstat() function) plus offset; except that for block special files, it is unspecified whether the offset is relative to the start of the file or to the corresponding device's capacity in bytes.


On page 1293 line 43104 section lseek(), add fstat() to SEE ALSO.
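
As an illustration of the revised lseek() wording, the sketch below compares the offset from lseek(fd, 0, SEEK_END) with the st_size reported by fstat() for the same descriptor. Both function names are hypothetical. The comparison is only meaningful where the new text makes a requirement: for block special files the wording leaves the relationship unspecified, so a mismatch there is not an error.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical check: 1 if lseek(fd, 0, SEEK_END) equals st_size,
 * 0 if it differs, -1 on error. The original file offset is restored. */
int seek_end_matches_st_size(int fd)
{
    assert(fd >= 0);

    struct stat st;
    if (fstat(fd, &st) == -1)
        return -1;

    off_t saved = lseek(fd, 0, SEEK_CUR);
    if (saved == (off_t)-1)
        return -1;

    off_t end = lseek(fd, 0, SEEK_END);
    if (end == (off_t)-1)
        return -1;

    if (lseek(fd, saved, SEEK_SET) == (off_t)-1)
        return -1;

    return end == st.st_size;
}

/* Usage example on a temporary regular file holding five bytes. */
int demo_regular_file(void)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;

    int result = -1;
    if (fputs("hello", f) != EOF && fflush(f) == 0)
        result = seek_end_matches_st_size(fileno(f));

    fclose(f);
    return result;
}
```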

- Issue History
Date Modified Username Field Change
2024-02-25 05:34 kre New Issue
2024-02-25 05:34 kre Name => Robert Elz
2024-02-25 05:34 kre Section => XSH 13 lseek
2024-02-25 05:34 kre Page Number => 1292-3
2024-02-25 05:34 kre Line Number => 43059, 43061-2 43070-2
2024-04-11 11:12 geoffclare Note Added: 0006745
2024-04-11 12:53 kre Note Added: 0006746
2024-05-02 16:00 eblake Note Added: 0006775
2024-05-09 15:10 geoffclare Note Added: 0006777
2024-05-09 15:12 geoffclare Interp Status => ---
2024-05-09 15:12 geoffclare Final Accepted Text => Note: 0006777
2024-05-09 15:12 geoffclare Status New => Resolved
2024-05-09 15:12 geoffclare Resolution Open => Accepted As Marked
2024-05-09 15:12 geoffclare Tag Attached: tc1-2024

