Viewing Issue Simple Details
ID | Category | Severity | Type | Date Submitted | Last Update
0001817 | [1003.1(2016/18)/Issue7+TC2] System Interfaces | Objection | Omission | 2024-02-25 05:34 | 2024-05-09 15:12
Reporter | kre | View Status | public
Assigned To |
Priority | normal | Resolution | Accepted As Marked
Status | Resolved
Name | Robert Elz
Organization |
User Reference |
Section | XSH 13 lseek
Page Number | 1292-3
Line Number | 43059, 43061-2 43070-2
Interp Status | ---
Final Accepted Text | Note: 0006777
Summary | 0001817: lseek(2) - "size of a file" undefined
Description |
This came up on NetBSD mailing lists, when someone complained that lseek(fd, 0, SEEK_END) where fd refers to a block device (like a disk, or a partition thereof) did not return the size of the device.

lseek() says (lines 43070-1):

    Upon successful completion, the resulting offset, as measured in bytes from the beginning of the file, shall be returned.

It also says (line 43059):

    If whence is SEEK_END, the file offset shall be set to the size of the file plus offset.

The one caveat is (lines 43061-2):

    The behavior of lseek( ) on devices which are incapable of seeking is implementation-defined. The value of the file offset associated with such a device is undefined.

but that's not relevant here, as block devices are capable of seeking.

Hence, to determine what the above lseek() call is required to do, we need to know what is meant by "the size of the file", so we can add offset (here, 0) to it, to get the required return value.

But, as best I can tell, there's no definition anywhere of what is meant by the "size" of a file. We'd think it is obvious, and in many cases it is, but it really needs to be defined, as it isn't always obvious.

I can find nothing at all in XBD 3 about file sizes, in XBD 3.164 ("File") (page 59) several attributes of files are listed, but "size" is not one of them (not that that is particularly meaningful, as the list given is clearly not intended to be exhaustive).

The closest that I found (and it isn't very close) is in XBD 13 <sys/stat.h> (pages 392-8) where the field st_size is defined (page 392, lines 13328-34):

    off_t st_size   For regular files, the file size in bytes.
                    [symlinks, shared & typed mem, omitted here]
                    For other file types, the use of this field is unspecified.

"other file types" includes block & char devices, fifos, ...
That tells us that a "regular file" (another term I cannot find defined anywhere, though it is used in many places) is apparently required to have a "file size", which is to be returned here as a value measured in bytes. But it doesn't say what that is. Most other types are not required to do that (which doesn't necessarily mean that they would not have a "file size", just that stat() is not required to return it, or, if it returns something in the st_size field, what that means is not specified).

It might seem obvious what a file size is meant to be, but is it really? Is it the amount of data that has been written to a file? The largest offset (plus 1 perhaps) at which any data was ever written to the file? The maximum amount of data that can be written to a file?

For "regular files", which most of us probably understand the meaning of, the middle one of those three is generally accepted. So is that what we're supposed to return as the file size (from lseek(SEEK_END)) as above, for a block device? That is, if I install a new disk, open it, lseek() to some random position, and write a byte, is the size of the disk defined to be that random position, plus 1?

If so, that wouldn't have satisfied the complainer, as what they really wanted to discover was the capacity of the drive (how many bytes could be written to it, without overwriting any, before it becomes full). So perhaps that's what "file size" means - in which case what I have seen implemented for the "file size" of a regular file, in all implementations I think, isn't correct.

Consider another case: what is the file size of /dev/null? It is (or should be) possible to seek on /dev/null (doesn't achieve anything worthwhile, but there's no reason for it to fail), so it could count as capable of seeking - which would require lseek(), using SEEK_END for whence, to determine the size of /dev/null and return that (as modified by the offset).
This is filed against POSIX-2018 (Issue 7 TC2) as nothing relevant (and important) has changed in Issue 8 that I can see (lseek() has gained SEEK_HOLE and SEEK_DATA as additional "whence" options, and SEEK_HOLE also refers to the size of the file - so that might also need some additional work as part of fixing this). Once Issue 8 is published, this report should be updated and moved there. |
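
For the regular-file case discussed above (the "largest offset written, plus 1" reading), the behaviour can be observed with a small C sketch. The helper name probe_size_after_write and the /tmp scratch path are hypothetical, chosen only for illustration; the sketch says nothing about block devices, which is exactly where the report finds the text unclear.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a scratch regular file, write a single
 * byte at byte offset `pos`, then record both the offset returned by
 * lseek(fd, 0, SEEK_END) and the st_size reported by fstat().
 * Returns 0 on success, -1 on any error.
 */
int probe_size_after_write(off_t pos, off_t *end_off, off_t *st_size)
{
    char path[] = "/tmp/lseek-size-XXXXXX";   /* assumed scratch location */
    struct stat sb;
    int rc = -1;

    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                 /* the file lives only while fd is open */

    if (lseek(fd, pos, SEEK_SET) == pos &&
        write(fd, "x", 1) == 1 &&
        (*end_off = lseek(fd, 0, SEEK_END)) != (off_t)-1 &&
        fstat(fd, &sb) == 0) {
        *st_size = sb.st_size;
        rc = 0;
    }

    close(fd);
    return rc;
}
```

Calling probe_size_after_write(4095, &end, &size) on a typical implementation is expected to leave both end and size at 4096, that is, the highest offset written plus one, regardless of how few bytes were actually stored.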
Desired Action |
I believe there is a need for a whole new chapter in XBD about files, and their attributes, with definitions for what they mean for the various different file types - along with a specification (or at least, some kind of reasonable clue) what the various file types represent. That is, perhaps this issue should be filed as an omission against XBD, rather than lseek() - but someone else can decide that and adjust if appropriate. As an interim measure, perhaps lseek()'s return value could be defined only for regular files, rather than only for seekable files. At least in the case of SEEK_END which is the one which requires the file size to work. If not that, then I really have no idea, but as specified at the minute this is unimplementable in a portable way. |
Tags | tc1-2024
Attached Files |
Notes | |
(0006745) geoffclare (manager) 2024-04-11 11:12 |
One reason why some information about file size may seem to be missing is because the synonym "length" is sometimes used instead. For example, on the write() page there is:

    On a regular file, if the position of the last byte written is greater than or equal to the length of the file, the length of the file shall be set to this position plus one. |
(0006746) kre (reporter) 2024-04-11 12:53 |
Re Note: 0006745

That's fine, but I cannot find a definition for what the "length" of a file is either. When a definition for "size of a file" is added, a definition for "length of a file" should probably be added as well; if they are strictly identical, one can simply refer to the other.

E.g.: what is the size of a terminal device, or a tape, or a disc? A FIFO? And for regular files, is the size of the file the number of bytes written, the greatest offset (+1) of any byte written, ...? Is it required to be calculated in bytes, or characters, or blocks?

As I said in the description, what we typically mean for this (at least for regular files) is obvious (I hope) - but the standard cannot (or should not) rely upon "we all know what we mean, you should too".

And this still leaves open the question of what the size of a disc is, which is where this question arose originally. Is it (like a file) the maximum byte offset (+1) of any byte written to it? While I think most people would agree (approximately, use of [f]truncate() can mangle things) with this for regular files, I suspect that almost no-one measures the size of a disc that way.

When considering this, also keep in mind non-writable disc-like devices (optical drives containing non-writable media, for example) - is the size of such a disc the maximum capacity it can hold (which is probably what we'd expect for a regular magnetic or SSD type disc) or is it the amount that was recorded upon it (which can never be altered)? Optical media are seekable devices, so lseek(optical_fd, 0, SEEK_END) is supposed to work, but where should it go?

If someone were to ask what is the size of the mouse I am using, would an appropriate answer be (approx) 12x6x4 (cm)? Is the actual size of a disc file 3.5 inches? (Or perhaps 2.5 if it is a small one.) Without a definition, how can anyone say that any of those final (absurd) examples is wrong? |
(0006775) eblake (manager) 2024-05-02 16:00 |
https://github.com/util-linux/util-linux/blob/master/lib/blkdev.c#L92 is an example of an #ifdef chain of several ioctl attempts that tries to get the block device size across several systems, given the fact that neither st_size nor lseek(SEEK_END) is universally reliable for the purpose. |
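
A much reduced sketch of that approach, in C, might look like the following. It is not the util-linux code: the helper name fd_size_bytes is hypothetical, only the Linux BLKGETSIZE64 ioctl is tried, and other systems fall through to lseek(SEEK_END), which (as this issue notes) may or may not report a device's capacity.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/ioctl.h>
#include <linux/fs.h>             /* BLKGETSIZE64 (Linux-specific) */
#endif

/*
 * Hypothetical helper: best-effort size, in bytes, of whatever fd
 * refers to. Returns 0 and stores the size on success, -1 if no
 * method worked.
 */
int fd_size_bytes(int fd, uint64_t *size)
{
    struct stat sb;

    if (fstat(fd, &sb) == -1)
        return -1;

    /* Regular files: st_size is specified to be the size in bytes. */
    if (S_ISREG(sb.st_mode)) {
        *size = (uint64_t)sb.st_size;
        return 0;
    }

#if defined(__linux__) && defined(BLKGETSIZE64)
    /* Block devices on Linux: ask the driver for the capacity. */
    if (S_ISBLK(sb.st_mode)) {
        uint64_t bytes;
        if (ioctl(fd, BLKGETSIZE64, &bytes) == 0) {
            *size = bytes;
            return 0;
        }
    }
#endif

    /* Last resort: seek to the end, then restore the old offset.
     * The meaning of the result is not portable for device files. */
    off_t cur = lseek(fd, 0, SEEK_CUR);
    if (cur == (off_t)-1)
        return -1;
    off_t end = lseek(fd, 0, SEEK_END);
    if (end == (off_t)-1)
        return -1;
    if (lseek(fd, cur, SEEK_SET) == (off_t)-1)
        return -1;
    *size = (uint64_t)end;
    return 0;
}
```

A real implementation would add one branch per supported system, which is what makes the #ifdef chain in blkdev.c as long as it is.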
(0006777) geoffclare (manager) 2024-05-09 15:10 |
On page 965 line 32823 section fstatat(), change:

    The value of the st_size member shall be set to the length of the pathname contained in the symbolic link

to:

    The value of the st_size member shall be set to the length of the contents of the symbolic link

On page 1292 line 43059 section lseek(), change:

    the file offset shall be set to the size of the file plus offset.

to:

    the file offset shall be set to the size of the file (as would be returned in st_size by the fstat() function) plus offset; except that for block special files, it is unspecified whether the offset is relative to the start of the file or to the corresponding device's capacity in bytes.

On page 1293 line 43104 section lseek(), add fstat() to SEE ALSO. |
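
The fstatat() wording about symbolic links can be illustrated with a short C sketch. The helper name symlink_st_size and the /tmp scratch directory are assumptions made for the example; the link contents need not name an existing file.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a symbolic link whose contents are
 * `target` inside a scratch directory, and return the st_size that
 * lstat() reports for the link itself, or -1 on error.
 */
off_t symlink_st_size(const char *target)
{
    char dir[] = "/tmp/symlink-size-XXXXXX";  /* assumed scratch location */
    char link_path[64];
    struct stat sb;
    off_t result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(link_path, sizeof link_path, "%s/link", dir);

    if (symlink(target, link_path) == 0) {
        /* lstat() examines the link, not whatever it points to. */
        if (lstat(link_path, &sb) == 0)
            result = sb.st_size;
        unlink(link_path);
    }
    rmdir(dir);
    return result;
}
```

Under the accepted text, symlink_st_size("abc") should yield 3, the length of the link's contents, whether or not "abc" resolves to anything.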
Issue History
Date Modified | Username | Field | Change |
2024-02-25 05:34 | kre | New Issue | |
2024-02-25 05:34 | kre | Name | => Robert Elz |
2024-02-25 05:34 | kre | Section | => XSH 13 lseek |
2024-02-25 05:34 | kre | Page Number | => 1292-3 |
2024-02-25 05:34 | kre | Line Number | => 43059, 43061-2 43070-2 |
2024-04-11 11:12 | geoffclare | Note Added: 0006745 | |
2024-04-11 12:53 | kre | Note Added: 0006746 | |
2024-05-02 16:00 | eblake | Note Added: 0006775 | |
2024-05-09 15:10 | geoffclare | Note Added: 0006777 | |
2024-05-09 15:12 | geoffclare | Interp Status | => --- |
2024-05-09 15:12 | geoffclare | Final Accepted Text | => Note: 0006777 |
2024-05-09 15:12 | geoffclare | Status | New => Resolved |
2024-05-09 15:12 | geoffclare | Resolution | Open => Accepted As Marked |
2024-05-09 15:12 | geoffclare | Tag Attached: tc1-2024 |