ID | Category | Severity | Type | Date Submitted | Last Update
0000415 | [1003.1(2008)/Issue 7] System Interfaces | Objection | Enhancement Request | 2011-04-22 13:04 | 2024-06-11 08:53
Reporter | eblake
View Status | public
Assigned To | ajosey
Priority | normal
Resolution | Accepted As Marked
Status | Closed
Name | Eric Blake
Organization | Red Hat
User Reference | ebb.lseek
Section | lseek
Page Number | 1265
Line Number | 41627
Interp Status | ---
Final Accepted Text | Note: 0000862
Summary | 0000415: add SEEK_HOLE, SEEK_DATA to lseek
Description |
Solaris introduced a very useful idiom for rapidly traversing through sparse files, and Linux is now trying to copy it. The only way to guarantee that this idiom will be identically implemented everywhere is to standardize it in Issue 8.

Note that this proposal offers a way to trivially implement this extension on all file systems by treating every seekable file as all data (no holes), while leaving it as a quality of implementation issue to provide the optimizations possible when holes can actually be detected.

The Solaris man page, as of this writing, incorrectly implies that SEEK_HOLE does not reposition the file pointer; for more details, see
http://thread.gmane.org/gmane.linux.kernel/1129696/focus=1129906
Therefore, this proposal is a bit more explicit.

This proposal only modifies <unistd.h> and lseek; we may also want to consider modifying <stdio.h> and fseek() with <CX> shading, although seeking for holes on a stream is not documented for Solaris fseek and may have unintended ramifications with stream buffering.

The standard already has a non-normative use of hole (XCU ls APPLICATION USAGE, line 93984) and of sparse file (XCU du RATIONALE line 84253); this would be the first time it is made normative, but still optional.

This proposal does not add fpathconf(fd, _PC_MIN_HOLE_SIZE).
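As an illustration only (not part of the report), the traversal idiom described above can be sketched as a loop that alternates SEEK_DATA and SEEK_HOLE. The helper name count_data_bytes is hypothetical, and the _GNU_SOURCE define is an assumption about glibc-based systems, where it is needed to expose the two constants. The loop relies on the proposed rule that SEEK_DATA fails with ENXIO once no data remains at or after the given offset.

```c
#define _GNU_SOURCE /* assumption: glibc exposes SEEK_DATA/SEEK_HOLE only with this */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the number of bytes of the file at `path`
 * that lie outside holes, or -1 on error.
 */
off_t count_data_bytes(const char *path)
{
    off_t total = 0;
    off_t pos = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    for (;;) {
        /* Find the next byte that is not within a hole. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        off_t hole;

        if (data < 0) {
            /* ENXIO means no data at or after pos: traversal is done. */
            if (errno != ENXIO)
                total = -1;
            break;
        }

        /* Find the end of this data region (possibly the file size). */
        hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0) {
            total = -1;
            break;
        }

        /* A data byte is never inside a hole, so the region is non-empty. */
        assert(hole > data);
        total += hole - data;
        pos = hole;
    }

    close(fd);
    return total;
}
```

On a file system that cannot detect holes, the same loop simply sees one data region covering the whole file, which is the trivial implementation the proposal allows.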
Desired Action |
After line 1866 [XCU 3.191 Hard Link], add a new section and renumber accordingly:

    3.192 Hole

    A hole is a contiguous region of bytes within a file, all having the value of zero. Not all bytes with the value zero need belong to a hole; however, all seekable files shall have a virtual hole starting at the current size of the file. A hole is typically created via truncate( ), or if an lseek( ) call has been made to position beyond the end of a file and data subsequently written at that point, although it is up to the implementation to define when sparse files can be created and with what granularity for the size of holes.

After line 2492 [XCU 3.352 Space Character], add a new section and renumber accordingly:

    3.353 Sparse File

    A sparse file is one that contains more holes than just the virtual hole at the end of the file.

At line 14869 [XBD <unistd.h> DESCRIPTION], after:

    The <unistd.h> header shall define SEEK_CUR, SEEK_END, and SEEK_SET as described in <stdio.h>.

add:

    Additionally, it shall define the following macros which shall expand to integer constant expressions with distinct values:

    SEEK_HOLE  Seek relative to start-of-file for a position within a hole.
    SEEK_DATA  Seek relative to start-of-file for a position not within a hole.

After line 41627 [XSH lseek DESCRIPTION], add two more bullets:

    If whence is SEEK_HOLE, the file offset shall be set to the smallest location of a byte within a hole and not less than offset, except that if offset falls within the last hole, then the file offset may be set to the file size instead. It shall be an error if offset is greater than or equal to the size of the file.

    If whence is SEEK_DATA, the file offset shall be set to the smallest location of a byte not within a hole and not less than offset. It shall be an error if offset is greater than or equal to the size of the file, or if offset falls within the last hole.

At line 41628 [XSH lseek DESCRIPTION], change:

    The symbolic constants SEEK_SET, SEEK_CUR, and SEEK_END are defined in <unistd.h>.

to:

    The symbolic constants SEEK_SET, SEEK_CUR, SEEK_END, SEEK_HOLE, and SEEK_DATA are defined in <unistd.h>.

    A hole is a contiguous region of bytes within a file, all having the value of zero. Not all bytes with the value zero need belong to a hole; however, all seekable files shall have a virtual hole starting at the current size of the file, whether or not the file is sparse.

After line 41645 [XSH lseek ERRORS], add:

    [ENXIO]  The whence argument is SEEK_HOLE or SEEK_DATA, and offset is greater than or equal to the file size; or the whence argument is SEEK_DATA and the offset falls within the final hole of the file.

After line 41668 [XSH lseek RATIONALE], add:

    Not all filesystems support holes, and even where sparse files are supported, not all contiguous blocks of zero bytes are required to be recognized as a hole. However, since all files are required to have a virtual hole starting at the current file size, application writers can use SEEK_HOLE and SEEK_DATA to optimize algorithms that can run faster when it is known that a block of bytes is all zeros, because a non-sparse file will correctly report the entire file as a single non-hole. A trivial recursive implementation for these two constants would be as follows; however, for filesystems that support sparse files, implementations are encouraged to do better.

        off_t lseek(int fildes, off_t offset, int whence)
        {
            off_t cur, end;
            switch (whence)
            {
            case SEEK_HOLE:
            case SEEK_DATA:
                cur = lseek(fildes, 0, SEEK_CUR);
                if (cur < 0)
                    return cur;
                end = lseek(fildes, 0, SEEK_END);
                if (end < 0)
                    return end;
                if (offset < end)
                    return whence == SEEK_HOLE ? end
                        : lseek(fildes, offset, SEEK_SET);
                lseek(fildes, cur, SEEK_SET);
                errno = ENXIO;
                return -1;
            default:
                ... /* Existing implementation */
            }
        }
Tags | issue8
(0000757) eblake (manager) 2011-04-25 15:48 |
The term "last hole" is problematic for files where data occurs after the last hole, when compared to a file where a hole represents the last portion of the file. Nick Bowler suggested some improved wording:

I think my confusion can be avoided by talking about the last non-hole data byte in the file (which is unambiguous), instead of by talking about the last hole. For instance, the SEEK_HOLE/SEEK_DATA descriptions could be written as follows:

    If whence is SEEK_HOLE, the file offset shall be set to the smallest location of a byte within a hole and not less than offset, except that if offset falls beyond the last byte not within a hole, then the file offset may be set to the file size instead. It shall be an error if offset is greater than or equal to the size of the file.

    If whence is SEEK_DATA, the file offset shall be set to the smallest location of a byte not within a hole and not less than offset. It shall be an error if no such byte exists.

plus a corresponding update to the ENXIO description:

    ... or the whence argument is SEEK_DATA and the offset falls beyond the last byte not within a hole.
(0000803) msbrown (manager) 2011-06-09 15:14 |
Targeting for next revision. |
(0000861) nick (manager) 2011-06-16 16:19 |
The style for the definitions should be updated to match the ISO guidelines:

    3.192 Hole

    A contiguous region of bytes within a file, all having the value of zero. Not all bytes with the value zero need belong to a hole; however, all seekable files shall have a virtual hole starting at the current size of the file. A hole is typically created via truncate( ), or if an lseek( ) call has been made to position beyond the end of a file and data subsequently written at that point, although it is up to the implementation to define when sparse files can be created and with what granularity for the size of holes.

After line 2492 [XCU 3.352 Space Character], add a new section and renumber accordingly:

    3.353 Sparse File

    A file that contains more holes than just the virtual hole at the end of the file.
(0000862) geoffclare (manager) 2011-06-16 16:25 edited on: 2020-03-02 16:31 |
After line 1866 [XCU 3.191 Hard Link], add a new section and renumber accordingly:

    3.192 Hole

    A contiguous region of bytes within a file, all having the value of zero. Not all bytes with the value zero need belong to a hole; however, all seekable files shall have a virtual hole starting at the current size of the file. A hole is typically created via truncate( ), or if an lseek( ) call has been made to position beyond the end of a file and data subsequently written at that point, although it is up to the implementation to define when sparse files can be created and with what granularity for the size of holes.

After line 2492 [XCU 3.352 Space Character], add a new section and renumber accordingly:

    3.353 Sparse File

    A file that contains more holes than just the virtual hole at the end of the file.

At line 14869 [XBD <unistd.h> DESCRIPTION], after:

    The <unistd.h> header shall define SEEK_CUR, SEEK_END, and SEEK_SET as described in <stdio.h>.

add:

    Additionally, it shall define the following macros which shall expand to integer constant expressions with distinct values:

    SEEK_HOLE  Seek forwards from offset relative to start-of-file for a position within a hole.
    SEEK_DATA  Seek forwards from offset relative to start-of-file for a position not within a hole.

After line 41627 [XSH lseek DESCRIPTION], add two more bullets:

    If whence is SEEK_HOLE, the file offset shall be set to the smallest location of a byte within a hole and not less than offset, except that if offset falls beyond the last byte not within a hole, then the file offset may be set to the file size instead. It shall be an error if offset is greater than or equal to the size of the file.

    If whence is SEEK_DATA, the file offset shall be set to the smallest location of a byte not within a hole and not less than offset. It shall be an error if no such byte exists.

At line 41628 [XSH lseek DESCRIPTION], change:

    The symbolic constants SEEK_SET, SEEK_CUR, and SEEK_END are defined in <unistd.h>.

to:

    The symbolic constants SEEK_SET, SEEK_CUR, SEEK_END, SEEK_HOLE, and SEEK_DATA are defined in <unistd.h>.

    A hole is a contiguous region of bytes within a file, all having the value of zero. Not all bytes with the value zero need belong to a hole; however, all seekable files shall have a virtual hole starting at the current size of the file, whether or not the file is sparse.

After line 41645 [XSH lseek ERRORS], add:

    [ENXIO]  The whence argument is SEEK_HOLE or SEEK_DATA, and offset is greater than or equal to the file size; or the whence argument is SEEK_DATA and the offset falls beyond the last byte not within a hole.

After line 41668 [XSH lseek RATIONALE], add:

    Not all filesystems support holes, and even where sparse files are supported, not all contiguous blocks of zero bytes are required to be recognized as a hole. However, since all files are required to have a virtual hole starting at the current file size, application writers can use SEEK_HOLE and SEEK_DATA to optimize algorithms that can run faster when it is known that a block of bytes is all zeros, because a non-sparse file will correctly report the entire file as a single non-hole. A trivial recursive implementation for these two constants would be as follows; however, for filesystems that support sparse files, implementations are encouraged to do better.

        off_t lseek(int fildes, off_t offset, int whence)
        {
            off_t cur, end;
            switch (whence)
            {
            case SEEK_HOLE:
            case SEEK_DATA:
                cur = lseek(fildes, 0, SEEK_CUR);
                if (cur < 0)
                    return cur;
                end = lseek(fildes, 0, SEEK_END);
                if (end < 0)
                    return end;
                if (offset < end)
                    return whence == SEEK_HOLE ? end
                        : lseek(fildes, offset, SEEK_SET);
                lseek(fildes, cur, SEEK_SET);
                errno = ENXIO;
                return -1;
            default:
                ... /* Existing implementation */
            }
        }

    Note that although the above looks like user-space code, lseek() cannot be implemented with recursive calls in user space because this would not conform to the atomicity requirements in [xref to 2.9.7].
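As an illustration only (not part of the accepted text), the kind of application-side optimization the rationale mentions might look like the following sparse-aware copy, which reads and writes only the regions reported as data. The names copy_range and sparse_copy are hypothetical, the _GNU_SOURCE define is an assumption about glibc-based systems, and the sketch assumes the source file is not modified while it is being copied.

```c
#define _GNU_SOURCE /* assumption: glibc exposes SEEK_DATA/SEEK_HOLE only with this */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy bytes [start, end) from `in` to the same offsets in `out`. */
static int copy_range(int in, int out, off_t start, off_t end)
{
    char buf[65536];

    while (start < end) {
        size_t want = sizeof buf;
        ssize_t got, done = 0;

        if ((off_t)want > end - start)
            want = (size_t)(end - start);
        got = pread(in, buf, want, start);
        if (got < 0 && errno == EINTR)
            continue;
        if (got <= 0)
            return -1; /* read error, or the file shrank underneath us */
        while (done < got) {
            ssize_t n = pwrite(out, buf + done, (size_t)(got - done),
                               start + done);
            if (n < 0 && errno == EINTR)
                continue;
            if (n < 0)
                return -1;
            done += n;
        }
        start += got;
    }
    return 0;
}

/* Hypothetical helper: copy src to dst, skipping holes. 0 on success. */
int sparse_copy(const char *src, const char *dst)
{
    struct stat st;
    off_t pos = 0;
    int rc = -1;
    int out;
    int in = open(src, O_RDONLY);

    if (in < 0)
        return -1;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0) {
        close(in);
        return -1;
    }
    if (fstat(in, &st) < 0)
        goto done;

    for (;;) {
        off_t data = lseek(in, pos, SEEK_DATA);
        off_t hole;

        if (data < 0) {
            if (errno != ENXIO)
                goto done;
            break; /* ENXIO: nothing but the trailing hole remains */
        }
        hole = lseek(in, data, SEEK_HOLE);
        if (hole < 0)
            goto done;
        assert(hole > data);
        if (copy_range(in, out, data, hole) < 0)
            goto done;
        pos = hole;
    }

    /* Extend dst to the source size so a trailing hole is preserved. */
    if (ftruncate(out, st.st_size) == 0)
        rc = 0;
done:
    close(in);
    if (close(out) < 0)
        rc = -1;
    return rc;
}
```

Whether the destination actually ends up sparse depends on its file system; the sketch only avoids writing the regions that the source reports as holes.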
(0000926) eblake (manager) 2011-08-08 14:15 |
On the list, it was pointed out that the suggested non-normative example for how to implement the new lseek() constants on top of existing ones is not thread-safe if done in user space with multiple lseek() kernel calls; perhaps we need to add wording to the non-normative example to clarify that the implementation must still guarantee atomicity, to comply with the thread-safety requirements already present on lseek(). |
(0004113) eblake (manager) 2018-09-10 15:08 |
I've learned that MacOS currently has yet a different implementation of lseek(SEEK_DATA), which really ought to be repaired to comply with the text in this proposal rather than forcing applications to work around the subtle semantic differences. On MacOS, at least with the APFS file system, files are composed of multiple extents; if your current offset is in the middle of a data extent (rather than aligned to the beginning of one), then lseek(SEEK_DATA) incorrectly returns the offset of the NEXT data extent (even if that skips a hole) rather than the current offset. See https://lists.gnu.org/archive/html/bug-gnulib/2018-09/msg00054.html for a discussion of some of the problems (including data corruption) that result from MacOS's different implementation. |
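One way an application could check for the behavior described above is sketched below; this is an illustration under stated assumptions, not a test taken from the linked discussion. The helper name seek_data_stays_put is hypothetical, and the _GNU_SOURCE define is an assumption about glibc-based systems. The probe relies on the proposed definition of a hole: a nonzero byte cannot be within a hole, so SEEK_DATA started at a nonzero byte must return that same offset.

```c
#define _GNU_SOURCE /* assumption: glibc exposes SEEK_DATA only with this */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical probe. Returns:
 *    1 if lseek(SEEK_DATA) from a nonzero byte returns that same offset,
 *    0 if it returns a different offset or fails,
 *   -1 if the probe is inconclusive (the file cannot be opened, or the
 *      byte at `offset` cannot be read or is zero).
 */
int seek_data_stays_put(const char *path, off_t offset)
{
    unsigned char byte;
    int result = -1;
    int fd;

    assert(offset >= 0);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Only a nonzero byte is guaranteed not to be within a hole. */
    if (pread(fd, &byte, 1, offset) == 1 && byte != 0)
        result = lseek(fd, offset, SEEK_DATA) == offset;

    close(fd);
    return result;
}
```

Calling the probe with an offset in the middle of a run of nonzero bytes exercises the unaligned case described in the note; a result of 0 would indicate that SEEK_DATA skipped ahead.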
(0004791) geoffclare (manager) 2020-03-02 16:31 |
The resolution in Note: 0000862 has been updated to add the final paragraph in order to address the comment in Note: 0000926. |
Date Modified | Username | Field | Change |
2011-04-22 13:04 | eblake | New Issue | |
2011-04-22 13:04 | eblake | Status | New => Under Review |
2011-04-22 13:04 | eblake | Assigned To | => ajosey |
2011-04-22 13:04 | eblake | Name | => Eric Blake |
2011-04-22 13:04 | eblake | Organization | => Red Hat |
2011-04-22 13:04 | eblake | User Reference | => ebb.lseek |
2011-04-22 13:04 | eblake | Section | => lseek |
2011-04-22 13:04 | eblake | Page Number | => 1265 |
2011-04-22 13:04 | eblake | Line Number | => 41627 |
2011-04-22 13:04 | eblake | Interp Status | => --- |
2011-04-25 15:48 | eblake | Note Added: 0000757 | |
2011-06-09 15:14 | msbrown | Tag Attached: issue8 | |
2011-06-09 15:14 | msbrown | Note Added: 0000803 | |
2011-06-09 15:14 | msbrown | Resolution | Open => Future Enhancement |
2011-06-16 16:19 | nick | Note Added: 0000861 | |
2011-06-16 16:25 | geoffclare | Note Added: 0000862 | |
2011-06-16 16:27 | geoffclare | Final Accepted Text | => Note: 0000862 |
2011-06-16 16:27 | geoffclare | Status | Under Review => Resolved |
2011-06-16 16:27 | geoffclare | Resolution | Future Enhancement => Accepted As Marked |
2011-08-08 14:15 | eblake | Note Added: 0000926 | |
2018-09-10 15:08 | eblake | Note Added: 0004113 | |
2020-03-02 16:31 | geoffclare | Note Edited: 0000862 | |
2020-03-02 16:31 | geoffclare | Note Added: 0004791 | |
2020-05-05 14:28 | geoffclare | Status | Resolved => Applied |
2020-06-30 14:16 | geoffclare | Relationship added | related to 0001357 |
2024-06-11 08:53 | agadmin | Status | Applied => Closed |