Austin Group Defect Tracker

Aardvark Mark IV


Viewing Issue Simple Details
ID 0001798
Category [Issue 8 drafts] System Interfaces
Severity Objection
Type Clarification Requested
Date Submitted 2024-01-22 15:13
Last Update 2024-02-16 10:18
Reporter eblake
View Status public
Assigned To
Priority normal
Resolution Open
Status New
Product Version Draft 4
Name Eric Blake
Organization Red Hat
User Reference ebb.posix_getdents
Section XSH posix_getdents
Page Number 1567
Line Number 52609
Final Accepted Text
Summary 0001798: Must posix_getdents remember file offsets across exec?
Description The RATIONALE for fdopendir( ) (page 922) states that POSIX imposes no constraints on what may happen for "the use or referencing of a dirp value or a dirent structure value ... after a fork( ) or one of the exec function calls." Issue 8 added the posix_getdents( ) interface, and one of our goals was to allow it to be implemented on top of a hidden DIR* object for implementations where readdir( ) and friends already track file types as an extension.

In trying to implement posix_getdents( ) for Cygwin, the choice was made to use a hidden DIR* object, opened on the first call to posix_getdents() for any fd, where subsequent lseek() calls on the fd map to telldir()/seekdir() on the underlying DIR*. This works even for dup() within a single process; but for fork() without exec, keeping the offset synchronized between the two copies is prohibitive, and after exec the underlying DIR* state is no longer available in the newly exec'd process. It seems that most portable uses of getdents() were limited to a single process; it might help if the standard explicitly called out the non-portability of expecting directory offsets to be preserved across fork() and exec, so that an implementation built on an underlying DIR* does not hit hard walls over synchronized use of that DIR* across fork.
Desired Action (Draft 4 locations)
On page 1567, line 52616 (posix_getdents DESCRIPTION), change:
The behavior is unspecified if lseek( ) is used to set the file offset to a value other than zero or a value returned by a previous call to lseek( ) on the same open file description.
to:
The behavior is unspecified if lseek( ) is used to set the file offset to a value other than zero or a value returned by a previous call to lseek( ) on the same open file description; likewise, the behavior is unspecified if attempting to use posix_getdents( ) on a file descriptor after an exec call or in the child process of a fork( ) or _Fork( ) call if the file descriptor was at a non-zero offset before the call, without first using lseek( ) to set the file offset back to zero.
Tags No tags attached.
Attached Files

- Relationships

-  Notes
(0006632)
eblake (manager)
2024-01-22 15:30
edited on: 2024-01-22 15:39

Correction - I'm told that the attempted Cygwin implementation also has problems after dup(); it is unclear whether the states of the two descriptors should be linked. Suppose one fd reads an entry and grabs its offset, and then the other fd is used to read entries: it is unclear whether the second fd starts reading from the point where the first fd was at the time of dup() or from the later point the first fd has since reached, and whether the second fd can safely lseek() to any subsequent offset obtained through the first fd. Easiest would be to state that dup() has the same limitations as fork()/exec: resuming any mid-stream directory traversal on either side of the split is unspecified, and the only portable thing is to start a new traversal by lseek'ing back to 0 (at which point the implementation no longer has to worry about sharing a half-read DIR* across fd copies or processes).
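For comparison, this is the linkage that descriptors themselves have after dup(): both refer to one open file description and therefore share one offset. The sketch uses an unlinked temporary regular file as a stand-in, since directory offsets are opaque; the function name offset_seen_by_dup and the /tmp path template are assumptions made for the example.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Move the offset through the original descriptor and report the offset
   observed through a dup()'ed copy. Returns -1 on error. */
off_t offset_seen_by_dup(off_t target)
{
    char path[] = "/tmp/dup_offset_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* the file goes away on close */

    int copy = dup(fd);
    if (copy < 0) {
        close(fd);
        return -1;
    }

    off_t seen = -1;
    /* Seek on one descriptor, then query the position on the other. */
    if (lseek(fd, target, SEEK_SET) == target)
        seen = lseek(copy, 0, SEEK_CUR);

    close(copy);
    close(fd);
    return seen;
}
```

A user-space DIR* hidden behind each descriptor has no such shared state, which is why the question of linked positions arises at all.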

(0006658)
corinna_vinschen (reporter)
2024-02-16 10:18

From Cygwin's side, the problem is this:

The underlying non-POSIXy kernel does not allow lseek(2) operations on
directory descriptors, not even requesting a position within the directory.
The only available seek-like operation is equivalent to lseek(dirfd, 0, SEEK_SET).

Therefore we have to use a DIR* and the entire operation of position
bookkeeping is performed in user space.

If the standard strives to allow implementing posix_getdents() using DIR*
under the hood, the standard should be clear on the subject that DIR is
not dup(2)'able the same way as the dir descriptor given as argument to
posix_getdents().

DIR is, by and large, a user-space object while the descriptor is a kernel
object. DIR has never been meant as a dup'able object and there's no
precedent for such a functionality.

As such, there's no way to keep the dup'ed DIR* in sync after such a
duplication.

The same problem occurs with fork(2), which is just a more thorough dup(2)
in terms of descriptors.

Bottom line is, with user-space DIR* with enforced user space bookkeeping,
there's no way after dup(2)/fork(2) to keep the directory position info
in sync.
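
The loss of synchronization can be observed with a plain DIR* and fork(). The sketch below is only an illustration under assumptions: it relies on the C library buffering the whole of a small directory on the first readdir() call (common, but not required), and the function name fork_repeats_entry and the /tmp path template are hypothetical.

```c
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if, after fork(), the parent is handed the same entry the
   child already consumed (positions not shared), 0 if the entries
   differ, and -1 on error. */
int fork_repeats_entry(void)
{
    char dir[] = "/tmp/dirfork_XXXXXX";
    char path[64], from_child[256] = "", mine[256] = "";
    int pfd[2], result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    for (int i = 0; i < 3; i++) {               /* three empty files */
        snprintf(path, sizeof path, "%s/f%d", dir, i);
        int fd = open(path, O_CREAT | O_WRONLY, 0600);
        if (fd >= 0)
            close(fd);
    }

    DIR *d = opendir(dir);
    /* The first readdir() happens before fork(), so both processes
       start from the same user-space position. */
    if (d != NULL && readdir(d) != NULL && pipe(pfd) == 0) {
        pid_t pid = fork();
        if (pid == 0) {
            /* Child: consume one entry and report its name. */
            struct dirent *e = readdir(d);
            if (e == NULL || write(pfd[1], e->d_name, strlen(e->d_name)) < 0)
                _exit(1);
            _exit(0);
        }
        close(pfd[1]);
        if (pid > 0) {
            ssize_t n = read(pfd[0], from_child, sizeof from_child - 1);
            waitpid(pid, NULL, 0);
            /* Parent: its own copy of the DIR* never saw the child's read. */
            struct dirent *e = readdir(d);
            if (n > 0 && e != NULL) {
                strncpy(mine, e->d_name, sizeof mine - 1);
                result = strcmp(mine, from_child) == 0;
            }
        }
        close(pfd[0]);
    }
    if (d != NULL)
        closedir(d);

    for (int i = 0; i < 3; i++) {               /* clean up */
        snprintf(path, sizeof path, "%s/f%d", dir, i);
        unlink(path);
    }
    rmdir(dir);
    return result;
}
```

On an implementation that buffers entries in user space, the function is expected to return 1: the child and the parent each advance a private copy of the stream position.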

Consequently, no assumption should be made about how posix_getdents()
behaves after dup(2) or fork(2). I.e., using the descriptors with
posix_getdents() or readdir() in parallel should be undefined behaviour.


If you're interested in code, I invite you to take a look into the current,
preliminary implementation of posix_getdents() in Cygwin:

https://cygwin.com/cgit/newlib-cygwin/commit/?id=62ca95721a14

As the commit outlines, the code does not try to keep track of the hidden
DIR at all.


Thanks,
Corinna

- Issue History
Date Modified Username Field Change
2024-01-22 15:13 eblake New Issue
2024-01-22 15:13 eblake Name => Eric Blake
2024-01-22 15:13 eblake Organization => Red Hat
2024-01-22 15:13 eblake User Reference => ebb.posix_getdents
2024-01-22 15:13 eblake Section => XSH posix_getdents
2024-01-22 15:13 eblake Page Number => 1567
2024-01-22 15:13 eblake Line Number => 52609
2024-01-22 15:30 eblake Note Added: 0006632
2024-01-22 15:39 eblake Note Edited: 0006632
2024-02-16 10:18 corinna_vinschen Note Added: 0006658

