Austin Group Defect Tracker

ID Category Severity Type Date Submitted Last Update
0001798 [1003.1(2024)/Issue8] System Interfaces Objection Clarification Requested 2024-01-22 15:13 2024-07-24 14:32
Reporter eblake View Status public  
Assigned To
Priority normal Resolution Accepted As Marked  
Status Interpretation Required  
Name Eric Blake
Organization Red Hat
User Reference ebb.posix_getdents
Section XSH posix_getdents
Page Number 1567
Line Number 52601
Interp Status Approved
Final Accepted Text Note: 0006819
Summary 0001798: Must posix_getdents remember file offsets across exec?
Description The RATIONALE for fdopendir( ) (page 922) states that POSIX imposes no constraints on what may happen for "the use or referencing of a dirp value or a dirent structure value ... after a fork( ) or one of the exec function calls." Issue 8 added the posix_getdents( ) interface, and one of our goals was to allow it to be implemented on top of a hidden DIR* object for implementations where readdir( ) and friends already track file types as an extension.

In trying to implement posix_getdents( ) for Cygwin, the choice was made to use a hidden DIR* object, opened on the first call to posix_getdents() for any fd, and where subsequent lseek() calls on the fd map to telldir()/seekdir() of the underlying DIR*. This works even for the case of dup() within a single process; but for fork() without exec, it is prohibitive to keep synchronization of the offset between the two copies, and after exec the underlying DIR* state is no longer available on the newly-exec'd process. It seems like most portable uses of getdents() were limited to a single process; it might help if the standard explicitly calls out the non-portability of expecting directory offsets to be preserved across fork() and exec(), so that an implementation that uses an underlying DIR* is not hitting hard walls about the synchronized use of that DIR* across fork.
Desired Action (Draft 4 locations)
On page 1567, line 52616 (posix_getdents DESCRIPTION), change:
The behavior is unspecified if lseek( ) is used to set the file offset to a value other than zero or a value returned by a previous call to lseek( ) on the same open file description.
to:
The behavior is unspecified if lseek( ) is used to set the file offset to a value other than zero or a value returned by a previous call to lseek( ) on the same open file description; likewise, the behavior is unspecified if attempting to use posix_getdents( ) on a file descriptor after an exec call or in the child process of a fork( ) or _Fork( ) call if the file descriptor was at a non-zero offset before the call, without first using lseek( ) to set the file offset back to zero.
Tags tc1-2024
Attached Files

- Relationships

-  Notes
(0006632)
eblake (manager)
2024-01-22 15:30
edited on: 2024-01-22 15:39

Correction - I'm told that the attempted Cygwin implementation also has problems after dup(); it is unclear whether the states should be linked. For example, after reading an entry on one fd, grabbing its offset, and then using the other fd to read entries, it is unclear whether the second fd starts reading from the point where the first fd was at the time of dup() or from the subsequent point reached by the first fd, and whether the second fd can safely lseek() to any subsequent offset read using the first fd. Easiest would be to state that dup() has the same limitations as fork()/exec - namely, that resuming any mid-stream directory traversal in either side of the split is unspecified, and the only portable thing is to start a new traversal by lseek'ing back to 0 (at which point, the implementation no longer has to worry about sharing a half-read DIR* across fd copies or processes).
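
A minimal sketch of the "lseek back to 0 before starting a new traversal" pattern described above, written in C. The helper names open_dir_fd() and dup_dir_for_new_traversal() are hypothetical and exist only for illustration; the sketch stops short of calling posix_getdents() itself, since that function is not yet available on many systems.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a directory for reading by descriptor. */
int open_dir_fd(const char *path)
{
    return open(path, O_RDONLY | O_DIRECTORY);
}

/*
 * Hypothetical helper: duplicate a directory descriptor and rewind the
 * file offset to zero so that a fresh traversal can start on the copy.
 * Returns the new descriptor, or -1 on error.
 */
int dup_dir_for_new_traversal(int dir_fd)
{
    int copy;

    assert(dir_fd >= 0);
    copy = dup(dir_fd);
    if (copy == -1)
        return -1;

    /* Argument order is lseek(fd, offset, whence). */
    if (lseek(copy, 0, SEEK_SET) == (off_t)-1) {
        close(copy);
        return -1;
    }

    /* A posix_getdents(copy, ...) loop would begin here, reading the
     * directory from the start rather than resuming mid-stream. */
    return copy;
}
```

Because both descriptors refer to the same open file description, the rewind is also visible through the original descriptor; the sketch assumes the caller has finished with any traversal in progress there.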

(0006658)
corinna_vinschen (reporter)
2024-02-16 10:18

From Cygwin's side, the problem is this:

The underlying non-POSIXy kernel does not allow lseek(2) operations on
directory descriptors, not even requesting a position within the directory.
The only available seek-like operation is equivalent to lseek(dirfd, 0, SEEK_SET).

Therefore we have to use a DIR* and the entire operation of position
bookkeeping is performed in user space.

If the standard strives to allow implementing posix_getdents() using DIR*
under the hood, the standard should be clear on the subject that DIR is
not dup(2)'able the same way as the dir descriptor given as argument to
posix_getdents().

DIR is, by and large, a user-space object while the descriptor is a kernel
object. DIR has never been meant as a dup'able object and there's no
precedent for such a functionality.

As such, there's no way to keep the dup'ed DIR* in sync after such a
duplication.

The same problem occurs with fork(2), which is just a more thorough dup(2)
in terms of descriptors.

Bottom line is, with user-space DIR* with enforced user space bookkeeping,
there's no way after dup(2)/fork(2) to keep the directory position info
in sync.

Consequently, no assumption should be made about how posix_getdents()
behaves after dup(2) or fork(2). I.e. using the descriptors with
posix_getdents() or readdir() in parallel should be undefined behaviour.


If you're interested in code, I invite you to take a look into the current,
preliminary implementation of posix_getdents() in Cygwin:

https://cygwin.com/cgit/newlib-cygwin/commit/?id=62ca95721a14

As the commit outlines, the code does not try to keep track of the hidden
DIR at all.


Thanks,
Corinna
(0006695)
geoffclare (manager)
2024-02-29 17:27

Proposed interpretation (review timer to start after approval of issue 8) ...

Interpretation response
------------------------
The standard states that the posix_getdents() function starts reading at the current file offset in the open file description associated with fildes, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
Elsewhere the standard makes allowances for implementations where directory streams are not implemented using a file descriptor, but this was not extended to the new posix_getdents() function when it was added.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------

After page 920 line 31407 section fdopendir(), add a new paragraph:
If the file descriptor specified by fd is associated with an open file description on which posix_getdents() has previously been used, the behavior is unspecified.


After page 1567 line 52616 section posix_getdents():
The behavior is unspecified if lseek() is used to set the file offset to a value other than zero or a value returned by a previous call to lseek() on the same open file description.

add these sentences:
The behavior is unspecified if calls to posix_getdents() are made on different file descriptors that refer to the same open file description (for example, before and after a file descriptor is inherited across fork() or the exec family of functions, or is duplicated using dup() or fcntl()), unless lseek() is used to set the file offset to zero in between the calls to posix_getdents(). A single exception to this condition is that after a call to fork(), either the parent or child (but not both) can continue processing the directory using posix_getdents(); if both the parent and child processes use the function, the result is unspecified. Likewise, the behavior is unspecified if in between two calls to posix_getdents() on one file descriptor, the file offset is altered by a call made on a different file descriptor that refers to the same open file description and the new offset is not zero.


After page 1571 line 52771 section posix_getdents(), add a new paragraph to RATIONALE:
The restrictions on the use of different file descriptors that refer to the same open file description are needed in order to enable implementations where directory streams are not implemented using a file descriptor to maintain some internal state related to a particular file descriptor.


At page 1858 line 61319 section readdir(), change:
    
the result is undefined.

to:
    
the result is unspecified.
(0006703)
corinna_vinschen (reporter)
2024-03-04 09:38

> Likewise, the behavior is unspecified if in between two calls to
> posix_getdents() on one file descriptor, the file offset is altered
> by a call made on a different file descriptor that refers to the same open
> file description and the new offset is not zero.

While the new clarifications look mostly good to me, this snippet still
looks like a problem, in particular the restriction "and the new offset
is not zero".

The reason is that after dup, the process has to create a new DIR
for the dup'ed descriptor. The two DIR's are logically distinct.

The above paragraph sounds like the behaviour is not supposed to be
undefined in the following situation:

  posix_getdents (fd1, ...); // file pos != 0 after this call

  fd2 = dup (fd1);

  lseek (fd2, 0, SEEK_SET); // seeks file to pos 0 via fd2

  posix_getdents (fd1, ...); // is now supposed to start at pos 0?

If so, I'm not sure how to do this via underlying DIR pointer.

The DIR pointer attached to fd1 is either dropped from or duplicated
to fd2. It's certainly not the same DIR pointer. The lseek in
fd2 can only affect the DIR attached to fd2, not the one from fd1.

So, given we don't have a seekable underlying OS file descriptor,
how is the second posix_getdents on fd1 supposed to know that it
has to restart at pos 0?


Thanks,
Corinna
(0006708)
geoffclare (manager)
2024-03-07 12:28

Re Note: 0006703 I'm not seeing why that case would be any more difficult to handle than this one:

  posix_getdents (fd1, ...); // file pos != 0 after this call

  lseek (fd1, 0, SEEK_SET); // seeks file to pos 0

  posix_getdents (fd1, ...);

Since posix_getdents() has to query the file offset on every call, if the offset is zero it should just start reading the directory at the beginning, regardless of how the offset became zero.

Did I miss something?
(0006709)
corinna_vinschen (reporter)
2024-03-07 15:00

You're missing the fact that the underlying OS does *not* maintain a file
position on directory descriptors. The function returning the file position
always returns 0 on a directory, independent of the directory entries
actually read. Also, there's no way to lseek on a directory. The only operation
available is a "restart" flag to the directory read operation, which requests
that reading start at position 0.

So, to be able to implement telldir/seekdir, the DIR struct has to maintain
a read counter. telldir() simply returns the number of directory entries read
so far. seekdir() is implemented as a "restart" followed by reading directory
entries in a loop until the counter matches the one given as argument.
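
The counter scheme described above can be sketched roughly as follows. This is not the Cygwin code: struct raw_dir is a hypothetical in-memory stand-in for an OS handle that supports only "read next" and "restart", and the sim_* names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an OS directory handle that offers only
 * "read the next entry" and a "restart" flag; no position can be queried. */
struct raw_dir {
    const char *const *names;  /* entries in enumeration order */
    size_t count;
    size_t next;               /* hidden OS-side cursor */
};

static const char *raw_read(struct raw_dir *rd, int restart)
{
    if (restart)
        rd->next = 0;
    return rd->next < rd->count ? rd->names[rd->next++] : NULL;
}

/* User-space stream: the read counter is the only position information. */
struct sim_dir {
    struct raw_dir *raw;
    long nread;           /* number of entries returned so far */
    int restart_pending;  /* pass the restart flag on the next read */
};

const char *sim_readdir(struct sim_dir *d)
{
    const char *name = raw_read(d->raw, d->restart_pending);

    d->restart_pending = 0;
    if (name != NULL)
        d->nread++;
    return name;
}

/* telldir(): simply report the read counter. */
long sim_telldir(const struct sim_dir *d)
{
    return d->nread;
}

/* seekdir(): restart, then re-read entries until the counter matches. */
void sim_seekdir(struct sim_dir *d, long pos)
{
    assert(pos >= 0);
    d->restart_pending = 1;
    d->nread = 0;
    while (d->nread < pos && sim_readdir(d) != NULL)
        ;
}
```

Note that a seek costs time proportional to the target position, and that the counter lives entirely in the user-space structure, which is what makes it invisible to any other copy of that structure.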

Having said that, as soon as you fork() a directory descriptor that has been
used with posix_getdents(), you not only generate a copy of the underlying OS
descriptor, you also duplicate the DIR struct into the new process. Now the
DIR structs are independent from each other. If you call posix_getdents on
one of them, the DIR struct in the other process is obviously *not* updated
accordingly. Thus, any lseek() on the directory descriptor in one process
is lost on the one in the other process.

I used fork() as an example, but the same goes for dup(), unless you share
the same DIR structure for all the directory descriptors in shared memory.
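
To illustrate the point about independent copies, here is a deliberately tiny sketch. struct toy_dir and the toy_* functions are hypothetical; they model only the user-space counter, with a plain structure copy standing in for what fork() or an unshared dup() does to the stream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified user-space directory stream. */
struct toy_dir {
    int os_handle;  /* stands in for the kernel object shared by copies */
    long nread;     /* user-space bookkeeping, private to each copy */
};

/* A bytewise copy: what duplication amounts to when the stream lives
 * in ordinary, unshared memory. */
struct toy_dir toy_duplicate(const struct toy_dir *d)
{
    assert(d != NULL);
    return *d;
}

void toy_read_one(struct toy_dir *d) { d->nread++; }
void toy_rewind(struct toy_dir *d)   { d->nread = 0; }
long toy_tell(const struct toy_dir *d) { return d->nread; }
```

After toy_duplicate(), rewinding or advancing one copy leaves the other copy's counter untouched, even though both carry the same os_handle value. Keeping them in step would need shared state plus locking, which is the overhead being objected to.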

Does that clear things up?

Thanks,
Corinna
(0006710)
geoffclare (manager)
2024-03-07 18:14

> You're missing the fact that the underlying OS does *not* maintain a file position on directory descriptors.

Actually, I think I knew that, but had forgotten it.

So the Cygwin lseek() must have to fake an offset for fds associated with a directory stream - presumably returning the read count - and accept those faked offsets as input.

To make it work for an lseek() on an fd obtained from dup(), as in Note: 0006703, couldn't you have dup() notice that the fd passed in is associated with a directory stream and create an association between the new fd and the same directory stream? Admittedly the code would be more complicated if a directory stream can be associated with more than one fd, but it seems to me that this could be a promising approach that would provide better compatibility with other systems.
(0006711)
corinna_vinschen (reporter)
2024-03-07 20:24

> So the Cygwin lseek() must have to fake an offset for fds associated with a
> directory stream - presumably returning the read count - and accept those
> faked offsets as input

Yes, as I described in my previous note, readdir() keeps count, telldir()
returns the count, seekdir() rewinds and calls readdir() until the count equals
the seekdir() argument.

lseek() on dirs was not implemented at all due to the OS not supporting it.

lseek() is now supported for dirs used in posix_getdents() by calling
telldir()/seekdir() under the hood, see

https://cygwin.com/cgit/newlib-cygwin/commit/?id=62ca95721a14

> To make it work for an lseek() on an fd obtained from dup(), as in
> Note: 0006703, couldn't you have dup() notice that the fd passed in is
> associated with a directory stream and create an association between
> the new fd and the same directory stream? Admittedly the code would be
> more complicated

Actually a *lot* more complicated.

You're basically now expecting shared bookkeeping of DIRs.

But this was never before required by opendir()/readdir()/...

A DIR is an allocated user space structure on the heap. The only way
of sharing a DIR was by duplicating it via fork(). The two resulting
DIRs in the parent and child processes are disconnected and they function independently of each other.

If you now require DIRs to be shared across dup() and fork(), you're
basically requiring a rewrite of otherwise conforming implementations
of opendir() and friends. It would require storing a DIR in shared
memory, adding interprocess locking to readdir(), and whatnot.

If the idea was to allow implementing posix_getdents() with existing
DIR under the hood, this new requirement breaks this assumption, just
for a border case.


Corinna
(0006712)
corinna_vinschen (reporter)
2024-03-07 20:30

Btw., in terms of lseek() I pushed an improved patch a week ago:
 
https://cygwin.com/cgit/newlib-cygwin/commit/?id=6d936915477c
(0006715)
geoffclare (manager)
2024-03-08 09:00

> If you now require DIRs to be shared across dup() and fork(), ...

I was only proposing sharing across dup(), not fork(). The extra complexity you describe (shared memory, interprocess locking) would not be needed.
(0006716)
corinna_vinschen (reporter)
2024-03-08 11:08

Even if only with dup() it's still synchronization
overhead which was never before required for DIR :(
(0006721)
eblake (manager)
2024-03-21 16:21
edited on: 2024-03-21 16:30

On the 21 Mar 2024 call, we reopened this bug, in order to consider replacing the original proposal of Note: 0006695 with the following; line numbers from draft 4.0:

Proposed interpretation (review timer to start after approval of issue 8) ...

Interpretation response
------------------------
The standard states that the posix_getdents() function starts reading at the current file offset in the open file description associated with fildes, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
Elsewhere the standard makes allowances for implementations where directory streams are not implemented using a file descriptor, but this was not extended to the new posix_getdents() function when it was added.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------

After page 920 line 31407 section fdopendir(), add a new paragraph:

    
If the file descriptor specified by fd is associated with an open file description on which posix_getdents() has previously been used, or for which any associated file descriptor is already associated with a directory stream, the behavior is unspecified.



After page 1567 line 52616 section posix_getdents():

    
The behavior is unspecified if lseek() is used to set the file offset to a value other than zero or a value returned by a previous call to lseek() on the same open file description.


add these sentences:

    
The behavior is unspecified if calls to posix_getdents() are made on different file descriptors that refer to the same open file description (for example, before and after a file descriptor is inherited across fork() or the exec family of functions, or is duplicated using dup() or fcntl()), unless lseek() is used to set the file offset to zero in between the calls to posix_getdents(). A single exception to this condition is that after a call to fork(), either the parent or child (but not both) can continue processing the directory using posix_getdents(). Likewise, the behavior is unspecified if in between two calls to posix_getdents() on one file descriptor, the file offset is altered by a call made on a different file descriptor that refers to the same open file description.


After page 1571 line 52771 section posix_getdents(), add a new paragraph to RATIONALE:

    
The restrictions on the use of different file descriptors that refer to the same open file description are needed in order to enable implementations where directory streams are not implemented using a file descriptor to maintain some internal state related to a particular file descriptor.



At page 1858 line 61312, section readdir(), change:

    
If a file is removed from or added to the directory after the most recent call to opendir( ) or rewinddir( ), whether a subsequent call to readdir( ) returns an entry for that file is unspecified.


to:

    
If a file is removed from or added to the directory after the most recent call to opendir( ) or rewinddir( ), whether a subsequent call to readdir( ) on that directory stream returns an entry for that file is unspecified. For all other files in the directory that existed at the time the directory stream was opened and which have not been removed, successive calls to readdir( ) on that directory stream shall return an entry for each such file exactly once before reporting that the end of the directory has been reached, provided that there are no intervening calls to seekdir( ) and no unspecified behavior caused by opening a second directory stream on the same file description associated with the directory. For any such file that is renamed within the directory after the directory stream was opened, readdir( ) shall return either an entry for the original name or for the new name, but not both.



At page 1858 line 61319 section readdir(), change:
    
the result is undefined.

to:
    
the result is unspecified.


(0006722)
corinna_vinschen (reporter)
2024-03-21 18:58

This sounds pretty good.

Thanks,
Corinna
(0006723)
kre (reporter)
2024-03-22 03:50
edited on: 2024-03-22 04:48

Re Note: 0006721

There it says:

    For any such file that is renamed within the directory after
    the directory stream was opened, readdir( ) shall return either
    an entry for the original name or for the new name, but not both.

I have no idea how that is supposed to be implementable, that is, the
"but not both".

First, I am assuming that "such file" means this applies only to a file
that is:

     removed from or added to the directory after the most recent call to

from the opening line of the paragraph, though it doesn't really matter,
since a typical implementation of readdir() isn't going to be able
to tell which files were added to the directory after any particular
event, other than when the event is readdir() reaching EOF.

That is, if you consider reading a VERY large directory with
readdir(), while reading the first few entries (however
many readdir() manages to buffer in memory) a new file is
added, way down at the end of the directory where it hasn't reached yet.
How could there be a special rule for that file which doesn't also
apply to the previously added file, which just happened to be added
before the event occurred?

To readdir() those two files look to be almost exactly the same,
it can't even use the modification time of the directory, or its
size, as some kind of hint, neither of those works given the right
circumstances.

Note that here I am concerned only with the rule about what happens
with files that are renamed, not with the "unspecified whether it is
shown or not" part, that's simple. And for this first nit, it is
just the "such file" (as distinct from "any file") which seems to be
wrong.

One more assumption, then I'll get to the real problem with the
language quoted.

If I set up a directory using

     mkdir dir
     cd dir
     > file1
     ln file1 file2

and after that is all done, I run an application (maybe ls) which
uses readdir() to read the directory (with nothing changing in any
way), I assume (hope) that it is intended, and required, that readdir()
returns entries for both file1 and file2. If that's not required,
then you can stop reading this note now, and we have bigger problems.

So, I assume it is true.

Now setup a directory as follows

     mkdir dir
     cd dir
     > f
     makemany a b c d e
     > do_it_now
     makemany g h i j k l m n o p q r s t u v w x y z

where "makemany" is a function defined like

     makemany()
     {
        for c
        do
          for n1 in 0 1 2 3 4 5 6 7 8 9
          do
            for n2 in 0 1 2 3 4 5 6 7 8 9
            do
              for n3 in 0 1 2 3 4 5 6 7 8 9
              do
                  >"${c}${n1}${n2}${n3}"
              done
            done
          done
        done
     }

which is fairly ugly ... the silly nested n1 n2 n3 loops are just
because POSIX appears not to have a standard utility like either jot
or seq, which is what I'd use in reality. [Aside, I know awk would
work, but that is kind of heavyweight.]

The point is simply to create thousands of files with relatively
short names in the current directory. If your readdir() buffers
LOTS in memory, you might need to add n4 n5 ... to make sufficient
files for the problem to manifest itself. The method by which that
is accomplished is unimportant.

Then we start a process that uses readdir() to read this directory.
When (or about when) it reaches the file "do_it_now" while reading
the directory, either this process (the thread running readdir()
or some other thread) or some other process unknown to this one,
does:

      mv f the_same_file_as_f_was_but_with_a_much_longer_name

Now because of the way directories are created in typical systems
(and certainly a way they're permitted to be created) the entry for
"f" right at the start of the directory, which our readdir() call
will have already returned an entry for, will be removed, and a
new entry for the new name (which I won't type again) will be made.

Because of the way the directory was created, with many short file
names, the only place that new name can be put is right at the end
beyond what the readdir() implementation has already read from the
filesystem (if it has already read to the end in a particular
implementation, simply add more directory entries).

The new name cannot simply replace "f", it is far too long to fit
there, and if other entries were moved around, trying to get
readdir() to do anything reasonable at all (given that none of
the files that might be repositioned in the directory have been
added, or removed, while readdir() is reading the directory) would
be almost impossible.

The rule that says "but not both" means that, as an entry for "f"
has already been returned, readdir() is not permitted to return
an entry for its replacement name. But I have no idea how any
reasonable readdir() implementation is supposed to implement that rule.

Before you start telling me how it could, or should, be done, consider
the similar case, where instead of doing the "mv" command above, the
following was performed (at the same point as the mv, instead of it)
     
    ln f the_same_file_as_f_was_but_with_a_much_longer_name

Now I know in that case it is unspecified whether or not the new name
(being one created after readdir() started reading the directory)
is returned or not, so not returning it would be legitimate, but in
practice, there is no way for readdir() to know that this file was
added while it was reading the directory. If necessary, we can
postulate that before doing that "ln" command, the same process which
would execute that one (or the equivalent link() system call) removed
just the right number of the final files in the directory, and the
alternative (long) name for "f" is made precisely the correct length
so that the size of the directory is not altered by this sequence.

The assumption above is that when links to files exist, both must
be returned. Since there is no way to know for sure that the new
file was added after readdir() started, it cannot rely on the
"unspecified whether it is returned or not" - the second name
simply must be returned.

(The "unspecified" is because an implementation is permitted to
insert the new name into a section of the directory which had
already been read, and we cannot require the implementation to
continuously re-read the directory, just in case a new file was
added into a segment already processed.)

Now how is the readdir() implementation supposed to know that
a rename ("mv") happened, rather than a "ln" - or in fact that
any of those things happened at all while the directory was being
read?

Hence I cannot see a way that makes it possible to implement that
"but not both" rule. It simply has to go. There cannot be any
rule about what must be done in these cases, as there is no way to
determine whether one of the special cases happened or not. All
the implementation can do is return the entries it sees as it
reads the directory. The standard simply needs to make it clear
that there are cases where what is returned might be neither what
was in the directory when the readdir() started, nor what is there
when it completes.

What needs to be said about rename() is more like:

    If a file is renamed within the directory after the directory
    stream was opened, readdir() may return an entry for the
    original name, one for the new name, neither, or both.

as in reality (as far as the directory is concerned) a rename is
simply making a new file, and deleting an old. That the contents,
type, and other attributes, of the file are all the same is not
relevant at all to readdir(), so the effects need to be the same:
unspecified whether the name that was removed is returned, and
unspecified whether the replacement name (which was added) is
returned. (The suggested paragraph above could be rewritten in
terms of "unspecified" rather than "may" if desired.)


I also notice that posix_getdents() says nothing about the effects of
a rename() - and perhaps should. However, were the language changed
to refer to file names being added to or removed from the directory,
rather than files being added or removed, then what is there now would
cover it I think. That does assume that the "effects of the concurrent
operation" mean only the effects as applied to any specific entry being
returned, and do not extend to other entries that may be modified as a
side effect of that concurrent operation.

(0006724)
geoffclare (manager)
2024-03-22 09:48
edited on: 2024-03-22 10:30

> If I set up a directory using
>
> mkdir dir
> cd dir
> > file1
> ln file1 file2
>
> and after that is all done, I run an application (maybe ls) which
> uses readdir() to read the directory (with nothing changing in any
> way), I assume (hope) that is intended, and required, that readdir()
> returns entries for both file1 and file2. If that's not required,
> then you can stop reading this note now, and we have bigger problems.

Obviously, returning entries for both is _intended_ to be required, but you have uncovered a major problem with the proposed wording, and as it stands implementations would be required to return either file1 or file2 but not both. This is because the text uses "file" when it means "directory entry". In your example, file1 and file2 are separate directory entries which both refer to the same file.
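
The distinction between "file" and "directory entry" can be made concrete with a short C sketch that counts both for a directory. The function names make_linked_pair() and count_entries_and_files() are hypothetical, as is the fixed-size table of inode numbers; the sketch assumes a filesystem that supports hard links and reports a meaningful d_ino.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_TRACKED 64  /* arbitrary cap on distinct inodes for this sketch */

/* Create a temporary directory from dir_template (a mkdtemp() template)
 * holding file1 and a hard link to it named file2. Returns 0 or -1. */
int make_linked_pair(char *dir_template)
{
    char a[256], b[256];
    int fd;

    if (mkdtemp(dir_template) == NULL)
        return -1;
    snprintf(a, sizeof a, "%s/file1", dir_template);
    snprintf(b, sizeof b, "%s/file2", dir_template);
    fd = open(a, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return -1;
    close(fd);
    return link(a, b);
}

/* Count directory entries (excluding dot and dot-dot) and the number of
 * distinct d_ino values among them. Returns 0 on success, -1 on error.
 * End-of-directory and readdir() errors are not distinguished here. */
int count_entries_and_files(const char *path, int *entries, int *files)
{
    ino_t seen[MAX_TRACKED];
    int nseen = 0;
    struct dirent *de;
    DIR *dir;

    assert(entries != NULL && files != NULL);
    dir = opendir(path);
    if (dir == NULL)
        return -1;

    *entries = 0;
    while ((de = readdir(dir)) != NULL) {
        int i;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        (*entries)++;
        for (i = 0; i < nseen; i++)
            if (seen[i] == de->d_ino)
                break;
        if (i == nseen && nseen < MAX_TRACKED)
            seen[nseen++] = de->d_ino;
    }
    *files = nseen;
    closedir(dir);
    return 0;
}
```

For the file1/file2 layout from the quoted example, the entry count is two while the number of distinct files is one, which is why wording in terms of "file" would accidentally permit returning only one of the two names.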

If we reword in terms of directory entries, I think no explicit statement about renaming will be needed. If the rename removes one directory entry and adds another, it will be covered by the add/remove text; if it updates the name within the directory entry, the existing requirement for directory operations to be atomic will be sufficient.

> I also notice that posix_getdents() says nothing about the effects of
> a rename() - and perhaps should. However, were the language changed
> to refer to file names being added to or removed from the directory,
> rather than files being added or removed, then what is there now would
> cover it I think.

In the relevant paragraph (lines 52624-52629 in draft 4) the first sentence uses "directory entry" and the second uses "file". The second should change to use "directory entry".

Update: I have added suggested new wording to the etherpad at https://posix.rhansen.org/p/2024-03-21 (currently at line 330)

(0006726)
eblake (manager)
2024-03-25 15:20
edited on: 2024-03-25 15:22

Proposed interpretation (review timer to start after approval of issue 8), using draft 4.0 line numbers ...

Interpretation response
------------------------
The standard states that the posix_getdents() function starts reading at the current file offset in the open file description associated with fildes, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
Elsewhere the standard makes allowances for implementations where directory streams are not implemented using a file descriptor, but this was not extended to the new posix_getdents() function when it was added.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------

After page 920 line 31407 section fdopendir(), add a new paragraph:

    
If the file descriptor specified by fd is associated with an open file description on which posix_getdents() has previously been used, or for which any associated file descriptor is already associated with a directory stream, the behavior is unspecified.



After page 1567 line 52616 section posix_getdents():

    
The behavior is unspecified if lseek() is used to set the file offset to a value other than zero or a value returned by a previous call to lseek() on the same open file description.


add these sentences:

    
The behavior is unspecified if calls to posix_getdents() are made on different file descriptors that refer to the same open file description (for example, before and after a file descriptor is inherited across fork() or the exec family of functions, or is duplicated using dup() or fcntl()), unless lseek() is used to set the file offset to zero in between the calls to posix_getdents(). A single exception to this condition is that after a call to fork(), either the parent or child (but not both) can continue processing the directory using posix_getdents(). Likewise, the behavior is unspecified if in between two calls to posix_getdents() on one file descriptor, the file offset is altered by a call made on a different file descriptor that refers to the same open file description.



At page 1568 line 52626 section posix_getdents(), change:

    
If a sequence of calls to posix_getdents() is made that reads from offset zero to end-of-file and a file is removed from or added to the directory between the first and last of those calls, whether the sequence of calls returns an entry for that file is unspecified.


to:

    
If a sequence of calls to posix_getdents() is made that reads from offset zero to end-of-file and a directory entry is removed from or added to the directory between the first and last of those calls, whether the sequence of calls returns that directory entry is unspecified.



After page 1571 line 52771 section posix_getdents(), add a new paragraph to RATIONALE:

    
The restrictions on the use of different file descriptors that refer to the same open file description are needed in order to enable implementations where directory streams are not implemented using a file descriptor to maintain some internal state related to a particular file descriptor.



At page 1858 line 61304, section readdir(), change:

    
If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified.


to:

    
If a directory entry is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() on that directory stream returns that directory entry is unspecified. For all other directory entries in the directory that existed at the time the directory stream was opened or rewound and which have not been removed, successive calls to readdir() on that directory stream shall return each such directory entry exactly once before reporting that the end of the directory has been reached, provided that there are no intervening calls to seekdir() and no unspecified behavior caused by performing an operation on an open file description associated with the directory.
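
A minimal sketch of the "exactly once" guarantee in the new text, offered as an illustration rather than a conformance test: it creates a few files in a fresh directory, leaves the directory unmodified, and checks that a full readdir() pass reports each file once, both after opendir() and after rewinddir(). The names readdir_once_demo(), each_seen_once(), DEMO_FILES and the /tmp path template are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define DEMO_FILES 5            /* files are named f0 .. f4 */

/* Read dirp to end-of-directory; return 1 if every demo file was
 * returned exactly once and readdir() reported no error, else 0. */
static int each_seen_once(DIR *dirp)
{
    int seen[DEMO_FILES] = {0};
    struct dirent *de;

    assert(dirp != NULL);
    for (;;) {
        errno = 0;              /* distinguish end-of-directory from error */
        de = readdir(dirp);
        if (de == NULL)
            break;
        if (de->d_name[0] == 'f' &&
            de->d_name[1] >= '0' && de->d_name[1] < '0' + DEMO_FILES &&
            de->d_name[2] == '\0')
            seen[de->d_name[1] - '0']++;
    }
    if (errno != 0)
        return 0;
    for (int i = 0; i < DEMO_FILES; i++)
        if (seen[i] != 1)
            return 0;
    return 1;
}

/* Returns 1 if both passes saw each file exactly once, 0 if not,
 * and -1 on setup failure (cleanup is skipped in that case). */
int readdir_once_demo(void)
{
    char dir[] = "/tmp/readdir_demo_XXXXXX";    /* assumed writable location */
    char path[64];

    if (mkdtemp(dir) == NULL)
        return -1;
    for (int i = 0; i < DEMO_FILES; i++) {
        snprintf(path, sizeof path, "%s/f%d", dir, i);
        int fd = open(path, O_CREAT | O_WRONLY, 0600);
        if (fd < 0)
            return -1;
        close(fd);
    }

    DIR *dirp = opendir(dir);
    if (dirp == NULL)
        return -1;
    int ok = each_seen_once(dirp);      /* pass after opendir() */
    rewinddir(dirp);
    ok = each_seen_once(dirp) && ok;    /* pass after rewinddir() */
    closedir(dirp);

    for (int i = 0; i < DEMO_FILES; i++) {
        snprintf(path, sizeof path, "%s/f%d", dir, i);
        unlink(path);
    }
    rmdir(dir);
    return ok;
}
```

The sketch deliberately avoids seekdir() and does not add or remove entries during a pass, since those are the cases the new text leaves unspecified or excludes from the guarantee.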



At page 1858 line 61319 section readdir(), change:
    
the result is undefined.

to:
    
the result is unspecified.


At page 1859 line 61332 section readdir(), change:

    
The readdir_r() function shall not return directory entries containing empty names.

If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir_r() returns an entry for that file is unspecified.


to:

    
The readdir_r() function shall not return directory entries containing empty names. If entries for dot or dot-dot exist, one entry shall be returned for dot and one entry shall be returned for dot-dot; otherwise, they shall not be returned.

If a directory entry is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir_r() on that directory stream returns that directory entry is unspecified. For all other directory entries in the directory that existed at the time the directory stream was opened or rewound and which have not been removed, successive calls to readdir_r() on that directory stream shall return each such directory entry exactly once before reporting that the end of the directory has been reached, provided that there are no intervening calls to seekdir() and no unspecified behavior caused by performing an operation on an open file description associated with the directory.


(0006819)
geoffclare (manager)
2024-06-17 08:51

Interpretation response
------------------------
The standard states that the posix_getdents() function starts reading at the current file offset in the open file description associated with fildes, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
Elsewhere the standard makes allowances for implementations where directory streams are not implemented using a file descriptor, but this was not extended to the new posix_getdents() function when it was added.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------

After page 920 line 31395 section fdopendir(), add a new paragraph:
If the file descriptor specified by fd is associated with an open file description on which posix_getdents() has previously been used, or for which any associated file descriptor is already associated with a directory stream, the behavior is unspecified.

After page 1567 line 52603 section posix_getdents():
The behavior is unspecified if lseek() is used to set the file offset to a value other than zero or a value returned by a previous call to lseek() on the same open file description.

add these sentences:
The behavior is unspecified if calls to posix_getdents() are made on different file descriptors that refer to the same open file description (for example, before and after a file descriptor is inherited across fork() or the exec family of functions, or is duplicated using dup() or fcntl()), unless lseek() is used to set the file offset to zero in between the calls to posix_getdents(). A single exception to this condition is that after a call to fork(), either the parent or child (but not both) can continue processing the directory using posix_getdents(). Likewise, the behavior is unspecified if in between two calls to posix_getdents() on one file descriptor, the file offset is altered by a call made on a different file descriptor that refers to the same open file description.

At page 1568 line 52613 section posix_getdents(), change:
If a sequence of calls to posix_getdents() is made that reads from offset zero to end-of-file and a file is removed from or added to the directory between the first and last of those calls, whether the sequence of calls returns an entry for that file is unspecified.

to:
If a sequence of calls to posix_getdents() is made that reads from offset zero to end-of-file and a directory entry is removed from or added to the directory between the first and last of those calls, whether the sequence of calls returns that directory entry is unspecified.

After page 1571 line 52758 section posix_getdents(), add a new paragraph to RATIONALE:
The restrictions on the use of different file descriptors that refer to the same open file description are needed in order to enable implementations where directory streams are not implemented using a file descriptor to maintain some internal state related to a particular file descriptor.

At page 1858 line 61299, section readdir(), change:
If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified.

to:
If a directory entry is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() on that directory stream returns that directory entry is unspecified. For all other directory entries in the directory that existed at the time the directory stream was opened or rewound and which have not been removed, successive calls to readdir() on that directory stream shall return each such directory entry exactly once before reporting that the end of the directory has been reached, provided that there are no intervening calls to seekdir() and no unspecified behavior caused by performing an operation on an open file description associated with the directory.

At page 1858 line 61306 section readdir(), change:
the result is undefined.

to:
the result is unspecified.

At page 1859 line 61319 section readdir(), change:
The readdir_r() function shall not return directory entries containing empty names.

If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir_r() returns an entry for that file is unspecified.

to:
The readdir_r() function shall not return directory entries containing empty names. If entries for dot or dot-dot exist, one entry shall be returned for dot and one entry shall be returned for dot-dot; otherwise, they shall not be returned.

If a directory entry is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir_r() on that directory stream returns that directory entry is unspecified. For all other directory entries in the directory that existed at the time the directory stream was opened or rewound and which have not been removed, successive calls to readdir_r() on that directory stream shall return each such directory entry exactly once before reporting that the end of the directory has been reached, provided that there are no intervening calls to seekdir() and no unspecified behavior caused by performing an operation on an open file description associated with the directory.

(0006825)
agadmin (administrator)
2024-06-21 11:47

Interpretation proposed: 21 June 2024

(0006837)
agadmin (administrator)
2024-07-24 14:32
edited on: 2024-07-25 03:57

Interpretation approved: 24 July 2024


Issue History
Date Modified Username Field Change
2024-01-22 15:13 eblake New Issue
2024-01-22 15:13 eblake Name => Eric Blake
2024-01-22 15:13 eblake Organization => Red Hat
2024-01-22 15:13 eblake User Reference => ebb.posix_getdents
2024-01-22 15:13 eblake Section => XSH posix_getdents
2024-01-22 15:13 eblake Page Number => 1567
2024-01-22 15:13 eblake Line Number => 52609
2024-01-22 15:30 eblake Note Added: 0006632
2024-01-22 15:39 eblake Note Edited: 0006632
2024-02-16 10:18 corinna_vinschen Note Added: 0006658
2024-02-29 17:27 geoffclare Note Added: 0006695
2024-02-29 17:29 geoffclare Final Accepted Text => Note: 0006695
2024-02-29 17:29 geoffclare Status New => Resolution Proposed
2024-02-29 17:29 geoffclare Resolution Open => Accepted As Marked
2024-02-29 17:30 geoffclare Tag Attached: tc1-2024
2024-03-04 09:38 corinna_vinschen Note Added: 0006703
2024-03-07 12:28 geoffclare Note Added: 0006708
2024-03-07 15:00 corinna_vinschen Note Added: 0006709
2024-03-07 18:14 geoffclare Note Added: 0006710
2024-03-07 20:24 corinna_vinschen Note Added: 0006711
2024-03-07 20:30 corinna_vinschen Note Added: 0006712
2024-03-08 09:00 geoffclare Note Added: 0006715
2024-03-08 11:08 corinna_vinschen Note Added: 0006716
2024-03-21 16:21 eblake Note Added: 0006721
2024-03-21 16:21 eblake Resolution Accepted As Marked => Reopened
2024-03-21 16:24 eblake Note Edited: 0006721
2024-03-21 16:30 eblake Note Edited: 0006721
2024-03-21 16:31 eblake Final Accepted Text Note: 0006695 => Note: 0006721
2024-03-21 18:58 corinna_vinschen Note Added: 0006722
2024-03-22 03:50 kre Note Added: 0006723
2024-03-22 04:40 kre Note Edited: 0006723
2024-03-22 04:48 kre Note Edited: 0006723
2024-03-22 09:48 geoffclare Note Added: 0006724
2024-03-22 10:30 geoffclare Note Edited: 0006724
2024-03-25 15:20 eblake Note Added: 0006726
2024-03-25 15:21 eblake Final Accepted Text Note: 0006721 => Note: 0006726
2024-03-25 15:21 eblake Resolution Reopened => Accepted As Marked
2024-03-25 15:22 eblake Note Edited: 0006726
2024-06-17 08:19 geoffclare Project Issue 8 drafts => 1003.1(2024)/Issue8
2024-06-17 08:37 geoffclare Line Number 52609 => 52601
2024-06-17 08:37 geoffclare Interp Status => Pending
2024-06-17 08:37 geoffclare Status Resolution Proposed => Interpretation Required
2024-06-17 08:51 geoffclare Note Added: 0006819
2024-06-17 08:53 geoffclare Final Accepted Text Note: 0006726 => Note: 0006819
2024-06-21 11:47 agadmin Interp Status Pending => Proposed
2024-06-21 11:47 agadmin Note Added: 0006825
2024-07-24 14:32 agadmin Interp Status Proposed => Approved
2024-07-24 14:32 agadmin Note Added: 0006837
2024-07-25 03:57 agadmin Note Edited: 0006837

