Austin Group Defect Tracker

Aardvark Mark IV


ID 0001832
Category [Issue 8 drafts] System Interfaces
Severity Comment
Type Enhancement Request
Date Submitted 2024-05-24 21:48
Last Update 2024-05-27 20:04
Reporter alanc View Status public  
Assigned To
Priority normal Resolution Open  
Status New   Product Version
Name Alan Coopersmith
Organization
User Reference
Section System Interfaces
Page Number (page or range of pages)
Line Number (Line or range of lines)
Final Accepted Text
Summary 0001832: Add preadv() and pwritev()
Description Many implementations offer preadv() and pwritev() interfaces, which are like
the existing readv() and writev() APIs, except that they use specified
positions instead of the current file offsets, just like the existing
pread() and pwrite() versions of read() and write().
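
To illustrate the difference from readv()/writev(), here is a minimal sketch of a scatter/gather round trip at an explicit offset. The helper names write_pair_at() and read_pair_at() are hypothetical, and the _DEFAULT_SOURCE feature-test macro is an assumption that applies to glibc; other implementations may expose the declarations differently.

```c
#define _DEFAULT_SOURCE   /* assumption: exposes preadv()/pwritev() on glibc */
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: gather two buffers into one write at `offset`.
 * The current file offset of fd is not consulted. */
ssize_t write_pair_at(int fd, const void *a, size_t alen,
                      const void *b, size_t blen, off_t offset)
{
    struct iovec iov[2];
    iov[0].iov_base = (void *)a;   /* iov_base is non-const even for writes */
    iov[0].iov_len  = alen;
    iov[1].iov_base = (void *)b;
    iov[1].iov_len  = blen;
    return pwritev(fd, iov, 2, offset);
}

/* Hypothetical helper: scatter one read at `offset` into two buffers. */
ssize_t read_pair_at(int fd, void *a, size_t alen,
                     void *b, size_t blen, off_t offset)
{
    struct iovec iov[2];
    iov[0].iov_base = a;
    iov[0].iov_len  = alen;
    iov[1].iov_base = b;
    iov[1].iov_len  = blen;
    return preadv(fd, iov, 2, offset);
}
```

As with pread() and pwrite(), a caller can check with lseek(fd, 0, SEEK_CUR) before and after that the file offset was left alone.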

FreeBSD:
 - https://man.freebsd.org/cgi/man.cgi?query=preadv
 - https://man.freebsd.org/cgi/man.cgi?query=pwritev
illumos:
 - https://illumos.org/man/2/preadv
 - https://illumos.org/man/2/pwritev
Linux (GNU libc):
 - https://man7.org/linux/man-pages/man2/preadv.2.html
NetBSD:
 - https://man.netbsd.org/preadv.2
 - https://man.netbsd.org/pwritev.2
OpenBSD:
 - https://man.openbsd.org/preadv.2
 - https://man.openbsd.org/pwritev.2
Solaris:
 - (man pages not online yet, just added in 11.4.69 in May 2024)

Known consumers include PostgreSQL and libuv, as discussed in the thread at
https://twitter.com/MengTangmu/status/1729704220368990235 .
Desired Action Add preadv() and pwritev() to the System Interfaces in Issue 9.
Tags No tags attached.

-  Notes
(0006800)
Guy Harris (reporter)
2024-05-24 21:56

macOS has it as well (formatted man pages not online at an Apple site, but their GitHub repository has it in the read(2) man page at https://opensource.apple.com/source/xnu/xnu-7195.60.75/bsd/man/man2/read.2.auto.html).
(0006803)
alanc (reporter)
2024-05-25 23:10

If added, preadv() and pwritev() should also be considered to be added to
the list of Cancellation Points in the Thread Cancellation section, to
match readv/writev/pread/pwrite.
(0006804)
philip-guenther (reporter)
2024-05-26 00:03

I suggest that preadv/pwritev be documented as functions that MAY be cancellation points rather than MUST be, and even that a future direction be added forbidding them from being cancellation points.

The argument is that being a cancellation point is proper for 'slow' operations, which may block for an arbitrary length of time and where EINTR would otherwise be an appropriate return-value**. For read/write those conditions can occur on sockets, pipes, FIFOs, and ttys...but none of those are valid for preadv/pwritev!

Yes, this argument applies to pread/pwrite too. IMHO, the standard's current requirement for them should be weakened to a MAY as well, with a future direction to completely remove permission for them to be cancellation points.

Compare this with fcntl() which is only required to be a cancellation point when the cmd is F_SETLKW, the only 'unbounded blocking' call. For pread/pwrite/preadv/pwritev we know they can't exhibit that: why is it natural for them to be cancellation points?


** Yes, yes, that's not the precise statement of when cancellation would be expected to be tested, but hopefully the point is clear
(0006805)
Guy Harris (reporter)
2024-05-26 02:43

> and even to have a future direction that they be forbidden from being cancellation points.

Given that macOS has both "preadv" and "preadv_nocancel" system calls, and that the GNU libc used in most Linux systems has both "preadv()" and "__preadv_nocancel()" routines, it's likely that "preadv()" is a cancellation point on both those OSes, so that might cause problems.

Furthermore, on both those OSes (and most if not all other UN*Xes), "slow" operations can occur on special files other than ttys, *and* those special files might use the seek offset, so there might well be cases where making preadv() not a cancellation point would cause problems, even if it doesn't cause problems for POSIX-conformant programs.

So I would recommend against that. Instead, if there is a demand for non-cancellation-point calls, I would recommend, for a future direction, that _nocancel versions of calls be added. I don't know why neither macOS nor GNU(libc)/Linux provides _nocancel versions of those APIs, but, other than the namespace pollution issues, it would be trivial to provide them, at least in those OSes.
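
For illustration only, an application can approximate a non-cancellation-point variant today by disabling cancellation around the call with pthread_setcancelstate(). The name preadv_nocancel_sketch() is hypothetical and is not an API on any of the systems mentioned; the _DEFAULT_SOURCE macro is an assumption for glibc. This is a sketch of the idea, not how macOS or GNU libc implement their internal _nocancel routines.

```c
#define _DEFAULT_SOURCE   /* assumption: exposes preadv() on glibc */
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical wrapper: call preadv() with cancellation disabled, then
 * restore the caller's previous cancelability state. */
ssize_t preadv_nocancel_sketch(int fd, const struct iovec *iov,
                               int iovcnt, off_t offset)
{
    int old_state;
    int ignored;
    int rc = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    if (rc != 0) {
        errno = rc;            /* pthread functions return the error number */
        return -1;
    }

    ssize_t n = preadv(fd, iov, iovcnt, offset);
    int saved_errno = errno;   /* keep preadv()'s errno across the restore */

    pthread_setcancelstate(old_state, &ignored);
    errno = saved_errno;
    return n;
}
```

A pending cancellation request is not lost by this wrapper; it is acted on at the next cancellation point the thread reaches after its previous state is restored.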
(0006806)
steffen (reporter)
2024-05-27 20:04

This remark really belongs to Rich Felker of musl, who I believe also wrote the Linux kernel patch, but I want to add a note. Or rather, let me paste parts of a musl commit message instead:

    POSIX requires pwrite to honor the explicit file offset where the
    write should take place even if the file was opened as O_APPEND.
    however, linux historically defined the pwrite syscall family as
    honoring O_APPEND. this cannot be changed on the kernel side due to
    stability policy, but the addition of the pwritev2 syscall with a
    flags argument opened the door to fixing it

    this patch changes the pwrite function to first attempt using the
    pwritev2 syscall with RWF_NOAPPEND, falling back to using the old
    pwrite syscall only after checking that O_APPEND is not set for the
    open file. if O_APPEND is set, the operation fails with EOPNOTSUPP,
    reflecting that the kernel does not support the correct behavior. this
    is an extended error case needed to avoid the wrong behavior that
    happened before (writing the data at the wrong location), and is
    aligned with the spirit of the POSIX requirement that "An attempt to
    perform a pwrite() on a file that is incapable of seeking shall result
    in an error."

    since the pwritev2 syscall interprets the offset of -1 as a request to
    write at the current file offset, it is mapped to a different negative
    value that will produce the expected error.

    pwritev, though not governed by POSIX at this time, is adjusted to
    match pwrite in honoring the offset.
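
The fallback path described in the commit message above can be sketched roughly as follows. This is not musl's code: the helper name pwrite_refuse_append() is hypothetical, the pwritev2/RWF_NOAPPEND first attempt is deliberately left out, and only the "check O_APPEND, then fail with EOPNOTSUPP" step is shown.

```c
#define _DEFAULT_SOURCE   /* assumption: feature-test macro for glibc */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: refuse a positioned write on an O_APPEND descriptor
 * rather than risk the data landing at end of file. */
ssize_t pwrite_refuse_append(int fd, const void *buf, size_t len, off_t offset)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;             /* errno already set by fcntl(), e.g. EBADF */

    if (flags & O_APPEND) {
        errno = EOPNOTSUPP;    /* the explicit offset cannot be honored */
        return -1;
    }
    return pwrite(fd, buf, len, offset);
}
```

On a descriptor opened without O_APPEND this behaves like plain pwrite(); on one opened with O_APPEND it returns -1 with errno set to EOPNOTSUPP.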

- Issue History
Date Modified Username Field Change
2024-05-24 21:48 alanc New Issue
2024-05-24 21:48 alanc Name => Alan Coopersmith
2024-05-24 21:48 alanc Section => System Interfaces
2024-05-24 21:48 alanc Page Number => (page or range of pages)
2024-05-24 21:48 alanc Line Number => (Line or range of lines)
2024-05-24 21:56 Guy Harris Note Added: 0006800
2024-05-25 23:10 alanc Note Added: 0006803
2024-05-26 00:03 philip-guenther Note Added: 0006804
2024-05-26 02:43 Guy Harris Note Added: 0006805
2024-05-27 20:04 steffen Note Added: 0006806

