|Viewing Issue Simple Details|
|ID||Category||Severity||Type||Date Submitted||Last Update|
|0000708||[1003.1(2013)/Issue7+TC1] System Interfaces||Editorial||Enhancement Request||2013-06-07 21:23||2013-06-13 15:35|
|Section||XSH 2.9.1 Thread-Safety|
|Final Accepted Text|
|Summary||0000708: Make mblen, mbtowc, and wctomb thread-safe for alignment with C11|
Per C11 7.1.4 paragraph 5,
"Unless explicitly stated otherwise in the detailed descriptions that follow, library functions shall prevent data races as follows: A library function shall not directly or indirectly access objects accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function's arguments. A library function shall not directly or indirectly modify objects accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function's non-const arguments. Implementations may share their own internal objects between threads if the objects are not visible to users and are protected against data races."
7.22.7 (Multibyte/wide character conversion functions) contains no statement exempting these functions from the requirement to avoid data races with other calls. The only time they would even potentially be subject to data races is for state-dependent encodings, which are all but obsolete; for single-byte or modern multi-byte (i.e. UTF-8) encodings, these functions are pure.
Note that 7.29.6.3 (Restartable multibyte/wide character conversion functions) does make an exception: the "r" versions of these functions are not required to avoid data races when the state argument is NULL.
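To illustrate the exception for a null state argument, here is a minimal sketch in C. The helper name decode_first is hypothetical and not part of any standard. It passes a caller-owned mbstate_t to mbrtowc, so the call does not touch the function's internal static state object, which is the object the exception is about.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode the first character of s (at most n bytes)
 * using a conversion state that lives in this call's stack frame.
 * Returns whatever mbrtowc returns: the number of bytes consumed,
 * 0 for a null character, (size_t)-1 on an invalid sequence, or
 * (size_t)-2 on an incomplete one. */
size_t decode_first(wchar_t *out, const char *s, size_t n)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);      /* a zeroed mbstate_t is the initial conversion state */
    return mbrtowc(out, s, n, &st); /* non-null ps: the internal static state is not used */
}
```

Passing NULL in place of &st would make mbrtowc fall back to its internal state object, which is the case where C11 does not require it to avoid data races with other calls.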
Remove mblen, mbtowc, and wctomb from the list of functions which are not required to be thread-safe.
It seems odd that C11 would have different thread-safety requirements for mbrlen, mbrtowc, and wcrtomb with a null state argument than for mblen, mbtowc, and wctomb. We should query this with the C committee, as it may well be unintentional.
I think there's a very good reason for the discrepancy: the restartable versions can store a partially-decoded character in the mbstate_t object, so even for state-independent encodings, there is state which would need to be protected against data races. The non-restartable versions, on the other hand, are pure except in the case of state-dependent encodings, which are mostly a relic of the past and which were never supported on most POSIX systems, since these encodings are mostly incompatible with POSIX filesystem semantics. Only implementations supporting such encodings (which might not even exist - can anyone confirm?) would incur the burden of avoiding data races. Note that these functions give applications access to information on whether the locale's encoding is state-dependent, so a portable application could use the restartable interfaces when the locale is state-dependent, and the non-restartable ones otherwise.
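The selection between the two interface families described above could be sketched roughly as follows. This is an illustration only: count_chars is a hypothetical helper, and calling mblen from several threads is safe only under the requirement this issue proposes, not under the current list in XSH 2.9.1. The probe relies on the documented behaviour that mblen with a null pointer returns nonzero if the encoding is state-dependent (and resets the internal shift state as a side effect).

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in the multibyte string s.
 * Returns (size_t)-1 on an invalid or incomplete sequence. */
size_t count_chars(const char *s)
{
    size_t left = strlen(s), count = 0;
    /* Nonzero result means the locale's encoding is state-dependent. */
    int stateful = mblen(NULL, 0) != 0;
    mbstate_t st;
    memset(&st, 0, sizeof st);      /* initial conversion state for the restartable path */

    while (left > 0) {
        size_t k;
        if (stateful) {
            /* State-dependent encoding: use the restartable interface
             * with a state object owned by this call. */
            k = mbrlen(s, left, &st);
            if (k == (size_t)-1 || k == (size_t)-2)
                return (size_t)-1;
        } else {
            /* State-independent encoding: the non-restartable
             * interface has no shift state to race on. */
            int r = mblen(s, left);
            if (r < 0)
                return (size_t)-1;
            k = (size_t)r;
        }
        if (k == 0)                 /* null character: stop counting */
            break;
        s += k;
        left -= k;
        count++;
    }
    return count;
}
```

In the default "C" locale, count_chars("abc") should yield 3; a real application would call setlocale first and might cache the result of the probe rather than repeating it for every string.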
As to the motivation behind my request for this change, I have spent a good deal of time investigating the performance bottlenecks in character-at-a-time multibyte processing, and it turns out that there is a fundamental bottleneck in the restartable interfaces due to their interface requirements for handling the ps argument and partially-decoded characters. For applications which don't need partial-character processing capability, I believe it would make sense to encourage a transition to the non-restartable interfaces, but of course this is problematic if the non-restartable interfaces are not thread-safe. In my experiments, I found the non-restartable interfaces capable of reaching roughly a 50% performance advantage over the restartable ones; this difference would of course become even more extreme if the core decoding algorithms were further optimized.
This will be raised as a potential defect with the C committee, and any decision on how to proceed should be made there first.
|2013-06-07 21:23||dalias||New Issue|
|2013-06-07 21:23||dalias||Name||=> Rich Felker|
|2013-06-07 21:23||dalias||Organization||=> musl libc|
|2013-06-07 21:23||dalias||Section||=> XSH 2.9.1 Thread-Safety|
|2013-06-07 21:23||dalias||Page Number||=> unknown|
|2013-06-07 21:23||dalias||Line Number||=> unknown|
|2013-06-08 09:12||geoffclare||Note Added: 0001647|
|2013-06-08 09:13||geoffclare||Tag Attached: C11|
|2013-06-08 12:14||dalias||Note Added: 0001648|
|2013-06-13 15:35||nick||Note Added: 0001651|
|2013-12-03 21:09||torvald||Issue Monitored: torvald|