Austin Group Defect Tracker



ID 0000073
Category [1003.1(2008)/Issue 7] System Interfaces
Severity Comment
Type Clarification Requested
Date Submitted 2009-06-28 15:20
Last Update 2019-06-10 08:55
Reporter nick
View Status public
Assigned To ajosey
Priority normal
Resolution Accepted As Marked
Status Closed
Name Nick Stoughton
Organization USENIX
User Reference nms-C-wmemcmp
Section wmemcmp
Page Number 2254
Line Number 70784-70789
Interp Status Approved
Final Accepted Text Note: 0001089
Summary 0000073: wmemcmp C conflict?
Description This issue is for tracking purposes.

The following question is currently being discussed in the C committee. It highlights a difference between C90 with AMD1 and C99. POSIX has followed the C90 AMD1 wording, and so is possibly at odds with C99.


===> from Joseph Myers

When are wide string library functions required to handle values of type
wchar_t that do not represent any value in the execution character set,
and when does using such values with a library function result in
undefined behavior?

Consider the following testcase as an example:

#include <stdlib.h>
#include <wchar.h>

wchar_t w0 = WCHAR_MIN;
wchar_t w1 = WCHAR_MAX;

int
main (void)
{
  if (wmemcmp (&w0, &w1, 1) < 0)
    return 0;
  else
    abort ();
}

Suppose that WCHAR_MIN and WCHAR_MAX do not both represent values in the
execution character set. If the arguments to wmemcmp are valid, wmemcmp
must return a value less than 0 because 7.24.4.4 says the comparison is
done the same way as comparing integers of type wchar_t, so the program
must execute successfully. With the GNU C Library, however, it aborts;
wchar_t is UTF-32 but has a signed type so WCHAR_MIN is negative and does
not represent a member of the execution character set.

C90 AMD1 had an explicit statement (7.16.4.6) that made clear that these
inputs were valid (and so wmemcmp had to return a value less than 0 for
the above example in C90 AMD1):

  These functions operate on arrays of type wchar_t whose size is
  specified by a separate count argument. These functions are not
  affected by locale and all wchar_t values are treated identically. The
  null wide character and wchar_t values not corresponding to valid
  multibyte characters are not treated specially.

I cannot however find any equivalent statement in C99. Was this a
deliberate change from AMD1, or a side-effect of how the functions were
rearranged when added to C99?

POSIX repeats the above requirement from C90 AMD1, but I believe this is
an accident of taking the specification from there originally and is not
intended to impose any requirements beyond those of C99.

Much the same issue applies to wcscmp and wcsncmp, where the comparison
semantics are specified but AMD1 has no mention of wide characters not
corresponding to members of the execution character set, and in principle
to other wcs* and wmem* functions that have no reason to need to consider
the semantics of the characters they process (but are less likely than the
comparison functions to have problems with the full set of wchar_t values
in practice).
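
To make the two possible orderings concrete, the sketch below contrasts an element-wise comparison on integers of type wchar_t, as 7.24.4.4 describes, with a comparison on unsigned code points. Both function names are hypothetical, and the second function is only an assumed example of how a result could differ when wchar_t is signed; it is not taken from any particular C library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical reference comparison: orders elements as integers of
   type wchar_t and treats no value specially. */
int ref_wmemcmp(const wchar_t *a, const wchar_t *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}

/* Hypothetical variant (an assumption for illustration only): orders
   elements after converting them to an unsigned type, so a negative
   wchar_t value sorts above every non-negative one. */
int codepoint_wmemcmp(const wchar_t *a, const wchar_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint_least32_t x = (uint_least32_t)a[i];
        uint_least32_t y = (uint_least32_t)b[i];
        if (x != y)
            return x < y ? -1 : 1;
    }
    return 0;
}
```

Applied to w0 and w1 from the testcase above, ref_wmemcmp(&w0, &w1, 1) is negative on every implementation. Where wchar_t is signed, codepoint_wmemcmp(&w0, &w1, 1) is positive; where wchar_t is unsigned, the two functions agree.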
Desired Action Await a decision from the C committee and, if necessary, make whatever change is needed to align with the emerging C standard.

Issue an interp to describe the discrepancy.
Tags c99, tc2-2008

-  Notes
(0000280)
nick (manager)
2009-11-05 16:20

The C committee are considering this.
(0000603)
nick (manager)
2010-11-05 13:23

The C committee has taken no action on this. It should become a ballot comment on the upcoming C1x CD ballot.
(0001080)
ajosey (manager)
2011-12-15 16:12

Joseph Myers reports in mail sequence 16937:

My comments BSI 14 on this issue were accepted at
the London WG14 meeting and so C1X has the wording "Arguments to the
functions in this subclause may point to arrays containing wchar_t values
that do not correspond to members of the extended character set. Such
values shall be processed according to the specified semantics, except
that it is unspecified whether an encoding error occurs if such a value
appears in the format string for a function in 7.29.2 or 7.29.5 and the
specified semantics do not require that value to be processed by wcrtomb."
(7.29.1#5).
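
The encoding error that this wording refers to can be observed through wcrtomb() directly. The following is a minimal sketch, assuming a hypothetical helper named wc_is_encodable; which wchar_t values it accepts depends on the current locale and on the implementation.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: reports whether wcrtomb() can convert wc to a
   multibyte sequence in the current locale. On an encoding error
   wcrtomb() returns (size_t)-1 and sets errno to EILSEQ. */
bool wc_is_encodable(wchar_t wc)
{
    char buf[MB_LEN_MAX];          /* large enough for any single character */
    mbstate_t st;

    memset(&st, 0, sizeof st);     /* start from the initial conversion state */
    return wcrtomb(buf, wc, &st) != (size_t)-1;
}
```

A value for which this helper returns false is one that the wmem* and wcs* functions must still process according to their specified semantics, while a formatted wide-character function may or may not report an encoding error if the value appears in its format string.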
(0001083)
nick (manager)
2011-12-15 18:09
edited on: 2012-01-10 03:04

I believe that the following should be added to XBD Page 454, line 15482:

"Arguments to functions in this list can point to arrays containing wchar_t
values that do not correspond to members of the character set of the current locale. Such values shall be processed according to the specified semantics, unless otherwise stated."

Add the following sentence
"It is unspecified whether an encoding error occurs if the format string
contains wchar_t values that do not correspond to members of the character set of the current locale and the specified semantics do not require that value to be processed by wcrtomb()."

to:

page 973 line 32587 (fwprintf, swprintf, wprintf)
page 983 line 32960 (fwscanf, swscanf, wscanf)

Add the following sentence
"It is unspecified whether an encoding error occurs if the format string
contains wchar_t values that do not correspond to members of the character set of the current locale." to page 2207 line 69521 (wcsftime)

----
Since these changes align with C11, I do not believe that any of them need to be CX shaded. I believe that this change should be covered by an interpretation request (defer to another standard) and resolved in TC2.

(0001089)
geoffclare (manager)
2012-01-12 16:15

Interpretation response
------------------------
The standard is unclear on this issue, and no conformance distinction can be made between alternative implementations based on this. This is being referred to the sponsor.

Rationale:
-------------
A clarification has been made in the C11 standard and POSIX will adopt this wording in the next Technical Corrigendum. Since defect reports are no longer accepted against C99 this change in C11 is being taken as if it were a response to a C99 defect report.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
See Note: 0001083
(0001303)
ajosey (manager)
2012-06-29 16:20

Interpretation proposed 29 June 2012 for final 45 day review
(0001336)
ajosey (manager)
2012-08-30 09:09

Interpretation approved 30 Aug 2012

- Issue History
Date Modified Username Field Change
2009-06-28 15:20 nick New Issue
2009-06-28 15:20 nick Status New => Under Review
2009-06-28 15:20 nick Assigned To => ajosey
2009-06-28 15:20 nick Name => Nick Stoughton
2009-06-28 15:20 nick Organization => USENIX
2009-06-28 15:20 nick User Reference => nms-C-wmemcmp
2009-06-28 15:20 nick Section => wmemcmp
2009-06-28 15:20 nick Page Number => 2254
2009-06-28 15:20 nick Line Number => 70784-70789
2009-06-28 15:22 nick Tag Attached: real bug not in aardvark
2009-08-06 16:19 nick Tag Attached: c99
2009-08-13 15:37 msbrown Tag Detached: real bug not in aardvark
2009-11-05 16:20 nick Note Added: 0000280
2010-11-05 13:23 nick Note Added: 0000603
2010-11-05 14:03 nick Note Added: 0000604
2010-11-05 14:03 nick Note Deleted: 0000604
2011-12-15 16:12 ajosey Note Added: 0001080
2011-12-15 18:09 nick Note Added: 0001083
2012-01-08 18:44 nick Note Edited: 0001083
2012-01-08 18:44 nick Note View State: public: 1083
2012-01-10 03:04 nick Note Edited: 0001083
2012-01-12 16:15 geoffclare Interp Status => Pending
2012-01-12 16:15 geoffclare Note Added: 0001089
2012-01-12 16:15 geoffclare Status Under Review => Interpretation Required
2012-01-12 16:15 geoffclare Resolution Open => Accepted As Marked
2012-01-12 16:16 geoffclare Final Accepted Text => Note: 0001089
2012-01-12 16:16 geoffclare Tag Attached: tc2-2008
2012-06-29 16:20 ajosey Interp Status Pending => Proposed
2012-06-29 16:20 ajosey Note Added: 0001303
2012-08-30 09:09 ajosey Interp Status Proposed => Approved
2012-08-30 09:09 ajosey Note Added: 0001336
2019-06-10 08:55 agadmin Status Interpretation Required => Closed

