ID | Category | Severity | Type | Date Submitted | Last Update
0000616 | [1003.1(2008)/Issue 7] System Interfaces | Comment | Clarification Requested | 2012-09-26 15:47 | 2024-06-11 08:52
Reporter | nick | View Status | public
Assigned To | ajosey
Priority | normal | Resolution | Accepted As Marked
Status | Closed
Name | Nick Stoughton
Organization | USENIX
User Reference | nms-mbsnrtowcs-002
Section | mbsnrtowcs
Page Number | 1277
Line Number | 41975
Interp Status | ---
Final Accepted Text | Note: 0001569
Summary | 0000616: mbsnrtowcs clarification
Description |
In austin-group-l:archive/latest/17532 Matthew Dempsky posed the following question:

On Ubuntu 10.04, the code below prints "0 2". This is the behavior that I think logically makes sense (and that I was intending to implement for OpenBSD). However, my reading of mbsnrtowcs() description in Issue 7 is that the correct output (assuming "en_US.UTF-8" is a valid UTF-8 based locale) should be "0 0".

Issue 7 says:

"""
If dst is not a null pointer, the pointer object pointed to by src shall be assigned either a null pointer (if conversion stopped due to reaching a terminating null character) or the address just past the last character converted (if any).
"""

However, in my test program, mbs+2 is in the *middle* of a [multi-byte] character, not "just past" a [multi-byte] character. Ubuntu 10.04's behavior would be consistent if the description was "just past the last input byte consumed".

Am I misunderstanding something? Or is there a bug in either Ubuntu 10.04's implementation or the POSIX wording?

    #include <wchar.h>
    #include <locale.h>
    #include <string.h>
    #include <stdio.h>

    wchar_t wcs[100];
    char mbs[100];

    int
    main()
    {
        setlocale(LC_CTYPE, "en_US.UTF-8");
        memcpy(mbs, "\xe7\x95\x8c", 4);
        const char *s = mbs;
        printf("%u ", (unsigned)mbsnrtowcs(wcs, &s, 2, 100, NULL));
        printf("%u\n", (unsigned)(s - mbs));
    }

Further discussion noted that 'C99 does in fact state that mbstate_t's conversion state includes tracking "the position within a multibyte character", so multibyte character string inputs do not necessarily need to be processed exclusively at multibyte character boundaries. E.g., it's okay to call mbrtowc() to process one byte at a time of a multibyte string.'

But more importantly, do any implementations of mbsnrtowcs() print "0 0"? Glibc, FreeBSD, and OS X all print "0 2".

If no implementation actually prints "0 0", then I think it makes sense to revise the wording for mbsnrtowcs() to "just past the last byte processed" instead of "just past the last multibyte character converted".

---

Given that a number of implementations do not follow the apparent requirements of the standard to process the src string character by character rather than byte by byte, I believe a formal interpretation is required.
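
The remark quoted above, that mbrtowc() may be fed one byte at a time, can be illustrated with a minimal sketch. The helper names use_utf8_locale() and count_incomplete_steps() are hypothetical and not part of the report, and the locale names tried are assumptions about what the host system provides.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: try a few common UTF-8 locale names (assumed, may not exist). */
static int use_utf8_locale(void)
{
    static const char *const names[] = { "en_US.UTF-8", "C.UTF-8", "en_US.utf8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 0;
    return -1;
}

/*
 * Hypothetical helper: feed the bytes of one multibyte character to
 * mbrtowc() one byte per call, keeping the partial character in st.
 * Returns how many calls reported (size_t)-2 (incomplete character)
 * before the character completed, or -1 on any failure.
 */
static int count_incomplete_steps(const char *mb, size_t len, wchar_t *out)
{
    mbstate_t st;
    int incomplete = 0;

    if (use_utf8_locale() != 0)
        return -1;                       /* no UTF-8 locale available */
    memset(&st, 0, sizeof st);           /* initial conversion state */

    for (size_t i = 0; i < len; i++) {
        size_t r = mbrtowc(out, mb + i, 1, &st);
        if (r == (size_t)-2)
            incomplete++;                /* byte consumed, character not complete yet */
        else if (r == (size_t)-1)
            return -1;                   /* invalid sequence */
        else
            return incomplete;           /* character completed on this byte */
    }
    return -1;                           /* ran out of bytes mid-character */
}
```

For the three-byte sequence "\xe7\x95\x8c" from the test program, this sketch would be expected to report two incomplete steps before the third byte completes the character, provided a UTF-8 locale can be selected.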
Desired Action |
As described in 0000601, at page 1277 line 41977 change:

    past the last character converted (if any)

to:

    past the last byte processed (if any)

At page 1277 line 41986 change:

    ... limited to at most nmc bytes (the size of the input buffer).

to (all within the CX shading):

    ... limited to at most nmc bytes (the size of the input buffer). If the input buffer ends with an incomplete character, conversion shall stop at the end of the input buffer; a subsequent call to mbsnrtowcs() with an input buffer that starts with the remainder of the incomplete character shall correctly complete the conversion of that character.

Assuming that 0000601 is implemented, at page 1278 line 42008 change FUTURE DIRECTIONS from:

    A future version may require that when the input buffer ends with an incomplete character, conversion stops at the end of the input buffer.

to:

    None.
Tags | issue8
Notes | |
(0001569) geoffclare (manager) 2013-05-03 10:04 |
New proposed changes which match Note: 0001568...

At page 1277 line 41986 after applying the changes in 0000601, change:

    If the input buffer ends with an incomplete character, it is unspecified whether conversion stops at the end of the previous character (if any), or at the end of the input buffer. In the latter case, a subsequent call to mbsnrtowcs() with an input buffer that starts with the remainder of the incomplete character shall correctly complete the conversion of that character.

to:

    If the input buffer ends with an incomplete character, conversion shall stop at the end of the input buffer; a subsequent call to mbsnrtowcs() with an input buffer that starts with the remainder of the incomplete character shall correctly complete the conversion of that character.

Assuming that 0000601 is implemented, at page 1278 line 42008 change FUTURE DIRECTIONS from:

    A future version may require that when the input buffer ends with an incomplete character, conversion stops at the end of the input buffer.

to:

    None.
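
The resumption behaviour described in this note can be sketched as follows. This is only an illustration under stated assumptions: convert_split_input() and use_utf8_locale() are hypothetical names, the locale names are guesses about the host system, and the second call deliberately restarts from wherever the first call left src, so the sketch also tolerates an implementation that stops at the end of the previous character.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: try a few common UTF-8 locale names (assumed, may not exist). */
static int use_utf8_locale(void)
{
    static const char *const names[] = { "en_US.UTF-8", "C.UTF-8", "en_US.utf8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 0;
    return -1;
}

/*
 * Hypothetical helper: convert one 3-byte UTF-8 character whose input
 * buffer is cut after 2 bytes, then finish it with a second call that
 * shares the same mbstate_t.
 * Stores how far src advanced during the first call in *advanced_first
 * and returns the total number of wide characters produced, or -1.
 */
static int convert_split_input(size_t *advanced_first)
{
    const char mbs[] = "\xe7\x95\x8c";   /* same bytes as the test program */
    wchar_t wcs[4];
    mbstate_t st;
    const char *s = mbs;
    size_t n1, n2, rest;

    if (use_utf8_locale() != 0)
        return -1;                        /* no UTF-8 locale available */
    memset(&st, 0, sizeof st);            /* initial conversion state */

    /* First call: only the first 2 bytes of the character are available. */
    n1 = mbsnrtowcs(wcs, &s, 2, 4, &st);
    if (n1 == (size_t)-1 || s == NULL)
        return -1;
    *advanced_first = (size_t)(s - mbs);

    /* Second call: continue from where src was left, with the bytes that remain. */
    rest = sizeof mbs - 1 - *advanced_first;
    n2 = mbsnrtowcs(wcs + n1, &s, rest, 4 - n1, &st);
    if (n2 == (size_t)-1)
        return -1;

    return (int)(n1 + n2);
}
```

With the wording accepted in this note, the first call would be expected to leave src two bytes past the start (matching the "0 2" output reported for Glibc, FreeBSD, and OS X), and the two calls together should yield exactly one wide character.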
Issue History | |||
Date Modified | Username | Field | Change |
2012-09-26 15:47 | nick | New Issue | |
2012-09-26 15:47 | nick | Status | New => Under Review |
2012-09-26 15:47 | nick | Assigned To | => ajosey |
2012-09-26 15:47 | nick | Name | => Nick Stoughton |
2012-09-26 15:47 | nick | Organization | => USENIX |
2012-09-26 15:47 | nick | User Reference | => nms-mbsnrtowcs-002 |
2012-09-26 15:47 | nick | Section | => mbsnrtowcs |
2012-09-26 15:47 | nick | Page Number | => 1277 |
2012-09-26 15:47 | nick | Line Number | => 41975 |
2012-09-26 15:47 | nick | Interp Status | => --- |
2012-09-26 15:47 | nick | Issue generated from | 0000601 |
2012-09-26 15:47 | nick | Relationship added | child of 0000601 |
2012-09-26 15:47 | nick | Tag Attached: issue8 | |
2012-09-26 15:50 | nick | Desired Action Updated | |
2012-09-26 15:57 | nick | Desired Action Updated | |
2012-09-26 15:59 | jim_pugsley | Status | Under Review => Resolved |
2012-09-26 15:59 | jim_pugsley | Resolution | Open => Accepted |
2012-09-27 07:26 | geoffclare | Desired Action Updated | |
2013-05-03 10:04 | geoffclare | Note Added: 0001569 | |
2013-05-03 10:04 | geoffclare | Status | Resolved => Under Review |
2013-05-03 10:04 | geoffclare | Resolution | Accepted => Reopened |
2013-05-16 15:43 | msbrown | Final Accepted Text | => Note: 0001569 |
2013-05-16 15:43 | msbrown | Status | Under Review => Resolved |
2013-05-16 15:43 | msbrown | Resolution | Reopened => Accepted As Marked |
2020-03-23 10:31 | geoffclare | Status | Resolved => Applied |
2024-06-11 08:52 | agadmin | Status | Applied => Closed |