View Issue Details

ID: 0000616
Project: 1003.1(2008)/Issue 7
Category: System Interfaces
View Status: public
Last Update: 2024-06-11 08:52
Reporter: nick
Assigned To: ajosey
Priority: normal
Severity: Comment
Type: Clarification Requested
Status: Closed
Resolution: Accepted As Marked
Name: Nick Stoughton
Organization: USENIX
User Reference: nms-mbsnrtowcs-002
Section: mbsnrtowcs
Page Number: 1277
Line Number: 41975
Interp Status: ---
Final Accepted Text: 0000616:0001569
Summary: 0000616: mbsnrtowcs clarification
Description: In austin-group-l:archive/latest/17532, Matthew Dempsky posed the following question:

On Ubuntu 10.04, the code below prints "0 2".  This is the behavior
that I think logically makes sense (and that I was intending to
implement for OpenBSD).

However, my reading of the mbsnrtowcs() description in Issue 7 is that the
correct output (assuming "en_US.UTF-8" is a valid UTF-8 based locale)
should be "0 0".

Issue 7 says:

"""
If dst is not a null pointer, the pointer object pointed to by src
shall be assigned either a null pointer (if conversion stopped due to
reaching a terminating null character) or the address just past the
last character converted (if any).
"""

However, in my test program, mbs+2 is in the *middle* of a
[multi-byte] character, not "just past" a [multi-byte] character.
Ubuntu 10.04's behavior would be consistent if the description was
"just past the last input byte consumed".

Am I misunderstanding something?  Or is there a bug in either Ubuntu
10.04's implementation or the POSIX wording?


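#define _POSIX_C_SOURCE 200809L  /* expose mbsnrtowcs() */
#include <wchar.h>
#include <locale.h>
#include <string.h>
#include <stdio.h>

wchar_t wcs[100];
char mbs[100];

int main(void)
{
        /* The test assumes this UTF-8 locale is installed. */
        if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL) {
                fputs("en_US.UTF-8 locale not available\n", stderr);
                return 1;
        }
        /* U+754C encoded in UTF-8 (3 bytes), plus the terminating null. */
        memcpy(mbs, "\xe7\x95\x8c", 4);
        const char *s = mbs;
        /* Allow at most 2 input bytes, so the buffer ends mid-character. */
        printf("%u ", (unsigned)mbsnrtowcs(wcs, &s, 2, 100, NULL));
        /* How far the src pointer was advanced by the call. */
        printf("%u\n", (unsigned)(s - mbs));
        return 0;
}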


Further discussion noted that 'C99 does in fact state that
mbstate_t's conversion state includes tracking "the position within a
multibyte character", so multibyte character string inputs do not
necessarily need to be processed exclusively at multibyte character
boundaries. E.g., it's okay to call mbrtowc() to process one byte at
a time of a multibyte string.'

But more importantly, do any implementations of mbsnrtowcs() print
"0 0"? glibc, FreeBSD, and OS X all print "0 2". If no implementation
actually prints "0 0", then I think it makes sense to revise the
wording for mbsnrtowcs() to "just past the last byte processed"
instead of "just past the last multibyte character converted".
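The point quoted above, that mbrtowc() may be handed a multibyte string one byte at a time while the mbstate_t remembers the partial character, can be illustrated with a short sketch. It assumes that one of the locale names "C.UTF-8" or "en_US.UTF-8" is installed and that wchar_t values in that locale are Unicode code points (as on glibc); the helper names are hypothetical, not part of any standard.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Assumption: one of these locale names is installed; availability is
   system-dependent.  Returns 1 if a UTF-8 LC_CTYPE locale was selected. */
static int select_utf8_locale(void)
{
    static const char *const names[] = { "C.UTF-8", "en_US.UTF-8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}

/* Hypothetical helper: decodes U+754C ("\xe7\x95\x8c" in UTF-8) by passing
   mbrtowc() a single byte per call.  Returns the decoded wide character,
   -1 if no UTF-8 locale could be selected, or -2 if mbrtowc() did not
   report "incomplete" for the first two bytes and "complete" for the last. */
long decode_one_byte_at_a_time(void)
{
    static const char bytes[] = "\xe7\x95\x8c";
    mbstate_t st;
    wchar_t wc = 0;

    if (!select_utf8_locale())
        return -1;
    memset(&st, 0, sizeof st);           /* start in the initial shift state */

    for (size_t i = 0; i < 3; i++) {
        size_t r = mbrtowc(&wc, &bytes[i], 1, &st);
        if (i < 2 && r != (size_t)-2)    /* partial character kept in st */
            return -2;
        if (i == 2 && r != 1)            /* final byte completes it */
            return -2;
    }
    return (long)wc;
}
```

Each of the first two calls returns (size_t)-2 (an incomplete character, with the bytes seen so far recorded in st), and the third call returns 1, the number of bytes needed to complete the character.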

---
Given that a number of implementations do not follow the standard's apparent requirement to process the src string character by character rather than byte by byte, I believe a formal interpretation is required.
Desired Action: As described in 0000601, at page 1277 line 41977 change:

    past the last character converted (if any)

to:

    past the last byte processed (if any)

At page 1277 line 41986 change:

    ... limited to at most nmc bytes (the size of the input buffer).

to (all within the CX shading):

    ... limited to at most nmc bytes (the size of the input buffer).
    If the input buffer ends with an incomplete character,
    conversion shall stop at the end of the input buffer;
    a subsequent call to mbsnrtowcs() with an input
    buffer that starts with the remainder of the incomplete character
    shall correctly complete the conversion of that character.
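The proposed CX wording can be illustrated with a hypothetical UTF-8-only converter, u8_snrtowcs(), which is not a real library function. It is a sketch under simplifying assumptions: a small struct u8state stands in for the implementation-defined mbstate_t, output is written as uint32_t code points rather than wchar_t so that no locale is involved, dst must not be a null pointer, and overlong, surrogate, and range checks are omitted. It reads at most nms bytes, leaves *src just past the last byte processed, and carries an incomplete trailing character in the state so that the next call can finish it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for mbstate_t: the code point assembled so far
   and the number of continuation bytes still expected. */
struct u8state {
    uint32_t partial;
    int need;
};

/* Hypothetical sketch of the proposed semantics, UTF-8 only.  Reads at
   most nms bytes from *src and stores at most len characters in dst.
   Returns the number of characters stored (a stored null is not
   counted), or (size_t)-1 on an invalid sequence; on error *src is left
   unchanged, the state is unspecified, and errno is not set. */
size_t u8_snrtowcs(uint32_t *dst, const char **src, size_t nms,
                   size_t len, struct u8state *st)
{
    const unsigned char *p = (const unsigned char *)*src;
    size_t count = 0;

    while (nms > 0 && count < len) {
        unsigned char b = *p;
        if (st->need == 0) {
            if (b == 0) {                 /* terminating null character */
                dst[count] = 0;
                *src = NULL;
                return count;
            }
            if (b < 0x80)                { st->partial = b;        st->need = 0; }
            else if ((b & 0xE0) == 0xC0) { st->partial = b & 0x1F; st->need = 1; }
            else if ((b & 0xF0) == 0xE0) { st->partial = b & 0x0F; st->need = 2; }
            else if ((b & 0xF8) == 0xF0) { st->partial = b & 0x07; st->need = 3; }
            else return (size_t)-1;       /* invalid lead byte */
        } else {
            if ((b & 0xC0) != 0x80)
                return (size_t)-1;        /* expected a continuation byte */
            st->partial = (st->partial << 6) | (b & 0x3F);
            st->need--;
        }
        p++;                              /* this byte has been processed */
        nms--;
        if (st->need == 0)
            dst[count++] = st->partial;   /* a character is complete */
    }
    *src = (const char *)p;               /* just past the last byte processed */
    return count;
}
```

Run on the three bytes from the test program with nms = 2, the first call stores nothing, returns 0, and advances src by 2 (the "0 2" behavior). A second call with the same state consumes the final byte, stores U+754C, and then sets src to a null pointer on reaching the terminating null.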

    
Assuming that 0000601 is implemented,
at page 1278 line 42008 change FUTURE DIRECTIONS from:

    A future version may require that when the input buffer ends with
    an incomplete character, conversion stops at the end of the input buffer.

to:

    None.

Tags: issue8

Relationships

child of 0000601 (Closed, ajosey): mbsnrtowcs clarification

Activities

geoffclare (manager), 2013-05-03 10:04, bugnote:0001569

New proposed changes which match 0000601:0001568...

At page 1277 line 41986 after applying the changes in 0000601, change:

    If the input buffer ends with an incomplete character, it
    is unspecified whether conversion stops at the end of the previous
    character (if any), or at the end of the input buffer. In the
    latter case, a subsequent call to mbsnrtowcs() with an input
    buffer that starts with the remainder of the incomplete character
    shall correctly complete the conversion of that character.

to:

    If the input buffer ends with an incomplete character,
    conversion shall stop at the end of the input buffer;
    a subsequent call to mbsnrtowcs() with an input
    buffer that starts with the remainder of the incomplete character
    shall correctly complete the conversion of that character.

Assuming that 0000601 is implemented,
at page 1278 line 42008 change FUTURE DIRECTIONS from:

    A future version may require that when the input buffer ends with
    an incomplete character, conversion stops at the end of the input buffer.

to:

    None.

Issue History

Date Modified     Username     Field                   Change
2012-09-26 15:47  nick         New Issue
2012-09-26 15:47  nick         Status                  New => Under Review
2012-09-26 15:47  nick         Assigned To             => ajosey
2012-09-26 15:47  nick         Name                    => Nick Stoughton
2012-09-26 15:47  nick         Organization            => USENIX
2012-09-26 15:47  nick         User Reference          => nms-mbsnrtowcs-002
2012-09-26 15:47  nick         Section                 => mbsnrtowcs
2012-09-26 15:47  nick         Page Number             => 1277
2012-09-26 15:47  nick         Line Number             => 41975
2012-09-26 15:47  nick         Interp Status           => ---
2012-09-26 15:47  nick         Issue generated from    0000601
2012-09-26 15:47  nick         Relationship added      child of 0000601
2012-09-26 15:47  nick         Tag Attached            issue8
2012-09-26 15:50  nick         Desired Action Updated
2012-09-26 15:57  nick         Desired Action Updated
2012-09-26 15:59  jim_pugsley  Status                  Under Review => Resolved
2012-09-26 15:59  jim_pugsley  Resolution              Open => Accepted
2012-09-27 07:26  geoffclare   Desired Action Updated
2013-05-03 10:04  geoffclare   Note Added              0001569
2013-05-03 10:04  geoffclare   Status                  Resolved => Under Review
2013-05-03 10:04  geoffclare   Resolution              Accepted => Reopened
2013-05-16 15:43  msbrown      Final Accepted Text     => 0000616:0001569
2013-05-16 15:43  msbrown      Status                  Under Review => Resolved
2013-05-16 15:43  msbrown      Resolution              Reopened => Accepted As Marked
2020-03-23 10:31  geoffclare   Status                  Resolved => Applied
2024-06-11 08:52  agadmin      Status                  Applied => Closed