Viewing Issue Simple Details
ID | Category | Severity | Type | Date Submitted | Last Update
0001635 | [1003.1(2016/18)/Issue7+TC2] Base Definitions and Headers | Editorial | Clarification Requested | 2023-02-21 00:14 | 2023-03-06 16:35
Reporter | steffen | View Status | public
Assigned To |
Priority | normal | Resolution | Open
Status | New
Name | steffen
Organization |
User Reference |
Section | iconv
Page Number | 1123
Line Number | 38014
Interp Status | ---
Final Accepted Text |
Summary | 0001635: iconv: please be more explicit in input-not-convertible case
Description |
Issue 1007 resolves this to:

    If iconv() encounters a character in the input buffer that is valid, but for which an identical character does not exist in the output codeset:

    If either the //IGNORE or the //NON_IDENTICAL_DISCARD indicator suffix was specified when the conversion descriptor cd was opened, the character shall be discarded but shall still be counted in the return value of the iconv() call.

    If the //TRANSLIT indicator suffix was specified when the conversion descriptor cd was opened, an implementation-defined transliteration shall be performed, if possible, to convert the character into one or more characters of the output codeset that best resemble the input character. The character shall be counted as one character in the return value of the iconv() call, regardless of the number of output characters.

    If no indicator suffix was specified when the conversion descriptor cd was opened, or the //TRANSLIT indicator suffix was specified but no transliteration of the character is possible, iconv() shall perform an implementation-defined conversion on the character and it shall be counted in the return value of the iconv() call.

However, as Martin Sebor stated in the issue description:

    The specification for the iconv() function assumes that every input sequence that is valid in the source codeset is convertible to some sequence in the destination codeset. In particular, the specification doesn't allow the function to fail when a valid sequence in the source codeset cannot be represented in the destination codeset. As an example where this assumption doesn't hold, consider a conversion from UTF-8 to ISO-8859 where a large number of source characters don't have equivalents in the destination codeset. A survey of a subset of existing implementations shows that they fail with EILSEQ in such cases, despite the specification defining the error condition as "Input conversion stopped due to an input byte that does not belong to the input codeset."

And this is true: the GNU C library and GNU libiconv seem to stop the conversion immediately with the same EILSEQ error that denotes invalid input data. (A much more drastic error, is it not?)
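To make the ambiguity concrete, here is a minimal sketch that runs a single iconv() call over a buffer and records what the caller gets to see. The struct `conv_result` and the helper `convert_once` are hypothetical names introduced only for this illustration; the codeset names "UTF-8" and "ISO-8859-1" are assumed to be accepted by the local iconv_open().

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper type: everything a caller can observe
   after one iconv() call. */
struct conv_result {
    size_t ret;       /* iconv() return value, (size_t)-1 on error */
    int    err;       /* errno of a failed call, otherwise 0 */
    size_t consumed;  /* input bytes consumed */
    size_t produced;  /* output bytes produced */
};

/* Hypothetical helper: open a descriptor, convert once, close it. */
static struct conv_result
convert_once(const char *to, const char *from,
             const char *in, size_t inlen, char *out, size_t outcap)
{
    struct conv_result r = { (size_t)-1, 0, 0, 0 };
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1) {
        r.err = errno;
        return r;
    }

    char *inptr = (char *)in;   /* iconv() does not modify the input bytes */
    char *outptr = out;
    size_t inleft = inlen, outleft = outcap;

    errno = 0;
    r.ret = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (r.ret == (size_t)-1)
        r.err = errno;
    r.consumed = inlen - inleft;
    r.produced = outcap - outleft;

    iconv_close(cd);
    return r;
}
```

Calling `convert_once("ISO-8859-1", "UTF-8", "\xe2\x82\xac", 3, out, sizeof out)` (a valid U+20AC EURO SIGN, which ISO-8859-1 cannot represent) and `convert_once("ISO-8859-1", "UTF-8", "\xff", 1, out, sizeof out)` (a byte that is not valid UTF-8) can, on implementations that behave as described above, both end in `ret == (size_t)-1` with `err == EILSEQ`, so the result alone does not tell the two cases apart. Other implementations may instead store a replacement character for the first input and return a non-negative count.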
Desired Action |
Please be more explicit and note that implementations exist which behave like GNU C library iconv / GNU libiconv. That is to say that "implementation-defined conversion" may mean no conversion at all, but an immediate stop.

It would be tremendous if the standard could define means by which programmers can react, because, due to the restrictions of the iconv interface, it is impossible to decide what the error was. A programmer knows nothing of the input or the output character set, how many bytes may make up a character, how many were consumed / produced, or whether conversion replacements were stored or not. (In practice all other implementations known to me do place some character and continue.)

This refers to GNU library bug report https://sourceware.org/bugzilla/show_bug.cgi?id=29913, where the honourable author of GNU iconv refers to gnulib source files in which the same approach is implemented portably, it seems, and the cost is tremendous, because of all the shortcomings of the iconv interface! (And YES, the GNU approach has lots of merits! But it should be possible to differentiate between the errors. Better still would be an explicit //CONVERR-STOP-WITH-ENODATA modifier.)

Like approaching cautiously byte-by-byte until a conversion succeeds!

    for (insize = 1; inptr + insize <= inptr_end; insize++)
      {
        res = iconv (cd,
                     (ICONV_CONST char **) &inptr, &insize,
                     &outptr, &outsize);
        if (!(res == (size_t)(-1) && errno == EINVAL))
          break;
        /* iconv can eat up a shift sequence but give EINVAL while attempting
           to convert the first character.  E.g. libiconv does this.  */
        if (inptr > inptr_before)
          {
            res = 0;
            break;
          }
      }

This is ridiculous!
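For comparison, the following sketches one workaround an application might attempt today in order to tell the two EILSEQ causes apart. It is only a heuristic under loud assumptions, and not something the standard or any iconv implementation provides: the enum `conv_failure` and the function `classify_eilseq` are hypothetical names, the source codeset is assumed to be stateless, and "UTF-8" is assumed to be able to represent every character of the source codeset. After the real conversion has failed with EILSEQ at some input position, the same bytes are converted a second time to UTF-8; if at least the first character decodes, the input was presumably valid and merely not representable in the real target.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical classification of an EILSEQ failure. */
enum conv_failure {
    CONV_INVALID_INPUT,      /* bytes are not valid in the source codeset */
    CONV_NOT_REPRESENTABLE,  /* bytes are valid; the target cannot hold them */
    CONV_UNKNOWN             /* probe unavailable, or input truncated */
};

/* Hypothetical heuristic: `in` points at the position where iconv() to
   the real target stopped with EILSEQ.  Assumes a stateless source
   codeset and that UTF-8 can represent all of its characters. */
static enum conv_failure
classify_eilseq(const char *from, const char *in, size_t inlen)
{
    if (inlen == 0)
        return CONV_UNKNOWN;

    iconv_t probe = iconv_open("UTF-8", from);
    if (probe == (iconv_t)-1)
        return CONV_UNKNOWN;

    char scratch[64];
    char *inptr = (char *)in;   /* iconv() does not modify the input bytes */
    char *outptr = scratch;
    size_t inleft = inlen, outleft = sizeof scratch;
    enum conv_failure verdict;

    errno = 0;
    size_t res = iconv(probe, &inptr, &inleft, &outptr, &outleft);
    if (res != (size_t)-1 || inleft < inlen)
        verdict = CONV_NOT_REPRESENTABLE;  /* at least one character decoded */
    else if (errno == EILSEQ)
        verdict = CONV_INVALID_INPUT;      /* the very first bytes are invalid */
    else
        verdict = CONV_UNKNOWN;            /* e.g. EINVAL for truncated input */

    iconv_close(probe);
    return verdict;
}
```

Even under these assumptions the probe costs a second conversion descriptor and a second pass over the failing bytes, which is the kind of overhead a distinct error code or an explicit modifier would make unnecessary.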
Tags | No tags attached.