|Viewing Issue Simple Details|
|ID||Category||Severity||Type||Date Submitted||Last Update|
|0001356||[Issue 8 drafts] Base Definitions and Headers||Editorial||Enhancement Request||2020-06-29 12:47||2020-09-16 15:11|
|Priority||normal||Resolution||Accepted As Marked|
|Status||Applied||Product Version||Draft 1|
|Final Accepted Text||Note: 0004985|
|Summary||0001356: Our definition of character disagrees with that of Unicode.|
In Draft 1, a character is being defined as:
> A sequence of one or more bytes representing
> a single graphic symbol or control code.
This definition breaks down when applied to, for example, Arabic text.
The Unicode standard (version 13, page 15) says:
> The Unicode Standard draws a distinction
> between characters and glyphs. Characters are
> the abstract representations of the
> smallest components of written language
> that have semantic value.
Since Unicode has radically different goals from POSIX,
I propose the following new definition for consideration
(it may be what the current wording intends, even if not deliberately):
> A sequence of bytes that is considered an
> individual unit in text processing.
1) A sequence of bytes: remains the same as our original definition.
2) individual unit: graphic symbols in Arabic text are composed of parts that are sometimes stacked on top of each other. Older versions of iOS that did not take this into account caused the iPhone Arabic glitch.
3) text processing: this can refer to terminals (and terminal emulators) processing control sequences, and to `wc -m` counting characters. Defining characters in terms of text processing lifts the burden of relying on the concept of "code point", which is defined externally in the Unicode standard.
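The distinction drawn in points 1) to 3) can be illustrated with a minimal Python sketch. It is not part of the original report: the sample string and the function names are hypothetical, and counting non-combining code points is only a rough stand-in for an "individual unit", not the Unicode grapheme cluster algorithm. In a UTF-8 locale, `wc -c` would report the byte count, and `wc -m` would typically be expected to report the code point count.

```python
import unicodedata

# Hypothetical sample: ARABIC LETTER BEH (U+0628) followed by the
# combining mark ARABIC KASRA (U+0650), which is drawn beneath the letter.
SAMPLE = "\u0628\u0650"


def count_bytes(text: str, codeset: str = "utf-8") -> int:
    """Number of bytes in the encoded form (what `wc -c` counts)."""
    return len(text.encode(codeset))


def count_code_points(text: str) -> int:
    """Number of Unicode code points in the string."""
    return len(text)


def count_base_units(text: str) -> int:
    """Rough count of visual units: code points that are not combining marks."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


if __name__ == "__main__":
    # Three different answers for the same one-symbol text.
    print(count_bytes(SAMPLE), count_code_points(SAMPLE), count_base_units(SAMPLE))
```

The three functions give three different counts for what a reader sees as a single symbol, which is the ambiguity the proposed wording tries to address.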
|Desired Action||Consider applying the proposed new definition.|
On page 37 line 1241 (XBD 3.57 Character), change:
> A sequence of one or more bytes representing a single graphic symbol or control code.
to:
> A sequence of one or more bytes representing a member of a character set.
On page 3357 line 114412 section A.3 (Character), change:
> The term "character" is used to mean a sequence of one or more bytes representing a single graphic symbol.
to:
> The term "character" is used to mean a sequence of one or more bytes representing a member of a character set.
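As a hedged illustration of the accepted wording, the sketch below shows one member of a character set being represented by different byte sequences, of different lengths, depending on the encoding in use. The helper name `encode_member` and the choice of samples are assumptions made for this example only and are not taken from the standard.

```python
def encode_member(member: str, codeset: str) -> bytes:
    """Return the byte sequence that represents `member` in `codeset`."""
    return member.encode(codeset)


# Hypothetical samples: EURO SIGN and LATIN SMALL LETTER E WITH ACUTE.
EURO = "\u20ac"
E_ACUTE = "\u00e9"

if __name__ == "__main__":
    for member, codeset in [
        (EURO, "utf-8"),        # three bytes
        (EURO, "iso8859-15"),   # one byte
        (E_ACUTE, "utf-8"),     # two bytes
        (E_ACUTE, "iso8859-1"), # one byte
    ]:
        print(codeset, encode_member(member, codeset).hex())
```

In each case the result is still a single character under the new definition, regardless of how many bytes it occupies.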
|2020-06-29 12:47||dannyniu||New Issue|
|2020-06-29 12:47||dannyniu||Name||=> DannyNiu/NJF|
|2020-06-29 12:47||dannyniu||Section||=> 3.57 Character|
|2020-06-29 12:47||dannyniu||Page Number||=> 37|
|2020-06-29 12:47||dannyniu||Line Number||=> 1241|
|2020-09-10 15:30||geoffclare||Note Added: 0004985|
|2020-09-10 15:32||geoffclare||Final Accepted Text||=> Note: 0004985|
|2020-09-10 15:32||geoffclare||Status||New => Resolved|
|2020-09-10 15:32||geoffclare||Resolution||Open => Accepted As Marked|
|2020-09-10 15:32||geoffclare||Tag Attached: issue8|
|2020-09-15 09:25||geoffclare||Tag Attached: tc3-2008|
|2020-09-16 15:11||geoffclare||Status||Resolved => Applied|