Austin Group Defect Tracker

Aardvark Mark IV


ID 0001356
Category [Issue 8 drafts] Base Definitions and Headers
Severity Editorial
Type Enhancement Request
Date Submitted 2020-06-29 12:47
Last Update 2020-09-16 15:11
Reporter dannyniu
View Status public
Assigned To
Priority normal
Resolution Accepted As Marked
Status Applied
Product Version Draft 1
Name DannyNiu/NJF
Organization
User Reference
Section 3.57 Character
Page Number 37
Line Number 1241
Final Accepted Text Note: 0004985
Summary 0001356: Our definition of character disagrees with that of Unicode.
Description In Draft 1, a character is being defined as:

> A sequence of one or more bytes representing
> a single graphic symbol or control code.

This definition breaks down when applied to, e.g., Arabic text.

The Unicode standard (version 13, page 15) says:

> The Unicode Standard draws a distinction
> between characters and glyphs. Characters are
> the abstract representations of the
> smallest components of written language
> that have semantic value.

Considering that Unicode has radically different goals from POSIX,
I propose the following new definition for consideration
(it may be what the current wording was meant to say, even if not deliberately):

> A sequence of bytes that is considered an
> individual unit in text processing.

Explanation:

1) A sequence of bytes: remains the same as our original definition.

2) individual unit: graphic symbols in Arabic text are composed of parts that are sometimes stacked on top of each other. Older iOS versions did not take this into account, which led to the so-called "iPhone Arabic glitch".

3) text processing: this can refer to terminals (and terminal emulators) processing control sequences, and to `wc -m` counting characters. Defining characters in terms of text processing lifts the burden of relying on the concept of "code point", which is defined externally in the Unicode standard.
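To make points 2 and 3 concrete, here is a minimal Python sketch. It assumes a UTF-8 codeset and assumes that `wc -m` counts one character per encoded code point, as common implementations do in UTF-8 locales (an assumption, not a statement of what POSIX requires). The helper names `count_bytes_and_chars` and `count_combining` are hypothetical. It shows that a single visual symbol in Arabic, here BEH with a FATHA vowel mark stacked on it, is two characters encoded in four bytes:

```python
import unicodedata
from typing import Tuple


def count_bytes_and_chars(data: bytes, encoding: str = "utf-8") -> Tuple[int, int]:
    """Return (byte count, character count), roughly what `wc -c` and
    `wc -m` would report for `data` in a locale whose codeset is `encoding`.
    Assumes each decoded code point counts as one character."""
    text = data.decode(encoding)
    return len(data), len(text)


def count_combining(text: str) -> int:
    """Count characters that are combining marks, i.e. marks rendered
    stacked on a preceding base letter rather than as separate symbols."""
    return sum(1 for ch in text if unicodedata.combining(ch))


# ARABIC LETTER BEH (U+0628) followed by ARABIC FATHA (U+064E):
# one visual symbol, but two characters of two bytes each in UTF-8.
sample = "\u0628\u064e".encode("utf-8")
print(count_bytes_and_chars(sample))            # (4, 2)
print(count_combining(sample.decode("utf-8")))  # 1
```

The point of the sketch is that a byte-sequence-based definition yields a well-defined count for tools like `wc -m` even where that count differs from the number of symbols a reader perceives.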

Desired Action Consider applying the proposed new definition.
Tags issue8, tc3-2008
Attached Files


- Notes
(0004985)
geoffclare (manager)
2020-09-10 15:30

On page 37 line 1241 (XBD 3.57 Character), change:
A sequence of one or more bytes representing a single graphic symbol or control code.
to:
A sequence of one or more bytes representing a member of a character set.

On page 3357 line 114412 section A.3 (Character), change:
The term “character” is used to mean a sequence of one or more bytes representing a single graphic symbol.
to:
The term “character” is used to mean a sequence of one or more bytes representing a member of a character set.
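Under the adopted wording, a character is whatever byte sequence the locale's codeset maps to one member of the character set. The following minimal Python sketch splits a byte string into such sequences by feeding a decoder one byte at a time, which only loosely mirrors how `mbrtowc()` consumes input. It assumes a stateless codeset such as UTF-8, and the helper name `split_characters` is hypothetical:

```python
import codecs
from typing import List


def split_characters(data: bytes, encoding: str = "utf-8") -> List[bytes]:
    """Split `data` into the byte sequences that each encode one member
    of the character set. Assumes a stateless encoding such as UTF-8,
    where feeding one byte completes at most one character."""
    decoder = codecs.getincrementaldecoder(encoding)()
    chars: List[bytes] = []
    pending = bytearray()
    for byte in data:
        pending.append(byte)
        # A non-empty result means the bytes in `pending` form one character.
        if decoder.decode(bytes([byte])):
            chars.append(bytes(pending))
            pending.clear()
    # Raises UnicodeDecodeError if the input ends in a truncated sequence.
    decoder.decode(b"", final=True)
    return chars


# LATIN CAPITAL LETTER A, LATIN SMALL LETTER E WITH ACUTE, ARABIC LETTER BEH
print(split_characters("A\u00e9\u0628".encode("utf-8")))
# [b'A', b'\xc3\xa9', b'\xd8\xa8']
```

Each element is "a sequence of one or more bytes representing a member of a character set"; no reference to graphic symbols is needed to decide where one character ends and the next begins.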

- Issue History
Date Modified Username Field Change
2020-06-29 12:47 dannyniu New Issue
2020-06-29 12:47 dannyniu Name => DannyNiu/NJF
2020-06-29 12:47 dannyniu Section => 3.57 Character
2020-06-29 12:47 dannyniu Page Number => 37
2020-06-29 12:47 dannyniu Line Number => 1241
2020-09-10 15:30 geoffclare Note Added: 0004985
2020-09-10 15:32 geoffclare Final Accepted Text => Note: 0004985
2020-09-10 15:32 geoffclare Status New => Resolved
2020-09-10 15:32 geoffclare Resolution Open => Accepted As Marked
2020-09-10 15:32 geoffclare Tag Attached: issue8
2020-09-15 09:25 geoffclare Tag Attached: tc3-2008
2020-09-16 15:11 geoffclare Status Resolved => Applied

