View Issue Details

ID: 0001356
Project: Issue 8 drafts
Category: Base Definitions and Headers
View Status: public
Last Update: 2024-06-11 09:12
Reporter: dannyniu
Assigned To:
Priority: normal
Severity: Editorial
Type: Enhancement Request
Status: Closed
Resolution: Accepted As Marked
Product Version: Draft 1
Name: DannyNiu/NJF
Organization:
User Reference:
Section: 3.57 Character
Page Number: 37
Line Number: 1241
Final Accepted Text: 0001356:0004985
Summary: 0001356: Our definition of character disagrees with that of Unicode.
Description: In Draft 1, a character is defined as:

> A sequence of one or more bytes representing
> a single graphic symbol or control code.

This definition falls apart when applied in the context of e.g. Arabic text.

The Unicode standard (version 13, page 15) says:

> The Unicode Standard draws a distinction
> between characters and glyphs. Characters are
> the abstract representations of the
> smallest components of written language
> that have semantic value.

Since Unicode has a radically different goal from POSIX,
I propose the following new definition for consideration
(the current wording may have been intended this way, even if not on purpose):

> A sequence of bytes that is considered an
> individual unit in text processing.

Explanation:

1) A sequence of bytes: remains the same as our original definition.

2) individual unit: graphic symbols in Arabic text are composed of parts that are sometimes stacked on top of each other. Older versions of iOS that did not take this into account caused the iPhone Arabic Glitch.

3) text processing: this can refer to terminals (and terminal emulators) processing control sequences, and to `wc -m` counting characters. Defining characters in terms of text processing lifts the burden of relying on the concept of "code point", which is defined externally in the Unicode standard.
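
The distinction drawn in points 1 to 3 can be made concrete with a minimal sketch. It counts one short Arabic sequence three ways: as encoded bytes, as code points (roughly what `wc -m` would report in a UTF-8 locale), and as graphic symbols. The function name `count_units` is hypothetical, and treating every code point with a non-zero combining class as part of the preceding symbol is a simplifying assumption, not a full grapheme-cluster segmentation.

```python
import unicodedata


def count_units(text: str) -> dict:
    """Count a string as UTF-8 bytes, code points, and approximate symbols."""
    encoded = text.encode("utf-8")
    # Assumption: a code point whose canonical combining class is non-zero
    # attaches to the preceding base letter and does not start a new symbol.
    symbols = sum(1 for ch in text if unicodedata.combining(ch) == 0)
    return {
        "bytes": len(encoded),
        "code_points": len(text),
        "symbols": symbols,
    }


# ARABIC LETTER BEH (U+0628) followed by ARABIC KASRA (U+0650):
# rendered as a single letter with a mark stacked beneath it.
sample = "\u0628\u0650"
counts = count_units(sample)
print(counts)  # {'bytes': 4, 'code_points': 2, 'symbols': 1}
```

The three numbers differ, which is why a definition tied to "a single graphic symbol" and one tied to the unit used in text processing can give different answers for the same byte sequence.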

Desired Action: Consider applying the proposed new definition.
Tags: issue8, tc3-2008

Activities

geoffclare

2020-09-10 15:30

manager   bugnote:0004985

On page 37 line 1241 (XBD 3.57 Character), change:
A sequence of one or more bytes representing a single graphic symbol or control code.
to:
A sequence of one or more bytes representing a member of a character set.

On page 3357 line 114412 section A.3 (Character), change:
The term "character" is used to mean a sequence of one or more bytes representing a single graphic symbol.
to:
The term "character" is used to mean a sequence of one or more bytes representing a member of a character set.
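
One consequence of the accepted wording is that character boundaries depend on which character set is in effect. The sketch below is an illustration only, not part of the accepted text: it decodes the same two bytes under two character sets and counts the resulting members. The helper name `count_characters` is hypothetical, and Python codec names stand in for what a POSIX locale's codeset would determine.

```python
def count_characters(data: bytes, charset: str) -> int:
    """Count the members of `charset` that the byte sequence represents."""
    # Decoding groups the bytes into members of the chosen character set;
    # each element of the resulting string is one such member.
    return len(data.decode(charset))


raw = b"\xc3\xa9"

# In UTF-8 the two bytes together represent one member: U+00E9.
utf8_count = count_characters(raw, "utf-8")

# In ISO 8859-1 each byte is a member on its own: U+00C3 and U+00A9.
latin1_count = count_characters(raw, "iso-8859-1")

print(utf8_count, latin1_count)  # 1 2
```

Under this reading, a combining mark that has its own entry in the character set counts as a character in its own right, even though it does not form a graphic symbol by itself.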

Issue History

Date Modified Username Field Change
2020-06-29 12:47 dannyniu New Issue
2020-06-29 12:47 dannyniu Name => DannyNiu/NJF
2020-06-29 12:47 dannyniu Section => 3.57 Character
2020-06-29 12:47 dannyniu Page Number => 37
2020-06-29 12:47 dannyniu Line Number => 1241
2020-09-10 15:30 geoffclare Note Added: 0004985
2020-09-10 15:32 geoffclare Final Accepted Text => 0001356:0004985
2020-09-10 15:32 geoffclare Status New => Resolved
2020-09-10 15:32 geoffclare Resolution Open => Accepted As Marked
2020-09-10 15:32 geoffclare Tag Attached: issue8
2020-09-15 09:25 geoffclare Tag Attached: tc3-2008
2020-09-16 15:11 geoffclare Status Resolved => Applied
2024-06-11 09:12 agadmin Status Applied => Closed