Austin Group Defect Tracker

Aardvark Mark III


ID Category Severity Type Date Submitted Last Update
0000878 [1003.1(2013)/Issue7+TC1] System Interfaces Editorial Enhancement Request 2014-09-19 17:37 2015-01-05 14:13
Reporter safinaskar View Status public  
Assigned To
Priority normal Resolution Accepted As Marked  
Status Interpretation Required  
Name Askar Safin
Organization
User Reference
Section strtok, strtok_r
Page Number 2056
Line Number 65693-65725
Interp Status Approved
Final Accepted Text Note: 0002431
Summary 0000878: Is empty s2/sep allowed? What is the meaning of "lasts"?
Description 1. First, a very minor issue: the argument names for "strtok" and "strtok_r" differ. The first uses s1 and s2, the second uses s and sep, with the same meaning. This complicates, for example, writing bug reports like this one.

2. Second, the page doesn't say whether an empty string is allowed for s2/sep. Please document this. I think an empty s2/sep should be allowed, and that strtok(_r) should then return the remaining string. This provides an easy way to get the remaining string and continue parsing it with other tools.

I tested Debian GNU/Linux Wheezy 7.6.0, FreeBSD 10.0, OpenBSD 5.5, NetBSD 6.1.4, Oracle Solaris 11.2 and Debian GNU/Hurd 2013-03-18, and all of these systems support an empty s2/sep in this way.

(I still have these systems, so if you want, I can run further tests on them.)
(This list is, of course, very incomplete: it doesn't include Cygwin, MinGW, the very popular Mac OS X, HP-UX or AIX.)

This means that these systems already support this behavior and some apps may already depend on it.

So, please allow an empty s2/sep and write something like: "an empty s2/sep provides a way to get the remaining string".

I will assume in the remaining part of my bug report that you agree with my point :)

3. Third, the page doesn't say what "lasts" means. But the page names this argument "lasts", which implies that *lasts is the remaining string. So, please do one of the following:

A. Rename "lasts" to something else, for example "internal_data". Explicitly say that "internal_data" is an implementation detail that the application should not depend on. Say something like: "The only allowed use of "internal_data" is to pass it to the next strtok_r call. If an application wants to get the remaining string, it should call "strtok_r(0, "", internal_data)"" (as I said, I assume that you will allow an empty s2/sep).
B. Explicitly say that "lasts" is the remaining string.

I prefer B. I ran another test on the same set of systems, and on all of them "lasts" is the remaining string. So, please do B: getting the remaining string is very useful, it is already supported on these systems, and some apps may depend on this behavior.

Unfortunately, there are differences in the meaning of "lasts". For example, consider the following code:

char s[] = "a";
char *lasts;
strtok_r(s, "-", &lasts);

On GNU/Linux and GNU/Hurd, "lasts" will be an empty string. But on the three BSD systems and on Solaris it is a null pointer. So, please document and allow all such differences.

4. Fourth, some additional notes. De facto, strtok_r currently supports two ways of getting the remaining string on real systems: lasts and an empty s2/sep. Getting the remaining string is a useful feature, so in any case please allow at least one of these two methods. If you disallow both, I think apps will continue to use one of them anyway. Also, some apps may depend on one and some on the other, so the best option is to allow both methods.
Desired Action Do something (and I hate "Desired Action" field in your bug tracker)
Tags tc2-2008
Attached Files

- Relationships

-  Notes
(0002395)
safinaskar (reporter)
2014-09-19 19:49

Also, I think my second point about empty s2/sep should be redirected to the C standard :)
(0002431)
rhansen (manager)
2014-11-06 17:20
edited on: 2014-11-06 17:21

Interpretation response
------------------------
For strtok(), the standard clearly states that calls with an empty s2 return the remainder of the string being tokenized, and conforming implementations must conform to this.
For strtok_r(), the standard is unclear on this issue, and no conformance distinction can be made between alternative implementations based on this. This is being referred to the sponsor.

Rationale:
-------------
The behavior of strtok() with an empty s2 is clear from this text at line 65704:
The strtok() function then searches from there for a byte that is contained in the current separator string. If no such byte is found, the current token extends to the end of the string pointed to by s1.

The behavior of strtok_r() with an empty sep is unclear.
Nothing regarding the behavior of strtok_r() can be inferred from the naming of the lasts argument.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
On page 2056 lines 65689-65691 (strtok() and strtok_r() synopsis) change:
char *strtok(char *restrict s1, const char *restrict s2);
char *strtok_r(char *restrict s, const char *restrict sep,
    char **restrict lasts);

to:
char *strtok(char *restrict s, const char *restrict sep);
char *strtok_r(char *restrict s, const char *restrict sep,
    char **restrict state);

On page 2056 lines 65696-65708, replace all instances of s1 with s and s2 with sep.

On page 2056 lines 65714-65725 (strtok_r description), change:
The strtok_r() function considers the null-terminated string s as a sequence of zero or more text tokens separated by spans of one or more characters from the separator string sep. The argument lasts points to a user-provided pointer which points to stored information necessary for strtok_r() to continue scanning the same string.

In the first call to strtok_r(), s points to a null-terminated string, sep to a null-terminated string of separator characters, and the value pointed to by lasts is ignored. The strtok_r() function shall return a pointer to the first character of the first token, write a null character into s immediately following the returned token, and update the pointer to which lasts points.

In subsequent calls, s is a null pointer and lasts shall be unchanged from the previous call so that subsequent calls shall move through the string s, returning successive tokens until no tokens remain. The separator string sep may be different from call to call. When no token remains in s, a null pointer shall be returned.

to:
The strtok_r() function shall be equivalent to strtok(), except that strtok_r() shall be thread-safe and the argument state points to a user-provided pointer that allows strtok_r() to maintain state between calls which scan the same string. The application shall ensure that the pointer pointed to by state is unique for each string (s) being processed concurrently by strtok_r() calls. The application need not initialize the pointer pointed to by state to any particular value. The implementation shall not update the pointer pointed to by state to point (directly or indirectly) to resources, other than within the string s, that need to be freed or released by the caller.

On page 2057 after line 65765 (strtok() application usage), insert a new paragraph:
Note that if sep is the empty string, strtok() and strtok_r() return a pointer to the remainder of the string being tokenized.


(0002451)
ajosey (manager)
2014-11-27 10:33

Interpretation Proposed: 27 November 2014
(0002514)
ajosey (manager)
2015-01-05 14:13

Interpretation approved: 5 Jan 2015

- Issue History
Date Modified Username Field Change
2014-09-19 17:37 safinaskar New Issue
2014-09-19 17:37 safinaskar Name => Askar Safin
2014-09-19 17:37 safinaskar Section => strtok, strtok_r
2014-09-19 17:37 safinaskar Page Number => strtok, strtok_r
2014-09-19 17:37 safinaskar Line Number => strtok, strtok_r
2014-09-19 19:49 safinaskar Note Added: 0002395
2014-10-30 15:32 rhansen Page Number strtok, strtok_r => 2056
2014-10-30 15:32 rhansen Line Number strtok, strtok_r => 65693-65725
2014-10-30 15:32 rhansen Interp Status => ---
2014-11-06 17:20 rhansen Note Added: 0002431
2014-11-06 17:21 rhansen Note Edited: 0002431
2014-11-06 17:23 rhansen Tag Attached: tc2-2008
2014-11-06 17:24 rhansen Interp Status --- => Pending
2014-11-06 17:24 rhansen Final Accepted Text => Note: 0002431
2014-11-06 17:24 rhansen Status New => Interpretation Required
2014-11-06 17:24 rhansen Resolution Open => Accepted As Marked
2014-11-27 10:33 ajosey Interp Status Pending => Proposed
2014-11-27 10:33 ajosey Note Added: 0002451
2015-01-05 14:13 ajosey Interp Status Proposed => Approved
2015-01-05 14:13 ajosey Note Added: 0002514

