View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000878 | 1003.1(2013)/Issue7+TC1 | System Interfaces | public | 2014-09-19 17:37 | 2019-06-10 08:54 |
Reporter | safinaskar | Assigned To | |||
Priority | normal | Severity | Editorial | Type | Enhancement Request |
Status | Closed | Resolution | Accepted As Marked | ||
Name | Askar Safin | ||||
Organization | |||||
User Reference | |||||
Section | strtok, strtok_r | ||||
Page Number | 2056 | ||||
Line Number | 65693-65725 | ||||
Interp Status | Approved | ||||
Final Accepted Text | 0000878:0002431 | ||||
Summary | 0000878: Is empty s2/sep allowed? What is the meaning of "lasts"? | ||||
Description | 1. First, a very minor issue: the argument names for "strtok" and "strtok_r" differ. The former uses s1 and s2, the latter s and sep, with the same meaning. This complicates, for example, writing bug reports like this one. 2. Second, the page doesn't say whether an empty string s2/sep is allowed. Please document this. I think an empty s2/sep should be allowed, and that strtok(_r) should then return the remaining string. This provides an easy way to get the remaining string and continue parsing it with other tools. I tested Debian GNU/Linux Wheezy 7.6.0, FreeBSD 10.0, OpenBSD 5.5, NetBSD 6.1.4, Oracle Solaris 11.2 and Debian GNU/Hurd 2013-03-18, and all of these systems support an empty s2/sep this way. (I still have these systems, so I can run further tests on them if you want.) (This list is, of course, very incomplete: it doesn't include Cygwin or MinGW, the very popular MacOS X, or HP-UX and AIX.) So systems already support this behavior, and some apps may already depend on it. Please allow an empty s2/sep and write something like: "an empty s2/sep provides a way to get the remaining string". The rest of this report assumes that you agree with this point :) 3. Third, the page doesn't say what "lasts" means. But naming the argument "lasts" implicitly suggests that *lasts is the remaining string. So please do one of the following: A. Rename "lasts" to something else, for example "internal_data". Explicitly say that "internal_data" is an implementation detail that the application should not depend on. Say something like: "The only allowed use of "internal_data" is to pass it to the next strtok_r call. If an application wants to get the remaining string, it should call "strtok_r(0, "", internal_data)"" (as I said, I assume that you will allow an empty s2/sep). 
B. Explicitly say that "lasts" is the remaining string. I prefer B. I ran another test on the same set of systems, and on all of them "lasts" is the remaining string. So please do B, because getting the remaining string is so useful, because systems already support it, and because some apps may depend on this behavior. Unfortunately, there are differences in the meaning of "lasts". For example, consider the following code: char s[] = "a"; char *lasts; strtok_r(s, "-", &lasts); On GNU/Linux and GNU/Hurd "lasts" will be an empty string, but on the three BSD systems and on Solaris it is a null pointer. So please document and allow all such differences. 4. Fourth, some further notes. De facto, strtok_r currently supports two ways of getting the remaining string on real systems: lasts and an empty s2/sep. Getting the remaining string is a useful feature, so in any case please allow at least one of these two methods. If you disallow both, I think apps will continue to use one of them anyway. Also, some apps may depend on one and some on the other, so the best option is to allow both. | ||||
Desired Action | Do something (and I hate the "Desired Action" field in your bug tracker) | ||||
Tags | tc2-2008 |
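
The divergence described in point 3 of the description can be checked with a small probe along the following lines. This is only a sketch: `probe_lasts` and the `lasts_kind` labels are hypothetical names introduced here, and the per-system results in the comments are the reporter's observations, not guarantees. The probe compares pointers for equality only and never dereferences the saved pointer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for what the save pointer holds after the call. */
enum lasts_kind { LASTS_NULL, LASTS_AT_TERMINATOR, LASTS_OTHER };

/*
 * Runs the three-line example from the report and classifies the
 * resulting save pointer using equality comparisons only.
 */
enum lasts_kind probe_lasts(void)
{
    char s[] = "a";
    /* Initialized so the probe never reads an indeterminate value; an
     * implementation that did not write it would also report LASTS_NULL. */
    char *lasts = NULL;

    char *tok = strtok_r(s, "-", &lasts);
    assert(tok == s);                 /* the only token starts at s */

    if (lasts == NULL)
        return LASTS_NULL;            /* reported for the BSDs and Solaris */
    if (lasts == s + 1)
        return LASTS_AT_TERMINATOR;   /* the "empty string" case reported for GNU systems */
    return LASTS_OTHER;               /* some other internal representation */
}
```

Code that behaves differently depending on which value comes back is relying on exactly the implementation detail this report asks to have clarified.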
|
Also, I think my second point about an empty s2/sep should be redirected to the C standard :) |
|
Interpretation response
------------------------
For strtok(), the standard clearly states that calls with an empty s2 return the remainder of the string being tokenized, and conforming implementations must conform to this.

For strtok_r(), the standard is unclear on this issue, and no conformance distinction can be made between alternative implementations based on this. This is being referred to the sponsor.

Rationale:
-------------
The behavior of strtok() with an empty s2 is clear from this text at line 65704:

    The strtok() function then searches from there for a byte that is contained in the current separator string. If no such byte is found, the current token extends to the end of the string pointed to by s1.

The behavior of strtok_r() with an empty sep is unclear. Nothing regarding the behavior of strtok_r() can be inferred from the naming of the lasts argument.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
On page 2056 lines 65689-65691 (strtok() and strtok_r() synopsis) change:

    char *strtok(char *restrict s1, const char *restrict s2);
    char *strtok_r(char *restrict s, const char *restrict sep, char **restrict lasts);

to:

    char *strtok(char *restrict s, const char *restrict sep);
    char *strtok_r(char *restrict s, const char *restrict sep, char **restrict state);

On page 2056 lines 65696-65708, replace all instances of s1 with s and s2 with sep.

On page 2056 lines 65714-65725 (strtok_r description), change:

    The strtok_r() function considers the null-terminated string s as a sequence of zero or more text tokens separated by spans of one or more characters from the separator string sep. The argument lasts points to a user-provided pointer which points to stored information necessary for strtok_r() to continue scanning the same string.

to:

    The strtok_r() function shall be equivalent to strtok(), except that strtok_r() shall be thread-safe and the argument state points to a user-provided pointer that allows strtok_r() to maintain state between calls which scan the same string. The application shall ensure that the pointer pointed to by state is unique for each string (s) being processed concurrently by strtok_r() calls. The application need not initialize the pointer pointed to by state to any particular value. The implementation shall not update the pointer pointed to by state to point (directly or indirectly) to resources, other than within the string s, that need to be freed or released by the caller.

On page 2057 after line 65765 (strtok() application usage), insert a new paragraph:

    Note that if sep is the empty string, strtok() and strtok_r() return a pointer to the remainder of the string being tokenized. |
|
Interpretation Proposed: 27 November 2014 |
|
Interpretation approved: 5 Jan 2015 |
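
The application usage note added by this interpretation (an empty sep returns the remainder of the string) could be used roughly as follows. This is a minimal sketch, not text from the standard: `split_key_rest` is a hypothetical helper name, the "=" separator is an arbitrary choice for illustration, and the code assumes an implementation that behaves as the note describes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: tokenizes `line` in place, returning the first
 * "="-separated token and storing the untouched remainder in *rest.
 * *rest is NULL when nothing follows the first token. The state pointer
 * is only ever passed back to strtok_r(), never inspected.
 */
char *split_key_rest(char *line, char **rest)
{
    char *state;   /* need not be initialized before the first call */
    char *key;

    assert(line != NULL && rest != NULL);

    key = strtok_r(line, "=", &state);

    /* An empty separator string asks for the remainder of the string. */
    *rest = (key != NULL) ? strtok_r(NULL, "", &state) : NULL;
    return key;
}
```

Called on a writable buffer holding "name=Askar Safin=x", the helper is expected to return "name" and set *rest to "Askar Safin=x", leaving the later "=" in place because the second call has no separators to split on.
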
Date Modified | Username | Field | Change |
---|---|---|---|
2014-09-19 17:37 | safinaskar | New Issue | |
2014-09-19 17:37 | safinaskar | Name | => Askar Safin |
2014-09-19 17:37 | safinaskar | Section | => strtok, strtok_r |
2014-09-19 17:37 | safinaskar | Page Number | => strtok, strtok_r |
2014-09-19 17:37 | safinaskar | Line Number | => strtok, strtok_r |
2014-09-19 19:49 | safinaskar | Note Added: 0002395 | |
2014-10-30 15:32 | rhansen | Page Number | strtok, strtok_r => 2056 |
2014-10-30 15:32 | rhansen | Line Number | strtok, strtok_r => 65693-65725 |
2014-10-30 15:32 | rhansen | Interp Status | => --- |
2014-11-06 17:20 | rhansen | Note Added: 0002431 | |
2014-11-06 17:21 | rhansen | Note Edited: 0002431 | |
2014-11-06 17:23 | rhansen | Tag Attached: tc2-2008 | |
2014-11-06 17:24 | rhansen | Interp Status | --- => Pending |
2014-11-06 17:24 | rhansen | Final Accepted Text | => 0000878:0002431 |
2014-11-06 17:24 | rhansen | Status | New => Interpretation Required |
2014-11-06 17:24 | rhansen | Resolution | Open => Accepted As Marked |
2014-11-27 10:33 | ajosey | Interp Status | Pending => Proposed |
2014-11-27 10:33 | ajosey | Note Added: 0002451 | |
2015-01-05 14:13 | ajosey | Interp Status | Proposed => Approved |
2015-01-05 14:13 | ajosey | Note Added: 0002514 | |
2019-06-10 08:54 | agadmin | Status | Interpretation Required => Closed |