Austin Group Defect Tracker

Aardvark Mark III


ID Category Severity Type Date Submitted Last Update
0000879 [1003.1(2013)/Issue7+TC1] System Interfaces Editorial Enhancement Request 2014-10-03 11:44 2015-02-12 17:45
Reporter ErikCederstrand View Status public  
Assigned To
Priority normal Resolution Accepted As Marked  
Status Resolved  
Name Erik Cederstrand
Organization
User Reference
Section functions/strptime
Page Number 2041-2043
Line Number 65182-65231
Interp Status ---
Final Accepted Text Note: 0002545
Summary 0000879: strptime is missing conversion specifiers described in strftime
Description strptime omits several useful conversion specifiers that are described in strftime, most notably the timezone specifiers z, Z and ISO year, week and weekday specifiers G, V and u. Equivalent date calculations cannot be performed using any of the other specifiers specified in strptime.
Desired Action At least Scandinavian countries use ISO weeks and ISO days extensively in everyday communication and planning. strptime should be able to interpret at least the G, V and u specifiers.

It is not possible to unambiguously calculate the date from a year, ISO week number and weekday using specifiers Y, W/U and w, since some calendar (Y) years contain two weeks with week number 52. There are also edge cases around Sundays at the beginning and end of the year.
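The ambiguity can be made concrete with strftime(), which already supports the week-based specifiers. The sketch below formats two different Sundays of calendar year 2012 and shows that "%Y %V %u" yields the same string for both, while %G tells them apart. The helper names format_date and show_week52_ambiguity are illustrative only, and tm_wday and tm_yday are filled in by hand on the assumption that strftime() derives %G, %V and %u from those members.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format a hand-built calendar date with the given strftime() format.
 * The caller supplies tm_wday (0 = Sunday) and tm_yday (0 = January 1st)
 * because strftime() computes %G, %V and %u from them.
 * Returns the number of bytes written, as strftime() does. */
size_t format_date(char *buf, size_t size, const char *fmt,
                   int year, int mon, int mday, int wday, int yday)
{
    struct tm tm;
    memset(&tm, 0, sizeof tm);
    tm.tm_year = year - 1900;
    tm.tm_mon  = mon - 1;
    tm.tm_mday = mday;
    tm.tm_wday = wday;
    tm.tm_yday = yday;
    return strftime(buf, size, fmt, &tm);
}

/* Print the two Sundays of calendar year 2012 that fall in a week
 * numbered 52, with the week-based year in parentheses. */
void show_week52_ambiguity(void)
{
    char first[32], last[32];

    /* Sunday 2012-01-01 belongs to week 52 of week-based year 2011. */
    format_date(first, sizeof first, "%Y %V %u (%G)", 2012, 1, 1, 0, 0);
    /* Sunday 2012-12-30 belongs to week 52 of week-based year 2012. */
    format_date(last, sizeof last, "%Y %V %u (%G)", 2012, 12, 30, 0, 364);

    printf("%s\n%s\n", first, last);
    /* prints:
     * 2012 52 7 (2011)
     * 2012 52 7 (2012)
     */
}
```

Since both dates format to "2012 52 7" without %G, a parser given only %Y, a week number and a weekday has no way to pick between them.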

Being able to provide a timezone along with the naive date input would also be useful, but I'm not sure if this should be implemented using the locale instead.
Tags issue8
Attached Files

- Relationships
related to 0000920 Interpretation Required missing %F description in strptime

-  Notes
(0002509)
rhansen (manager)
2014-12-19 06:45
edited on: 2015-01-08 17:01

This was discussed during today's teleconference and I have some questions:
  • What do implementations do with %z and %Z? Do all implementations that support these specifiers also have time zone members in struct tm? Do they parse the time zone in the input string but ignore the value? Do they convert the resulting time to UTC or the system's time zone?
  • What do implementations do when there is an ambiguity? For example, if the input is "2012" and the format string is "%G", what do implementations put in the tm_year member? Or does strptime() return NULL?
  • What do implementations do when an input value doesn't make sense? For example, if the input is "2014 53" and the format string is "%G %V", what happens?


(0002521)
rhansen (manager)
2015-01-08 19:12

For time zones and problematic cases, perhaps we can specify something along these lines (but this probably doesn't match the behavior of existing implementations):
  • it is implementation-defined whether the implementation:
    • adds a timezone member to the struct tm, or
    • gives struct tm an implied timezone of either the system's time zone or UTC (implementation defined which one)
  • if the input does not specify a time zone, it is implementation-defined whether the implementation assumes an implied time zone of either:
    • UTC, or
    • the system's local time zone
    note that if struct tm doesn't have a time zone member, the time zone chosen for zone-less input does not have to match the choice made for the implied time zone of struct tm
  • if struct tm has a time zone member, then it shall be set to the time zone of the input string (no time zone conversion is allowed if struct tm has a time zone member)
  • if struct tm doesn't have a time zone member, it is implementation-defined whether the implementation:
    • ignores the input's time zone (parses the time zone string if there is one, but discards the input's time zone value and instead pretends as if the input's time zone matched the implied time zone of struct tm), or
    • converts the described time to the implied time zone of struct tm before setting the members of struct tm (hmm, maybe this option should not be allowed because implementations would not be able to add a time zone member to struct tm without breaking backward compatibility. but that's a quality of implementation decision I guess...)
  • the input string should completely describe a single time period. examples:
    • the string "2015-01-08" describes a single 24-hour time period. This is either 2015-01-08T00:00:00Z through 2015-01-08T24:00:00Z or that day in the local time zone (e.g., 2015-01-08T00:00:00-05:00 through 2015-01-08T24:00:00-05:00), depending on which time zone the system assumes if one is not included in the input string.
    • the string "2015-01-08T12:38:47" describes a single 1-second time period (though which 1-second time period depends on the time zone assumed by the implementation for zone-less input strings)
  • if the input does completely describe a single time period, then the struct members are set to the earliest moment in that time period (e.g., hours, seconds, and minutes are all set to 0 if the input is just a date)
  • if the input can describe multiple disjoint time periods, then implementations choose one of the following:
    • error (return NULL). (this option is probably not good because then "Tue" with format string "%a" would be an error)
    • the fields in struct tm that are ambiguous are not set (could this result in undesirable underfilling of struct members?)
    • a valid time period is arbitrarily chosen, and the fields in struct tm are set to the earliest moment in that chosen time period
  • if the input is nonsensical (no time can be described by the input string), it errors out (returns NULL)
(0002525)
martinr (reporter)
2015-01-15 15:49
edited on: 2015-01-15 16:03

These are my observations valid for Solaris 11.2:
  • There is no timezone member in struct tm.
  • The man page specifies ranges [l,h]. Parsed values are checked against those ranges. If any check fails, strptime() returns NULL.
  • The range checks inside strptime() are trivial. It does not evaluate whether the result makes sense.

strptime(3C) man page says:
%G      Week-based year, including the century [0000,9999]; leading zero is
        permitted but not required.

%u      Weekday as a decimal number [1,7], with 1 representing Monday.

%U      Week number of the year as a decimal number [0,53], with Sunday as
        the first day of the week; leading zero is permitted but not required.

%V      The ISO 8601 week number as a decimal number [01,53]. In the ISO
        8601 week-based system, weeks begin on a Monday and week 1 of the
        year is the week that includes both January 4th and the first
        Thursday of the year. If the first Monday of January is the 2nd, 3rd,
        or 4th, the preceding days are part of the last week of the preceding
        year.

%w      Weekday as a decimal number [0,6], with 0 representing Sunday.

%W      Week number of the year as a decimal number [0,53], with Monday as
        the first day of the week; leading zero is permitted but not required.

%z      Offset from UTC in ISO 8601:2004 standard basic format (+hhmm or
        -hhmm), or no characters if no time zone is determinable.

%Z      Time zone name or no characters if no time zone exists.

For %z and %Z the implementation just checks ranges. It does not update struct tm, except that for %Z it updates tm_isdst.

Questions:
Q: What do implementations do with %z and %Z? Do all implementations that support these specifiers also have time zone members in struct tm? Do they parse the time zone in the input string but ignore the value? Do they convert the resulting time to UTC or the system's time zone?
A: There is no timezone member in struct tm. The value is ignored.

Q: What do implementations do when there is an ambiguity? For example, if the input is "2012" and the format string is "%G", what do implementations put in the tm_year member? Or does strptime() return NULL?
A: If an ambiguity is detected, strptime() returns NULL.

Q: What goes in the struct tm members if the format specifier and input don't provide enough information to unambiguously determine the time value? For example, if format is "%G %m" and the input is "2014 1" then tm_year must be 114, but if the input is "2014 12" then tm_year might be 114 or it might be 115.
A: strptime("2014 1", "%G %m", t) produces "Sun Jan 00 00:00:00 2014", strptime("2014 12", "%G %m", t) produces "Sun Dec 00 00:00:00 2014".

Q: What happens if the input is nonsensical but struct tm supports it? e.g., format is "%Y-%m-%d %u" and input is "2014-12-18 1"
A: As you would expect, the result is "Mon Dec 18 00:00:00 2014".

Q: What happens if the input is nonsensical but struct tm doesn't support it? e.g., format is "%G %V" and input is "2014 53"
A: The output is "Sun Dec 35 00:00:00 2014". If the input is "2014 54", strptime() returns NULL.
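
Answers like the ones above can be gathered on other systems with a small probe. The following is a minimal sketch, not the program used for this note: probe_strptime and report are hypothetical helper names, and the struct tm is zeroed before each call so that members strptime() leaves untouched are easy to spot. The raw members are printed rather than passed through asctime() or mktime(), since normalising them would hide values such as a day of month of 0 or 35. No expected output is given for the week-based specifiers because it differs between implementations.

```c
#define _XOPEN_SOURCE 700   /* must precede all includes to expose strptime() */
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Parse input with fmt into *out, which is zeroed first.
 * Returns 1 if strptime() succeeded, 0 if it returned NULL. */
int probe_strptime(const char *input, const char *fmt, struct tm *out)
{
    memset(out, 0, sizeof *out);
    return strptime(input, fmt, out) != NULL;
}

/* Print the raw struct tm members for one input/format pair,
 * or "NULL" if the conversion failed. */
void report(const char *input, const char *fmt)
{
    struct tm tm;

    if (!probe_strptime(input, fmt, &tm)) {
        printf("\"%s\" with \"%s\": NULL\n", input, fmt);
        return;
    }
    printf("\"%s\" with \"%s\": tm_year=%d tm_mon=%d tm_mday=%d "
           "tm_wday=%d tm_yday=%d tm_isdst=%d\n",
           input, fmt, tm.tm_year, tm.tm_mon, tm.tm_mday,
           tm.tm_wday, tm.tm_yday, tm.tm_isdst);
}
```

Calling report("2014 53", "%G %V") or report("2014-12-18 1", "%Y-%m-%d %u") repeats two of the cases discussed in this note.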

(0002526)
joerg (reporter)
2015-01-21 14:53
edited on: 2015-02-12 17:48

A comment to Note: 0002525:

%Z is not ignored on Solaris, %Z rather controls how/whether tm_isdst is updated.
%z causes an abort with return (NULL) in OpenSolaris (S11 Build 147).
%G also causes a return (NULL).
%V also causes a return (NULL).
%u also causes a return (NULL).

So ignoring %z %G %V %u seems to have been added later by Oracle.

(0002533)
ajosey (manager)
2015-01-30 11:49

This was discussed on the 29 January 2015 call, and there was general agreement with the current approach in OpenSolaris. An action was assigned.
(0002536)
martinr (reporter)
2015-02-05 15:12
edited on: 2015-02-12 17:47

This is additional information to Note: 0002525 on how strptime() behaves on Solaris 11 Update 2.

An error condition is signaled by a NULL return value. Error cases:

%G  Input differs from value stored during current strptime call.

%u  Input is not a number in range [1,7].
    Input differs from value stored during current strptime call.

%V  Input is not a number in range [1,53].
    Input differs from value stored during current strptime call.

%z  s is neither + nor -. (Expected format is shhmm.)
    hh is not a number in range [0,12].
    mm is not a number in range [0,59].
    hh is 12 and mm is greater than 0.

%Z specification:

If the input equals tzname[1], and tzname[0] and tzname[1] differ => tm->tm_isdst = 1.
If the input equals tzname[0] => tm->tm_isdst = 0.
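
The %z error cases and the %Z rule listed above can be expressed as two small functions. This is only an illustrative sketch and not the Solaris source: z_offset_is_valid and isdst_from_zone_name are hypothetical names, the offset is assumed to be a complete five-character string rather than a prefix of a longer input, and the zone names are passed in as parameters standing in for tzname[0] and tzname[1].

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if s is exactly "shhmm" and passes the checks listed above:
 * s is '+' or '-', hh is in [0,12], mm is in [0,59], and mm is 0 when
 * hh is 12. Returns 0 otherwise. */
int z_offset_is_valid(const char *s)
{
    if (strlen(s) != 5 || (s[0] != '+' && s[0] != '-'))
        return 0;
    for (int i = 1; i < 5; i++)
        if (!isdigit((unsigned char)s[i]))
            return 0;

    int hh = (s[1] - '0') * 10 + (s[2] - '0');
    int mm = (s[3] - '0') * 10 + (s[4] - '0');

    if (hh > 12 || mm > 59)
        return 0;
    if (hh == 12 && mm > 0)
        return 0;
    return 1;
}

/* Applies the %Z rule. std_name and dst_name stand in for tzname[0]
 * and tzname[1]. Returns the value to store in tm_isdst (1 or 0),
 * or -1 if the name matches neither and tm_isdst is left alone. */
int isdst_from_zone_name(const char *name,
                         const char *std_name, const char *dst_name)
{
    if (strcmp(name, dst_name) == 0 && strcmp(std_name, dst_name) != 0)
        return 1;
    if (strcmp(name, std_name) == 0)
        return 0;
    return -1;
}
```

For example, z_offset_is_valid("+1200") accepts the offset while z_offset_is_valid("+1201") rejects it, matching the "hh is 12 and mm is greater than 0" case.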

(0002545)
rhansen (manager)
2015-02-12 17:39
edited on: 2015-02-12 17:46

On page 2041 line 65181 (strptime() DESCRIPTION), after applying the change for 0000920, change:
or if a field width is specified for any conversion specifier other than <tt>C</tt> or <tt>Y</tt>.
to:
or if a field width is specified for any conversion specifier other than <tt>C</tt>, <tt>F</tt>, <tt>G</tt>, <tt>Y</tt>, or <tt>Z</tt>.
On page 2042 after line 65194 (strptime() DESCRIPTION) insert:
<tt>F</tt> This specifier is similar to <tt>%Y-%m-%d</tt> where the characters up to the first <hyphen-minus> separator shall be converted as for %Y but with unlimited field width, the characters between the two <hyphen-minus> separators shall be converted as for %m, and the characters after the last <hyphen-minus> separator shall be converted as for %d. If a field width is specified, each of the %Y, %m, and %d conversions shall not convert any characters past the overall %F field width.

<tt>g</tt> The last 2 digits of the week-based year (see below) as a decimal number (for example, 77). Leading zeros shall be permitted but shall not be required. A leading <tt>'+'</tt> or <tt>'-'</tt> character shall be permitted before any leading zeros but shall not be required. The effect of this year, if any, on the tm structure pointed to by tm is unspecified.

<tt>G</tt> The week-based year (see below) as a decimal number (for example, 1977). Leading zeros shall be permitted but shall not be required. A leading <tt>'+'</tt> or <tt>'-'</tt> character shall be permitted before any leading zeros but shall not be required. The effect of this year, if any, on the tm structure pointed to by tm is unspecified.
On page 2042 after line 65212 insert:
<tt>u</tt> The weekday as a decimal number [1,7], with 1 representing Monday.
On page 2042 line 65214 add a sentence to the U conversion:
The effect of this week number, if any, on the tm structure pointed to by tm is unspecified.
On page 2042 after line 65214 insert:
<tt>V</tt> The week number of the week-based year (see below) as a decimal number [01,53]. Leading zeros shall be permitted but shall not be required. The effect of this week number, if any, on the tm structure pointed to by tm is unspecified.
On page 2042 line 65217 add a sentence to the W conversion:
The effect of this week number, if any, on the tm structure pointed to by tm is unspecified.
On page 2043 after line 65230 insert:
<tt>z</tt> The offset from UTC in the ISO 8601:2004 standard format (<tt>+hhmm</tt> or <tt>-hhmm</tt>). For example, <tt>"-0430"</tt> means 4 hours 30 minutes behind UTC (west of Greenwich). The effect of this offset, if any, on the tm structure pointed to by tm is unspecified.

<tt>Z</tt> The timezone name. If this name matches <tt>tzname[1]</tt>, and <tt>tzname[0]</tt> and <tt>tzname[1]</tt> differ, then the <tt>tm_isdst</tt> field of the tm structure pointed to by tm shall be set to 1. Otherwise, if this name matches <tt>tzname[0]</tt> then the <tt>tm_isdst</tt> field of the tm structure pointed to by tm shall be set to 0. Any other effects on the tm structure pointed to by tm are unspecified.
On page 2043 after line 65252 insert:
<tt>%OV</tt> The same as %V but using the locale's alternative numeric symbols.
On page 2043 after line 65257 insert:
<tt>%g</tt>, <tt>%G</tt>, and <tt>%V</tt> convert values according to the ISO 8601:2004 standard week-based year. In this system, weeks begin on a Monday and week 1 of the week-based year is the week that includes January 4th, which is also the week that includes the first Thursday of the year, and is also the first week that contains at least four days in the year. If the first Monday of January is the 2nd, 3rd, or 4th, the preceding days are part of the last week of the preceding week-based year (thus, the string <tt>"1998 53 6"</tt> with format specifier <tt>"%G %V %u"</tt> represents Saturday 2nd January 1999). If December 29th, 30th, or 31st is a Monday, it and any following days are part of week 1 of the following week-based year (thus, the string <tt>"1998 01 2"</tt> with format specifier <tt>"%G %V %u"</tt> represents Tuesday 30th December 1997).
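
The two examples in the paragraph above can be checked with a short calculation. The sketch below is one possible way to map a week-based year, week number and weekday to a calendar date. It is not part of the accepted text, and the function names are illustrative. It uses a well-known days-from-civil arithmetic on the proleptic Gregorian calendar, relies on 1970-01-01 having been a Thursday, and does not verify that week 53 exists in the given year.

```c
#include <stdio.h>

/* Days since 1970-01-01 for a proleptic Gregorian date (m in 1..12). */
long days_from_civil(long y, int m, int d)
{
    y -= m <= 2;                               /* count years from March */
    long era = (y >= 0 ? y : y - 399) / 400;
    long yoe = y - era * 400;                                   /* [0, 399] */
    long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  /* [0, 365] */
    long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;        /* [0, 146096] */
    return era * 146097 + doe - 719468;
}

/* Inverse of days_from_civil(). */
void civil_from_days(long z, long *y, int *m, int *d)
{
    z += 719468;
    long era = (z >= 0 ? z : z - 146096) / 146097;
    long doe = z - era * 146097;
    long yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    long doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    long mp  = (5 * doy + 2) / 153;            /* month counted from March */
    *d = (int)(doy - (153 * mp + 2) / 5 + 1);
    *m = (int)(mp < 10 ? mp + 3 : mp - 9);
    *y = yoe + era * 400 + (*m <= 2);
}

/* Convert week-based year G, week V [1,53] and weekday u [1,7]
 * (1 = Monday) to a calendar date. Week 1 is the week containing
 * January 4th, as described above. */
void iso_week_to_civil(long G, int V, int u, long *y, int *m, int *d)
{
    long jan4 = days_from_civil(G, 1, 4);
    /* Day 0 (1970-01-01) was a Thursday, i.e. ISO weekday 4. */
    int wd = (int)(((jan4 % 7) + 7 + 3) % 7) + 1;
    long week1_monday = jan4 - (wd - 1);
    civil_from_days(week1_monday + (V - 1) * 7L + (u - 1), y, m, d);
}
```

With these helpers, iso_week_to_civil(1998, 53, 6, ...) yields 1999-01-02 and iso_week_to_civil(1998, 1, 2, ...) yields 1997-12-30, in agreement with the examples in the text.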



- Issue History
Date Modified Username Field Change
2014-10-03 11:44 ErikCederstrand New Issue
2014-10-03 11:44 ErikCederstrand Name => Erik Cederstrand
2014-10-03 11:44 ErikCederstrand Section => functions/strptime
2014-10-03 11:44 ErikCederstrand Page Number => -
2014-10-03 11:44 ErikCederstrand Line Number => -
2014-10-03 11:45 ErikCederstrand Issue Monitored: ErikCederstrand
2014-12-18 16:44 rhansen Page Number - => 2041-2043
2014-12-18 16:44 rhansen Line Number - => 65182-65231
2014-12-18 16:44 rhansen Interp Status => ---
2014-12-19 06:45 rhansen Note Added: 0002509
2015-01-08 17:01 rhansen Note Edited: 0002509
2015-01-08 19:12 rhansen Note Added: 0002521
2015-01-15 15:49 martinr Note Added: 0002525
2015-01-15 15:52 martinr Note Edited: 0002525
2015-01-15 16:03 martinr Note Edited: 0002525
2015-01-21 14:53 joerg Note Added: 0002526
2015-01-30 11:49 ajosey Note Added: 0002533
2015-02-05 15:12 martinr Note Added: 0002536
2015-02-05 15:25 martinr Note Edited: 0002536
2015-02-05 15:26 martinr Note Edited: 0002536
2015-02-06 09:42 geoffclare Relationship added related to 0000920
2015-02-12 17:39 rhansen Note Added: 0002545
2015-02-12 17:43 rhansen Tag Attached: issue8
2015-02-12 17:45 rhansen Final Accepted Text => Note: 0002545
2015-02-12 17:45 rhansen Status New => Resolved
2015-02-12 17:45 rhansen Resolution Open => Accepted As Marked
2015-02-12 17:46 rhansen Note Edited: 0002545
2015-02-12 17:47 Don Cragun Note Edited: 0002536
2015-02-12 17:48 Don Cragun Note Edited: 0002526

