Austin Group Defect Tracker

Aardvark Mark IV

ID: 0000787
Category: [1003.1(2013)/Issue7+TC1] System Interfaces
Severity: Editorial
Type: Omission
Date Submitted: 2013-11-05 19:59
Last Update: 2014-03-25 13:42
Reporter: hch
View Status: public
Assigned To:
Priority: normal
Resolution: Accepted As Marked
Status: Interpretation Required
Name: Your Name Here
User Reference:
Section: posix_fallocate
Page Number: 1425
Line Number: 47064-47074
Interp Status: Approved
Final Accepted Text: See Note: 0002089.
Summary: 0000787: posix_fallocate does not specify state of data in the affected region
Description: The text describing posix_fallocate does not specify what will happen to data in the range affected by the call.

So far, all implementations I have seen zero space that hasn't been allocated, but behavior for regions already containing data varies: some implementations also zero it and lose the data, while others leave the data unaffected.
Desired Action: Clarify the effect of posix_fallocate on the data contained in the file, specifically that the effect on already written regions is undefined, while reads of previously unwritten regions should return zero.
Tags: No tags attached.
Attached Files
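The behavior the Description and Desired Action ask to have pinned down can be probed with a small C sketch. It assumes a system whose `<fcntl.h>` declares `posix_fallocate()` and a writable `/tmp`; `open_scratch()`, `check_range()` and `fallocate_keeps_data()` are hypothetical helpers, not part of any API. It writes a short prefix, reserves a larger range that overlaps it, and checks that the prefix survives while the newly allocated bytes read back as zero.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an unlinked scratch file (assumes /tmp). */
static int open_scratch(void)
{
    char path[] = "/tmp/fallocate-demoXXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0)
        unlink(path); /* storage is released when fd is closed */
    return fd;
}

/* Returns 1 if [0, n) still holds data and [n, total) reads as zero bytes,
   0 on the first mismatch, -1 on a read error or short file. */
static int check_range(int fd, const char *data, size_t n, off_t total)
{
    char buf[4096];
    for (off_t pos = 0; pos < total;) {
        size_t want = sizeof buf;
        if ((off_t)want > total - pos)
            want = (size_t)(total - pos);
        ssize_t got = pread(fd, buf, want, pos);
        if (got <= 0)
            return -1;
        for (ssize_t i = 0; i < got; i++) {
            off_t at = pos + i;
            char expect = at < (off_t)n ? data[at] : '\0';
            if (buf[i] != expect)
                return 0;
        }
        pos += got;
    }
    return 1;
}

/* Writes data[0..n) at offset 0, reserves [0, total) with posix_fallocate()
   (total >= n is assumed), then verifies the file with check_range().
   Returns what check_range() returns; -1 also covers setup errors. */
static int fallocate_keeps_data(const char *data, size_t n, off_t total)
{
    int fd = open_scratch();
    if (fd < 0)
        return -1;
    int result = -1;
    /* posix_fallocate() returns an error number; it does not set errno. */
    if (pwrite(fd, data, n, 0) == (ssize_t)n &&
        posix_fallocate(fd, 0, total) == 0)
        result = check_range(fd, data, n, total);
    close(fd);
    return result;
}
```

On an implementation that preserves existing data and zero-fills new space, `fallocate_keeps_data("hello", 5, 8192)` is expected to return 1.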

Relationships

Notes
shware_systems (reporter)
2013-11-06 00:00
edited on: 2013-11-06 00:15

Actually, I believe the intended effect is that data already allocated is ignored and left unchanged, since that part is already known to be allocated. The only exception would be for file systems that support sparse files. If the specified range lies in or overlaps a sparse region, garbage data may need to be written to make that region non-sparse. How the implementation handles re-zeroing this area if the area is unlocked or the file is closed before the application overwrites that garbage, or reporting zeros instead of that garbage on reads, is implementation-defined. The application is expected to complete its writes before trying to read from there again.

For new allocations, it's undefined whether zeros or any other data pattern is written to initialize the space, or whether anything is written at all; a read may even see data from a previously deleted file that hasn't been wiped. What is needed depends partly on the file system (whether or not it supports sparse files), on administrative settings for cache control and for security versus speed, which are out of scope for the standard, and on the posix_fadvise() settings if they're supported. Only the minimum disk changes needed to ensure that subsequent writes by the application will succeed need to occur. Again, the application cannot presume that any area it doesn't overwrite will be reliably readable.

In summary, as I see it, reads of previously written data are supposed to return it unchanged, and it is undefined what is returned from new areas. I can see posix_fadvise() being extended to let the application provide its own initialization value for filling file expansions, specifically for use with posix_fallocate(), but that would be a non-TC matter, as argument usage would be overridden.
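The sparse-file case raised above can be observed through `st_blocks`. A minimal sketch, assuming a POSIX system with a writable `/tmp` where `ftruncate()` on an empty file normally leaves a hole: it compares the block count before and after `posix_fallocate()` and checks that the whole range reads as zeros on the system at hand. `hole_then_allocate()` and `struct hole_report` are hypothetical names. The units of `st_blocks` are implementation-defined, so only the before/after comparison is meaningful, and on a file system without sparse-file support the two counts may simply be equal.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an unlinked scratch file (assumes /tmp). */
static int open_scratch(void)
{
    char path[] = "/tmp/fallocate-holeXXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0)
        unlink(path); /* storage is released when fd is closed */
    return fd;
}

/* Hypothetical result record for hole_then_allocate(). */
struct hole_report {
    blkcnt_t blocks_before; /* st_blocks after ftruncate() only */
    blkcnt_t blocks_after;  /* st_blocks after posix_fallocate() */
    int reads_zero;         /* 1 if every byte in [0, len) reads as 0 */
};

/* Extends an empty file to len bytes with ftruncate() (usually a hole on
   file systems with sparse-file support), then reserves [0, len).
   Returns 0 and fills *r on success, -1 if any call fails. */
static int hole_then_allocate(off_t len, struct hole_report *r)
{
    int fd = open_scratch();
    if (fd < 0)
        return -1;

    struct stat st;
    int ok = ftruncate(fd, len) == 0 && fstat(fd, &st) == 0;
    if (ok) {
        r->blocks_before = st.st_blocks;
        ok = posix_fallocate(fd, 0, len) == 0 && fstat(fd, &st) == 0;
    }
    if (ok) {
        r->blocks_after = st.st_blocks;
        r->reads_zero = 1;
        char buf[4096];
        for (off_t pos = 0; pos < len;) {
            ssize_t got = pread(fd, buf, sizeof buf, pos);
            if (got <= 0) {
                ok = 0;
                break;
            }
            for (ssize_t i = 0; i < got; i++)
                if (buf[i] != 0)
                    r->reads_zero = 0;
            pos += got;
        }
    }
    close(fd);
    return ok ? 0 : -1;
}
```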

Suggested Resolution:
At Line 47067, Append:
"An attempt to perform a posix_fallocate( ) on a file that is incapable of seeking shall result in an error."

At Line 47071, Append:
"For regions that overlap previously allocated file space, the implementation shall ensure a read() of data from that overlapped area is unchanged. It is undefined what data is read from newly allocated space. When offset is larger than the current length of the file, any padding allocation is/isn't* to be initialized with zero byte values.

When the fd argument has the O_APPEND flag set, and offset is larger than the current size plus 1, it shall be ignored and the file simply extended by len bytes. No error shall be returned if this adjustment is made. The descriptor's current write position shall not change, and writes to the file shall be as if O_APPEND was not in effect until the current position is set to a value outside the region by a call to seek() or write()."

[* Ed. note: I think either 'is' or 'isn't' needs to be specified there, or it should be made explicitly an implementation choice how the padding is initialized. Just extending the file size leaves it implicitly undefined, but this may cause difficulties for other fds reading the file. The alternative could be to treat an offset larger than the size as an append of len bytes, but that's problematic. I'm open to this being considered an error as well. pwrite() doesn't adequately address this either, nor the effects when O_APPEND is set. The paragraph at Line 72445 should also be edited, as a separate Bug.]

At Line 47100, Replace "None." with:
"A future version of the standard may provide a means to specify an initialization value or values for newly allocated space when a file is extended."

At Line 72479, write(), Change "or socket." to:
"socket, or regular file that does not support seek operations."

At Line 47089, Replace with copy of edited Line 72479.

[Note: The last two changes are for tape drives that may have a file system providing random read access to files, but that only allow sequential reads while a file is open, and on which only the last file on the tape may be appended to for writing.]
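For comparison with the first suggested append, the pipe case is already covered by an existing error: the posix_fallocate() page lists ESPIPE for a descriptor associated with a pipe or FIFO. A minimal sketch, assuming a Linux/glibc-style POSIX system; `fallocate_on_pipe()` is a hypothetical helper that simply reports the error number returned.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Calls posix_fallocate() on the write end of a pipe and returns the error
   number it reports (expected: ESPIPE), or -1 if no pipe could be created. */
static int fallocate_on_pipe(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    /* The result is an error number, not -1 with errno set. */
    int rc = posix_fallocate(fds[1], 0, 4096);
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```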

hch (reporter)
2013-11-06 14:04

Actually, the only implementation I assumed modifies that data turned out to just rewrite the bytes it read from the file. That means that, so far, nothing speaks against standardizing that behavior, which would be very helpful for portable applications.

Exposing uninitialized data is something POSIX avoids, as far as I can see, and no implementation known to me does it. Explicitly allowing this behavior does not seem like a good idea to me. Specifying zero padding for allocations beyond i_size, as well as for holes, is what applications seem to expect.

Further comments on your wording:

"For regions that overlap previously allocated file space, the implementation shall ensure a read() of data from that overlapped area is unchanged."

This seems to specify one particular implementation rather than the effect. For example, the Linux fallocate system call that is used to implement posix_fallocate does not have to read any actual data, as the filesystem can determine the state from the metadata.

"When the fd argument has the O_APPEND flag set, and offset is larger than the current size plus 1 it shall be ignored and the file simply extended by len bytes. No error shall be returned if this adjustment is made. The descriptors' current write position shall not change and writes to the file shall be as if O_APPEND was not in effect until the current position is set to a value outside the region by a call to seek() or write()."

This is contrary to at least the Linux implementation. The exact behaviour probably needs to be left undefined, as implementations using your semantics might exist in the field.
shware_systems (reporter)
2013-11-06 19:02

Exposure of uninitialized data is a consequence of permitting an arbitrary offset. I agree it should be avoided for security reasons, but avoiding it can slow systems down, so it is left optional for this interface and for pwrite(). Most systems favor security over speed, so I'm not surprised it isn't seen much.

The standard is mostly written in terms of what a single implementation shall do, rather than in terms of implementations in the plural. It could just as well read "space, all implementations shall ensure after the interface returns a subsequent read()". If an implementation can get by using already cached data while the allocation is in progress, that's fine.

The O_APPEND lines are new. They take into account that, while a seek to an offset followed by a read may access anywhere in the file when O_APPEND is set, all writes are expected to behave as if pwrite() were used with an offset of i_size+1 as their starting offset, according to the write() interface description, and only reads modify the current position implicitly. This interface and pwrite() both require i_size to change when extension occurs. Without a change of this nature, the flag's requirements dominate what this interface is attempting. IOW, pwrite() specifies that the current position shall not be changed, but when O_APPEND is in effect that matters only for reads. For any allocation made by this interface or by pwrite(), a regular write() won't permit writing to the allocation or to any hole; the next write() will occur after the allocation. Otherwise an application is forced to use only pwrite() to access those new parts of the file, which I don't see as the intent of the current wording. This wording still lets O_APPEND dominate, and lets any sequence of write interface calls succeed, until the allocated area is overwritten.
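The O_APPEND interaction argued above can be checked with a short sketch. It assumes a POSIX system with a writable `/tmp` on which posix_fallocate() is accepted for a descriptor opened with O_APPEND; some implementations may reject that combination, in which case the helper just returns the error number. `append_after_allocate()` is a hypothetical name. Because O_APPEND moves the offset to end-of-file before every write(), a write issued after an extending posix_fallocate() is expected to land after the reserved space, not inside it.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Opens a fresh scratch file with O_APPEND, reserves [0, len) with
   posix_fallocate(), then issues one write(). On success stores the offset
   where the written byte landed in *landed and returns 0. Otherwise returns
   the posix_fallocate() error number, or -1 for any other failure. */
static int append_after_allocate(off_t len, off_t *landed)
{
    char path[] = "/tmp/fallocate-appendXXXXXX"; /* /tmp is an assumption */
    int tmp = mkstemp(path);
    if (tmp < 0)
        return -1;
    close(tmp);
    int fd = open(path, O_WRONLY | O_APPEND);
    unlink(path);
    if (fd < 0)
        return -1;

    int rc = posix_fallocate(fd, 0, len);
    if (rc == 0) {
        struct stat st;
        /* O_APPEND places this byte at end-of-file, i.e. after the
           reserved region rather than at offset 0. */
        if (write(fd, "x", 1) == 1 && fstat(fd, &st) == 0)
            *landed = st.st_size - 1;
        else
            rc = -1;
    }
    close(fd);
    return rc;
}
```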

Systems that don't treat regular files as separate input and output pipes when O_APPEND is in effect, so that there are two explicit current positions, won't accommodate this as easily as those that do. Granted, this still has some warts related to locking, which I didn't get into, but from an application's perspective I think it can be considered portable, and it does clarify the intent of the current wording. What current POSIX systems are doing should be taken into account for further wart removal, but the result should be consistent.

I'm trying to keep this such that it gets applied as a clarification in the TC, not deferred to the next Issue. The other portable alternative is to define a new error code for this as a 'shall fail' case, and not allow the starting offset plus len to be larger than the current i_size at all when O_APPEND is set, but I'm also trying to avoid that. Leaving it 'undefined' or 'unspecified' is definitely non-portable, so I'm not in favor of that either.
Don Cragun (manager)
2013-11-06 20:16
edited on: 2013-11-06 20:20

The description of read() already states:
The read() function reads data previously written to a file. If any portion of a regular file prior to the end-of-file has not been written, read() shall return bytes with value 0. For example, lseek() allows the file offset to be set beyond the end of existing data in the file. If data is later written at this point, subsequent reads in the gap between the previous end of data and the newly written data shall return bytes with value 0 until data is written into the gap.

Although it uses lseek() as the example, the same requirement applies when posix_fallocate() allocates new blocks to a file (they have to be nul byte filled). The last paragraph of the posix_fallocate() description says that space allocated to a file MAY BE freed by a call to ftruncate() that reduces the size of a file.
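The read() requirement quoted above can be exercised directly. A minimal sketch, assuming a POSIX system with a writable `/tmp`; `open_scratch()` and `gap_reads_zero()` are hypothetical helpers. It leaves a gap by seeking past end-of-file before a second write() and checks that every byte in the gap reads as 0, the same guarantee the note says applies to blocks newly allocated by posix_fallocate().

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an unlinked scratch file (assumes /tmp). */
static int open_scratch(void)
{
    char path[] = "/tmp/read-gapXXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0)
        unlink(path); /* storage is released when fd is closed */
    return fd;
}

/* Writes "A" at offset 0, seeks gap bytes past end-of-file, writes "Z" at
   offset gap + 1, then checks the bytes in [1, gap]. Returns 1 if they all
   read as 0, 0 if some byte does not, -1 on error. */
static int gap_reads_zero(off_t gap)
{
    int fd = open_scratch();
    if (fd < 0)
        return -1;

    int result = -1;
    if (write(fd, "A", 1) == 1 &&
        lseek(fd, gap, SEEK_CUR) == gap + 1 &&
        write(fd, "Z", 1) == 1) {
        result = 1;
        char c;
        for (off_t pos = 1; pos <= gap; pos++) {
            if (pread(fd, &c, 1, pos) != 1) {
                result = -1;
                break;
            }
            if (c != 0) {
                result = 0;
                break;
            }
        }
    }
    close(fd);
    return result;
}
```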

In practice, implementations usually free data in units of filesystem blocks rather than bytes. So, if ftruncate() reduces a file's size to a multiple of the block size of the underlying filesystem, all previously written data beyond the new (truncated) file size will be lost. But if ftruncate() is asked to reduce the size of a file to an offset in the middle of a block of data, the remainder of the data in that block need not be cleared if the file size increases later due to a posix_fallocate() that grows the file, or an lseek() followed by a write() that skips over that previously written data.

The standard does not allow data that resided in a block that was part of another file to be inserted into a regular file without having data from the previous file zeroed out first.

eblake (manager)
2013-12-19 17:10
edited on: 2013-12-19 17:13

"some implementations also zero it and lose the data" - which implementations? This sounds like a bug in those implementations for overwriting existing data (posix_fallocate should merely ensure space is reserved for the file, not change contents of the portions that are already allocated).

eblake (manager)
2013-12-19 17:13

Meanwhile, the ability to intentionally zero out a portion of a file (also known as punching holes) is something worth standardizing (but in a different bug report); at least Solaris and Linux have provided hooks for doing this, but not by the same names [Linux with fallocate(FALLOC_FL_PUNCH_HOLE), Solaris with fcntl(F_FREESP)].
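For comparison, hole punching with the Linux hook mentioned above might look like the following sketch. It is Linux-specific, not POSIX: it needs `_GNU_SOURCE` for `fallocate()` and the `FALLOC_FL_*` flags, `FALLOC_FL_PUNCH_HOLE` must be combined with `FALLOC_FL_KEEP_SIZE`, and file systems that cannot punch holes fail the call with EOPNOTSUPP. `open_scratch()` and `punch_and_check()` are hypothetical helpers, `/tmp` is an assumed location, and the Solaris `fcntl(F_FREESP)` route is not shown.

```c
#define _GNU_SOURCE /* Linux fallocate() and FALLOC_FL_* flags */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create an unlinked scratch file (assumes /tmp). */
static int open_scratch(void)
{
    char path[] = "/tmp/punch-holeXXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0)
        unlink(path); /* storage is released when fd is closed */
    return fd;
}

/* Fills 3 * CHUNK bytes with 'x' and punches a hole over the middle CHUNK.
   Returns 1 if the hole reads as zeros while the file size and the data on
   either side are unchanged, 0 if not, EOPNOTSUPP if the file system cannot
   punch holes, and -1 on any other error. */
static int punch_and_check(void)
{
    enum { CHUNK = 4096 };
    char buf[3 * CHUNK];
    memset(buf, 'x', sizeof buf);

    int fd = open_scratch();
    if (fd < 0)
        return -1;

    int result = -1;
    struct stat st;
    if (pwrite(fd, buf, sizeof buf, 0) == (ssize_t)sizeof buf) {
        /* PUNCH_HOLE must be ORed with KEEP_SIZE. */
        if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      CHUNK, CHUNK) != 0) {
            result = errno == EOPNOTSUPP ? EOPNOTSUPP : -1;
        } else if (fstat(fd, &st) == 0 &&
                   pread(fd, buf, sizeof buf, 0) == (ssize_t)sizeof buf) {
            result = st.st_size == (off_t)sizeof buf;
            for (size_t i = 0; i < sizeof buf; i++) {
                char expect = (i >= CHUNK && i < 2 * CHUNK) ? '\0' : 'x';
                if (buf[i] != expect)
                    result = 0;
            }
        }
    }
    close(fd);
    return result;
}
```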
Don Cragun (manager)
2013-12-19 17:18
edited on: 2013-12-19 17:19

Interpretation response
The standard clearly states space will be allocated if it is not already allocated to the file, and conforming implementations must conform to this. The standard does not allow implementations to overwrite previously allocated data as a side effect of this call.


Notes to the Editor (not part of this interpretation):
No change required.
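The interpretation above can be illustrated by reserving a range that lies entirely inside existing data: both the contents and the file size are expected to stay unchanged. A minimal sketch, assuming a POSIX system with a writable `/tmp`; `open_scratch()`, `reserve_space()` and `inner_reserve_is_noop()` are hypothetical helpers. The wrapper also reflects that posix_fallocate() reports failure through its return value rather than errno, and retries on EINTR, which the standard lists as a possible error.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an unlinked scratch file (assumes /tmp). */
static int open_scratch(void)
{
    char path[] = "/tmp/fallocate-innerXXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0)
        unlink(path); /* storage is released when fd is closed */
    return fd;
}

/* Hypothetical wrapper: returns 0 or an error number, retrying on EINTR. */
static int reserve_space(int fd, off_t offset, off_t len)
{
    int rc;
    do
        rc = posix_fallocate(fd, offset, len);
    while (rc == EINTR);
    return rc;
}

/* Fills a file with 'q', reserves a range strictly inside it, and returns 1
   if the contents and size are unchanged, 0 if not, -1 on error. */
static int inner_reserve_is_noop(void)
{
    char buf[8192], back[8192];
    memset(buf, 'q', sizeof buf);

    int fd = open_scratch();
    if (fd < 0)
        return -1;

    int result = -1;
    struct stat st;
    if (pwrite(fd, buf, sizeof buf, 0) == (ssize_t)sizeof buf &&
        reserve_space(fd, 1000, 2000) == 0 &&
        fstat(fd, &st) == 0 &&
        pread(fd, back, sizeof back, 0) == (ssize_t)sizeof back)
        result = st.st_size == (off_t)sizeof buf &&
                 memcmp(buf, back, sizeof buf) == 0;
    close(fd);
    return result;
}
```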

ajosey (manager)
2014-02-21 15:39

Interpretation Proposed 21 Feb 2014
ajosey (manager)
2014-03-25 13:42

Interpretation Approved: 25 March 2014

Issue History
Date Modified Username Field Change
2013-11-05 19:59 hch New Issue
2013-11-05 19:59 hch Name => Your Name Here
2013-11-05 19:59 hch Section => posix_fallocate
2013-11-05 19:59 hch Page Number => ???
2013-11-05 19:59 hch Line Number => ???
2013-11-06 00:00 shware_systems Note Added: 0001960
2013-11-06 00:15 shware_systems Note Edited: 0001960
2013-11-06 14:04 hch Note Added: 0001961
2013-11-06 19:02 shware_systems Note Added: 0001962
2013-11-06 19:52 Don Cragun Page Number ??? => 1425
2013-11-06 19:52 Don Cragun Line Number ??? => 47064-47074
2013-11-06 19:52 Don Cragun Interp Status => ---
2013-11-06 20:16 Don Cragun Note Added: 0001963
2013-11-06 20:18 Don Cragun Note Edited: 0001963
2013-11-06 20:19 Don Cragun Note Edited: 0001963
2013-11-06 20:20 Don Cragun Note Edited: 0001963
2013-12-19 17:10 eblake Note Added: 0002087
2013-12-19 17:13 eblake Note Added: 0002088
2013-12-19 17:13 eblake Note Edited: 0002087
2013-12-19 17:18 Don Cragun Final Accepted Text => See Note: 0002086.
2013-12-19 17:18 Don Cragun Note Added: 0002089
2013-12-19 17:18 Don Cragun Status New => Interpretation Required
2013-12-19 17:18 Don Cragun Resolution Open => Accepted As Marked
2013-12-19 17:19 Don Cragun Interp Status --- => Pending
2013-12-19 17:19 Don Cragun Final Accepted Text See Note: 0002086. => See Note: 0002089.
2013-12-19 17:19 Don Cragun Note Edited: 0002089
2014-02-21 15:39 ajosey Interp Status Pending => Proposed
2014-02-21 15:39 ajosey Note Added: 0002153
2014-03-25 13:42 ajosey Interp Status Proposed => Approved
2014-03-25 13:42 ajosey Note Added: 0002198

Mantis 1.1.6
Copyright © 2000 - 2008 Mantis Group