ID | Category | Severity | Type | Date Submitted | Last Update
0001824 | [Issue 8 drafts] Shell and Utilities | Editorial | Clarification Requested | 2024-04-01 15:31 | 2024-10-07 16:37
Reporter | dag-erling
View Status | public
Assigned To |
Priority | normal
Resolution | Open
Status | New
Product Version | Draft 4.1
Name | Dag-Erling Smørgrav
Organization |
User Reference |
Section | Utilities
Page Number | 2741-2748
Line Number | 90593-90715, 90876-90880
Final Accepted Text |
Summary | 0001824: cp: directories and symlinks
Description |
I would like to request a clarification on the matter of cp's handling of symbolic links in the destination.

To begin with, I find the wording of the final paragraph of the rationale (90876-90880) confusing. It mentions “file types not specified by the System Interfaces” and implies that symbolic links fall into that category, but I can find no indication anywhere else that this is the case. On the contrary, the definition of “file” in §3.139 on page 51 explicitly includes “symbolic link” in its enumeration of file types (1592-1593), before stating that “[o]ther types of files may be supported by the implementation” (1593-1594).

If we jump back to the description section, the behavior of cp if a source file is a directory and the corresponding destination file exists and is a symbolic link is not entirely clear to me. If you believe the final paragraph of the rationale, it is covered by item 2c (90638-90639) which says it's implementation-defined. If you don't, it depends on a couple of additional factors. First, do you consider the type of the link or the type of its target? (I will come back to this later.) If you consider the type of the link, or if you consider the type of its target and its target is not a directory, we turn to item 2d (90640-90642) which says to emit an error, not descend, and go on with the next source file. If you consider the type of its target and its target is a directory, we turn to item 2f (90649-90650) which says to copy the contents of the source into the destination.

I cannot find any discussion anywhere in the specification for cp of what to do if the target of a symbolic link does not exist, unless the second paragraph of item 4c (90697-90698) is intended to cover this case (but 4c discusses the case where the source is a symbolic link, so it doesn't tell us what to do if the destination exists, is a symbolic link, and references a non-existent file).

Now for the matter of whether, if the destination file is a symbolic link, we should consider instead the type of its target. The descriptions of the -L and -P options repeatedly use the phrase “symbolic links encountered during traversal of a file hierarchy” (cf. 90625, 90627, 90712, 90715). Given that the surrounding text mostly refers to the source, it is not clear to me whether this phrase only applies to the source, or to both the source and the destination.

Turning to historical precedent, BSD cp has traditionally followed symbolic links in the destination hierarchy while GNU cp appears not to. FreeBSD recently changed its implementation to take the -R, -L, and -P options into consideration when checking the destination as well, while the GNU cp documentation appears to state quite clearly (and, in my opinion, correctly) that these options only apply to the source. I believe that this change was a mistake, and I intend to revert it. However, I cannot make up my mind as to whether the historical behavior of BSD cp (always follow symbolic links in the destination) is correct. I can easily conceive of situations where you would want cp to do that, but it has been pointed out to me that doing so, at least by default, can be considered a security risk.

Note that in the case where the source file is a file and the destination exists (item 3a lines 90657-90671), the file type of the destination is not taken into account at all. I am not as concerned with this case as I am with the directory case, but it should probably be addressed as well.

To summarize:
- The rationale implies that symbolic links are an extension, which I believe to be incorrect.
- It is unclear whether symbolic links in the destination should be followed, and whether the -L and -P options apply, when inspecting destination paths.
- There is historical precedent for answering these questions with “yes” and “no”, respectively. Recent history suggests the wording is vague enough that implementers are confused on the second point.
- The description section does not adequately discuss how cp should behave if the source is a directory and the destination exists and is a symbolic link.
- The description section does not consider the type of the destination at all in the case where the source is a file and the destination exists.
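
Editorial illustration (not part of the original report): the description distinguishes judging a destination by the type of the link from judging it by the type of its target, and notes that a link with no target is not discussed. The sketch below makes those cases concrete with the standard test primaries -L (examines the link itself), -d and -e (both follow the link). The function name classify_dest and the scratch file names are hypothetical; nothing here is taken from any cp implementation.

```shell
#!/bin/sh
# Hypothetical helper: report how a destination path looks when judged by
# the link itself versus by the link's target.
classify_dest() {
    dest=$1
    if [ -L "$dest" ]; then
        # -L examines the link itself; -d and -e follow it to the target.
        if [ -d "$dest" ]; then
            echo "symlink to directory"
        elif [ -e "$dest" ]; then
            echo "symlink to non-directory"
        else
            echo "dangling symlink"
        fi
    elif [ -d "$dest" ]; then
        echo "directory"
    elif [ -e "$dest" ]; then
        echo "non-directory"
    else
        echo "missing"
    fi
}

# Demonstration in a scratch directory (all names are made up).
demo=$(mktemp -d) || exit 1
mkdir "$demo/dir"
: > "$demo/file"
ln -s dir "$demo/to_dir"
ln -s file "$demo/to_file"
ln -s nowhere "$demo/dangling"
for name in dir file to_dir to_file dangling absent; do
    printf '%s: %s\n' "$name" "$(classify_dest "$demo/$name")"
done
rm -rf "$demo"
```

In the report's terms, a "symlink to directory" destination would fall under item 2f only if the type of the target is what counts; judged by the link itself it would fall under item 2d, as would "symlink to non-directory". The "dangling symlink" case is the one the report says it cannot find discussed.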
Desired Action |
1. Clarify the final paragraph of the rationale.
2. Modify the description section to state either:
   a) That cp always follows links in the destination.
   b) That cp never follows links in the destination.
   c) That whether cp follows links in the destination is unspecified.
   The best option is probably c) since there is historical precedent for (and therefore also against) both a) and b).
3. Modify the phrase “symbolic link(s) encountered during traversal of a file hierarchy”, which appears twice in the description section and twice in the options section, to clarify whether it only refers to the source, or to both source and destination.
4. Modify the list of steps taken for each source file in the description section to clarify what happens if the source is a directory and the destination is a symbolic link.
5. Optionally expand the list of steps taken for each source file in the description section to also describe what happens if the source is a regular file and the destination is a symbolic link.
Tags | No tags attached.
Attached Files |
Notes | |
(0006731) dannyniu (reporter) 2024-04-02 06:19 edited on: 2024-04-02 06:22 |
Here's how I understand the (new) wording of the rationale:
1. Implementations are allowed to copy directories to their own implementation-defined file types.
2. The wording is chosen such that implementations may support symbolic links (pointing to whatever the implementation supports) as copy destinations for directories.
As such, the normative text is modified to "support" this "loophole".
(0006734) geoffclare (manager) 2024-04-02 15:51 |
The final paragraph of rationale dates back to the original POSIX.2-1992 standard, where the text was "implementation-defined file types not specified by POSIX.1 {8}". The reference to "POSIX.1 {8}" was to the POSIX.1-1990 standard which did not specify symbolic links. |
(0006737) geoffclare (manager) 2024-04-04 14:30 edited on: 2024-04-08 08:42 |
I believe the standard is clear on all of the points raised here. Taking the bullet items after "To summarize" in turn:

- The rationale implies that symbolic links are an extension, which I believe to be incorrect.

Yes, as per my previous note, the final paragraph of the rationale is out of date. It should be disregarded when interpreting the normative text.

- It is unclear whether symbolic links in the destination should be followed, and whether the -L and -P options apply, when inspecting destination paths.

XBD 4.16 Pathname Resolution requires that symbolic links are followed except when all of the following are true:
1. This is the last pathname component of the pathname.
2. The pathname has no trailing <slash>.
3. The function is required to act on the symbolic link itself, or certain arguments direct that the function act on the symbolic link itself.

- There is historical precedent for answering these questions with "yes" and "no", respectively. Recent history suggests the wording is vague enough that implementers are confused on the second point.

I believe the text in the standard is sufficient to answer the questions. Whether the required behaviour matches what implementations do is another matter. The recent history could be taken as an indication that (at least some) implementors are willing to make changes to conform, once they are made aware of the correct interpretation of the standard.

- The description section does not adequately discuss how cp should behave if the source is a directory and the destination exists and is a symbolic link.

It needs to be read in combination with XBD 4.16 Pathname Resolution, and noting that -P implies symlinks are not followed for both source and destination.

- The description section does not consider the type of the destination at all in the case where the source is a file and the destination exists.

Correct, and I don't see any problem with that. If the destination is a directory, the open() call in 3.a.ii will fail and cp will report an error. If -f is specified, an unlink() is attempted in 3.a.iii and will normally also fail, although it does seem to imply that if the implementation supports privileged unlinking of directories then cp will do so when run with appropriate privilege. We should consider adding a "the file is a non-directory file" condition to 3.a.iii.
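
Editorial illustration (not part of the original note): the three pathname resolution conditions quoted above can be observed from the shell. The test primary -L is an operation that acts on the symbolic link itself (condition 3), so it reports a link only when the link is also the last component (condition 1) and there is no trailing slash (condition 2). This is a minimal sketch; the function name names_link_itself and the scratch file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: does the pathname, as written, name a symbolic link
# itself, or does resolution pass through to the link's target?
names_link_itself() {
    if [ -L "$1" ]; then echo yes; else echo no; fi
}

demo=$(mktemp -d) || exit 1
mkdir "$demo/targdir"
: > "$demo/targdir/file"
ln -s targdir "$demo/link"

# Last component, no trailing slash: all three conditions hold.
names_link_itself "$demo/link"
# Trailing slash: condition 2 fails, so the link is followed.
names_link_itself "$demo/link/"
# The link is an intermediate component: condition 1 fails.
names_link_itself "$demo/link/file"
rm -rf "$demo"
```

By contrast, primaries such as -d and -f are not required to act on the link itself, so they follow the link even in the first case.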
(0006739) dag-erling (reporter) 2024-04-04 16:01 |
Geoff, I think you should look more carefully at the context of your quote. You plucked it from a portion of the text which discusses traversal of the source; I don't agree that it clearly requires -P to apply to the destination. Furthermore, if you are correct and line 90628 applies equally to the destination as to the source, then the same must go for the rest of the surrounding text, including lines 90619-90620 which say that the default behavior is unspecified. This contradicts your claim that the default behavior should be to follow symbolic links. Your insistence that this is all perfectly clear clashes with the reality that implementers (of which I am one) either find it unclear or outright disagree... and with your own apparent confusion. |
(0006740) geoffclare (manager) 2024-04-04 16:59 edited on: 2024-04-05 08:16 |
What is there before line 90628 that makes you think its context is traversal of the source? I'm not seeing it. Quite the opposite, as line 90613 says "The term source_file refers to the file that is being copied, whether specified as an operand or a file in a file hierarchy rooted in a source_file operand."

Lines 90619-90620 say "If none of the options −H, −L, nor −P were specified, it is unspecified which of −H, −L, or −P will be used as a default." It doesn't make any sense to try and think of this in terms of it applying to certain files and not to others. It is simply saying "cp -R a b" can behave the same as "cp -RH a b", "cp -RL a b", or "cp -RP a b".

You assert that I claimed "the default behavior should be to follow symbolic links". That is not true. I said that the pathname resolution rules require that symbolic links are followed except when a certain set of conditions is met. For cp, one of those exception conditions is met when the -P option is in effect. If a cp implementation has -P as the default when -R is specified and none of -H, -L or -P is specified, then that condition is met by default and symlinks will not be followed by default for cp -R.
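
Editorial illustration (not part of the original note): since "cp -R a b" may behave like any of "cp -RH a b", "cp -RL a b", or "cp -RP a b", one way to see which default a given cp uses is to copy a symlink operand whose target directory contains another symlink, then inspect the result. The sketch below assumes the usual meanings of the three options (-H follows only operand symlinks, -L follows all, -P follows none); the function name probe_cp_r_default and all file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical probe: guess which of -H, -L, or -P a cp implementation uses
# when -R is given without any of them. Prints H, L, P, or "unknown".
probe_cp_r_default() {
    (
        work=$(mktemp -d) || exit 1
        cd "$work" || exit 1
        mkdir real
        : > real/file
        ln -s file real/inner      # symlink met during traversal
        ln -s real top             # symlink named as an operand

        cp -R top copy 2>/dev/null

        if [ -L copy ]; then
            echo P                 # operand symlink copied as a symlink
        elif [ -d copy ] && [ -L copy/inner ]; then
            echo H                 # operand followed, inner link preserved
        elif [ -d copy ] && [ -f copy/inner ]; then
            echo L                 # every symlink followed
        else
            echo unknown
        fi
        cd / && rm -rf "$work"
    )
}

probe_cp_r_default
```

The probe only looks at source-side handling; it says nothing about how symlinks in the destination are treated, which is the point under dispute in this issue.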
(0006741) dag-erling (reporter) 2024-04-05 11:33 |
Line 90628 is within a section about -R. Furthermore, lines 90714-90715, which describe -P in the options section, speak only of source_file or traversal, which we've established is about the source, not the destination. |
(0006743) geoffclare (manager) 2024-04-08 08:39 edited on: 2024-04-08 08:44 |
> Line 90628 is within a section about -R.

That doesn't imply "its context is traversal of the source". You can use -R without any traversal occurring, e.g.:

    cp -RL symlink_to_regfile copy_of_regfile
    cp -RP symlink copy_of_symlink

and lines 90623-90625 (for -L) and 90626-90628 (for -P) apply to these cases.

> Furthermore, lines 90714-90715, which describe -P in the options section, speak only of source_file or traversal, which we've established is about the source, not the destination.

Aha! You've finally identified something I agree is a problem. This text conflicts with line 90628, as it implies (together with the pathname resolution rules) that symlinks are always followed for the destination whereas 90628 says they aren't.

The description of -P in OPTIONS was missing from the final POSIX.2b draft and was added by IEEE PASC Interpretation 1003.2 #194. I suspect the working group which processed that interpretation just came up with the wording by comparison to the -H and -L descriptions and missed the significance of the DESCRIPTION text for -P saying "shall not follow any symbolic links" as regards the destination. Note that the rationale for the interpretation says "The standard is clear as the -P option is described in the description section. However, it would be better to have the option described in the Options section as well." (See https://web.archive.org/web/20050116074829/http://www.pasc.org/interps/unofficial/db/p1003.2/pasc-1003.2-194.html ). So, it is clear the intention was for the new -P text in OPTIONS to match the existing DESCRIPTION text.
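
Editorial illustration (not part of the original note): the two commands above, which use -R without any traversal taking place, can be tried in a scratch directory. This sketch uses one symlink to a regular file for both commands and assumes that -L copies the file the link refers to while -P copies the link itself; kind_of and demo_no_traversal are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: classify a path without following a final symlink.
kind_of() {
    if [ -L "$1" ]; then
        echo symlink
    elif [ -f "$1" ]; then
        echo regular
    else
        echo other
    fi
}

demo_no_traversal() {
    (
        w=$(mktemp -d) || exit 1
        cd "$w" || exit 1
        echo data > regfile
        ln -s regfile symlink_to_regfile

        # Neither command walks a file hierarchy: the operand is one symlink.
        cp -RL symlink_to_regfile copy_of_regfile
        cp -RP symlink_to_regfile copy_of_symlink

        echo "RL copy is: $(kind_of copy_of_regfile)"
        echo "RP copy is: $(kind_of copy_of_symlink)"
        cd / && rm -rf "$w"
    )
}

demo_no_traversal
```

The destinations here do not exist beforehand, so the example shows only that -L and -P take effect on the operand even when no hierarchy is walked.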
(0006788) geoffclare (manager) 2024-05-20 09:26 edited on: 2024-05-20 09:28 |
Before we can work on wording, we need to decide what behaviour(s) to require/allow for -P. We should be guided by existing practice. I tried a few tests on the systems I have access to.

- First I tried this:

    mkdir targdir; ln -s targdir destdir
    echo src > src
    cp -RP src destdir

and cp created src in targdir (on Solaris, Linux, and macOS).

- Then I tried this:

    mkdir destdir; cd destdir; echo targ > targ; ln -s targ src
    cd ..; echo src > src
    cp -RP src destdir

and cp copied src contents into the targ file (on Solaris, Linux, and macOS).

- Then I tried this:

    mkdir subdir destdir; cd destdir; mkdir targdir; ln -s targdir subdir
    cd ../subdir; echo src > src; cd ..
    cp -RP subdir destdir

Solaris and macOS created src in targdir. Linux failed with "cannot overwrite non-directory 'destdir/subdir' with directory 'subdir'"

Conclusion: Solaris and macOS consistently follow destination symlinks, as does historical FreeBSD according to the bug description, but Linux (GNU coreutils 9.1 on a Debian system) is inconsistent; it follows destination symlinks in two out of three of my test cases but does not in the other case.
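
Editorial illustration (not part of the original note): for anyone wanting to repeat the three experiments on another system, the commands can be wrapped in a small harness that runs each one in its own scratch directory and reports what happened. The function names case1, case2, and case3 and the result labels are hypothetical; the harness records behaviour and does not judge which outcome conforms.

```shell
#!/bin/sh
# Hypothetical harness around the three experiments. Each function prints
# "followed", "not followed", or "failed" for the destination symlink.

# Case 1: the destination operand is itself a symlink to a directory.
case1() {
    (
        w=$(mktemp -d) || exit 1
        cd "$w" || exit 1
        mkdir targdir; ln -s targdir destdir
        echo src > src
        if ! cp -RP src destdir 2>/dev/null; then
            echo failed
        elif [ -f targdir/src ]; then
            echo followed
        else
            echo "not followed"
        fi
        cd / && rm -rf "$w"
    )
}

# Case 2: the destination file is a symlink to a regular file.
case2() {
    (
        w=$(mktemp -d) || exit 1
        cd "$w" || exit 1
        mkdir destdir
        (cd destdir && echo targ > targ && ln -s targ src)
        echo src > src
        if ! cp -RP src destdir 2>/dev/null; then
            echo failed
        elif [ -L destdir/src ] && [ "$(cat destdir/targ)" = src ]; then
            echo followed
        else
            echo "not followed"
        fi
        cd / && rm -rf "$w"
    )
}

# Case 3: source is a directory; the destination entry is a symlink to a
# directory.
case3() {
    (
        w=$(mktemp -d) || exit 1
        cd "$w" || exit 1
        mkdir subdir destdir
        (cd destdir && mkdir targdir && ln -s targdir subdir)
        echo src > subdir/src
        if ! cp -RP subdir destdir 2>/dev/null; then
            echo failed
        elif [ -f destdir/targdir/src ]; then
            echo followed
        else
            echo "not followed"
        fi
        cd / && rm -rf "$w"
    )
}

printf 'case 1: %s\n' "$(case1)"
printf 'case 2: %s\n' "$(case2)"
printf 'case 3: %s\n' "$(case3)"
```

Going by the results reported in the note, Solaris and macOS would be expected to print "followed" three times, and GNU coreutils 9.1 "followed" twice and then "failed"; other versions may differ.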
(0006807) eblake (manager) 2024-06-10 18:16 |
A lot of this stems from Linux's intentional decision years ago that when readlink("dangling") returns "newdir" but stat("dangling") fails with ENOENT, then rename("olddir", "dangling/") should fail with ENOTDIR, instead of leaving "dangling" intact as a symlink and renaming "olddir" to "newdir". It is not just rename() affected; mkdir(), unlink(), and several other Linux syscalls intentionally refuse to dereference through a symlink followed by a trailing slash on the grounds that it is somewhat ambiguous on whether you wanted to act on a (potential) directory name or on the symlink itself. Is it time to recognize the Linux syscall behaviors on dangling symlinks as a valid alternative to traditional Unix behavior (a much bigger change to identify all of those affected interfaces, but would make it easier for Linux to finally comply with POSIX), or are we only wanting to paper over this scenario at the command line interface of cp despite Linux being unwilling to change kernel behavior at the C level, or something else altogether? |
(0006808) geoffclare (manager) 2024-06-11 09:38 |
Re: Note: 0006807

I thought that behaviour only happened when there are trailing slashes. In my tests in Note: 0006788 I intentionally did not test trailing slashes so as to avoid that issue.

To answer your question, this has been discussed in the past and we decided not to allow the Linux behaviour for trailing slashes. I see no reason to change that decision now. Note that the Linux systems which achieved UNIX03 certification (Inspur K-UX and Huawei Euler-OS) changed their trailing slash behaviour to conform to the standard, so not all Linux systems behave that way.
(0006809) geoffclare (manager) 2024-06-11 09:53 |
> A lot of this stems from Linux's intentional decision ...

I repeated my tests from Note: 0006788 with /usr/gnu/bin/cp on Solaris 11.4 and got the same results as on Linux. So I think you are mistaken to blame the GNU cp behaviour on the behaviour of the underlying Linux system calls it uses.
(0006818) mirabilos (reporter) 2024-06-14 17:57 |
FWIW, MirBSD (“historic BSD from the 2000s”) does the same as Solaris and Mac OSX in Note: 0006788. |
(0006904) dag-erling (reporter) 2024-10-04 17:02 edited on: 2024-10-04 17:04 |
In macOS Sequoia, cp no longer follows symlinks in the destination:

    % mkdir subdir destdir; cd destdir; mkdir targdir; ln -s targdir subdir; cd ../subdir; echo src > src; cd ..
    % cp -RP subdir destdir
    cp: destdir/subdir: Not a directory
    % find destdir
    destdir
    destdir/targdir
    destdir/subdir
(0006907) geoffclare (manager) 2024-10-07 16:37 edited on: 2024-10-08 08:17 |
> In macOS Sequoia, cp no longer follows symlinks in the destination

Sequoia has a known non-conformance to UNIX 03 in this area, so this new behaviour is likely related to that issue, in which case it will probably change soon.

The UNIX 03 certification was made with a few "minor system faults" which Apple have 12 months to fix, and one of them is for cp failing to create a regular file at the destination of a symlink. Those who have an opengroup.org account can see the MSF details here: https://www2.opengroup.org/pr/protected/openbrand/PRView?PR=2757

This is the problem statement:

    If you want to cp into symlink and the target of this symlink doesn't exist then it should to be created as a regular file. In the current test, ga11_linktarget should be created as a regular file with contents of tetcp001.inp. Whereas in the current version of our OS, it is not creating the file for the ga11_linktarget and that is causing the test to fail with file missing error.

However, this makes the test case sound simpler than it is. These are the commands used in the test (note that tetcp001.inp already exists as a regular file and the test first ensures that ga11_linktarget does not exist):

    mkdir dircp006.tmp
    ln -s `pwd`/ga11_linktarget dircp006.tmp/tetcp001.inp
    cp tetcp001.inp dircp006.tmp/

It then expects ga11_linktarget to have been created with the contents of tetcp001.inp but reports failure because the file could not be found. Note that it doesn't use -RP so the symlink should definitely be followed. This has been tested for UNIX 03 certifications for 20 years.

In case anyone thinks I'm revealing confidential information about Apple here, I'm not; it is public if you know where to look. The MSF doesn't name Apple, but it is listed (as MSF.X.0138) in the Sequoia "Commands and Utilities V4" CSQ here: https://www.opengroup.org/csq/repository/noreferences=1&RID=apple%252FCX1%25252F19.html
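
Editorial illustration (not part of the original note): the certification test described above can be approximated in a scratch directory. This is only a sketch of the three commands quoted in the note, not the actual test suite code; the function name probe_dangling_dest, the file contents, and the result labels are hypothetical.

```shell
#!/bin/sh
# Hypothetical probe: copy a regular file onto a destination entry that is
# a symlink whose target does not exist, then report what cp did.
probe_dangling_dest() {
    (
        w=$(mktemp -d) || exit 1
        cd "$w" || exit 1
        echo payload > tetcp001.inp          # source: existing regular file
        mkdir dircp006.tmp
        # Destination entry: a symlink to a file that does not exist yet.
        ln -s "$(pwd)/ga11_linktarget" dircp006.tmp/tetcp001.inp

        cp tetcp001.inp dircp006.tmp/ 2>/dev/null
        status=$?

        if [ -f ga11_linktarget ] && cmp -s tetcp001.inp ga11_linktarget; then
            echo "target created"            # the outcome the test expects
        elif [ "$status" -ne 0 ]; then
            echo "cp failed"
        else
            echo "target missing"
        fi
        cd / && rm -rf "$w"
    )
}

probe_dangling_dest
```

Per the note, the test expects "target created"; the Sequoia fault described above corresponds to the link target not being created.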