|Anonymous | Login||2022-01-16 18:14 UTC|
|Main | My View | View Issues | Change Log | Docs|
|Viewing Issue Simple Details|
|ID||Category||Severity||Type||Date Submitted||Last Update|
|0001011||[1003.1(2008)/Issue 7] System Interfaces||Comment||Enhancement Request||2015-12-10 13:46||2020-04-08 15:43|
|Priority||normal||Resolution||Accepted As Marked|
|Final Accepted Text||Note: 0003410|
|Summary||0001011: Make the binary search tree functions friendlier to use: require balancing, add typing and allow destruction|
Though binary search tree functions provided by <search.h> are functional, they have a couple of shortcomings that really prevent people from using them effectively.
The main problem with these interfaces is that the typing is pretty poor. Keys are stored in the form of void pointers, which is fine. What is annoying is that in an attempt to make the structure of the tree opaque, tree nodes themselves are also void pointers. This makes it hard to understand how the functions should be invoked, how callbacks will be called and what is being returned.
Changing the typing to use structures for tree nodes is going to be impossible, as it will break a lot of existing code. People will need to change code to no longer use void pointers, but some other type. What we can do to provide a smooth transition into a typed world is by extending <search.h> to define the following type:
typedef void TNODE;
If my interpretation of the standard is correct, the function prototypes could then safely be changed to the following:
void *tdelete(const void *restrict key, TNODE **restrict rootp,
int (*compar)(const void *, const void *));
TNODE *tfind(const void *key, TNODE *const *rootp,
int (*compar)(const void *, const void *));
TNODE *tsearch(const void *key, TNODE **rootp,
int (*compar)(const void *, const void *));
void twalk(const TNODE *root,
void (*action)(const TNODE *, VISIT, int));
Notice how the API suddenly becomes a lot more self-documenting. Existing code will still build as before, but new code can be written down in a better typed way. If we ever get to a point in the far future where TNODE is being commonly used, we could even consider dropping the requirement of defining it as void.
Notice how I didn't change the return type of tdelete() to be a TNODE *. In my opinion we should weaken the existing sentence:
"The tdelete() function shall return a pointer to the parent of the deleted node, or an unspecified non-null pointer if the deleted node was the root node, or a null pointer if the node is not found."
"The tdelete() function shall return an unspecified non-null pointer if the deleted node was the root node, or a null pointer if the node is not found."
As already mentioned in the application usage, there is nothing meaningful that can currently be done with the address returned by this function. The caller should just interpret it as a boolean value.
I think that the term "deleted node" is vague. Many algorithms for deleting nodes from trees may delete the predecessor/successor instead of the node with the matching key. Which of the two parents should be returned in that case? The parent of the node, or the parent of the deleted predecessor/successor?
We currently don't provide any functions to easily destroy an entire tree. If you don't happen to keep track of the contents of a binary search tree, you essentially have to call twalk(), store the results and then call tdestroy() iteratively, which is both complex and has a suboptimal time complexity.
glibc solved this by adding the following function (adjusted to use TNODE):
void tdestroy(TNODE *root, void (*free_node)(void *nodep));
It walks over the tree, invokes the free_node callback with the key of every node (read: not the node itself) and frees all of the nodes. Simple. We should also strongly consider including it.
Though algorithms for balancing trees efficiently already appeared in 1962, these functions are currently not required to do any balancing.
The first issue with that is that performance is horrible. On FreeBSD it takes 1 minute to insert 100k elements in ascending order, while it only takes 0.03 seconds on Linux. Second, it also allows you to craft trees that are too deep to call twalk() on. You can easily run into a stack overflow trying to walk such an unbalanced tree.
I propose that we simply add an additional sentence to the article, stating that the depth of the tree may at most be c*log(n).
|Desired Action||Either improve them or deprecate them.|
reserving the name TNODE for search.h is i think problematic
and not that useful.
there is already an application usage note:
Since the return value of tdelete() is an unspecified non-null
pointer in the case that the root of the tree has been deleted,
applications should only use the return value of tdelete() as
indication of success or failure and should not assume it can
so the return value is only a success/failure indicator and thus
the text about parent nodes is meaningless and confusing.
tdestroy with glibc semantics may be useful.
i think changing the semantics now is problematic for compatibility
reasons, but posix could add implementation notes about making the
tree balanced to avoid stack overflow.
that said there is a genuine bug in the spec:
the functions return a pointer to a node, but the caller cannot do
anything with a node, since it's not a public type.
historically it was assumed that the first member of the node struct
is a pointer to an element (the key stored in the node) which can be
accessed by *(void**)p where p is the returned node pointer.
this is undefined behaviour in c and dangerous with current
compiler optimizations, the spec should say that the return value
is a pointer to the element pointer in the found node (this is
backward compatible with existing code and makes it possible to
use this api in c).
edited on: 2015-12-11 14:28
I'll add that documentation-related changes to these interfaces are forthcoming: http://austingroupbugs.net/view.php?id=551 [^]
Issue 511 at least partially addresses #3 (destruction) by repeatedly calling tdelete() on the root node. e.g., assuming memory for the user data in the node needn't be freed:
while (rootp != NULL)
tdelete(*(struct node **)rootp, &rootp, compar);
Regarding #4: the poor performance of FreeBSD's implementation in that situation should be noted in a bug report to the FreeBSD project.
while (rootp != NULL)
tdelete(*(struct node **)rootp, &rootp, compar);
has undefined behaviour.
rootp is void* (because &rootp is void**).
rootp cannot be dereferenced through any user defined type
because it stores a pointer to an implementation internal
node type, so *(struct node**)rootp is undefined behaviour.
this approach cannot work: the user has to pass a not yet
deleted element pointer as the first argument, but there
is no way to access the element pointer of the root node.
sorry my analysis was not entirely correct.
a pointer to struct can be converted to a pointer to the type
of its first member and dereferenced to accessed that member
(according to iso c 22.214.171.124p15 and similarly for unions 126.96.36.199p16).
but for that
1) the node type must be a struct or union.
(the internal type, returned by the functions and stored in rootp)
2) the member type holding the element pointer must be known.
(e.g. it can be void*)
3) the member must be at the start ot the struct/union layout.
the resolution in issue #551 uses
would be valid and only with the three guranatees above.
the current text for twalk callback says:
"The structure pointed to by this argument is unspecified and
shall not be modified by the application, but it shall be possible
to cast a pointer-to-node into a pointer-to-pointer-to-element
to access the element stored in the node."
this requires that all pointers must have the same representation
because the implementation has no way to know the user defined
posix otherwise only requires that pointers can be converted to
void* and back, they might have different representations, so
this is unjustified requirement.
edited on: 2016-09-01 14:10
The storage allocated to a pointer is fixed in width by the compilation environment specified at compile time to the c99 utility, as a CX extension. The bits of this allocation which are actually significant to reference various pointer type classes (object allocation, function definition, incomplete type placeholder, null, trap, and label at compile time) may still vary, but the fixed width allows that cast to be reliable for the standard environments. It is up to the compiler to emit code that handles casting of different physical representations a platform may use to and from the logical void, null, intptr_t, or an object type reliably, per <stddef.h> and <stdint.h> of POSIX.
Off topic: I feel the definition of ptrdiff_t in both standards is better expressed as "result of subtracting two object pointers to elements of the same complete structure or array", explicitly excluding function and incomplete type classes thereby, as the case where the difference calculated has defined usages.
(The following page and line numbers are based on document C165, the merged Issue 7+TC2 document.)
On page 327 after line 11111 (XBD <search.h> description), insert the following new paragraph:
The <search.h> header shall define via typedef the posix_tnode type as an alias for void.
On page 327 lines 11123-11130 (XBD <search.h> description), change:
to:void *tdelete(const void *restrict, void **restrict, int(*)(const void *, const void *)); void *tfind(const void *, void *const *, int(*)(const void *, const void *)); void *tsearch(const void *, void **, int(*)(const void *, const void *)); void twalk(const void *, void (*)(const void *, VISIT, int ));
void *tdelete(const void *restrict, posix_tnode **restrict, int (*)(const void *, const void *)); posix_tnode *tfind(const void *, posix_tnode *const *, int (*)(const void *, const void *)); posix_tnode *tsearch(const void *, posix_tnode **, int (*)(const void *, const void *)); void twalk(const posix_tnode *, void (*)(const posix_tnode *, VISIT, int));
On page 327 line 11134 (XBD <search.h> rationale), change:
Earlier versions of this standard explicitly used void for both node and key references where this version now uses posix_tnode for nodes and keeps void in the text referring only to keys. In order to preserve backwards compatibility, this version defines posix_tnode as an alias for void. The change was made to make the function prototypes more easily understandable.
On page 2136 line 68415 (XSH tdelete synopsis) change:
to:void *tdelete(const void *restrict key, void **restrict rootp, int(*compar)(const void *, const void *)); void *tfind(const void *key, void *const *rootp, int(*compar)(const void *, const void *)); void *tsearch(const void *key, void **rootp, int (*compar)(const void *, const void *)); void twalk(const void *root, void (*action)(const void *, VISIT, int));
void *tdelete(const void *restrict key, posix_tnode **restrict rootp, int (*compar)(const void *, const void *)); posix_tnode *tfind(const void *key, posix_tnode *const *rootp, int (*compar)(const void *, const void *)); posix_tnode *tsearch(const void *key, posix_tnode **rootp, int (*compar)(const void *, const void *)); void twalk(const posix_tnode *root, void (*action)(const posix_tnode *, VISIT, int));
On page 2136 line 68445-68446 (XSH tdelete description) change:
The variable pointed to by rootp shall be changed if the deleted node was the root of the tree.to:
The variable pointed to by rootp shall be set to a pointer to the new root of the tree if the root of the tree was changed.
On page 2137 line 68478 (XSH tdelete return value) add a new paragraph:
In all cases where a pointer to a node is returned, the structure pointed to by the return value is unspecified and shall not be modified by the application, but it shall be possible to cast a pointer-to-node into a pointer-to-pointer-to-element to access the element stored in the node.
On page 2137-2139 line as below (XSH tdelete examples) change void * to posix_tnode * on the following lines:
68494 void *root = NULL; 68500 void *node; 68501 void print_node(const void *, VISIT, int); 68562 print_node(const void *ptr, VISIT order, int level)
On page 2139 line 68578 (XSH tdelete application usage) change:
Since the return value of tdelete() is an unspecified non-null pointer in the case that the root of the tree has been deleted, applications should only use the return value of tdelete() as indication of success or failure and should not assume it can be dereferenced.to:
Since the return value of tdelete() is an unspecified non-null pointer in the case that the root of the tree has been deleted, applications should only use the return value of tdelete() as indication of success or failure in this case and should not assume it can be dereferenced. However, the only way that applications can tell if this case may have occurred is by checking whether the variable pointed to by rootp changed. Since this variable can change for other reasons (e.g., tree balancing), using the return value of tdelete() as anything other than a boolean indicator is unreliable at best and is discouraged.
On page 2139 line 68584 (XSH tdelete rationale) change:
Implementations are encouraged to use balanced trees to reduce the depth of the trees that are created and improve tree search times.
This bug was discussed in the 13 Oct 2016 teleconference with the following outcome:
Accepted (using posix_tnode instead of TNODE). See Note: 0003410.
There may be existing applications that rely on the requirement that tdelete() returns the parent of the deleted node (if the root was not deleted). Therefore, rather than changing the normative text, the application usage will be changed to discourage its use.
Rejected: destroying a whole tree can be done with a simple tdelete() loop, as shown in the example code in TC2. If various implementations start providing a convenience function to destroy a whole tree we can standardize it at that time.
This is a quality-of-implementation issue. The standard should not require balancing, but can recommend/encourage it in rationale.
|2015-12-10 13:46||EdSchouten||New Issue|
|2015-12-10 13:46||EdSchouten||Status||New => Under Review|
|2015-12-10 13:46||EdSchouten||Assigned To||=> ajosey|
|2015-12-10 13:46||EdSchouten||Name||=> Ed Schouten|
|2015-12-10 13:46||EdSchouten||User Reference||=> Nuxi|
|2015-12-10 13:46||EdSchouten||Section||=> search.h|
|2015-12-10 13:46||EdSchouten||Page Number||=> 323|
|2015-12-10 13:46||EdSchouten||Line Number||=> 10822|
|2015-12-10 17:56||nsz||Note Added: 0002973|
|2015-12-11 14:26||weeks||Note Added: 0002975|
|2015-12-11 14:28||weeks||Note Edited: 0002975|
|2015-12-11 19:15||nsz||Note Added: 0002976|
|2015-12-11 21:19||nsz||Note Added: 0002977|
|2015-12-14 20:19||shware_systems||Note Added: 0002978|
|2016-08-18 16:27||nick||Relationship added||related to 0000551|
|2016-09-01 14:10||shware_systems||Note Edited: 0002978|
|2016-09-14 14:22||emaste||Issue Monitored: emaste|
|2016-10-13 15:35||rhansen||Note Added: 0003410|
|2016-10-13 15:36||geoffclare||Note Added: 0003411|
|2016-10-13 15:38||geoffclare||Interp Status||=> ---|
|2016-10-13 15:38||geoffclare||Final Accepted Text||=> Note: 0003411|
|2016-10-13 15:38||geoffclare||Status||Under Review => Resolved|
|2016-10-13 15:38||geoffclare||Resolution||Open => Accepted As Marked|
|2016-10-13 15:38||geoffclare||Tag Attached: issue8|
|2016-10-13 15:38||geoffclare||Final Accepted Text||Note: 0003411 => Note: 0003410|
|2017-08-02 09:38||EdSchouten||Issue Monitored: EdSchouten|
|2020-04-08 15:43||geoffclare||Status||Resolved => Applied|
|2021-05-06 09:31||geoffclare||Relationship added||related to 0001469|
|Mantis 1.1.6[^] Copyright © 2000 - 2008 Mantis Group|