I thought I'd post this here in case someone else is interested in
implementing something similar:
Step (1) is the slowest stage for building a THREAD=REFERENCES tree. If
its result tree is permanently saved, the following thread builds can be
based on it by updating the tree incrementally.
Adding new messages to the tree is simple: simply follow the normal
rules as when building a new tree from scratch. Expunging messages gets
more problematic though.
Each node in the tree keeps a "link reference count" which specifies how
many messages contain a reference (in References: or In-Reply-To:
header) to this message. This link refcount must be updated when adding
and expunging messages. When the link refcount is zero, there are no
messages referencing the node. However a non-zero link refcount doesn't
mean that the node has children. Link refcount also doesn't tell much
about the number of children the node has, because References: headers
may reference any number of its ancestors.
The optimistic approach assumes that usually there are no problematic
links. In such case expunging a message simply updates the link
refcounts and marks the message's node as expunged. If the node's link
refcount is non-zero the node acts as a dummy node. If any unreferenced
dummy node's link refcount drops to zero, it's converted to a
(duplicate) root node (the expunged node may become one as well). These
duplicate root nodes are later dropped by changing their children to
point to the primary root node.
Problematic links are handled by marking nodes affected by them. If such
a node is expunged or unreferenced, the thread tree must be rebuilt.
This is assumed to be a rarely occurring event. The problematic cases
are:
1) Duplicate Message-ID: headers. If the first message having it is
expunged, the thread tree must be rebuilt.
2) Message-ID loops. If a message referencing the looping path gets
expunged, the loop may break and the thread tree must be rebuilt.
Because it can't be determined easily which loops get broken by which
expunges, this case can be handled in a bit easier way: When detecting a
loop between parent and child, rebuild the tree if parent or any node
between child and parent get unreferenced.
3) A message changes its parent because an earlier message's References:
header contained a different link. If the message gets expunged, the
thread tree must be rebuilt to get the original parent back.
4) A link in a message's References: header is ignored, because the
referenced child already specified a different parent to itself. If the
message gets expunged, the thread tree must be rebuilt to determine its
new parent.
5) A link in a message's References: header is ignored, because an
earlier message's References: header already specified a different link.
If the earlier message gets expunged, the parent may change. The earlier
message could be found out quickly by keeping some extra state (or with
a slow scan), but since this is assumed to be a rare problem, there's an
easier (but less specific) way to handle this: Rebuild the tree if the
child node is unreferenced (or alternatively if the original parent node
is unreferenced, but it probably happens more often).
Pseudocode:
node {
parent: Pointer to parent node. Children pointers aren't required.
uid: Message's UID (0 = expunged or never even existed)
link_refcount: Number of messages referencing this node as a parent
(e.g. "References: a b" would add reference to both a and b, regardless
of how the links are really added to the thread tree).
expunge_rebuilds: If this message gets expunged, rebuild the thread tree
unref_rebuilds: If this node gets unreferenced, rebuild the thread tree
}
node_is_root(node)
if node == NIL
return true
// check also if expunging had changed this node to a root node.
// see fix_node() for details
return node.link_refcount == 0 and node.uid == 0
link_reference(parent_node, child_node)
parent_node.link_refcount++
if is_ancestor(child_node, parent_node)
// child_node is an ancestor of parent_node. Adding child_node ->
// parent_node would introduce a loop. If any messages referencing the
// path between parent_node's parent and child_node get expunged, we
// have to rebuild the tree because the loop might break. For example:
// #1: a -> b (a.ref=1, b.ref=1)
// #2: b -> a (a.ref=2, b.ref=2)
// #3: c -> a -> b (a.ref=3, b.ref=3, c.ref=1)
// Expunging #3 wouldn't break the loop, but expunging #1 would.
for node in nodes[parent_node.parent .. child_node]
node.unref_rebuilds = true
else if child_node.parent == parent_node
// The same link already exists
else
// Set parent_node as child_node's parent
if node_is_root(child_node.parent)
child_node.parent = parent_node
else
// Conflicting parent already exists, keep the original.
// We get here only when handling References: header.
if child_node.uid != 0
// We've already seen this message. It specifies its own parent and
// it doesn't matter what any other reference says. The only way its
// parent can change is if the message itself gets expunged.
child_node.expunge_rebuilds = true
else
// Message doesn't exist, so it was one of the node's children
// that created the original reference. If that reference gets
// dropped, the parent is changed. We could catch this in one of
// several ways:
// a) Original parent node gets unreferenced
// b) This node gets unreferenced
// c) Any of the child nodes gets expunged
// b) is probably the least likely to happen, so use it
child_node.unref_rebuilds = true
thread_add_msg(uid)
// get the Message-IDs as specified by the thread spec
(msgid, parent_msgid, references) = message_get_thread_headers(uid)
if msgid != NIL
if nodes[msgid].uid == 0
nodes[msgid].uid = uid
else
// duplicate Message-ID. if the original ever gets expunged,
// rebuild the thread tree
nodes[msgid].expunge_rebuilds = true
msgid = NIL
if msgid == NIL
msgid = get_unique_msg_id()
node = nodes[msgid]
if not node_is_root(node.parent) and
(parent_msgid == NIL or node.parent.msgid != parent_msgid)
// Conflicting parent, remove it. If this message gets expunged, we have
// to revert back to the original parent.
node.parent = NIL
node.expunge_rebuilds = true
if parent_msgid != NIL
link_reference(nodes[parent_msgid], node)
// go through References (skipping the last one)
for (parent_msgid, child_msgid) in references
link_reference(nodes[parent_msgid], nodes[child_msgid])
fix_node(node)
if node_is_root(node)
// this node is now a root node and must be treated as such when adding
// new messages. Usually we don't have children and we could just remove
// this node, but if we have messages:
// #3: References: 1 2
// #4: References: 2
// If #3 gets expunged, we get here with node=1 but it still has child 2.
node.parent = NIL
unref_node(node)
if node.unref_rebuilds
return false
node.link_refcount--
fix_node(node)
return true
// returns false if thread tree needs to be rebuilt
thread_expunge_msg(uid)
// get the Message-IDs as specified by the thread spec
(msgid, parent_msgid, references) = message_get_thread_headers(uid)
node = nodes[msgid]
if node.uid != uid
// Removing a duplicate Message-ID
return false
if node.expunge_rebuilds
return false
if parent_msgid != NIL and not unref_node(nodes[parent_msgid])
return false
// go through References (skipping the last one)
for (parent_msgid, child_msgid) in references
if not unref_node(nodes[parent_msgid])
return false
// mark this node as expunged
node.uid = 0
fix_node(node)
return true
thread_iterate()
root_nodes = []
free_nodes = []
children = [][]
// Steps (2) and (3)
for node in nodes
if node_is_root(node)
// node itself is a duplicate root. free it later.
free_nodes[] = node
else if node_is_root(node.parent)
if node.parent != NIL
// node points to a duplicate root. replace it with the real root.
node.parent = NIL
root_nodes[] = node
else if node.uid != 0
// Find the node's first non-dummy parent and add the node as its child.
// If there are no non-dummy parents, add it as the highest dummy's child.
nondummy_parent = node.parent
while nondummy_parent.uid == 0 and nondummy_parent.parent != NIL
nondummy_parent = nondummy_parent.parent
children[nondummy_parent][] = node
for node in root_nodes
if node.uid == 0
if children[node] == NIL
// remove dummy roots that have no children.
delete(node)
else if count(children[node]) == 1
// dummy root has a single child, replace the root with its child
node = children[node]
for node in free_nodes
free(node)
// root_nodes and children now contain a tree equivalent to a tree built by
// THREAD=REFERENCES specification steps (1)-(3). The rest of the steps
// can be performed using them. Note that node.parent should not (and need
// not) be used because it points its parent before steps (2) and (3).
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 196 bytes
Desc: This is a digitally signed message part
URL: <http://mailman13.u.washington.edu/pipermail/imap-protocol/attachments/20080509/995f9a34/attachment.sig>