[Imap-protocol] UIDNEXT updates upon new message arrival

E-mail headers

From:	Jan Kundrát <jkt@gentoo.org>
To:	imap-protocol@u.washington.edu
Date:	Fri, 08 Jun 2018 12:34:45 -0000
Message-ID:	4D385AB5.90208@gentoo.org permalink / raw / eml / mbox

Hi,
when I'm SELECTing a mailbox and have some local state to work with
(cached UIDs and message parts etc), I'm currently using the UIDNEXT as
a hint for finding out what happened to the mailbox since the last
disconnect. My behavior is inspired by a certain document I've read a
few years ago (and unfortunately I can't find it right now),  it goes
like this:

if sync_state_not_usable || old.uidValidity != new.uidValidity:
  full_sync()
else:
  if new.uidNext == old.uidNext:
    // certainly no new arrivals
    if new.exists == old.exists:
      // no deletions, either
      sync_just_flags()
    else:
      sync_only_deletions()
  else:
    // Some messages were delivered since the last time,
    // and there's no guarantee that they're still there
    if new.uidNext - old.uidNext == new.exists - old.exists:
      // only new arrivals
      sync_only_additions()
    else:
      sync_generic()

The goal here is to get a list of all UIDs in a mailbox and find out
which messages got deleted (so that I can purge them from the local
cache) and which arrived (so that I can initiate a prefetch for them,
and because I use UID FETCH exclusively, to actually fetch them at all).
I also sync FLAGS to be able to show updated "X new messages" while
messages arrive or are deleted from asynced mailbox.

full_sync() forgets everything from the last time (wiping the cache
completely) and then synces UIDs and FLAGS.

sync_just_flags() fetches all flags.

sync_only_additions() performs "UID SEARCH UID old.uidNext:*" followed
by flags sync.

sync_only_deletions() calls "UID SEARCH ALL" and then synces flags; this
could be optimized to use an algorithm similar to binary search to try
to locate the range where the deletions happened, but that would have
the drawback of increased latency, so I don't do that.

sync_generic() performs in the same way as sync_only_deletions() at this
time (and couldn't be optimized in that manner).

There's a problem with this approach, though, because IMAP servers won't
send UIDNEXT updates when sending EXISTS that informs about new
arrivals. Therefore, when I see the EXISTS which increments the message
count, I walk all the messages I have in my cache at that point,
starting at the message with the highest sequence number and going
backwards to message #1, and stop at the "first" (counting backwards)
message for which I already know the UID. Then I issue a "UID FETCH
highest-known-uid+1:* (FLAGS)" and when processing the untagged FETCH
responses, whenever I store a UID for a message which is higher or equal
than the UIDNEXT I have for the current mailbox, I also set the UIDNEXT
to the UID+1. I'm also listening for ordinary * OK UIDNEXT responses, so
servers which choose to send these updates are handled, too.

I've read various posts to this list about reliability of the UIDNEXT,
but am reasonably sure that this is actually the safe way of using this
value -- the worst thing which could happen is wasting time with
excessively long response to the UID SEARCH command, AFAIK.

I do not support QRESYNC yet.

I'd appreciate if someone could check that the above makes sense and is
actually correct, or advise me about how to change that.

With kind regards,
Jan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 270 bytes
Desc: OpenPGP digital signature
URL: <http://mailman13.u.washington.edu/pipermail/imap-protocol/attachments/20110120/addfc6b1/attachment.sig>

Reply

E-mail headers

From:	dave@cridland.net
To:	imap-protocol@localhost
Date:	Fri, 08 Jun 2018 12:34:45 -0000
Message-ID:	28835.1295567529.285091@puncture permalink / raw / eml / mbox

On Thu Jan 20 15:54:29 2011, Jan Kundr?t wrote:
> when I'm SELECTing a mailbox and have some local state to work with
> (cached UIDs and message parts etc), I'm currently using the  
> UIDNEXT as
> a hint for finding out what happened to the mailbox since the last
> disconnect. My behavior is inspired by a certain document I've read  
> a
> few years ago (and unfortunately I can't find it right now),  it  
> goes
> like this:
> 
> 
You're describing (a slightly broken version of) Polymer's sync  
algorithm.

See https://github.com/dwd/Polymer/blob/master/infotrope/imap.py  
around line 3611, and also line 3800.

> There's a problem with this approach, though, because IMAP servers  
> won't
> send UIDNEXT updates when sending EXISTS that informs about new
> arrivals.

No, but that's okay. If N messages arrive, UIDNEXT must increase by a  
minimum of N. In fact, if N messages arrive, UIDNEXT will in all  
likelyhood increase by precisely N. I can conceive of edge cases  
where it increases by more than N in a reasonable implementation, but  
they're unlikely. So just add one for each new message that you're  
told about during a session. (ie, each time you're told about an  
increase to EXISTS, add the difference to your UIDNEXT).

This is probably easier than your current approach, although that  
looks valid. (Just complicated).

> I do not support QRESYNC yet.

You should do - it's effectively the same sync algorithm, but changed  
to take advantage of the server's knowledge instead - so the  
complicated selection is done by the server.

Polymer doesn't do a binary search, by the way - it fetches chunks of  
1000 UIDs at once, and has an amusing algorithm to decide how many  
chunks to fetch, depending on whether ESEARCH is used (which'll help  
you hugely) and how sparse the UID mapping is. For finding a UID in  
the mapping, it actually does a successive approximation technique,  
on the assumption that the map is uniformly sparse, which works  
pretty well - usually it find the right chunk in its first guess.

For resync purposes, if there have been expunges, it discards and  
refetches just the last block - because generally expunges will  
happen in the recent portion of the mailbox - and if the beginning of  
the newly fetched UID block matches the beginning of the one it's  
replacing, the UID map is now synchronized.

Otherwise, it starts hunting backward through the blocks, several at  
a time, looking for the first unseen expunge, but this is really rare.

Most of this algorithm dates from a time when there was a real  
emphasis on reducing the actual octet count - I don't think that's  
the case so much anymore, so it's probably overkill.

Dave.
-- 
Dave Cridland - mailto:dave@cridland.net - xmpp:dwd@dave.cridland.net
  - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
  - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade

Reply

E-mail headers

From:	arnt@gulbrandsen.priv.no
To:	imap-protocol@localhost
Date:	Fri, 08 Jun 2018 12:34:45 -0000
Message-ID:	4D394D09.50406@gulbrandsen.priv.no permalink / raw / eml / mbox

On 01/21/2011 12:52 AM, Dave Cridland wrote:
> Most of this algorithm dates from a time when there was a real emphasis
> on reducing the actual octet count - I don't think that's the case so
> much anymore, so it's probably overkill.

I have a feeling that it's based on a description I wrote of a handheld 
client I also wrote, ten years ago. At the time handhelds had negligible 
RAM and less bandwidth.

Last year I started porting that to Android (I have a Galaxy S and hate 
both of the MUAs there) and dropped this algorithm. Instead I ask for 
VANISHED if available, and if it isn't available. handle EXPUNGES in a 
somewhat Texan manner... when EXPUNGE arrives, I issue "UID SEARCH UID 
x,y,z" for the entire set of messages I have in cache, and use the 
search result to clean the cache. Primitive, simple, good enough.

Arnt

Reply

E-mail headers

From:	jkt@gentoo.org
To:	imap-protocol@localhost
Date:	Fri, 08 Jun 2018 12:34:45 -0000
Message-ID:	4D38DE97.6060509@gentoo.org permalink / raw / eml / mbox

On 01/21/11 00:52, Dave Cridland wrote:
> You're describing (a slightly broken version of) Polymer's sync algorithm.
> 
> See https://github.com/dwd/Polymer/blob/master/infotrope/imap.py around
> line 3611, and also line 3800.

Thank you, I've read various docs at your web a few years ago. It's
possible that the EXISTS/UIDNEXT thing was there back then. I wasn't
able to find it today, but I'm sure I've read various things which
aren't there anymore.

Could you please mention what is broken in what I described, though? It
isn't obvious from a quick glance at your source code. (I've already
found that you sort the response to the UID SEARCH command locally, but
that isn't related to this question. Fixed now, thanks.)

> No, but that's okay. If N messages arrive, UIDNEXT must increase by a
> minimum of N. In fact, if N messages arrive, UIDNEXT will in all
> likelyhood increase by precisely N. I can conceive of edge cases where
> it increases by more than N in a reasonable implementation, but they're
> unlikely. So just add one for each new message that you're told about
> during a session. (ie, each time you're told about an increase to
> EXISTS, add the difference to your UIDNEXT).

Yes, but reading threads about people adding a fake EXISTS/EXPUNGE pair
to the IDLE, just to make Outlook happy, scares me. I can't think of a
situation where this extra increment would break anything, but maybe I'm
just missing something -- that's why I'm asking.

>> I do not support QRESYNC yet.
> 
> You should do - it's effectively the same sync algorithm, but changed to
> take advantage of the server's knowledge instead - so the complicated
> selection is done by the server.

Yes, it's just a matter of making sure that I have the basic
functionality well-tested and covered by test cases before I build a
more complicated setup on top of it.

> Polymer doesn't do a binary search, by the way - it fetches chunks of
> 1000 UIDs at once, and has an amusing algorithm to decide how many
> chunks to fetch, depending on whether ESEARCH is used (which'll help you
> hugely) and how sparse the UID mapping is. For finding a UID in the
> mapping, it actually does a successive approximation technique, on the
> assumption that the map is uniformly sparse, which works pretty well -
> usually it find the right chunk in its first guess.
> 
> For resync purposes, if there have been expunges, it discards and
> refetches just the last block - because generally expunges will happen
> in the recent portion of the mailbox - and if the beginning of the newly
> fetched UID block matches the beginning of the one it's replacing, the
> UID map is now synchronized.

I'm assuming that these chunks are always 1000 items in length (ie. they
cover 1000 messages, or 1000 sequence numbers), and therefore your
blocks start at message #1, #1001, #2001 etc, is that right? That indeed
looks like a nice optimization, albeit with a similar latency tradeoff.

Cheers,
Jan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 270 bytes
Desc: OpenPGP digital signature
URL: <http://mailman13.u.washington.edu/pipermail/imap-protocol/attachments/20110121/032d3d32/attachment.sig>

Reply

E-mail headers

From:	dave@cridland.net
To:	imap-protocol@localhost
Date:	Fri, 08 Jun 2018 12:34:45 -0000
Message-ID:	28835.1295599605.789384@puncture permalink / raw / eml / mbox

On Fri Jan 21 01:17:11 2011, Jan Kundr?t wrote:
> On 01/21/11 00:52, Dave Cridland wrote:
> > You're describing (a slightly broken version of) Polymer's sync  
> algorithm.
> >
> > See https://github.com/dwd/Polymer/blob/master/infotrope/imap.py  
> around
> > line 3611, and also line 3800.
> 
> Thank you, I've read various docs at your web a few years ago. It's
> possible that the EXISTS/UIDNEXT thing was there back then. I wasn't
> able to find it today, but I'm sure I've read various things which
> aren't there anymore.
> 
> Could you please mention what is broken in what I described,  
> though? It
> isn't obvious from a quick glance at your source code. (I've already
> found that you sort the response to the UID SEARCH command locally,  
> but
> that isn't related to this question. Fixed now, thanks.)
> 
> 
Your technique might not be broken, but it's not clear to me that it  
isn't.


> > No, but that's okay. If N messages arrive, UIDNEXT must increase  
> by a
> > minimum of N. In fact, if N messages arrive, UIDNEXT will in all
> > likelyhood increase by precisely N. I can conceive of edge cases  
> where
> > it increases by more than N in a reasonable implementation, but  
> they're
> > unlikely. So just add one for each new message that you're told  
> about
> > during a session. (ie, each time you're told about an increase to
> > EXISTS, add the difference to your UIDNEXT).
> 
> Yes, but reading threads about people adding a fake EXISTS/EXPUNGE  
> pair
> to the IDLE, just to make Outlook happy, scares me. I can't think  
> of a
> situation where this extra increment would break anything, but  
> maybe I'm
> just missing something -- that's why I'm asking.
> 
> 
In the case where an EXISTS/EXPUNGE pair is done such that no UIDNEXT  
increment occurs, the server is broken. Still, the algorithm works -  
you just find that the UIDNEXT isn't where you expected it to be.


> >> I do not support QRESYNC yet.
> >
> > You should do - it's effectively the same sync algorithm, but  
> changed to
> > take advantage of the server's knowledge instead - so the  
> complicated
> > selection is done by the server.
> 
> Yes, it's just a matter of making sure that I have the basic
> functionality well-tested and covered by test cases before I build a
> more complicated setup on top of it.
> 
> 
Oh, sure. Just build your code with QRESYNC in mind. Or, alternately,  
build QRESYNC with your code in mind, which is what I did.


> > Polymer doesn't do a binary search, by the way - it fetches  
> chunks of
> > 1000 UIDs at once, and has an amusing algorithm to decide how many
> > chunks to fetch, depending on whether ESEARCH is used (which'll  
> help you
> > hugely) and how sparse the UID mapping is. For finding a UID in  
> the
> > mapping, it actually does a successive approximation technique,  
> on the
> > assumption that the map is uniformly sparse, which works pretty  
> well -
> > usually it find the right chunk in its first guess.
> >
> > For resync purposes, if there have been expunges, it discards and
> > refetches just the last block - because generally expunges will  
> happen
> > in the recent portion of the mailbox - and if the beginning of  
> the newly
> > fetched UID block matches the beginning of the one it's  
> replacing, the
> > UID map is now synchronized.
> 
> I'm assuming that these chunks are always 1000 items in length (ie.  
> they
> cover 1000 messages, or 1000 sequence numbers), and therefore your
> blocks start at message #1, #1001, #2001 etc, is that right? That  
> indeed
> looks like a nice optimization, albeit with a similar latency  
> tradeoff.

Yeah, it leans toward saving bandwidth, more than round-trips, but  
ESEARCH means it's fetching multiple blocks at once (typically the  
entire mailbox, since it figures out how many holes are likely to  
exist, and bigger mailboxes have fewer expunges, I find).

Dave.
-- 
Dave Cridland - mailto:dave@cridland.net - xmpp:dwd@dave.cridland.net
  - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
  - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade

Reply

mailing list archives

[Imap-protocol] UIDNEXT updates upon new message arrival