When I reply by email to a hosted Clearspace instance the message gets delivered fine to the Jive mail gateway, and then disappears, only to reappear as a bounce after 5 days. The latest email message that went missing was delivered to the mail gateway with the following log line:
Jun 27 01:30:48 smtp1 postfix/smtp[12010]: C8321F00481: to=<clearspace-113556920-5552-2-2066133@mail.forums.adobe.com>, relay=mail.sgaur.hosted.jivesoftware.com[209.46.39.252], delay=2, status=sent (250 2.0.0 Ok: queued as E092D2408550)
Times in GMT + 0200
This message still hasn't shown up in the forum. I expect to get a bounce in 2 days or so.
On previous bounces the NDN read:
Reporting-MTA: dns; mail.sgaur.hosted.jivesoftware.com
X-Postfix-Queue-ID: 7418F24087B0
X-Postfix-Sender: rfc822; spam@vandieten.net
Arrival-Date: Mon, 15 Jun 2009 13:45:00 -0600 (MDT)
Final-Recipient: rfc822; clearspace-1668662174-1ry-2-7VUV@mail.forums.adobe.com
Original-Recipient: rfc822;clearspace-1668662174-1ry-2-7VUV@mail.forums.adobe.com
Action: failed
Status: 4.4.2
Diagnostic-Code: X-Postfix; lost connection with 10.137.24.42[10.137.24.42]
while sending RCPT TO
Reporting-MTA: dns; mail.sgaur.hosted.jivesoftware.com
X-Postfix-Queue-ID: C13D22408391
X-Postfix-Sender: rfc822; spam@vandieten.net
Arrival-Date: Thu, 11 Jun 2009 12:34:50 -0600 (MDT)
Final-Recipient: rfc822; clearspace-1789239401-1ry-2-8eU1@mail.forums.adobe.com
Original-Recipient: rfc822;clearspace-1789239401-1ry-2-8eU1@mail.forums.adobe.com
Action: failed
Status: 4.4.2
Diagnostic-Code: X-Postfix; delivery temporarily suspended: lost connection
with 10.137.24.42[10.137.24.42] while sending end of data -- message may be
sent more than once
Jochem,
I've forwarded this to our hosting team. They will look in to it.
Cheers,
Brad
While the hosting team is looking into this (AFAIK they have been for over a month), can the maximum queue lifetime on the email delivery queue please be lowered? If people get a bounce in 2 days they can resend the message, but after the 5 days that is configured on the Jive email gateway the thread is dead and buried. (That would be the "maximal_queue_lifetime" parameter in main.cf on the email gateway.)
Hey Jochem,
Can you please try taking this through Adobe's support team? Adding this to their list will go a long way to improving the overall system. Likewise we would need Adobe's permission to modify settings on their system no matter how minute they might seem.
Thanks,
Brad
Brad Heller wrote on 2009-07-01 20:35:
Can you please try taking this through Adobe's support team?
Tried that, didn't work. I have no idea where the communication problem
is, but what I hear from Adobe is that they reported both the bug and
asked for the workaround long ago.
Besides, that is not the only point. This is intentionally a duplicate
from the report you already have from Adobe. Because as you said in
http://www.jivesoftware.com/jivespace/message/232149#232149 I need
public bug reports and public numbers to be able to track bugs. So I am
now in the process of creating public duplicates of all the bugs I
already reported through Adobe.
Jochem
--
Jochem van Dieten
Seems like a reasonable request, queue max lifetime set to 2 days.
As far as I am aware this was never suggested from Adobe.
Thank you for reconfiguring the queue timeout.
But is there a case number and a schedule for a fix for the original issue?
Tyler Walden wrote on 8/11/2009 11:41 PM:
Yes, we believe it will fix the issue where multiple posts are made from the same message.
I don't suppose UAT is accessible for me. John?
One of the regulars in a forum I moderate had this problem this weekend
so I have some nice reproductions if you are interested. How will you be
handling the invalid character sets, return to sender?
Jochem
--
Jochem van Dieten
I am seeing some results from the fix now. It does appear to reduce the repeating email problem, but I see no change on the bounces. (Messages don't show up and I don't get a bounce either, hasn't been 2 days for the queue timeout yet.)
That seems consistent with the changes in Subversion. The list with previously processed messageIDs should help with the duplicates (except when that list is truncated at the wrong time), but there is no reason why it should affect the bounces. Why are you not just fixing the root cause?
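The truncation caveat can be illustrated with a minimal Python sketch of a bounded "recently processed Message-IDs" list. The class name, the capacity and the evict-oldest policy are assumptions for illustration only; the actual plugin code in Subversion may work differently. Once an ID has been evicted, a late retry of the same message is no longer recognised as a duplicate.

```python
from collections import OrderedDict


class RecentMessageIds:
    """Remember the most recent Message-IDs, evicting the oldest when full.

    Hypothetical sketch: capacity and eviction policy are assumptions,
    not taken from the AdvancedEmail plugin source.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._seen = OrderedDict()  # Message-ID -> None, oldest first

    def is_duplicate(self, message_id):
        """Return True if message_id was already processed; otherwise record it."""
        if message_id in self._seen:
            self._seen.move_to_end(message_id)  # treat it as recently seen again
            return True
        self._seen[message_id] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # drop the oldest entry
        return False


if __name__ == "__main__":
    ids = RecentMessageIds(capacity=2)
    print(ids.is_duplicate("<a@example.com>"))  # False: first delivery
    print(ids.is_duplicate("<a@example.com>"))  # True: retry after a dropped connection
    ids.is_duplicate("<b@example.com>")
    ids.is_duplicate("<c@example.com>")         # evicts <a@example.com>
    print(ids.is_duplicate("<a@example.com>"))  # False: the list was truncated
```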
I am only seeing one message currently in the queue from macromedia@vandieten.net, which was placed into the queue a little over an hour ago. Do you believe you should have any messages to the UAT site that did not make it through?
I do know that the plugin currently deployed in UAT has many more fixes than the JIRA bug you posted above. I don't have the exact details but I believe various localization issues were corrected, along with upgrading the underlying smtp library.
Have you been able to reproduce similar bounce-backs from the UAT site? If so please post the bounce message here so I can investigate further in the logs.
The message I mentioned above is not being accepted by the Jive plugin and the connection is being closed after sending the RCPT TO.
The two questions here are:
1. Why did the plugin not accept the message in the first place?
2. Why was the connection closed? Proper behaviour would be to return an error response code, causing an immediate bounce.
I will speak with the engineers who worked on the modifications to the plugin and see if they believe any changes in the current plugin in UAT address either of these issues.
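The difference between a dropped connection and an error reply can be summarised from the point of view of the sending Postfix, assuming default settings (no soft_bounce). This simplified Python sketch uses a hypothetical function name; the mapping matches the log lines quoted in this thread (status=sent, status=deferred with dsn=4.4.2, and status=bounced after a 553).

```python
def postfix_outcome(reply_code=None):
    """Map the relay's answer to RCPT TO onto what the sending Postfix does.

    reply_code is the three-digit SMTP reply, or None if the connection was
    dropped before any reply was sent ("lost connection ... while sending
    RCPT TO").
    """
    if reply_code is None:
        # No reply at all: treated as a temporary failure (dsn 4.4.2) and
        # retried until maximal_queue_lifetime expires.
        return "deferred"
    if 200 <= reply_code < 300:
        return "sent"
    if 400 <= reply_code < 500:
        # Temporary failure: the message stays queued and is retried later.
        return "deferred"
    if 500 <= reply_code < 600:
        # Permanent failure: the message is bounced to the sender right away.
        return "bounced"
    raise ValueError(f"unexpected SMTP reply code: {reply_code}")


if __name__ == "__main__":
    for code in (250, 450, 553, None):
        print(code, postfix_outcome(code))
```

Only the 5xx branch produces the immediate bounce asked for in the second question.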
I think we have a little bit of confusion here, which I must admit is
caused by me because I initially didn't identify the symptoms I reported
as two separate bugs.
There are two bugs. One occurred when CS closed the connection after the
RCPT TO (which it clearly says in the bounce message). That one caused
messages to be bounced back after 5 (now 2) days. That is the one I
wanted to ask about here. You reported the fixes were in UAT (or really
Client AT, because Users don't have access), so I peeked in SVN to see
what was changed:
I didn't see any fixes for the bounce issue occurring in the RCPT TO
phase. But I saw many fixes for the repeating email bug (which can be
identified as a different bug, because it occurs at the end of the DATA
phase and the bounce says so) so I asked about that too. Sorry I wasn't
clear about that being a different issue.
Now Adobe told me the fix went live Tuesday evening. At that time there
were a number of my messages in the queue. I was expecting the
bounces for those messages to arrive between 12 and 10 hours ago (based on
a maximum retry interval of 4000 seconds, the Postfix default). They
didn't come, and the messages never appeared on the live site either.
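The expected bounce time can be worked out roughly as follows. Postfix returns a deferred message only when a delivery attempt fails after the queue lifetime has expired, and attempts are at most the maximum backoff apart, so the bounce should land between arrival + lifetime and arrival + lifetime + 4000 seconds (queue-run scheduling can add a little). This Python sketch is an approximation; the function name is hypothetical, and the example uses the Arrival-Date of the 13 October NDN quoted in this thread.

```python
from datetime import datetime, timedelta

# Values mentioned in this thread.
MAXIMAL_QUEUE_LIFETIME = timedelta(days=2)      # lowered from 5 days on the gateway
MAXIMAL_BACKOFF_TIME = timedelta(seconds=4000)  # Postfix default


def bounce_window(arrival, lifetime=MAXIMAL_QUEUE_LIFETIME,
                  max_backoff=MAXIMAL_BACKOFF_TIME):
    """Approximate earliest and latest time a deferred message is bounced."""
    earliest = arrival + lifetime
    return earliest, earliest + max_backoff


if __name__ == "__main__":
    arrival = datetime(2009, 10, 13, 13, 47, 0)  # Arrival-Date, MDT
    earliest, latest = bounce_window(arrival)
    print(earliest, latest)  # 2009-10-15 13:47:00 2009-10-15 14:53:40
    observed = arrival + timedelta(days=2, minutes=42)
    print(earliest <= observed <= latest)  # True
```

The "2 days and 42 minutes" reported for that message falls inside this window.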
After the fix settled down I started resending problem cases. Problem
cases that previously caused the repeating email bug (DATA) to occur now
work correctly. (Or as well as can be expected considering the root
cause.) I also tried to reproduce the problem case that caused the
bounce problem (RCPT TO). That is the message you see in the queue now
and I am expecting the bounce in 36 hours.
So from my point the results are:
- emails in the queue when the fix went live disappeared;
- repeating email (DATA) issue fixed;
- bouncing email (RCPT TO) issue not fixed.
How this all translates to issue numbers on your end I have no idea. For
me as an end user there is a total disconnect between this forum and
Jira and the commit messages in SVN leave much to be desired in terms of
identifying the Jira cases they are supposed to fix.
Tyler Walden wrote on 8/13/2009 1:39 AM:
1. Why did the plugin not accept the message in the first place?
I have no idea. If I understand the plugin correctly the RCPT TO phase
checks whether I have permissions to post to that thread. Which I do, if
I reply to other messages in that thread the message goes through.
2. Why was the connection closed? Proper behaviour would be to return an error response code, causing an immediate bounce.
Preferably a 5xx, because this issue is completely repeatable. And
please make sure the user receives something other than the usual
"address unknown", which is totally unhelpful for an end user.
I will speak with the engineers who worked on the modifications to the plugin and see if they believe any changes in the current plugin in UAT address either of these issues.
The changes went live already.
Jochem
--
Jochem van Dieten
It appears the new version was deployed on Tuesday evening.
I have asked the engineers who were working on the advanced email plugin modifications to chime in on this thread as there is still an issue.
Hi Jochem,
Thanks for your detailed troubleshooting on this issue. I'll confer with Tyler and take a deeper look into this case early next week.
Regards,
Karl
Did you find anything?
So?
Can I at least get a Jira case number?
Hi Jochem,
I am blocking off some time this week to look into your issue.
Regards,
Karl
Did you find anything?
Okay Jochem, I have been looking into this case. A few things to get us started:
Karl Cyr wrote on 10/5/2009 5:28 PM:
Are you still seeing this problem occur frequently?
Occasionally.
Are there multiple users reporting this problem?
Me and 2 others. The rest have pretty much given up on working through
email due to the number of bugs.
I have seen some similar errors appearing in the mail logs over the past week, but there really isn't any way to tell that it isn't simply a matter of a misconfiguration, such as a user's outgoing email address not matching the address associated with their Clearspace user account.
Yes there is. If the From or To address doesn't match you immediately
get a 553 address unknown. That is very different from getting a bounce
after 2 days.
Are there any other error codes associated with the bounce or is it always the same 4.4.2 message?
450
Have you observed any consistent scenarios for reproducing this behavior?
There are some specific messages that replying to will produce this
problem, while replying to different messages in the same thread will
just work.
There are some specific messages that replying to will produce this
problem
Can you provide me with an example or two of a specific message which has this issue?
I am analyzing our mail logs for details about the frequency of the error, and any other patterns which may give us some clues about why this happens.
Regards,
Karl
Karl Cyr wrote on 10/6/2009 2:56 PM:
Can you provide me with an example or two of a specific message which has this issue?
I have this problem with message 1890933. If I reply to the Reply-To
address of clearspace-1668662174-1ry-2-7VUV@mail.forums.adobe.com the
message will be returned to me after 2 days.
I also had this problem with message 1964425 through the address
clearspace-1789239401-1ry-2-8f2h@mail.forums.adobe.com. I don't think
that is a good candidate for your testing though, because that thread
recently got archived.
I am analyzing our mail logs for details about the frequency of the error, and any other patterns which may give us some clues about why this happens.
I have given you this advice before when you couldn't reproduce the
repeating email bug, but ISTM what you really should do is run "qshape
deferred" on your mail server, grep it for the mail.forums.adobe.com and
hook that up to your monitoring. With the way the AdvancedEmail plugin
works any delay of more than 30 minutes is an indication of a problem
and at that time you can simply copy all delayed email messages straight
from the mail queue for analysis. It would really save all of us (Jive,
Adobe and me) a lot of time if you just monitored the mail queue delay.
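A monitoring check along these lines could start from the output of `qshape deferred`, which ships with Postfix and prints queued messages per destination domain broken down into age buckets (in minutes). The Python sketch below parses that table and counts mail for mail.forums.adobe.com in the older buckets. The function name and threshold are assumptions, and the parser assumes qshape's default layout (a header row starting with "T"). Because qshape only reports buckets, the 30-minute rule has to be rounded to a bucket edge; the default `min_age_minutes=20` (buckets labelled 40 and up) errs toward alerting early.

```python
import shutil
import subprocess


def delayed_count(qshape_output, domain, min_age_minutes=20):
    """Count queued messages for `domain` in age buckets at least min_age_minutes old."""
    rows = [line.split() for line in qshape_output.splitlines() if line.strip()]
    header = next(row for row in rows if row[0] == "T")
    lower_bounds = []
    previous = 0
    for label in header[1:]:
        if label.endswith("+"):
            lower_bounds.append(int(label[:-1]))  # open-ended oldest bucket
        else:
            lower_bounds.append(previous)         # bucket roughly covers (previous, label]
            previous = int(label)
    for row in rows:
        if row[0] == domain:
            counts = [int(n) for n in row[2:]]    # row[1] is the total
            return sum(c for c, low in zip(counts, lower_bounds)
                       if low >= min_age_minutes)
    return 0  # domain not present in the deferred queue


if __name__ == "__main__":
    if shutil.which("qshape"):
        out = subprocess.run(["qshape", "deferred"], capture_output=True,
                             text=True, check=True).stdout
        n = delayed_count(out, "mail.forums.adobe.com")
        print(f"{n} message(s) for mail.forums.adobe.com deferred > 20 minutes")
```

Feeding the count into an existing monitoring system, and copying the flagged messages out of the queue, is left out of the sketch.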
In older mail logs from before the plugin was updated, I see a lot of entries like this:
maillog.0624:Jun 24 14:58:49 sgaurmgmt01p postfix/qmgr[21718]: 9035B2408732: removed
maillog.0624-0626:Jun 24 14:38:28 sgaurmgmt01p postfix/smtpd[25030]: 9035B2408732: client=smtp1.prisma-it.com[93.188.252.11]
maillog.0624-0626:Jun 24 14:38:28 sgaurmgmt01p postfix/cleanup[4026]: 9035B2408732: message-id=<4A428EBE.3080703@email.com>
maillog.0624-0626:Jun 24 14:38:28 sgaurmgmt01p postfix/qmgr[21718]: 9035B2408732: from=<your@email.com>, size=2118, nrcpt=1 (queue active)
maillog.0624-0626:Jun 24 14:38:28 sgaurmgmt01p postfix/qmgr[21718]: 9035B2408732: to=<clearspace-331453137-5552-2-2061806@mail.forums.adobe.com>, relay=none, delay=0.15, delays=0.15/0/0/0, dsn=4.4.2, status=deferred (delivery temporarily suspended: lost connection with 10.137.24.42[10.137.24.42] while sending end of data -- message may be sent more than once)
maillog.0624-0626:Jun 24 14:58:48 sgaurmgmt01p postfix/qmgr[21718]: 9035B2408732: from=<your@email.com>, size=2118, nrcpt=1 (queue active)
maillog.0624-0626:Jun 24 14:58:49 sgaurmgmt01p postfix/smtp[4239]: 9035B2408732: to=<clearspace-331453137-5552-2-2061806@mail.forums.adobe.com>, relay=10.137.24.42[10.137.24.42]:2500, delay=1221, delays=1220/0/0.02/0.79, dsn=2.0.0, status=sent (250 Ok)
However, I do not see any such errors in recent logs (from the past few weeks). It appears that all emails which arrive at the server from your email address are going to the proper place in a timely manner.
Here is what is interesting:
Oct 11 05:44:56 sgaurmgmt01p postfix/smtpd[24013]: 5B3D72408457: client=ns21.supremeservers.co.uk[91.192.193.32]
Oct 11 05:44:57 sgaurmgmt01p postfix/cleanup[24011]: 5B3D72408457: message-id=<20091011124443.smdv2a7aos8kk0oo@webmail.my-mail2u.com>
Oct 11 05:44:57 sgaurmgmt01p postfix/qmgr[13493]: 5B3D72408457: from=<attorney@my-mail2u.com>, size=3341, nrcpt=3 (queue active)
Oct 11 05:44:57 sgaurmgmt01p postfix/smtp[8561]: 5B3D72408457: to=<clearspace-113556920-5552-2-2066133@mail.forums.adobe.com>, relay=10.137.24.42[10.137.24.42]:2500, delay=1.3, delays=0.97/0/0/0.37, dsn=5.0.0, status=bounced (host 10.137.24.42[10.137.24.42] said: 553 <clearspace-113556920-5552-2-2066133@mail.forums.adobe.com> address unknown. (in reply to RCPT TO command))
Oct 11 05:44:57 sgaurmgmt01p postfix/smtp[8561]: 5B3D72408457: to=<clearspace-1668662174-1ry-2-7VUV@mail.forums.adobe.com>, relay=10.137.24.42[10.137.24.42]:2500, delay=1.5, delays=0.97/0/0/0.55, dsn=5.0.0, status=bounced (host 10.137.24.42[10.137.24.42] said: 553 <clearspace-1668662174-1ry-2-7VUV@mail.forums.adobe.com> address unknown. (in reply to RCPT TO command))
Oct 11 05:44:57 sgaurmgmt01p postfix/smtp[8561]: 5B3D72408457: to=<clearspace-1789239401-1ry-2-8eU1@mail.forums.adobe.com>, relay=10.137.24.42[10.137.24.42]:2500, delay=1.7, delays=0.97/0/0/0.71, dsn=5.0.0, status=bounced (host 10.137.24.42[10.137.24.42] said: 553 <clearspace-1789239401-1ry-2-8eU1@mail.forums.adobe.com> address unknown. (in reply to RCPT TO command))
Oct 11 05:44:57 sgaurmgmt01p postfix/bounce[11054]: 5B3D72408457: sender non-delivery notification: DA6ED24085E4
Oct 11 05:44:57 sgaurmgmt01p postfix/qmgr[13493]: 5B3D72408457: removed
It appears as though a spam account is attempting to send messages to the system using your token.
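The kind of log analysis done above can be partly automated. This is a minimal Python sketch, assuming the single-line Postfix log format shown in the excerpts; the function names are hypothetical. It pulls the key=value fields out of each delivery record and tallies (status, dsn) pairs for the forum domain, so a run of deferred 4.4.2 records stands out from the immediate 5.0.0 bounces.

```python
import re
from collections import Counter

QUEUE_ID = re.compile(r"\]: ([0-9A-F]+): ")
FIELD = re.compile(r"(\w+)=(<[^>]*>|[^,\s]+)")


def parse_delivery(line):
    """Return the key=value fields of a Postfix delivery record, or None.

    Only lines carrying a status= field (sent/deferred/bounced) count as
    delivery records; client=, message-id= and from= lines are skipped.
    """
    if "status=" not in line:
        return None
    fields = dict(FIELD.findall(line))
    match = QUEUE_ID.search(line)
    fields["queue_id"] = match.group(1) if match else None
    fields["to"] = fields.get("to", "").strip("<>")
    return fields


def summarise(lines, domain="mail.forums.adobe.com"):
    """Count delivery outcomes as (status, dsn) pairs for recipients at `domain`."""
    counts = Counter()
    for line in lines:
        rec = parse_delivery(line)
        if rec and rec["to"].endswith("@" + domain):
            counts[(rec["status"], rec.get("dsn"))] += 1
    return counts
```

For example, `summarise(open(path))` on a mail log file would give one tally per log; the log location depends on the gateway's syslog setup.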
karlcyr wrote:
However, I do not see any such errors in recent logs (from the past few weeks). It appears that all emails which arrive at the server from your email address are going to the proper place in a timely manner.
I have learned to only reply to recent messages. Then I seem to get an immediate bounce if something is wrong. See for instance the email message with queue ID D101824084CF from Sunday for an immediate bounce with a correct To and From address. That is still annoying, but since it is immediate I can take other actions and post through the webservices when that happens. I just emailed an example of that to you.
And I just replied to an email that is known to cause this problem, so there is an example of the original problem in the queue for the next two days.
It appears as though a spam account is attempting to send messages to the system using your token.
That is nothing: for a while in 2002 I was receiving spam where I could see from the headers that I had written the mailer they used to send it.
jochemd wrote:
And I just replied to an email that is known to cause this problem, so there is an example of the original problem in the queue for the next two days.
I just got the bounce back from the message after 2 days and 42 minutes. The NDN reads:
Reporting-MTA: dns; mail.sgaur.hosted.jivesoftware.com
X-Postfix-Queue-ID: 5C55F2408A8A
X-Postfix-Sender: rfc822; ********@vandieten.net
Arrival-Date: Tue, 13 Oct 2009 13:47:00 -0600 (MDT)
Final-Recipient: rfc822; clearspace-1668662174-1ry-2-7VUV@mail.forums.adobe.com
Original-Recipient: rfc822;clearspace-1668662174-1ry-2-7VUV@mail.forums.adobe.com
Action: failed
Status: 4.4.2
Diagnostic-Code: X-Postfix; lost connection with 10.137.24.42[10.137.24.42]
while sending RCPT TO
Did you grab this from the queue while it was in there? Do you have enough information to debug this issue?
Hi Jochem,
In reply to your email:
The email token is actually part of the core product, and is used for the Email Monitor functionality. Since the AEP is essentially an extension of the Email Monitor functionality, it borrows the email token functionality from the core code base. By default, tokens expire after seven days.
It does seem as though there may be an issue with the AEP here, since an email that is delivered to an address with an expired token should be outright rejected immediately, rather than being kicked around the mail queue until the timeout. It is also possible to extend the timeout value, if necessary.
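Rejecting an expired token during the SMTP dialogue, as described here, could look roughly like the following Python sketch. Everything in it is hypothetical: the function name, how the token's issue time is obtained, and the choice of 550 with enhanced status 5.7.1 ("delivery not authorized") are assumptions, not the AEP's actual code. The point is that the reply is a permanent 5xx with a descriptive text, so the sender gets an immediate and understandable bounce.

```python
from datetime import datetime, timedelta

TOKEN_LIFETIME = timedelta(days=7)  # default token expiry mentioned above


def rcpt_reply(token_issued_at, now):
    """Hypothetical RCPT TO decision for a reply address carrying an email token."""
    if now - token_issued_at > TOKEN_LIFETIME:
        # Permanent failure with a descriptive text instead of "address unknown":
        # the sender's MTA bounces immediately rather than retrying for days.
        return "550 5.7.1 Reply token has expired; please reply via the web interface"
    return "250 2.1.5 Ok"
```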
karlcyr wrote:
The email token is actually part of the core product, and is used for the Email Monitor functionality. Since the AEP is essentially an extension of the Email Monitor functionality, it borrows the email token functionality from the core code base. By default, tokens expire after seven days.
So is there any documentation that we can reference in our FAQ for this functionality? And are users really supposed to get an "address unknown" error, because here I am getting a "token expired" error?
karlcyr wrote:
It does seem as though there may be an issue with the AEP here, since an email that is delivered to an address with an expired token should be outright rejected immediately, rather than being kicked around the mail queue until the timeout.
So do I need to resend it or did you grab it from the queue for debugging?
Hi Jochem,
So do I need to resend it or did you grab it from the queue for debugging?
I have all the information I need for debugging now. From the evidence I have seen, it looks like the messages are bouncing as a result of the expired email tokens. If you do see any messages bounce for any other reason, please let me know.
I could not find any documentation on this functionality, though I am happy to answer any questions you may have.
And are users really supposed to get an "address unknown" error, because here I am getting a "token expired" error?
Without having researched this yet, it sounds like a bug in the plugin. If the email tokens are expired, the plugin should be rejecting the messages immediately and the sender should receive a 5xx bounce message. While an "address unknown" error might technically be correct, because the token is used to generate the recipient address, it is not descriptive and leads to the type of confusion we've seen here.
There is also the option of extending the email token timeout beyond 7 days.
My recommendation would be to bring this up to John C. or another Adobe employee who has the ability to file a support case in Adobe's private space. For procedural reasons, I cannot take either of these actions (fixing the plugin or changing the timeout) without a customer submitted support ticket.
Regards,
Karl
Karl Cyr wrote on 10/16/2009 3:16 PM:
>> So is there any documentation that we can reference in our FAQ for this functionality?
I could not find any documentation on this functionality, though I am happy to answer any questions you may have.
Could you file a documentation bug? People getting error messages
through no fault of their own under conditions the system does not
consider an error needs to be documented in the user guide.
>> And are users really supposed to get an "address unknown" error, because here I am getting a "token expired" error?
Without having researched this yet, it sounds like a bug in the plugin.
OK, I'll wait for your research on that issue.
My recommendation would be to bring this up to John C.
I will when he gets back from vacation.
Hi Jochem,
I agree about the documentation on the email tokens. I have created a feature request with the doc team to have this added to future documentation.
As for the other issues with the plugin and the configuration, these requests will need to be routed through the Adobe team.
Regards,
Karl
I understand John C. escalated this issue to you last week. Today I received another bounce that again took 2 days:
Reporting-MTA: dns; mail.sgaur.hosted.jivesoftware.com
X-Postfix-Queue-ID: E487E240844A
X-Postfix-Sender: rfc822; ********@vandieten.net
Arrival-Date: Tue, 27 Oct 2009 11:31:12 -0600 (MDT)
Final-Recipient: rfc822; clearspace-123451231979-1ry-2-7UV@mail.forums.adobe.com
Original-Recipient: rfc822;clearspace-123451231979-1ry-2-7UV@mail.forums.adobe.com
Action: failed
Status: 4.4.2
Diagnostic-Code: X-Postfix; lost connection with 10.137.24.42[10.137.24.42]
while sending RCPT TO
Hi Jochem,
We were finally able to track down the exact nature of the issue.
When the plugin was upgraded to fix the repeating messages issue, the change included a fix for a problem which caused some emails from certain email clients to fail to be processed correctly. The fix for the issue was to move from the base62 encoded address string format like:
clearspace-123451231979-1ry-2-7UV
to an integer string format like:
clearspace-123451231979-5552-2-10293
The problem we found is that the code tries to parse the string value '1ry' into an Integer object, which will of course fail. The plugin throws an exception, and immediately stops processing the mail message without logging anything or returning a mail error code. The relay server assumes that the connection was severed for an unknown reason, and holds on to the message to retry delivery. This cycle continues until the delivery timeout threshold is reached, and then the relay server gives up.
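To see why the parse fails, the sketch below decodes the old-style field in Python. The alphabet (0-9, then a-z, then A-Z) is an assumption, but it is consistent with the addresses in this thread: "1ry" decodes to 5552, the value that appears in the same position of the integer-format addresses. Handing "1ry" to a decimal integer parser fails; in Java, `Integer.parseInt` and `Integer.valueOf` throw `NumberFormatException` in the same situation.

```python
import string

# Assumed alphabet: digits, then lowercase, then uppercase.
BASE62_ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def decode_base62(text):
    """Decode a base62 string; raises ValueError on characters outside the alphabet."""
    value = 0
    for ch in text:
        value = value * 62 + BASE62_ALPHABET.index(ch)
    return value


if __name__ == "__main__":
    print(decode_base62("1ry"))  # 5552
    try:
        int("1ry")               # what a decimal-only parser effectively attempts
    except ValueError as exc:
        print("ValueError:", exc)
```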
What should happen is that the exception is caught, and a bounce message is delivered immediately. I have modified the code to fix this error. In the case of this particular message, it would not matter much; even if the address was processed successfully, the email would not be delivered because the token is expired.
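Catching the parse error and answering with a permanent error could be sketched in Python as follows. The handler and the toy parser are hypothetical stand-ins for the plugin's Java code (which would catch `NumberFormatException`). The 553 reply with enhanced status 5.1.3 ("bad destination mailbox address syntax") is an assumed choice. Because the session gets a 5xx answer instead of being dropped, the relay bounces the message at once instead of deferring it with 4.4.2.

```python
def handle_rcpt_to(address, parse_address):
    """Hypothetical RCPT TO handler that never drops the connection on bad input."""
    try:
        parse_address(address)
    except ValueError:
        # Permanent error: the relay returns the message to the sender right away.
        return f"553 5.1.3 <{address}>: malformed reply address, message not accepted"
    return "250 2.1.5 Ok"


def parse_integer_format(address):
    """Toy parser for clearspace-<a>-<b>-<c>-<d>@... addresses with decimal fields."""
    local_part = address.split("@", 1)[0]
    prefix, *fields = local_part.split("-")
    if prefix != "clearspace" or len(fields) != 4:
        raise ValueError(f"unexpected address layout: {address}")
    return [int(f) for f in fields]  # int("1ry") raises ValueError
```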
Regards,
Karl
Karl Cyr wrote on 10/29/2009 9:49 PM:
What should happen is that the exception is caught, and a bounce message is delivered immediately. I have modified the code to fix this error.
I am holding off the final verdict on this until this fix has gone into
production. Judging by my archives I ran into this problem even before
the update which switched from encoded to plain went in production last
August.
In the case of this particular message, it would not matter much; even if the address was processed successfully, the email would not be delivered because the token is expired.
Which is the next problem: people are getting an "address unknown" error
instead of a "token expired" error. Is it clear yet what is causing that
one?
I am holding off the final verdict on this until this fix has gone into
production. Judging by my archives I ran into this problem even before
the update which switched from encoded to plain went in production last
August.
If you can provide an example of this happening with an address string which does not contain invalid characters, I will investigate.
Which is the next problem: people are getting an "address unknown" error
instead of a "token expired" error. Is it clear yet what is causing that
one?
This is how we have chosen to implement the plugin. The error message is technically correct, just not the most helpful. I will see if this is an easy change to make, and commit it to the plugin. However, it will be Adobe's decision to choose to update the plugin or not. Given the edge-case nature of these issues, it may not be necessary.
© Copyright 2000–2009 Jive Software. All rights reserved.
915 SW Stark St., Suite 400, Portland, OR 97205