Using Bounced E-mail Messages to Clean Your Address List

15 Jul 2002
An easy and accurate way to use bounced messages to clean your address list.

Introduction

Lately at Quiksoft, we have been talking a lot about cleaning up our e-mail address list. Many of our customers have been asking how to reliably track the status of outbound e-mail messages, and how to update their address database when a message is returned undeliverable, otherwise known as a bounce.

In this article you will learn:

Three very important reasons why you must clean your e-mail address list now
What you need to know about how SMTP servers route bounced messages
The secret to automatically matching bounced messages to addresses in your database
The difference between hard and soft bounces and why you should track both
Bonus secret to tracking failures on a mailing by mailing basis

This edition also contains downloadable sample code that will:

Encode your outbound messages with the proper information so that they can be matched to your address database if they are returned undeliverable
Scan your bounced messages and flag the addresses in your database
Provide you with tons of phrases found in typical bounced messages, which can be used to programmatically discover their meaning

Three reasons why you must clean your list now...

I used to think that the quality of my list didn't matter. I thought it would be better to send to the entire list and let failures take care of themselves. But that was then, and this is now. Over the years, experience has taught me three important reasons to keep a clean list:

Some popular mail servers may block all mail from you if you repeatedly send mail to a bad address on their domain.
Repeatedly sending e-mail to bad addresses wastes bandwidth. Even if bandwidth is not an issue now, this problem will grow in scale with time.
If you are going to do any type of response tracking, you must subtract out the failures for an accurate report.
So with these reasons in mind, I set out to clean our address list. But how to do it reliably was the question...

A simple answer to a complex problem...

To clean our address list I would have to identify bad addresses and flag them in our address database so that I did not send e-mail to them anymore. I decided that I did not want to delete the bad addresses; I just wanted to flag them as being bad. But how do you determine that an address is bad?

Most SMTP servers will accept mail addressed to just about anyone in their domain, and only later figure out that the user does not exist. That means that whatever app you use to send mail will almost never know that there is a problem. As far as your app is concerned, the SMTP server accepted the message -- period.

I tried looking at so-called "address verifier" components. These components check the e-mail address for syntactical errors and for non-existent domains, but they cannot actually tell if the user part of the address is valid. I used several of these to validate buggs.bunny@microsoft.com and was excited to find that Buggs does work at Microsoft these days, but when I sent him an e-mail, it bounced back with the following message: "Delivery to the following recipients failed: buggs.bunny@microsoft.com". The truth is that these "address verifier" components were no better at verifying addresses than my app was, so they were of no use to me.

So how do you reliably determine if an address is good? The answer is -- you can't. But you can determine if an address is bad when a message sent to it is returned undeliverable (bounced), and that is the key to solving this problem.

The best part of this solution is that it is not dependent on extended SMTP features. It will work all the time provided that the recipient's mail server correctly adheres to RFC-821, the minimum requirements for any SMTP server. The SMTP protocol as outlined in RFC-821 provides for a notification mechanism when a message cannot be delivered. This notification mechanism works by creating a new e-mail message which is sent to the original sender to inform them that their message was not delivered. This e-mail message is commonly referred to as a bounce. The first step to cleaning our address list is to funnel the bounced messages into a central location where they can be programmatically analyzed.

The following three-step process will enable you to capture bounced messages, figure out which address in your database they belong to, and flag the record.

Three Easy Steps

Step 1. Use a bounce box...

The first step in cleaning your list is to trap bounced messages in a central location. We suggest that you create a "bounce box". A bounce box is a dedicated e-mail account that is set up to trap returned messages, e.g. bounce@yourdomain.com. To be sure that returned messages find their way to your bounce box you must understand how these messages are routed by SMTP servers.

When a message is submitted to an SMTP server it is tagged with a reverse-path. The reverse-path is specified by the sending application with the MAIL FROM: command as outlined in RFC-821. The reverse-path is the path that the server should use to communicate with the original sender of the message, and therefore the reverse-path is typically the e-mail address of the sender (the from address).

The SMTP server stores the reverse-path internally, not in the actual message, and forwards it with the message through any relay servers as necessary until the message encounters an error or reaches its destination. Since the reverse-path is not recorded in the actual message, it is typical to add a From: header to the e-mail message which contains the address of the sender and an optional friendly name, e.g. "Joe Sender" <joe.sender@domain.com>. Mail readers use the From: header to display who a message is from.

It is very important to understand that the reverse-path and the address in the From: header need not be the same. Therefore it is possible to send a message which will be displayed by mail readers as coming from joe.sender@domain.com, but has a reverse-path of some_other_address@domain.com.

Once you understand the difference between the reverse-path and the From: header, and the roles they play, you are on your way to building messages that will be displayed in a friendly manner if delivered, or will be returned to your centralized bounce box if there is a failure.
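The split between the reverse-path and the From: header can be sketched in a few lines. This is a Python illustration using the standard smtplib and email modules, not the EasyMail objects used in the samples later in this article; the function names and the server argument are hypothetical.

```python
import smtplib
from email.message import EmailMessage
from email.utils import formataddr


def build_message(friendly_name, friendly_addr, to_addr, subject, body):
    """Build a message whose From: header shows the friendly sender."""
    msg = EmailMessage()
    msg["From"] = formataddr((friendly_name, friendly_addr))
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_with_bounce_box(server, bounce_addr, msg):
    """Send msg, giving bounce_addr to the server as the reverse-path.

    `server` is a placeholder for any host that accepts SMTP connections.
    """
    with smtplib.SMTP(server) as smtp:
        # from_addr is what smtplib passes in the MAIL FROM: command
        # (the reverse-path); the From: header inside msg is untouched.
        smtp.send_message(msg, from_addr=bounce_addr)
```

A mail reader would show the message as coming from Joe Sender, while any failure notice would be addressed to whatever was passed as bounce_addr.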

Step 2. Add custom data to bounced messages...

This step requires that your mail server is capable of being configured to use a wildcard address. In other words, it needs to be able to route all mail to bounce*@yourdomain.com to one specific account such as bounce@yourdomain.com. If your mail server does not support wildcard addresses, you can accomplish the same thing by using a "catch-all" box and a dedicated domain.

You can then append custom data to the end of the account name portion of the reverse-path and it will still be delivered to the bounce@yourdomain.com account. Suppose each e-mail address in your database is identified by a unique numerical id. You can then encode this id into your bounce address. For example, if the recipient address is jane.recipient@domain.com, and the id of this address in your database is 1063, you could build an address such as bounce_1063@yourdomain.com.

You can then send a message to jane.recipient@domain.com and specify bounce_1063@yourdomain.com as the reverse-path by passing that address to the SMTP server with the MAIL FROM command, i.e. MAIL FROM:<bounce_1063@yourdomain.com>. To provide a friendly "from" name or address for Jane's mail reader to display, you can add a From: header to the message, e.g. From: "Joe Sender" <joe.sender@domain.com>.

The sample at the end of this article shows how easily this can be done.
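As a rough illustration of the encoding step, here is a minimal Python sketch. The helper names are hypothetical, and the "bounce_" prefix simply follows the convention described above.

```python
def build_bounce_address(record_id, bounce_domain, prefix="bounce_"):
    """Return e.g. bounce_1063@yourdomain.com for record id 1063."""
    record_id = int(record_id)  # raises ValueError for non-numeric ids
    if record_id < 0:
        raise ValueError("record id must be non-negative")
    return f"{prefix}{record_id}@{bounce_domain}"


def mail_from_command(bounce_address):
    """Return the SMTP command line that carries the reverse-path."""
    return f"MAIL FROM:<{bounce_address}>"


address = build_bounce_address(1063, "yourdomain.com")
print(address)                     # bounce_1063@yourdomain.com
print(mail_from_command(address))  # MAIL FROM:<bounce_1063@yourdomain.com>
```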

If the message is delivered successfully, Jane's mail reader will display it as coming from Joe Sender. If for some reason the message is undeliverable, an "undeliverable mail" notification message will be sent to bounce_1063@yourdomain.com. Since your mail server has been instructed to deliver all messages for bounce*@yourdomain.com to bounce@yourdomain.com, this returned message should now land in your bounce box.

Additionally, since returned messages are sent to the address specified by their reverse-path, each of these messages should have your custom bounce address in the To: header. In other words, each of the messages in the bounce box will be addressed to bounce_<id>@yourdomain.com, where <id> represents the id of the e-mail address in your database which is related to the bounce. Our testing has indicated, however, that some mail servers use the From: address of the original message as the To: address of the resulting bounce. This is not what should be going on according to the RFC, but we have a fix for that too. If the To: header address does not begin with bounce_, you can scan the message's "Received" headers and find your bounce address there. The sample code shows you how this is done.
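The To: lookup with a fallback to the Received headers might look like the following Python sketch. It uses the standard email module rather than the EasyMail POP3 object from the samples, and the function name and regular expression are illustrative assumptions.

```python
import re
from email import message_from_string

# Matches the bounce address convention described above.
BOUNCE_RE = re.compile(r"bounce_(\d+)@", re.IGNORECASE)


def find_bounce_id(raw_message):
    """Return the record id encoded in the bounce address, or None."""
    msg = message_from_string(raw_message)
    # Preferred source: the To: header of the bounce. Fallback: the
    # "for <address>" clause that servers may record in Received: headers.
    candidates = msg.get_all("To", []) + msg.get_all("Received", [])
    for value in candidates:
        match = BOUNCE_RE.search(str(value))
        if match:
            return int(match.group(1))
    return None
```

Because the To: headers are checked first, a well-behaved bounce is resolved without ever touching the Received headers.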

Following these rules, you can now easily match bounced messages up to your database, as you will see...

Step 3. Retrieve the bounced messages and update your database...

At this point, assuming you have sent mail as prescribed above, and some of those messages were returned, you will have one or more messages in your bounce box. Each of these messages will be addressed to bounce_<id>@yourdomain.com, where <id> represents the id of the e-mail address in your database which is related to the bounce.

Now it is important to understand that there are two types of bounces: hard and soft. Permanent failures, such as a nonexistent account or domain, are considered hard bounces. Other failures, such as a full mailbox or blocked domain, are considered soft bounces. Instead of flagging your addresses as good or bad, your database can keep a running count of hard and soft bounces for each address. That way, your mailing application can be more intelligent about determining which addresses to exclude from future mailings. For example, you might only want to send mail to addresses with fewer than eight soft bounces and fewer than two hard bounces. I usually do not like to exclude someone from future mailings unless they have more than one hard bounce. Just to be sure that the address is really invalid, I look for at least two hard bounces.
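The exclusion rule from the example above (fewer than two hard bounces and fewer than eight soft bounces) can be written as a small predicate. This is a Python sketch; the function names and the dictionary keys are assumptions, and the thresholds are only the example values from the text.

```python
def should_send(hard_bounces, soft_bounces, max_hard=2, max_soft=8):
    """Return True if an address is still eligible for the next mailing."""
    return hard_bounces < max_hard and soft_bounces < max_soft


def select_recipients(rows):
    """Keep only the rows whose bounce counts are under both thresholds."""
    return [row for row in rows
            if should_send(row["hard_bounces"], row["soft_bounces"])]


rows = [
    {"address": "jane.recipient@domain.com", "hard_bounces": 0, "soft_bounces": 3},
    {"address": "buggs.bunny@microsoft.com", "hard_bounces": 2, "soft_bounces": 0},
]
print([row["address"] for row in select_recipients(rows)])
# ['jane.recipient@domain.com']
```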

Your application will have to scan the text of the bounced messages looking for phrases that indicate the reason for the bounce. It will look for such phrases as "delivery failure", "box full", etc... (The downloadable sample code includes a database of the phrases we have discovered in typical bounced messages.) Your app will determine if each bounce is hard or soft based on the phrase it finds in the message.
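A phrase scanner along these lines could be sketched as follows in Python. Apart from "box full", the phrases below are illustrative guesses, not the contents of the downloadable phrase database, and the hard/soft label given to each one is an assumption.

```python
# (phrase, kind) pairs; illustrative only, see the note above.
SIGNATURES = [
    ("user unknown", "hard"),
    ("no such user", "hard"),
    ("box full", "soft"),
    ("quota exceeded", "soft"),
]


def classify_bounce(body_text, signatures=SIGNATURES):
    """Return "hard", "soft", or None if no known phrase is found."""
    text = body_text.lower()
    for phrase, kind in signatures:
        # The first matching phrase wins, so list order sets priority.
        if phrase in text:
            return kind
    return None
```

Returning None for an unrecognized message lets the caller leave it in the bounce box for a human to review.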

Once your app determines if the bounce is hard or soft, it can increment the hard_bounces or soft_bounces field in the database accordingly. It can then delete the message from the bounce box. If your app cannot determine if the message is a hard or soft bounce, the message can be left in the bounce box. Periodically the messages remaining in the bounce box can be analyzed by a human who can visually determine why they were not identified by the phrase scanner algorithm. The algorithm can then be updated to catch this type of message. Once your app is run again, it should handle this message properly and clear it from the bounce box. As time goes on, your phrase scanning algorithm should improve more and more. If you start with the phrases included with the downloadable sample code, your app should immediately identify just about every bounced message.
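The counter update itself is small. The Python sketch below uses an in-memory SQLite database in place of the Access database used by the samples; the table and column names (email_table, hard_bounces, soft_bounces) are borrowed from those samples, and the function name is hypothetical.

```python
import sqlite3

# Fixed mapping, so the column name never comes from message content.
COLUMNS = {"hard": "hard_bounces", "soft": "soft_bounces"}


def record_bounce(conn, record_id, kind):
    """Increment the counter for one address; return True if a row changed."""
    column = COLUMNS[kind]  # KeyError for anything but "hard" or "soft"
    cur = conn.execute(
        f"UPDATE email_table SET {column} = {column} + 1 WHERE id = ?",
        (record_id,),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email_table (id INTEGER PRIMARY KEY, address TEXT, "
    "hard_bounces INTEGER DEFAULT 0, soft_bounces INTEGER DEFAULT 0)"
)
conn.execute(
    "INSERT INTO email_table (id, address) "
    "VALUES (1063, 'jane.recipient@domain.com')"
)
record_bounce(conn, 1063, "soft")
```

A False return value means the id in the bounce address did not match any row, which is another case worth leaving for manual review.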

The Samples

The following VB Script samples interface with an Access database that contains the e-mail addresses. The second sample also interfaces with an XML file that contains the phrases typically found in bounced messages. The downloadable code includes the source code shown below along with the Access and XML files. The samples listed on this page vary slightly from the downloadable code, as the code below has been edited to fit the newsletter format.

Sample 1: Constructing and sending the message...

In this sample, we will send a message with a friendly address in the From: header, and our bounce address specified as the reverse-path. This example uses VB Script and the EasyMail SMTP object. The SMTP object contains a FromAddr property, and by default the SMTP object will use the value specified by this property for both the reverse-path and automatic creation of the From: header. We will override this behavior by setting the OptionFlags property to 1, which turns off the automatic creation of the From: header. We will then create the From: header ourselves with the AddCustomHeader() method.

'To do: Set the following variables:
strLicenseKey = "Newsletter Sample/02V4BFDSFFDFSD62"
strMailServer = "mail.yourdomain.com"
strBounceBoxDomain = "yourdomain.com"
strFriendlyFromName = "Joe Sender"
strFriendlyFromAddress = "joe.sender@domain.com"
'End To Do

Dim objSMTP, cnnData, strConnection, RS, nRetVal

'create EasyMail SMTP object and set basic properties
Set objSMTP = CreateObject("EasyMail.SMTP")
objSMTP.LicenseKey = strLicenseKey
objSMTP.MailServer = strMailServer
'OptionFlags = 1 turns off automatic creation of the From: header
objSMTP.OptionFlags = 1
objSMTP.AddCustomHeader "From", _
    """" & strFriendlyFromName & """" & _
    " <" & strFriendlyFromAddress & ">"
objSMTP.Subject = "Subject..."
objSMTP.BodyText = "Message text"

'setup database and select addresses.
'This sample uses an Access database.
Set cnnData = CreateObject("ADODB.Connection")
strConnection = "DBQ=email_database.mdb"
cnnData.Open "DRIVER=" & _
    "{Microsoft Access Driver (*.mdb)};" & _
    strConnection
Set RS = CreateObject("ADODB.RecordSet")
'1 = adOpenKeyset, 3 = adLockOptimistic
RS.Open "SELECT hard_bounces, id, name, address" & _
    " FROM email_table" & _
    " WHERE hard_bounces < 2" & _
    " AND soft_bounces < 4", cnnData, 1, 3

'send to each address selected
Do While Not RS.EOF

    'encode record id in from address
    objSMTP.FromAddr = "bounce_" & RS("id") & _
        "@" & strBounceBoxDomain
    objSMTP.AddRecipient RS("name"), RS("address"), 1
    nRetVal = objSMTP.Send

    'if the recipient's address fails right
    'away then we mark it as a hard bounce now.
    If nRetVal = 8 Then
        RS("hard_bounces") = RS("hard_bounces") + 1
        RS.Update
    End If

    'remove the recipients
    objSMTP.Clear 1

    RS.MoveNext

Loop

'free remaining resources
RS.Close
cnnData.Close

Sample 2: Scanning the bounced messages and updating your database...

This sample uses the EasyMail POP3 object to download each message in our bounce box. Each message is parsed and the body text is scanned for specific phrases to determine if the message is a hard or a soft bounce. Once the code determines the type of bounce, it parses the id off of the To: address which identifies the address in our database. If the To: address does not begin with "bounce", it scans the Received headers for the bounce address by using the TimeStamps collection. The sample then updates the soft_bounces and hard_bounces fields in the database accordingly before deleting the message from the bounce box. If the type of bounce cannot be determined, the message is left in the bounce box for human analysis, which will be used to improve the phrase scanning code in the future. The phrases used to identify bounced messages are read from an XML file.

'To do: Set the following variables:
strLicenseKey = "Newsletter Sample/02E00220B529204B62"
strMailServer = "mail.yourdomain.com"
strAccount = "bounce_account"
strPassword = "bounce_password"
'End To Do

Main

Sub Main()

    Dim objPOP3, objMsgs, Recip, TimeS, nCnt, x
    Dim nBounceType, nId, nPos1, nPos2
    Dim strBodyText, strToAddr, nOrdinal
    Dim strConnection, nRetVal, cnnData, rs

    'create the EasyMail POP3 object and assign
    'the basic properties
    Set objPOP3 = CreateObject("EasyMail.POP3")
    objPOP3.LicenseKey = strLicenseKey
    objPOP3.MailServer = strMailServer
    objPOP3.Account = strAccount
    objPOP3.Password = strPassword

    'connect to the mail server
    nRetVal = objPOP3.Connect()
    If Not nRetVal = 0 Then
        MsgBox "Error connecting to mail server."
        Exit Sub
    End If

    'prepare the database and select our e-mail table
    Set cnnData = CreateObject("ADODB.Connection")
    strConnection = "DBQ=email_database.mdb"
    cnnData.Open "DRIVER=" & _
        "{Microsoft Access Driver (*.mdb)};" & _
        strConnection

    Set rs = CreateObject("ADODB.RecordSet")
    rs.Open "SELECT * FROM email_table", cnnData, 1, 3

    'get the count of messages waiting in the
    'bounce box and download and process each one
    nCnt = objPOP3.GetDownloadableCount()
    For x = 1 To nCnt
        nOrdinal = objPOP3.DownloadSingleMessage(x)
        If nOrdinal < 0 Then
            MsgBox "There was an error downloading " & _
                "the message. " & nOrdinal
            Exit Sub
        End If
        strBodyText = objPOP3.Messages(nOrdinal).BodyText

        'reset per-message values so nothing is
        'carried over from the previous message
        strToAddr = ""
        nId = ""

        'get id from To: address
        Set objMsgs = objPOP3.Messages
        For Each Recip In objMsgs(nOrdinal).Recipients
            strToAddr = Recip.Address
            If LCase(Left(strToAddr, 6)) = "bounce" Then
                Exit For
            End If
        Next

        'if address is not found then try searching
        'timestamps (AKA received headers)
        If Not LCase(Left(strToAddr, 6)) = "bounce" Then
            For Each TimeS In objMsgs(nOrdinal).Timestamps
                strToAddr = TimeS.For
                If LCase(Left(strToAddr, 6)) = "bounce" Then
                    Exit For
                End If
            Next
        End If

        'if it is a bounce message we will process it
        If LCase(Left(strToAddr, 6)) = "bounce" And _
           InStr(strToAddr, "_") > 0 Then
            nPos1 = InStr(strToAddr, "_") + 1
            nPos2 = InStr(strToAddr, "@")

            If nPos2 > nPos1 Then
                nId = Mid(strToAddr, nPos1, nPos2 - nPos1)
            End If

            'call the IdentifyBounce routine which scans
            'the bodytext for the phrases found in our
            'xml file
            nBounceType = IdentifyBounce(strBodyText)

            If nBounceType > 0 And IsNumeric(nId) Then

                'the message has been identified as a hard
                'or soft bounce so update the database.
                'Find searches forward from the current
                'record, so start from the first record.
                rs.MoveFirst
                rs.Find "id=" & nId
                If rs.EOF = False And rs.BOF = False Then
                    If nBounceType = 1 Then
                        rs("soft_bounces") = rs("soft_bounces") + 1
                    Else
                        rs("hard_bounces") = rs("hard_bounces") + 1
                    End If
                    'update changes
                    rs.Update
                End If
                'delete the message from the bounce box
                objPOP3.DeleteSingleMessage x

            ElseIf nBounceType = 0 Then

                'If nBounceType is 0 then it is a warning
                'message or auto-response so we will
                'delete the message from the bounce box.
                objPOP3.DeleteSingleMessage x
            End If
        End If

        'free resources used by the parsed message. This
        'call does not delete messages from the server.
        objPOP3.Messages.DeleteAll

    Next

    'disconnect from mail server
    'and free remaining resources
    objPOP3.Disconnect
    rs.Close
    cnnData.Close
    MsgBox "Operation Complete."

End Sub

Function IdentifyBounce(strBodyText)

    'Returns the weight of the matching signature:
    '0 = warning or auto-response, 1 = soft bounce,
    'higher = hard bounce, -1 = no signature matched.
    Dim st, rs

    Set st = CreateObject("ADODB.Stream")
    Set rs = CreateObject("ADODB.RecordSet")

    st.Open
    st.LoadFromFile "bounce_signatures.xml"

    rs.Open st
    rs.Sort = "weight DESC"

    IdentifyBounce = -1

    'signatures are visited from highest to lowest
    'weight and a later match overwrites an earlier
    'one, so the lowest matching weight is returned
    Do While Not rs.EOF
        If InStr(1, strBodyText, rs("signature"), _
                 vbTextCompare) > 0 Then
            IdentifyBounce = rs("weight")
        End If
        rs.MoveNext
    Loop
    rs.Close
    st.Close

End Function

Conclusion

I hope you found this article useful in your efforts to clean your address list. If you have any suggestions for future topics, please let me know. You can find my contact information at the bottom of this page.

Bonus. Measuring failures from a specific mailing...

Some of our customers want to measure the count of delivery failures for each mailing they do. We showed you how to embed an id into the "reverse-path" so that it is easy to match the bounced message up with the address in your database, but you can even go a step further by inserting a mailing identifier as well.

Let's say you want to keep track of the number of bounced messages for a specific mailing, and let's assume that each mailing is represented by a row in a table. The row has a unique id field which is the mailing identifier. You can encode the mailing identifier onto the account portion of the reverse-path like this: bounce_1063_34@yourdomain.com, where 1063 is the id of the address and 34 is the id of the mailing. You can then modify your database update routine to record the number of hard and soft bounces for each mailing as well as each address.
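Building and parsing the two-part address could be sketched like this in Python. The function names and the regular expression are assumptions for illustration; only the bounce_<address id>_<mailing id> layout comes from the text above.

```python
import re

# The address id is required; the mailing id is optional.
TAG_RE = re.compile(r"^bounce_(\d+)(?:_(\d+))?@", re.IGNORECASE)


def build_mailing_bounce_address(address_id, mailing_id, domain):
    """Return e.g. bounce_1063_34@yourdomain.com."""
    return f"bounce_{int(address_id)}_{int(mailing_id)}@{domain}"


def parse_bounce_address(address):
    """Return (address_id, mailing_id) or None if it is not a bounce address.

    mailing_id is None when the address carries only an address id.
    """
    match = TAG_RE.match(address)
    if not match:
        return None
    address_id = int(match.group(1))
    mailing_id = int(match.group(2)) if match.group(2) else None
    return address_id, mailing_id


print(parse_bounce_address("bounce_1063_34@yourdomain.com"))  # (1063, 34)
```

Keeping the mailing id optional means that bounces from older mailings, which carry only the address id, can still be parsed by the same routine.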

This article can be found at http://www.quiksoft.com/newsletter/issue001/

©2002 Quiksoft Corporation. All rights reserved. Unauthorized duplication or distribution prohibited. Quiksoft, EasyMail, EasyMail Objects, EasyMail .Net Edition, EasyMail Advanced API, EasyMail SMTP Express, and MailStore are trademarks of Quiksoft Corporation. Other trademarks mentioned are the property of their legal owner.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.