Introduction
I have been manually entering address information in a Microsoft Access application I have for years, and basically all I have been doing is reparsing information that I can just copy bulk from the web. I knew this was stupid, and I finally decided to do something about it. I searched the web thinking that maybe I could use regular expression to do this. Never did find anything close, and so finally decided to do it myself.
NOTES
The way I use this code is to give preliminary values, and expect to have to edit some of the values to make corrections. This works fine for me since it is used to more quickly auto fill out a form, instead of having to type or copy and paste individual elements.
This is really dirty code. I just got it to work and that was all I was really interested in. It is VBA after all, and only done for one of my own pet Microsoft Access projects. Will probably do some cleanup in the future.
Limitations
This code is pretty basic and has many limitations:
- The company name is expected to be the first information (after removing white space) in the
string
and ends with either a comma or a carriage return. Once extracted, the information is removed so that it will not interfere with later operations. - The phone number is extracted with a regular expression. It is one of the first things done, and once extracted, the phone information is removed so that it will not interfere with later operations.
- Likewise, the web address is extracted with a regular expression. It is one of the first things done, and once extracted, the web address information is removed so that it will not interfere with later operations.
- The street address is assumed to start in the first location that contains a number, and it is assumed to end with either a comma or a carriage return. It is then removed.
- The
City
is assumed to come after the street address with a comma separating it from the two letter State
with the five or nine number zip code after that.
Including the Reference to Use RegExp
If you are using Microsoft Access, this is how you VBA reference to "Microsoft VBScript Regular Expressions 5.5":
- Select "Create" tab on the Ribbon.
- Find "Macros and Code" ribbon section and click "Visual Basic".
- In "Microsoft Visual Basic for Applications" window, select "Tools" from the top menu.
- Select "References...".
- Check the box next to "Microsoft VBScript Regular Expressions 5.5" to include in your workbook.
- Click "OK".
The Code
The code is as follows:
Option Compare Database
Public Function Parse(str As String) As Collection
Dim output As New Collection
str = TrimLeftImproved(str)
phonenumber = getPhoneNumber(str)
uri = GetURI(str)
location = getMaxInstr(str, ",", vbCr)
output.Add Left(str, location - 1)
str = Trim(Mid(str, location + 1))
isNotValidStreet = True
Do While isNotValidStreet
location = GetPositionOfFirstNumericCharacter(str)
str = TrimLeftImproved(Mid(str, location))
location = getMaxInstr(str, ",", vbCr)
testStreet = Left(str, location - 1)
splitstr = Split(testStreet, " ")
If UBound(splitstr) > 1 Then
isNotValidStreet = False
End If
str = TrimLeftImproved(Mid(str, location + 1))
Loop
output.Add Left(testStreet, location - 1)
location = getMaxInstr(str, ",", vbCr)
output.Add Left(str, location - 1)
str = TrimLeftImproved(Mid(str, location + 1))
State = Mid(str, 1, 2)
output.Add UCase(State)
str = TrimLeftImproved(Mid(str, 3))
strPattern = "^\d{5}(?:[-\s]\d{4})?$"
Dim regEx As New RegExp
With regEx
.Global = True
.Multiline = True
.IgnoreCase = False
.Pattern = strPattern
End With
ZipCode1 = Mid(str, 1, 10)
ZipCode2 = Mid(str, 1, 5)
If (regEx.test(ZipCode1)) Then
ZipCode = ZipCode1
str = TrimLeftImproved(Mid(str, 10))
ElseIf (regEx.test(ZipCode2)) Then
ZipCode = ZipCode2
str = TrimLeftImproved(Mid(str, 5))
Else
ZipCode = ""
End If
output.Add GetZip(str)
output.Add phonenumber
output.Add uri
Set Parse = output
End Function
Private Function getMaxInstr(str, val1, val2) As Integer
location1 = InStr(str, val1)
location2 = InStr(str, val2)
If location1 = 0 Then
location = location2
ElseIf location2 = 0 Then
location = location1
Else
location = IIf(location1 < location2, location1, location2)
End If
getMaxInstr = location
End Function
Public Function GetPositionOfFirstNumericCharacter(ByVal s As String) As Integer
For i = 1 To Len(s)
Dim currentCharacter As String
currentCharacter = Mid(s, i, 1)
If IsNumeric(currentCharacter) = True Then
GetPositionOfFirstNumericCharacter = i
Exit Function
End If
Next i
End Function
Function getPhoneNumber(str) As String
Dim regEx As New RegExp
With regEx
.Global = True
.Multiline = True
.IgnoreCase = True
.Pattern = "(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}"
End With
If regEx.test(str) Then
getPhoneNumber = regEx.Execute(str)(0)
str = regEx.Replace(str, "")
Else
getPhoneNumber = ""
End If
End Function
Function GetURI(str) As String
Dim regEx As New RegExp
With regEx
.Global = True
.Multiline = True
.IgnoreCase = True
.Pattern = "(((ht|f)tp(s?))\://)?(<a href="http://www.|_
[a-zA-Z].)[a-zA-Z0-9\-\.]+\.(com|edu|gov|mil|net|org|biz|info|name|museum|_
us|ca|uk)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\;\?\
[a-zA-Z].)[a-zA-Z0-9\-\.]+\.(com|edu|gov|mil|net|org|biz|info|name|museum|_
us|ca|uk)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\;\?\
End With
If regEx.test(str) Then
GetURI = regEx.Execute(str)(0)
str = regEx.Replace(str, "")
Else
GetURI = ""
End If
End Function
Function GetZip(str) As String
Dim regEx As New RegExp
With regEx
.Global = True
.Multiline = True
.IgnoreCase = True
.Pattern = "[0-9]{5}(?:-[0-9]{4})?"
End With
If regEx.test(str) Then
GetZip = regEx.Execute(str)(0)
str = regEx.Replace(str, "")
Else
GetZip = ""
End If
End Function
Function TrimLeftImproved(str) As String
For i = 1 To Len(str)
Dim currentCharacter As String
currentCharacter = Mid(str, i, 1)
If IsNumeric(currentCharacter) Or (Asc(currentCharacter) >= 65 And _
Asc(currentCharacter) <= 122) Then
TrimLeftImproved = Mid(str, i)
Exit Function
End If
Next i
End Function
It will be noted that the address values are returned in a collection. This allows reuse. Also, if the search for the web URI and phone number does not work, then just find a Regular Expression that works better for you.
Using the Code
Here is an example of using this code:
Dim values
Set values = Parse(FullStringEntry)
If Company = "" Then
Company = values(1)
Address = values(2)
City = values(3)
State = values(4)
ZipCode = values(5)
Phone = values(6)
WebSite = values(7)
End If
The Set
for assigning the result of Parse
is important, and will not work without the Set
in the assignment.
Test Input
Here is some text that was copied on a bing search, and it works:
Extended Stay America
<a href="http://www.extendedstayamerica.com">www.extendedstayamerica.com</a>
6961 Lenoir Ave E, Jacksonville, FL 32216
(904) 296-0181
Here is another example of input the works, copied from Google Maps:
United States Postal Service
4.3
·24 reviews
Post Office
SAVE
NEARBY
Send to your phone
SHARE
135 E 100 S, Salt Lake City, UT 84111
Located in: Harmons
usps.com
(801) 428-0366
History
- 2018/05/25: Initial version