Validate and Find Addresses with RegEx

6 May 2015
Learn how to use regular expressions to find addresses or parts of an address in a given string. These patterns can also be used to verify that a given string has the format of a US address.

Introduction

There may come a time when you have a document full of various pieces of data, possibly one that contains information on a long list of people, but you only need the addresses. This is where RegEx comes in handy. You can use RegEx to iterate through the characters of the document and locate a string that matches the pattern you specify. RegEx can also be used to check a short string to see if its format and contents match a specified pattern. For a detailed reference on RegEx, check out this article.

Using the Code

The easiest piece of the address to match is the ZIP code, although it is also the least exact.

A simple pattern to match a zip code would look like the following:

ZIP Code

\b\d{5}(?:-\d{4})?\b

That pattern matches five primary digits, optionally followed by a hyphen and four extended digits. It matches every ZIP code, but it also matches any other five-digit number standing on its own. Combining it with the rest of the address, as we do further down, cuts down on those false matches.
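As a minimal sketch of both uses (finding ZIP codes in a longer string and validating a short one), here is the pattern in Python. The language choice and the helper names `find_zip_codes` and `is_zip_code` are assumptions made for illustration; the article itself only supplies the pattern.

```python
import re

# ZIP pattern copied from the article.
ZIP_PATTERN = r"\b\d{5}(?:-\d{4})?\b"


def find_zip_codes(text):
    """Return every substring of text that looks like a ZIP code."""
    return re.findall(ZIP_PATTERN, text)


def is_zip_code(text):
    """Return True only if the whole string has the ZIP code format."""
    return re.fullmatch(ZIP_PATTERN, text) is not None


sample = "Ship to 90210 or 12345-6789; invoice 20150 is unrelated, 1234567 is too long."
print(find_zip_codes(sample))  # ['90210', '12345-6789', '20150']
print(is_zip_code("12345-6789"))  # True
```

Note that the invoice number 20150 is reported too, which is the false match described above.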

Next, we need to match a city and state. This pattern will match most cities:

City

(?:[A-Z][a-z.-]+[ ]?)+

There is room for false matches with this pattern too, since it accepts any run of capitalized words, but it becomes much more accurate once we add the state pattern.
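To see how loose the city pattern is on its own, a short Python sketch can list everything it picks up in a sentence. The helper name `find_city_candidates` is hypothetical and only serves the example.

```python
import re

# City pattern copied from the article.
CITY_PATTERN = r"(?:[A-Z][a-z.-]+[ ]?)+"


def find_city_candidates(text):
    """Return every run of capitalized words found in text."""
    # The optional trailing space can end up inside a match, so strip it.
    return [m.group().strip() for m in re.finditer(CITY_PATTERN, text)]


print(find_city_candidates("We moved from St. Louis to Winston-Salem"))
# ['We', 'St. Louis', 'Winston-Salem']
```

Both city names are found, but so is the ordinary capitalized word "We".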

The only sure way to test for a state name is to create a pattern that contains the name of each state. It's long, but it matches the 50 spelled-out state names and nothing else.

State

Alabama|Alaska|Arizona|Arkansas|California|Colorado|Connecticut|Delaware|Florida|Georgia|Hawaii|
Idaho|Illinois|Indiana|Iowa|Kansas|Kentucky|Louisiana|Maine|Maryland|Massachusetts|Michigan|
Minnesota|Mississippi|Missouri|Montana|Nebraska|Nevada|New[ ]Hampshire|New[ ]Jersey|New[ ]Mexico
|New[ ]York|North[ ]Carolina|North[ ]Dakota|Ohio|Oklahoma|Oregon|Pennsylvania|Rhode[ ]Island
|South[ ]Carolina|South[ ]Dakota|Tennessee|Texas|Utah|Vermont|Virginia|Washington|West[ ]Virginia
|Wisconsin|Wyoming

I've added line breaks for readability, but make sure to remove them when using the pattern. This is a surefire way to test for states that are spelled out, but in some addresses the states are abbreviated. The abbreviation list below also includes non-state postal abbreviations such as DC and PR.

State Abbreviations

AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|MD|MA|MI|MN|MS|MO|MT
|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY
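One way to honor the "remove the line breaks" advice without retyping the lists is to paste them as printed and strip the newlines in code. The Python sketch below does that; the helper name `is_state` is an assumption for illustration. Only newlines are removed, because the space inside each `[ ]` is part of the pattern and must survive.

```python
import re

# Both lists pasted exactly as printed in the article, line breaks included.
STATE_RAW = r"""
Alabama|Alaska|Arizona|Arkansas|California|Colorado|Connecticut|Delaware|Florida|Georgia|Hawaii|
Idaho|Illinois|Indiana|Iowa|Kansas|Kentucky|Louisiana|Maine|Maryland|Massachusetts|Michigan|
Minnesota|Mississippi|Missouri|Montana|Nebraska|Nevada|New[ ]Hampshire|New[ ]Jersey|New[ ]Mexico
|New[ ]York|North[ ]Carolina|North[ ]Dakota|Ohio|Oklahoma|Oregon|Pennsylvania|Rhode[ ]Island
|South[ ]Carolina|South[ ]Dakota|Tennessee|Texas|Utah|Vermont|Virginia|Washington|West[ ]Virginia
|Wisconsin|Wyoming
"""
ABBREV_RAW = r"""
AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|MD|MA|MI|MN|MS|MO|MT
|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY
"""

# Drop only the newlines; the spaces inside [ ] belong to the pattern.
STATE_PATTERN = STATE_RAW.replace("\n", "")
ABBREV_PATTERN = ABBREV_RAW.replace("\n", "")


def is_state(text):
    """Return True if text is exactly one listed state name or abbreviation."""
    return re.fullmatch(f"(?:{STATE_PATTERN}|{ABBREV_PATTERN})", text) is not None


print(is_state("West Virginia"))   # True
print(is_state("PR"))              # True
print(is_state("Virginia Beach"))  # False
```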

When you are searching for a city, state, zip combination, combine the patterns like this:

City, State, Zip

{city pattern},[ ](?:{state pattern}|{abbrev. state pattern})[ ]{zip pattern}
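The placeholders in that template can be filled in with plain string formatting. The following is a sketch in Python, assuming the patterns are stored in the hypothetical constants `CITY`, `STATE`, `ABBREV` and `ZIP`; the helper `find_city_state_zip` is likewise made up for the example.

```python
import re

# Patterns copied from the article, with line breaks already removed.
ZIP = r"\b\d{5}(?:-\d{4})?\b"
CITY = r"(?:[A-Z][a-z.-]+[ ]?)+"
STATE = (
    "Alabama|Alaska|Arizona|Arkansas|California|Colorado|Connecticut|Delaware|"
    "Florida|Georgia|Hawaii|Idaho|Illinois|Indiana|Iowa|Kansas|Kentucky|"
    "Louisiana|Maine|Maryland|Massachusetts|Michigan|Minnesota|Mississippi|"
    "Missouri|Montana|Nebraska|Nevada|New[ ]Hampshire|New[ ]Jersey|New[ ]Mexico|"
    "New[ ]York|North[ ]Carolina|North[ ]Dakota|Ohio|Oklahoma|Oregon|"
    "Pennsylvania|Rhode[ ]Island|South[ ]Carolina|South[ ]Dakota|Tennessee|"
    "Texas|Utah|Vermont|Virginia|Washington|West[ ]Virginia|Wisconsin|Wyoming"
)
ABBREV = (
    "AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|"
    "MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|"
    "SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY"
)

# {city pattern},[ ](?:{state pattern}|{abbrev. state pattern})[ ]{zip pattern}
CITY_STATE_ZIP = re.compile(rf"{CITY},[ ](?:{STATE}|{ABBREV})[ ]{ZIP}")


def find_city_state_zip(text):
    """Return every 'City, State ZIP' substring found in text."""
    return [m.group() for m in CITY_STATE_ZIP.finditer(text)]


sample = "Mail it to Springfield, Illinois 62704 or New York, NY 10001-0001. Paid, 12345"
print(find_city_state_zip(sample))
# ['Springfield, Illinois 62704', 'New York, NY 10001-0001']
```

The trailing "Paid, 12345" is skipped because no state sits between the comma and the digits, which shows how the combined pattern filters out the false matches of its parts.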

To finish off our address RegEx pattern, we need to test for a street address.

Street

\d+[ ](?:[A-Za-z0-9.-]+[ ]?)+(?:Avenue|Lane|Road|Boulevard|Drive|Street|Ave|Dr|Rd|Blvd|Ln|St)\.?

For the full pattern, join this street pattern and the city, state, zip pattern with \s between them. Because \s matches any whitespace character, the street may be followed by either a space or a line break. With that, you're done!
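Putting everything together might look like the Python sketch below. The constant and function names (`ADDRESS`, `find_addresses`, `is_address`) are assumptions for illustration, not part of the article. One caveat worth knowing: the street and city patterns repeat a group that itself contains a repeat, so a backtracking engine can slow down noticeably on long input that almost matches.

```python
import re

# Patterns copied from the article, with line breaks already removed.
ZIP = r"\b\d{5}(?:-\d{4})?\b"
CITY = r"(?:[A-Z][a-z.-]+[ ]?)+"
STREET = (
    r"\d+[ ](?:[A-Za-z0-9.-]+[ ]?)+"
    r"(?:Avenue|Lane|Road|Boulevard|Drive|Street|Ave|Dr|Rd|Blvd|Ln|St)\.?"
)
STATE = (
    "Alabama|Alaska|Arizona|Arkansas|California|Colorado|Connecticut|Delaware|"
    "Florida|Georgia|Hawaii|Idaho|Illinois|Indiana|Iowa|Kansas|Kentucky|"
    "Louisiana|Maine|Maryland|Massachusetts|Michigan|Minnesota|Mississippi|"
    "Missouri|Montana|Nebraska|Nevada|New[ ]Hampshire|New[ ]Jersey|New[ ]Mexico|"
    "New[ ]York|North[ ]Carolina|North[ ]Dakota|Ohio|Oklahoma|Oregon|"
    "Pennsylvania|Rhode[ ]Island|South[ ]Carolina|South[ ]Dakota|Tennessee|"
    "Texas|Utah|Vermont|Virginia|Washington|West[ ]Virginia|Wisconsin|Wyoming"
)
ABBREV = (
    "AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|"
    "MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|"
    "SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY"
)

# Street, then \s (space or line break), then city, state and ZIP.
ADDRESS = re.compile(rf"{STREET}\s{CITY},[ ](?:{STATE}|{ABBREV})[ ]{ZIP}")


def find_addresses(text):
    """Return every full address found anywhere in text."""
    return [m.group() for m in ADDRESS.finditer(text)]


def is_address(text):
    """Return True only if the whole string has the address format."""
    return ADDRESS.fullmatch(text) is not None


document = "Ship to:\n42 Elm St\nAustin, TX 78701\nThanks!"
print(find_addresses(document))  # ['42 Elm St\nAustin, TX 78701']
print(is_address("Austin, TX 78701"))  # False, the street is missing
```

`find_addresses` covers the "find" use from the introduction and `is_address` the "validate" use. Keep in mind that a match only confirms the format; it does not prove that the address exists.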

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)