Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

PERL: Removing Text Within Parentheses

0.00/5 (No votes)
18 Aug 2010 1  
Removing Text Within Parentheses


I recently wanted to get a list of major landmarks, but the text list had the name of the landmark, followed by the location of the landmark in parentheses: e.g. Eiffel Tower (Paris, France). I just wanted the names of the landmarks without the text in the parentheses, so had to figure out the command to remove all the parenthesized text.

Perl was the winner as tool of choice. There was a small trick to doing this seemingly trivial task, so document it we shall.

Original: Eiffel Tower (Paris, France)
Desired: Eiffel Tower

There are two ways to do it based on what you want:

perl -p -e 's#\(.*\)##g' textfile

You may have seen 's/oldtext/newtext/g' as the syntax before and are wondering why I am using hash marks (or pound signs) instead. You don't have to use the forward slash, it is just the common way, but if you want to use the forward slash in the search text without having to escape, using hash marks is the way. It can also be used to make it easier to read. Now, onto the command--the \( obviously says look for a left parenthesis, then there is the critical .* which says find any number of any characters. Finally, we close it off with a right parenthesis. This will find anything encapsulated by two parentheses.

perl -p -e 's#\([^)]*\)##g' textfile

This solution will also do the same thing based on our Original text string, but it is slightly different. The [^)] is telling "any character that is not a right parenthesis." The carat (^) is negating everything in the brackets. This is useful if you are making an exclusion set. You can place several characters, [^$)?], and it will look for any character except a $, ), or ?.

Since the two commands work the same for the given example, let's show how the commands will vary in different situations:

If textfile contains:

  1. Paris (France,) Hilton (Hotel)
  2. Paris (France (Hilton) Hotel)

Using:

perl -p -e 's#\(.*\)##g' textfile

The results would be:

  1. Paris
  2. Paris

Note the danger here is that, even though in line 1 Hilton is not in parentheses, it gets removed because there is an ending right parenthesis at the end of the line. This may not be the expected/intended operation.

Next, using:

perl -p -e 's#\([^)]*\)##g' textfile

The results would be:

  1. Paris Hilton
  2. Paris Hotel)

The operation for line 1 may have been what we were expecting, but line 2 doesn't look good. The moral here is to understand what you are trying to do and choose the correct command to do the appropriate operation.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here