I recently wanted to get a list of major landmarks, but the text list had the name of the landmark, followed by the location of the landmark in parentheses: e.g. Eiffel Tower (Paris, France). I just wanted the names of the landmarks without the text in the parentheses, so had to figure out the command to remove all the parenthesized text.
Perl was the winner as tool of choice. There was a small trick to doing this seemingly trivial task, so document it we shall.
Original: Eiffel Tower (Paris, France)
Desired: Eiffel Tower
There are two ways to do it based on what you want:
perl -p -e 's#\(.*\)##g' textfile
You may have seen 's/oldtext/newtext/g' as the syntax before and are wondering why I am using hash marks (or pound signs) instead. You don't have to use the forward slash, it is just the common way, but if you want to use the forward slash in the search text without having to escape, using hash marks is the way. It can also be used to make it easier to read. Now, onto the command--the \( obviously says look for a left parenthesis, then there is the critical .* which says find any number of any characters. Finally, we close it off with a right parenthesis. This will find anything encapsulated by two parentheses.
perl -p -e 's#\([^)]*\)##g' textfile
This solution will also do the same thing based on our Original text string, but it is slightly different. The [^)] is telling "any character that is not a right parenthesis." The carat (^) is negating everything in the brackets. This is useful if you are making an exclusion set. You can place several characters, [^$)?], and it will look for any character except a $, ), or ?.
Since the two commands work the same for the given example, let's show how the commands will vary in different situations:
If textfile contains:
- Paris (France,) Hilton (Hotel)
- Paris (France (Hilton) Hotel)
Using:
perl -p -e 's#\(.*\)##g' textfile
The results would be:
- Paris
- Paris
Note the danger here is that, even though in line 1 Hilton is not in parentheses, it gets removed because there is an ending right parenthesis at the end of the line. This may not be the expected/intended operation.
Next, using:
perl -p -e 's#\([^)]*\)##g' textfile
The results would be:
- Paris Hilton
- Paris Hotel)
The operation for line 1 may have been what we were expecting, but line 2 doesn't look good. The moral here is to understand what you are trying to do and choose the correct command to do the appropriate operation.
CodeProject