C++
/*******************************************************************************************
*  comment info
*  comment info
********************************************************************************************/

I have several thousand C programs with a program header that I need to shorten. The header is formatted as above. I would like the lines of asterisks (*) to be no longer than 75 characters. All my attempts with regex either destroy other lines containing asterisks or ignore the long line. My biggest problem is escaping the multitude of asterisks, because the asterisk is recognized as a regex metacharacter :-(

I have a general-purpose bash script for search and replace but have been unable to find the correct syntax for the regex. I hope I don't have to do something like "\*\*\*..." ad nauseam. Perhaps there is a way to search and replace a defined number of repeating characters?

What I have tried:

^<forward slash>/***********... etc
and the matching end of line ***********<dollar sign>... etc
Updated 15-Jul-24 10:31am

1 solution

Repeat in regex is easy. From "man grep":
Repetition
    A regular expression may be followed by one of several repetition operators:
    ?      The preceding item is optional and matched at most once.
    *      The preceding item will be matched zero or more times.
    +      The preceding item will be matched one or more times.
    {n}    The preceding item is matched exactly n times.
    {n,}   The preceding item is matched n or more times.
    {,m}   The preceding item is matched at most m times.  This is a GNU extension.
    {n,m}  The preceding item is matched at least n times, but not more than m times.


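So a snippet of \*{18,27} matches a run of 18 to 27 asterisks. That spelling is the extended syntax (grep -E); in the default basic syntax the braces are escaped: \*\{18,27\}. Unanchored, it will also match the first 27 asterisks of a longer run, so add ^ and $ (or surrounding characters) when a longer line must not match.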
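
As a quick check of the interval syntax, here is a minimal sketch assuming GNU grep. The two sample strings are made up for illustration. It runs the same anchored pattern once in extended syntax (bare braces) and once in basic syntax (escaped braces).

```shell
#!/usr/bin/env bash
# Build sample runs of asterisks (hypothetical test data).
short=$(printf '%*s' 10 '' | tr ' ' '*')   # 10 asterisks
mid=$(printf '%*s' 20 '' | tr ' ' '*')     # 20 asterisks

# Extended syntax (-E): bare braces. The anchors make the whole line count.
count_ere=$(printf '%s\n%s\n' "$short" "$mid" | grep -cE '^\*{18,27}$')

# Basic syntax (the default): the braces are escaped.
count_bre=$(printf '%s\n%s\n' "$short" "$mid" | grep -c '^\*\{18,27\}$')

echo "ERE matches: $count_ere, BRE matches: $count_bre"
```

Only the 20-asterisk line falls inside the 18 to 27 range, so both counts should come out as 1.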

Mark II:
You mention you're using bash, so I presume you have access to sed.
You should be able to put together a short sed script. (Pick a trivial example from any tutorial or man and massage it.)
The guts of it would be something like:
s|^\(/\*\{72\}\)\*\{1,\}$|\1| ~~~~~

Pulling it apart:
s is the substitution command
I'm using the pipe | as delimiter to avoid slashes fore and back

The regex (sed's default basic syntax, where grouping parens and repetition braces take a backslash):
^ start anchor
\( \) to isolate a subexpression for later (numbered capture) ~~~~~
/ a slash
\*\{72\} 72 asterisks (or as many as you want to keep)
\*\{1,\} one or more extra asterisks (will be removed)
$ end of line

The replacement:
\1 the first matched subexpression ( / and 72 *'s )

With sed -E (extended syntax) the same expression is written without those backslashes: s|^(/\*{72})\*+$|\1|
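
To try the substitution on a single line before touching any files, something like the following should work. It is a sketch that assumes a POSIX-style sed and writes the expression in basic syntax (escaped parens and braces); the 91-character sample line is invented for the test.

```shell
#!/usr/bin/env bash
# Hypothetical sample: "/" followed by 90 asterisks.
stars=$(printf '%*s' 90 '' | tr ' ' '*')
long_line="/$stars"

# Keep "/" plus 72 asterisks; drop whatever asterisks follow.
short_line=$(printf '%s\n' "$long_line" | sed 's|^\(/\*\{72\}\)\*\{1,\}$|\1|')

# Show the line length before and after.
echo "${#long_line} -> ${#short_line}"
```

A line that is already 73 characters or shorter does not match (there is no extra asterisk left for the trailing part) and passes through unchanged.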

I'll leave it to you to write the one for ****/
You can put both of them in the same script and it will just fly through your files.
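
One possible shape for the closing-line expression and the combined script, offered as a sketch and not checked against your real files. The script name shorten.sed, the scratch directory, and the sample header are all invented for illustration. Both expressions use basic syntax.

```shell
#!/usr/bin/env bash
# Write both substitutions to a sed script (shorten.sed is a hypothetical name).
workdir=$(mktemp -d)
cat > "$workdir/shorten.sed" <<'EOF'
s|^\(/\*\{72\}\)\*\{1,\}$|\1|
s|^\(\*\{72\}\)\*\{1,\}/$|\1/|
EOF

# Sample header whose border lines carry 90 asterisks each.
stars=$(printf '%*s' 90 '' | tr ' ' '*')
printf '/%s\n*  comment info\n%s/\n' "$stars" "$stars" > "$workdir/main.c"

# Apply the script; output goes to stdout and the file itself is untouched.
result=$(sed -f "$workdir/shorten.sed" "$workdir/main.c")
printf '%s\n' "$result"
rm -r "$workdir"
```

The second expression mirrors the first: it keeps 72 asterisks, drops the surplus, and puts the closing slash back. The comment line in the middle starts with an asterisk but does not consist of 73 or more of them, so neither expression touches it.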
 
Comments
Jan Zumwalt 15-Jul-24 19:35pm    
Peter, thank you - your syntax for the search works! However my problem now is to replace the long line of asterisks to something like 75 of them. Once again, I have been unable mostly due to it being a control char. Any ideas?
Peter_in_2780 15-Jul-24 21:30pm    
I've updated the solution.
Jan Zumwalt 16-Jul-24 19:23pm    
Thank you, you have been very helpful.
This is my command line...
find . -type f -name "main.c" -print0 | xargs -0 sed -i '' -e "s|^(/\*{72})\*+$|\1|"

And I get this error???
sed: -e expression #1, char 20: invalid reference \1 on `s' command's RHS
Peter_in_2780 16-Jul-24 23:50pm    
Oops! Forgot that sed's default (basic) regex syntax needs escaped parens for a numbered capturing group; the alternative is to keep them bare and run sed with -E.
I've marked the update with ~~~~~ on the affected lines.
I haven't verified it, but it should be awful close.
Jan Zumwalt 17-Jul-24 10:05am    
Thanks, this will save me a lot of time.
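
For reference, a sketch of how that command line might look with GNU sed, which the format of the error message suggests is in use. Two assumptions are baked in: GNU sed takes the -i backup suffix attached to the option, so the separate '' (the BSD/macOS form) is dropped, and -E is added so the bare parentheses act as a capturing group. The scratch directory and sample file are invented so that nothing real is modified.

```shell
#!/usr/bin/env bash
# Scratch directory with one hypothetical main.c.
workdir=$(mktemp -d)
stars=$(printf '%*s' 90 '' | tr ' ' '*')
printf '/%s\n*  comment info\n%s/\n' "$stars" "$stars" > "$workdir/main.c"

# GNU sed: -i with no separate suffix argument, -E for extended syntax.
find "$workdir" -type f -name 'main.c' -print0 |
  xargs -0 sed -i -E -e 's|^(/\*{72})\*+$|\1|' -e 's|^(\*{72})\*+/$|\1/|'

# Measure the first line of the edited file.
first_len=$(head -n 1 "$workdir/main.c" | tr -d '\n' | wc -c | tr -d ' ')
echo "first line is now $first_len characters"
rm -r "$workdir"
```

Single quotes around the expressions keep the shell from interpreting the backslashes and the dollar sign. Running against a copy of the source tree first is a cheap safeguard, since -i rewrites files in place.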

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


