|
I found them extremely helpful for generating a parser from a logging template that writes data to a text file in a configurable format. I generated the parser by converting the template into the corresponding regular expression, which then extracts the data from the log file.
It is not super fast but still pretty good.
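A minimal sketch of that idea, assuming a made-up template syntax with `{name}` placeholders (the post does not say what the real template looked like). The names `template_to_regex`, `LOG_TEMPLATE` and `parse_line` are hypothetical:

```python
import re
from typing import Optional

def template_to_regex(template: str) -> "re.Pattern[str]":
    """Turn a template such as '{date} [{level}] {message}' into a compiled regex."""
    parts = []
    # re.split with a capturing group keeps the placeholder names in the result:
    # even indexes hold literal text, odd indexes hold field names.
    for i, chunk in enumerate(re.split(r"\{(\w+)\}", template)):
        if i % 2 == 0:
            parts.append(re.escape(chunk))       # literal text, escaped
        else:
            parts.append(f"(?P<{chunk}>.+?)")    # field: non-greedy named group
    return re.compile("^" + "".join(parts) + "$")

LOG_TEMPLATE = "{date} [{level}] {message}"      # assumed template, for illustration
parser = template_to_regex(LOG_TEMPLATE)

def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one log line, or None if the line does not fit."""
    m = parser.match(line)
    return m.groupdict() if m else None
```

With this assumed template, `parse_line("2024-01-05 [WARN] disk almost full")` yields a dict with the keys `date`, `level` and `message`. Because the logger and the parser are derived from the same template, changing the format in one place keeps both in step.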
|
|
|
|
|
We have this application filtering files for a selection list presented to the user. The filter criteria are read from a JSON file delivered with the application. In theory, advanced users may edit the filter specification - if they know about it. We are the ones defining and interpreting the filter (a new application release might update it, but usually it just stays as it is). The common user doesn't know.
I made the filter string a regex, to give ourselves more flexibility for future revisions. I was ordered to remove it: What if advanced users discovered that the filter string was a regex and, without having the proper competence to edit it, tried to anyway? That could create some very costly support cases...
So I was ordered to replace the four lines of code matching a file name against a regex with a couple of functions interpreting multiple filter strings, each specifying a series of AND criteria, with each filter adding to the result set, i.e. an OR. That way, it was claimed, no advanced user who was incompetent with respect to regex would cause complex support cases...
Obviously, this was BS argumentation, serving no other purpose than enforcing the position of the guru: He could turn down those of inferior rank or position.
This was my first attempt to embed regex into a commercial product. It will probably be my last as well. I am not really a fan of regex. I'd rather code in APL, if you get my drift...
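To make the two designs in this post concrete, here is a rough sketch of both. Everything in it is assumed for illustration: the file names, the pattern, and especially the filter-string syntax (space-separated substrings that must all occur), since the post does not describe the real one.

```python
import re

FILES = ["report_2023.csv", "report_2023.txt", "notes.txt", "report_old.bak"]

# Variant 1: the filter is a single regex (assumed pattern).
FILTER_REGEX = r"report_.*\.(csv|txt)"

def select_by_regex(names, pattern):
    rx = re.compile(pattern)
    return [n for n in names if rx.fullmatch(n)]

# Variant 2: several filter strings. Assumed syntax: each string holds
# space-separated substrings that must ALL occur in the name (AND);
# a name is selected if ANY filter string accepts it (OR).
FILTER_STRINGS = ["report_ .csv", "report_ .txt"]

def matches_all(name, filter_string):
    return all(term in name for term in filter_string.split())

def select_by_filters(names, filter_strings):
    return [n for n in names if any(matches_all(n, f) for f in filter_strings)]
```

Both variants pick the same two report files from `FILES`. The regex variant is shorter and more flexible; the AND/OR variant is easier to edit by hand without knowing regex syntax.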
|
|
|
|
|
I still find it handy for testing my expressions against data I need to parse.
I'll very likely need to use it later today.
I should probably post an update.
RegexTester
|
|
|
|
|
Looks interesting, thanks for this reminder, next todo for my list ...
|
|
|
|
|
You should colorize the code in your article.
Patrice
“Everything should be made as simple as possible, but no simpler.” Albert Einstein
|
|
|
|
|
and as seldom as I can get away with.
I'm not sure how many cookies it makes to be happy, but so far it's not 27.
JaxCoder.com
|
|
|
|
|
PowerGREP/RegexMagic/RegexBuddy owner.
It was the "Welcome to the wonderful world of regex creation" which made me do it (purchase them). But I still haven't managed to free myself from the alarming restraints of the STEEP LEARNING CURVE(S).
Someday ...
|
|
|
|
|
As I mentioned in a recent lounge thread, my most common use is for input validation. HTML pattern attribute on text inputs, then "real" regex on the server to make sure nobody's been bending the page code in their browser. Client-side validation is user-friendly, but it sure isn't secure!
Software rusts. Simon Stephenson, ca 1994. So does this signature. me, 2012
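A small sketch of the server-side half of that approach, assuming a hypothetical username rule. The field name, the pattern and `is_valid_username` are made up for illustration:

```python
import re

# Hypothetical rule for a username field. The same pattern string would go in
# the HTML attribute: <input name="username" pattern="[A-Za-z0-9_]{3,16}">
USERNAME_PATTERN = r"[A-Za-z0-9_]{3,16}"
_username_rx = re.compile(USERNAME_PATTERN)

def is_valid_username(value: str) -> bool:
    # The browser treats the pattern attribute as anchored at both ends,
    # so fullmatch (not search) gives the same behaviour on the server.
    return _username_rx.fullmatch(value) is not None
```

A simple pattern like this behaves the same in the browser's JavaScript engine and in Python's `re`; more exotic syntax may not, so it is worth testing any shared pattern on both sides.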
|
|
|
|
|
In addition to input validation, I use regex for transformations. I support an application that has far too little input validation, and when copying data to other databases, I run nested regex on numerous fields to "fix" the data as best I can.
Any time I use regex, I document what the pattern string does. Remembering later what the pattern does is not always easy, so documenting it saves a lot of time.
With regex there is no such thing as "self documenting code".
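One way to keep the documentation next to the pattern is a verbose-mode regex with inline comments. This is only a sketch; the phone-number clean-up rule and the name `fix_phone` are assumed for illustration, not taken from the application described above:

```python
import re

# Hypothetical clean-up rule: normalise US-style phone numbers to NNN-NNN-NNNN.
PHONE = re.compile(r"""
    \(?(\d{3})\)?   # area code, optionally wrapped in parentheses
    [\s.-]*         # any mix of spaces, dots or dashes as separator
    (\d{3})         # exchange
    [\s.-]*         # separator again
    (\d{4})         # line number
""", re.VERBOSE)

def fix_phone(raw: str) -> str:
    """Return the number in canonical form, or the input unchanged if it does not fit."""
    m = PHONE.fullmatch(raw.strip())
    return "-".join(m.groups()) if m else raw
```

Returning the input unchanged when the pattern does not fit keeps the "fix as best I can" behaviour: data that cannot be repaired is passed through rather than mangled.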
|
|
|
|
|
I have used it for validation. When receiving files from banks, hospitals, etc., they sometimes have bad data (apostrophes, commas, etc.) that needs to be removed before the data can be imported.
Sometimes you need to pull the contents of a certain field that matches a specific format to process. RegEx is the perfect tool for such situations. It is just another valuable tool in the toolbox.
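Both uses fit in a few lines. The record layout and the account-number format below are assumed examples; real bank or hospital files will differ:

```python
import re

# Assumed example record, for illustration only.
record = "O'Brien, Pat;ACCT:12-345678;Dublin"

# Remove the characters that break the import (here: apostrophes and commas).
cleaned = re.sub(r"[',]", "", record)

# Pull out the one field that has a recognisable format (hypothetical
# account number: two digits, a dash, six digits).
match = re.search(r"\b\d{2}-\d{6}\b", cleaned)
account = match.group(0) if match else None
```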
|
|
|
|
|
Not sure whether the *.xsd for an *.xml counts as a RegEx, but I wrote some of those for a while.
Also some very easy checks to determine whether an input is numbers or letters.
And a couple of times I have had to work on code written by someone else, and there were a couple of places with RegExes, but beyond that, nothing.
Luckily?
M.D.V.
If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.
|
|
|
|
|
And I am still fun at parties when I talk about it.
The sh*t I complain about
It's like there ain't a cloud in the sky and it's raining out - Eminem
~! Firewall !~
|
|
|
|
|
So you never use regex? (I mean, if you restrict it to when they are 'readable')
|
|
|
|
|
I worked mainly on closed-loop control systems, where there is little use for regexp, and they are also quite heavy on performance. Sometimes I use them in scripts, but since I mostly work on Windows machines, that's not that common.
GCS d--(d+) s-/++ a C++++ U+++ P- L+@ E-- W++ N+ o+ K- w+++ O? M-- V? PS+ PE- Y+ PGP t+ 5? X R+++ tv-- b+(+++) DI+++ D++ G e++ h--- r+++ y+++* Weapons extension: ma- k++ F+2 X
|
|
|
|
|
DFA regex is actually one of the most efficient ways to scan and tokenize text.
You seem to be talking about NFA (backtracking) regex, which is more expressive but doesn't perform as well.
Real programmers use butterflies
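To show what scanning with a DFA looks like, here is a tiny hand-built example. It is a sketch under assumptions: the token set (identifiers and integers), the state names and the `tokenize` function are made up for illustration, and a real DFA regex engine would generate the table from the patterns instead of having it written by hand.

```python
# States of a tiny hand-built DFA (hypothetical token set: identifiers and integers).
START, IDENT, INT = 0, 1, 2
ACCEPTING = {IDENT: "ident", INT: "int"}

def char_class(ch: str) -> str:
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# Transition table: (state, character class) -> next state.
# A missing entry means "no transition", which ends the current token.
TRANSITIONS = {
    (START, "letter"): IDENT,
    (START, "digit"): INT,
    (IDENT, "letter"): IDENT,
    (IDENT, "digit"): IDENT,
    (INT, "digit"): INT,
}

def tokenize(text: str):
    """Yield (kind, lexeme) pairs without ever backing up over consumed input."""
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state, start = START, pos
        while pos < len(text) and (state, char_class(text[pos])) in TRANSITIONS:
            state = TRANSITIONS[(state, char_class(text[pos]))]
            pos += 1
        if state not in ACCEPTING:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        yield ACCEPTING[state], text[start:pos]
```

The work per character is one table lookup, which is why this style of matching scales linearly with the input length.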
|
|
|
|
|
My knowledge of regexp is very limited and outdated. When I used them, I found them hard and tricky to write and terrible in performance.
I will study DFA regex then; it seems useful.
|
|
|
|
|
Also known as non-backtracking regular expressions.
There are these core operators:
() - subexpression: (abc) matches "abc"
? - match zero or one occurrence: abc? matches "ab" or "abc"; (abc)? matches "abc" or ""
| - alternation: abc|def matches "abc" or "def"
* - Kleene star, matches zero or more occurrences: abc* matches "ab" or "abc" or "abcccccc" etc.; (abc)* matches "" or "abcabcabcabc" etc.
The rest is syntactic sugar:
[] - charset: [abc] (equivalent to [a-c]) is shorthand for a|b|c
[^] - not charset: [^a-c] matches anything but a, b or c. The longhand is too long to write out here, but it is like d|e|f|g... plus all the symbols and control chars
+ - match one or more: (abc)+ is equivalent to (abc)(abc)*
. - match any single character: the longhand for . is a|b|c|d|e|... etc.
Also, \ is the escape: it matches the literal character that comes next, no matter what it is.
That's all you really need to know.
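The equivalences in this overview are easy to check by experiment. The sketch below uses Python's `re` module, which is a backtracking engine rather than a DFA one, but for these operators it accepts the same strings. The helper `same_language` and the sample list are made up for illustration:

```python
import re

def same_language(sugar: str, longhand: str, samples) -> bool:
    """Check that two patterns accept exactly the same strings from `samples`."""
    return all(
        bool(re.fullmatch(sugar, s)) == bool(re.fullmatch(longhand, s))
        for s in samples
    )

samples = ["", "a", "b", "c", "d", "ab", "abc", "abcabc", "abcab", "abcc"]

# [abc] is shorthand for a|b|c
charset_ok = same_language(r"[abc]", r"a|b|c", samples)
# (abc)+ is shorthand for (abc)(abc)*
plus_ok = same_language(r"(abc)+", r"(abc)(abc)*", samples)
# abc? makes only the final c optional; (abc)? makes the whole group optional
optional_char = [s for s in samples if re.fullmatch(r"abc?", s)]
optional_group = [s for s in samples if re.fullmatch(r"(abc)?", s)]
```

A finite sample list cannot prove equivalence, of course; it only shows that the sugar and the longhand agree on the cases tried.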
|
|
|
|
|
Thanks for the overview
M.D.V.
|
|
|
|
|
Regular expressions are a tool, and a powerful one.
But like all tools, they need to be used in the right place, and the right way or they cause more problems than they solve.
A hammer is a powerful tool - but for some jobs a screwdriver is better. And instead of screwdriver, a spanner undoes bolts better. And the right size spanner won't round off the bolt so you can't use it again.
That's a Regex. It's a hammer / screwdriver / spanner thing that does what it is designed for extremely well: processing text strings to extract patterns. Misuse it to try and get it to count things, or order months, or - gawd forbid - process HTML and it will bite you very hard indeed.
The syntax is arcane, they are pretty slow to execute [1], and the match strings can be hard to understand. But in the right place, at the right time? They can save you hours of work! Get a helper app and it can make them a load easier to work with [2].
[1] Counting Lines in a String shows the difference.
[2] I use Expresso - it's free, and it examines and generates regular expressions.
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
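For anyone who wants to check the speed point on their own machine, here is a rough sketch that counts lines once with a regex and once with a plain string method. It is not the benchmark from the linked article; the sample text and function names are made up, and timings will vary with the machine and the regex engine:

```python
import re
import timeit

text = "some line of text\n" * 10_000   # made-up sample data

def count_with_regex(s: str) -> int:
    # Counting by collecting every newline match.
    return len(re.findall(r"\n", s))

def count_directly(s: str) -> int:
    # Counting with the built-in string method, no regex involved.
    return s.count("\n")

if __name__ == "__main__":
    for fn in (count_with_regex, count_directly):
        seconds = timeit.timeit(lambda: fn(text), number=100)
        print(f"{fn.__name__}: {seconds:.4f} s")
```

Both functions return the same count; only the time taken differs.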
|
|
|
|
|
Excellent summary! I've used Expresso a lot in the past when creating a regex as it makes it a lot easier for testing and double checking my work. That's saved me a lot of Advil over the years...
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd all be running around in darkened rooms, munching magic pills and listening to repetitive electronic music."
-- Marcus Brigstocke, British Comedian
|
|
|
|
|
Quote: Misuse it to try and get it to count things, or order months, or - gawd forbid - process HTML
Well, I would guess that most people would realise that you cannot parse non-regular syntax (HTML) with a regular expression. After all, it's in the name: REGULAR expression.
|
|
|
|
|
Member 13301679 wrote: I would guess that most sensible people would realise ...
Unfortunately, that leaves a whole load of people: most of the "my site scraper don't work" questions we get are because they are trying to use a Regex instead of the proper tool: HtmlAgilityPack.
A Regex can do it. Sort of. But it gets to be truly horrible, very quickly!
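HtmlAgilityPack is a .NET library; to keep this example self-contained, the sketch below swaps in Python's standard-library `html.parser` to show the same principle of using a real parser for a scraping task. The class `LinkCollector`, the function `extract_links` and the sample page are made up for illustration:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, however the markup is laid out."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser hands over lower-cased tag and attribute names.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

def extract_links(html: str) -> list:
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links

# Awkward but legal markup: a tag split across lines, single quotes,
# upper-case names, and a link hidden inside a comment.
page = """<div><a class="x"
   href='/one'>first <b>bold</b></a>
<!-- <a href="/commented-out">no</a> -->
<A HREF="/two">second</A></div>"""
```

The parser copes with the line break, the quoting styles and the upper-case tag, and it skips the commented-out link. A regex that handles all of those cases correctly is where things start to get horrible.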
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
I have been using regex for years, in both .NET and Perl, with no bugs or issues.
|
|
|
|