|
I feel like I might be chasing ghosts, particularly since I already get really great performance out of the thing, especially compared to .NET Regex, even though that always uses ReadOnlySpan. I still beat it by 3x in the best case.
Microsoft Regex "Lexer": [■■■■■■■■■■] 100% Found 220000 matches in 35ms
Microsoft Regex compiled "Lexer": [■■■■■■■■■■] 100% Found 220000 matches in 20ms
FAStringRunner (proto): [■■■■■■■■■■] 100% Found 220000 matches in 7ms
FATextReaderRunner (proto): [■■■■■■■■■■] 100% Found 220000 matches in 13ms
FAStringDfaTableRunner: [■■■■■■■■■■] 100% Found 220000 matches in 10ms
FATextReaderDfaTableRunner: [■■■■■■■■■■] 100% Found 220000 matches in 14ms
FAStringStateRunner (NFA): [■■■■■■■■■■] 100% Found 220000 matches in 145ms
FAStringStateRunner (Compact NFA): [■■■■■■■■■■] 100% Found 220000 matches in 43ms
FATextReaderStateRunner (Compact NFA): [■■■■■■■■■■] 100% Found 220000 matches in 48ms
FAStringStateRunner (DFA): [■■■■■■■■■■] 100% Found 220000 matches in 11ms
FATextReaderStateRunner (DFA): [■■■■■■■■■■] 100% Found 220000 matches in 16ms
FAStringRunner (Compiled): [■■■■■■■■■■] 100% Found 220000 matches in 7ms
FATextReaderRunner (Compiled): [■■■■■■■■■■] 100% Found 220000 matches in 12ms
7ms is about what I get compared to Microsoft's 20ms if I'm making the fairest comparison possible (apples to apples), 'cept mine doesn't backtrack or support a bunch of fluff (though it does lack anchors).
If I can't get another 10% out of this I don't think it's worth the trouble.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
That is impressive.
Have you seen the latest updates to Regex? See "Regular Expression Improvements in .NET 7" on the .NET Blog. Source generators for Regex are next level!
I would dig into the source code for the Regex changes and for the Regex source generator, and see how they do it to get ideas for yours.
Quote: If I can't get another 10% out of this I don't think it's worth the trouble.
I hear you, and for a one-off I can agree; however, you will need to understand that last 10% for future projects. The question is: now or later?
Graeme
"I fear not the man who has practiced ten thousand kicks one time, but I fear the man that has practiced one kick ten thousand times!" - Bruce Lee
|
That's actually targeting Microsoft's .NET 7 implementation, and yeah, I've looked at their source generator and considered making my own using the same tech. Right now I'm using the CodeDOM for that, which is older but doesn't require nearly as much buy-in from your install base. For instance, you don't need compiler services running, I'm not even sure how compatible source generators are with DNF (the .NET Framework), and there are other unknowns. I need to do more research.
I actually did dotNetPeek them, which is how I figured out the Span stuff. I don't like their code. Frankly, I'm impressed with the code synthesis, but they still made it hard to follow, and I'm not sure that's so beneficial. My code looks machine generated, but it's easy to follow, as state machines go:
private FAMatch _BlockEnd0(ReadOnlySpan<char> s, int cp, int len, int position, int line, int column) {
q0:
    if ((cp == 42)) {
        this.Advance(s, ref cp, ref len, false);
        goto q1;
    }
    goto errorout;
q1:
    if ((cp == 47)) {
        this.Advance(s, ref cp, ref len, false);
        goto q2;
    }
    goto errorout;
q2:
    return FAMatch.Create(0, s.Slice(position, len).ToString(), position, line, column);
errorout:
    if ((cp == -1)) {
        return FAMatch.Create(-1, s.Slice(position, len).ToString(), position, line, column);
    }
    this.Advance(s, ref cp, ref len, false);
    goto q0;
}

private FAMatch NextMatchImpl(ReadOnlySpan<char> s) {
    int ch;
    int len;
    int p;
    int l;
    int c;
    ch = -1;
    len = 0;
    if ((this.position == -1)) {
        this.position = 0;
    }
    p = this.position;
    l = this.line;
    c = this.column;
    this.Advance(s, ref ch, ref len, true);
    if ((ch == 47)) {
        this.Advance(s, ref ch, ref len, false);
        goto q1;
    }
    goto errorout;
q1:
    if ((ch == 42)) {
        this.Advance(s, ref ch, ref len, false);
        goto q2;
    }
    goto errorout;
q2:
    return _BlockEnd0(s, ch, len, p, l, c);
errorout:
    if (((ch == -1)
            || (ch == 47))) {
        if ((len == 0)) {
            return FAMatch.Create(-2, null, 0, 0, 0);
        }
        return FAMatch.Create(-1, s.Slice(p, len).ToString(), p, l, c);
    }
    this.Advance(s, ref ch, ref len, false);
    goto errorout;
}
The main thing that makes it tough to read is the use of UTF-32 codepoints instead of chars, which is necessary to handle surrogates seamlessly. I originally used char literals for the transitions, but that was incompatible with VB.NET source generation. Frankly, VB.NET is an utter curmudgeon about what it will accept, and I hate creating language-agnostic code templates that can target it. I always, ALWAYS have to massage them for VB.NET.
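To illustrate why working in codepoints helps, here is a minimal sketch of a helper that reads one UTF-32 codepoint from a ReadOnlySpan<char>, combining a surrogate pair into a single int. It is hypothetical and not the library's actual Advance method (whose body isn't shown); the name ReadCodepoint and the -1 end-of-input convention are assumptions, the latter chosen to mirror the cp == -1 check in the generated code above. With input decoded this way, transitions can compare against plain integers like 42 ('*') or 47 ('/'), and characters outside the BMP are handled the same way as everything else.

```csharp
using System;

// Hypothetical helper (not the library's real Advance method): decodes one
// codepoint from UTF-16 input so a state machine can switch on ints.
public static class CodepointReader
{
    // Reads the codepoint starting at index i.
    // Returns -1 at end of input (assumed sentinel, mirroring "cp == -1").
    // 'width' receives how many chars were consumed: 0, 1, or 2.
    public static int ReadCodepoint(ReadOnlySpan<char> s, int i, out int width)
    {
        if (i >= s.Length)
        {
            width = 0;
            return -1;
        }
        char c = s[i];
        // A valid high/low surrogate pair becomes one codepoint above U+FFFF.
        if (char.IsHighSurrogate(c) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
        {
            width = 2;
            return char.ConvertToUtf32(c, s[i + 1]);
        }
        // BMP chars (and lone surrogates) map directly to their UTF-16 value.
        width = 1;
        return c;
    }
}
```

For example, on "a\U0001F600" it yields 97 (width 1) at index 0, 0x1F600 (width 2) at index 1, and -1 at index 3. A char literal could not represent U+1F600 at all, which is one reason int transitions are the more uniform choice.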
|
Quote: I actually did dotNetPeek them
In my experience, I have found that peeking at decompiled code and looking at the actual source code are not always the same thing. The compiler tends to optimise and rewrite code these days...
Graeme
|
In that case I wasn't as interested in the source code as I was in the final code. I was looking for optimization opportunities, and I was looking at the IL as well and comparing it to mine.
|
I missed the bottom line of your post. I'll respond here.
Well, this 10% is for a library, so it kind of changes the calculus of it somewhat. The 10% has a multiplier effect based on how popular the library gets, if that makes sense.
I don't think the 10% would make the code that much more confusing, unless getting it forces me to do something to the code that I'm unwilling to do anyway. I am not looking for a total remodel at this stage, given the stability and performance numbers I have.
|
Having written .NET enumerators: yes, they can be slow. They can also be a very powerful abstraction when used properly. LINQ, on the other hand, is a nightmare to write in VB. C#'s lambda syntax is far terser than VB's Function(x) syntax. Personally, I find most LINQ uses can easily be rewritten as a for loop, which is easier to read and understand. Some require a function call, but then so do all LINQ statements.
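As a small illustration of that kind of rewrite (the method names and data are made up for the example), here is a LINQ filter-and-project next to the equivalent for loop. Both return the squares of the even values; the loop version skips the iterator objects and per-element delegate calls that the LINQ pipeline goes through.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: the same query written two ways.
public static class LinqVsLoop
{
    // LINQ version: terse, but runs through Where/Select iterators and
    // invokes a delegate for each element.
    public static List<int> SquaresOfEvensLinq(int[] values) =>
        values.Where(v => v % 2 == 0).Select(v => v * v).ToList();

    // Loop version: same result, indexing the array directly.
    public static List<int> SquaresOfEvensLoop(int[] values)
    {
        var result = new List<int>();
        for (int i = 0; i < values.Length; i++)
        {
            int v = values[i];
            if (v % 2 == 0)
            {
                result.Add(v * v);
            }
        }
        return result;
    }
}
```

For { 1, 2, 3, 4, 5, 6 } both methods return 4, 16, 36.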
|
Yeah, and it's not like there aren't developers who have sweated over how to make that particular segment of code as fast as they can. There is a crap ton of flexibility built into IEnumerable that would be difficult to optimize away.
I worked on a project years back where that level of optimization made a big difference, and I still used the convenience of LINQ group by, distinct, orderby, etc. to build up hashtables that I used in the optimized sections of code I wrote. Using LINQ helped in the part it was good at.
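A minimal sketch of that split, assuming a hypothetical Order type and made-up method names: LINQ's GroupBy and ToDictionary do the convenient grouping work once during setup, and the performance-sensitive loop afterwards does nothing but hashtable lookups.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type for illustration.
public sealed class Order
{
    public string Customer { get; }
    public decimal Amount { get; }
    public Order(string customer, decimal amount) { Customer = customer; Amount = amount; }
}

public static class LookupBuilder
{
    // Setup phase: LINQ groups orders by customer and sums them, once.
    public static Dictionary<string, decimal> BuildTotals(IEnumerable<Order> orders) =>
        orders.GroupBy(o => o.Customer)
              .ToDictionary(g => g.Key, g => g.Sum(o => o.Amount));

    // Hot path: a plain loop with O(1) average dictionary lookups, no LINQ.
    public static decimal SumForCustomers(Dictionary<string, decimal> totals, string[] customers)
    {
        decimal sum = 0m;
        for (int i = 0; i < customers.Length; i++)
        {
            if (totals.TryGetValue(customers[i], out decimal t))
            {
                sum += t;
            }
        }
        return sum;
    }
}
```

The design point is that the LINQ cost is paid once, outside the loop that actually needs to be fast.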
|
From CP newsletter.
the rust project has a burnout problem
The author laments that some Rust developers are suffering from burnout, and then describes an environment where new contributors either have no mentor or their mentor leaves.
When I read that, I think: so what?
It seems like a description of almost everything. Certainly of every company I have worked for. The only time that didn't happen was when the company went bankrupt before the people working there got tired.
When I look for new libraries, I look for robust user communities, because I have used libraries before that had only a few developers, or even just one. Props to them for the continued support, but over the long term even popular libraries have suffered from reduced support.
I have even seen that at nightclubs and restaurants. For the first two years after they open, you can't get in the door. Two years later, they've shut their doors. (Actually, I think I know of a promoter/company that specifically relies on that model.)
|
Bad naming. Just the other day, I broke through a mental fog by renaming some modules, methods, etc. to better reflect what they were doing. I'm now wheeling and rotating and charging and following when before I was just "moving".
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
|
What probably doesn't help is that among developers, especially young developers, there is a certain mindset.
Burning midnight oil, working stupid hours, often unpaid, is considered some sort of flex.
I see this in corporate environments as well: the people most likely to burn out are those who do not set boundaries and want to solve everything / feel responsible for everything. The people most likely to last are the ones who like their job but at the end of their shift say: the rest can wait till tomorrow.
|
There's yet another article on Slashdot claiming Bing's market share has barely budged despite ChatGPT being added to it, and initially being a big hit, yada-yada.
And yet barely 2 days prior, another article read, "Google Search Really Has Gotten Worse, Researchers Find". I've been seeing those sorts of articles time and again over the last few months. I can't say whether that's true. I try to do as little as possible with Google nowadays.
I've actually been doing my searches through Bing for a few years now, using Edge (since it moved to the Chromium engine), and frankly, it's a rare occurrence where I can't find what I'm looking for within the first few results of a query. Generally if I can't find anything on Bing, Google really doesn't fare quantifiably "better".
And MS already has all the information it wants from me, if not through my search queries and browser telemetry, then through telemetry sent by the OS itself. If Google's search results truly are getting worse (not my claim), I'm not sure why they should still be in my life. Frankly, if my data's gonna get collected anyway, I'd rather have one company have it than two. Especially when one of them is purely an advertising company (whereas the other is merely trying to move in that direction, despite failing repeatedly, and the company's very survival doesn't rely on its ad department).
Nobody it seems ever misses a chance to ridicule Edge or Bing. Laugh all you want, I'm actually rather happy with how it's working out for me.
Obviously, YMMV. So what say you? Are Google's results getting as unusable as some claim? If so, why stick with it?
|
Hmm, the Quote button doesn't seem to be working. Aanywaayy...
Are Google's results getting as unusable as some claim?
I have no idea; I've never used it.
|
Hmm, the Quote button doesn't seem to be working.
I've noticed that...the button is there, clicking on it doesn't do anything except for removing the highlight. So, I've been manually copying and pasting, and then adding a ">" before the pasted line.
The hamsters must be acting up.
|
Are Google's results getting as unusable as some claim?
Yes. Very much yes when you are looking for specific information, or trying to recall an article you read a while ago. Their verbatim search used to be the bee's knees, or however the saying goes. Even it sucks now, and has for at least a year.
|
If you've never used it why comment? Or is this some new nerd virtue signal?
|
I'm a creature of habit. I've been using Google search so long, I forget Bing is there. I'm a Firefox guy for my browser. I find what I'm looking for, so I'm not worried. Maybe if you're searching eccentric stuff it matters. But for boring stuff like "Where to buy a 16" Husky chainsaw", Google does just fine for me.
Hogan
|
I use Vivaldi as my main browser, and Firefox for some other stuff (its Facebook container is its main selling point to me). Yesterday I checked the memory footprint of each. Vivaldi: 106 tabs, 5.5 GB. Firefox: 10 tabs, 4 GB. This is in line with every time I've checked memory usage in the past. For some reason FF just sucks it up a lot more. Maybe the extensions are heavy, although I only have three.
|
Interesting. I'm 6 tabs deep in FF with only 1.5 GB of memory used. Opening Facebook brought me to 2 GB. So not too big of a deal to me. Maybe I just do boring stuff with my machine.
Hogan
|
I don't know. It has been an issue every time I've looked at Firefox, seemingly into the distant past.
|
I'm using Chrome. I have six windows open, with a total of about 150 tabs open. Current memory usage is at 572 MB. YMMV
|
That is great memory usage. To me, it's crazy to have 150 tabs open. I could never work that way. I open and close tabs often. It's my goal to know what is on every tab when using my computer, or close them. When I'm researching something, I might go crazy and have 30 tabs open, but I work quickly to either decide the information is useful or close it.
Hogan
|
I agree with that.
Besides keeping it straight in the head, what does 150 tabs even look like in the UI?
I should note I don't use tabs at all. I open a new instance for each.
|
I have six windows open for different categories (personal, work, training, etc.), and within each I use tab groups quite a bit. Most of the open pages are work-related and are the kind of stuff that I might need later today or two weeks from now, but hunting them down again would be a big pain in the butt (our internal file-sharing and job tracking systems are a bit of a mess...but getting better). I suppose I could switch to creating lots of bookmarks instead, but since Chrome got its act together on memory management, it doesn't seem to be a real issue for me.
|
In case you want to look into it, Windows supports multiple virtual desktops, and you can switch between them with a key press (Ctrl+Win+Left/Right arrow). So, for example, personal could be on one desktop and work on another.
|