|
I thought about doing something like that, i.e. breaking it up into 36 lists. You would eliminate the first char and have 36 lists of 36^6 entries each. Turns out it would use the same amount of memory or more, but you wouldn't have to have the entire list loaded at once.
|
|
|
|
|
I don't think so. For starters I'm not storing all these symbols over and over. Second, it is the last symbol that gets collected in a bitmask, assuming they are handed out pretty much sequentially. Third, I'm not using linked lists nor real pointers, instead I suggested using arrays and indexes.
And yes, whatever approach, it should take advantage of the set being filled sparsely (I don't think there are 78B license plates around).
Luc Pattyn [My Articles] Nil Volentibus Arduum
Fed up by FireFox memory leaks I switched to Opera and now CP doesn't perform its paste magic, so links will not be offered. Sorry.
|
|
|
|
|
Yeah, they said don't assume license plates are handed out sequentially... Otherwise you would just do a return lastLicensePlate++ type thing.
I was thinking you were going for something like:
list[0] = list (of 6 char words) for plates starting with 0
list[1] = list (of 6 char words) for plates starting with 1
etc. and have 36 of those. If you had 36 lists of 36^6 words, that's the same as having one list of 36^7 words.
I'm not getting your idea though... maybe you can explain a bit further?
Yeah, this wasn't a "real world scenario"... they were just testing to see whether I can deal with huge data sets.
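Sanity-checking the arithmetic above (7-char plates over a 36-symbol alphabet); sketched in Python for brevity since the numbers are language-agnostic:

```python
# Quick arithmetic check on the sizes discussed above (7-char plates,
# alphabet of 36 symbols: A-Z plus 0-9).
total_plates = 36 ** 7                  # every possible 7-char plate
split_by_first_char = 36 * 36 ** 6      # 36 lists of 36^6 entries

assert split_by_first_char == total_plates  # same total either way

# A packed bit array needs one bit per possible plate:
packed_gib = total_plates / 8 / 2 ** 30
print(total_plates)          # 78364164096 (~78 billion)
print(round(packed_gib, 1))  # ~9.1 GiB
```

So splitting on the first char changes how the data is partitioned, not how much of it there is.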
|
|
|
|
|
SledgeHammer01 wrote: I'm not getting your idea
which one? I offered 3 ideas.
Here is the gist of the third and simplest one:
private static Dictionary<string, long> plates;
private static string alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
public static bool Exists(string plate) {
    string lead = plate.Substring(0, 6);
    if (!plates.ContainsKey(lead)) return false;
    long bitmask = plates[lead];
    int index = alphabet.IndexOf(plate[6]);
    return (bitmask & (1L << index)) != 0;
}
You might notice it requires just one 6-char string and one long to hold information about 36 consecutive plates.
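For comparison, here is the same idea sketched in Python (the `add`/`exists` names are just for illustration): the first six chars form the key, and the last char sets one bit in a 36-bit mask, so only prefixes with at least one issued plate occupy an entry.

```python
# Prefix -> bitmask scheme: key = first 6 chars, value = an int where
# bit i marks the plate whose last char is ALPHABET[i].
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"  # 36 symbols

plates = {}  # 6-char prefix -> int bitmask

def add(plate):
    plates[plate[:6]] = plates.get(plate[:6], 0) | (1 << ALPHABET.index(plate[6]))

def exists(plate):
    mask = plates.get(plate[:6])
    return mask is not None and mask & (1 << ALPHABET.index(plate[6])) != 0

add("ABC1234")
print(exists("ABC1234"))  # True
print(exists("ABC1239"))  # False (same prefix, different last char)
print(len(plates))        # 1 entry covers up to 36 consecutive plates
```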
|
|
|
|
|
So if we have 78B #'s, that'd be 78B / 36 ≈ 2.2B dictionary entries? If each entry had no other data except the 6 byte string and the 8 byte long, that's 14 bytes per entry ≈ 28GB. Unless my math is wrong I mean (which is entirely possible). Vast improvement over 1000GB, but you still fail the interview because my original packed bit array was only 9GB. I'm thinking there's probably a very compact solution out there using some obscure data structure. That's what I'm trying to find. So far, the leads are kind of pointing towards a DAWG.
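The worst-case arithmetic can be checked like this (assuming a 6-byte key and an 8-byte mask per entry, and ignoring whatever overhead the dictionary itself adds):

```python
# Worst-case size of the prefix->bitmask dictionary: every one of the
# 36^6 prefixes is in use, each entry costs a 6-byte key plus an 8-byte
# mask (a C# long is 8 bytes). Container overhead is ignored.
entries = 36 ** 6                 # one entry per 6-char prefix
bytes_per_entry = 6 + 8
dict_gib = entries * bytes_per_entry / 2 ** 30
bitarray_gib = 36 ** 7 / 8 / 2 ** 30
print(entries)                 # 2176782336 (~2.2 billion)
print(round(dict_gib, 1))      # ~28.4 GiB
print(round(bitarray_gib, 1))  # ~9.1 GiB
```

So in the fully-populated worst case the dictionary loses to the flat bit array, as argued below; its advantage only shows up when the set is sparse.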
|
|
|
|
|
If the set is fully populated, then it simply deserves one bit per plate, obviously; and then it doesn't need any intelligence. It is when the set is sparse that some intelligence can improve the situation. You don't want 78B license plates, do you? (that would be 10 per human being) So don't judge the quality, efficiency, or any other characteristic of a sparse solution by feeding it the numbers of a full set.
|
|
|
|
|
Oh right... I missed a minor part of your solution... you are only storing the prefixes in use. It's just that places like Google, when they interview you, want solutions that work at all levels.
|
|
|
|
|
Again, nothing to do with license plates, just how you handle vast amounts of data (think Google or something along that scale).
With your solution, you are using the first 6 chars as the "prefix" / key and then "compressing" the last char (36 plates into one entry).
So wouldn't you need the FULLY populated dictionary if, say, every 36th plate was taken?
So your storage requirement is ~28GB anywhere between 2B plates and 78B plates. 2B in 28GB vs. 78B in 9GB with the bit array.
Sorry, hope I'm not getting you angry or anything, it's just that this is how the interview went, so it's a real world thing. They kept poking holes in everything haha...
But unless I misunderstood your algorithm, it does seem like you need the full 28GB as soon as you hit ~2B plates (assuming 1 in every range of course)... if you had them in tightly packed groups, then your requirements wouldn't be as great.
I even told the interviewer, once I started getting annoyed at his hole poking: "Well, at that point I would probably change the license plate # selector algorithm to hand out #'s from a more tightly packed range". Lol.
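The spacing argument is easy to demonstrate: plates issued exactly 36 apart all land in different prefixes, so the dictionary gets one entry per plate, while consecutive plates pack 36 to an entry. A toy check in Python (the base-36 encoder is just for the demo):

```python
# If plates are spaced exactly 36 apart, every plate has a distinct
# 6-char prefix (its worst case); consecutive plates share prefixes.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def nth_plate(n):
    """Plate number n written as a 7-char base-36 string."""
    chars = []
    for _ in range(7):
        n, r = divmod(n, 36)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))

spread = {nth_plate(i * 36)[:6] for i in range(1000)}  # every 36th plate
packed = {nth_plate(i)[:6] for i in range(1000)}       # consecutive plates
print(len(spread))  # 1000 entries for 1000 plates
print(len(packed))  # 28 entries for the same count
```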
|
|
|
|
|
Again, I offered 3 ideas. Your current concerns get handled by the first one. I gave a detailed implementation for the third and simplest approach, I am not going to do the others as well.
|
|
|
|
|
If the data is going to be the worst case for whatever data structure you pick, I pick the packed bit array. Information theory gets in the way otherwise. Any data structure that can be smaller than the packed bit array can only do so because it's exploiting some sort of regularity in the data; if there isn't any, then there are 2^(36^7) possible states, requiring 36^7 bits to store, and that's the end of it.
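For reference, a packed bit array is only a few lines. This sketch shrinks the alphabet and plate length so it actually runs (the real thing over the 36^7 space would need ~9.1 GiB), but the indexing is the same:

```python
# Packed bit array membership set: one bit per possible plate.
# Tiny stand-in parameters so the demo is runnable; for real plates
# ALPHABET would have 36 symbols and LENGTH would be 7.
ALPHABET = "AB01"
LENGTH = 4

size = len(ALPHABET) ** LENGTH
bits = bytearray((size + 7) // 8)   # one bit per possible plate

def index_of(plate):
    # Interpret the plate as a base-len(ALPHABET) number.
    n = 0
    for ch in plate:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return n

def set_plate(plate):
    i = index_of(plate)
    bits[i // 8] |= 1 << (i % 8)

def exists(plate):
    i = index_of(plate)
    return bits[i // 8] & (1 << (i % 8)) != 0

set_plate("AB01")
print(exists("AB01"))  # True
print(exists("AB00"))  # False
print(len(bits))       # 32 bytes covers all 256 possible 4-char plates
```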
|
|
|
|
|
right.
|
|
|
|
|
Yeah, I dunno what structure he was expecting. My guess is he was just trying to get me to punch him in the face or something. Who knows? The packed bit array is best in the worst case obviously, but it's not good in the best case or even a remotely "average" case. 1 license plate should not take up 9GB of memory. I think he wanted something along the lines of a DAWG. If Office can store the entire English dictionary in 4MB, then surely this problem can be solved in less space.
|
|
|
|
|
Was googling how spell checkers and dictionaries store words. Seems like most use a DAWG.
http://en.wikipedia.org/wiki/Directed_acyclic_word_graph[^]
I guess if I stored each license plate in the DAWG and did a "spell check" on it... it would work, although I'm not sure how well that'll scale.
Says a DAWG is most space efficient, so maybe that's the answer.
EDIT: saw a sample on the net where the guy said a dictionary was 17MB and the DAWG version was only 4MB. He didn't say how many words, etc.
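A toy way to see where a DAWG's saving comes from: build a trie, then merge structurally identical subtrees bottom-up so shared suffixes are stored only once. This sketch (the word list is made up for the example) compares node counts:

```python
# Trie -> DAWG saving, in miniature: identical subtrees (shared
# suffixes like "1234") collapse into one shared node.
def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = {}  # end-of-word marker
    return root

def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.values())

def minimize(node, registry):
    # Canonical signature: sorted (edge, child-id) pairs. Subtrees with
    # the same signature get the same id, i.e. become one DAWG node.
    items = tuple(sorted((ch, minimize(c, registry)) for ch, c in node.items()))
    return registry.setdefault(items, len(registry))

words = ["ABC1234", "ABD1234", "XYZ1234", "ABC1299"]
trie = build_trie(words)
registry = {}
minimize(trie, registry)
print(count_nodes(trie))  # 26 nodes in the plain trie
print(len(registry))      # 15 distinct nodes after merging suffixes
```

The saving grows with how much structure the data shares, which is exactly why a worst-case (random) plate set wouldn't benefit.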
|
|
|
|
|
|
Thanks for the suggestion.
Does it work in such a way that Chris understands when and how text gets pasted in forum messages, resulting in CP article links being turned into article titles, other links being linkified, and pasted code being formatted with PRE tags?
|
|
|
|
|
My previous reply was made using Dragon, and CP auto-magically linked the URLs and created tags.
|
|
|
|
|
Thanks. I'll give it a spin.
|
|
|
|
|
|
|
|
SledgeHammer01 wrote: I had one large company disqualify me because they assumed I couldn't write
socket code because I didn't memorize the 7 layer OSI model
lol.
I had one interviewer who got visibly upset with me after I said I was capable of writing database code but I couldn't explain the 5 rules of normalization.
I might note that I still can't. But I do know that 2 of them are absolutely worthless for practical programming.
|
|
|
|
|
jschell wrote: I had one interviewer who got visibly upset with me after I said I was capable of writing database code but I couldn't explain the 5 rules of normalization.
SEVEN!
There are seven levels of normalization as defined, with 6NF at the top. None of the companies that I worked for went beyond BCNF. Now, would that moron be able to explain why he'd need to normalize to the fifth level, or was it merely random?
Bastard Programmer from Hell
|
|
|
|
|
You know, I couldn't name a single one of these rules, but I went and looked up the poster. OMG!!! Instead of having duplicate data, break it out into a separate table and create a FK index into it!! OMG... you are so high brow!! (not directed at you lol, but at the people who care about knowing the 'book term'). Ok, I've been designing my tables like this for 16 yrs, but seriously had ZERO clue that this is what it was called. Ok... guess I suck at SQL now too.
Many years ago, I went to an interview at AutoByTel when I was first starting out in C#, and the guy asked me if I knew what boxing & unboxing was. I had no clue. I went home and looked it up and was like OMG!!!! it's passing around something as an object!!! OMG!!! I don't know C# at all. I have never done anything so complex!!! That guy must have 7 PhDs in C#!!! He is brilliant!!
|
|
|
|
|
I once got asked why a variable would be scoped with Friend (back in my VB days). I had no idea; 3 hours later I certainly did, but the opportunity was long gone!
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
SledgeHammer01 wrote: Pretty much. Think most "real" companies ask these retarded doomsday questions now.
That's a good thing; it's not like there's only one job available, and I almost always walk away smiling.
SledgeHammer01 wrote: I had one large company disqualify me because they assumed I couldn't write socket code because I didn't memorize the 7 layer OSI model
You need a thief to catch a thief, and a programmer to identify a software developer. If they're looking for someone who can memorize well, they are obviously not in a position to pick a decent developer.
Sounds like a bureaucracy, and you can't put a worker between people who merely shuffle paper and responsibilities.
|
|
|
|