|
Twitter has a well-defined API for retrieving information in a variety of formats. Don't waste time screen scraping when you can get the data easily through the RESTful API.
|
|
|
|
|
I'm building a little puzzle game that can have many (trillions of) puzzles. Each puzzle can be simplified to a 6-by-6 grid where each cell holds one of 8 possible values. Right now I store each puzzle as a 36-char string (each char is one of {"L", "R", "T", "B", "E", "P", "H", "V"}). Does anyone know a cool trick to take this type of string and compress it so that it takes less space? Even shaving a few bytes off it would be a huge help. (It'll make searching for it in the database a lot faster.)
|
|
|
|
|
This is NOT string or text compression, it is a very dedicated problem.
8 possible values for a cell means a cell only needs 3 bits, hence a 6*6 board could be stored in 108 bits, i.e. 14 bytes (with 4 unused bits).
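A minimal sketch of that 3-bits-per-cell packing, in Python for brevity. The symbol order in `SYMBOLS` and the function names are assumptions made for illustration; any fixed order works as long as packing and unpacking agree on it.

```python
# Assumed symbol order: each letter maps to its index 0..7 (3 bits).
SYMBOLS = "LRTBEPHV"

def pack_board(board: str) -> bytes:
    """Pack 36 symbols into one 108-bit integer, stored in 14 bytes."""
    if len(board) != 36:
        raise ValueError("expected 36 cells")
    value = 0
    for ch in board:
        # Shift in 3 bits per cell; index() raises ValueError on unknown symbols.
        value = (value << 3) | SYMBOLS.index(ch)
    return value.to_bytes(14, "big")

def unpack_board(data: bytes) -> str:
    """Inverse of pack_board: read the cells back, last cell first."""
    value = int.from_bytes(data, "big")
    cells = []
    for _ in range(36):
        cells.append(SYMBOLS[value & 0b111])
        value >>= 3
    return "".join(reversed(cells))
```

A fixed-width 14-byte value can then be stored in a binary column and compared directly, instead of comparing 36-character strings.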
|
|
|
|
|
Yeah, this is the path I'm taking now. It gets a bit more complicated because while some cells can have all 8 values, some specific cells (corner cells) can have only 3 of those values, and there are other rules as well. I guess I'm wondering if anyone has seen a class that can be configured to do this. Or should I build one and submit a CP article?
|
|
|
|
|
If you give me a grid showing, for each cell, the number of values it can take, I can probably give you a more detailed answer. For now I'd say Range Encoding would take care of oddities such as 3 possible values, but maybe tricks can be used to avoid it.
|
|
|
|
|
If corners can only have 4 or fewer values, that saves 1 bit each (2 bits instead of 3), so the board needs 108 - 4 = 104 bits and fits in exactly 13 bytes. You'll need strong further restrictions on the possibilities to do much better than that.
The simple rule is:
numberOfBitsRequired = CEIL(base-two logarithm of the number of acceptable board states).
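The rule can be checked with a few lines of Python. This is only a sketch and it assumes the cells are independent, so the number of board states is simply the product of the per-cell counts; any cross-cell rules would lower the real figure. The function name is made up for illustration.

```python
def bits_required(states_per_cell):
    """ceil(log2(number of board states)), computed with exact integers."""
    total = 1
    for n in states_per_cell:
        total *= n
    # For total >= 1, (total - 1).bit_length() equals ceil(log2(total)).
    return (total - 1).bit_length()

all_eight = [8] * 36                # every cell can take 8 values
corners_three = [3] * 4 + [8] * 32  # 4 corners limited to 3 values

print(bits_required(all_eight))      # 108
print(bits_required(corners_three))  # 103
```

Both 103 and 104 bits round up to 13 bytes, so under these assumptions the simpler 2-bits-per-corner layout costs nothing extra.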
|
|
|
|
|
Luc Pattyn wrote: if corners can only have 4 or fewer values that saves 1 bit each
With 4 corners and each corner allowing only 3 possible values, all 4 corners can be stored using 7 bits (because 3^4 = 81 possible combinations, which is less than the 128 combinations 7 bits can store). So, rather than use 2 bits per corner (for a total of 8 bits), only 7 bits are required. It's only a bit, but that might just help to make it down to the next lower byte if the OP can come up with other compression/encoding techniques to save space.
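As a sketch of how the four corners could share 7 bits: treat them as the digits of a base-3 number. The corner values are assumed here to be already mapped to 0, 1 or 2, and the function names are hypothetical.

```python
def encode_corners(corners):
    """Map four corner values (each 0, 1 or 2) to one number in 0..80."""
    code = 0
    for v in corners:
        if not 0 <= v < 3:
            raise ValueError("corner value must be 0, 1 or 2")
        code = code * 3 + v
    return code

def decode_corners(code):
    """Inverse of encode_corners: peel off base-3 digits, last corner first."""
    corners = []
    for _ in range(4):
        code, v = divmod(code, 3)
        corners.append(v)
    return corners[::-1]
```

The largest code is 80, which fits in 7 bits.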
|
|
|
|
|
There are way too many ifs in your message. In my earlier reply I had 4 spare bits, so shaving one bit off every corner freed another byte. Why make things more complex than they need to be? The scheme I offered is the most efficient for the situation as it has been described. And I'm with Einstein: keep things as simple as possible, but no simpler than that. And I'll join the scientists who amend the theory as soon as new facts come in that don't fit the current one.
|
|
|
|
|
Nothing wrong with that. In fact, having your reply so clear and having my reply which elaborates on it probably will help the OP to understand both approaches more clearly. Just thought I'd add some additional input to this little brainstorm session.
|
|
|
|
|
I can only hope he is not studying DCT algorithms and the like.
I don't think he'll bring it well below 13 bytes, unless there is a lot he hasn't told.
And that is the main reason I offered him Shannon, so he can calculate the optimum himself.
|
|
|
|
|
And if all the cells on the edge can only have 5 different values, you can combine each corner cell with an edge cell, giving 3 * 5 = 15 possibilities per pair (4 bits each, 16 bits for the four pairs). That leaves 5^12 possibilities for all other edge cells together, which fits in 28 bits. With 3 bits for each of the 16 interior cells (48 bits), that gives 92 bits in total.
|
|
|
|
|
harold aptroot wrote: And if all the cells on the edge can only have 5 different values
I don't really follow what you said (e.g., what do you mean by "combine each corner cell with an edge cell"?), but keep this in mind:
some specific cells (Corner cells) can have only 3 of those values
We already know each corner cell can only use 3 (rather than 8) values. I didn't just pick an arbitrary value, if that's what you are implying (I honestly don't know what you mean).
|
|
|
|
|
If the corner cells can have 3 values and the middle cells 8, then I see the pattern "they can hold as many values as there are adjacent cells". The OP did not say it, but he's been a bit slow in providing information anyway. It's just a guess, and that's why there is an if near the beginning of my post.
The combining is .. just combining. Combine a corner cell (3 states) with an edge cell (might have 5 states) to get 15 in total, which is very close to a power of 2.
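The combining idea generalizes to the whole board if every cell is treated as one digit of a mixed-radix number. The sketch below assumes the guessed layout (3 values per corner, 5 per edge cell, 8 per interior cell), which the OP has not confirmed, and assumes each cell's value is already mapped to 0..radix-1. All names are hypothetical.

```python
def cell_radices(size=6):
    """Guessed number of allowed values per cell, in row-major order."""
    radices = []
    for r in range(size):
        for c in range(size):
            on_row_edge = r in (0, size - 1)
            on_col_edge = c in (0, size - 1)
            if on_row_edge and on_col_edge:
                radices.append(3)   # corner
            elif on_row_edge or on_col_edge:
                radices.append(5)   # edge, not corner
            else:
                radices.append(8)   # interior
    return radices

def encode(values, radices):
    """Fold the cell values into a single integer, one mixed-radix digit each."""
    code = 0
    for v, radix in zip(values, radices):
        if not 0 <= v < radix:
            raise ValueError("cell value out of range")
        code = code * radix + v
    return code

def decode(code, radices):
    """Inverse of encode: peel digits off in reverse cell order."""
    values = []
    for radix in reversed(radices):
        code, v = divmod(code, radix)
        values.append(v)
    return values[::-1]
```

Under that guess the largest code needs 92 bits, so `encode(...).to_bytes(12, "big")` would store a board in 12 bytes.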
|
|
|
|
|
Luc and harold made good suggestions... There's one other route that will work if you're generating the boards randomly. A lot of randomly-generated games work by not storing the board/scenario itself, but instead storing the seed for the random number generator.
Usually, you get random numbers by seeding with Environment.TickCount or something similar, but if you use a specific seed (which could itself be randomly generated), you're guaranteed to get the same results every time, even with a complex algorithm (as long as you don't multi-thread it).
That way, you're only storing one 4-byte integer.
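A small sketch of the seed idea, in Python rather than .NET. The generator here just picks each cell uniformly, which is an assumption; a real game would run its own board-building rules on top of the seeded generator. Note that the same seed only reproduces the same board as long as the generator implementation itself does not change between versions.

```python
import random

SYMBOLS = "LRTBEPHV"

def board_from_seed(seed: int) -> str:
    """Rebuild a 36-cell board deterministically from its seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    return "".join(rng.choice(SYMBOLS) for _ in range(36))

# Only the seed needs to be stored; the board is regenerated on demand.
first = board_from_seed(12345)
second = board_from_seed(12345)
print(first == second)  # True
```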
|
|
|
|
|
An interesting concept, however it mainly makes apparent a 4-byte seed isn't sufficient to generate all conceivable board set-ups: rather than some 2^108 it will produce no more than 2^32 of them, a mere drop in the ocean.
BTW: the OP mentioned trillions; I'm not sure whether he meant that on the long or the short scale (see here).
|
|
|
|
|
True, but 2^32 ought to be enough for anybody
|
|
|
|
|
like 640KB?
|
|
|
|
|
|
It wasn't a very obscure joke, for a site like this
|
|
|
|
|
Haha, yeah, I saw it posted the other day. I wondered if perhaps it was you who posted it.
|
|
|
|
|
Don't think it was me... But hey, it's an old joke
|
|
|
|
|
I'd say nobody will ever need more than 640KB of RAM.
|
|
|
|
|
for 6*6 cells with no more than 8 states each, not really compression, is it?
|
|
|
|
|
That depends on the subset of puzzles you are storing. Some puzzle combinations will be more compressible than others. For example, a checkerboard that is set up before the game starts will be very compressible using RLE (run-length encoding), as most adjacent squares will have the same value (red, black, empty).
Also, if there are common configurations between puzzle sets (e.g., two puzzles have half their board set up the same), then you can use referential compression (i.e., one row would point to another row, and that other row would hold part of the game board's setup while the main row holds the rest). The trick with that form of compression is to not overdo it, because the key you use to point to the other row takes up space too.
You can also separate the boards into different tables to take off a few bits. So, you could have Table_1, Table_2, Table_3, ..., Table_256. The table the data resides in represents the first few bits of data. So if you have 256 tables, you can shave off 8 bits (i.e., a byte) from the data. But that doesn't make sense for all scenarios.
And if you can store several boards together, that might offer further opportunities for compression. For example, if two boards are very similar, you can take the difference between them and compress that (such as with RLE) rather than storing both boards. That way, you only need one of the boards plus the decompressed difference to recreate the other board. That will only work if the distribution of game boards is favorable (i.e., if there are lots of groups of very similar boards).
This technique is used in video encoding. A diff is taken between two frames, and that diff is stored along with one frame rather than storing both frames. You can chain them so you store, say, 99 diffs and one full frame to recreate all 100 frames. The problem with that is that you must process 99 frames until you get to that 100th frame. That wouldn't be a problem if you actually want to do some processing on each frame (or, in your case, each board).
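To make the diff-plus-RLE idea concrete, here is a rough Python sketch. It assumes both boards are 36-character strings and uses "." (which is not one of the 8 cell symbols) to mark an unchanged cell; the function names are made up for illustration.

```python
def diff_cells(base: str, other: str) -> str:
    """Keep the new symbol where the boards differ, '.' where they match."""
    return "".join("." if a == b else b for a, b in zip(base, other))

def apply_diff(base: str, diff: str) -> str:
    """Rebuild the other board from the base board and the diff."""
    return "".join(a if d == "." else d for a, d in zip(base, diff))

def rle_encode(text: str):
    """Collapse runs of equal characters into (char, count) pairs."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs

def rle_decode(runs) -> str:
    """Expand (char, count) pairs back into the original string."""
    return "".join(ch * n for ch, n in runs)
```

For two boards that differ in a single cell, the diff collapses to two or three runs. Whether that beats 13 or 14 bytes per board depends entirely on how similar the stored boards really are.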
|
|
|
|
|
I'm trying to use DataSets in each of 2 projects that are grouped in one solution.
I have a common app.config file (Visual Studio creates one app.config per project, and I need to make a single one manually).
I see that changing the app.config file has no effect on the DataSets.
The questions are:
- How do I merge two app.config files properly?
- Is the connection string perhaps saved somewhere else?
|
|
|
|