Re: Compress specific text for storage. - C# Discussion Boards

Re: webspider/crawler extract Javascript related info

Pete O'Hanlon, 26-Aug-10 12:00

Twitter has a well-defined API for retrieving information in a variety of formats. Don't waste time trying to screen scrape when you can get the data easily using the RESTful API.

I have CDO, it's OCD with the letters in the right order; just as they ruddy well should be

My blog | My articles | MoXAML PowerToys | Onyx

Compress specific text for storage.

kevinnicol, 26-Aug-10 8:52

I'm building a little puzzle game that can have many (trillions of) puzzles. Each puzzle can be simplified to a 6 by 6 grid where each cell can hold one of 8 possible values. Right now I have each puzzle stored as a 36-char string (each char can be one of {"L", "R", "T", "B", "E", "P", "H", "V"}). Does anyone know a cool trick to take this type of string and compress it down so that it takes less space? Even shaving a few bytes off of it would be a huge help. (It'll make searching for it in the database a lot faster.)

Re: Compress specific text for storage.

Luc Pattyn, 26-Aug-10 9:05

This is NOT string or text compression, it is a very dedicated problem.
8 possible values for a cell means a cell only needs 3 bits, hence a 6*6 board could be stored in 108 bits, i.e. 14 bytes (with 4 unused bits).
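
A minimal Python sketch of the 3-bits-per-cell idea, for illustration only. The symbol order in `SYMBOLS` and the names `pack_board` / `unpack_board` are assumptions made up for this example; any fixed ordering of the eight values would do.

```python
# The eight cell values from the question; this ordering is an arbitrary assumption.
SYMBOLS = "LRTBEPHV"

def pack_board(board: str) -> bytes:
    """Pack a 36-character board into 14 bytes, using 3 bits per cell."""
    if len(board) != 36:
        raise ValueError("expected 36 cells")
    value = 0
    for ch in board:
        # SYMBOLS.index raises ValueError for a character outside the alphabet.
        value = (value << 3) | SYMBOLS.index(ch)
    # 36 * 3 = 108 bits are used; the top 4 bits of the 14 bytes stay zero.
    return value.to_bytes(14, "big")

def unpack_board(data: bytes) -> str:
    """Inverse of pack_board: recover the 36-character string."""
    value = int.from_bytes(data, "big")
    cells = []
    for _ in range(36):
        cells.append(SYMBOLS[value & 0b111])  # lowest 3 bits hold the last cell
        value >>= 3
    return "".join(reversed(cells))
```

Stored in a fixed-width binary column, the 14-byte value can be compared directly, which is what the original poster was after for faster lookups.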

:)

Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum

Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.

Re: Compress specific text for storage.

kevinnicol, 26-Aug-10 9:08

Yeah, this is the path I'm taking now. It gets a bit more complicated because while some cells can have all 8 values, some specific cells (corner cells) can have only 3 of those values, and there are other rules as well. I guess I'm wondering if anyone has seen a class that can be configured to do this. Or should I build one and submit a CP article?

Re: Compress specific text for storage.

harold aptroot, 26-Aug-10 9:25

If you give me a grid showing, for each cell, the number of values it can take, I can probably give you a more detailed answer. For now I'd say Range Encoding would take care of oddities such as 3 possible values, but maybe tricks can be used to avoid it.

Re: Compress specific text for storage.

Luc Pattyn, 26-Aug-10 9:29

if corners can only have 4 or fewer values that saves 1 bit each, hence the whole board now fits in 13 bytes. You'll need strong further restrictions on the possibilities to do much better than that.
The simple rule is:

numberOfBitsRequired = CEIL(base-two logarithm of the number of acceptable board states).
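
The rule can be checked with a few lines of Python. This sketch assumes the cells are independent, so the number of board states is simply the product of the per-cell counts; `bits_required` is a name invented for the example.

```python
from math import prod

def bits_required(state_counts):
    """CEIL(log2(number of states)), computed exactly with integers.

    Assumes independent cells, so the number of acceptable board
    states is the product of the per-cell state counts.
    """
    total_states = prod(state_counts)
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)).
    return (total_states - 1).bit_length()

unrestricted = [8] * 36                  # every cell may take all 8 values
corners_limited = [3] * 4 + [8] * 32     # 4 corners with 3 values, 32 other cells with 8

print(bits_required(unrestricted))       # 108
print(bits_required(corners_limited))    # 103
```

Both 103 bits and the simpler 104-bit layout (2 bits per corner) round up to 13 bytes, so the byte count only drops further if more restrictions are known.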

Re: Compress specific text for storage.

AspDotNetDev, 26-Aug-10 11:04

Luc Pattyn wrote:
if corners can only have 4 or fewer values that saves 1 bit each

With 4 corners and each corner allowing only 3 possible values, all 4 corners can be stored using 7 bits (because 3^4 = 81 possible combinations, which is less than the 128 combinations 7 bits can store). So, rather than use 2 bits per corner (for a total of 8 bits), only 7 bits are required. It's only a bit, but that might just help to make it down to the next lower byte if the OP can come up with other compression/encoding techniques to save space. :)
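
One way to do this, sketched in Python, is to treat the four corners as the digits of a base-3 number. The function names and the mapping of the three allowed corner values to 0, 1 and 2 are assumptions for illustration.

```python
def pack_corners(corners):
    """Encode four corner values (each 0, 1 or 2) as one number in 0..80."""
    code = 0
    for c in corners:
        if c not in (0, 1, 2):
            raise ValueError("corner value must be 0, 1 or 2")
        code = code * 3 + c  # append one base-3 digit
    return code  # at most 80, so it fits in 7 bits (0..127)

def unpack_corners(code):
    """Inverse of pack_corners: recover the four base-3 digits."""
    corners = []
    for _ in range(4):
        code, c = divmod(code, 3)  # peel off the last digit
        corners.append(c)
    return corners[::-1]
```

The price for the saved bit is a few multiplications and divisions instead of plain shifts and masks.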

[Forum Guidelines]

Re: Compress specific text for storage.

Luc Pattyn, 26-Aug-10 11:09

there's way too many ifs in your message. In my earlier reply I had 4 spare bits, so shaving one bit off every corner saved another byte. Why make things more complex than they need to be? The scheme I offered is the most efficient for the situation as it has been described. And I'm with Einstein: keep things as simple as possible, but no simpler than that. And I'll join the scientists who amend the theory as soon as new facts come in that don't fit the current one.

:)

Re: Compress specific text for storage.

AspDotNetDev, 26-Aug-10 11:16

Nothing wrong with that. In fact, having your reply so clear and having my reply which elaborates on it probably will help the OP to understand both approaches more clearly. Just thought I'd add some additional input to this little brainstorm session. :)

Re: Compress specific text for storage.

Luc Pattyn, 26-Aug-10 11:26

I can only hope he is not studying DCT algorithms and the like. :laugh:

I don't think he'll bring it well below 13 bytes, unless there is a lot he hasn't told us.
And that is the main reason I offered him Shannon, so he can calculate the optimum himself.

:)

Re: Compress specific text for storage.

harold aptroot, 26-Aug-10 11:28

And if all the cells on the edge can only have 5 different values, you can combine each corner cell with an edge cell, giving 3 * 5 = 15 possibilities per pair (4 bits per pair, 16 bits for the four pairs), and then you have 5^12 possibilities left for all the other edge cells together, which fits in 28 bits. With 3 bits for each of the 16 inner cells (48 bits), that gives 92 bits in total.

Re: Compress specific text for storage.

AspDotNetDev, 26-Aug-10 11:47

harold aptroot wrote:
And if all the cells on the edge can only have 5 different values

I don't really follow what you said (e.g., what do you mean by "combine each corner cell with an edge cell"?), but keep this in mind:

The OP wrote:
some specific cells (Corner cells) can have only 3 of those values

We already know each corner cell can only use 3 (rather than 8) values. I didn't just pick an arbitrary value, if that's what you are implying (I honestly don't know what you mean).

Re: Compress specific text for storage.

harold aptroot, 26-Aug-10 12:01

If the corner cells can have 3 values and the middle cells 8, then I see the pattern "they can hold as many values as there are adjacent cells". The OP did not say it, but he's been a bit slow in providing information anyway. It's just a guess, and that's why there is an if near the beginning of my post.

The combining is .. just combining. Combine a corner cell (3 states) with an edge cell (might have 5 states) to get 15 in total, which is very close to a power of 2.
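
A small Python sketch of the combining, under the guessed rule only (3 states per corner, 5 per other edge cell, 8 per inner cell, which the OP has not confirmed). The function names are made up for the example.

```python
def pack_pair(corner, edge):
    """Combine a 3-state corner and a 5-state edge cell into one value 0..14 (4 bits)."""
    if not (0 <= corner < 3 and 0 <= edge < 5):
        raise ValueError("corner must be 0..2 and edge must be 0..4")
    return corner * 5 + edge

def unpack_pair(code):
    """Inverse of pack_pair: returns (corner, edge)."""
    return divmod(code, 5)

def bits_for(states):
    """Whole bits needed to distinguish `states` different values."""
    return (states - 1).bit_length()

# Tally for a 6x6 board: 4 corners, 16 other edge cells, 16 inner cells.
pair_bits = 4 * bits_for(3 * 5)   # four corner+edge pairs at 4 bits each
edge_bits = bits_for(5 ** 12)     # the 12 edge cells not paired with a corner
inner_bits = 16 * bits_for(8)     # 16 inner cells at 3 bits each
total_bits = pair_bits + edge_bits + inner_bits
print(pair_bits, edge_bits, inner_bits, total_bits)  # 16 28 48 92
```

Under those assumed counts the board fits in 92 bits, i.e. 12 bytes with 4 bits to spare.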

Re: Compress specific text for storage.

Ian Shlasko, 26-Aug-10 10:02

Luc and harold made good suggestions... There's one other route that will work if you're generating the boards randomly. A lot of randomly-generated games work by not storing the board/scenario itself, but instead storing the seed for the random number generator.

Usually, you get random numbers by seeding with Environment.TickCount, or something similar, but if you use a specific seed (Which could itself be randomly generated), you're guaranteed to get the same results every time, even with a complex algorithm (As long as you don't multi-thread it).

That way, you're only storing one 4-byte integer.
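
A rough Python sketch of the seed idea, with Python's `random.Random` standing in for whatever generator the game actually uses. `generate_board` is a hypothetical placeholder that just picks cells uniformly; a real generator would have to follow the puzzle's own rules.

```python
import random

# The eight cell values from the original question.
SYMBOLS = "LRTBEPHV"

def generate_board(seed: int) -> str:
    """Build a 36-cell board from a seed; the same seed always gives the same board."""
    # A private generator instance, so other code using `random` cannot disturb the sequence.
    rng = random.Random(seed)
    return "".join(rng.choice(SYMBOLS) for _ in range(36))

# Only the seed needs to be stored; the board is rebuilt on demand.
seed = 12345
assert generate_board(seed) == generate_board(seed)
```

This only works if every stored puzzle was produced by the generator in the first place, and the generator must never change once seeds have been saved.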

Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels)

Re: Compress specific text for storage.

Luc Pattyn, 26-Aug-10 10:09

An interesting concept, however it mainly makes apparent a 4-byte seed isn't sufficient to generate all conceivable board set-ups: rather than some 2^108 it will produce no more than 2^32 of them, a mere drop in the ocean.

BTW: the OP mentioned trillions, not sure whether he meant that on the long or the short scale.

:)

Re: Compress specific text for storage.

Ian Shlasko, 26-Aug-10 11:34

True, but 2^32 ought to be enough for anybody :)

Re: Compress specific text for storage.

Luc Pattyn, 26-Aug-10 11:48

like 640KB? :-D

Re: Compress specific text for storage.

AspDotNetDev, 26-Aug-10 11:51

LOL, you just beat me.

Re: Compress specific text for storage.

Ian Shlasko, 26-Aug-10 11:53

It wasn't a very obscure joke, for a site like this :)

Re: Compress specific text for storage.

AspDotNetDev, 26-Aug-10 12:02

Haha, yeah, I saw it posted the other day. I wondered if perhaps it was you who posted it. :)

Re: Compress specific text for storage.

Ian Shlasko, 26-Aug-10 12:10

Don't think it was me... But hey, it's an old joke :)

Re: Compress specific text for storage.

AspDotNetDev, 26-Aug-10 11:48

I'd say nobody will ever need more than 640KB of RAM. :rolleyes:

Re: Compress specific text for storage.

Luc Pattyn, 26-Aug-10 12:13

for 6*6 cells with no more than 8 states each, not really compression, is it?

:)

modified on Thursday, August 26, 2010 9:21 PM

Re: Compress specific text for storage.

AspDotNetDev, 26-Aug-10 10:20

That depends on the subset of puzzles you are storing. Some puzzle combinations will be more compressible than others. For example, a checkerboard that is set up before the game starts will be very compressible using RLE (run-length encoding), as most adjacent squares will have the same value (red, black, empty).

Also, if there are common configurations between puzzle sets (e.g., two puzzles have half their board set up the same), then you can use referential compression: one row points to another row, that other row holds the shared part of the board's setup, and the main row holds the rest. The trick with that form of compression is not to overdo it, because the key you use to point to the other row takes up space too.

You can also separate the boards into different tables to take off a few bits. So, you could have Table_1, Table_2, Table_3, ..., Table_256. The table the data resides in represents the first few bits of the data, so with 256 tables you can shave 8 bits (a byte) off each board. But that doesn't make sense for all scenarios.

And if you can store several boards together, that might offer further opportunities for compression. For example, if two boards are very similar, you can take the difference between them and compress that (such as with RLE) rather than storing both boards. That way, you only need one of the boards and the compressed difference to recreate the other board. That will only work well if the distribution of game boards is favorable (i.e., if there are lots of groups of very similar boards). This technique is used in video encoding: a diff is taken between two frames, and that diff is stored along with one frame rather than storing both frames. You can chain them so you store, say, one full frame and 99 diffs to recreate all 100 frames. The problem with that is that you must process the 99 earlier frames before you get to the 100th. That wouldn't be a problem if you actually want to do some processing on each frame (or, in your case, each board).
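
To make the RLE and diff ideas a bit more concrete, here is a small Python sketch. The helper names and the use of "." as a "same as base" marker are assumptions for illustration; "." is chosen only because it is not one of the eight cell values in the original question.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse runs of equal characters into (char, run_length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs):
    """Inverse of rle_encode."""
    return "".join(ch * count for ch, count in pairs)

def diff(base, other, same="."):
    """Keep only the cells of `other` that differ from `base`; mark the rest with `same`."""
    return "".join(same if a == b else b for a, b in zip(base, other))

def apply_diff(base, delta, same="."):
    """Rebuild the other board from the base board and the diff."""
    return "".join(a if d == same else d for a, d in zip(base, delta))

base = "E" * 36
other = "E" * 10 + "L" + "E" * 25
delta = diff(base, other)
print(rle_encode(delta))  # [('.', 10), ('L', 1), ('.', 25)]
```

For boards this small the bookkeeping can easily cost more than the 13 or 14 bytes of a plain bit-packed board, so it is worth measuring on real data first.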

Where is saved connection string for DataSet

polzovat79, 26-Aug-10 5:16

I am trying to use DataSets in each of 2 projects that are grouped in one solution.
I have a common app.config file (Visual Studio creates one app.config for each project, and I need to make a single one manually).

I see that changing the app.config file has no effect on the DataSets.
The questions are:
- how do I merge two app.config files properly?
- is the connection string perhaps saved in another place?
