|
I'd forgotten about Fortran - I last used it in 1975.
Not with punched cards, but with a paper tape reader on a teletype terminal.
|
|
|
|
|
The only program I ever used paper tape for was a PDP-8 assembly program on a machine without a
disk; so the "development cycle" consisted of:
- load editor
- load source
- edit
- save source
- load assembler
- load source
etc etc
Things were simple back then, now they are comfortable but complex.
|
|
|
|
|
First of all, I don't claim to have THE solution to your problem.
You've stated there are two possible approaches: 1. sorting the array, 2. keeping a separate bit array. I think the actual solution combines both approaches.
When you're sorting an array, there are two basic operations:
1. comparisons
2. swaps
Let's suppose your algorithm looks like this:
main loop {
    int i = ...;
    int j = ...;
    if (A[i] > A[j])
        swap(A[i], A[j]);
}
However, since you only have to find the repeated element, you might change the algorithm so it looks like this:
main loop {
    int i = ...;
    int j = ...;
    if (A[i] == A[j])
        return A[i];
    else if (A[i] > A[j])
        swap(A[i], A[j]);
}
Provided the sorting algorithm you use is in-place (for example, heapsort), you only need O(1) additional space.
Even though the worst case is still O(n log(n)), the average-case complexity HAS TO (read: might... hehe) be lower, since you stop the algorithm as soon as the repeated element is found.
Now, you have to choose the sorting algorithm that gives you the best average case while still being O(n log(n)) in the worst case. And that's way too advanced computer science for me.
Hope that helps,
Eduardo
-----------------------------------------------------------------------
EDIT: Wow. I just read the sum one. Wow. Forget everything I said.
|
|
|
|
|
Computing the sum is an easy approach if there is only one duplicate number (though I prefer to formulate the problem as having a missing number rather than a duplicate). Now for the fun part:
Assume there are 65534 numbers in the range 0 to 65535 without duplicates, so two numbers are missing. Can you write a program to identify both missing numbers in a single pass through the data, running on a machine with 64K of code storage and 256 bytes of data storage? (The proper solution shouldn't require anywhere near 4K, much less 64K; if your code size isn't ridiculous, assume it will fit.)
What if three numbers are missing? How about four?
modified on Thursday, December 18, 2008 7:12 PM
|
|
|
|
|
supercat9 wrote: Can you ...
2 yes
3 provisional no
|
|
|
|
|
3 yes
4 probably
|
|
|
|
|
I wonder what the 'best' approaches are? The amount of state one would have to keep wouldn't be large (256 bytes of state would be enough state to allow for over 100 missing numbers) if one had unlimited time to convert the state into a list. One method would be to use two bytes to explicitly keep track of whether the values 65521-65535 had been observed, and for all other values compute the sums (x mod 65521), (x^2 mod 65521), (x^3 mod 65521) ... (x^n mod 65521). I believe the resulting combination of sums would uniquely identify the missing numbers, though I have no idea how one would determine which numbers were missing except via trial and error.
I have a different approach which would use somewhat more storage (256 counters capable of counting up to the number of missing items) but I don't know how many missing numbers it would be able to support before the count patterns ceased to be unique. The count patterns are easy to resolve when n=2; when n=3, they are more complicated to resolve, but it will always be possible. For n=4, I think that remains true. Not sure about n=5.
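The accumulation half of the mod-65521 idea is certainly cheap; it is only the extraction that is mysterious. A sketch of the per-item update (function and variable names are mine):

```c
#include <stdint.h>

#define P 65521u   /* largest prime below 65536, as above */

/* Fold one value x into n running power sums:
   s[k] accumulates x^(k+1) mod P, summed over all values seen. */
void fold(uint32_t *s, int n, uint32_t x)
{
    uint64_t t = 1;
    for (int k = 0; k < n; k++) {
        t = (t * x) % P;                     /* t = x^(k+1) mod P */
        s[k] = (uint32_t)((s[k] + t) % P);
    }
}
```

With n sums each two bytes wide, the 256-byte budget allows well over 100 power sums, as the estimate above suggests.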
What's your approach?
|
|
|
|
|
You could replace 65521 (the largest prime in range) by 65413 (the next prime) with the same result, forgoing the extra counters/flags. Intuitively I agree summing powers mod prime would hold all the information, and extracting it seems not obvious.
However IMO all that is way too complex. Here is mine:
you can accumulate the sum, that's good enough for 1 missing/extra number.
For two numbers, you can accumulate the sum and also the product, like so:
- treat zero separately (with a simple flag);
- use a double ("d0") to multiply together all numbers from 1 to 65535;
- use a double ("d2") to multiply together all numbers present;
- rescale both to prevent overflow (say, divide both by 65536 each time d0 exceeds 65536);
- finally divide d0 by d2 (or d2 by d0) to get the product;
- even though d0 and d2 are both inaccurate (they have lost most of their digits), their quotient accurately represents the product of the missing (or extra) numbers.
With a 32-bit mantissa you can discriminate all possible products that have the same sum; the extreme case is 65532*65535 against 65533*65534.
Having sum and product, a simple equation generates the two numbers.
For three numbers (say x y z):
- accumulate the sum x+y+z
- accumulate the product xyz
- accumulate the product (x+1)(y+1)(z+1)
- the products need 48-bit mantissa, a double still is OK
- a simple substitution yields xy+yz+zx
From these three (xyz, xy+yz+zx, x+y+z) you can find x, y and z either analytically, or by trial and error: just try all possible values of z (1..65535) and solve what remains for x and y. Most z values will have no solution; you end up finding six permutations of the one real solution.
For four numbers, you need a 64-bit mantissa, which a double does not offer; so you need a higher-precision math package. Then accumulate products and two shifted products, in the end leading to a fourth-order set of equations for which I haven't yet figured out whether there is an acceptable way to solve it.
|
|
|
|
|
Your approach for n=3 seems to devolve into trial and error; the same sort of trial and error would allow for a solution using the (mod 65521) maths. Using a smaller modulus would require having more flags to handle the numbers between the modulus and 65535.
A nice (but sub-optimal) bit-based approach for handling the two-number case would be to keep sixteen words, holding the xor of all the items in the list with bit 0 set, all the items with bit 1 set, etc. At least one of the numbers must have at least one bit set which is clear in the other number, and so that number may be read out easily.
For the case of n=3, a similar approach could work except that instead of doing an xor, one would have to hold a count of how many numbers had the various bits set. If there is exactly one number missing which had a particular bit set, the number may be easily identified. With three numbers, though, it's possible that a number of bits will be present in two missing numbers and none in exactly one. This situation may be resolved, however, by noting that there must be exactly three missing numbers, and so if there are three different patterns of missing bits they must derive from the missing numbers. I don't quite know how to describe the analysis algorithmically, but it works out.
I think a similar approach will work with n=4, though it starts to get messier. I have no idea about n=5 and beyond.
|
|
|
|
|
supercat9 wrote: Your approach for n=3 seems to devolve into trial and error; the same sort of trial and error would allow for a solution using the (mod 65521) maths. Using a smaller modulus would require having more flags to handle the numbers between the modulus and 65535.
I tend to disagree. First of all there is an analytical solution for one root of a cubic equation;
second, when a trial loop over z is executed, what is left is a simple quadratic equation, so each z-value needs an IsSquare(discriminant) test, which is much more straightforward than your modulo-prime arithmetic.
I did look into xoring the numbers for n=2 earlier, but decided against it, the product being very easy to handle. I am not sure I see how xoring helps you for n=2 or higher. For n=2, if the two missing numbers are identical except for one bit, then it's easy, otherwise...
Just try numbers 0-7 with missing 2 and 3 (easy), 2 and 5 (now what?).
|
|
|
|
|
For the n=2 case, if there were 16 xors produced (xor of all numbers with bit 0 set, with bit 1 set, etc.) then if the missing numbers were 2 and 5, then xor[0] and xor[2] would both be 5, while xor[1] would be 2. Easy.
If the numbers shared a bit but each had at least one bit the other lacked (e.g. the values were 3 and 5) then the xor's would have three non-zero values: xor[0] would be 6 (3 xor 5), xor[1] would be 3, and xor[2] would be 5. Simplest thing to do in that scenario is to ignore the non-zero xor value which does not have the 'indicated' bit set (i.e. xor[0] in this case, since the value 6 has bit 0 clear). The other two non-zero xor values are the missing numbers.
If one number is a subset of the other (e.g. 2 and 3), then one of the non-zero xors will have its corresponding bit clear (e.g. xor[1] will equal 1, which has bit 1 clear) and the other will have it set. The one that has the bit set is one of the missing numbers; xor that with the other one to get the other missing number.
For n=3, one will have a count of bit values, mod 4. Bits that are missing from none of the numbers or from all three will be trivial to place accurately. Bits that are missing from exactly one of the numbers will be easy to place; if any such bits exist, the problem will then decompose to n=2. The only hard case is where three different bits are each used by exactly two of the missing numbers (e.g. if the missing numbers are 3, 5, and 6, then tally[0][2..0] = (1,1,2); tally[1][2..0] = (1,2,1); tally[2][2..0] = (2,1,1)). The tally values make it possible to determine which bits are isomorphic; that in turn will make it possible to determine which numbers are missing.
|
|
|
|
|
Another approach:
run 17 sums: one simple sum (S) as before and then S0 is the sum of all numbers with bit 0 set, S1 is the sum of all with bit 1 set .. etc.
Now we can precompute the expected answers for S0 .. S15, call these E0 .. E15.
If there are two numbers missing, then we get a+b from E-S, and then by looking at En-Sn, these can take the values {0,a,b,a+b} and as a and b must differ in at least one place, a or b must appear. So we can always solve the 2 case.
It is possible to do 3 and 4 in most cases, and I suspect you can prove you can do 3 in all cases but someone may prove otherwise.
Peter
"Until the invention of the computer, the machine gun was the device that enabled humans to make the most mistakes in the smallest amount of time."
|
|
|
|
|
Very interesting indeed.
With numbers a, b and c missing, the difference between the 17 Ei and Si values could take the values
{0, a, b, c, a+b, a+c, b+c, a+b+c} which may or may not be 8 different numbers (a could equal b+c). Some of these should appear in the Ei-Si values; how many different numbers are we expecting? Not sure yet.
If a=arbitrary, b=arbitrary and c=0, then c would not help in providing different numbers, so
we would find at most {0, a, b, a+b}; we are sure to find a+b in E-S
If a=arbitrary, b=65535-a and c=0, then we would find at most {0, a, 65535-a, 65535}
Another point of view: if it were to always provide the unique solution for n=3, and n=4, and... at what value of n would it start to fail, and why? Is 16 a relevant number, i.e. does widening the numbers enable the algorithm to find more missing numbers? I guess not, since all the missing numbers COULD be in the range [0, 65535] however large the word width, so adding bits does not strengthen the algorithm.
|
|
|
|
|
Just adding together numbers is apt to cause some confusion as bits carry from one place to the next. For n=3 or n=4 I would think it would be best to do something like:
for (i = 0; i < 16; i++)
{
    if (x & (1 << i))                   /* does x have bit i set? */
    {
        count1[i] ^= (count0[i] & x);   /* carry into the high bit-plane */
        count0[i] ^= x;                 /* low bit-plane of the mod-4 counts */
    }
}
That will indicate whether a bit is set in 0, 1, 2, 3 (mod 4) of the missing numbers that have a particular bit set. The case where a bit is missing in all four numbers can be resolved by noting that there must be some other bit which does not appear in all four numbers, so the count0/1[] values for that bit will show that it's missing in at least some numbers.
|
|
|
|
|
Modify the process slightly to use one of supercat's ideas, as well as having 17 sums, also have 17 counts, so that you know how many are in each of the Sn sums. As well, this may not be necessary, but I'm lazy, so set a flag to detect 0.
For n=3, if 0 is missing treat as n=2, done. Otherwise if any of the Sn is only missing one number, then you get one missing number directly, simply add it back in then you can process as for 2. If any of the Sn is missing two numbers then as E-S gives you a+b+c then you can work out one number, add it back in and process for 2. Done. As the numbers are not all the same, all the sums can't be down by 0 or 3.
For n=4, if 0 is missing treat as n=3, done. Otherwise if any of the Sn is only missing one number, then you get one missing number directly, simply add it back in then you can process as for 3. Done. Otherwise if any of the Sn is missing three numbers, then you get one missing number from a+b+c+d -(a+b+c), so process as if one found. Done. Otherwise all are missing either 0, 2 or 4. Can anyone with more time than me show that under these conditions you can get three different pairs? My guess is that it is likely.
[edit] I guessed wrongly - {1,3,5,7} breaks it. [/edit]
Peter
"Until the invention of the computer, the machine gun was the device that enabled humans to make the most mistakes in the smallest amount of time."
modified on Sunday, December 21, 2008 9:42 PM
|
|
|
|
|
1,3,5,7 would yield a total of 16 missing, a bit 0 subtotal of 16 (1+3+5+7), a bit 1 subtotal of 10 (3+7), and a bit 2 subtotal of 12 (5+7). Even the total and bit 0 subtotal suffice to show that the numbers are 1, 3, 5, and 7; since the bit 0 subtotal matches the total, all four numbers have to be odd. The smallest 4 odd numbers are 1, 3, 5, and 7; if any other numbers were chosen the total would exceed 16.
A breaking set of numbers would be {1,6,10,13} or {2,5,9,14}. The carries from each bit position to the next serve to confuse things; using modulo arithmetic on each bit avoids that complication.
|
|
|
|
|
Storing the list of "already found" numbers isn't a huge issue if you, say, encode them as runs. However, the degenerate cases in this problem are quite nasty indeed. It wouldn't surprise me if this is actually equivalent to a restricted case of the subset-sum problem.
|
|
|
|
|
Mark Churchill wrote: Storing the list of "already found" numbers isn't a huge issue if you say encode them as runs.
After having read 32768 numbers from a set of 65536 possible values, there will be more than 2^65,527 possible states. No sort of run-length coding will get the worst-case (or even average-case) storage requirement below 65,527 bits (8191 bytes). The only way to make the problem workable is to rely upon the fact that the source of numbers also has considerable state information (no number will be read more than once). If there are four missing numbers, it's only necessary for the reader to distinguish (65536*65535*65534*65533)/4! different states at any time while reading the file (not counting the loop index).
|
|
|
|
|
Calculate the index sum and the value sum for the first index.
Compute the difference between the sums - if it's zero then the first number is not missing.
Increment the index and recalculate the index sum and the value sum.
Again compute the difference between the sums and also the second difference - if this value is zero then the first missing number has not been found; if the second difference is 1 then the current index is the first missing number.
Continue until the second difference is 2; the index at that point is the second missing number, etc.
Only three variables are required (index, index sum, value sum) - the differences are temporary storage - and it works for any number of missing values in a single pass!!
Apathy Rules - I suppose...
Its not the things you fear that come to get you but all the things that you don't expect
|
|
|
|
|
There are 100 unrepeated numbers, the sum of which will be (1 + 2 + 3 + ... + 100), the summation analogue of the factorial (the nth triangular number, n(n+1)/2). If you then add up all the numbers in your array, the difference between these two answers must be the "rogue" number. I've not tried it but I think that's it.
Edit: Ah! ... just realised there were loads of other answers on the next page ... so this is history not news
|
|
|
|
|
Papadimitriou and Steiglitz wrote the following [see citation, section 15.2]:
"A combinatorial optimization problem is thus the following form of computational problem.
"Given representations of the parameters... find the optimal feasible solution.
"We call this the optimization version of the problem. However, a combinatorial optimization problem can also be posed in the following, more relaxed, form.
"Given [the parameters]... find the cost of the optimal solution.
"This will be referred to as the evaluation version of a combinatorial optimization problem. It is immediate that, under the assumption that... [the cost] is not too hard to compute--the evaluation version of a combinatorial optimization problem cannot be much harder that the optimization version."
I omitted some notation and terms, the definitions of which were earlier in the chapter. As I understand it, they are saying that these two versions are pretty much the same, though the evaluation version may be a bit harder. I don't see how that could be. The evaluation version only seeks the cost of the optimal solution, whereas the optimization version seeks the optimal solution itself. In order to find that optimal solution, certainly you must need to compute the cost of it. Am I missing something?
Thanks,
Alex
Citation: Papadimitriou & Steiglitz, Combinatorial Optimization, Dover, 1982.
|
|
|
|
|
There may be a shortcut that lets you calculate the cost of an optimal solution (or at least prove some bounds on it) without actually calculating it.
This is similar to an existence proof in mathematics, where it's shown that a solution exists without actually finding it.
|
|
|
|
|
Hi,
I need to find some way to get the LCS for some special strings.
Imagine we have two CSV files (tables where ',' is the column delimiter):
1)
a b c,d e f
One row with 2 columns
2)
a,d
b,e
c,f
Three rows with 2 columns
When I'm trying to get the regular LCS I'm missing a lot of data:
a b c,d e f
a, d b,e c,f
LCS:
a b c f
But they should compare as fully equal, because the three rows can be fitted into the one row.
Can someone suggest some algorithm, or some preprocessing method?
The main pain is that there can be adds and deletes, so I need to find some longest path.
Thank you.
|
|
|
|
|
Follow this algorithm; what is its big O?

int sum = 0;
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n * n; j++) {
        sum += j;
    }
}

Thank you for answering.
|
|
|
|
|
Hi,
you really should be able to figure this out yourself.
Just imagine you were to execute the code for some value of n and count the number of important operations; then do the same for a value twice as large.
The ratio between those two "measures of complexity" should make the big O visible.
|
|
|
|
|