|
/ write_item(X) by T
write_item(X)
{
while(true)
{
if( write_TS(X) > TS(T) || read_TS(X) > TS(T) )
{
abort(T);
}
else if( write_TS(X) < TS(T) )
{
wait until the transaction T', which TS(T) > TS(T'), has
either committed or aborted;
continue;
}
else
{
write_item(X);
write_TS(X) = TS(T);
break;
}
}
}
Hmmm, on the indicated line above, shouldn't that be AND and not OR? Also, you might want to consider what would happen if the timestamps were equal.
Also, just to be clear, to avoid cascading rollbacks, you should only be reading committed values. So, given all that, shouldn't the pseudo-code look more like this:
if t >= wmax(X)
rmax(X) <- max(t, rmax(X))
wait for a committed value of X
perform read
else
abort
if t >= rmax(X) and t >= wmax(X)
wmax(X) <- t
wait until X value is a committed value and
pending reads are done
perform write
else
if t < rmax(X)
abort
else
do nothing
|
|
|
|
|
I have studied your answer carefully and your remarks are quite
insightful. I will incorporate them while implementing the algorithm.
Your willingness to answer my question is greatly appreciated.
Thanks!
|
|
|
|
|
|
I recently had a phone interview for a C++ development position. I was given the following problem. You have an array of 101 elements. In the array are the values 1 through 100. There is exactly one value that appears twice. Find the value that appears twice. My first solution was to sort the array and then find the missing value. The second solution was to use a boolean array to keep track of the values already found. He claims that there is a solution required only a minimal amount of additional memory and is of lower order than sorting the array.
We both agreed that if the numbers went from 1 to 1,000,000,000 and the array had 1,000,000,001 numbers in them then my solution first solution would take a long time and my second solution required a lot of additional space. The interviewer claimed that there is a more efficient solution.
However, I cannot find one. I am hoping that somebody out there could tell me what it is.
Thanks
Bob
|
|
|
|
|
You should try and take advantage of all available information...
|
|
|
|
|
Create a 100 bit array, set a bit by the value if that bit is not already set, otherwise report the duplicate number.
Dave.
|
|
|
|
|
Sorry to double post. the last reply seemed to hang. I also edited the response
reate a 100 bit array, set a bit by each value if that bit is not already set, otherwise report the duplicate number.
Dave.
|
|
|
|
|
Dave,
Thanks for the response. Your solution is basically the same as my solution except that you
use an array of bits where I use an array of booleans. For n numbers, we both need an additional array of n elements. The person interviewing me claims that it can be done with O(1) additional space and time faster than O( n log(n) ). I still do not see how.
Bob
|
|
|
|
|
I suggest you have a better look at my earlier reply...
|
|
|
|
|
Bob,
I dug back 35 years and came up with the answer. Since the numbers are consecutive, add 1 to 100 into a singe location, then clear another location and add all the numbers in the array. That number will exceed the calculated result by the value of the duplicate number.
This was my solution to a problem in the 60's when we booted the mainframe with a deck of binary cards. The decks became huge (more than 1/2 of a tray of cards) and operators would drop the deck and spend hours trying to insure the deck was back in order(smart operations people always kept a spare deck, just in case). The solution was to initialize a location with 0 and then for each card read, increment the location. The location was then accumulated into a sum, and another location was summed with the actual card sequence number. If the two sums came out the same,then each card had been read once and once only. Appropriate measures had to be taken if a card mis-read caused a backspace and reread (dont consider the card read until you got a good status). The only downsize was that we had to insure that overlay loading was not attempted (could not use the ORG directive to backup into a prior defined table etc then restored). You could never be sure of when in the sequence of loading, any particular location was loaded.
Quite soon after that we went to booting from a tape. Note, we saved an autoload file that was the Startup image, and there was a one card boot that could read in that image instead of the whole boot deck, but when necessary, the deck had to be used - at least until the tape boot became implemented.
Dave.
|
|
|
|
|
Right, summing all the numbers is the cheapest and fastest way to identify the duplicate number.
The sum trick only works because you know the range of valid numbers; if the problem was: we have 100 different numbers and 1 got duplicated, then you would need a bitmap. Hence, use all available information. BTW: In order to sum the numbers 1..N you don't need an accumulator, the sum equals N*(N+1)/2 = 5050.
As for the cards, the punch cards I used a long time ago used to have 80 columns; columns 73-80 were not relevant to the program; they typically got used for numbering the cards. And sorting machines were available, so dropping a tray only did cost a few passes through the sorting machine to get everything back in order. IIRC they had around 100 output bins, so for decimal digits, they could handle two digits at a time.
|
|
|
|
|
Luc,
As "Murphy" predicted, the tray usually got dropped in the middle of the night when the Tab room (with the sorter) was locked up at night, so the routine had been to just put them back in some order, and correct the order when an out-of order card was read. The summing solution did the trick, at least until tape boot was perfected.
Dave.
|
|
|
|
|
This sounds like a COBOL deck - it wasn't true of other decks.
I once did some AlgolW stuff on a IBM 360 where all 80 columns could contain code.
But I still think your first reply has it spot on - do a sum and use all available information.
Regards
David Rice
|
|
|
|
|
So you are the first who really read my replies in this thread...
Actually I used punch cards only for Fortran (Fortran IV, Watfor, Watfive) programs;
they had an optional C in column 1 for comment cards; an optional statement number in 2-5,
an optional continuation character in col 6, the actual statement(s) in col 7-72, and
free comment in 73-80, normally used for sequential numbering the cards.
|
|
|
|
|
I'd forgotten about Fortran - last used it 1975.
Not with punched cards but a paper tape reader on a tele-type terminal.
|
|
|
|
|
The only program I ever used paper tape for was a PDP-8 assembly program on a machine without a
disk; so the "development cycle" consisted of:
- load editor
- load source
- edit
- save source
- load assembler
- load source
etc etc
Things were simple back then, now they are comfortable but complex.
|
|
|
|
|
First of all, I don't claim to have THE solution to your problem.
You've stated there are two possible approaches: 1. sorting the array, 2. keeping a separate bit array. I think the actual solution combines both approaches.
When you're sorting an array, there are two basic operations:
1. comparisons
2. swaps
Let's suppose your algorithm looks like this:
<br />
main loop {<br />
int i = ...;<br />
int j = ...;<br />
if (A[i] > A[j])<br />
swap(A[i],A[j]);<br />
}<br />
However, since you only have to find the repeated element, you might change the algorithm so it looks like this:
<br />
main loop {<br />
int i = ...;<br />
int j = ...;<br />
if (A[i] == A[j])<br />
return A[i];
else if (A[i] > A[j])<br />
swap(A[i],A[j]);<br />
}<br />
Provided the sorting algorithm you use is in-place (for example, heapsort), you only need O(1) additional space.
Even though the worst case is still O(n log(n)) , the average case's complexity HAS TO (read: might... hehe) be lower, since you stop the algorithm once the repeated element was found.
Now, you have to choose the sorting algorithm that gives you the best average case while still being O(n log(n)) in the worst case. And that's way too advanced computer science for me.
Hope that helps,
Eduardo
-----------------------------------------------------------------------
EDIT: Wow. I just read the sum one. Wow. Forget everything I said.
|
|
|
|
|
Computing the sum is an easy approach if there is only one duplicate number (though I prefer to formulate the problem as having a missing number rather than a duplicate). Now for the fun part:
Assume there are 65534 numbers in the range 0 to 65535 without duplicates, so two numbers are missing. Can you write a program to identify both missing numbers on a single pass through the data, that could run on a machine with 64K of code storage and 256 bytes of data storage (the proper solution shouldn't require anywhere near 4K, much less 64K; if your code size isn't ridiculous assume it will fit)?
What if three numbers are missing? How about four?
modified on Thursday, December 18, 2008 7:12 PM
|
|
|
|
|
supercat9 wrote: Can you ...
2 yes
3 provisional no
|
|
|
|
|
3 yes
4 probably
|
|
|
|
|
I wonder what the 'best' approaches are? The amount of state one would have to keep wouldn't be large (256 bytes of state would be enough state to allow for over 100 missing numbers) if one had unlimited time to convert the state into a list. One method would be to use two bytes to explicitly keep track of whether the values 65521-65535 had been observed, and for all other values compute the sums (x mod 65521), (x^2 mod 65521), (x^3 mod 65521) ... (x^n mod 65521). I believe the resulting combination of sums would uniquely identify the missing numbers, though I have no idea how one would determine which numbers were missing except via trial and error.
I have a different approach which would use somewhat more storage (256 counters capable of counting up to the number of missing items) but I don't know how many missing numbers it would be able to support before the count patterns ceased to be unique. The count patterns are easy to resolve when n=2; when n=3, they are more complicated to resolve, but it will always be possible. For n=4, I think that remains true. Not sure about n=5.
What's your approach?
|
|
|
|
|
You could replace 65521 (the largest prime in range) by 65413 (the next prime) with the same result, forgoing the extra counters/flags. Intuitively I agree summing powers mod prime would hold all the information, and extracting it seems not obvious.
However IMO all that is way too complex. Here is mine:
you can accumulate the sum, that's good enough for 1 missing/extra number.
For two numbers, you can accumulate the sum and also the product, like so:
- threat zero separately (with a simple flag);
- use a double ("d0") to multiply all numbers from 1 to 65535
- use a double ("d2") to multiply all numbers present
- rescale both to prevent overflow (say divide both by 65536 each time d0 exceeds 65536)
- finally divide d0 by d2 (or d2 by d0) to get the product.
- even though d0 and d2 are both inaccurate (they lost most of the digits), their quotient
accurately represents the product of the missing (or extra) numbers.
With a 32-bit mantissa you can discriminate all possible products that have the same sum, the
extreme case is 65532*65535 against 65533*65534.
Having sum and product, a simple equation generates the two numbers.
For three numbers (say x y z):
- accumulate the sum x+y+z
- accumulate the product xyz
- accumulate the product (x+1)(y+1)(z+1)
- the products need 48-bit mantissa, a double still is OK
- a simple substitution yields xy+yz+zx
From these three (xyz, , xy+yz+zx, z+y+z) you can find x, y and z either analytically,
or by trial-and-error: just try all possible values of z (1..65535), and solve what remains
for x and y. Most z values will have no solution, you end up finding six permutations of the
one real solution.
For four numbers, you need a 64-bit mantissa, which a double does not offer; so you need a higher-
precision math package. Then accumulate products and two shifted products, in the end leading to
a fourth-order set of equations for which I haven't figured out yet if there is an acceptable
way to solve it.
|
|
|
|
|
Your approach for n=3 seems to devolve into trial and error; the same sort of trial and error would allow for a solution using the (mod 65521) maths. Using a smaller modulus would require having more flags to handle the numbers between the modulus and 65535.
A nice (but sub-optimal) bit-based approach for handling the two-number case would be to keep sixteen words, holding the xor of all the items in the list with bit 0 set, all the items with bit 1 set, etc. At least one of the numbers must have at least one bit set which is clear in the other number, and so that number may be read out easily.
For the case of n=3, a similar approach could work except that instead of doing an xor, one would have to hold a count of how many numbers had the various bits set. If there is exactly one number missing which had a particular bit set, the number may be easily identified. With three numbers, though, it's possible that a number of bits will be present in two missing numbers and none in exactly one. This situation may be resolved, however, by nothing that there must be exactly three missing numbers, and so if there are three different patterns of missing bits they must derive from the missing numbers. I don't quite know how to describe the analysis algorithmically, but it works out.
I think a similar approach will work with n=4, though it starts to get messier. I have no idea about n=5 and beyond.
|
|
|
|
|
supercat9 wrote: Your approach for n=3 seems to devolve into trial and error; the same sort of trial and error would allow for a solution using the (mod 65521) maths. Using a smaller modulus would require having more flags to handle the numbers between the modulus and 65535.
I tend to disagree. First of all there is an analytical solution for one root of a cubic equation;
second, when a trial loop over z is executed, what is left is a simple quadratic equation, so each z-value needs an IsSquare(discriminant) test, which is much more straightforward than your modulo-prime arithmetic.
I did look into xoring the numbers for n=2 earlier, but decided against it; the product being very
easy to handle. I am not sure I see how xoring helps you for n=2 or higher. For n=2 if the two
missing numbers are identical except for 1 bit, then its easy, otherwise...
Just try numbers 0-7 with missing 2 and 3 (easy), 2 and 5 (now what?).
|
|
|
|
|
For the n=2 case, if there were 16 xors produced (xor of all numbers with bit 0 set, with bit 1 set, etc.) then if the missing numbers were 2 and 5, then xor[0] and xor[2] would both be 5, while xor[1] would be 2. Easy.
If the numbers shared a bit but each had at least one bit the other lacked (e.g. the values were 3 and 5) then the xor's would have three non-zero values: xor[0] would be 6 (3 xor 5), xor[1] would be 3, and xor[2] would be 5. Simplest thing to do in that scenario is to ignore the non-zero xor value which does not have the 'indicated' bit set (i.e. xor[0] in this case, since the value 6 has bit 0 clear). The other two non-zero xor values are the missing numbers.
If one number is a subset of the other (e.g. 2 and 3), then one of the non-zero xors will have its corresponding bit clear (e.g. xor[1] will equal 1, which has bit 1 clear) and the other will have it set. The one that has the bit set is one of the missing numbers; xor that with the other one to get the other missing number.
For n=3, one will have a count of bit values, mod 4. Bits that are missing from none of the numbers or from all three will be trivial to place accurately. Bits that are missing from exactly one of the numbers will be easy to place; if any such bits exist, the problem will then decompose to n=2. The only hard case is where three different bits are each used by exactly two of the missing numbers (e.g. if the missing numbers are 3, 5, and 6, then tally[0][2..0] = (1,1,2); tally[1][2..0] = (1,2,1); tally[2][2..0] = (2,1,1). The tally values make it possible to determine which bits are isomorphic; that in turn will make it possible to determine which numbers are missing.
|
|
|
|
|