|
it would take a number of steps, possibly iteratively until no more changes, such as:
- eliminating parentheses
- replacing constant expressions by their value
- setting a canonical order (e.g. alphabetical)
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
|
|
|
|
|
Ok I'll test how well it works
|
|
|
|
|
Depending on how these expressions are (they do not make different computations depending on the entry value), and if the inputs are always natural numbers you can take advantage of the induction method for mathematical demonstrations, so you would only have to check equality for: one base case (input = 0 for example); one random case (say input = n, being n different form the base case); and the next case to the random case (input = n + 1). If these three cases macth, the expressons are equivalent. For example:
We want to check if these two expressions are equivalent for every natural numbers bigger than 1:
- Expression 1: 1+2+3+4+...+n
- Expression 2: (1+n)*n/2
So our base case would be n=1:
-Expression 1: result = 1;
-Expression 2: (1+1)*1/2 = 1;
Our base cases match.
Lets take a random n... lets say 7:
-Expression 1: 1+2+3+4+5+6+7=28;
-Expression 2: (1+7)*7/2 = 28;
Our random cases match.
Lets check now for random+1, I mean, n = 8:
-Expression 1: 1+2+3+4+5+6+7+8=36;
-Expression 2: (1+8)*8/2 = 36;
Random + 1 matches as well, so both expressions are equivalent for every natural number greater or equal to 1.
I think making a general expression comparer with delegates to the methods implementing each expression would not be too difficult.
|
|
|
|
|
I'm confused, that's not quite how I learned induction works..
Also, it doesn't seem to work for integers:
-Expression 1: = x/y
-Expression 2: = x/z
If the random sample is less than Min(y, z), they would be detected as equivalent (the result would be zero) regardless of whether y and z are equal.
|
|
|
|
|
David1987 wrote: I'm confused, that's not quite how I learned induction works..
Sure, I am not making demonstrations, but just using the mathematical induction method to check if two expressions are equivalent... It's a heuristic method.
David1987 wrote: Also, it doesn't seem to work for integers: -Expression 1: =
x/y -Expression 2: = x/z
If the random sample is less than Min(y, z),
they would be detected as equivalent (the result would be zero) regardless of
whether y and z are equal.
Errrr, they would not. If you want to represent a division, say for example "n / m", using only integer numbers you just cannot forget the rest of the division, I mean:
n / m = c * m + r, where r is n mod m.
In the case you show, "r" would be equal to "y" in the first case and equal to "z" in the second one, so if y != z then expression 1 is not equivalent to expression 2.
|
|
|
|
|
So then I'd have to convert the expressions first, ok, it's still looking quite weird though because surely "expressions are equivalent for 3 sampled inputs" does not imply "expressions are equivalent for all inputs"?
|
|
|
|
|
Errr, of course not. I am sorry if I am not explaining it well... We do not check if they are equivalent for any three random values, we check them for the base case, a random case and the succesor of the random case. It is just a straight application of the axiom of induction[^] for natural numbers (remember, works only with natural numbers).
|
|
|
|
|
Well then, what if the expressions are both polynomials that both happen to intersect the x-axis at 3 points, namely the base case, the random case, and its successor?
|
|
|
|
|
Induction can only be applied with Natural numbers. If your expression will only evaluate Natural numbers then this is a good, quick and effective approach.
David1987 wrote: Well then, what if the expressions are both polynomials that both happen to
intersect the x-axis at 3 points, namely the base case, the random case, and its
successor?
This means Real numbers, so you cannot apply induction here. That's why I said in my first reply: "if the inputs are always natural numbers...". If you don't know what mathematical induction is then I recommend you to follow the link I gave you before but, please, stop replying as if I was stupid and did not know what I am talking about... I can't belive it...
|
|
|
|
|
_Erik_ wrote: This means Real numbers
Why does it mean that? Why can't two polynomials intersect at 3 places that are all natural integers?
Also I know perfectly well what induction is and this isn't it. You have to prove that you can go from one case to the next - you're just proving that some random case and the next are both accidentally correct.
There is no random in induction. You have to analytically show that if something is true for n, that implies that it is also true for n+1.
|
|
|
|
|
Definitely, I cannot make myself clear today... English is not my first language... That must be the reason.
David1987 wrote: Why does it mean that? Why can't two polynomials intersect at 3 places that are
all natural integers?
It does not matter where the function intersect with an axis. That is not important. The important is: only natural numbers are valid inputs for the function... I'll try to use other words: the function (or expression) can only be applied to natural numbers. However, the result can be any real number. Speaking about polynomials and "X axis" I understand you are sepaking about drawing a continous function (polynomial), so its inputs can be real numbers. I don't mind at all where de function will intersect the axis. If it can receive any real number as input, then induction is out of scope.
David1987 wrote: Also I know perfectly well what induction is and this isn't it. You have to
prove that you can go from one case to the next - you're just proving that some
random case and the next are both accidentally correct.
There is no
random in induction. You have to analytically show that if something is true for
n, that implies that it is also true for n+1.
This is the third time I will say that this is only a heuristic method applying the axiom of induction, not a demonstration. If you increase your random samples (and succesor) to, let's say, 10, for example, and all of them macth along with the base case, then you can be pretty sure both expressions are equivalent. I think it is much faster than checking for 2^64 different possible inputs, and was trying to put you on a good direction (remember, only for natural numbers as input).
|
|
|
|
|
I'm with David on this.
Induction means that for some mathematical statement that depends on an integer i:
1. you verify it holds true for a specific (typically small) value j0 of i;
2. you proof by reasoning, not by evaluation or observation, that assuming it holds true for i=j it then will necessarily also be true for i=j+1.
3. you now simply apply step 2 for consecutive values j0, j0+1, j0+2, etc.
In what you propose there is no proof by reasoning in step 2, it is just evaluations, hence very fallible.
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
|
|
|
|
|
This is becoming really interesting. Have my 5.
Yes, I agree as well. That's why I stated it was a heuristic algorithm, not a demonstration, so it is theoretically very fallible, of course. But now, take it to practice, increasing the number of random checks with their succesors becouse you know it might fail. Do you think it would not be reliable in practice?
|
|
|
|
|
_Erik_ wrote: Do you think it would not be reliable in practice?
that depends a lot on the kind of expressions, operators, functions, ... you envision.
you will have a hard time to compare (i==48329134)?1:0 and (i==48329135)?1:0 in a heuristic way.
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
|
|
|
|
|
Luc Pattyn wrote: that depends a lot on the kind of expressions, operators, functions, ... you
envision.
Sure, I was giving an idea that might be worth under some previous conditions, and maybe some expressioins, operators, functions... would be hard to compare this way. However...
Luc Pattyn wrote: you will have a hard time to compare (i==48329134)?1:0 and
(i==48329135)?1:0 in a heuristic way.
Not that hard in this case, since these expressions can be defined as:
f1(n) = 1 if n==48329134; 0 in any other case
f2(n) = 1 if n==48329135; 0 in any other case
We have two base cases here for each one of the expressions: n=0 and n=48329134 in the first case; n=0 and n=48329135 on the second one, so the method I have described before would work perfectly and really quick finding the two expressions are not equivalent becouse one of the base cases do not match.
|
|
|
|
|
_Erik_ wrote: It is just a straight application of the axiom of induction[^] for natural numbers
But from the wiki article :-
...proving that if any one statement in the infinite sequence of statements is true, then so is the next one.
What you've done is shown it may be true for two arbitrary sequential cases, but you haven't derived a proof that if case n is true, then so must n+1 be. That really requires symbolic manipulation of the equations.
Days spent at sea are not deducted from one's alloted span - Phoenician proverb
|
|
|
|
|
Yes, I agree, as I said before, this is just a heuristic method, not a demonstration. Increasing the number of random cases (and succesors) would bring a very accurate result without having to check every possible input.
Edit: Have my 5 for your comment
modified on Friday, March 18, 2011 10:53 AM
|
|
|
|
|
Ok, maybe I now understand what you meant. However, it doesn't solve the problem, since false positives are not allowed at all.
|
|
|
|
|
Yes, I know... but I did it when I was at the University and nobody was able to find two expressions that could bring a false positive. It checked for the base case (of course) and another 10 random values (and their respective succesors).
|
|
|
|
|
Eric's suggestion is excellent and is quite similar to how the fast primality test finds large primes quickly.
If you test in enough bases the chance of an equation failing can be reduced to 1 in billions
This means it falls under the meteor rule..
If the chance of a false positive is less than being hit by a meteor today..It's okay to use it
So I think Eric is got an excellent idea.
|
|
|
|
|
I respectfully disagree. This will be part of an optimizing compiler. If the meteor decides to hit us after all then someone will have a nearly impossible to diagnose bug.
There are also large classes of expressions for which Eriks heuristic method does not work at all regardless of how many iterations it uses or require such a large number of iterations that it becomes impractical.
For example: (uint64)x / 3 and ((uint64)x * 0x5555556) >> 32 are the same for x in the interval 0..0x7FFFFFFF but not outside it. That gives a 0.5^k error probability where k is the number of iterations - which is fairly good, but if you combine a couple of those you can make the error probability arbitrarily high.
And I'm not even "cheating" with comparisons here - a single comparison against a constant can make the error probability arbitrarily high. They could be special-cased of course, but can the expressions above be special cased?
|
|
|
|
|
David, if these expressions are analytic expressions (defined and differentiable for all values of input), then a simple proof by induction will ALWAYS verify their equivalence. So you only have to try for a few values.
BHM
|
|
|
|
|
Maybe so, but that's not what we have here, they are expressions with types such as int32 (see original post) and the class of expressions is not limited to something simple.
For example, 1/x and 2/x differ only for x=1 and x=2 (edit: and -1 and -2), but there are 2^32 possibilities for x if it is an int32 and even more if it is an int64 - the chance that either 1 or 2 is tested isn't that big.
modified on Sunday, April 3, 2011 8:58 AM
|
|
|
|
|
Can you define or characterize this "good portion"?
For example, if you want to detect equivalence between bounded-degree polynomials (two computer programs or expressions such that their result is eventually a polynomial in the input), and you know the degree bound, it can be easily done by checking a small and finite number of inputs.
|
|
|
|
|
A good portion is hard to define
It should find "significantly" more equivalences than merely comparing whether two expressions are equal.
The class of expressions is not constrained in any way. I suppose I could first detect whether two expressions are both polynomials and then use a special comparison, but there are plenty of other types of expression so in order to find a "good portion" of equivalences I'd need something for them too.
|
|
|
|
|