The document explores floating-point numbers in VB, covering their approximation, storage, unexpected behaviors, special values, and implications for database primary keys, offering insights into handling and explaining these numbers in various programming languages.
CFloat.vb is a class that handles and explains floating-point numbers in VB.
What are Floating-point Numbers
Computers use floating-point numbers to represent real numbers that have a fractional part. Many values we want to represent, such as 0.43, cannot be expressed exactly as a finite sequence of binary digits, so the nearest representable value is stored instead. As a result, math operations on floating-point numbers may give slightly different results than we expect. For example, calculating 0.43 + 0.000001 - 0.430001 in C#, VB, VBA, Python, PHP, or Java does not return 0!
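The same effect is easy to see in Python, whose float is an IEEE 754 double like VB's Double. The sketch below is only an illustration: it evaluates the expression in double precision and, as an assumption, simulates Single precision by rounding every operand and intermediate result to 32 bits with the standard struct module. The helper name to_single is hypothetical.

```python
import struct
from decimal import Decimal

def to_single(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest 32-bit Single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Double precision: Python's float is an IEEE 754 double.
v = 0.43 + 0.000001 - 0.430001

# Single precision (simulated): round every operand and every
# intermediate result to 32 bits, as Single arithmetic would.
s = to_single(to_single(to_single(0.43) + to_single(0.000001)) - to_single(0.430001))

print(v)              # tiny, but not 0.0
print(s)              # tiny, but not 0.0
print(Decimal(0.43))  # the exact binary value actually stored for the literal 0.43
```

Decimal(float) converts the stored binary value without further rounding, which makes the approximation of 0.43 visible.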
Storage Diagram
The IEEE 754-2019 standard (also published as ISO/IEC 60559:2020) defines the formats and behavior of floating-point arithmetic.
The Float / Single data type is stored in 4 bytes (32 bits), laid out as in the binary storage diagram below. S is the sign bit, E marks the exponent bits, and F marks the fraction bits.
The fraction is also called the mantissa.
The binary format of a Single-precision number is as follows:
S EEEE EEEE FFF FFFF FFFF FFFF FFFF FFFF
Value | Memory (hex) | Sign bit | Exponent (hex) | Fraction (hex) |
Epsilon; smallest positive number 1.4E-45 | 0000 0001 | 0 | 00 | 00 0001 |
Zero; +0! | 0000 0000 | 0 | 00 | 00 0000 |
Negative zero; -0! | 8000 0000 | 1 | 00 | 00 0000 |
1! | 3F80 0000 | 0 | 7F | 00 0000 |
Smallest number > 1; 1.00000012! | 3F80 0001 | 0 | 7F | 00 0001 |
Not a number; NaN | FFC0 0000 | 1 | FF | 40 0000 |
Infinity; ∞ | 7F80 0000 | 0 | FF | 00 0000 |
Negative infinity; -∞ | FF80 0000 | 1 | FF | 00 0000 |
In the first column, ! is the VB type character for Single literals. Any value whose exponent bits are all set (FF) and whose fraction is non-zero is a NaN; the pattern shown is one of them.
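The hexadecimal columns of the Single table can be reproduced with a short Python sketch. struct.pack(">f", x) rounds x to 32 bits and returns its big-endian byte pattern; single_bits and single_fields are hypothetical helper names for this example. The NaN line may show a different sign bit than the table, because the NaN bit pattern a runtime produces varies; what every NaN shares is exponent FF and a non-zero fraction.

```python
import math
import struct

def single_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 pattern of x after rounding it to Single."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def single_fields(x: float) -> tuple:
    """Split a Single into (sign bit, 8-bit exponent, 23-bit fraction)."""
    bits = single_bits(x)
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

# One line per table row: value | memory | sign | exponent | fraction
for value in (1.4e-45, 0.0, -0.0, 1.0, 1.00000012, math.inf, -math.inf, math.nan):
    sign, exponent, fraction = single_fields(value)
    print(f"{value!r:>12} | {single_bits(value):08X} | {sign} | {exponent:02X} | {fraction:06X}")
```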
The binary format of a Double-precision number is as follows:
S EEEE EEEE EEE FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF
Value | Memory (hex) | Sign bit | Exponent (hex) | Fraction (hex) |
Epsilon; smallest positive number 5E-324 | 0000 0000 0000 0001 | 0 | 000 | 0 0000 0000 0001 |
Zero; +0# | 0000 0000 0000 0000 | 0 | 000 | 0 0000 0000 0000 |
Negative zero; -0# | 8000 0000 0000 0000 | 1 | 000 | 0 0000 0000 0000 |
1# | 3FF0 0000 0000 0000 | 0 | 3FF | 0 0000 0000 0000 |
Smallest number > 1; 1.0000000000000002# | 3FF0 0000 0000 0001 | 0 | 3FF | 0 0000 0000 0001 |
Not a number; NaN | FFF8 0000 0000 0000 | 1 | 7FF | 8 0000 0000 0000 |
Infinity; ∞ | 7FF0 0000 0000 0000 | 0 | 7FF | 0 0000 0000 0000 |
Negative infinity; -∞ | FFF0 0000 0000 0000 | 1 | 7FF | 0 0000 0000 0000 |
In the first column, # is the VB type character for Double literals.
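Two rows of the Double table, the smallest positive number and the smallest number greater than 1, can be found directly with math.nextafter (Python 3.9+), as in this sketch; double_hex is a hypothetical helper. Note a naming difference: .NET calls the smallest positive Double "Epsilon", while in C and Python "epsilon" (sys.float_info.epsilon) means the gap between 1 and the next Double.

```python
import math
import struct
import sys

def double_hex(x: float) -> str:
    """Return the 64-bit IEEE 754 pattern of a Double as 16 hex digits."""
    return f"{struct.unpack('>Q', struct.pack('>d', x))[0]:016X}"

smallest = math.nextafter(0.0, 1.0)             # smallest positive Double
next_after_one = math.nextafter(1.0, math.inf)  # smallest Double greater than 1

print(smallest, double_hex(smallest))
print(next_after_one, double_hex(next_after_one))
print(double_hex(math.inf), double_hex(-math.inf), double_hex(-0.0))

# C/Python "epsilon" is the distance from 1 to the next Double.
print(sys.float_info.epsilon == next_after_one - 1.0)
```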
Unexpected Code Paths Caused by Floating Point
Because floating-point numbers only approximate real numbers, addition and other arithmetic operations do not always produce the exact result.
Addition test:
// Requires: using Microsoft.VisualBasic; (for Interaction.MsgBox)
public void TestAdd()
{
    double V = 0.43 + 1E-06 - 0.430001;
    if (V != 0) {
        Interaction.MsgBox("It will go here!");
    }
    float f = 0.43f + 1E-06f - 0.430001f;
    if (f != 0) {
        Interaction.MsgBox("It will go here!");
    }
}
Sub TestAdd()
    Dim V As Double
    V = 0.43# + 0.000001# - 0.430001#
    If V <> 0 Then
        MsgBox("It will go here!")
    End If
    Dim f As Single
    f = 0.43! + 0.000001! - 0.430001!
    If f <> 0 Then
        MsgBox("It will go here!")
    End If
End Sub
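Because of this, testing a computed result with = or <> is fragile. One common alternative, sketched here in Python with math.isclose, is to compare against a tolerance. The tolerance 1e-9 is an arbitrary assumption for this example and has to be chosen to suit the magnitudes in a real application.

```python
import math

v = 0.43 + 0.000001 - 0.430001

exact = (v == 0)                               # fragile exact test
tolerant = math.isclose(v, 0.0, abs_tol=1e-9)  # passes within the chosen tolerance
# With only the default relative tolerance, the allowed difference is scaled by
# the operands' magnitudes, so a comparison against 0.0 passes only if v is exactly 0.
relative_only = math.isclose(v, 0.0)

print(exact, tolerant, relative_only)
```

Near zero an absolute tolerance is needed; for large values a relative tolerance (rel_tol) usually scales better.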
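Special values: NaN (not a number) is neither greater than, equal to, nor less than any value, including zero, so the following tests can fall through to the final Else branch: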
Sub Test(x As Double)
    If x > 0 Then
        MsgBox("X > 0")
    ElseIf x = 0 Then
        MsgBox("X = 0")
    ElseIf x < 0 Then
        MsgBox("X < 0")
    Else
        ' Reached only when x is NaN: every comparison with NaN is False
        MsgBox("This is a possible case! What is the value of X here?")
        Dim R =
            (Double.NaN = 0) = False AndAlso
            (Double.NaN < 0) = False AndAlso
            (Double.NaN > 0) = False ' R is True
    End If
End Sub
Sub Test(x As Single)
    If x > 0 Then
        MsgBox("X > 0")
    ElseIf x = 0 Then
        MsgBox("X = 0")
    ElseIf x < 0 Then
        MsgBox("X < 0")
    Else
        ' Reached only when x is NaN: every comparison with NaN is False
        MsgBox("This is a possible case! What is the value of X here?")
        Dim R =
            (Single.NaN = 0) = False AndAlso
            (Single.NaN < 0) = False AndAlso
            (Single.NaN > 0) = False ' R is True
    End If
End Sub
// Requires: using Microsoft.VisualBasic; (for Interaction.MsgBox)
public void Test(double x)
{
    if (x > 0) {
        Interaction.MsgBox("X > 0");
    } else if (x == 0) {
        Interaction.MsgBox("X = 0");
    } else if (x < 0) {
        Interaction.MsgBox("X < 0");
    } else {
        // Reached only when x is NaN: every comparison with NaN is false
        Interaction.MsgBox("This is a possible case! What is the value of X here?");
        bool R = (double.NaN == 0) == false && (double.NaN < 0) == false &&
            (double.NaN > 0) == false; // R is true
    }
}
// Requires: using Microsoft.VisualBasic; (for Interaction.MsgBox)
public void Test(float x)
{
    if (x > 0) {
        Interaction.MsgBox("X > 0");
    } else if (x == 0) {
        Interaction.MsgBox("X = 0");
    } else if (x < 0) {
        Interaction.MsgBox("X < 0");
    } else {
        // Reached only when x is NaN: every comparison with NaN is false
        Interaction.MsgBox("This is a possible case! What is the value of X here?");
        bool R = (float.NaN == 0) == false &&
            (float.NaN < 0) == false && (float.NaN > 0) == false; // R is true
    }
}
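The same fall-through can be reproduced in Python. This sketch mirrors the Test routines above (classify is a hypothetical name) and also shows negative zero, which compares equal to zero even though its sign bit is set. The reliable way to detect NaN is a dedicated check such as math.isnan; the .NET counterpart is Double.IsNaN / Single.IsNaN.

```python
import math

def classify(x: float) -> str:
    """Mirror of the Test routine: three ordered comparisons with zero."""
    if x > 0:
        return "X > 0"
    elif x == 0:
        return "X = 0"
    elif x < 0:
        return "X < 0"
    else:
        # Only reachable when x is NaN: every ordered comparison with NaN is False.
        return "NaN"

nan = math.inf - math.inf        # infinity minus infinity produces NaN

print(classify(nan))             # falls through to the last branch
print(nan == nan)                # NaN is not even equal to itself
print(math.isnan(nan))           # the reliable test
print(classify(-0.0))            # negative zero compares equal to zero
print(math.copysign(1.0, -0.0))  # but its sign bit is still set
```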
How to Get a Floating-Point Value from Its Exponent and Fraction
The following function computes the Double value of a floating-point number from its sign, exponent, and fraction. The exponent bias and fraction base depend on the floating-point type, as listed in the table.
Type | Total Bits | Exponent bias | Fraction base |
Half / Float16 | 16 | 15 | 2^10 |
Single / Float | 32 | 127 | 2^23 |
Double | 64 | 1023 | 2^52 |
Quad / Float128 | 128 | 16383 | 2^112 |
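Both columns of the table follow from the number of exponent bits, as this sketch shows. The exponent widths (5, 8, 11, and 15 bits) are taken from the IEEE 754 binary16/32/64/128 formats rather than from the table; the sign always takes 1 bit, and bias_and_base is a hypothetical helper name.

```python
# (total bits, exponent bits) for each IEEE 754 binary format
FORMATS = {
    "Half / Float16": (16, 5),
    "Single / Float": (32, 8),
    "Double": (64, 11),
    "Quad / Float128": (128, 15),
}

def bias_and_base(total_bits: int, exponent_bits: int) -> tuple:
    """Exponent bias is 2**(e - 1) - 1; the fraction base is 2**(fraction bits)."""
    fraction_bits = total_bits - 1 - exponent_bits  # 1 bit is the sign
    return 2 ** (exponent_bits - 1) - 1, 2 ** fraction_bits

for name, (total, exp_bits) in FORMATS.items():
    bias, base = bias_and_base(total, exp_bits)
    print(f"{name}: bias={bias}, fraction base=2^{base.bit_length() - 1}")
```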
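The value of a number is calculated from its fields as follows (the sign ± is negative when the sign bit S is 1):
0 < Exponent < 2 * ExponentBias + 1 (normal numbers):
Value = ±2^(Exponent - ExponentBias) * (1 + Fraction / FractionBase)
Exponent = 0 (zero and subnormal numbers):
Value = ±2^(1 - ExponentBias) * (Fraction / FractionBase)
Exponent = 2 * ExponentBias + 1 (all exponent bits set):
Value = ±∞ if Fraction = 0, otherwise NaN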
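As a worked check of these formulas against two rows of the Single table (ExponentBias = 127, FractionBase = 2^23): the smallest number > 1 has Exponent = 7F (127) and Fraction = 1, and the smallest positive number has Exponent = 0 and Fraction = 1.

```latex
\begin{aligned}
\texttt{3F80 0001}:&\quad 2^{127-127}\left(1 + \frac{1}{2^{23}}\right) = 1 + 2^{-23} \approx 1.00000012 \\
\texttt{0000 0001}:&\quad 2^{1-127}\cdot\frac{1}{2^{23}} = 2^{-149} \approx 1.4\times 10^{-45}
\end{aligned}
```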
Function GetDoubleValue(IsNegative As Boolean, Exponent As UInt16,
        Fraction As UInt64, ExponentBias As UInt16, FractionBase As UInt64) As Double
    If Exponent = 0 Then
        ' Zero or subnormal number: no implicit leading 1
        If Fraction = 0 Then
            Return If(IsNegative, -0#, 0#)
        End If
        Dim FractionRatio = Fraction / FractionBase
        Return If(IsNegative, -1#, 1#) * (2 ^ (1 - ExponentBias)) * FractionRatio
    Else
        ' All exponent bits set: infinity or NaN
        If Exponent = 2 * ExponentBias + 1 Then
            If Fraction = 0 Then
                Return If(IsNegative, Double.NegativeInfinity, Double.PositiveInfinity)
            Else
                Return Double.NaN
            End If
        End If
        ' Normal number: implicit leading 1
        Dim FractionRatio = Fraction / FractionBase
        Return If(IsNegative, -1#, 1#) * _
            (2 ^ (CInt(Exponent) - ExponentBias)) * (1# + FractionRatio)
    End If
End Function
// Requires: using System; (for Math.Pow)
public double GetDoubleValue(bool IsNegative, UInt16 Exponent, UInt64 Fraction,
    UInt16 ExponentBias, UInt64 FractionBase)
{
    // Sign is a separate variable because ?: binds more loosely than *
    double Sign = IsNegative ? -1.0 : 1.0;
    if (Exponent == 0) {
        // Zero or subnormal number: no implicit leading 1
        if (Fraction == 0) {
            return IsNegative ? -0.0 : 0.0;
        }
        // Cast first: UInt64 / UInt64 is integer division and would give 0
        double FractionRatio = (double)Fraction / FractionBase;
        return Sign * Math.Pow(2, 1 - ExponentBias) * FractionRatio;
    } else {
        // All exponent bits set: infinity or NaN
        if (Exponent == 2 * ExponentBias + 1) {
            if (Fraction == 0) {
                return IsNegative ? double.NegativeInfinity : double.PositiveInfinity;
            } else {
                return double.NaN;
            }
        }
        // Normal number: implicit leading 1
        double FractionRatio = (double)Fraction / FractionBase;
        return Sign * Math.Pow(2, Exponent - ExponentBias) * (1 + FractionRatio);
    }
}
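To check the formulas end to end, here is a Python transcription of GetDoubleValue, renamed get_double_value to follow Python naming. It is a sketch, not the article's code: it uses math.ldexp(m, e), which computes m * 2**e, in place of the ^ and Math.Pow calls, and decode_double is a hypothetical helper that splits a Double into its fields with struct and rebuilds it.

```python
import math
import struct

def get_double_value(is_negative: bool, exponent: int, fraction: int,
                     exponent_bias: int, fraction_base: int) -> float:
    """Python sketch of GetDoubleValue, with the same parameters."""
    sign = -1.0 if is_negative else 1.0
    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        if fraction == 0:
            return sign * 0.0  # -1.0 * 0.0 gives -0.0, keeping the sign
        return sign * math.ldexp(fraction / fraction_base, 1 - exponent_bias)
    if exponent == 2 * exponent_bias + 1:
        # All exponent bits set: infinity or NaN.
        return sign * math.inf if fraction == 0 else math.nan
    # Normal number: implicit leading 1.
    return sign * math.ldexp(1.0 + fraction / fraction_base, exponent - exponent_bias)

def decode_double(x: float) -> float:
    """Split x into sign, exponent and fraction, then rebuild it."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign, exponent, fraction = bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)
    return get_double_value(sign == 1, exponent, fraction, 1023, 2 ** 52)

for x in (1.0, -2.5, 0.1, 5e-324, -0.0, math.inf):
    print(x, decode_double(x))

# Single fields work too: 3F80 0001 is 1 + 2**-23.
print(get_double_value(False, 0x7F, 1, 127, 2 ** 23))
```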
Database Primary Keys and Floating Point
Because floating-point values only approximate real numbers, it is not a good idea to use a floating-point column as the primary key in a database.
For example, suppose a table named Customers has a field id with the data type Single. Looking up a row by its id may then return no record even though the record is in the database. When we insert 0.4301 as the id, the stored value differs slightly from 0.4301, which can cause unexpected results in an UPDATE or SELECT.
The following SQL may return no record, depending on the database driver and on how it converts the decimal literal to a floating-point (float) type.
Select Customer From Customers where id = 0.4301
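The effect can be reproduced with Python's built-in sqlite3 module. SQLite stores REAL values as 8-byte doubles, so, as an assumption of this sketch, the id is first rounded to 32 bits to imitate a Single column; the customer name "Alice" is made-up sample data. The second query shows a tolerance-based range comparison as a workaround, although the sturdier fix is to avoid a floating-point key altogether (for example, an integer or decimal key).

```python
import sqlite3
import struct

def to_single(x: float) -> float:
    """Round a double to the nearest 32-bit Single (simulates a Single column)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (id REAL PRIMARY KEY, Customer TEXT)")
conn.execute("INSERT INTO Customers VALUES (?, ?)", (to_single(0.4301), "Alice"))

# Exact match: the stored Single value is not exactly the double 0.4301.
exact = conn.execute(
    "SELECT Customer FROM Customers WHERE id = 0.4301").fetchall()

# Tolerance-based match (1e-6 is an arbitrary choice for this example).
tolerant = conn.execute(
    "SELECT Customer FROM Customers WHERE ABS(id - 0.4301) < 1e-6").fetchall()

print(exact)     # no rows
print(tolerant)  # the row is found
conn.close()
```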
Floating-Point Numbers and Programming Languages