IEEE 754 Conversion

Andrea Ricchetti


22 Jan 2019


Two ways to pack and unpack a 32-bit IEEE 754 floating-point value


Introduction

This article shows how to convert a float value into an integer that holds its IEEE 754 bit representation, and how to convert that integer back into the float.

I will show two ways.

One is faster than the other one, particularly on the unpack function.

Background

Sometimes, you need to send a float value over a protocol (serial or network) that cannot send or receive float values because it supports only integers (signed or unsigned). This is very common when developing a microcontroller project.

In my personal experience, I've established a serial communication between two different microcontroller families (STM32F3 and STM32F7) using the Modbus rules.

In this scenario, you need a way to convert the float into an integer on the sender side, and then to convert the integer back into its float value on the receiver side.

A very common way to do this is using the IEEE 754 conversion.
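In a Modbus-style exchange such as the one mentioned above, registers are 16 bits wide, so a packed 32-bit value would typically travel as two registers. The following is only a sketch of that step: the helper names are hypothetical (they are not part of the article's source files), and sending the high word first is an assumption that both ends would have to agree on.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of the article's source files.
   Assumption: the high 16-bit word goes into the first register. */
static void split_to_registers(uint32_t packed, uint16_t regs[2])
{
    regs[0] = (uint16_t)(packed >> 16);      /* high word */
    regs[1] = (uint16_t)(packed & 0xFFFFu);  /* low word  */
}

static uint32_t join_from_registers(const uint16_t regs[2])
{
    return ((uint32_t)regs[0] << 16) | (uint32_t)regs[1];
}
```

The sender would call split_to_registers on the packed value, and the receiver would call join_from_registers before unpacking.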

The code targets 32-bit floats but can easily be extended to 64-bit doubles, which may be necessary if you need more precision.
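As a rough idea of what the 64-bit extension could look like, here is a minimal sketch. The names pack754_64 and unpack754_64 are hypothetical, the code assumes that double is an IEEE 754 64-bit type on the target, and it copies the bytes with memcpy rather than using either of the two techniques described later in this article.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-bit counterparts; assumes double is IEEE 754 binary64. */
uint64_t pack754_64(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);  /* copy the 8 bytes unchanged */
    return bits;
}

double unpack754_64(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);     /* copy the 8 bytes back */
    return d;
}
```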

On the web, there is a lot of literature about this kind of approach, e.g., https://en.wikipedia.org/wiki/IEEE_754.
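As a quick reminder of the single-precision layout (1 sign bit, 8 exponent bits biased by 127, 23 mantissa bits), the sketch below rebuilds the numeric value from a bit pattern. The function name value_from_fields is hypothetical, and the code deliberately handles only normal numbers, not zero, subnormals, infinities or NaN.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: rebuilds the value of a normal (finite, non-zero,
   non-subnormal) single-precision bit pattern. */
double value_from_fields(uint32_t bits)
{
    uint32_t sign     = bits >> 31;            /* 1 bit                 */
    uint32_t exponent = (bits >> 23) & 0xFFu;  /* 8 bits, biased by 127 */
    uint32_t mantissa = bits & 0x7FFFFFu;      /* 23 bits               */

    /* value = (-1)^sign * (1 + mantissa / 2^23) * 2^(exponent - 127) */
    double significand = 1.0 + (double)mantissa / 8388608.0;
    double magnitude   = ldexp(significand, (int)exponent - 127);
    return sign ? -magnitude : magnitude;
}
```

For example, 0x40000000 has sign 0, exponent 128 and mantissa 0, so it decodes to 2.0.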

Using the Code

This code is quite portable because it is written in very standard C (I'm using it on microcontrollers, Windows, Linux and other embedded targets as well).

I've created two files (header and code) and two functions.

Here are their prototypes and explanations.

The first function, which I have called pack, returns a 32-bit unsigned integer that holds the bit representation of the input float value.

C++

uint32_t pack754_32 ( float f );

The unpack function returns the float value that corresponds to the unsigned integer passed to it.

C++

float unpack754_32( uint32_t floatingToIntValue );

Every 32-bit pattern decodes to something, but not always to an ordinary number: if the input encodes a NaN or an infinity, the returned float is not a usable value.

In this example, for platform portability, I've used the fixed-width type uint32_t from <stdint.h> (a 32-bit unsigned integer, which is not necessarily the same as unsigned long).
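If the receiver wants to reject patterns that do not decode to an ordinary number, one possible check is sketched below. The helper name is hypothetical; it relies only on the fact that infinities and NaNs are the patterns whose 8 exponent bits are all set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when the pattern encodes an infinity or a NaN,
   i.e. when all 8 exponent bits are set. */
bool is_inf_or_nan_754_32(uint32_t bits)
{
    return ((bits >> 23) & 0xFFu) == 0xFFu;
}
```

A receiver could call this on the incoming integer before passing it to the unpack function.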

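First Way (Faster)

These functions reinterpret the stored bits directly through a pointer cast.

C++

 uint32_t pack754_32( float f )
 { 
  /* View the float's storage as a 32-bit unsigned integer.
     The explicit cast is required. This kind of pointer type punning
     relies on the compiler tolerating aliasing between float and uint32_t. */
  uint32_t   *pfloatingToIntValue;
  pfloatingToIntValue = (uint32_t *)&f;

  return (*pfloatingToIntValue);
 }

 float unpack754_32( uint32_t floatingToIntValue )
 {
  /* The reverse view: read the integer's storage as a float. */
  float *pf, f;
  pf = (float *)&(floatingToIntValue);
  f = *pf;

  return f;
 }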
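Some compilers apply strict aliasing rules when optimizing, and reading a float through a uint32_t pointer is not guaranteed to work under them. A commonly used alternative is to copy the bytes with memcpy, which compilers usually turn into a plain register move. This is a sketch of that different technique, not the article's code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names, chosen to avoid clashing with the article's functions. */
uint32_t pack754_32_memcpy(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);   /* float must be 32 bits wide */
    memcpy(&bits, &f, sizeof bits);    /* copy the 4 bytes unchanged */
    return bits;
}

float unpack754_32_memcpy(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);       /* copy the 4 bytes back */
    return f;
}
```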

Second Way (More Academic)

Using a union and its implicit conversion is another way to convert a float into an unsigned integer value and vice versa. This method uses bitwise operations. The union packs/unpacks the value f into its representation:

C++

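 /* Bit-field order is implementation-defined. This layout assumes the
    compiler allocates bit-fields starting from the least significant bit
    (typical for little-endian targets). */
 typedef union UnFloatingPointIEEE754
 {
   struct
   {
     unsigned int mantissa : 23;
     unsigned int exponent : 8;
     unsigned int sign : 1;
   } raw;
   float f;
 } UFloatingPointIEEE754;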
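Because the order of bit-fields inside a struct depends on the compiler and target, it may be worth checking once that the fields line up with the IEEE 754 layout. The sketch below repeats the union under a hypothetical name (LayoutProbe) so that it is self-contained, and probes it with two values whose fields are known.

```c
#include <stdbool.h>

/* Same field layout as the union above, repeated so the sketch stands alone. */
typedef union {
    struct {
        unsigned int mantissa : 23;
        unsigned int exponent : 8;
        unsigned int sign     : 1;
    } raw;
    float f;
} LayoutProbe;   /* hypothetical name */

/* Returns true if the bit-fields match the IEEE 754 fields
   on the current compiler and target. */
bool bitfield_layout_matches(void)
{
    LayoutProbe probe;

    probe.f = -2.0f;   /* sign = 1, biased exponent = 128, mantissa = 0 */
    if (probe.raw.sign != 1u || probe.raw.exponent != 128u || probe.raw.mantissa != 0u)
        return false;

    probe.f = 1.5f;    /* sign = 0, biased exponent = 127, mantissa = 0x400000 */
    return probe.raw.sign == 0u && probe.raw.exponent == 127u
        && probe.raw.mantissa == 0x400000u;
}
```

If this function returns false on a given platform, the field order in the struct would probably need to be reversed there.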

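The bit operations extract the individual bits needed to build the desired value (exponent and mantissa), according to the IEEE 754 format.

I've used #define macros instead of functions or inline functions to avoid any function-call overhead; an optimizing compiler will often generate the same code for a small inline function.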

C++

/* Arguments and whole expansions are parenthesized so the macros
   behave correctly inside larger expressions. */
#define NTH_BIT(b, n) (((b) >> (n)) & 0x1u)

/* Keeps the 8 low-order bits of b (the exponent field). */
#define BYTE_TO_BIN(b)   ( ((b) & 0x80) |\
            ((b) & 0x40) |\
            ((b) & 0x20) |\
            ((b) & 0x10) |\
            ((b) & 0x08) |\
            ((b) & 0x04) |\
            ((b) & 0x02) |\
            ((b) & 0x01) )

/* Keeps the 23 low-order bits of b (the mantissa field). */
#define MANTISSA_TO_BIN(b)  ( ((b) & 0x400000) |\
             ((b) & 0x200000) |\
             ((b) & 0x100000) |\
             ((b) &  0x80000) |\
             ((b) &  0x40000) |\
             ((b) &  0x20000) |\
             ((b) &  0x10000) |\
             ((b) &  0x8000) |\
             ((b) &  0x4000) |\
             ((b) &  0x2000) |\
             ((b) &  0x1000) |\
             ((b) &  0x800) |\
             ((b) &  0x400) |\
             ((b) &  0x200) |\
             ((b) &  0x100) |\
             ((b) &  0x80) |\
             ((b) &  0x40) |\
             ((b) &  0x20) |\
             ((b) &  0x10) |\
             ((b) &  0x08) |\
             ((b) &  0x04) |\
             ((b) &  0x02) |\
             ((b) &  0x01) )
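OR-ing every single-bit mask of a field gives the same result as applying one combined mask (0xFF for 8 bits, 0x7FFFFF for 23 bits). For comparison, here is a sketch of the same three operations written as inline functions; the names are hypothetical, and whether they are slower than macros depends on the compiler and its optimization settings.

```c
#include <stdint.h>

/* Hypothetical inline helpers; each one is equivalent to OR-ing
   the individual bit masks one by one. */
static inline uint32_t nth_bit(uint32_t b, unsigned n)
{
    return (b >> n) & 1u;      /* bit n of b, as 0 or 1 */
}

static inline uint32_t low_8_bits(uint32_t b)
{
    return b & 0xFFu;          /* exponent width */
}

static inline uint32_t low_23_bits(uint32_t b)
{
    return b & 0x7FFFFFu;      /* mantissa width */
}
```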

Finally, here is the definition of pack/unpack functions.

They use the macros above to build the desired value.

The pack function relies on the implicit conversion of the union. Assigning the float member of the union fills in the sign, exponent and mantissa fields automatically.

C++

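uint32_t pack754_32 ( float f )
{
  UFloatingPointIEEE754 ieee754;
  uint32_t    floatingToIntValue = 0;
  ieee754.f = f;
  /* Cast each field to uint32_t before shifting so that no bit is ever
     shifted into the sign bit of a signed int.
     Layout: sign (bit 31), exponent (bits 30-23), mantissa (bits 22-0). */
  floatingToIntValue = ((uint32_t)NTH_BIT(ieee754.raw.sign, 0) << 31) |
                       ((uint32_t)(BYTE_TO_BIN(ieee754.raw.exponent)) << 23) |
                       (uint32_t)(MANTISSA_TO_BIN(ieee754.raw.mantissa));
  return floatingToIntValue;
}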

The unpack function uses bitwise operations to extract the sign, exponent and mantissa from the unsigned integer and stores them in the union, according to the IEEE 754 standard.

C++

 float unpack754_32( uint32_t floatingToIntValue )
 {
   UFloatingPointIEEE754 ieee754;
   unsigned int mantissa = 0;
   unsigned int exponent = 0;
   unsigned int sign = 0;
   
   sign = NTH_BIT(floatingToIntValue, 31);
   for( int ix=0; ix<8; ix++)
    exponent = (exponent | (NTH_BIT(floatingToIntValue, (30-ix))))<<1;
   exponent = exponent>>1;
   for( int ix=0; ix<23; ix++)
    mantissa = (mantissa | (NTH_BIT(floatingToIntValue, (22-ix))))<<1;
   mantissa = mantissa >> 1;    
   
   ieee754.raw.sign = sign;
   ieee754.raw.exponent = exponent;
   ieee754.raw.mantissa = mantissa;    
   return ieee754.f;
 }

How to Use It

I have also provided a very simple test function that packs and unpacks some values.

C++

 void TestPackUnpack ( void )
 {
  uint32_t n;
  float f;

  n = 0x3FB4FDF4;   /* f = 1.414 */
  f = unpack754_32(n);

  n = pack754_32(1.414);
  f = unpack754_32(n);

  n = pack754_32(-1.259921);
  f = unpack754_32(n);

  n = pack754_32(0.58);
  f = unpack754_32(n);

  n = pack754_32(-0.588);
  f = unpack754_32(n);

  n = pack754_32(2);
  f = unpack754_32(n);

  n = pack754_32(-3);
  f = unpack754_32(n);

}
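The test function above only assigns results, so they have to be inspected in a debugger. A self-checking variant could look like the sketch below. The checker name count_roundtrip_failures is hypothetical, and demo_pack/demo_unpack are memcpy-based stand-ins that exist only so the sketch compiles on its own; in a real project the article's pack754_32 and unpack754_32 would be passed in instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins so the sketch compiles on its own (not the article's code). */
static uint32_t demo_pack(float f)
{
    uint32_t n;
    memcpy(&n, &f, sizeof n);
    return n;
}

static float demo_unpack(uint32_t n)
{
    float f;
    memcpy(&f, &n, sizeof f);
    return f;
}

/* Hypothetical checker: returns how many sample values fail to round-trip. */
size_t count_roundtrip_failures(uint32_t (*pack)(float), float (*unpack)(uint32_t))
{
    static const float samples[] = { 1.414f, -1.259921f, 0.58f, -0.588f, 2.0f, -3.0f };
    size_t failures = 0;

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        if (unpack(pack(samples[i])) != samples[i])
            failures++;
    }
    return failures;
}
```

A result of 0 from count_roundtrip_failures(demo_pack, demo_unpack) means every sample value survived the pack/unpack round trip.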

Points of Interest

I think this article highlights some important features of the C language, such as implicit conversion, bitwise operators and so on.

The IEEE 754 conversion method can also be used to convert integers. In this way, if my protocol doesn't support float or floating-point values, I can always use these methods to share information over the protocol.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)