How to convert base ten decimal numbers into base 16 in IEEE floating point format.

Step 1: Goal

To encode a base ten number (the normal counting system) that contains a fractional part as a 32-bit value using the IEEE (Institute of Electrical and Electronics Engineers) floating point standard.

Step 2: Terms to Know

--Bit: a single 1 or 0; bits are the digits that make up a binary number

--42₁₀: the subscript 10 marks a base 10 number, which is what most people use all the time. The 10 is often left off because every number is assumed to be base 10 unless otherwise specified

--1010₂: the subscript 2 marks a binary (base 2) number

--47BF₁₆: the subscript 16 marks a hexadecimal (base 16) number, which counts with the digits 0-9 and A-F
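
The same three notations can be checked with Python's built-in conversions. This is only a quick sketch; the variable names are illustrative.

```python
# The subscript in the text names the base; int() takes the base as its second argument.
from_binary = int("1010", 2)    # 1010 in base 2
from_hex = int("47BF", 16)      # 47BF in base 16

print(from_binary)              # 10
print(from_hex)                 # 18367
print(bin(42), hex(42))         # 0b101010 0x2a
```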

Step 3: Process Preparation

--This instructable will take approximately 10-20 minutes.

--The math required to complete the steps does not need a calculator.

--Only a writing utensil and paper are required to do the math.

Step 4: Background

This process takes a number like 35.12 and transforms it into a version that a computer can read and manipulate. This is especially useful when precision is needed for calculations and a rounded whole number will not suffice. Some rounding still occurs, because only a limited number of bits are stored, but the resulting rounding error is negligible in most cases of use. The encoded number consists of three parts. The first bit on the very left is the sign bit. This bit tells the computer if the number is positive (0) or negative (1). The next eight bits hold the exponent, which tells the computer the magnitude of the number. The last 23 bits are called the mantissa, which stores the actual digits of the value that was converted. For this instructable I will be using the example number 101.625₁₀ to model the process. Do not calculate the example number yourself. It is only a guide for working the problem in the next step.
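
The three-part layout can be inspected on a real machine. The sketch below is not part of the pencil-and-paper process; it uses Python's standard `struct` module to get the 32 bits of the example number and slices them into the three fields. The helper name `split_fields` is made up for this illustration.

```python
import struct

def split_fields(value):
    """Return (sign, exponent, mantissa) bit strings for a 32-bit float."""
    # Pack as a big-endian IEEE single, then reread the same 4 bytes as an unsigned integer.
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    bits = format(raw, "032b")          # 32 characters, leftmost bit first
    return bits[0], bits[1:9], bits[9:]  # 1 sign bit, 8 exponent bits, 23 mantissa bits

sign, exponent, mantissa = split_fields(101.625)
print(sign, exponent, mantissa)
```

The rest of the steps build these same three fields by hand.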

Step 5: Problem

Use IEEE single-precision format to encode the following decimal number into 32-bit floating point format: -10.3125₁₀

Step 6: Convert Both Sides of the Decimal Point Into Binary Numbers.

--First, divide the number on the left of the decimal point by two over and over, writing down the remainder each time, until the quotient reaches 0. The remainders make up the binary number: the first remainder is the rightmost bit, the second remainder is the next bit to its left, and so on.

--Second, multiply the number on the right side of the decimal point by 2 over and over until nothing is left to the right of the decimal point. After each multiplication, write down the digit that lands to the left of the decimal point (either a one or a zero), then drop that digit and keep multiplying what remains. The first digit written down is the leftmost bit and so on. Some fractions never reach zero; in that case stop once you have enough bits to fill the mantissa.
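
The two procedures above can be sketched in Python, using the example number 101.625. The function names and the `max_bits` cutoff are assumptions made for this illustration, not part of the original steps.

```python
def int_to_binary(n):
    """Repeated division by 2; the first remainder is the rightmost bit."""
    if n == 0:
        return "0"
    bits = ""
    while n > 0:
        n, remainder = divmod(n, 2)
        bits = str(remainder) + bits   # each new remainder goes on the left
    return bits

def frac_to_binary(frac, max_bits=23):
    """Repeated multiplication by 2; the first digit recorded is the leftmost bit."""
    bits = ""
    while frac > 0 and len(bits) < max_bits:
        frac *= 2
        bit = int(frac)                # digit to the left of the point: 0 or 1
        bits += str(bit)
        frac -= bit                    # keep only what is right of the point
    return bits

print(int_to_binary(101))    # 1100101
print(frac_to_binary(0.625)) # 101
```

So 101.625₁₀ is 1100101.101₂. The cutoff matters only for fractions such as 0.1 that never terminate in binary.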

Step 7: Move the Decimal Point All the Way to the Right of the Leftmost Bit.

--Move the decimal point to the position just to the right of the leftmost 1 bit. Keep track of how many places the decimal point moved. This count is the magnitude (the exponent) that will be encoded in a later step.
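
A minimal sketch of this move, taking the bits on each side of the point as strings. The name `normalize` is hypothetical, and the branch for numbers smaller than 1 (where the point moves right and the count is negative) goes beyond what this step describes. Zero is not handled.

```python
def normalize(int_bits, frac_bits):
    """Return (bits to the right of the moved point, places moved)."""
    int_bits = int_bits.lstrip("0")
    if int_bits:
        # The point moves left past every integer bit except the leading 1.
        places = len(int_bits) - 1
        digits = int_bits + frac_bits
    else:
        # No integer part: the point moves right, so the count is negative.
        first_one = frac_bits.index("1")
        places = -(first_one + 1)
        digits = frac_bits[first_one:]
    return digits[1:], places   # drop the leading 1 itself

after_point, places = normalize("1100101", "101")
print(after_point, places)      # 100101101 6
```

For the example, 1100101.101 becomes 1.100101101 with the point moved 6 places.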

Step 8: Determine the Leftmost Sign Bit.

--If the number is positive the sign bit is zero; if the number is negative the sign bit is one.

Step 9: Determine the Bits of the Magnitude.

--This portion is made up of 8 bits total. The value stored here is the magnitude from Step 7 (the number of places the decimal point moved) plus 127. Convert that sum to binary by dividing it by 2 over and over and recording the remainders. This is the same process you used to calculate the binary value of the left side of the decimal point in Step 6. If the result has fewer than 8 bits, add zeros on the left until it has 8.
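
As a sketch, the same calculation in Python for the example number, whose decimal point moved 6 places. The names `BIAS` and `exponent_bits` are illustrative, and the function assumes an ordinary number whose sum stays between 1 and 254.

```python
BIAS = 127  # the constant added to the magnitude in single precision

def exponent_bits(places_moved):
    """Add 127 to the magnitude and write the sum as 8 binary digits."""
    biased = places_moved + BIAS
    return format(biased, "08b")   # zero-padded on the left to 8 bits

print(exponent_bits(6))   # 10000101  (6 + 127 = 133)
```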

Step 10: Determine the Bits That Will Make Up the Mantissa.

--This is the third portion of the result and it holds the value that was converted. As opposed to the other two sections, this one is just copying what you already have into the proper format. This section is 23 bits long. Take the bits from Step 7 that are to the right of the newly placed decimal point, stopping where the bits end (do not copy the magnitude written after them). If there are fewer than 23 bits, back fill with zeros on the right until you have 23 bits in total.
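
Putting the three parts together for the example number 101.625 might look like the sketch below. The variable names are illustrative. The last lines regroup the 32 bits into base 16 and compare against Python's own encoding from the standard `struct` module.

```python
import struct

sign = "0"                 # 101.625 is positive
exponent = "10000101"      # 6 + 127 = 133
fraction = "100101101"     # bits to the right of the moved decimal point

mantissa = fraction.ljust(23, "0")        # back fill zeros on the right to 23 bits
bits = sign + exponent + mantissa         # 1 + 8 + 23 = 32 bits
hex_word = format(int(bits, 2), "08X")    # every 4 bits become one hexadecimal digit

print(bits)      # 01000010110010110100000000000000
print(hex_word)  # 42CB4000

# Cross-check against the machine's own single-precision encoding.
print(struct.pack(">f", 101.625).hex().upper())  # 42CB4000
```

Working the problem from Step 5 by hand and comparing it with `struct.pack(">f", -10.3125)` is a quick way to check your answer.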