Beta rounding floating point numbers

Question

The first bullet point on slide 27 in the "Floating Point Arithmetic" chapter provides a definition for Beta as follows:

Beta = (B^e - 1) / 2 = B^{e-1} - 1

However, when examining this equation, I've encountered a discrepancy, specifically when B=2 and e=2:

(2² - 1) / 2 = 1.5 ≠ 2^{2-1} - 1 = 1

One possible explanation that comes to mind is that there might be floor rounding involved. Yet, this rounding concept introduces more questions. For instance, when evaluating slide 42 with B=2, e=2, and m=2, if I floor Beta to (2²-1)/2 = 1.5 ≈ 1 = Beta:

Attempting to convert [0|01|01] to decimal: Value Mantissa * B^{1-m} * B^{value exponent - Beta} = 1 * 2^{1-2} * 2^{1-1} = 0.5 (wrong solution)

On the other hand, if I ceil Beta to 2 and reattempt the calculation:

1 * 2^{1-2} * 2^{1-2} = 0.25 (which is correct, and other examples in this table also work)

Now, as I proceeded to slide 50, I first rounded Beta down: (2³ - 1) / 2 = 3.5 ≈ 3 = Beta

With this value, I attempted to convert [0|001|1000] to decimal: Value Mantissa * B^{1-m} * B^{value exponent - Beta} = 8 * 2^{1-3} * 2^{1-3} = 1/2 (wrong solution)

Now rounding Beta up to 4:
8 * 2^{1-3} * 2^{1-4} = 1/4 (which is correct, and other examples in this table also work)

So, my question is: Which rounding approach should I apply when computing Beta?

Thanks in advance.

asked Aug 23, 2023 in * TF "Emb. Sys. and Rob." by User100 (290 points)

1 Answer

Best answer

Thanks for pointing out a couple of bugs!

First, the exponent is defined with a bias that should be *half of the maximum exponent*, which obviously requires rounding if the maximum exponent is an odd number (as usual). So we have to decide whether to round it up or down. This is usually (in DiRa always!) done using the integer quotient, i.e. using the floor function, which means that we end up rounding down to minus infinity.
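
As a small sketch of this rounding rule in Python (the function name `bias` is made up here for illustration and does not come from the slides), the integer quotient operator `//` plays the role of div:

```python
def bias(B: int, e: int) -> int:
    """Bias for an exponent field with e digits to base B: (B^e - 1) div 2."""
    max_exponent = B**e - 1     # largest value the exponent field can hold
    return max_exponent // 2    # integer quotient, i.e. floor for non-negative values


print(bias(2, 2))  # 3 div 2 = 1
print(bias(2, 3))  # 7 div 2 = 3
```

For B=2 the maximum exponent 2^e - 1 is always odd, so the rounding step always matters in the binary case.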

Now let's look at the specific questions: First, there is an error on slide 27. If the exponent has e digits to base B, then beta = (B^e-1) div 2 (integer quotient). This formula on page 27 is correct, but the simplification B^{e-1}-1 is only correct for B=2. For an even base B, the general simplification is beta = B^{e-1}*b-1 with b := B div 2.

This can be proved as follows: For B=2b with b := B div 2, we get

beta = (B^e-1) div 2 = (B^{e-1}*2*b-1) div 2 = floor(B^{e-1}*b-(1/2)) = B^{e-1}*b-1

For an odd base B=2b+1, B^e is odd, so B^e-1 is even and no rounding is needed:

beta = (B^e-1) div 2 = (B^{e-1}*(2*b+1)-1) div 2 = B^{e-1}*b + (B^{e-1}-1)/2

So, on slide 27, it should be beta = (B^e-1) div 2, which for even B equals B^{e-1}*b-1 with b := B div 2 (integer quotient). For even B, we then have the exponents 0-beta,...,B^e-1-beta = 1-B^{e-1}*b,...,B^{e-1}*(B-b).
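
The closed form and the exponent range can be checked by brute force for a few even bases. This is only a sanity check under the assumption that B is even, not a proof; the helper name `check_even_bases` and the chosen bases are arbitrary:

```python
def check_even_bases(bases=(2, 4, 8, 10, 16), max_digits=5) -> int:
    """Compare (B^e - 1) div 2 with B^(e-1)*b - 1 for even bases B = 2b."""
    checked = 0
    for B in bases:
        b = B // 2
        for e in range(1, max_digits + 1):
            beta = (B**e - 1) // 2
            # closed form for even B
            assert beta == B**(e - 1) * b - 1
            # smallest and largest exponent: 0 - beta and (B^e - 1) - beta
            assert 0 - beta == 1 - B**(e - 1) * b
            assert (B**e - 1) - beta == B**(e - 1) * (B - b)
            # the shortcut B^(e-1) - 1 agrees only for B = 2
            assert (beta == B**(e - 1) - 1) == (B == 2)
            checked += 1
    return checked


print(check_even_bases())  # number of (B, e) pairs that passed
```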

Next, on slide 42, we look at floating point numbers with e=2 and B=2, so we get beta = (2^2-1) div 2 = 3 div 2 = 1 and the exponent range is -1,0,1,2. To convert the floating point number [0|01|01] to decimal, we do the following: We have m=2 and e=2, so the formula for evaluating the numbers given on page 18 results in M*2^{1-2} * 2^{E-beta} = M*2^{-1} * 2^{E-1} = M * 2^{E-2} = 1*2^{-1} = 0.5. Therefore, the decimal numbers listed on page 42 are not correct! I remember adding the extra column of decimal numbers at the very end and calculating it by hand, which I now realize was a mistake. In fact, the correct numbers need to be doubled. However, the lesson learned on page 42 is not affected by the decimal values, since we want to see which numbers have unique and multiple representations. This may explain why this error was overlooked.

Next, look at slide 50: The conversion of [0|001|1000] to decimal is done as follows: We have m=4 and e=3, so we get beta = (B^e-1) div 2 = (2^3-1) div 2 = 7 div 2 = 3 with exponents allowed -3,...,4. The formula for evaluating the numbers given on page 18 gives M*2^{1-4} * 2^{E-3} = 8*2^{-3} * 2^{1-3} = 2^{-2} = 0.25, which you can also find in the table on page 40. So that is correct!
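
The two conversions above can be reproduced with a short Python sketch. The function `decode` and the `[s|exponent|mantissa]` string parsing are illustrative assumptions; only the formula M * B^{1-m} * B^{E-beta} with beta = (B^e-1) div 2 is taken from the text. Exact fractions are used to avoid any floating point noise:

```python
from fractions import Fraction


def decode(bits: str, B: int = 2) -> Fraction:
    """Evaluate '[s|exponent|mantissa]' as (-1)^s * M * B^(1-m) * B^(E-beta)."""
    sign, exp_digits, man_digits = bits.strip("[]").split("|")
    e, m = len(exp_digits), len(man_digits)   # field widths in digits
    E = int(exp_digits, B)                    # stored (biased) exponent
    M = int(man_digits, B)                    # mantissa read as an integer
    beta = (B**e - 1) // 2                    # bias via integer quotient
    value = M * Fraction(B) ** (1 - m) * Fraction(B) ** (E - beta)
    return -value if sign == "1" else value


print(decode("[0|01|01]"))     # e=2, m=2, beta=1 -> 1/2
print(decode("[0|001|1000]"))  # e=3, m=4, beta=3 -> 1/4
```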

On slide 49, you can still apply what has been written above.

On slide 50, however, the rules change because we are looking at IEEE 754 numbers. To convert [0|001|1000] to decimal, we first need to check if the number is a normal or denormal number. Since it is a normal number, we use the formula on page 47 with the hidden bit 1, which is the 1 of the mantissa 1000 as mentioned on page 50. There are 3 bits for the exponent and 4 bits for the mantissa, so we have m:=4, m'=m-1=3, and e=3, so beta = (2^3-1) div 2 = 3. Since the numbers on page 50 are *listed with the hidden bit* (as mentioned there), we have the mantissa 000 when it comes to the formula on page 47. So the number is evaluated as (1.0+M*2^{1-m})*2^{E-beta} = (1.0+M*2^{-3})*2^{E-3} = (1.0+0*2^{-3})*2^{1-3} = (1.0+0.0)*2^{-2} = 1.0/4.0 = 0.25, so the values on pages 49 and 50 are correct.
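
A minimal sketch of this evaluation for positive normal numbers follows. The name `decode_normal` is hypothetical, and the check `E == 0` assumes that the slides mark denormal numbers by an all-zero exponent field, as IEEE 754 does; denormals and special values are not handled here:

```python
from fractions import Fraction


def decode_normal(E: int, mantissa_with_hidden_bit: int, e: int, m: int) -> Fraction:
    """Evaluate (1 + M*2^(1-m)) * 2^(E-beta) for a positive normal number.

    The mantissa is passed as listed with the hidden bit (m bits in total);
    M is what remains after dropping that leading bit.
    """
    if E == 0:
        raise ValueError("E = 0 is assumed to mark a denormal number (not covered)")
    beta = (2**e - 1) // 2                                # bias via integer quotient
    M = mantissa_with_hidden_bit & ((1 << (m - 1)) - 1)   # keep the m-1 stored bits
    return (1 + M * Fraction(2) ** (1 - m)) * Fraction(2) ** (E - beta)


# [0|001|1000]: E = 1, listed mantissa 1000, e = 3, m = 4
print(decode_normal(E=0b001, mantissa_with_hidden_bit=0b1000, e=3, m=4))  # 1/4
```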

So, in summary, you have to calculate beta := (B^e-1) div 2 using the integer quotient, i.e. rounding down (which for these non-negative values is the same as rounding toward zero). The simplification on page 27 of this formula is not correct for B other than 2, and the decimal values on page 42 should be twice as large. The rest is correct as far as I can tell.

answered Aug 23, 2023 by KS (170k points)
selected Aug 26, 2023 by User100
