This reference page gives technical details of floating-point (FP)
numbers in Java. This information is useful if you plan on doing
extensive numerical calculations with Java. We recommend that
newcomers to Java just skim the page and come back to it later
as needed.
Floating-Point Representations
Floating-point values in Java, which follows most of the IEEE 754
floating-point standard, are represented by two types: float
and double.
As shown previously, the bit layout for the float type is

1 bit | 8 bits | 23 bits |
Sign | exponent | significand |

and for the double type

1 bit | 11 bits | 52 bits |
Sign | exponent | significand |
For float the 8 bits of the exponent give values in the range of
0-255. However, 0 and 255 are special values (discussed below), so
the allowed values range from 1 to 254. A bias of 127 is subtracted
to give an unbiased exponent range of -126 to 127.
Similarly, for double the 11 bits of the exponent give values in
the range of 0-2047. In this case, 0 and 2047 are special values
(discussed below), so the allowed values range from 1 to 2046.
A bias of 1023 is subtracted to give an unbiased exponent range
of -1022 to 1023.
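The field layout and bias described above can be inspected directly. The following is a minimal sketch: the class name FloatFields and its helper methods are hypothetical names chosen for illustration, while Float.floatToIntBits is the standard library call that exposes the raw 32 bits.

```java
/** Splits a float into its IEEE 754 sign, exponent, and significand fields. */
class FloatFields {

    /** Returns the sign bit (0 or 1). */
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    /** Returns the 8-bit biased exponent field (0 to 255). */
    static int biasedExponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    /** Returns the 23 stored significand bits (without the implied leading bit). */
    static int significandBits(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        float[] samples = {1.0f, -0.15625f, Float.MAX_VALUE};
        for (float f : samples) {
            int biased = biasedExponent(f);
            System.out.println(f + ": sign=" + sign(f)
                + " biased=" + biased
                + " unbiased=" + (biased - 127)   // subtract the float bias
                + " significand=0x" + Integer.toHexString(significandBits(f)));
        }
    }
}
```

The same approach should work for double with Double.doubleToLongBits, an 11-bit mask, a 52-bit shift, and a bias of 1023.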
The float
representation gives 6 to 9 digits of decimal precision while
double
gives 15 to 17 digits of decimal precision.
When the exponent values are in their allowed unbiased
ranges, the representations are said to be normalized. In the
normalized mode, the b0 value in

$$(-1)^s \cdot (b_0 + b_1 \cdot 2^{-1} + b_2 \cdot 2^{-2} + b_3 \cdot 2^{-3} + \dots + b_{n-1} \cdot 2^{-(n-1)}) \cdot 2^{\text{exponent}}$$

is taken as 1 so that the effective number of significand bits
is increased to 24 for float and 53 for double.
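One way to see the effective 24-bit and 53-bit widths is to check which integers survive a round trip through each type. This is only an illustrative sketch; the class HiddenBit and its methods are hypothetical names, and the check is meaningful only for magnitudes well below the maximum of int or long.

```java
/** Checks whether integers are exactly representable in float or double. */
class HiddenBit {

    /** True if n converts to float and back unchanged (valid for |n| well below 2^31). */
    static boolean fitsInFloat(int n) {
        return (int) (float) n == n;
    }

    /** True if n converts to double and back unchanged (valid for |n| well below 2^63). */
    static boolean fitsInDouble(long n) {
        return (long) (double) n == n;
    }

    public static void main(String[] args) {
        int twoTo24 = 1 << 24;      // 16777216
        long twoTo53 = 1L << 53;    // 9007199254740992

        // 24 effective bits: 2^24 is exact in float, 2^24 + 1 is not.
        System.out.println(fitsInFloat(twoTo24));       // true
        System.out.println(fitsInFloat(twoTo24 + 1));   // false

        // 53 effective bits: 2^53 is exact in double, 2^53 + 1 is not.
        System.out.println(fitsInDouble(twoTo53));      // true
        System.out.println(fitsInDouble(twoTo53 + 1));  // false
    }
}
```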
When the biased exponent is zero (i.e. all exponent bits
are zero), the value is denormalized and the b0
value is taken as 0. The exponent is taken to be -126 for float
and -1022 for double.
The denormalized mode allows for a "smoother approach to
zero" at the smallest value range.
The following shows the minimum and maximum values
possible with these types in the two different modes
(significands are written in binary):
- float
  - Normalized
    -127 < exponent < +128
    min = 2^{-126} * 1.00000000000000000000000 = 1.17549435E-38
    max = 2^{+127} * 1.11111111111111111111111 = 3.4028235E+38
  - Denormalized
    exponent = -126
    min = 2^{-126} * 0.00000000000000000000001 = 1.4012985E-45
    max = 2^{-126} * 0.11111111111111111111111 = 1.1754942E-38
- double
  - Normalized
    -1023 < exponent < +1024
    min = 2^{-1022} * 1.0000000000000000000000000000000000000000000000000000
        = 2.2250738585072014E-308
    max = 2^{+1023} * 1.1111111111111111111111111111111111111111111111111111
        = 1.7976931348623157E+308
  - Denormalized
    exponent = -1022
    min = 2^{-1022} * 0.0000000000000000000000000000000000000000000000000001
        = 4.9E-324
    max = 2^{-1022} * 0.1111111111111111111111111111111111111111111111111111
        = 2.225073858507201E-308
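These limits can be reproduced by building each value from its bit pattern and comparing it with the constants in the wrapper classes. The sketch below assumes Java 6 or later (for Float.MIN_NORMAL and Double.MIN_NORMAL); the class name FloatLimits and its methods are hypothetical.

```java
/** Builds the extreme float and double values from their raw bit patterns. */
class FloatLimits {

    // float: biased exponent 1, significand 0 -> smallest normalized value
    static float smallestNormalFloat()   { return Float.intBitsToFloat(0x00800000); }
    // float: biased exponent 0, significand all ones -> largest denormalized value
    static float largestDenormalFloat()  { return Float.intBitsToFloat(0x007FFFFF); }
    // float: biased exponent 0, lowest significand bit set -> smallest denormalized value
    static float smallestDenormalFloat() { return Float.intBitsToFloat(0x00000001); }
    // float: biased exponent 254, significand all ones -> largest finite value
    static float largestFloat()          { return Float.intBitsToFloat(0x7F7FFFFF); }

    // The corresponding double patterns use an 11-bit exponent and 52-bit significand.
    static double smallestNormalDouble()   { return Double.longBitsToDouble(0x0010000000000000L); }
    static double smallestDenormalDouble() { return Double.longBitsToDouble(0x0000000000000001L); }
    static double largestDouble()          { return Double.longBitsToDouble(0x7FEFFFFFFFFFFFFFL); }

    public static void main(String[] args) {
        System.out.println(smallestNormalFloat() == Float.MIN_NORMAL);    // true
        System.out.println(smallestDenormalFloat() == Float.MIN_VALUE);   // true
        System.out.println(largestFloat() == Float.MAX_VALUE);            // true
        System.out.println(largestDenormalFloat() < Float.MIN_NORMAL);    // true
        System.out.println(smallestNormalDouble() == Double.MIN_NORMAL);  // true
        System.out.println(smallestDenormalDouble() == Double.MIN_VALUE); // true
        System.out.println(largestDouble() == Double.MAX_VALUE);          // true
    }
}
```

Note that Float.MIN_VALUE and Double.MIN_VALUE are the smallest denormalized values, not the smallest normalized ones.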
The normalized/denormalized modes are not usually something the
programmer has to deal with, but they can matter in numerical
computing, since denormalized values carry fewer significant bits.
Next we look at the other special floating-point values.
Floating-Point Special Values
Operations with floating-point values never throw an
exception. (Exceptions are Java error conditions, to be discussed
later.) For example, even if an operation results in a divide by
zero, no exception is thrown. (An integer divided by zero does
throw an exception.)
Instead of signaling an error for abnormal operations,
the floating-point result is set to one of several special
floating-point values:
- +/- Zero : if the bits in both the exponent and the significand
  all equal 0, then the FP value is -0 or +0 depending on the
  sign bit.
  - Positive zero is produced by underflow from the
    positive direction, e.g.
    float x = 2.0e-45f * 1.0e-10f;
  - Negative zero is produced by underflow from the
    negative direction, e.g.
    float x = -2.0e-45f * 1.0e-10f;
- +/-Infinity : if all the bits in the exponent
  equal 1 and all the bits in the significand equal 0, then
  the FP value is -Infinity or +Infinity depending on the sign bit.
  - Positive infinity is produced by overflow of
    a positive value, or by dividing a positive value by zero
  - Negative infinity is produced by overflow of
    a negative value, or by dividing a negative value by zero
- NaN : if all the bits in the exponent equal 1 and any of the
  bits in the significand equal 1, then the FP value is
  Not-a-Number and the sign value is ignored. Produced by
  operations such as dividing zero by zero or taking the
  square root of -1.
Overflows, underflows and divide by zero in Java floating-point
arithmetic do not lead to error states. A division by zero leads to
the +/-Infinity value unless the numerator also equals zero, in
which case the NaN value appears. You can test for such values
using methods from the floating-point wrapper classes (see Chapter
3: Java.) such as Double.isNaN(double x). Also, the NaN
value can be checked for with the test if (x != x), which is
true only for NaN values.
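The tests mentioned above can be sketched as follows. The class SpecialValueChecks and its classify method are hypothetical names used only for illustration; Double.isNaN, Double.isInfinite, and Math.sqrt are standard library methods.

```java
/** Classifies a double as NaN, an infinity, or a finite value. */
class SpecialValueChecks {

    static String classify(double x) {
        if (Double.isNaN(x)) {
            return "NaN";
        }
        if (Double.isInfinite(x)) {
            return x > 0 ? "+Infinity" : "-Infinity";
        }
        return "finite";
    }

    public static void main(String[] args) {
        double zero = 0.0;
        System.out.println(classify(1.0 / zero));       // +Infinity
        System.out.println(classify(-1.0 / zero));      // -Infinity
        System.out.println(classify(zero / zero));      // NaN
        System.out.println(classify(Math.sqrt(-1.0)));  // NaN
        System.out.println(classify(1.5));              // finite

        // The self-comparison test: true only when the value is NaN.
        double nan = zero / zero;
        System.out.println(nan != nan);                 // true
    }
}
```

Double.isNaN is usually the more readable choice; the x != x form relies on the reader knowing the NaN comparison rules.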
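Apart from NaN, the finite floating-point numbers and the special
values are ordered from smallest to largest as follows: negative
infinity, negative finite nonzero values, negative and positive
zero, positive finite nonzero values, and positive infinity.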
The positive and negative zero values behave as follows:
- Positive zero and negative zero compare as equal
- 1.0 / (positive zero) ==> POSITIVE_INFINITY
- 1.0 / (negative zero) ==> NEGATIVE_INFINITY
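Because the two zeros compare as equal, a division is one way to tell them apart. The sketch below illustrates the three rules just listed; the class SignedZero and the helper isNegativeZero are hypothetical names.

```java
/** Demonstrates how positive and negative zero behave. */
class SignedZero {

    /** True only for -0.0: it equals zero but its reciprocal is negative infinity. */
    static boolean isNegativeZero(double x) {
        return x == 0.0 && 1.0 / x < 0.0;
    }

    public static void main(String[] args) {
        double posZero = 0.0;
        double negZero = -0.0;

        System.out.println(posZero == negZero);  // true
        System.out.println(1.0 / posZero == Double.POSITIVE_INFINITY);  // true
        System.out.println(1.0 / negZero == Double.NEGATIVE_INFINITY);  // true
        System.out.println(isNegativeZero(negZero));  // true
        System.out.println(isNegativeZero(posZero));  // false
    }
}
```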
The NaN values are unordered. This means that:
- Numerical comparisons and tests for numerical
  equality result in false if either or both operands are NaN.
- A test for numerical equality of a value against
  itself results in false if and only if the value is NaN.
- A test for numerical inequality results in true
  if either operand is NaN.
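A short sketch of these comparison rules follows. The class NaNComparisons and its helper method are hypothetical names introduced for illustration.

```java
/** Shows that NaN is unordered with respect to every value, including itself. */
class NaNComparisons {

    /** True if a and b are related by <, >, or ==; false whenever either is NaN. */
    static boolean isOrdered(double a, double b) {
        return a < b || a > b || a == b;
    }

    public static void main(String[] args) {
        double nan = Double.NaN;

        System.out.println(isOrdered(1.0, 2.0));  // true
        System.out.println(isOrdered(nan, 1.0));  // false
        System.out.println(isOrdered(nan, nan));  // false
        System.out.println(nan == nan);           // false
        System.out.println(nan != 1.0);           // true
        System.out.println(nan != nan);           // true
    }
}
```

One practical consequence is that a guard such as if (x < limit) silently takes the else branch when x is NaN.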
Extended Exponents
The JVM Specifications after version 1.1 allow an implementation
to use extended exponent versions of either or both of the
float and double types during intermediate calculations to
avoid overflows and underflows.
The formats are described by the following parameters:
- N = number of bits in the mantissa (significand)
- K = number of bits in the exponent
- Emax = maximum value of the exponent
- Emin = minimum value of the exponent
The table lists the floating-point parameters allowed for
the four formats.

Parameter | float | float-extended-exponent | double | double-extended-exponent |
N | 24 | 24 | 53 | 53 |
K | 8 | > 10 | 11 | > 14 |
Emax | +127 | > +1022 | +1023 | > +16382 |
Emin | -126 | < -1021 | -1022 | < -16381 |
The final accessible floating-point results will be in float
or double
types but intermediate floating-point values can use the larger
extended exponent representations if the platform processor allows
it. There is no access for the Java programmer to the extended exponent
types.
The JVM does not support either the official IEEE 754 single extended
or double extended format since these extended formats require extended
precision, i.e. longer significand, in addition to the extended
exponent ranges shown in the above table.
The documentation for a particular JVM should indicate whether
it allows for the extended exponent options.
The modifier strictfp in front of a method forces every
intermediate value within that method to stay in the standard
32-bit float and 64-bit double formats, with no extended
exponents. This is useful if one wants to ensure exactly the same
results regardless of the platform or JVM implementation.
(This is not related to the StrictMath
class discussed in the Math class section.)
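As a rough illustration, the method below overflows in its intermediate product. The class StrictDemo and the method scaledProduct are hypothetical names. With strictfp the product must overflow to Infinity on every platform, whereas a JVM using extended exponents could, in principle, have kept the intermediate value finite. On Java 17 and later all floating-point arithmetic is strict, so the modifier is redundant there and the compiler may issue a warning.

```java
/** Illustrates a calculation whose intermediate result exceeds the double range. */
class StrictDemo {

    /** Computes a * b / c with every intermediate value held to the double format. */
    static strictfp double scaledProduct(double a, double b, double c) {
        return a * b / c;
    }

    public static void main(String[] args) {
        // 1.0e308 * 10.0 is larger than Double.MAX_VALUE, so the strict
        // intermediate result is Infinity, and Infinity / 10.0 stays Infinity.
        System.out.println(scaledProduct(1.0e308, 10.0, 10.0));  // Infinity
    }
}
```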
Floating-Point Literals and Rounding Rules
Some more notes about
Java floating-point include:
Literals:
Floating-point literals default to double
unless appended with f or F:
  float x = 1.0;   // compile-time error
  float x = 1.0f;  // OK
  double x = 1.0;  // OK
Floating-point rounding:
The JVM uses IEEE 754 round-to-nearest mode: inexact
results are rounded to the nearest representable value, with
ties going to the value with a zero least-significant bit.
Instructions that convert values of floating-point types to
integer values will round towards zero.
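Both rounding rules can be observed with integer conversions. This is a sketch with hypothetical names (RoundingDemo, truncate, nearestFloat). It relies on the fact that float has 24 significand bits, so integers just above 2^24 = 16777216 are spaced two apart and every odd integer there is an exact tie between its two float neighbors.

```java
/** Demonstrates round-toward-zero casts and round-to-nearest-even conversions. */
class RoundingDemo {

    /** Floating-point to integer conversion discards the fraction (rounds toward zero). */
    static int truncate(double x) {
        return (int) x;
    }

    /** int to float conversion rounds to the nearest float, ties to even. */
    static float nearestFloat(int n) {
        return (float) n;
    }

    public static void main(String[] args) {
        System.out.println(truncate(2.9));    // 2
        System.out.println(truncate(-2.9));   // -2

        // 16777217 is halfway between 16777216 and 16777218;
        // the tie goes to 16777216, whose last significand bit is 0.
        System.out.println(nearestFloat(16777217) == 16777216.0f);  // true

        // 16777219 is halfway between 16777218 and 16777220;
        // the tie goes to 16777220, whose last significand bit is 0.
        System.out.println(nearestFloat(16777219) == 16777220.0f);  // true
    }
}
```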
Floating-Point Programming Notes
In general, it is safest to do floating-point calculations in double
type. This helps to reduce round-off errors that can reduce precision
during intermediate calculations. (You can always cast the final
value to float if that is a more convenient size for I/O or storage.)
There can be some performance tradeoff, since double operations
involve more data transfer, but the size of the tradeoff depends
on the JVM and the platform. (In Chapter
12 we discuss techniques for measuring code performance.)
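The effect of round-off during intermediate calculations can be seen by accumulating the same sum in both types. This is only a sketch; the class AccumulationError and its methods are hypothetical names, and the exact sums printed may be worth checking on your own JVM.

```java
/** Compares round-off error when summing 0.1 repeatedly in float and in double. */
class AccumulationError {

    /** Adds 0.1f to a float accumulator n times. */
    static float sumAsFloat(int n) {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            sum += 0.1f;
        }
        return sum;
    }

    /** Adds 0.1 to a double accumulator n times. */
    static double sumAsDouble(int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += 0.1;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1000000;
        double exact = 100000.0;  // the mathematically exact result of n * 0.1
        System.out.println("float  error = " + Math.abs(sumAsFloat(n) - exact));
        System.out.println("double error = " + Math.abs(sumAsDouble(n) - exact));
    }
}
```

The float accumulator drifts much further from the exact value than the double accumulator, which is the motivation for computing in double and casting only the final result.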
The representations of the primitives are the same on all machines
to ensure the portability of the code. However, during calculations
involving floating-point values, intermediate values can exceed
the standard exponent ranges if allowed by the particular processor
(see table above).
The strictfp modifier on a class or method requires that the values
remain within the range allowed by the Java specifications
throughout the calculation to ensure the same results on all
platforms.
Floating-Point Demo
Here we use an applet to display results of several
math expressions. To see outputs from the print
statements run with an appletviewer or look in the browser's
Java
console. You can also run it as
an application. Try to predict the results before looking at the
output.
import java.applet.Applet;
import java.awt.*;

/** This applet tests various math expressions.
  * Run with appletviewer to see print out on
  * screen or with a browser Java console.
  **/
public class FPSpecialValues extends Applet {

  public void init () {
    runTests ();
  }

  /** Evaluates each expression and prints the result. */
  static void runTests () {
    // FP literals are double type by default.
    // Append F or f to make float or cast to float
    float x = 5.1f;
    float y = 0.0f;
    float div_by_zero = x/y;      // positive / zero
    System.out.println ("Divide By Zero = x/y = " + div_by_zero + "\n");

    x = -1.0f;
    div_by_zero = x/y;            // negative / zero
    System.out.println ("Divide negative by zero = x/y = "
                        + div_by_zero + "\n");

    x = 2.0e-45f;
    y = 1.0e-10f;
    float positive_underflow = x*y;   // too small for a float
    System.out.println ("Positive underflow = "
                        + positive_underflow + "\n");

    x = -2.0e-45f;
    y = 1.0e-10f;
    float negative_underflow = x*y;
    System.out.println ("Negative underflow = "
                        + negative_underflow + "\n");

    x = 1.0f;
    y = negative_underflow;
    float div_by_neg_zero = x/y;
    System.out.println ("Divide 1 by negative zero = "
                        + div_by_neg_zero + "\n");

    x = 0.0f;
    y = 0.0f;
    float div_zero_by_zero = x/y;
    System.out.println ("Divide zero by zero = "
                        + div_zero_by_zero + "\n");
  }

  public void paint (Graphics g) {
    g.drawString ("Math tests",20,20);
  }

  /** Lets the same tests run as an application. */
  public static void main (String[] args) {
    runTests ();
  }
}
Latest update: Oct. 15, 2004