This reference page gives various technical details of floating-point (FP) numbers in Java. This information is useful if you plan to do extensive numerical calculations in Java. We recommend that newcomers to Java just skim it and come back to it later as needed.

Floating-Point Representations

Floating-point values in Java, which follows most of the IEEE 754 floating-point standard, are represented by two types: float and double.
As shown previously, the bit representation for float is

| 1 bit | 8 bits | 23 bits |
| Sign | exponent | significand |

and for double

| 1 bit | 11 bits | 52 bits |
| Sign | exponent | significand |

For float the 8 bits of the exponent give values in the range of 0-255. However, 0 and 255 are special values (discussed below), so the allowed values range from 1 to 254. A bias of 127 is subtracted to give an unbiased exponent range of -126 to 127.

Similarly, for double the 11 bits of the exponent give values in the range of 0-2047. In this case, 0 and 2047 are special values (discussed below), so the allowed values range from 1 to 2046. A bias of 1023 is subtracted to give an unbiased exponent range of -1022 to 1023.

The float representation gives 6 to 9 digits of decimal precision while double gives 15 to 17 digits of decimal precision.

When the exponent values are in their allowed unbiased ranges, the representations are said to be normalized. In the normalized mode, the b0 value in

$(-1)^s \cdot (b_0 + b_1 \cdot 2^{-1} + b_2 \cdot 2^{-2} + b_3 \cdot 2^{-3} + \dots + b_{n-1} \cdot 2^{-(n-1)}) \cdot 2^{\text{exponent}}$

is taken as 1 without being stored, so that the effective number of significand bits is increased to 24 for float and 53 for double.
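The field layout and the bias can be checked directly from Java. The following is a minimal sketch, not part of the original page: the class name FloatFields and its helper methods are hypothetical, and it relies only on the standard Float.floatToIntBits method to expose the raw bits of a float.

```java
// Sketch: split a float into its sign, exponent and significand fields.
// FloatFields, decode and unbiasedExponent are illustrative names only.
class FloatFields {

    /** Returns {sign, biasedExponent, fraction} for the given float. */
    static int[] decode(float value) {
        int bits = Float.floatToIntBits(value);
        int sign = bits >>> 31;                // 1 bit
        int biasedExp = (bits >>> 23) & 0xFF;  // 8 bits
        int fraction = bits & 0x7FFFFF;        // 23 stored significand bits
        return new int[] { sign, biasedExp, fraction };
    }

    /** Unbiased exponent of a normalized float (bias of 127 removed). */
    static int unbiasedExponent(float value) {
        return decode(value)[1] - 127;
    }

    public static void main(String[] args) {
        float x = -6.5f; // binary -1.101 * 2^2
        int[] f = decode(x);
        System.out.println("sign              = " + f[0]);
        System.out.println("biased exponent   = " + f[1]);
        System.out.println("unbiased exponent = " + unbiasedExponent(x));
        System.out.println("stored fraction   = " + Integer.toBinaryString(f[2]));
    }
}
```

For -6.5f the stored fraction holds only the bits after the binary point (101 followed by zeros); the leading 1 of 1.101 is the implicit b0 bit described above.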
When the biased exponent is zero (i.e. all exponent bits are zero), the value is denormalized and the b0 value is taken as 0. The exponent is taken to be -126 for float and -1022 for double. The denormalized mode allows for a "smoother approach to zero" at the smallest value range.

The following shows the minimum and maximum values possible with these types in the two different modes (significands are written in binary):
              
float

  Normalized: -127 < exponent < +128

    min = 2^-126 * 1.00000000000000000000000 = 1.17549435E-38
    max = 2^+127 * 1.11111111111111111111111 = 3.4028235E+38

  Denormalized: exponent = -126

    min = 2^-126 * 0.00000000000000000000001 = 1.4012985E-45
    max = 2^-126 * 0.11111111111111111111111 = 1.1754942E-38

double

  Normalized: -1023 < exponent < +1024

    min = 2^-1022 * 1.0000000000000000000000000000000000000000000000000000
        = 2.2250738585072014E-308
    max = 2^+1023 * 1.1111111111111111111111111111111111111111111111111111
        = 1.7976931348623157E+308

  Denormalized: exponent = -1022

    min = 2^-1022 * 0.0000000000000000000000000000000000000000000000000001
        = 4.9E-324
    max = 2^-1022 * 0.1111111111111111111111111111111111111111111111111111
        = 2.225073858507201E-308
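These limits are available as constants in the wrapper classes, so they need not be typed in by hand. The sketch below is an illustration only: the class FloatRanges and the helper isDenormalized are hypothetical names, while Float.MIN_NORMAL, Float.MIN_VALUE, Float.MAX_VALUE (and the Double equivalents) and Math.nextDown are standard library members.

```java
// Sketch: print the range limits and classify a value as denormalized.
// FloatRanges and isDenormalized are illustrative names only.
class FloatRanges {

    /** True if v is nonzero but smaller in magnitude than the smallest normalized float. */
    static boolean isDenormalized(float v) {
        return v != 0.0f && Math.abs(v) < Float.MIN_NORMAL;
    }

    public static void main(String[] args) {
        System.out.println("float  normalized   min = " + Float.MIN_NORMAL);
        System.out.println("float  normalized   max = " + Float.MAX_VALUE);
        System.out.println("float  denormalized min = " + Float.MIN_VALUE);
        System.out.println("float  denormalized max = " + Math.nextDown(Float.MIN_NORMAL));

        System.out.println("double normalized   min = " + Double.MIN_NORMAL);
        System.out.println("double normalized   max = " + Double.MAX_VALUE);
        System.out.println("double denormalized min = " + Double.MIN_VALUE);
        System.out.println("double denormalized max = " + Math.nextDown(Double.MIN_NORMAL));

        System.out.println("MIN_VALUE denormalized?  " + isDenormalized(Float.MIN_VALUE));
        System.out.println("MIN_NORMAL denormalized? " + isDenormalized(Float.MIN_NORMAL));
    }
}
```

Java prints the shortest decimal string that uniquely identifies each value, so the smallest denormalized float appears as 1.4E-45 rather than 1.4012985E-45.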
 
  
The normalized/denormalized modes are not usually something the programmer has to deal with, but they can matter in numerical computing.

Next we look at the other special floating-point values.

Floating-Point Special Values

Floating-point operations never throw an exception. (Exceptions are Java error conditions, to be discussed later.) For example, even if an operation results in a divide by zero, no exception is thrown. (An integer divided by zero does give an exception.)

Instead of raising an error for an abnormal operation, the floating-point result is set to one of several special floating-point values:
                
+/- Zero : if the bits in both the exponent and the significand all equal 0, then the FP value is -0 or +0 depending on the sign bit.

    Positive zero is produced by underflow from the positive direction, e.g. in float arithmetic x = 2.0e-45f * 1.0e-10f

    Negative zero is produced by underflow from the negative direction, e.g. in float arithmetic x = -2.0e-45f * 1.0e-10f

+/-Infinity : if all the bits in the exponent equal 1 and all the bits in the significand equal 0, then the FP value is -Infinity or +Infinity depending on the sign bit.

    Positive infinity is produced by overflow of a positive value

    Negative infinity is produced by overflow of a negative value

NaN : if all the bits in the exponent equal 1 and any of the bits in the significand equal 1, then the FP value is Not-a-Number and the sign bit is ignored. Produced by operations such as zero divided by zero and the square root of -1.

Overflows, underflows and divisions by zero in Java do not lead to error states. A division by zero yields the +/-Infinity value unless the numerator is also zero, in which case the NaN value appears. You can test for such values using methods from the floating-point wrapper classes (see Chapter 3: Java) such as Double.isNaN(double x). The NaN value can also be detected with the test if (x != x), which is true only when x is NaN.

Finite floating-point numbers and the special values are ordered from smallest to largest as follows: negative infinity, negative finite values, negative and positive zero, positive finite values, positive infinity.

The positive and negative zero values act as follows:

    Positive zero and negative zero compare as equal

    1.0 / (positive zero) ==> POSITIVE_INFINITY

    1.0 / (negative zero) ==> NEGATIVE_INFINITY
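The behavior of the two zeros can be verified with a few lines of code. This is a minimal sketch rather than anything from the original page: SignedZero and isNegativeZero are hypothetical names, and the bit comparison through Double.doubleToLongBits is just one possible way to tell the zeros apart, since == treats them as equal.

```java
// Sketch: positive and negative zero compare equal but divide differently.
// SignedZero and isNegativeZero are illustrative names only.
class SignedZero {

    /** Distinguishes -0.0 from +0.0 by comparing raw bit patterns. */
    static boolean isNegativeZero(double v) {
        return Double.doubleToLongBits(v) == Double.doubleToLongBits(-0.0);
    }

    public static void main(String[] args) {
        double pz = 0.0;
        double nz = -0.0;

        System.out.println("pz == nz          : " + (pz == nz));
        System.out.println("1.0 / pz          : " + (1.0 / pz));
        System.out.println("1.0 / nz          : " + (1.0 / nz));
        System.out.println("isNegativeZero(pz): " + isNegativeZero(pz));
        System.out.println("isNegativeZero(nz): " + isNegativeZero(nz));

        // Underflow from the negative direction in float arithmetic.
        float underflow = -2.0e-45f * 1.0e-10f;
        System.out.println("negative underflow: " + underflow);
    }
}
```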
              The NaN values are unordered. This means that:
                Numerical comparisons and tests for numerical 
                  equality result in false if either or both operands are NaN. 
                  
 
 
                A test for numerical equality of a value against 
                  itself results in false if and only if the value is NaN. 
 
 
                A test for numerical inequality results in true 
                  if either operand is NaN  
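The three NaN rules above can be exercised as follows. This is a short sketch under the assumption that a NaN is obtained from 0.0 / 0.0; the class NaNChecks and the method isNaNBySelfCompare are hypothetical names, while Double.isNaN and Math.sqrt are the standard library methods.

```java
// Sketch: comparisons involving NaN.
// NaNChecks and isNaNBySelfCompare are illustrative names only.
class NaNChecks {

    /** The x != x idiom: true only when x is NaN. */
    static boolean isNaNBySelfCompare(double x) {
        return x != x;
    }

    public static void main(String[] args) {
        double nan = 0.0 / 0.0;

        System.out.println("nan == nan      : " + (nan == nan));
        System.out.println("nan != nan      : " + (nan != nan));
        System.out.println("nan <  1.0      : " + (nan < 1.0));
        System.out.println("nan >  1.0      : " + (nan > 1.0));
        System.out.println("nan != 1.0      : " + (nan != 1.0));
        System.out.println("Double.isNaN    : " + Double.isNaN(nan));
        System.out.println("sqrt(-1) is NaN : " + Double.isNaN(Math.sqrt(-1.0)));
    }
}
```

In application code Double.isNaN (or Float.isNaN) is usually the more readable choice; the self-comparison is shown only because the text mentions it.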
              Extended Exponents

              The JVM specifications after version 1.1 allow an implementation to use extended-exponent versions of either or both of the float and double types during intermediate calculations to avoid overflows and underflows.
               
The table below lists the floating-point parameters allowed for the four formats, where:

    N = number of bits in the mantissa (significand)
    K = number of bits in the exponent
    Emax = maximum value of the exponent
    Emin = minimum value of the exponent
 
              
                 
| Parameter | float | float-extended-exponent | double | double-extended-exponent |
| N | 24 | 24 | 53 | 53 |
| K | 8 | > 10 | 11 | > 14 |
| Emax | +127 | > +1022 | +1023 | > +16382 |
| Emin | -126 | < -1021 | -1022 | < -16381 |

The final accessible floating-point results will be in float or double types, but intermediate floating-point values can use the larger extended-exponent representations if the platform processor allows it. The Java programmer has no access to the extended-exponent types.

The JVM does not support either the official IEEE 754 single extended or double extended format, since these extended formats require extended precision, i.e. a longer significand, in addition to the extended exponent ranges shown in the above table.
The documentation for a particular JVM should indicate whether it allows for the extended-exponent options.

The modifier strictfp in front of a method forces every intermediate result within that method to stay in the standard float or double format, with no extended exponent. This is useful if one wants to ensure exactly the same results regardless of the platform or JVM implementation. (This is not related to the StrictMath class discussed in the Math class section.)

Floating-Point Literals and Rounding Rules

Some more notes about Java floating-point include:

Literals

Literals default to double unless appended with f or F:

    float x = 1.0;   // compile-time error
    float x = 1.0f;  // OK
    double x = 1.0;  // OK
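The suffix changes more than the declared type: a float literal and a double literal written with the same digits can denote different values. The sketch below illustrates this with 0.1, which neither type can represent exactly. The class LiteralPrecision and the method sameValue are hypothetical names introduced only for this illustration.

```java
// Sketch: a float literal and a double literal with the same digits
// may hold different values once both are compared as double.
// LiteralPrecision and sameValue are illustrative names only.
class LiteralPrecision {

    /** Compares a float with a double; the float is widened to double first. */
    static boolean sameValue(float f, double d) {
        return f == d;
    }

    public static void main(String[] args) {
        float f = 0.1f;   // nearest float to 0.1
        double d = 0.1;   // nearest double to 0.1

        System.out.println("0.1f == 0.1          : " + sameValue(f, d));
        System.out.println("0.1f == (float) 0.1  : " + (f == (float) d));
        System.out.println("0.5f == 0.5          : " + sameValue(0.5f, 0.5));
        System.out.println("0.1f widened to double = " + (double) f);
    }
}
```

The comparison with 0.5 succeeds because 0.5 is a power of two and is stored exactly in both types.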
  
Floating-point rounding:

The JVM uses IEEE 754 round-to-nearest mode: inexact results are rounded to the nearest representable value, with ties going to the value with a zero least-significant bit. Instructions that convert values of floating-point types to integer values round towards zero.

Floating-Point Programming Notes

In general, it is safest to do floating-point calculations in double type. This helps to reduce the round-off errors that can erode precision during intermediate calculations. (You can always cast the final value to float if that is a more convenient size for I/O or storage.) There can be some performance tradeoff, since double operations involve more data transfer, but the size of the tradeoff depends on the JVM and the platform. (In Chapter 12 we discuss techniques for measuring code performance.)

The representations of the primitives are the same on all machines to ensure the portability of the code. However, during calculations involving floating-point values, intermediate values can exceed the standard exponent ranges if allowed by the particular processor (see table above).

The strictfp modifier of classes or methods requires that the values remain within the range allowed by the Java specifications throughout the calculation, to ensure the same results on all platforms.

Floating-Point Demo

Here we use an applet to display the results of several math expressions. To see the output from the print statements, run it with appletviewer or look in the browser's Java console. You can also run it as an application. Try to predict the results before looking at the output.
              
                 
import java.applet.Applet;
import java.awt.*;

/** This applet tests various math expressions.
 *  Run with appletviewer to see print out on
 *  screen or with a browser Java console.
 **/
public class FPSpecialValues extends Applet {

  public void init() {
    // FP literals are double type by default.
    // Append F or f to make float or cast to float
    float x = 5.1f;
    float y = 0.0f;

    float div_by_zero = x / y;
    System.out.println("Divide By Zero = x/y = " + div_by_zero + "\n");

    x = -1.0f;
    div_by_zero = x / y;
    System.out.println("Divide negative by zero = x/y = " + div_by_zero + "\n");

    x = 2.0e-45f;
    y = 1.0e-10f;
    float positive_underflow = x * y;
    System.out.println("Positive underflow = " + positive_underflow + "\n");

    x = -2.0e-45f;
    y = 1.0e-10f;
    float negative_underflow = x * y;
    System.out.println("Negative underflow = " + negative_underflow + "\n");

    x = 1.0f;
    y = negative_underflow;
    float div_by_neg_zero = x / y;
    System.out.println("Divide 1 by negative zero = " + div_by_neg_zero + "\n");

    x = 0.0f;
    y = 0.0f;
    float div_zero_by_zero = x / y;
    System.out.println("Divide zero by zero = " + div_zero_by_zero + "\n");
  }

  public void paint(Graphics g) {
    g.drawString("Math tests", 20, 20);
  }
}
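The applet API is deprecated in recent JDK releases, so the same expressions may be easier to try from the command line. The following console variant is a sketch, not the original author's code: the class name FPSpecialValuesApp and the helpers divide and multiply are hypothetical, and the arithmetic is the same as in the applet above.

```java
// Sketch: console version of the applet's tests.
// FPSpecialValuesApp, divide and multiply are illustrative names only.
class FPSpecialValuesApp {

    static float divide(float x, float y) {
        return x / y;
    }

    static float multiply(float x, float y) {
        return x * y;
    }

    public static void main(String[] args) {
        System.out.println("Divide by zero            = " + divide(5.1f, 0.0f));
        System.out.println("Divide negative by zero   = " + divide(-1.0f, 0.0f));

        float positiveUnderflow = multiply(2.0e-45f, 1.0e-10f);
        float negativeUnderflow = multiply(-2.0e-45f, 1.0e-10f);
        System.out.println("Positive underflow        = " + positiveUnderflow);
        System.out.println("Negative underflow        = " + negativeUnderflow);

        System.out.println("Divide 1 by negative zero = " + divide(1.0f, negativeUnderflow));
        System.out.println("Divide zero by zero       = " + divide(0.0f, 0.0f));
    }
}
```

With a JDK that supports single-file launch, this can be run as java FPSpecialValuesApp.java without a separate compile step.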
Latest update: Oct. 15, 2004