The floating-point types are float and double, which are conceptually associated with the single-precision 32-bit and double-precision 64-bit format IEEE 754 values and operations as specified in IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-1985 (IEEE, New York).
The IEEE 754 standard includes not only positive and negative numbers that consist of a sign and magnitude, but also positive and negative zeros, positive and negative infinities, and special Not-a-Number values (hereafter abbreviated NaN). A NaN value is used to represent the result of certain invalid operations such as dividing zero by zero. NaN constants of both float and double type are predefined as Float.NaN and Double.NaN.
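The special values described above can be observed directly from Java. The following is a minimal sketch; the class name SpecialValuesDemo and the helper classify are made up for illustration, while Double.isNaN and Double.isInfinite are standard library methods.

```java
// Illustrative sketch: SpecialValuesDemo and classify are hypothetical names.
class SpecialValuesDemo {
    // Labels a double as one of the IEEE 754 special values or as an ordinary number.
    static String classify(double x) {
        if (Double.isNaN(x)) {
            return "NaN";                       // NaN never equals anything, so == cannot detect it
        }
        if (Double.isInfinite(x)) {
            return x > 0 ? "+Infinity" : "-Infinity";
        }
        if (x == 0.0) {
            // +0.0 and -0.0 compare equal; dividing by them reveals the sign
            return (1.0 / x > 0) ? "+0" : "-0";
        }
        return "finite nonzero";
    }

    public static void main(String[] args) {
        System.out.println(classify(0.0 / 0.0));    // zero divided by zero is an invalid operation
        System.out.println(classify(1.0 / 0.0));
        System.out.println(classify(-1.0 / 0.0));
        System.out.println(classify(-0.0));
        System.out.println(Double.NaN == Double.NaN); // false: NaN is unequal even to itself
    }
}
```

Because NaN compares unequal to everything, code that needs to detect it should call Float.isNaN or Double.isNaN rather than use ==.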
Every implementation of the Java programming language is required to support two standard sets of floating-point values, called the float value set and the double value set. In addition, an implementation of the Java programming language may support either or both of two extended-exponent floating-point value sets, called the float-extended-exponent value set and the double-extended-exponent value set. These extended-exponent value sets may, under certain circumstances, be used instead of the standard value sets to represent the values of expressions of type float or double (§5.1.13, §15.4).
The finite nonzero values of any floating-point value set can all be expressed in the form s · m · 2^(e - N + 1), where s is +1 or -1, m is a positive integer less than 2^N, and e is an integer between Emin = -(2^(K-1) - 2) and Emax = 2^(K-1) - 1, inclusive, and where N and K are parameters that depend on the value set. Some values can be represented in this form in more than one way; for example, supposing that a value v in a value set might be represented in this form using certain values for s, m, and e, then if it happened that m were even and e were less than 2^(K-1), one could halve m and increase e by 1 to produce a second representation for the same value v. A representation in this form is called normalized if m ≥ 2^(N-1); otherwise the representation is said to be denormalized. If a value in a value set cannot be represented in such a way that m ≥ 2^(N-1), then the value is said to be a denormalized value, because it has no normalized representation.
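To make the form s · m · 2^(e - N + 1) concrete, here is a rough sketch that pulls s, m, and e out of a float and puts them back together. It assumes the float value set (N = 24, K = 8) and the standard 1/8/23 bit layout; FloatParts, decompose, and recompose are names invented for this example.

```java
// Illustrative sketch: FloatParts, decompose and recompose are hypothetical names.
class FloatParts {
    static final int N = 24; // precision of the float value set

    // Returns {s, m, e} with value == s * m * 2^(e - N + 1); finite nonzero input only.
    static int[] decompose(float value) {
        if (value == 0f || Float.isNaN(value) || Float.isInfinite(value)) {
            throw new IllegalArgumentException("finite nonzero values only");
        }
        int bits = Float.floatToIntBits(value);
        int s = (bits < 0) ? -1 : 1;          // top bit is the sign
        int biased = (bits >>> 23) & 0xFF;    // 8-bit exponent field
        int fraction = bits & 0x7FFFFF;       // 23-bit fraction field
        int m;
        int e;
        if (biased == 0) {
            m = fraction;                     // denormalized: no implicit leading 1
            e = -126;                         // e is pinned to Emin
        } else {
            m = fraction | (1 << 23);         // normalized: m >= 2^(N-1)
            e = biased - 127;                 // remove the exponent bias
        }
        return new int[] { s, m, e };
    }

    // Rebuilds the float; exact because m < 2^24 and scalb scales by a power of two.
    static float recompose(int[] parts) {
        return Math.scalb((float) (parts[0] * parts[1]), parts[2] - N + 1);
    }

    public static void main(String[] args) {
        int[] p = decompose(0.6f);
        System.out.println("s=" + p[0] + " m=" + p[1] + " e=" + p[2]);
        System.out.println(recompose(p) == 0.6f);
    }
}
```

For 0.6f this yields s = 1, m = 10066330, e = -1, a normalized representation since m ≥ 2^23. Float.MIN_VALUE comes out as m = 1, e = -126, which is a denormalized value.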
The constraints on the parameters N and K (and on the derived parameters Emin and Emax) for the two required and two optional floating-point value sets are summarized in Table 4.1.
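For the two required value sets, the derived parameters can be checked against the constants that java.lang.Float and java.lang.Double expose. The sketch below assumes K = 8 for float and K = 11 for double; ExponentRanges, emin, and emax are illustrative names, not part of any library.

```java
// Illustrative sketch: ExponentRanges, emin and emax are hypothetical names.
class ExponentRanges {
    // Emax = 2^(K-1) - 1
    static int emax(int k) {
        return (1 << (k - 1)) - 1;
    }

    // Emin = -(2^(K-1) - 2)
    static int emin(int k) {
        return -((1 << (k - 1)) - 2);
    }

    public static void main(String[] args) {
        // K = 8 for float and K = 11 for double are assumptions stated in the lead-in
        System.out.println("float:  Emin=" + emin(8) + " Emax=" + emax(8)
                + " (library: " + Float.MIN_EXPONENT + ", " + Float.MAX_EXPONENT + ")");
        System.out.println("double: Emin=" + emin(11) + " Emax=" + emax(11)
                + " (library: " + Double.MIN_EXPONENT + ", " + Double.MAX_EXPONENT + ")");
    }
}
```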
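S: sign bit, E: exponent bits, M: mantissa bits
float: S1_E8_M23. The exponent field has 8 bits and is stored with a bias of 127. The field values 0 and 255 are reserved (0 for zero and denormalized values, 255 for infinities and NaN), so the usable exponent range is -126 to 127. The largest finite float is (2 - 2^-23) * 2^127, just under 2^128, so float covers roughly -3.4 * 10^38 to +3.4 * 10^38.
double: S1_E11_M52. The exponent field has 11 bits and is stored with a bias of 1023. The field values 0 and 2047 are reserved in the same way, so the usable exponent range is -1022 to 1023. The largest finite double is (2 - 2^-52) * 2^1023, just under 2^1024, so double covers roughly -1.8 * 10^308 to +1.8 * 10^308.
PS: These ranges agree with the official documentation, which gives -126 to 127 for the float exponent and -1022 to 1023 for the double exponent, rather than the -128 to 127 and -1024 to 1023 that a plain signed 8-bit or 11-bit integer would suggest.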
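float: S1_E8_M23. The mantissa has 23 bits, and 2^23 = 8388608 has 7 digits. A float can therefore carry at most 7 significant decimal digits, of which 6 are guaranteed, so the precision of float is 6 to 7 significant digits.
double: S1_E11_M52. The mantissa has 52 bits, and 2^52 = 4503599627370496 has 16 digits. By the same reasoning, the precision of double is 15 to 16 significant digits.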
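The limit on significant digits is easy to see with integers that need more bits than a float mantissa can hold. This is only a sketch; PrecisionDemo and sameFloat are names made up for the example.

```java
// Illustrative sketch: PrecisionDemo and sameFloat are hypothetical names.
class PrecisionDemo {
    // True when two different ints collapse to the same float after conversion.
    static boolean sameFloat(int a, int b) {
        return (float) a == (float) b;
    }

    public static void main(String[] args) {
        // 2^24 + 1 needs 25 significant bits, one more than a float can store
        System.out.println(sameFloat(16777216, 16777217));
        // A 9-digit integer keeps only its leading digits when stored as a float
        System.out.println((int) 123456789f);
        // Spacing between adjacent values near 1.0 for each type
        System.out.println(Math.ulp(1.0f));
        System.out.println(Math.ulp(1.0));
    }
}
```

The first line prints true, and the second prints 123456792 instead of 123456789: only the leading seven or so digits survive the conversion.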
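Converting a decimal fraction to binary: multiply the number by 2 and take the integer part as the first binary digit (1 if the product is at least 1, otherwise 0). Then multiply the remaining fractional part by 2 and take the integer part as the second binary digit, and so on, until the fractional part is 0.
Special case: if the fractional part starts to repeat, the process never terminates, so a finite number of binary digits cannot represent the decimal exactly. This is why decimal fractions pick up errors in programming languages.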
0.6 * 2 = 1.2  ->  1
0.2 * 2 = 0.4  ->  0
0.4 * 2 = 0.8  ->  0
0.8 * 2 = 1.6  ->  1
0.6 * 2 = 1.2  ->  1
0.2 * 2 = 0.4  ->  0
...
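The multiply-by-2 procedure can be sketched in a few lines. The version below uses BigDecimal so that the arithmetic itself introduces no binary rounding; FractionToBinary and toBinary are illustrative names, and the input is assumed to be a decimal string in the range 0 (inclusive) to 1 (exclusive).

```java
import java.math.BigDecimal;

// Illustrative sketch: FractionToBinary and toBinary are hypothetical names.
class FractionToBinary {
    // Returns up to maxBits binary digits of a decimal fraction in [0, 1).
    static String toBinary(String decimalFraction, int maxBits) {
        BigDecimal x = new BigDecimal(decimalFraction);
        BigDecimal two = BigDecimal.valueOf(2);
        StringBuilder bits = new StringBuilder();
        // Stop when the fractional part reaches 0 or the digit budget is used up
        while (x.signum() != 0 && bits.length() < maxBits) {
            x = x.multiply(two);
            if (x.compareTo(BigDecimal.ONE) >= 0) {
                bits.append('1');               // integer part is 1
                x = x.subtract(BigDecimal.ONE); // keep only the fractional part
            } else {
                bits.append('0');               // integer part is 0
            }
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBinary("0.6", 12));   // repeating pattern, cut off at 12 digits
        System.out.println(toBinary("0.625", 12)); // terminates after 3 digits
    }
}
```

For 0.6 the digits 1001 repeat forever, so the maxBits cutoff is what ends the loop; for 0.625 the fractional part reaches 0 and the result is exact.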
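Converting a binary fraction back to decimal: going from left to right, add up v[i] * 2^(-i), where i is the position of the digit counted from the left (starting at 1) and v[i] is the value of that digit. An example makes this clear: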
1 * 2^-1 + 0 * 2^-2 + 0 * 2^-3 + 1 * 2^-4 + 1 * 2^-5 + ...
= 1 * 0.5 + 0 + 0 + 1 * 1/16 + 1 * 1/32 + ...
= 0.5 + 0.0625 + 0.03125 + ...
= a value that gets arbitrarily close to 0.6
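The reverse direction, summing v[i] * 2^(-i), can be sketched the same way. BinaryToFraction and toDecimal are names chosen for this example; the input is assumed to be a string of '0' and '1' characters representing the digits after the binary point.

```java
// Illustrative sketch: BinaryToFraction and toDecimal are hypothetical names.
class BinaryToFraction {
    // Sums v[i] * 2^(-i) over the digits, with i starting at 1 on the left.
    static double toDecimal(String bits) {
        double sum = 0.0;
        double weight = 0.5;        // 2^-1, the weight of the first digit
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                sum += weight;
            }
            weight /= 2;            // the next digit is worth half as much
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("10011"));        // first five digits of 0.6
        System.out.println(toDecimal("100110011001")); // twelve digits: closer to 0.6
    }
}
```

Five digits give 0.59375 and twelve digits give 0.599853515625: each extra digit moves the sum closer to 0.6 without ever reaching it.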
float f = 0.6f;
double d1 = 0.6d;
double d2 = f;
System.out.println((d1 == d2) + " " + f + " " + d2); // false 0.6 0.6000000238418579
double d = 0.6;
System.out.println((float) d + " " + d); // 0.6 0.6
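The reason 0.6f and 0.6d compare unequal is that each stores a different nearby binary value. One way to inspect those stored values is the BigDecimal(double) constructor, which converts the binary value exactly. This is a small sketch; ExactValueDemo and exact are illustrative names.

```java
import java.math.BigDecimal;

// Illustrative sketch: ExactValueDemo and exact are hypothetical names.
class ExactValueDemo {
    // new BigDecimal(double) keeps every digit of the stored binary value.
    static BigDecimal exact(double x) {
        return new BigDecimal(x);
    }

    public static void main(String[] args) {
        System.out.println(exact(0.6f)); // the float nearest to 0.6, widened to double without change
        System.out.println(exact(0.6));  // the double nearest to 0.6
    }
}
```

The float nearest to 0.6 lies slightly above 0.6 and the double nearest to 0.6 lies slightly below it, so widening the float to double cannot make the two equal.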
// Note: the cases below were picked on purpose; NOT every calculation shows this kind of problem
System.out.println(0.05 + 0.01);  // 0.060000000000000005
System.out.println(1.0 - 0.42);   // 0.5800000000000001
System.out.println(4.015 * 100);  // 401.49999999999994
System.out.println(123.3 / 100);  // 1.2329999999999999
import java.math.BigDecimal;
import java.math.RoundingMode;

double addValue = BigDecimal.valueOf(0.05).add(BigDecimal.valueOf(0.01)).doubleValue();
System.out.println("0.05+0.01=" + (0.05 + 0.01) + " " + addValue); // 0.05+0.01=0.060000000000000005 0.06

double subtractValue = BigDecimal.valueOf(1.0).subtract(BigDecimal.valueOf(0.42)).doubleValue();
System.out.println("1.0-0.42=" + (1.0 - 0.42) + " " + subtractValue); // 1.0-0.42=0.5800000000000001 0.58

double multiplyValue = BigDecimal.valueOf(4.015).multiply(BigDecimal.valueOf(100)).doubleValue();
System.out.println("4.015*100=" + (4.015 * 100) + " " + multiplyValue); // 4.015*100=401.49999999999994 401.5

// divide takes an explicit scale and rounding mode; RoundingMode.HALF_UP replaces the deprecated BigDecimal.ROUND_HALF_UP
double divideValue = BigDecimal.valueOf(123.3).divide(BigDecimal.valueOf(100), 10, RoundingMode.HALF_UP).doubleValue();
System.out.println("123.3/100=" + (123.3 / 100) + " " + divideValue); // 123.3/100=1.2329999999999999 1.233
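How the BigDecimal is created matters as much as using it. The sketch below compares three ways of building the operands; BigDecimalConstruction and its three helper methods are names invented for illustration, while BigDecimal.valueOf and the two constructors are standard API.

```java
import java.math.BigDecimal;

// Illustrative sketch: the class and its helper methods are hypothetical names.
class BigDecimalConstruction {
    // Decimal strings are taken literally, so the result is exact.
    static BigDecimal addViaString(String a, String b) {
        return new BigDecimal(a).add(new BigDecimal(b));
    }

    // valueOf goes through Double.toString, i.e. the short decimal form of the double.
    static BigDecimal addViaValueOf(double a, double b) {
        return BigDecimal.valueOf(a).add(BigDecimal.valueOf(b));
    }

    // The double constructor copies the exact binary value, including its error.
    static BigDecimal addViaConstructor(double a, double b) {
        return new BigDecimal(a).add(new BigDecimal(b));
    }

    public static void main(String[] args) {
        System.out.println(addViaString("0.05", "0.01"));
        System.out.println(addViaValueOf(0.05, 0.01));
        System.out.println(addViaConstructor(0.05, 0.01)); // a long tail of digits, not 0.06
    }
}
```

The first two calls print 0.06. The third does not, because new BigDecimal(0.05) already contains the binary approximation of 0.05, which is why the examples in this post use BigDecimal.valueOf.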
import java.math.RoundingMode;
import java.text.DecimalFormat;

String pattern = "#,##0.00"; // always two decimal places, thousands separated by commas, at least one integer digit
DecimalFormat format = new DecimalFormat(pattern);
format.setRoundingMode(RoundingMode.HALF_UP); // the default is HALF_EVEN, not round half up
System.out.println(format.format(0.05 + 0.01)); // 0.06
System.out.println(format.format(1.0 - 0.42));  // 0.58
System.out.println(format.format(4.015 * 100)); // 401.50
System.out.println(format.format(123.3 / 100)); // 1.23
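The difference between the default HALF_EVEN and HALF_UP only shows up on exact ties. A sketch using 0.125, which is exactly representable in binary and so is a true tie at two decimal places, follows. RoundingModeDemo and format are illustrative names, and Locale.US is chosen here only to pin the decimal separator to a dot.

```java
import java.math.RoundingMode;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Illustrative sketch: RoundingModeDemo and format are hypothetical names.
class RoundingModeDemo {
    // Formats to two decimal places with the given rounding mode.
    static String format(double value, RoundingMode mode) {
        DecimalFormat f = new DecimalFormat("0.00", DecimalFormatSymbols.getInstance(Locale.US));
        f.setRoundingMode(mode);
        return f.format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(0.125, RoundingMode.HALF_EVEN)); // tie goes to the even digit
        System.out.println(format(0.125, RoundingMode.HALF_UP));   // tie goes away from zero
        System.out.println(new DecimalFormat("0.00").getRoundingMode()); // the default mode
    }
}
```

With HALF_EVEN the tie 0.125 becomes 0.12, with HALF_UP it becomes 0.13, which is the behavior most people expect from everyday rounding.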
double d = 0.06; // decimal literals are double by default in Java; the suffix "d" or "D" may be omitted
float f = 0.06f; // a float literal needs an explicit "f" or "F" suffix
System.out.println((0.05 + 0.01) + " " + (0.05f + 0.01f)); // 0.060000000000000005 0.060000002
System.out.println((d == (0.05 + 0.01)) + " " + (f == (0.05f + 0.01f))); // false false
System.out.println(d + " " + f + " " + (float) d + " " + (double) f); // 0.06 0.06 0.06 0.05999999865889549
System.out.println((d == f) + " " + (d == (double) f) + " " + ((float) d == f)); // false false true
// The narrowing cast makes the last comparison succeed, but deliberately throwing away precision is rarely a good idea.
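Since == on computed floating-point values is fragile, a common alternative is to compare within a tolerance. The following is only a sketch of that idea: NearlyEqual, nearlyEqual, and the epsilon values are assumptions for illustration, and an absolute tolerance like this suits values of moderate magnitude, not very large or very small ones.

```java
// Illustrative sketch: NearlyEqual and nearlyEqual are hypothetical names,
// and the tolerance values are arbitrary choices for the demo.
class NearlyEqual {
    // True when a and b differ by no more than epsilon (absolute tolerance).
    static boolean nearlyEqual(double a, double b, double epsilon) {
        return Math.abs(a - b) <= epsilon;
    }

    public static void main(String[] args) {
        double d = 0.06;
        float f = 0.06f;
        System.out.println(d == (0.05 + 0.01));                  // exact comparison fails
        System.out.println(nearlyEqual(d, 0.05 + 0.01, 1e-9));   // tolerant comparison succeeds
        System.out.println(nearlyEqual(d, f, 1e-6));             // float error fits a looser tolerance
    }
}
```

The tolerance has to match the precision of the least precise operand: 1e-9 is fine for two doubles here, but a float widened to double is already off by about 1.3e-9 at this magnitude.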