Reading Numbers with Many Decimal Places in R

Date: 2019-11-06

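When reading data from Excel and CSV files, I keep running into problems with decimal points: the values are not displayed correctly. It is long past time to stop using the read.csv function for this.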

Here are two better packages for reading data, both from Hadley: readxl and readr.

Test data:
[image: test data]

Function overview:

readxl::read_excel("test.xlsx", col_names = FALSE, col_types = rep("numeric", 3))

col_types accepts four modes: "blank", "numeric", "date" or "text". "blank" skips the column (newer readxl releases spell this "skip"); the other three are self-explanatory.
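As a rough illustration of per-column types in readxl, the sketch below reads the example workbook that ships with the package instead of test.xlsx, which is not available here. It assumes a readxl release that accepts "skip" and that bundles datasets.xlsx with an iris sheet; the variable names are only illustrative.

```r
library(readxl)

# readxl_example() returns the path to a workbook bundled with the package.
path <- readxl_example("datasets.xlsx")

# One col_types entry per column of the iris sheet: keep two measurements
# as numeric, skip the next two, and read Species as text.
iris_xl <- read_excel(
  path,
  sheet = "iris",
  col_types = c("numeric", "numeric", "skip", "skip", "text")
)

str(iris_xl)
```

Skipped columns are dropped entirely, so the result has three columns rather than five.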

vignette("column-types") # see this documentation
readr::read_csv("test.csv", col_names = FALSE, col_types = readr::cols(X1 = "d", X2 = readr::col_skip(), X3 = "d"))

The col_types options here are richer:

col_logical() [l], containing only T, F, TRUE or FALSE.
col_integer() [i], integers.
col_double() [d], doubles.
col_character() [c], everything else.
col_date(format = "") [D]: Y-m-d dates.
col_datetime(format = "") [T]: ISO8601 date times
col_number() [n], finds the first number in the field. A number is defined
as a sequence of -, "0-9", decimal_mark and grouping_mark. This is useful for currencies and percentages.

decimal_mark is set inside locale(); see the help documentation in vignette("locales") for details.
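To make the role of decimal_mark more concrete, here is a minimal sketch using readr's vector parsers. The input strings and the name eu are made up for illustration; the same locale object can also be passed to read_csv() and related functions through their locale argument.

```r
library(readr)

# A locale where "," is the decimal mark and "." groups thousands.
eu <- locale(decimal_mark = ",", grouping_mark = ".")

# parse_double() expects a bare number written in the locale's notation.
x <- parse_double("3,1415926", locale = eu)

# parse_number() also drops grouping marks and surrounding text.
y <- parse_number("1.237.594,545546 EUR", locale = eu)

print(c(x, y))
```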

You can also manually specify other column types:

col_skip() [ _, -], don't import this column.
col_date(format), dates with given format.
col_datetime(format, tz), date times with given format. If the timezone is UTC (the default), this is >20x faster than loading then parsing with strptime().
col_time(format), times. Returned as number of seconds past midnight.
col_factor(levels, ordered), parse a fixed set of known values into a factor
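The bracketed letters in the lists above can also be combined into a compact string, one character per column. The sketch below is one way to try this without a file on disk: the inline data is hypothetical and stands in for test.csv, and wrapping it in I() to mark it as literal text assumes readr 2.0.0 or later.

```r
library(readr)

# Hypothetical inline data standing in for test.csv.
csv_text <- I("1.5,skip me,0.000003\n2.25,skip me,1237594.545546\n")

# "d_d" means: double, skip, double. It is shorthand for
# cols(X1 = "d", X2 = col_skip(), X3 = "d").
a <- read_csv(csv_text, col_names = FALSE, col_types = "d_d")

# The values are stored as ordinary doubles; how many digits are shown
# is decided later, at print time.
str(a)
```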

Example

library(readr) # provides read_csv(), cols() and col_factor()
read_csv("iris.csv", col_types = cols(
  Sepal.Length = "d",
  Sepal.Width = "d",
  Petal.Length = "d",
  Petal.Width = "d",
  Species = col_factor(c("setosa", "versicolor", "virginica"))
))

After reading the data in, we often run into output like this:

a$X3
[1] 3.000000e-06 1.237595e+06

Solution:

formattable::digits(a$X3,7)
[1] 0.0000030       1237594.5455460

The formattable package has many other uses; for details see: http://renkun.me/formattable/
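The scientific notation seen earlier only affects how the numbers are printed, not the stored values. As an alternative that needs no extra package, the following base R sketch formats the same kind of values; the vector x is a hypothetical stand-in for a$X3.

```r
# Hypothetical values mirroring the a$X3 column shown earlier.
x <- c(0.000003, 1237594.545546)

# sprintf() returns character strings with exactly seven decimal places.
fixed <- sprintf("%.7f", x)
print(fixed)

# Raising the scipen option makes print() prefer fixed notation;
# the previous setting is restored afterwards.
old <- options(scipen = 999)
print(x)
options(old)
```

Note that sprintf() yields character strings, so it suits display and export, whereas formattable::digits() keeps the values numeric.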
