When reading Excel and CSV data, I keep running into problems with decimals that are not displayed correctly. It is long past time to drop read.csv.
Here are two better packages for reading data, both from Hadley: readxl and readr.
Test data:
The functions:
readxl::read_excel("test.xlsx",col_names = F,col_types = rep("numeric",3))
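col_types accepts four modes: "blank", "numeric", "date" or "text". "blank" skips the column; the other three are self-explanatory. (These are the options in the readxl version used here; newer releases use "skip" in place of "blank".)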
vignette("column-types") # see this documentation for reference
readr::read_csv("test.csv", col_names = FALSE, col_types = readr::cols(X1 = "d", X2 = readr::col_skip(), X3 = "d"))
The col_types options here are richer:
col_logical() [l], containing only T, F, TRUE or FALSE.
col_integer() [i], integers.
col_double() [d], doubles.
col_character() [c], everything else.
col_date(format = "") [D]: Y-m-d dates.
col_datetime(format = "") [T]: ISO8601 date times.
col_number() [n], finds the first number in the field. A number is defined as a sequence of -, "0-9", decimal_mark and grouping_mark. This is useful for currencies and percentages.
decimal_mark is set in locale(); see the help document vignette("locales") for details.
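To make the decimal_mark setting concrete, here is a minimal sketch using readr's parse_number() and parse_double() on inline strings. The sample values are made up for illustration and it assumes readr is installed; the same locale object can be passed to read_csv() through its locale argument.

```r
library(readr)

# Hypothetical European-style values: "," is the decimal mark, "." groups thousands.
eu <- locale(decimal_mark = ",", grouping_mark = ".")

# parse_number() picks out the first number and ignores the surrounding text.
price <- parse_number("1.234,56 EUR", locale = eu)

# parse_double() is strict: the whole string must be a number in this locale.
ratio <- parse_double("0,25", locale = eu)

print(price)
print(ratio)
```

Without the locale, the comma would be read as a grouping mark, which is one common source of the decimal problems described at the top.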
You can also manually specify other column types:
col_skip() [_, -], don't import this column.
col_date(format), dates with given format.
col_datetime(format, tz), date times with given format. If the timezone is UTC (the default), this is >20x faster than loading then parsing with strptime().
col_time(format), times. Returned as number of seconds past midnight.
col_factor(levels, ordered), parse a fixed set of known values into a factor.
Example:
read_csv("iris.csv", col_types = cols(
  Sepal.Length = "d",
  Sepal.Width = "d",
  Petal.Length = "d",
  Petal.Width = "d",
  Species = col_factor(c("setosa", "versicolor", "virginica"))
))
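The iris example needs a file on disk. For trying the column specifications without one, here is a self-contained sketch that reads made-up inline text instead. It assumes readr 2.0 or later, where I() marks a string as literal data; the values and the X1 to X4 column layout are hypothetical.

```r
library(readr)

# Made-up inline data standing in for a CSV file; I() marks it as literal text.
csv_text <- I("1.5,ignore,0.000003,a\n2.5,ignore,1237594.545546,b\n")

df <- read_csv(
  csv_text,
  col_names = FALSE,                 # columns are then referred to as X1, X2, ...
  col_types = cols(
    X1 = col_double(),
    X2 = col_skip(),                 # dropped on import
    X3 = "d",                        # compact code for col_double()
    X4 = col_factor(levels = c("a", "b"))
  )
)

str(df)
```

Declaring every column up front means readr does not have to guess the types, so a numeric column cannot silently come back as character.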
After reading the data in, you will often run into something like this:
a$X3
[1] 3.000000e-06 1.237595e+06
Solution:
formattable::digits(a$X3, 7)
[1] 0.0000030 1237594.5455460
The formattable package has many other uses; for details see: http://renkun.me/formattable/
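If installing formattable is not an option, base R can also display these values without exponent notation. This is an alternative to the formattable::digits() call, not what the package does internally. A sketch using the two values from the example, entered by hand:

```r
# The two values from the example above, typed in manually.
x <- c(3e-06, 1237594.545546)

# sprintf() prints exactly 7 decimals, but returns character strings.
fixed <- sprintf("%.7f", x)
print(fixed)

# format() with scientific = FALSE also avoids exponent notation.
plain <- format(x, scientific = FALSE)
print(plain)
```

Both calls return character vectors, so they are for display only. formattable::digits() keeps the underlying numbers, which matters if the column is used in further calculations.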