7 minute read

参考文献:

  1. R Cookbook
  2. Using Dates and Times in R
  3. Date and Time Classes in R
  4. Handling date-times in R

1. Introduction

Some are date-only classes, some are datetime classes. All classes can handle calendar dates (e.g., March 15, 2010), but not all can represent a datetime (e.g., 11:45 AM on March 1, 2010).

The following classes are included in the base distribution of R:

  • Date: date-only.
    • General-purpose class for working with dates
  • POSIXct: datetime.
    • Internally, the datetime is stored as the number of seconds since January 1, 1970, and so is a very compact representation.
    • This class is recommended for storing datetime information (e.g., in data frames).
    • “ct” stands for “calender time”
    • 我觉得 stands for “compact time” 也说得通
  • POSIXlt: datetime.
    • Stored in a 9-element list.
    • Easy to extract date parts, such as the month or hour.
    • Obviously, this representation is much less compact than the POSIXct class; hence it is normally used for intermediate processing and not for storing data.
    • “lt” stands for “local time”
    • “lt” also helps one remember that POXIXlt objects are lists

POSIXlt 举例:

> tm1.lt <- as.POSIXlt(Sys.time())
> tm1.lt
[1] "2015-02-07 23:19:29 CST"
> unlist(tm1.lt) ## unlist: simplifies a list structure to produce a vector which contains all the atomic components
               sec                min               hour               mday 
"29.5144650936127"               "19"               "23"                "7" 
               mon               year               wday               yday 
               "1"              "115"                "6"               "37" 
             isdst               zone             gmtoff 
               "0"              "CST"            "28800" 

> unclass(tm1.lt) ## 效果类似 unlist。这里我们是把 POSIXlt 看做一个 object,然后 unclass 的作用是 returns (a copy of) its fields with its class attribute removed
> ... ## 输出省略

POSIXlt 实际是一个 11-element list,只是前 9 个相对常用一点:

  • sec: 0–61; seconds.
  • min: 0–59; minutes.
  • hour: 0–23; hours.
  • mday: 1–31; day of the month
  • mon: 0–11; months after the first of the year.
  • year: years since 1900.
  • wday: 0–6; day of the week, starting on Sunday.
  • yday: 0–365; day of the year.
  • isdst: Daylight Saving Time flag. Positive if in force, zero if not, negative if unknown.
  • zone: (Optional) The abbreviation for the time zone in force at that time; “” if unknown (but “” might also be used for UTC).
    • UTC: Coordinated Universal Time
    • CST: China Standard Time, i.e. UTC+08:00
  • gmtoff: (Optional) The offset in minutes from GMT; positive values are East of the meridian. Usually NA if unknown, but 0 could mean unknown.

The base distribution also provides functions for easily converting between representations: as.Date, as.POSIXct, and as.POSIXlt.

The following packages are available for downloading from CRAN:

  • chron:
    • No time zones and daylight savings time.
    • It’s therefore easier to use than Date but less powerful than POSIXct and POSIXlt.
    • It would be useful for work in econometrics or time series analysis.
  • lubridate:
    • a wrapper for POSIXct with more intuitive syntax
  • mondate:
    • This is a specialized package for handling dates in units of months in addition to days and years.
    • Such needs arise in accounting and actuarial work, for example, where month-by-month calculations are needed.
  • timeDate:
    • This is a high-powered package with well-thought-out facilities for handling dates and times, including date arithmetic, business days, holidays, conversions, and generalized handling of time zones.
    • It was originally part of the Rmetrics software for financial modeling, where precision in dates and times is critical.

Which class should you select? The article Date and Time Classes in R by Grothendieck and Petzoldt offers this general advice:

When considering which class to use, always choose the least complex class that will support the application. That is, use Date if possible, otherwise use chron and otherwise use the POSIX classes. Such a strategy will greatly reduce the potential for error and increase the reliability of your application.

2. Basic Date Operation

2.1 Getting the Current Date

> Sys.Date()
[1] "2015-02-08"
> class(Sys.Date()) ## Sys.Date() returns a Date object
[1] "Date"

Similarly Sys.time() returns the current date-time:

> Sys.time() ## 注意 Date() 是大写,time() 是小写
[1] "2015-02-08 11:28:02 CST"
> class(Sys.time())
[1] "POSIXct" "POSIXt" 
> unclass(Sys.time()) 
[1] 1423366132 ## 可见本身是个 POSIXct

这里 class(Sys.time()) 返回了两个值要特别说明一下:class(object) returns a character vector giving the names of the classes from which the object inherits.

2.2 Converting a String into a Date

The default format assumed by as.Date is the ISO 8601 standard format of yyyy-mm-dd:

> as.Date("1985-11-30")
[1] "1985-11-30"

Use format="%m/%d/%Y" if the date string is in American style mm/dd/yyyy:

> as.Date("11/30/1985")
Error in charToDate(x) : 
  character string is not in a standard unambiguous format
> as.Date("11/30/1985", format="%m/%d/%Y")
[1] "1985-11-30"

?strftime or ?strptime for details about allowed formats.

如果是 datetime 还要折腾 y, m, d 的话,用 lubridate 会更方便。

2.3 Converting a Date into a String

类似 as.Date("11/30/1985", format="%m/%d/%Y"),我们也可以用 format="%m/%d/%Y" 来指定输出的格式,比如:

> format(Sys.Date())
[1] "2015-02-08"
> format(Sys.Date(), format="%m/%d/%Y")
[1] "02/08/2015"

> as.character(Sys.Date())
[1] "2015-02-08"
> as.character(Sys.Date(), format="%m/%d/%Y")
[1] "02/08/2015"

2.4 Converting Year, Month, and Day into a Date

Use the ISOdate function:

> ISOdate(year, month, day)

The result is a POSIXct object that you can convert into a Date object:

> as.Date(ISOdate(year, month, day))

ISOdate can process entire vectors of years, months, and days, which is quite handy for mass conversion of input data.

> years
[1] 2010 2011 2012 2013 2014
> months
[1] 1 1 1 1 1
> days
[1] 15 21 20 18 17
> ISOdate(years, months, days)
[1] "2010-01-15 12:00:00 GMT" "2011-01-21 12:00:00 GMT"
[3] "2012-01-20 12:00:00 GMT" "2013-01-18 12:00:00 GMT"
[5] "2014-01-17 12:00:00 GMT"

Here vector of months is actually redundant and that the last expression can therefore be further simplified by invoking the Recycling Rule:

> as.Date(ISOdate(years, 1, days))
[1] "2010-01-15" "2011-01-21" "2012-01-20" "2013-01-18" "2014-01-17"

This recipe can also be extended to handle year, month, day, hour, minute, and second data by using the ISOdatetime function:

> ISOdatetime(year, month, day, hour, minute, second)

2.5 Getting the Julian Date

Julian date is the number of days since January 1, 1970.

Either convert the Date object to an integer or use the julian function:

> d = as.Date("1985-11-30")
> as.integer(d)
[1] 5812
> julian(d)
[1] 5812
attr(,"origin")
[1] "1970-01-01"

> as.integer(as.Date("1970-01-01"))
[1] 0
> as.integer(as.Date("1970-01-02"))
[1] 1

2.6 Extracting the Parts of a Date

> d <- as.Date("2010-03-15")
> p <- as.POSIXlt(d)
> p$mday 		# Day of the month
[1] 15
> p$mon 		# Month (0 = January)
[1] 2
> p$year + 1900 # Year
[1] 2010

简写形式:

> as.POSIXlt(d)$mday
[1] 15

2.7 Creating a Sequence of Dates

The seq function is a generic function that has a version for Date objects. It can create a Date sequence similarly to the way it creates a sequence of numbers.

> s <- as.Date("2012-01-01")
> e <- as.Date("2012-02-01")
> seq(from=s, to=e, by=1) # One month of dates
[1] "2012-01-01" "2012-01-02" "2012-01-03" "2012-01-04" "2012-01-05" "2012-01-06"
[7] "2012-01-07" "2012-01-08" "2012-01-09" "2012-01-10" "2012-01-11" "2012-01-12"
[13] "2012-01-13" "2012-01-14" "2012-01-15" "2012-01-16" "2012-01-17" "2012-01-18"
[19] "2012-01-19" "2012-01-20" "2012-01-21" "2012-01-22" "2012-01-23" "2012-01-24"
[25] "2012-01-25" "2012-01-26" "2012-01-27" "2012-01-28" "2012-01-29" "2012-01-30"
[31] "2012-01-31" "2012-02-01"

Another typical use specifies a starting date (from), increment (by), and number of dates (length.out):

> seq(from=s, by=1, length.out=7) # Dates, one week apart 
[1] "2012-01-01" "2012-01-02" "2012-01-03" "2012-01-04" "2012-01-05" "2012-01-06" 
[7] "2012-01-07"

The increment (by) is flexible and can be specified in days, weeks, months, or years:

> seq(from=s, by="month", length.out=12) # First of the month for one year
[1] "2012-01-01" "2012-02-01" "2012-03-01" "2012-04-01" "2012-05-01" "2012-06-01"
[7] "2012-07-01" "2012-08-01" "2012-09-01" "2012-10-01" "2012-11-01" "2012-12-01"

> seq(from=s, by="3 months", length.out=4) # Quarterly dates for one year
[1] "2012-01-01" "2012-04-01" "2012-07-01" "2012-10-01"

> seq(from=s, by="year", length.out=10) # Year-start dates for one decade
[1] "2012-01-01" "2013-01-01" "2014-01-01" "2015-01-01" "2016-01-01" "2017-01-01"
[7] "2018-01-01" "2019-01-01" "2020-01-01" "2021-01-01"

Be careful with by="month" near month-end. In this example, the end of February overflows into March, which is probably not what you wanted:

> seq(as.Date("2010-01-29"), by="month", len=3)
[1] "2010-01-29" "2010-03-01" "2010-03-29"

最后这个 len=3 我要说下。这个 len=3length.out=3 的缩写,缩写成 l=3 也可以;而且 seq 的参数基本都是可以缩写的,比如:

> seq(from=1, to=10)
[1]  1  2  3  4  5  6  7  8  9 10
> seq(f=1, t=10)
[1]  1  2  3  4  5  6  7  8  9 10

我最开始以为 seq(f=1, t=10) 是不是被识别成 seq(1, 10) 然后再根据参数的位置被识别成了 seq(from=1, to=10)。试验证明不是的:

> seq(t=10, f=1)
[1]  1  2  3  4  5  6  7  8  9 10
> seq(10, 1)
[1] 10  9  8  7  6  5  4  3  2  1

不知道是不是所有的 R function 都这样;而且万一两个参数缩写相同时该如何判断?有时间再研究。

2.8 Date Addition and Subtraction

> d1 = as.Date("1985-11-30")

> d1 + 10
[1] "1985-12-10"
> d1 - 10
[1] "1985-11-20"

> d2 = as.Date("1995-02-16") ## 松冈茉优

> d2 - d1
Time difference of 3365 days
> d2 + d1
Error in `+.Date`(d2, d1) : binary + is not defined for "Date" objects

> difftime(d2, d1, units="weeks")
Time difference of 480.7143 weeks

> d3 = as.Date("1997-03-15") ## 黑岛结菜

> diff(c(d1,d2,d3))
Time differences in days
[1] 3365  758

注意,这样得到的 difference 并不能直接拿来计算,具体的数值需要 as.numeric 来转一下,比如:

> d2 - d1
Time difference of 3365 days
> as.numeric(d2 - d1, units = "days")
[1] 3365
> as.numeric(d2 - d1, units = "secs")
[1] 290736000

Categories:

Updated:

Comments