Chapter 2 Data
2.1 Data Retrieval
The data is retrieved from the Federal Reserve website, where it accompanies the
discussion series paper The U.S. Treasury Yield Curve: 1961 to the Present. The paper's
page is: https://www.federalreserve.gov/pubs/feds/2006/200628/200628abs.html.
The specific data set downloaded was the XLS file included on the site. ratekit
provides the download_rates_xls() helper function for this.
The downloaded file was immediately opened in Excel and resaved as an xlsx file.
The raw file is not a true xls file; rather, it is some flavor of xml file.
R's packages for importing Excel data do not handle this format well, so
the resave was necessary and is done manually.
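The download step can be sketched in base R as follows. This is not the source of ratekit's download_rates_xls(), whose arguments are not shown here; download_rates_sketch() is a hypothetical stand-in, and the direct URL of the XLS file is not given in this chapter, so the caller has to supply it. The manual resave to xlsx still has to happen afterwards.

```r
# Hypothetical stand-in for a download helper (not ratekit's actual code).
# `url` is the direct link to the XLS file on the Federal Reserve site,
# which must be supplied by the caller; `destfile` is where to save it.
download_rates_sketch <- function(url, destfile) {
  # mode = "wb" writes the file byte-for-byte, which matters on Windows
  utils::download.file(url, destfile = destfile, mode = "wb", quiet = TRUE)
  invisible(destfile)
}
```

The function returns the destination path invisibly, so the result can be passed straight to whatever step opens the file next.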
2.2 Cleaning
Data is brought into R using the readxl package and the ratekit helper, read_rates().
This function sets any -999.99 values to NA.
These values appear throughout the data set, especially in the parameter columns,
and are assumed to represent missing values.
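The sentinel replacement can be illustrated on a toy data frame. This is a minimal base R sketch of the idea, not the body of read_rates(); replace_sentinel() is a hypothetical helper and the values below are made up for illustration.

```r
# Toy data (hypothetical values) with a -999.99 sentinel and a true NA
rates <- data.frame(
  date  = as.Date(c("1979-12-28", "1979-12-31", "1980-01-02")),
  BETA0 = c(10.1, 10.2, 10.3),
  TAU2  = c(-999.99, NA, 8.5)
)

# Hypothetical helper: turn the sentinel into NA, leaving other values alone.
# which() drops existing NAs from the comparison so they are left untouched.
replace_sentinel <- function(x, sentinel = -999.99) {
  x[which(x == sentinel)] <- NA_real_
  x
}

# Apply only to numeric columns so the date column is not modified
num_cols <- vapply(rates, is.numeric, logical(1))
rates[num_cols] <- lapply(rates[num_cols], replace_sentinel)
```

After this step the sentinel and the genuinely missing entry are indistinguishable, which matches the assumption that both represent missing values.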
The column names in the data correspond to different types and lengths of rates used in the paper, along with the names of the parameters in the model. The key for understanding the column names is shown in Table 2.1.
Series | Compounding Convention | Key |
---|---|---|
Zero-coupon yield | Continuously Compounded | SVENYXX |
Par yield | Coupon-Equivalent | SVENPYXX |
Instantaneous forward rate | Continuously Compounded | SVENFXX |
One-year forward rate | Coupon-Equivalent | SVEN1FXX |
Parameters | NA | BETA0 to TAU2 |
Most of these columns are not important for this analysis. Only the parameter
columns and the date column are kept. To further examine the missing values,
the skimr package was used, producing the report shown below. The TAU2 column
has 4,620 missing values (either missing in the raw data or -999.99 values
assumed to be missing). All of them occur before 1980, and these observations
were removed from the data set. After that removal, no missing values remain,
and the values for the other parameters seemed to stabilize as well.
Skim summary statistics
n obs: 14163
n variables: 7
Variable type: Date
variable | missing | complete | n | min | max | median | n_unique |
---|---|---|---|---|---|---|---|
date | 0 | 14163 | 14163 | 1961-06-14 | 2018-03-29 | 1989-11-10 | 14163 |
Variable type: numeric
variable | missing | complete | n | mean | sd | p0 | p25 | p50 | p75 | p100 |
---|---|---|---|---|---|---|---|---|---|---|
BETA0 | 0 | 14163 | 14163 | 5.88 | 4.63 | 0 | 3.03 | 5.01 | 7.92 | 25 |
BETA1 | 0 | 14163 | 14163 | -0.82 | 5.05 | -39.73 | -3.07 | -1.02 | 1.41 | 97.18 |
BETA2 | 0 | 14163 | 14163 | -341.06 | 5684.56 | -340683.77 | -9.02 | -0.99 | 1.93 | 94.87 |
BETA3 | 0 | 14163 | 14163 | 343.72 | 5684.31 | -104.03 | 0 | 3.72 | 18.81 | 340681.5 |
TAU1 | 0 | 14163 | 14163 | 2.39 | 3.37 | 0.1 | 0.63 | 1.47 | 2.65 | 30 |
TAU2 | 4620 | 9543 | 14163 | 8.97 | 7.64 | 0.1 | 3.52 | 8.94 | 13.06 | 180.86 |
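The column selection and missing-value removal described in this section can be sketched on toy data in base R. The parameter column names come from the report above; the values are made up, SVENY10 merely stands in for any non-parameter column, and dropping rows where TAU2 is NA is an assumption about how the removal was carried out.

```r
# Toy data (hypothetical values); SVENY10 stands in for a non-parameter column
rates <- data.frame(
  date    = as.Date(c("1979-12-28", "1979-12-31", "1980-01-02", "1980-01-03")),
  SVENY10 = c(10.3, 10.4, 10.5, 10.6),
  BETA0   = c(10.1, 10.2, 10.3, 10.4),
  BETA1   = c(1.1, 1.2, 1.3, 1.4),
  BETA2   = c(-2.0, -2.1, -2.2, -2.3),
  BETA3   = c(0, 0, 3.5, 3.6),
  TAU1    = c(1.5, 1.6, 1.7, 1.8),
  TAU2    = c(NA, NA, 8.5, 8.6)
)

# Keep only the date and the six parameter columns
param_cols <- c("BETA0", "BETA1", "BETA2", "BETA3", "TAU1", "TAU2")
params <- rates[, c("date", param_cols)]

# Count missing values per column (the same quantity skimr reports)
missing_per_col <- colSums(is.na(params))

# Check that every missing TAU2 falls before 1980, then drop those rows
stopifnot(all(params$date[is.na(params$TAU2)] < as.Date("1980-01-01")))
params <- params[!is.na(params$TAU2), ]
rownames(params) <- NULL
```

Running the date check before the removal makes the "all before 1980" claim something the script verifies rather than something read off the report.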
2.3 Monthly and Ascending
The report requires monthly data, but the Federal Reserve data set is daily.
The data is converted to monthly (end-of-month) frequency using the tibbletime
package. This leaves 459 rows of data for the project, spanning 1980-01-31 to 2018-03-29.
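The end-of-month conversion can be sketched as follows. The tibbletime call itself is not shown in this chapter, so the sketch uses base R instead and assumes that "end-of-month" means the last available observation in each calendar month (which would explain why the series ends on 2018-03-29 rather than on the 31st). The toy values are made up.

```r
# Toy daily data (hypothetical values), deliberately out of order
daily <- data.frame(
  date  = as.Date(c("2018-02-28", "2018-01-30", "2018-01-31",
                    "2018-02-27", "2018-03-29")),
  BETA0 = c(4.20, 4.00, 4.10, 4.15, 4.30)
)

# Sort ascending by date so the last row within each month is the latest one
daily <- daily[order(daily$date), ]

# Year-month key, e.g. "2018-01"
month_key <- format(daily$date, "%Y-%m")

# Keep the last observation in each month
monthly <- daily[!duplicated(month_key, fromLast = TRUE), ]
rownames(monthly) <- NULL

# Sanity check on the row count quoted in the text:
# calendar months from January 1980 through March 2018
n_months <- length(seq(as.Date("1980-01-01"), as.Date("2018-03-01"), by = "month"))
# n_months is 459
```

The month count agrees with the 459 rows reported above, which is consistent with every month in the range having at least one observation.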