Chapter 2 Data

2.1 Data Retrieval

The data was retrieved from the Federal Reserve website, from the discussion series paper The U.S. Treasury Yield Curve: 1961 to the Present, available at https://www.federalreserve.gov/pubs/feds/2006/200628/200628abs.html. The data set downloaded was the XLS file linked on that page. The ratekit package provides the helper function download_rates_xls() for this.

The data was immediately opened in Excel and resaved as an xlsx file. The raw file is not a true xls file; it is some flavor of XML. R's packages for importing Excel data cannot read this format, so the resave was necessary. It is done manually.
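One way to confirm that a file with an xls extension is really XML is to inspect its first bytes. The sketch below is illustrative only: looks_like_xml() is a hypothetical helper, not part of ratekit, and it assumes the XML file begins with the literal bytes "<?xml" and has no byte-order mark. It is demonstrated on two temporary stand-in files, not on the real download.

```r
# Hypothetical helper: TRUE when the file starts with the bytes "<?xml".
looks_like_xml <- function(path) {
  header <- readBin(path, what = "raw", n = 5)
  identical(header, charToRaw("<?xml"))
}

# Stand-in for an XML file saved with an .xls extension.
xml_file <- tempfile(fileext = ".xls")
writeLines(c('<?xml version="1.0"?>', "<Workbook></Workbook>"), xml_file)

# Stand-in for a binary xls file (first bytes of the OLE2 file signature).
binary_file <- tempfile(fileext = ".xls")
writeBin(as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1)), binary_file)

xml_check <- looks_like_xml(xml_file)
binary_check <- looks_like_xml(binary_file)
```

A check like this only identifies the problem. The resave to xlsx described above is still needed before the file can be imported.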

2.2 Cleaning

Data is brought into R using the readxl package and the ratekit helper read_rates(). This function sets any -999.99 values to NA. These values appear throughout the data set, especially in the parameter columns, and are assumed to represent missing values.
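The sentinel replacement can be sketched in base R as follows. This is not the implementation of read_rates(), whose internals are not shown here. replace_sentinel() is a hypothetical helper, the data frame is a toy stand-in for the imported sheet, and the small tolerance is an assumption to guard against floating-point noise on import.

```r
# Toy stand-in for the imported sheet; values are illustrative only.
rates <- data.frame(
  date  = as.Date(c("1961-06-14", "1961-06-15", "1980-01-02")),
  BETA0 = c(3.92, 3.98, 10.50),
  TAU2  = c(-999.99, -999.99, 4.20)
)

# Hypothetical helper: set values equal to the sentinel (within a small
# tolerance) to NA in every numeric column. Date columns are left alone.
replace_sentinel <- function(df, sentinel = -999.99, tol = 1e-8) {
  is_num <- vapply(df, is.numeric, logical(1))
  df[is_num] <- lapply(df[is_num], function(x) {
    x[which(abs(x - sentinel) < tol)] <- NA
    x
  })
  df
}

rates_clean <- replace_sentinel(rates)
```

Using which() keeps values that are already NA from interfering with the assignment.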

The column names in the data correspond to different types and lengths of rates used in the paper, along with the names of the parameters in the model. The key for understanding the column names is shown in Table 2.1.

Table 2.1: Rates data: Column key
| Series | Compounding Convention | Key |
|---|---|---|
| Zero-coupon yield | Continuously Compounded | SVENYXX |
| Par yield | Coupon-Equivalent | SVENPYXX |
| Instantaneous forward rate | Continuously Compounded | SVENFXX |
| One-year forward rate | Coupon-Equivalent | SVEN1FXX |
| Parameters | NA | BETA0 to TAU2 |

Most of these columns are not needed for this analysis. Only the parameter columns and the date column are kept. To examine the missing values further, the skimr package was used, producing the report shown below. The TAU2 column has 4,620 missing values (either missing in the source or recorded as -999.99 and assumed to be missing). All of them occur before 1980, and they were removed from the data set. After that removal, no missing values remain, and the values of the other parameters appear more stable as well.

Skim summary statistics
n obs: 14163
n variables: 7

Variable type: Date

| variable | missing | complete | n | min | max | median | n_unique |
|---|---|---|---|---|---|---|---|
| date | 0 | 14163 | 14163 | 1961-06-14 | 2018-03-29 | 1989-11-10 | 14163 |

Variable type: numeric

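| variable | missing | complete | n | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|---|
| BETA0 | 0 | 14163 | 14163 | 5.88 | 4.63 | 0 | 3.03 | 5.01 | 7.92 | 25 |
| BETA1 | 0 | 14163 | 14163 | -0.82 | 5.05 | -39.73 | -3.07 | -1.02 | 1.41 | 97.18 |
| BETA2 | 0 | 14163 | 14163 | -341.06 | 5684.56 | -340683.77 | -9.02 | -0.99 | 1.93 | 94.87 |
| BETA3 | 0 | 14163 | 14163 | 343.72 | 5684.31 | -104.03 | 0 | 3.72 | 18.81 | 340681.5 |
| TAU1 | 0 | 14163 | 14163 | 2.39 | 3.37 | 0.1 | 0.63 | 1.47 | 2.65 | 30 |
| TAU2 | 4620 | 9543 | 14163 | 8.97 | 7.64 | 0.1 | 3.52 | 8.94 | 13.06 | 180.86 |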
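The column selection and the removal of rows with missing TAU2 values could look roughly like the base R sketch below. It runs on a toy data frame with made-up values, and it assumes the rows are dropped by filtering on missing TAU2 rather than by a date cutoff. The text above is consistent with either approach.

```r
# Toy stand-in for the cleaned sheet; values are illustrative only.
rates <- data.frame(
  date    = as.Date(c("1979-12-28", "1979-12-31", "1980-01-02", "1980-01-03")),
  SVENY01 = c(11.9, 12.0, 12.1, 12.2),
  BETA0   = c(10.1, 10.2, 10.3, 10.4),
  BETA1   = c(1.5, 1.6, 1.7, 1.8),
  BETA2   = c(-2.0, -2.1, -2.2, -2.3),
  BETA3   = c(0.0, 0.0, 3.1, 3.2),
  TAU1    = c(1.2, 1.3, 1.4, 1.5),
  TAU2    = c(NA, NA, 5.1, 5.2)
)

# Keep only the date and the model parameter columns.
keep <- c("date", "BETA0", "BETA1", "BETA2", "BETA3", "TAU1", "TAU2")
params <- rates[, keep]

# Confirm that every missing TAU2 value falls before 1980, then drop those rows.
missing_dates <- params$date[is.na(params$TAU2)]
stopifnot(all(missing_dates < as.Date("1980-01-01")))
params <- params[!is.na(params$TAU2), ]
```

The stopifnot() call makes the "all before 1980" observation an explicit check, so the script would fail loudly if a later download contained missing values in more recent rows.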

2.3 Monthly and Ascending

The report requires monthly data, but the Federal Reserve data set is daily. The data is converted to monthly (end-of-month) observations using the tibbletime package. This leaves 459 rows of data for the project, spanning 1980-01-31 to 2018-03-29.
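To illustrate what the end-of-month conversion does, here is a base R sketch that keeps the last available observation in each calendar month. It is a stand-in for the tibbletime step, not a reproduction of it, and it runs on toy data with made-up values.

```r
# Toy daily data; values are illustrative only.
daily <- data.frame(
  date  = as.Date(c("2018-01-30", "2018-01-31", "2018-02-27",
                    "2018-02-28", "2018-03-28", "2018-03-29")),
  BETA0 = c(3.10, 3.12, 3.15, 3.14, 3.20, 3.21)
)

# Sort ascending by date so "last row in the month" means "latest date".
daily <- daily[order(daily$date), ]

# Label each row with its year-month, then keep the final row of each label.
month_id <- format(daily$date, "%Y-%m")
is_month_end <- !duplicated(month_id, fromLast = TRUE)
monthly <- daily[is_month_end, ]
```

Because the last available observation is kept, a month's row need not fall on the calendar month end. That is why the final date in the data is 2018-03-29, not 2018-03-31.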