Getting the correct data is most of the challenge. You’ll need the ‘assessment roll’ and the ‘sales roll’. Every jurisdiction releases this data slightly differently (ranging from three thousand page pdfs to csv files on open data portals) so this guide tries to capture most situations.
In order to conduct a sales ratio study, data is required to have at least three columns: Sale Year, Sale Price, and Assessed Value. We want to compare the sale price to the assessed value at the time of sale. Here is an example:
head(cmfproperty::example_data)
#> PIN SALE_YEAR SALE_PRICE ASSESSED_VALUE
#> 1 17273100931118 2015 53000 33860
#> 2 18013090421010 2018 80000 60390
#> 3 12111190201042 2018 118000 108300
#> 4 13093160601015 2017 125500 87200
#> 5 14322110150000 2018 3705000 3670740
#> 6 27021200080000 2016 345000 267280
Each row in this dataset should match a sale and an assessment from the same year for a single property.
After gathering data in this form, we have to identify the three required columns (sales, assessments, and sale year) for the package using reformat_data
.
library(cmfproperty)
df <- cmfproperty::example_data
ratios <-
cmfproperty::reformat_data(
df,
sale_col = "SALE_PRICE",
assessment_col = "ASSESSED_VALUE",
sale_year_col = "SALE_YEAR",
)
#> [1] "Filtered out non-arm's length transactions"
#> [1] "Inflation adjusted to 2019"
head(as.data.frame(ratios))
#> PIN SALE_YEAR SALE_PRICE ASSESSED_VALUE TAX_YEAR RATIO
#> 1 17273100931118 2015 53000 33860 2015 0.6388679
#> 2 18013090421010 2018 80000 60390 2018 0.7548750
#> 3 12111190201042 2018 118000 108300 2018 0.9177966
#> 4 13093160601015 2017 125500 87200 2017 0.6948207
#> 5 14322110150000 2018 3705000 3670740 2018 0.9907530
#> 6 27021200080000 2016 345000 267280 2016 0.7747246
#> arms_length_transaction SALE_PRICE_ADJ ASSESSED_VALUE_ADJ
#> 1 1 59209.48 37827.04
#> 2 1 82313.03 62136.05
#> 3 1 121411.71 111431.26
#> 4 1 132854.54 92310.09
#> 5 1 3812122.01 3776871.45
#> 6 1 376080.37 291358.73
Our data is now in the appropriate form for further analysis. More information on the output of reformat_data
can be found on the Getting started page.
Let’s look at an example. The Cook County Assessor’s Office, which is the assessor for Cook County, Illinois, has an excellent open data portal which includes all the necessary information and is the source of the example_data
from this package.
Cook County releases both the sales roll and the assessment roll on their open data portal. Sales were found here and assessments were found here. These files can be downloaded manually or via RSocrata
.
Here we download the files manually:
library(data.table)
library(tidyverse)
sales <- fread("~/../Downloads/Cook_County_Assessor_s_Residential_Sales_Data.csv",
colClasses = "character") #from 2013 to 2019
assessments <- fread("~/../Downloads/Cook_County_Assessor_s_Residential_Assessments.csv",
colClasses = "character") #from 2015 to 2019
Then filter the data to select only the columns we need and to remove some sales which are clearly not between two independent parties:
sales <- sales %>% select(PIN, `Sale Year`, `Sale Price`, `Deed No.`) %>%
filter(`Sale Year` > 2014)
assessments <- assessments %>% select(PIN, YEAR, CERTIFIED)
# Filtering data to remove duplicate sales and low value sales
sales <- sales %>% distinct(`Deed No.`, .keep_all = TRUE) %>% select(-`Deed No.`)
sales <- sales %>% filter(as.numeric(`Sale Price`) > 2500)
Finally, we conduct a left join of assessments on sales matching sales to assessment by unique identifier (PIN) and year.
# Join assessments to sales based on PIN (a unique identifier) and Year.
joined <- sales %>% left_join(assessments, by=c("PIN"="PIN", "Sale Year"="YEAR"))
# Remove spaces from column names (not necessary)
joined <- joined %>% rename(SALE_YEAR = `Sale Year`, SALE_PRICE = `Sale Price`, ASSESSED_VALUE = CERTIFIED)
This dataset is then identical to example_data
:
head(cmfproperty::example_data)
#> PIN SALE_YEAR SALE_PRICE ASSESSED_VALUE
#> 1 17273100931118 2015 53000 33860
#> 2 18013090421010 2018 80000 60390
#> 3 12111190201042 2018 118000 108300
#> 4 13093160601015 2017 125500 87200
#> 5 14322110150000 2018 3705000 3670740
#> 6 27021200080000 2016 345000 267280