Moving from Excel to R-What Do I Need To Replicate in R?

This is the second post in a series describing my journey to move my residential appraisal business workflow from Microsoft Excel to R. Last time out, I made the case for why I’m making the change. This post will be a catalog of the ways I use Excel today to serve as a guide for where I need to go.

I use Excel a lot. Each appraisal I work on, I start a separate Excel file. Us appraisers are required to retain for each report a work file that supports our conclusions and allows for someone else auditing us to understand what we did.

Here’s what I do with Excel today:

  • Store my Neighborhood Market Data downloaded from MLS. I grab all sales in a competitive market area for at least five years back, sometimes ten years. This goes into a Market worksheet.
  • Store my Competitive Market Data downloaded from MLS. I grab all potential competitive comparable sales and listings going back at least 12 months but frequently further. This goes into a Comparables worksheet.
  • Create Neighborhood Price Per Square Foot and/or Sale Price trendlines from the Market data. If I have questions about trends, I’ll also look at changes in floor area over time. I create Charts for each data run. I then spend time formatting and labeling so I can include the charts in my reports.
  • Create Pivot Table summaries of the Neighborhood Market Data. My normal summary table includes all sales summarized by month in a neighborhood with homes sold, mean Days on Market, Low Price, High Price, Mean Sale Price, and Mean PSF. I use a template and replace my old market data with the new data, then refresh, so right now this is really fast in Excel. However, I can’t do Median summaries in Pivot Tables easily, an issue that I expect to be able to handle in R. For most of my reports, I include this pivot table summary.
  • I use the pivot table summaries to create a column chart showing 12 month change in mean PSF and/or mean sale price as another tool to understand and report changes in my neighborhoods. This is especially important in seasonal markets like Davis, California, where home selling revolves around the university schedule. I’ll include this chart in every Davis appraisal and in other appraisals where necessary to explain market trends.
  • I occasionally create histograms to show the shape of a market with regard to one variable (sale price and floor area primarily-great for showing where the subject lies in relation to the rest of the market). I’ve seen an example how easy it is to create a histogram in R. I have high expectations that R will be an improvement over Excel for histograms.
  • Create PSF and Sale Price scatter graphs of competitive sales. I use the trendline coefficients to determine daily price adjustments for market change. I’ll also look at floor area over time to see if my comparables are changing over time or not to help understand what my market is really doing. I include the scatter graphs in my reports so clients and intended users can understand the subject’s sub-market.
  • I use pivot tables linked to my comparable sales data for contrasting one variable. For example, I’ll use pivot tables to examine the difference between homes sold with pools and without pools, a significant factor in the Sacramento Valley. I’ll create a table that shows how many comparables sold with pools vs. without pools, the difference in mean sale price, the difference in mean PSF, and to understand my data, mean floor area and mean year built to see if I’m dealing with an apples-to-apples vs. apples-to-oranges comparison. If the homes with pools are relatively similar in size and age as the ones without pools, my adjustment is more likely to be strictly the pool. If homes with pools tend to be bigger than without, I have to consider covariance as part of the explanation for the differences noted. (Covariance is a significant issue in residential real estate markets)
  • I have a Calculators page that I use for random modeling and calculations I need to do by hand. The most significant calculators I have here that I’ll need to move to R are one I use to do the math to figure out the time adjustments for comparables plus others for modeling lot size adjustments. These should be painless to move over to R.

This is the bulk of what I do with Excel today. As I start to shift this workflow over to R, I plan to go into more detail about the special or not so special challenges I encounter. I also have high expectations that R will inspire me to come up with new ways of analyzing and presenting my data.

Reminder for appraiser readers in particular: R is a tool. Excel is a tool. Most of what I plan to discuss in this series is about changing tools. Occasionally, I’ll talk about modeling decisions (like covariance above). However, all of what I’m doing is rooted in the Stats, Graphs, and Data Sciences classes I’ve taken. You need to understand the theory so you can make informed decisions about your modeling choices.

Take classes from George, he’s very willing to help. https://georgedell.com/

Leave a Reply