Category Archives: Excel to R

Moving from Excel to R-What Software Do I Need?

Everyone knows what Microsoft Excel is, right? Either you have a copy that came with your PC or you’re on the Office 365 subscription model at $69/year for a personal copy $99/year for 5 users (my subscription of choice). Money flows into Microsoft coffers, satisfying shareholders and most of the greater Seattle area given Microsoft’s reach. Make more Seattleites happy by ordering your copy through Amazon!

R is very different (free). It is open source software available under a public license and is maintained by a group of volunteers (free). Get your free copy here.

R on its own is usable. However, it was designed from the ground up to allow for additions to make it more useful.

RStudio, an open source integrated development environment for R, makes using R much easier for folks like me who are not full-time programmers (also free). RStudio sits on top of R and extends usability significantly. RStudio offers the same basic terminal R does but also gives you additional really useful windows and information. I’ll discuss RStudio in the future but if you can’t wait, here’s a link with more information about RStudio.

Here’s an article: 9 Reasons to use RStudio. Or Top 6 Reasons you need to be using RStudio. Get your free copy of RStudio here.

You can extend the usability of R by adding packages. Packages are bundles of R code with explanation and data examples. Data Camp has beginner’s guide for R Packages here. Managing packages is one of RStudio’s strengths, making it easy to install packages. These are free too.

ggplot is a package for creating graphics and should be the first package you download. Two more packages of interest to appraisers just getting into R are tidyverse, a collection of R packages for data science, and rmarkdown, a package for adding R output to documents. You can learn more about all three here.

To summarize, download for free R, RStudio, and the ggplot2, tidyverse, and rmarkdown packages. I’ll talk more about packages in the future as I explore R’s functionality.

Download Pages and Instructions

 

Moving from Excel to R-What Do I Need To Replicate in R?

This is the second post in a series describing my journey to move my residential appraisal business workflow from Microsoft Excel to R. Last time out, I made the case for why I’m making the change. This post will be a catalog of the ways I use Excel today to serve as a guide for where I need to go.

I use Excel a lot. Each appraisal I work on, I start a separate Excel file. Us appraisers are required to retain for each report a work file that supports our conclusions and allows for someone else auditing us to understand what we did.

Here’s what I do with Excel today:

  • Store my Neighborhood Market Data downloaded from MLS. I grab all sales in a competitive market area for at least five years back, sometimes ten years. This goes into a Market worksheet.
  • Store my Competitive Market Data downloaded from MLS. I grab all potential competitive comparable sales and listings going back at least 12 months but frequently further. This goes into a Comparables worksheet.
  • Create Neighborhood Price Per Square Foot and/or Sale Price trendlines from the Market data. If I have questions about trends, I’ll also look at changes in floor area over time. I create Charts for each data run. I then spend time formatting and labeling so I can include the charts in my reports.
  • Create Pivot Table summaries of the Neighborhood Market Data. My normal summary table includes all sales summarized by month in a neighborhood with homes sold, mean Days on Market, Low Price, High Price, Mean Sale Price, and Mean PSF. I use a template and replace my old market data with the new data, then refresh, so right now this is really fast in Excel. However, I can’t do Median summaries in Pivot Tables easily, an issue that I expect to be able to handle in R. For most of my reports, I include this pivot table summary.
  • I use the pivot table summaries to create a column chart showing 12 month change in mean PSF and/or mean sale price as another tool to understand and report changes in my neighborhoods. This is especially important in seasonal markets like Davis, California, where home selling revolves around the university schedule. I’ll include this chart in every Davis appraisal and in other appraisals where necessary to explain market trends.
  • I occasionally create histograms to show the shape of a market with regard to one variable (sale price and floor area primarily-great for showing where the subject lies in relation to the rest of the market). I’ve seen an example how easy it is to create a histogram in R. I have high expectations that R will be an improvement over Excel for histograms.
  • Create PSF and Sale Price scatter graphs of competitive sales. I use the trendline coefficients to determine daily price adjustments for market change. I’ll also look at floor area over time to see if my comparables are changing over time or not to help understand what my market is really doing. I include the scatter graphs in my reports so clients and intended users can understand the subject’s sub-market.
  • I use pivot tables linked to my comparable sales data for contrasting one variable. For example, I’ll use pivot tables to examine the difference between homes sold with pools and without pools, a significant factor in the Sacramento Valley. I’ll create a table that shows how many comparables sold with pools vs. without pools, the difference in mean sale price, the difference in mean PSF, and to understand my data, mean floor area and mean year built to see if I’m dealing with an apples-to-apples vs. apples-to-oranges comparison. If the homes with pools are relatively similar in size and age as the ones without pools, my adjustment is more likely to be strictly the pool. If homes with pools tend to be bigger than without, I have to consider covariance as part of the explanation for the differences noted. (Covariance is a significant issue in residential real estate markets)
  • I have a Calculators page that I use for random modeling and calculations I need to do by hand. The most significant calculators I have here that I’ll need to move to R are one I use to do the math to figure out the time adjustments for comparables plus others for modeling lot size adjustments. These should be painless to move over to R.

This is the bulk of what I do with Excel today. As I start to shift this workflow over to R, I plan to go into more detail about the special or not so special challenges I encounter. I also have high expectations that R will inspire me to come up with new ways of analyzing and presenting my data.

Reminder for appraiser readers in particular: R is a tool. Excel is a tool. Most of what I plan to discuss in this series is about changing tools. Occasionally, I’ll talk about modeling decisions (like covariance above). However, all of what I’m doing is rooted in the Stats, Graphs, and Data Sciences classes I’ve taken. You need to understand the theory so you can make informed decisions about your modeling choices.

Take classes from George, he’s very willing to help. https://georgedell.com/