Category Archives: Excel to R

Add Histograms to Your Appraisal Tool Kit

It was my pleasure to speak at the Sacramento chapter of the Real Estate Appraisers Association last night at the Story of Value class with my good friend Ryan Lundquist. I discussed ways to explain markets in residential real estate appraisals, with a focus on graphs, and was surprised to see that maybe half of the crowd didn’t include graphs in their reports. This is the first post offering advice and instructions on how to create meaningful graphs for residential appraisers.

Gross Living Area Histogram

Four appraisers out of 50 in the room reported using histograms. The histogram is a great tool for analyzing residential real estate markets that all appraisers should use.

What is a histogram?

For our needs, a histogram is a graph that shows the distribution of one continuous variable. The histogram splits the variable into equal-sized bins and counts the number of occurrences in each bin. It works well for important residential real estate variables like gross living area, lot size, age, and sale price.

Bin size is key to creating a useful histogram. Bins that are too wide lose meaning because your data is clumped together. Bins that are too narrow spread the histogram out too much.
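For readers who like to see the mechanics, here is a minimal sketch of the binning step. It is written in Python purely for illustration and is not how Excel or ggplot2 work internally. The function name histogram_counts and the sample sizes are made up, and it assumes whole-number values and a whole-number bin width.

```python
def histogram_counts(values, bin_width):
    """Count how many values fall in each equal-width bin.

    Each bin is labeled by its starting value. A value v lands in the
    bin that starts at (v // bin_width) * bin_width. Empty bins between
    the smallest and largest value are kept so gaps stay visible.
    """
    first = (min(values) // bin_width) * bin_width
    last = (max(values) // bin_width) * bin_width
    counts = {start: 0 for start in range(first, last + bin_width, bin_width)}
    for v in values:
        counts[(v // bin_width) * bin_width] += 1
    return counts

# Hypothetical gross living areas in square feet (made-up numbers).
gla = [820, 1350, 1420, 1480, 1600, 1750, 1900, 1980, 2050, 2400, 3600]

# Print a rough text histogram with 500 sf bins.
bin_width = 500
for start, n in histogram_counts(gla, bin_width).items():
    print(f"{start:>5}-{start + bin_width - 1:<5} {'#' * n}")
```

Calling histogram_counts(gla, 100) instead gives many more bins, most of them empty or holding a single sale, which is the "too narrow" problem described above.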

The graph above shows the sales of homes in a market area with homes of certain sizes. There is one sale less than 1000 sf and one more than 3500 sf. The most frequent size of home sold recently is around 2000 sf with the bulk of the homes in the 1400 sf to 2000 sf range.

Every report prepared for a lender asks the question “Is the subject conforming?” At a glance, any home in this neighborhood between 1300 sf and 2600 sf is reasonably conforming in size. There are no sales in the 4000 sf to 5000 sf range, so any homes in the neighborhood of that size are likely non-conforming. The two extreme sales, at 800 sf and 5000 sf, are unusual and likely non-conforming.

Sale Price Histogram

This next histogram examines the frequency of sale price in the market area. The most frequent sale price is in the $240,000 to $280,000 range with $360,000 to $400,000 the second most frequent range. A home in contract at $375,000 is fairly typical. A contract price of $700,000 is very unusual and is indicative of a non-conforming home.

The first two graphs were generated using ggplot2 in RStudio. Here’s an example from Excel 365 showing the year built for homes sold in a market:

Year Built Histogram

Most homes sold in this market area (Placerville, CA small residential acreage) were built in the 1970s and 1980s. A couple of homes were built in the 1800s and there are a couple of newly built homes.

The lender forms used by appraisers ask for similar information in a table format:

Table or Histogram, which is better?

Which describes the market better, the two histograms or the table?

Make A Histogram in Excel

Here’s how to use the latest version of Excel to make a histogram. This page has instructions for the latest version and older versions.

Start with your data in an Excel workbook, with field names in the top row and the sales data in the rows below.

Starting with the field name, select the data to generate the histogram (Ctrl + Shift + Down Arrow will select all consecutive data down).

Press F11 to insert a graph. Then choose Change Chart Type.

Select Histogram, then press OK.

You’ve made your first histogram!

First Histogram

However, it’s really ugly. Standard formatting for histograms is to have no space between the bins (columns). To fix that, double-click on one of the bins to open the Data Series editor. Select the bars icon to activate Series Options.

Change the Gap Width to 0%. Notice how the columns come together. If you like having gaps between the columns, set the Gap Width to 6%.

First Not Ugly Histogram

To change the bin width, double-click on the x-axis labels (GLA in this case). In the Format Axis pane, under Axis Options, select the Bin Width control and type in the width you want. Play with it until you’re happy with the shape of your histogram.

Default Bin Width

Excel defaulted to a bin width of 370 sf. Below is what the histogram looks like with bin width equal to 100 sf:

100 sf Bin Width

Here’s bin width equal to 500:

500 sf Bin Width

Here’s bin width equal to 200:

200 sf Bin Width

Which one appears most useful to you?
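If you are curious where a default like 370 sf comes from: Microsoft’s documentation for the histogram chart describes the Automatic bin width as based on Scott’s normal reference rule, which is 3.49 times the standard deviation divided by the cube root of the number of observations. Below is a minimal Python sketch of that rule. The sample data is made up, and using the sample standard deviation is my assumption, so don’t expect it to match Excel to the decimal.

```python
import statistics

def scott_bin_width(values):
    """Scott's normal reference rule: 3.49 * stdev * n ** (-1/3)."""
    n = len(values)
    sigma = statistics.stdev(values)  # sample standard deviation (assumption)
    return 3.49 * sigma * n ** (-1 / 3)

# Hypothetical gross living areas in square feet (made-up numbers).
gla = [820, 1350, 1420, 1480, 1600, 1750, 1900, 1980, 2050, 2400, 3600]

print(f"Suggested starting bin width: {scott_bin_width(gla):.0f} sf")
```

Treat the result as a starting point only. Rounding it to a width that reads naturally to a client, like 100, 200, or 500 sf, is still a judgment call.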

Context

I use histograms to understand some aspect of a market. How big are the homes? When were the homes built? How big are the lots in the neighborhood? What do homes sell for in the market area?

Then consider where the subject fits in the market. Is it bigger than typical? If so, you have support for concluding market value is higher than typical. Is it smaller? Well, now you can show a reason why the price is lower.

The Subject’s Position in a Market Area

Let’s consider the histogram above. The subject is one of the larger homes in the neighborhood but still relatively common in size. I would expect, without knowing anything else, that the subject’s market value is on the higher side for the neighborhood but with a reasonable number of homes larger than the subject. Take a look at the graph below.

The Subject is one of the bigger homes….

What if the subject was one of the largest homes in the neighborhood? The subject’s market value is likely on the upper end of the neighborhood range. Also, there are fewer directly competitive sales, implying the market value conclusion may be less reliable than it would be for a smaller home in this market area. Now let’s look at an extreme case.

Charge big bucks if you get this subject

I pity the appraiser asked to appraise a 6500 sf house in this market. However, you do have sales both smaller and larger. This is the time to really open your eyes to what counts as a competitive sale. Throw this graph in your report and your client will immediately see your data difficulty.
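The subject’s position can also be put into numbers to go with the graph. The sketch below, in Python for illustration only, reports the share of sales smaller and larger than the subject and counts the sales within a size tolerance. The function names, the made-up sales, and the 15% tolerance are all hypothetical choices, not a standard.

```python
def position_in_market(subject, market):
    """Return the share of market sales smaller than, and larger than, the subject."""
    n = len(market)
    smaller = sum(1 for v in market if v < subject)
    larger = sum(1 for v in market if v > subject)
    return smaller / n, larger / n

def nearby_sales(subject, market, tolerance=0.15):
    """Return the sales within +/- tolerance (as a fraction) of the subject's size."""
    low, high = subject * (1 - tolerance), subject * (1 + tolerance)
    return [v for v in market if low <= v <= high]

# Hypothetical gross living areas of market sales, in square feet.
market_gla = [820, 1350, 1420, 1480, 1600, 1750, 1900, 1980, 2050, 2400, 3600]

# Two hypothetical subjects: one fairly typical, one with no close matches.
for subject in (2100, 3000):
    smaller, larger = position_in_market(subject, market_gla)
    close = nearby_sales(subject, market_gla)
    print(f"{subject} sf: {smaller:.0%} smaller, {larger:.0%} larger, "
          f"{len(close)} sales within 15% of the subject's size")
```

In this made-up market the 3000 sf subject has sales both smaller and larger but none within 15% of its size, which is the same kind of data difficulty the extreme-case graph shows.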

After you arrive at market value and as part of your reconciliation, consider using a histogram to support your market value.

Support for your market value conclusion

“The subject is a newer than typical, above average quality custom home on a larger than typical lot. As shown above, the subject’s market value is on the higher side for the greater market area, as expected based on its superior characteristics.”

I hope you agree that histograms can be a powerful tool for appraisers.

Ways to use histograms:

  • Exploratory analysis to understand characteristics of a market area
  • Assist in determining reasonable search criteria for sales comparison
  • Visual representation of the subject’s position in a market area
  • Support for market value conclusions

I learned about histograms from George Dell. Thanks, George. Get smart by taking his Stats, Graphs, and Data Science classes or, at the very least, sign up for his blog. More info on George’s website.

Postscript: I am working towards moving away from Excel to using R, the data analysis package. I’ll link to the R code used for the two graphs as a separate post/update soon.

Moving from Excel to R-What Software Do I Need?

Everyone knows what Microsoft Excel is, right? Either you have a copy that came with your PC or you’re on the Office 365 subscription model at $69/year for a personal copy or $99/year for 5 users (my subscription of choice). Money flows into Microsoft coffers, satisfying shareholders and most of the greater Seattle area given Microsoft’s reach. Make more Seattleites happy by ordering your copy through Amazon!

R is very different (free). It is open source software available under a public license and is maintained by a group of volunteers (free). Get your free copy here.

R on its own is usable. However, it was designed from the ground up to allow for additions to make it more useful.

RStudio, an open source integrated development environment for R, makes using R much easier for folks like me who are not full-time programmers (also free). RStudio sits on top of R and extends usability significantly. RStudio offers the same basic terminal R does but also gives you additional really useful windows and information. I’ll discuss RStudio in the future but if you can’t wait, here’s a link with more information about RStudio.

Here’s an article: 9 Reasons to use RStudio. Or Top 6 Reasons you need to be using RStudio. Get your free copy of RStudio here.

You can extend the usability of R by adding packages. Packages are bundles of R code with explanation and data examples. DataCamp has a beginner’s guide for R Packages here. Managing packages is one of RStudio’s strengths, making it easy to install packages. These are free too.

ggplot2 is a package for creating graphics and should be the first package you download. Two more packages of interest to appraisers just getting into R are tidyverse, a collection of R packages for data science, and rmarkdown, a package for adding R output to documents. You can learn more about all three here.

To summarize, download R, RStudio, and the ggplot2, tidyverse, and rmarkdown packages, all for free. I’ll talk more about packages in the future as I explore R’s functionality.


Moving from Excel to R-What Do I Need To Replicate in R?

This is the second post in a series describing my journey to move my residential appraisal business workflow from Microsoft Excel to R. Last time out, I made the case for why I’m making the change. This post will be a catalog of the ways I use Excel today to serve as a guide for where I need to go.

I use Excel a lot. For each appraisal I work on, I start a separate Excel file. We appraisers are required to retain a work file for each report that supports our conclusions and allows someone auditing us to understand what we did.

Here’s what I do with Excel today:

  • Store my Neighborhood Market Data downloaded from MLS. I grab all sales in a competitive market area for at least five years back, sometimes ten years. This goes into a Market worksheet.
  • Store my Competitive Market Data downloaded from MLS. I grab all potential competitive comparable sales and listings going back at least 12 months but frequently further. This goes into a Comparables worksheet.
  • Create Neighborhood Price Per Square Foot and/or Sale Price trendlines from the Market data. If I have questions about trends, I’ll also look at changes in floor area over time. I create Charts for each data run. I then spend time formatting and labeling so I can include the charts in my reports.
  • Create Pivot Table summaries of the Neighborhood Market Data. My normal summary table includes all sales in a neighborhood summarized by month, with the number of homes sold, mean Days on Market, Low Price, High Price, Mean Sale Price, and Mean PSF. I use a template and replace my old market data with the new data, then refresh, so right now this is really fast in Excel. However, I can’t do Median summaries in Pivot Tables easily, an issue that I expect to be able to handle in R. For most of my reports, I include this pivot table summary.
  • I use the pivot table summaries to create a column chart showing 12 month change in mean PSF and/or mean sale price as another tool to understand and report changes in my neighborhoods. This is especially important in seasonal markets like Davis, California, where home selling revolves around the university schedule. I’ll include this chart in every Davis appraisal and in other appraisals where necessary to explain market trends.
  • I occasionally create histograms to show the shape of a market with regard to one variable, primarily sale price and floor area (great for showing where the subject lies in relation to the rest of the market). I’ve seen an example of how easy it is to create a histogram in R. I have high expectations that R will be an improvement over Excel for histograms.
  • Create PSF and Sale Price scatter graphs of competitive sales. I use the trendline coefficients to determine daily price adjustments for market change. I’ll also look at floor area over time to see if my comparables are changing over time or not to help understand what my market is really doing. I include the scatter graphs in my reports so clients and intended users can understand the subject’s sub-market.
  • I use pivot tables linked to my comparable sales data for contrasting one variable. For example, I’ll use pivot tables to examine the difference between homes sold with pools and without pools, a significant factor in the Sacramento Valley. I’ll create a table that shows how many comparables sold with pools vs. without pools, the difference in mean sale price, the difference in mean PSF, and, to understand my data, mean floor area and mean year built to see if I’m dealing with an apples-to-apples or an apples-to-oranges comparison. If the homes with pools are relatively similar in size and age to the ones without pools, my adjustment is more likely to be strictly the pool. If homes with pools tend to be bigger than those without, I have to consider covariance as part of the explanation for the differences noted. (Covariance is a significant issue in residential real estate markets.)
  • I have a Calculators page that I use for one-off modeling and calculations I need to do by hand. The most significant calculators I’ll need to move to R are the one that does the math for the time adjustments for comparables and the ones that model lot size adjustments. These should be painless to move over to R.
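The monthly summary with medians, the piece a Pivot Table doesn’t do easily, can be sketched in a few lines. This is Python for illustration only, not the R version I expect to end up with. The sales records are made up, the function name is hypothetical, and Days on Market is left out to keep it short.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical sales: (sale month, sale price, gross living area in sf).
sales = [
    ("2019-01", 300000, 1500),
    ("2019-01", 360000, 1800),
    ("2019-01", 600000, 2400),
    ("2019-02", 280000, 1400),
    ("2019-02", 420000, 2000),
]

def monthly_summary(sales):
    """Summarize sales by month, with both means and medians."""
    by_month = defaultdict(list)
    for month, price, gla in sales:
        by_month[month].append((price, gla))

    summary = {}
    for month, rows in sorted(by_month.items()):
        prices = [price for price, _ in rows]
        psf = [price / gla for price, gla in rows]  # price per square foot
        summary[month] = {
            "sold": len(rows),
            "low": min(prices),
            "high": max(prices),
            "mean_price": mean(prices),
            "median_price": median(prices),
            "mean_psf": mean(psf),
            "median_psf": median(psf),
        }
    return summary

for month, row in monthly_summary(sales).items():
    print(month, row)
```

In the made-up January data, one high sale pulls the mean price well above the median, which is exactly why I want the median next to the mean.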

This is the bulk of what I do with Excel today. As I start to shift this workflow over to R, I plan to go into more detail about the special or not so special challenges I encounter. I also have high expectations that R will inspire me to come up with new ways of analyzing and presenting my data.
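As an example of how small these calculators are, here is a minimal sketch of the time adjustment math from the list above: fit a straight trendline to sale price against sale date, read the slope as dollars per day, and carry each comparable forward to the effective date. It is Python for illustration only. The sales are made up, the function names are hypothetical, and a straight-line trend is an assumption that has to be checked against the scatter graph.

```python
from datetime import date

def daily_trend(sales):
    """Least-squares slope of sale price against sale date, in dollars per day."""
    days = [sale_date.toordinal() for sale_date, _ in sales]
    prices = [price for _, price in sales]
    n = len(sales)
    mean_x = sum(days) / n
    mean_y = sum(prices) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, prices))
    sxx = sum((x - mean_x) ** 2 for x in days)
    return sxy / sxx

def time_adjusted(price, sale_date, effective_date, slope):
    """Adjust a sale price for market change between sale and effective date."""
    return price + slope * (effective_date - sale_date).days

# Hypothetical competitive sales: (sale date, sale price).
comps = [
    (date(2019, 1, 1), 400000),
    (date(2019, 4, 11), 405000),  # 100 days after the first sale
    (date(2019, 7, 20), 410000),  # 200 days after the first sale
]

slope = daily_trend(comps)
effective = date(2019, 7, 20)
for sale_date, price in comps:
    adjusted = time_adjusted(price, sale_date, effective, slope)
    print(sale_date, price, round(adjusted))
```

The same slope could be fit to PSF instead of sale price by passing PSF values in place of prices.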

Reminder for appraiser readers in particular: R is a tool. Excel is a tool. Most of what I plan to discuss in this series is about changing tools. Occasionally, I’ll talk about modeling decisions (like covariance above). However, all of what I’m doing is rooted in the Stats, Graphs, and Data Science classes I’ve taken. You need to understand the theory so you can make informed decisions about your modeling choices.

Take classes from George; he’s very willing to help. https://georgedell.com/
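To show why the tool matters less than the thinking, here is the pool comparison from the list above as a minimal sketch, in Python for illustration only. The comparables are made up and the function name is hypothetical. The code only produces the table; deciding how much of the price gap belongs to the pool is still the appraiser’s job.

```python
from statistics import mean

# Hypothetical comparables: (has_pool, sale price, gross living area sf, year built).
comps = [
    (True, 450000, 2200, 1995),
    (True, 430000, 2000, 1990),
    (False, 400000, 1900, 1988),
    (False, 380000, 1800, 1992),
    (False, 390000, 1850, 1985),
]

def group_summary(rows):
    """Summarize one group of comparables."""
    return {
        "count": len(rows),
        "mean_price": mean(r[1] for r in rows),
        "mean_psf": mean(r[1] / r[2] for r in rows),
        "mean_gla": mean(r[2] for r in rows),
        "mean_year_built": mean(r[3] for r in rows),
    }

with_pool = group_summary([r for r in comps if r[0]])
without_pool = group_summary([r for r in comps if not r[0]])

# Raw differences between the two groups.
price_gap = with_pool["mean_price"] - without_pool["mean_price"]
gla_gap = with_pool["mean_gla"] - without_pool["mean_gla"]

print(f"Mean price difference: ${price_gap:,.0f}")
print(f"Mean floor area difference: {gla_gap:,.0f} sf")
```

In this made-up data the homes with pools are also larger on average, so the raw price difference can’t be read as the value of the pool alone.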