Using the R Programming Language to Produce Correlation Matrices & Correlograms for Residential Appraisal Reports

As I mention in the bio for this blog, one of the most influential individuals in my professional development as an appraiser is George Dell. He is a nationally recognized valuation expert who teaches a method he calls “Evidence Based Valuation©.”

(If you’ve never taken a class from George, stop reading this blog post, go to his website, register for one of his classes—oh, it’s on the other side of the country? So what? Register!—and then resume reading this post. You’re back? Great.)

George’s classes emphasizes reproducible appraisal findings and deemphasizes subjectivity and guessing. Reports that follow George’s tenets are clearer, more logical, and more convincing to the end user. I can personally attest to a substantial increase in work quality.

George is on a bit of a crusade to get appraisers to use more sophisticated analytical tools. One such tool is the programming language R.

R can be a bit intimidating for appraisers—even for those who possess advanced Excel skills. However, one does not have to be a seasoned computer programmer to immediately start using it to produce charts and graphs that can aid in analysis and improve a report’s quality.

Let me walk you through the steps needed to produce this chart, known as a correlogram:

Rplot

 

Per the STHDA website, a website that shares information on statistical tools, a correlation test is “used to evaluate the association between two or more variables.” Typically, a number is assigned that can range from -1 to +1. If two variables, say “Close_Price” and “Total_SF” have a correlation of +0.74, that means there is a strong positive association between the two. (In reports, this is often expressed as a percentage.) If the number were negative, it would mean as one variable goes up the other goes in the opposite direction. A good example of this in my market is the relationship between “Year_Built” and “Acres.” Newer homes tend to be on smaller lots as often a larger lot is purchased and partitioned by a developer.

A correlation matrix, per STHDA, “is used to investigate the dependence between multiple variables at the same time. The result is a table containing the correlation coefficients between each variable and the others.”

A correlogram is a visualization of the correlation statistics.

Once you get the hang of the process, you can start producing correlograms in under 5 minutes using R. It allows for fantastic support in an appraisal report.

Correlation matrices do, however, have their limitations and need to be used carefully. Again, I strongly recommend taking George Dell’s “Stats, Graphs & Data Science¹” class to obtain a better understanding.

Without further ado, let’s begin:

Step 1:

Install the R programming language: https://cran.r-project.org/

R runs on Windows, Mac, and Linux.

Step 2:

Download R Studio (a graphical overlay to R) from this site.  R Studio is a free program. Make sure you choose the desktop option. (If the site is asking you to pay $30,000 a year, you accidentally clicked on the commercial license product . 😊)

Step 3:

Install the “PerformanceAnalytics” package.

Click the “Packages” tab in the bottom-right pane. Click “Install.” Type in the name of the package (it will autocomplete based on what is available on CRAN).

2

Note: You can customize the look of R Studio, I’ve chosen a darker theme as it is easier on the eyes, so don’t be concerned if my screenshots don’t exactly match what you see.

Step 4:

Open the R script file that contains the code.

You have a number of options here. To save yourself time, you can simply click this link to download a prepared R script file I am hosting on Dropbox. On Dropbox, click the download button on the upper right-hand corner and you will get a file that looks something like this, depending on your folder view settings:

2018-12-07_13-30-07

When you double-click on the R script file R Studio will automatically open if it is closed or, if already open, simply add the code to the top-left pane. The code is just three lines long and will look something like this on your display:

zzzzz

The first 5 lines in blue are just comments I’ve added and are not code. By putting a “#” sign in front of comments, I am telling the program not to treat the text as code.

If you prefer not to download the prepared file, simply copy the text from the Dropbox display and paste it into a new R script.

Step 5:

Import the data from an Excel file. Click on the “Import Dataset” tab in the top right-hand pane, and click “From Excel…”:

a

If you try to import a CSV file you’ll get this error:

CSV

Future versions of R Studio may be able to handle CSV files directly, but for now, just make sure the file is an actual Excel file.

You’ll get a preview of your data that looks like this:

b

For appraisal work, you’ll want to start off working with variables that can be put in the “double” type format (a computer variable type that permits greater numerical precision). The type of variables we will want in our correlation matrix are, in brief, ones that are interval or ratio variables like “Close_Price” where it makes sense to say that a home is twice as expensive as another. (Working with a class of variables known as “categorical variables” is beyond the scope of this blog post—a Q3 home is not twice as nice as a Q6 home.)

I’ve made some custom Excel templates that allows me to take the data exported from my local MLS and it put it into the format I want. I then save just that tab’s information to a separate file labeled “Correlation Data.”

Pro Tips: a) For column headers, use the underscore to separate words rather than a blank space. So, “Total_SF” rather than “Total SF” is safer. b) Don’t use the “#” sign for any headers as that can mess up some R package’s ability to interpret. So rather than “#_Acres” put “No_Acres” or just “Acres.” c) Make sure “Close_Date” is in Excel “serial date” format.

Once, you’ve looked the data over, simply click the “Import” button at the bottom right to bring it into R.

Step 5:

Highlight the three lines of code and click the “Run” button:

c

Well, that part was easy!

(If you highlighted the comments above the code, it will have no effect on the outcome.)

Your graph will appear in the “Plots” tab at the bottom-right pane. Simpy click the “Export” option and you can save as an image or PDF and stick it in your report:

d

Once your templates are all set up, you can produce this chart in just minutes.

Correlograms can be very powerful. Reading from the top row, “Close_Price,” and reading across allows you to see how all the other variables correlate with “Close_Price.” If, “Acres” and “Close_Price” show virtually no correlation, you can point to the correlogram as proof that no lot size adjustment is needed.

There are many different types of correlograms that you can do with R. In a future post, I’ll review a number of them!

Let me know your thoughts in the comments section. If you have a question, don’t hesitate to ask!

And, finally, take George Dell’s class!

 

 

How Cars Can Help Explain Fannie Mae’s UAD Quality and Condition Codes

Explanation of Fannie Mae Quality & Condition codes using cars as an analogy.

Real estate agents and homeowners who read an appraisal report prepared on a “1004 UAD” Fannie Mae form will typically look at what is known as the “adjustment grid” and see that the subject property and the comparables used are categorized under quality and condition codes. The codes use a 1-6 scale, where 1 is the best rating and 6 the worst. So, for example, a C1 home would be a home in new condition. 

12

The 1004 Fannie Mae form is used to present appraisal reports for most lending transactions. Millions of these documents have been generated over the years. The UAD version of the 1004 became mandatory for most lending reports back in September of 2011. While Fannie Mae has provided an addendum that defines the codes, a lot of confusion regarding the interpretation of the codes stills exists. Therefore, I thought a good way to explain the meaning of the codes would be to use cars. There are ~260 million automobiles in the United States and everyone is familiar with them. So, let’s use cars to illustrate the UAD quality and condition codes. 

A little housekeeping: all analogies gloss over some differences while highlighting areas of similarity. The explanations below are just rules-of-thumb that can help someone quickly parse the various codes. Some explanations may have technical exceptions.

With that out of the way, let’s start with:

Condition:

car

C1: new car.

C2: new car driven off lot or virtually new car (or car restored to almost like new condition).

C3: car a few years old but well maintained (good resale or trade-in value).

C4: car with some miles on it. Some minor dents, some scratches, tire tread is a bit worn; some components may soon need to be replaced. Been around the block, but it works. If you bought it you likely wouldn’t spend a fortune fixing it.

C5: car with some serious issues. Transmission may need to be replaced. Alternator is marginal. Doesn’t always start. Bald tires. Interior is not pleasantly aromatic. Bumper or fender may be damaged, etc. You buy this car and it will likely go straight to the shop and cost a bit to repair. A junkyard would pay you parts and scrap value.

C6: junker. Car has so many problems a mechanic tells you to call a junkyard and put it in its final resting place. The junkyard charges you to take it. This represents a car that probably won’t start at all, would be dangerous to operate if it did start, and whose repair costs may equal or far exceed the actual value of the vehicle. Few people would opt to repair.

Quality:

car-40241_960_720

Q1: extreme high-end luxury car. We’re talking vehicles like a Ferrari LaFerrari, Lamborghini Veneno, or a Mercedes-Benz Maybach Exelero. Highly customized and all aspects of ownership are expensive. If you have to ask “how much?” move on down the scale. These cars are not for you. 😊

Q2: the more “affordable” luxury market. You’ll be in the 6-digits for cars in this class. You’ll see Lamborghinis, Ferraris, Bentleys, and Porsches.

Q3: Mid-higher-end mass market. You got your Lexi, BMWs, Land Rovers, Cadillacs, Audis, and some Teslas squeezing in here. If you drive these cars, assumptions will be made you are in a good socio-economic bracket!

Q4: mid-range mass market. Price is definitely a major factor but not the only concern. Vehicles of this type may have some upgrades. Your Camrys, Accords, Tauruses, and Hyundai Sonatas all live here.

Q5: low-mid range mass market. Price is the overwhelming major concern. Practicality for personal needs a close second. Some of the base models of cars in the Q4 range will show up here with all stock components. You will likely not be bragging about or street racing anything in this bracket.

Q6: very low-end of the car market. The Ford Pinto, were it still around (and not exploding), would feel at home here. A person buys this car for one reason and one reason only: to get from point A to point B. Cars in this range sell new for ~14k. Say hello to the lime-green Ford Fiesta or the Kia Rio.

So, there you have it, a quick-and-dirty interpretation of Fannie Mae’s Q & C codes using automobiles. 

Do you have any other analogies you use or emendations to the list above you’d like to suggest? Please share your thoughts in the comments section!