**UPDATE 2016.02.05:** Added ggplot2 code.

**UPDATE 2016.10.26:** Added code to colour the line in ggplot2.

**NOTE:** I’m having some issues with WordPress randomly deleting or changing some of the code text now and then. I don’t know why this happens, but I’ve pasted the entire code into a comment at the bottom as a backup. Also, if you’re having issues with copy/paste of the code on this page, you can now download the R source file here.

The area under the curve (AUC) of the receiver operating characteristic (ROC) is often used (for better or worse) as a validation statistic for species distribution models. In short, it compares predicted values to true values of binary classification (e.g. model predictions to observed presence-absence values for a species presence). It is a threshold-independent measure, so one needs not define a cutoff predicted value at which to define a predicted “presence” or “absence”. The ROC is created by plotting false presences against true presences for a continuum of threshold values (conceptually an infinite number of values, though this is obviously not necessary to calculate the AUC). There’s a decent Cross-Validated thread on ROC/AUC here.

Assumptions and issues with this approach aside, the ROCR package for R provides some nice commands to process this. But it took some time to figure out an efficient way to extract all the statistics that I wanted from the ROCR commands. With the help of some of Laura Gray‘s original code, I put together a (not so) quick function that produces some basic ROC/AUC statistics, as well as a quick AUC plot.

# Build some example data # Observed (truth) data as presence-absence (1-0) observations <- predicted data as values ranging from 0 to 1 # Predicted data as values ranging from 0 to 1 predictions <- abs(c(rnorm(250,mean=0),rnorm(250,mean=1))) predictions <- round((predictions-min(predictions))/max(predictions),3) # Install the ROCR package install.packages('ROCR') library('ROCR') # AUC function fun.auc <- function(pred,obs){ # Run the ROCR functions for AUC calculation ROC_perf <- performance(prediction(pred,obs),"tpr","fpr") ROC_sens <- performance(prediction(pred,obs),"sens","spec") ROC_err <- performance(prediction(pred, labels=obs),"err") ROC_auc <- performance(prediction(pred,obs),"auc") # AUC value AUC <- ROC_auc@y.values[[1]] # AUC # Mean sensitivity across all cutoffs x.Sens <- mean(as.data.frame(ROC_sens@y.values)[,1]) # Mean specificity across all cutoffs x.Spec <- mean(as.data.frame(ROC_sens@x.values)[,1]) # Sens-Spec table to estimate threshold cutoffs SS <- data.frame(SENS=as.data.frame(ROC_sens@y.values)[,1],SPEC=as.data.frame(ROC_sens@x.values)[,1]) # Threshold cutoff with min difference between Sens and Spec SS_min_dif <- ROC_perf@alpha.values[[1]][which.min(abs(SS$SENS-SS$SPEC))] # Threshold cutoff with max sum of Sens and Spec SS_max_sum <- ROC_perf@alpha.values[[1]][which.max(rowSums(SS[c("SENS","SPEC")]))] # Min error rate Min_Err <- min(ROC_err@y.values[[1]]) # Threshold cutoff resulting in min error rate Min_Err_Cut <- ROC_err@x.values[[1]][which(ROC_err@y.values[[1]]==Min_Err)][1] # Kick out the values round(cbind(AUC,x.Sens,x.Spec,SS_min_dif,SS_max_sum,Min_Err,Min_Err_Cut),3) } # Run the function with the example data fun.auc(predictions, observations) AUC x.Sens x.Spec SS_min_dif SS_max_sum Min_Err Min_Err_Cut [1,] 0.637 0.518 0.622 0.232 0.32 0.382 0.32

Now to make the AUC plot. Note that this function also runs the AUC calculations, so the above function (auc.fun) need not be run prior (in case you just want the plot and not the numbers).

# AUC plot function fun.aucplot # Run the AUC calculations ROC_perf # Spawn a new plot window (Windows OS) graphics.off(); x11(h=6,w=6) # Plot the curve plot(ROC_perf,colorize=T,print.cutoffs.at=seq(0,1,by=0.1),lwd=3,las=1,main=title) # Add some statistics to the plot text(1,0.25,labels=paste("Npres = ",sum(obs==1),sep=""),adj=1) text(1,0.20,labels=paste("Nabs = ",sum(obs==0),sep=""),adj=1) text(1,0.15,labels=paste("AUC = ",round(ROC_auc@y.values[[1]],digits=2),sep=""),adj=1) text(1,0.10,labels=paste("Sens = ",round(mean(as.data.frame(ROC_sens@y.values)[,1]),digits=2),sep=""),adj=1) text(1,0.05,labels=paste("Spec = ",round(mean(as.data.frame(ROC_sens@x.values)[,1]),digits=2),sep=""),adj=1) } # Run the function fun.aucplot(predictions, observations, "My AUC Plot")

Et viola! A lovely and colourful plot of a likely-deficient validation statistic that no doubt will be taken as gospel when you present your models.

At some point, I would like to dissect the plot function a bit to rebuild it in ggplot and make it, well, better. But that’s a task for another day and another pot of coffee. (But please let me know if you have already done this.)

Notes:

- The Joy of Data has a nice post on AUC plotting here (reposted by R-bloggers here, if you prefer their layout).

**UPDATE (Feb. 5, 2016):**

So I recently updated the plotting function to use ggplot, because all the cool kids are doing that these days.

# Install the ggplot2 package install.packages('ggplot2') library('ggplot2') # New plot window (Windows OS) graphics.off(); x11(h=6,w=6) # AUC ggplot function fun.auc.ggplot <- function(pred, obs, title){ # pred = predicted values # obs = observed values (truth) # title = plot title # Run the AUC calculations ROC_perf <- performance(prediction(pred,obs),"tpr","fpr") ROC_sens <- performance(prediction(pred,obs),"sens","spec") ROC_auc <- performance(prediction(pred,obs),"auc") # Make plot data plotdat <- data.frame(FP=ROC_perf@x.values[[1]],TP=ROC_perf@y.values[[1]],CUT=ROC_perf@alpha.values[[1]],POINT=NA) plotdat[unlist(lapply(seq(0,1,0.1),function(x){which.min(abs(plotdat$CUT-x))})),"POINT"] <- seq(0,1,0.1) # Plot the curve ggplot(plotdat, aes(x=FP,y=TP)) + geom_abline(intercept=0,slope=1) + geom_line(lwd=1) + geom_point(data=plotdat[!is.na(plotdat$POINT),], aes(x=FP,y=TP,fill=POINT), pch=21, size=3) + geom_text(data=plotdat[!is.na(plotdat$POINT),], aes(x=FP,y=TP,fill=POINT), label=seq(1,0,-0.1), hjust=1, vjust=0) + scale_fill_gradientn("Threhsold Cutoff",colours=rainbow(14)[1:11]) + scale_x_continuous("False Positive Rate", limits=c(0,1)) + scale_y_continuous("True Positive Rate", limits=c(0,1)) + geom_polygon(aes(x=X,y=Y), data=data.frame(X=c(0.7,1,1,0.7),Y=c(0,0,0.29,0.29)), fill="white") + annotate("text",x=0.97,y=0.25,label=paste("Npres = ",sum(obs==1),sep=""),hjust=1) + annotate("text",x=0.97,y=0.20,label=paste("Nabs = ",sum(obs==0),sep=""),hjust=1) + annotate("text",x=0.97,y=0.15,label=paste("AUC = ",round(ROC_auc@y.values[[1]],digits=2),sep=""),hjust=1) + annotate("text",x=0.97,y=0.10,label=paste("Sens = ",round(mean(as.data.frame(ROC_sens@y.values)[,1]),digits=2),sep=""),hjust=1) + annotate("text",x=0.97,y=0.05,label=paste("Spec = ",round(mean(as.data.frame(ROC_sens@x.values)[,1]),digits=2),sep=""),hjust=1) + theme(legend.position="none", plot.title=element_text(vjust=2)) + ggtitle(title) } # Run the function fun.auc.ggplot(predictions, observations, "My AUC Plot")

It will give you something like this. You can clean it up as you like.

**UPDATE (Oct. 26, 2016):**

To colour the line in ggplot (akin to the ROCR output), you can simply add a colour call to the aesthetics of the first ggplot line, such as to colour the plot by True Positives:

`ggplot(plotdat, aes(x=FP,y=TP,col=TP)) + `

You can then define the colour gradient as you like:

`scale_colour_gradientn("",colours=rainbow(14)[1:11]) +`

Note that you also have to add the inherits.aes=F option to the geom_polygon() line, otherwise it also searches for a TP column (or whatever you colour by) in the data frame. Also, you may want to add col=”black” to your geom_point() and geom_text() lines, otherwise they’ll also be in the same gradient colours:

```
geom_point(data=plotdat[!is.na(plotdat$POINT),], aes(x=FP,y=TP,fill=POINT), pch=21, size=3, col="black") +
geom_text(data=plotdat[!is.na(plotdat$POINT),], aes(x=FP,y=TP,fill=POINT), label=seq(1,0,-0.1), hjust=1, vjust=0, col="black") +
geom_polygon(aes(x=X,y=Y), inherit.aes=F, data=data.frame(X=c(0.7,1,1,0.7),Y=c(0,0,0.29,0.29)), fill="white") +
```

The ROCR package colours by True Presence rate (TP), but you can use whichever variable you consider more important, I suppose.

Hi, looks like you’ve got some coding problems in the

plotdat[unlist(lapply(seq(0,1,0.1),function(x){which.min(abs(plotdat$CUT-x))})),”POINT seq(0,1,0.1)

line

LikeLike

Thanks Sam, but I couldn’t find an error here (works when I try it, even line by line). Are you using your own data? If you can’t solve it, let me know what the error message says.

LikeLike

Hello! I have tried your updated function and I get some errors. I thought I would let you know. Adelaide

> # AUC ggplot function

> fun.auc.ggplot <- function(pred, obs, title){

+ # pred = predicted values

+ # obs = observed values (truth)

+ # title = plot title

+

+ # Run the AUC calculations

+ ROC_perf <- performance(prediction(pred,obs),"tpr","fpr")

+ ROC_sens <- performance(prediction(pred,obs),"sens","spec")

+ ROC_auc <- performance(prediction(pred,obs),"auc")

+

+ # Spawn a new plot window (Windows OS)

+ graphics.off(); x11(h=6,w=6)

+

+ # Make plot data

+ plotdat scale_x_continuous(“False Positive Rate”, limits=c(0,1)) +

+ scale_y_continuous(“True Positive Rate”, limits=c(0,1)) +

+ geom_polygon(aes(x=X,y=Y), data=data.frame(X=c(0.7,1,1,0.7),Y=c(0,0,0.29,0.29)), fill=”white”) +

+ annotate(“text”,x=0.97,y=0.25,label=paste(“Npres = “,sum(obs==1),sep=””),hjust=1) +

+ annotate(“text”,x=0.97,y=0.20,label=paste(“Nabs = “,sum(obs==0),sep=””),hjust=1) +

+ annotate(“text”,x=0.97,y=0.15,label=paste(“AUC = “,round(ROC_auc@y.values[[1]],digits=2),sep=””),hjust=1) +

+ annotate(“text”,x=0.97,y=0.10,label=paste(“Sens = “,round(mean(as.data.frame(ROC_sens@y.values)[,1]),digits=2),sep=””),hjust=1) +

+ annotate(“text”,x=0.97,y=0.05,label=paste(“Spec = “,round(mean(as.data.frame(ROC_sens@x.values)[,1]),digits=2),sep=””),hjust=1) +

+ theme(legend.position=”none”, plot.title=element_text(vjust=2)) +

+ ggtitle(title)

Erreur dans scale_x_continuous(“False Positive Rate”, limits = c(0, 1)) + :

argument non numérique pour un opérateur binaire

> }

Erreur : ‘}’ inattendu(e) dans “}”

>

LikeLike

Thanks Adelaide, but I couldn’t get the code to reproduce this error. I did, however, find another error, which I have now corrected. Let me know if you can’t sort this out. Sorry!

LikeLike

Hi David, I’ve been using your original code from 22 Sept 2015, and it’s been working great. However yesterday suddenly the fun.auc function stopped working. Any ideas?

> fun.auc(predictions, d$Delichon_urbica)

Error: could not find function “fun.auc”

Thanks, Kylie.

LikeLike

Hi Kylie, hard to say what’s going on, but the error message would suggest that you don’t have the function loaded into your R session. Are you running the actual function code (start with fun.auc <- …) before trying to run the function? Alternatively, you can set up a script to run whenever you open R that loads all the packages and functions that you regularly use. Here's a post about how to do that: https://climateecology.wordpress.com/2012/10/02/loading-packages-and-functions-automatically-in-r/

LikeLike

Hi David, thanks for sharing your code!

But the code for the plots is not working like you posted it. Aside from missing “<- function(…){" some things are missing. I could make the function fun.aucplot work with replacing ROC_perf with some lines from the first function. But I have no idea, how plotdat should be build in the fun.auc.ggplot function, which I'd love to use as well. I'd appreciate a hint.

Thanks, Markus

LikeLike

Thanks Markus. I’ve been having some issues with WordPress randomly deleting lines of code in this post. I seem to have to replace the code every so often just to keep it correct. No idea what’s going on, maybe reverting to an autosave version or something? Anyway, I’ll have a look and fix it up today if I can.

LikeLike

Should be all fixed now. Sorry about that.

LikeLike

Almost working… 😉

I still had to add

“] <-

near the end of the second line where you build plotdat.

Additionally, I changed some details, because my data does not allow for 10 cut-offs, which resulted in errors.

I added

max_threshold <- max(plotdat[!is.na(plotdat$POINT), 'POINT'])

min_threshold <- min(plotdat[!is.na(plotdat$POINT), 'POINT'])

and changed

…

scale_colour_gradientn("",colours=rainbow(14)[1:sum(!is.na(plotdat$POINT))]) +

geom_text(data=plotdat[!is.na(plotdat$POINT),],

aes(x=FP,y=TP),

label=seq(max_threshold, min_threshold,-0.1),

hjust=1, vjust=0, col="black")

Now, it's working for me. Thanks again.

LikeLike

Hi David,

I also would like to thank you for sharing your code!

I understand you are having difficulties with WordPress deleting your code.

If possible, please reply to this message with the appropriate line of code for the second line of plotdat, which is supposed to supply the cut-off points. It would be much appreciated.

LikeLike

Thanks Stephanie. I’ve updated the page AGAIN. But here’s the entire code in case it still doesn’t work. (Time for a nasty email to WordPress.)

# Build some example data

# Observed (truth) data as presence-absence (1-0)

observations <- c(rep(0,250),rep(1,250))

# predicted data as values ranging from 0 to 1

predictions <- abs(c(rnorm(250,mean=0),rnorm(250,mean=1)))

predictions <- round((predictions-min(predictions))/max(predictions),3)

# Install the ROCR package

install.packages('ROCR')

library('ROCR')

### AUC function

fun.auc <- function(pred,obs){

# Run the ROCR functions for AUC calculation

ROC_perf <- performance(prediction(pred,obs),"tpr","fpr")

ROC_sens <- performance(prediction(pred,obs),"sens","spec")

ROC_err <- performance(prediction(pred, labels=obs),"err")

ROC_auc <- performance(prediction(pred,obs),"auc")

# AUC value

AUC <- ROC_auc@y.values[[1]] # AUC

# Mean sensitivity across all cutoffs

x.Sens <- mean(as.data.frame(ROC_sens@y.values)[,1])

# Mean specificity across all cutoffs

x.Spec <- mean(as.data.frame(ROC_sens@x.values)[,1])

# Sens-Spec table to estimate threshold cutoffs

SS <- data.frame(SENS=as.data.frame(ROC_sens@y.values)[,1],SPEC=as.data.frame(ROC_sens@x.values)[,1])

# Threshold cutoff with min difference between Sens and Spec

SS_min_dif <- ROC_perf@alpha.values[[1]][which.min(abs(SS$SENS-SS$SPEC))]

# Threshold cutoff with max sum of Sens and Spec

SS_max_sum <- ROC_perf@alpha.values[[1]][which.max(rowSums(SS[c("SENS","SPEC")]))]

# Min error rate

Min_Err <- min(ROC_err@y.values[[1]])

# Threshold cutoff resulting in min error rate

Min_Err_Cut <- ROC_err@x.values[[1]][which(ROC_err@y.values[[1]]==Min_Err)][1]

# Kick out the values

round(cbind(AUC,x.Sens,x.Spec,SS_min_dif,SS_max_sum,Min_Err,Min_Err_Cut),3)

}

# Run the function with the example data

fun.auc(predictions, observations)

### AUC plot function

fun.aucplot <- function(pred, obs, title){

# Run the AUC calculations

ROC_perf <- performance(prediction(pred,obs),"tpr","fpr")

ROC_sens <- performance(prediction(pred,obs),"sens","spec")

ROC_auc <- performance(prediction(pred,obs),"auc")

# Spawn a new plot window (Windows OS)

graphics.off(); x11(h=6,w=6)

# Plot the curve

plot(ROC_perf,colorize=T,print.cutoffs.at=seq(0,1,by=0.1),lwd=3,las=1,main=title)

# Add some statistics to the plot

text(1,0.25,labels=paste("Npres = ",sum(obs==1),sep=""),adj=1)

text(1,0.20,labels=paste("Nabs = ",sum(obs==0),sep=""),adj=1)

text(1,0.15,labels=paste("AUC = ",round(ROC_auc@y.values[[1]],digits=2),sep=""),adj=1)

text(1,0.10,labels=paste("Sens = ",round(mean(as.data.frame(ROC_sens@y.values)[,1]),digits=2),sep=""),adj=1)

text(1,0.05,labels=paste("Spec = ",round(mean(as.data.frame(ROC_sens@x.values)[,1]),digits=2),sep=""),adj=1)

}

# Run the function

fun.aucplot(predictions, observations, "My AUC Plot")

### AUC ggplot function

# install.packages('ggplot2')

library('ggplot2')

library('grid')

# Spawn a new plot window (Windows OS)

graphics.off(); x11(h=6,w=6)

fun.auc.ggplot <- function(pred, obs, title){

# pred = predicted values

# obs = observed values (truth)

# title = plot title

# Run the AUC calculations

ROC_perf <- performance(prediction(pred,obs),"tpr","fpr")

ROC_sens <- performance(prediction(pred,obs),"sens","spec")

ROC_auc <- performance(prediction(pred,obs),"auc")

# Make plot data

plotdat <- data.frame(FP=ROC_perf@x.values[[1]],TP=ROC_perf@y.values[[1]],CUT=ROC_perf@alpha.values[[1]],POINT=NA)

plotdat[unlist(lapply(seq(0,1,0.1),function(x){which.min(abs(plotdat$CUT-x))})),"POINT"] <- seq(0,1,0.1)

# Plot the curve

ggplot(plotdat, aes(x=FP,y=TP)) +

geom_abline(intercept=0,slope=1) +

geom_line(lwd=1) +

geom_point(data=plotdat[!is.na(plotdat$POINT),], aes(x=FP,y=TP,fill=POINT), pch=21, size=3) +

geom_text(data=plotdat[!is.na(plotdat$POINT),], aes(x=FP,y=TP,fill=POINT), label=seq(1,0,-0.1), hjust=1, vjust=0) +

scale_fill_gradientn("Threhsold Cutoff",colours=rainbow(14)[1:11]) +

scale_x_continuous("False Positive Rate", limits=c(0,1)) +

scale_y_continuous("True Positive Rate", limits=c(0,1)) +

geom_polygon(aes(x=X,y=Y), data=data.frame(X=c(0.7,1,1,0.7),Y=c(0,0,0.29,0.29)), fill="white") +

annotate("text",x=0.97,y=0.25,label=paste("Npres = ",sum(obs==1),sep=""),hjust=1) +

annotate("text",x=0.97,y=0.20,label=paste("Nabs = ",sum(obs==0),sep=""),hjust=1) +

annotate("text",x=0.97,y=0.15,label=paste("AUC = ",round(ROC_auc@y.values[[1]],digits=2),sep=""),hjust=1) +

annotate("text",x=0.97,y=0.10,label=paste("Sens = ",round(mean(as.data.frame(ROC_sens@y.values)[,1]),digits=2),sep=""),hjust=1) +

annotate("text",x=0.97,y=0.05,label=paste("Spec = ",round(mean(as.data.frame(ROC_sens@x.values)[,1]),digits=2),sep=""),hjust=1) +

theme(legend.position="none", plot.title=element_text(vjust=2)) +

ggtitle(title)

}

fun.auc.ggplot(predictions, observations, "My AUC Plot")

LikeLike

Thank you for the above! One question relating to: plotdat[unlist(lapply(seq(0,1,0.1),function(x){which.min(abs(plotdat$CUT-x))})),”POINT seq(0,1,0.1)

Is there supposed to be another set of quotes and bracket? If so, and where?

It appears my program is failing at this point.

LikeLike

Ugh, it’s not your program that is failing Stephanie. It’s the WordPress block quote formatting for the code. >:-| Oddly, when I look at the page and copy/paste from it, everything works fine, but many people have commented that characters are missing when they do it.

So, a new approach. I’m adding a link at the top of the page to an external file with the *working* code in it. If you have any other issues, please comment again or email me. Sorry for the hassle.

LikeLike