While working on a spatial autocorrelation analysis, I found I needed an R function to sub-sample spatial data points (i.e. an xyz table, not a raster), retaining only those at least a given distance from one another. But I’m also a bit data-limited, so I wanted to keep as much data as possible. I’m sure there are good packages for this, but Google couldn’t really turn up what I was looking for, so I wrote my own inelegant, brute-force version.

Here’s the function:

```r
## Function to buffer points in XY space.
## Returns the original data table with buffered points removed.
## Runs numerous iterations, as the random point selection can result
## in more or fewer output points. The approach:
#  1) Randomly select a single point
#  2) Remove all points within the buffer distance of that point
#  3) Randomly select one of the remaining points
#  4) ... and so on until no points are left
#
# foo    - a data.frame to select from, with columns x and y
# buffer - the minimum distance between output points
# reps   - the number of repetitions of the point selection
buffer.f <- function(foo, buffer, reps){
  # List to hold the vector of selected rows from each repetition
  suitable <- list()
  for(k in 1:reps){
    # The output vector of selected row indices
    outvec <- numeric(0)
    # The vector of dropped (buffered out) points
    dropvec <- c()
    for(i in 1:nrow(foo)){
      # Stop once all points are exhausted
      if(length(dropvec) < nrow(foo)){
        # Set the rows still available to sample from
        if(i > 1){
          rowsleft <- (1:nrow(foo))[-dropvec]
        } else {
          rowsleft <- 1:nrow(foo)
        }
        # Randomly select a point. The as.character() round-trip guards
        # against sample()'s habit of drawing from 1:n when handed a
        # single number n (see the note below the function)
        outpoint <- as.numeric(sample(as.character(rowsleft), 1))
        outvec[i] <- outpoint
        # Drop all points within the buffer distance of the selected point
        outcoord <- foo[outpoint, c("x", "y")]
        dropvec <- c(dropvec,
                     which(sqrt((foo$x - outcoord$x)^2 +
                                (foo$y - outcoord$y)^2) < buffer))
        # Remove duplicates among the buffered-out points
        dropvec <- dropvec[!duplicated(dropvec)]
      }
    }
    # Store this repetition's selection
    suitable[[k]] <- outvec
  }
  # Pick the repetition that retained the most points
  best <- unlist(suitable[which.max(sapply(suitable, length))])
  foo[best, ]
}
```
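One quirk worth flagging: the `as.character()` round-trip in the point selection is deliberate, not a typo. Base R’s `sample()` treats a single number n as shorthand for `1:n`, so when only one candidate row is left, sampling from it directly would draw from the wrong set. A minimal illustration:

```r
# sample()'s convenience behaviour with a length-one numeric:
sample(10, 1)                            # draws from 1:10, not from c(10)
# The character round-trip sidesteps this when only row 10 remains:
as.numeric(sample(as.character(10), 1))  # always returns 10
```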

And here’s a working example:

```r
# Create some fake spatial data in, say, kilometres
mydat <- data.frame(x = sample(1:10000, 1000),
                    y = sample(1:10000, 1000))
plot(mydat)

# The x and y points here are, on average, about 10 km apart.
# We want to remove points so that no two are closer than 500 km,
# while maintaining as much data as we can.

# Run the function on the spatial data: buffer of 500 km, 1000 iterations
mydat2 <- buffer.f(mydat, 500, 1000)

# Have a look at the buffered data
plot(mydat2)
```
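If you want to convince yourself it worked, base R’s `dist()` makes for a quick sanity check (a small sketch, assuming the data frame holds only the x and y columns):

```r
# Smallest pairwise Euclidean distance among the retained points;
# it should be at least the 500 km buffer
min(dist(mydat2[, c("x", "y")]))
```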

Yes, the iterative process should (given enough iterations) find a combination of points that retains a *pretty good* number of them. Of course, there’s no way to guarantee it’s the *maximum* possible number (short of testing all possible combinations), but it’s close enough for my purposes.
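To get a feel for how much the repetitions actually buy you, one rough check is to compare the number of retained points across different values of `reps` (a sketch only; the exact counts will vary from run to run):

```r
# More repetitions give the maximiser more candidate subsets to choose from
set.seed(1)  # for a reproducible comparison
sapply(c(1, 10, 100), function(r) nrow(buffer.f(mydat, 500, r)))
```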

As usual, if there’s an egregious error here, or if there’s an easier way (i.e. a good package), please let me know.