Not rocket science. Just parking this here for easy future access.

R has the very useful scale() command for scaling vectors and matrices. It takes a group of numbers, re-centres the mean to 0, and rescales the standard deviation to 1. Essentially it's a z-score conversion. But I often want a tool that simply rescales the values into the range 0 to 1. I also needed a quick way to scale a second data set based on the scaling of the first.
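To see the equivalence, here's a quick sketch showing that scale() gives the same answer as computing the z-score by hand (the variable names are just for illustration):

```r
x <- c(2, 4, 6, 8)
z1 <- c(scale(x))            # wrap in c() to drop the matrix attributes
z2 <- (x - mean(x)) / sd(x)  # z-score by hand
all.equal(z1, z2)            # TRUE
```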

**(PART 1)** The first task, straight re-scaling between 0 and 1, can be completed easily with this function (which came mostly out of this thread on StackOverflow):

```r
# Redistribution function
redist.fun <- function(x){(x - min(x)) / diff(range(x))}

# Sample data
foo <- data.frame(VAR1 = rnorm(5), VAR2 = rnorm(5)*2 + 50)
foo
        VAR1     VAR2
1 -0.1629179 47.32685
2 -0.1142152 50.40980
3 -0.4446594 50.07057
4  0.2569592 49.12217
5 -1.1001371 50.80081

# Used on a single vector
redist.fun(foo$VAR1)
[1] 0.6906063 0.7264937 0.4830002 1.0000000 0.0000000

# Used with apply() (across columns)
apply(foo, 2, redist.fun)
          VAR1      VAR2
[1,] 0.6906063 0.0000000
[2,] 0.7264937 0.8874443
[3,] 0.4830002 0.7897972
[4,] 1.0000000 0.5167949
[5,] 0.0000000 1.0000000
```
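One caveat: redist.fun() divides by zero on a constant vector. Here's a hypothetical generalised version (not from the original thread) that guards against that and also rescales into an arbitrary range [lo, hi]:

```r
# Generalised rescaling into [lo, hi] (hypothetical helper)
rescale.fun <- function(x, lo = 0, hi = 1) {
  r <- range(x)
  if (diff(r) == 0) return(rep(lo, length(x)))  # constant input: no spread to rescale
  (x - r[1]) / diff(r) * (hi - lo) + lo
}

rescale.fun(c(10, 15, 20), lo = 0, hi = 100)
# [1]   0  50 100
```

With the defaults lo = 0 and hi = 1 this behaves exactly like redist.fun().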

**(PART 2)** The second thing I need to do now and then is scale a second set of values based on the scaling rules of a first. For this, you can use the scale() command on the first data set, then fit a linear model with lm() to apply the same scaling to a second data set.

```r
# Sample data with different means and SDs
foo1 <- data.frame(VAR = rnorm(100)*10 + 50)
foo2 <- data.frame(VAR = rnorm(100)*12 + 65)

# Scaled data for foo1
foo1$SCALED <- c(scale(foo1$VAR))
summary(foo1)
      VAR            SCALED
 Min.   :25.93   Min.   :-2.18614
 1st Qu.:43.46   1st Qu.:-0.56339
 Median :49.31   Median :-0.02175
 Mean   :49.55   Mean   : 0.00000
 3rd Qu.:57.00   3rd Qu.: 0.68975
 Max.   :78.10   Max.   : 2.64345
```

So the first data has the mean centred on 0 and the SD scaled to 1. The second data set should have the same adjustments applied, but the mean and SD will be slightly different because they come from different distributions in the first place. To apply the same scaling to foo2 as in foo1, we can make a quick linear model of foo1 and use the coefficients to define foo2$SCALED.

```r
# Make the linear model
lm.scale <- lm(foo1$SCALED ~ foo1$VAR)
lm.scale

Call:
lm(formula = foo1$SCALED ~ foo1$VAR)

Coefficients:
(Intercept)     foo1$VAR
   -4.58653      0.09257

# Scale foo2 based on the coefficients from the model
foo2$SCALED <- foo2$VAR * lm.scale$coefficients[2] + lm.scale$coefficients[1]
summary(foo2)
      VAR            SCALED
 Min.   :38.15   Min.   :-1.0545
 1st Qu.:58.15   1st Qu.: 0.7965
 Median :63.79   Median : 1.3182
 Mean   :63.97   Mean   : 1.3357
 3rd Qu.:70.89   3rd Qu.: 1.9761
 Max.   :88.19   Max.   : 3.5771
```
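As an aside, the same thing can be done without fitting a model at all: scale() stores the centre and SD it used as attributes on its result, so you can pull those out and apply them to the second data set directly. A sketch of that alternative (same idea, different mechanics):

```r
# scale() keeps its centring/scaling values as attributes
sc  <- scale(foo1$VAR)
ctr <- attr(sc, "scaled:center")  # mean of foo1$VAR
sdv <- attr(sc, "scaled:scale")   # SD of foo1$VAR

# Apply foo1's scaling rules to foo2
foo2$SCALED2 <- (foo2$VAR - ctr) / sdv
# foo2$SCALED2 matches foo2$SCALED from the lm() approach
```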

And so it goes.

**Update Feb. 19, 2016:**

I recently also found the squish() command from the {scales} package, which can be used to squish values into a range, though it doesn't do quite what I expected. I found it while looking for a way to assign legend colours to z-values outside the z-limits in a ggplot: if you want a cropped continuous legend, this command will assign the highest/lowest colour to the out-of-bounds values.
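For reference, squish() clamps rather than rescales, which is why it behaves differently from redist.fun(). A quick sketch (the ggplot call is an assumed example, with a made-up data frame df and column z):

```r
library(scales)

squish(c(-0.5, 0.2, 0.8, 1.4))  # default range is c(0, 1)
# [1] 0.0 0.2 0.8 1.0

# In a ggplot, pass it as the out-of-bounds handler, e.g.:
# ggplot(df, aes(x, y, fill = z)) +
#   geom_tile() +
#   scale_fill_gradient(limits = c(0, 1), oob = squish)
```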