R Grouping functions: sapply vs. lapply vs. apply vs. tapply vs. by vs. aggregate
Whenever I want to do something "map"py in R, I usually try to use a function in the apply family. (Side question: I still haven't learned plyr or reshape -- would plyr or reshape replace all of these entirely?)
However, I've never quite understood the differences between them [how {sapply, lapply, etc.} apply the function to the input/grouped input, what the output will look like, or even what the input can be], so I often just go through them all until I get what I want.

Can someone explain how to use which one when?
[My current (probably incorrect/incomplete) understanding is...

sapply(vec, f): input is a vector. Output is a vector/matrix, where element i is f(vec[i]) [giving you a matrix if f has a multi-element output]

lapply(vec, f): same as sapply, but output is a list?

apply(matrix, 1/2, f): input is a matrix. Output is a vector, where element i is f(row/col i of the matrix)

tapply(vector, grouping, f): output is a matrix/array, where an element in the matrix/array is the value of f at a grouping g of the vector, and g gets pushed to the row/col names

by(dataframe, grouping, f): let g be a grouping. Apply f to each column of the group/dataframe. Pretty print the grouping and the value of f at each column.

aggregate(matrix, grouping, f): similar to by, but instead of pretty printing the output, aggregate sticks everything into a dataframe.]
R has many *apply functions which are ably described in the help files (e.g. ?apply). There are enough of them, though, that beginning users may have difficulty deciding which one is appropriate for their situation, or even remembering them all. They may have a general sense that "I should be using an *apply function here", but it can be tough to keep them all straight at first.
Despite the fact (noted in other answers) that much of the functionality of the *apply family is covered by the extremely popular plyr package, the base functions remain useful and worth knowing.
This answer is intended to act as a sort of signpost for new users, to help direct them to the correct *apply function for their particular problem. Note, this is not intended to simply regurgitate or replace the R documentation! The hope is that this answer helps you decide which *apply function suits your situation; it is then up to you to research it further. With one exception, performance differences will not be addressed.
apply - When you want to apply a function to the rows or columns of a matrix (and higher-dimensional analogues); not generally advisable for data frames, as it will coerce them to a matrix first.
    # Two dimensional matrix
    m <- matrix(seq(1, 16), 4, 4)

    # Apply min to rows
    apply(m, 1, min)
    # [1] 1 2 3 4

    # Apply max to columns
    apply(m, 2, max)
    # [1]  4  8 12 16

    # 3 dimensional array
    m <- array(seq(32), dim = c(4, 4, 2))

    # Apply sum across each m[*, , ] - i.e. sum across 2nd and 3rd dimension
    apply(m, 1, sum)
    # Result is one-dimensional
    # [1] 120 128 136 144

    # Apply sum across each m[*, *, ] - i.e. sum across 3rd dimension
    apply(m, c(1, 2), sum)
    # Result is two-dimensional
    #      [,1] [,2] [,3] [,4]
    # [1,]   18   26   34   42
    # [2,]   20   28   36   44
    # [3,]   22   30   38   46
    # [4,]   24   32   40   48
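To make the data frame caveat concrete, here is a minimal sketch. The data frame `df` and its columns are made up for illustration; the only point is that `apply` goes through `as.matrix` first, so columns of mixed type collapse to one common type.

```r
# Hypothetical data frame mixing an integer and a character column
df <- data.frame(n = 1:3, s = c("a", "b", "c"), stringsAsFactors = FALSE)

# apply() converts df to a matrix first; with a character column present,
# every cell becomes character, so each row is seen as a character vector
row_classes <- apply(df, 1, class)

# sapply() treats the data frame as a list of columns and keeps their types
col_classes <- sapply(df, class)

print(row_classes)
print(col_classes)
```

If the columns matter individually, working column-wise (as `sapply` does here) avoids the coercion.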
If you want row/column means or sums for a 2D matrix, be sure to investigate the highly optimized, lightning-quick colMeans, rowMeans, colSums, rowSums.

lapply - When you want to apply a function to each element of a list in turn and get a list back.
This is the workhorse of many of the other *apply functions. Peel back their code and you will often find lapply underneath.

    x <- list(a = 1, b = 1:3, c = 10:100)
    lapply(x, FUN = length)
    # $a
    # [1] 1
    #
    # $b
    # [1] 3
    #
    # $c
    # [1] 91

    lapply(x, FUN = sum)
    # $a
    # [1] 1
    #
    # $b
    # [1] 6
    #
    # $c
    # [1] 5005
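One detail that often helps in practice: extra arguments given to `lapply` after the function are passed on to that function. A small sketch, with a made-up list `vals` containing missing values:

```r
# Hypothetical list with some missing values
vals <- list(a = c(1, NA, 3), b = c(2, 4, NA))

# na.rm = TRUE is not an argument of lapply(); it is forwarded to mean()
means <- lapply(vals, mean, na.rm = TRUE)

print(means)
```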
sapply - When you want to apply a function to each element of a list in turn, but you want a vector back, rather than a list.
If you find yourself typing unlist(lapply(...)), stop and consider sapply.

    x <- list(a = 1, b = 1:3, c = 10:100)
    # Compare with above; a named vector, not a list
    sapply(x, FUN = length)
    #  a  b  c
    #  1  3 91

    sapply(x, FUN = sum)
    #    a    b    c
    #    1    6 5005
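As a quick check of the `unlist(lapply(...))` remark, the sketch below compares the two spellings on the same list. For a function that returns one value per element, as `length` does, they should give the same named vector.

```r
x <- list(a = 1, b = 1:3, c = 10:100)

# The long way: build a list, then flatten it
via_unlist <- unlist(lapply(x, length))

# The short way: let sapply() simplify the result
via_sapply <- sapply(x, length)

print(identical(via_unlist, via_sapply))
```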
In more advanced uses, sapply will attempt to coerce the result to a multi-dimensional array, if appropriate. For example, if our function returns vectors of the same length, sapply will use them as columns of a matrix:

    sapply(1:5, function(x) rnorm(3, x))
If our function returns a 2-dimensional matrix, sapply will do essentially the same thing, treating each returned matrix as a single long vector:

    sapply(1:5, function(x) matrix(x, 2, 2))
Unless we specify simplify = "array", in which case it will use the individual matrices to build a multi-dimensional array:

    sapply(1:5, function(x) matrix(x, 2, 2), simplify = "array")
Each of these behaviors is of course contingent on our function returning vectors or matrices of the same length or dimension.
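A small sketch of what "contingent" means here: when the returned lengths differ, sapply quietly gives back a list instead of a matrix. The two anonymous functions are just illustrations.

```r
# Every call returns a vector of length 2, so the results become
# the columns of a 2 x 3 matrix
regular <- sapply(1:3, function(i) c(i, i^2))

# Each call returns a vector of a different length, so no
# simplification is possible and a plain list comes back
ragged <- sapply(1:3, function(i) seq_len(i))

print(dim(regular))
print(is.list(ragged))
```

This silent change of return type is one reason to check the shape of what sapply returns before relying on it.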
vapply - When you want to use sapply but perhaps need to squeeze some more speed out of your code.

For vapply, you basically give R an example of what sort of thing your function will return, which can save some time coercing returned values to fit in a single atomic vector.

    x <- list(a = 1, b = 1:3, c = 10:100)
    # Note that since the advantage here is mainly speed, this
    # example is only for illustration. We're telling R that
    # everything returned by length() should be an integer of
    # length 1.
    vapply(x, FUN = length, FUN.VALUE = 0L)
    #  a  b  c
    #  1  3 91
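Besides speed, the template passed as `FUN.VALUE` also acts as a check on the result. The sketch below is an illustration of that side effect, not something the text above claims: a mismatched template raises an error, and an empty input still yields a vector of the declared type.

```r
x <- list(a = 1, b = 1:3, c = 10:100)

# length() returns integers, so a character template is rejected
mismatch <- tryCatch(
  vapply(x, length, FUN.VALUE = character(1)),
  error = function(e) "error"
)

# On empty input sapply() returns an empty list, while vapply()
# returns an empty vector of the type given in FUN.VALUE
empty_s <- sapply(list(), length)
empty_v <- vapply(list(), length, FUN.VALUE = integer(1))

print(mismatch)
print(class(empty_s))
print(class(empty_v))
```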
mapply - For when you have several data structures (e.g. vectors, lists) and you want to apply a function to the 1st elements of each, and then the 2nd elements of each, etc., coercing the result to a vector/array as in sapply.

This is multivariate in the sense that your function must accept multiple arguments.

    # Sums the 1st elements, the 2nd elements, etc.
    mapply(sum, 1:5, 1:5, 1:5)
    # [1]  3  6  9 12 15

    # To do rep(1,4), rep(2,3), etc.
    mapply(rep, 1:4, 4:1)
    # [[1]]
    # [1] 1 1 1 1
    #
    # [[2]]
    # [1] 2 2 2
    #
    # [[3]]
    # [1] 3 3
    #
    # [[4]]
    # [1] 4
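To spell out the "must accept multiple arguments" part, here is a sketch with a made-up two-argument function. The argument names `base` and `exponent` are hypothetical; mapply simply pairs up the vectors position by position.

```r
# The i-th element of each vector is passed to the function together:
# 2^1, 3^2, 4^3
powers <- mapply(function(base, exponent) base^exponent, 2:4, 1:3)

print(powers)
```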
Map - A wrapper to mapply with SIMPLIFY = FALSE, so it is guaranteed to return a list.

    Map(sum, 1:5, 1:5, 1:5)
    # [[1]]
    # [1] 3
    #
    # [[2]]
    # [1] 6
    #
    # [[3]]
    # [1] 9
    #
    # [[4]]
    # [1] 12
    #
    # [[5]]
    # [1] 15
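The wrapper relationship can be checked directly. This sketch runs the same call both ways and compares the results.

```r
# Map() always returns a list
via_Map <- Map(rep, 1:4, 4:1)

# The same call spelled out with mapply()
via_mapply <- mapply(rep, 1:4, 4:1, SIMPLIFY = FALSE)

print(identical(via_Map, via_mapply))
```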
rapply - For when you want to apply a function to each element of a nested list structure, recursively.

To give you some idea of how uncommon rapply is, I forgot about it when first posting this answer! Obviously, I'm sure many people use it, but YMMV. rapply is best illustrated with a user-defined function to apply:

    # Append ! to string, otherwise increment
    myfun <- function(x) {
      if (is.character(x)) {
        return(paste(x, "!", sep = ""))
      } else {
        return(x + 1)
      }
    }

    # A nested list structure
    l <- list(a = list(a1 = "boo", b1 = 2, c1 = "eeek"),
              b = 3, c = "yikes",
              d = list(a2 = 1, b2 = list(a3 = "hey", b3 = 5)))

    # Result is a named vector, coerced to character
    rapply(l, myfun)

    # Result is a nested list like l, with values altered
    rapply(l, myfun, how = "replace")
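As an alternative to branching on the type inside your own function, rapply also has a `classes` argument that restricts which leaves the function is applied to. A minimal sketch on a smaller, made-up nested list (`nested` is hypothetical, not the list from the example above):

```r
# Hypothetical nested list with character and numeric leaves
nested <- list(a = list(a1 = "boo", b1 = 2), b = 3, c = "yikes")

# Only character leaves are passed to toupper(); with how = "replace"
# the other leaves are kept as they are
shouted <- rapply(nested, toupper, classes = "character", how = "replace")

str(shouted)
```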
tapply - For when you want to apply a function to subsets of a vector and the subsets are defined by some other vector, usually a factor.

The black sheep of the *apply family, of sorts. The help file's use of the phrase "ragged array" can be a bit confusing, but it is actually quite simple.
A vector:

    x <- 1:20

A factor (of the same length!) defining groups:

    y <- factor(rep(letters[1:5], each = 4))

Add up the values in x within each subgroup defined by y:

    tapply(x, y, sum)
    #  a  b  c  d  e
    # 10 26 42 58 74
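One way to think about what tapply does here (my framing, not the help file's) is "split the vector by the factor, then apply the function to each piece". The sketch below does those two steps by hand and compares the values with the tapply result.

```r
x <- 1:20
y <- factor(rep(letters[1:5], each = 4))

# Step 1: split x into one piece per level of y
groups <- split(x, y)

# Step 2: apply the function to each piece
via_split <- sapply(groups, sum)

# tapply() does both steps in one call
via_tapply <- tapply(x, y, sum)

print(via_split)
print(all(via_split == as.vector(via_tapply)))
```

The values match; the two results differ only in that tapply returns a one-dimensional array rather than a plain named vector.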
More complex examples can be handled where the subgroups are defined by the unique combinations of a list of several factors.
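A minimal sketch of the several-factors case, with two made-up factors `f1` and `f2`. Passing them as a list gives one cell per combination of levels.

```r
vals <- 1:8
# Hypothetical grouping factors, each the same length as vals
f1 <- factor(rep(c("lo", "hi"), each = 4))
f2 <- factor(rep(c("u", "v"), times = 4))

# One sum per (f1, f2) combination, returned as a 2 x 2 matrix
# with the factor levels as row and column names
totals <- tapply(vals, list(f1, f2), sum)

print(totals)
```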
tapply is similar in spirit to the split-apply-combine functions that are common in R (aggregate, by, ave, ddply, etc.), hence its black sheep status.
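Since the question also asks about by and aggregate, here is a rough side-by-side of the base R functions named above, run on one small made-up data frame (`df`, `g` and `v` are hypothetical names). Treat it as a sketch of how the return shapes differ rather than a full tour.

```r
# Hypothetical data: two groups of three values each
df <- data.frame(g = rep(c("a", "b"), each = 3), v = 1:6)

# tapply(): a named array with one entry per group
tap_res <- tapply(df$v, df$g, sum)

# aggregate(): a data frame with one row per group
agg_res <- aggregate(v ~ g, data = df, FUN = sum)

# by(): the function receives each group's sub-data-frame;
# the result is an object of class "by" that pretty prints
by_res <- by(df, df$g, function(d) sum(d$v))

# ave(): same length as the input, each value replaced by its group's result
ave_res <- ave(df$v, df$g, FUN = sum)

print(tap_res)
print(agg_res)
print(by_res)
print(ave_res)
```

All four compute the same per-group sums; which one to pick mostly depends on the shape you want back.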