Question

RNA-seq: two-factor ANOVA - how to find the variability explained by each factor?

nooshin ▴ 300

@nooshin-5239

Hi all,

I need to run a two-way ANOVA on RNA-seq data and find the variability that can be explained by each factor:

Time <- factor(rep(1:3,4),levels=3:1)

Sex <- factor(rep(c("Female","Male"),each=6),levels=c("Male","Female"))

design <- model.matrix(~Sex*Time)
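For reference, this layout gives 12 samples with two replicates in each Sex-by-Time cell, and the interaction model has six coefficients (1 df for Sex, 2 for Time, 2 for the interaction, plus the intercept). A quick check, rerunning the same three lines and only inspecting the result:

```r
# Same factors and design as above; nothing new is assumed here.
Time <- factor(rep(1:3, 4), levels = 3:1)
Sex <- factor(rep(c("Female", "Male"), each = 6), levels = c("Male", "Female"))
design <- model.matrix(~Sex * Time)

dim(design)        # 12 rows (samples), 6 columns (coefficients)
colnames(design)   # intercept, SexFemale, Time2, Time1 and the two interaction terms
table(Sex, Time)   # 2 replicates in every cell
```

With 12 samples and 6 coefficients, 6 residual degrees of freedom remain for each gene.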

I want to calculate the amount of variance that can be explained by the factor Sex, by the factor Time, and by the interaction between Sex and Time.

Would it be possible to do this using edgeR, DESeq, or limma? And if yes, how?

Thanks a lot; I look forward to your replies.

N,

rnaseq two factor anova limma DEseq2 edgeR • 3.0k views

ADD COMMENT • link 7.7 years ago nooshin ▴ 300

Gordon Smyth · Answer 1 · 2016-08-17

Gordon Smyth 50k

@gordon-smyth

WEHI, Melbourne, Australia

When you say "the amount of variance", I assume you actually mean the sum of squares (SS) and mean square (MS) quantities attributable to each term, as for any analysis of variance.

Well, yes, it can be done. But the answer will be different for each gene, making the results very hard to interpret. Are you sure that is what you need?

ADD COMMENT • link 7.7 years ago Gordon Smyth 50k

Thanks a lot for your response.

Yes, I want the SS and MS for each gene separately, as in an ordinary ANOVA. I need to calculate the variability attributable to each factor and to their interaction, separately for each gene, as in the following example using anova:

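res <- anova(lm(values ~ Sex*Time, data = data))

res_ss <- res$"Sum Sq"   # rows: Sex, Time, Sex:Time, Residuals
res_ms <- res$"Mean Sq"
res_df <- res$"Df"

# v1: proportion of the total SS explained by Sex (row 1)
v1 <- res_ss[1]/sum(res_ss)

# v2: (SS_factor - df*MS_residual)/(SS_total + MS_residual) for Sex; row 4 is the residual
v2 <- ((res_ss - res_df*res_ms[4])/(sum(res_ss) + res_ms[4]))[1]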

Would it be possible to apply voom normalization to my data and then run exactly the same analysis on the voom-normalized values?

However, I would also like to do it with edgeR, to check that at least two methods give similar results, for example limma and edgeR, or edgeR and DESeq.

Thanks.

ADD REPLY • link updated 7.7 years ago by Gordon Smyth 50k • written 7.7 years ago by nooshin ▴ 300

Do what analysis? Your quantity v1 is just the proportion of SS explained by Sex, but v2 looks like a nonsense quantity. It doesn't seem to me to measure anything.

ADD REPLY • link 7.7 years ago Gordon Smyth 50k

v2 is a less biased indicator of variance explained in the population by a predictor variable:

(SS_factor - df*(MS_residual))/ (SS_total + MS_residual)

It's basically the same as v1, but a normalized version, if one can call it that :)
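To make the formula concrete, here is a minimal sketch that computes v1 and v2 for every term of the model rather than for Sex only. The helper name `var_explained` and the simulated `values` vector are assumptions for illustration, standing in for the measurements of a single gene:

```r
# Hypothetical helper: v1 (proportion of SS) and v2 (the adjusted ratio above)
# for every non-residual term of an anova table.
var_explained <- function(fit) {
  tab <- anova(fit)
  ss <- tab[["Sum Sq"]]
  df <- tab[["Df"]]
  k <- nrow(tab)                     # the residual row comes last
  ms_res <- tab[["Mean Sq"]][k]
  ss_total <- sum(ss)
  terms <- seq_len(k - 1)
  data.frame(
    term = rownames(tab)[terms],
    v1 = ss[terms] / ss_total,
    v2 = (ss[terms] - df[terms] * ms_res) / (ss_total + ms_res)
  )
}

# Simulated stand-in for one gene (not real data).
set.seed(1)
Time <- factor(rep(1:3, 4), levels = 3:1)
Sex <- factor(rep(c("Female", "Male"), each = 6), levels = c("Male", "Female"))
values <- rnorm(12) + (Sex == "Female")

res <- var_explained(lm(values ~ Sex * Time))
res
```

Because the residual mean square is subtracted in the numerator and added in the denominator, v2 is always smaller than v1 and can be negative for a term that explains little.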

ADD REPLY • link 7.7 years ago nooshin ▴ 300

OK, v2 seems to be a ratio of estimated variance components.

None of the packages will estimate variance components for you. In fact, fitting variance component models is problematic with weights (voom) or in a generalized linear model context (edgeR, DESeq).

You can simply compute a matrix of logCPM values, then repeat your anova calculation for each row.
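A minimal sketch of that suggestion, under stated assumptions: `counts` is a simulated genes-by-samples matrix (not real data), `log_cpm` and `ss_table` are names chosen for illustration, and the base-R fallback formula is only a stand-in so that the example runs where edgeR is not installed. It is a plain unweighted anova per gene, not a variance component model:

```r
# Simulated counts: 50 genes x 12 samples (assumption, for illustration only).
set.seed(42)
Time <- factor(rep(1:3, 4), levels = 3:1)
Sex <- factor(rep(c("Female", "Male"), each = 6), levels = c("Male", "Female"))
counts <- matrix(rnbinom(50 * 12, mu = 100, size = 10), nrow = 50,
                 dimnames = list(paste0("gene", 1:50), paste0("s", 1:12)))

# logCPM matrix: edgeR::cpm() if available, else a simple base-R stand-in.
log_cpm <- if (requireNamespace("edgeR", quietly = TRUE)) {
  edgeR::cpm(counts, log = TRUE)
} else {
  log2(t((t(counts) + 0.5) / (colSums(counts) + 1)) * 1e6)
}

# Repeat the anova for each row and keep the proportion of SS per term.
ss_table <- t(apply(log_cpm, 1, function(y) {
  tab <- anova(lm(y ~ Sex * Time))
  setNames(tab[["Sum Sq"]] / sum(tab[["Sum Sq"]]), rownames(tab))
}))

head(ss_table)   # one row per gene; columns Sex, Time, Sex:Time, Residuals
```

Each row of `ss_table` sums to 1. Any other per-gene quantity, such as the v2 ratio discussed above, could be returned from the same function instead.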

ADD REPLY • link 7.7 years ago Gordon Smyth 50k

So this means I can run the same anova analysis on the logCPM values for each gene.

Thanks a lot.

ADD REPLY • link 7.7 years ago nooshin ▴ 300

Would you mind guiding me on how to do it?

Thanks.

ADD REPLY • link 7.7 years ago nooshin ▴ 300