Find if a covariate has effect on gene expression level using edgeR and DESeq2
2
0
Entering edit mode
ta65 • 0
@ta65-11452
Last seen 7.5 years ago

I have RNAseq data of patient and control samples with different age and gender.

I would like to know whether these covariate affect gene expression levels. If they do not have any effect remove them from model and create simpler model for differential expression analysis using edgeR and DESeq2.

Would you please let me know if there is a function in these packages to find the answer?

 

edger deseq2 model matrix differential expression • 2.5k views
ADD COMMENT
3
Entering edit mode
@james-w-macdonald-5106
Last seen 40 minutes ago
United States

There aren't any functions per se, but it's not an uncommon thing to do. Do note however that you are fitting (tens of) thousands of models simultaneously, and age and/or gender (especially) will have an effect on at least some of those genes. One aspect of this sort of statistics is that you are doing things in bulk, and you don't have the ability to do lots of model checking, etc, and sometimes have to use an over-specified model because it is useful for some subset of the genes you care about (and you are thereby 'wasting' degrees of freedom on those genes for which the extra parameter(s) are not needed).

One thing you can do is a MDS or PCA plot to see if there are large sex or age related differences. You can also fit the 'full' model and then see how many genes have a significant coefficient for age or sex. If there are not too many such genes (for some definition of 'not too many'), you could decide to drop one or more coefficients. Or you could take the opposite approach and just fit the model with age and sex and burn the two degrees of freedom. If you have enough replication it might not matter that much.

ADD COMMENT
0
Entering edit mode

Thanks James. Would you please explain about what is " significant coefficient  ". sorry If this question is unrelevant. 

ADD REPLY
0
Entering edit mode

So let's say you did something like this, where you have some data (called dglst, in my example) and a design matrix that includes sex and age.

fit <- glmQLFit(dglst, design)

testForSex <- glmQLFTest(fit, "sex")

topGenesForSex <- topTags(testForSex, Inf, p.value = 0.05)

nrow(topGenesForSex$table)

So now we fit a model that includes sex and age, and then extracted all the genes for which the 'sex' coefficient is significant at an FDR of 5%. We then see how many genes fulfill that criterion. You can then decide if that number is large enough for you to think that the sex coefficient is needed or not.

 

ADD REPLY
1
Entering edit mode
Jenny Drnevich ★ 2.0k
@jenny-drnevich-2812
Last seen 27 days ago
United States

I agree with Jim that for just 2 possible co-variates, I would just include them both as coefficients in the model. Even if there are only a few genes that one or both affect, it is better for them and the remaining genes likely won't be affected that much by losing 2 DF. However, another approach I recently used with some human data with 7 possible covariates was to first run WGCNA on the genes to divide the them into modules based on similar expression patterns, then use the eigengene values for each module in a BIC stepwise regression to select the best covariates for each module, and was pretty pleased with this broad level approach. I'm playing around with then running a separate limma/edgeR for the set of genes in each module using the best model to get gene-level results, but I'm not sure if it's a good idea or not because of the separate variance estimates for each gene set. I may broach this topic in a separate post...

Jenny

ADD COMMENT
0
Entering edit mode

Thanks Jenny. Could you please share the codes that you have used for your analysis?

 

ADD REPLY

Login before adding your answer.

Traffic: 562 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6