DGE analysis (statistics)
@sandravelasco911012-23703

Hi. I have some questions about the statistics of DGE analysis (RNA-seq data).

  1. Some packages use the likelihood ratio test (LRT) and others use the quasi-likelihood (QL) F-test. Which is better, and why? Can I use the QL F-test if I have replicates?

Please share with me any bibliographic references that would help me understand it.

  2. I have the following experiment: two populations (North and South), and within each population I have my conditions (3 host plants). Each population x host plant subgroup has two biological replicates, giving a total of 12 samples.

My interest is to determine the DGE between host plants, but I am wondering whether population has an effect on DGE. We are thinking of doing a two-way ANOVA to test the effect of the two factors, but I'm not sure if it is the right approach.

My question is: is it correct to do a two-way ANOVA? How could I do it? I have read some blog posts about ANOVA-like analyses with DESeq2 and edgeR, but I'm not sure I understand them. I think that my model could be ~population+hostplant, as in the models used for batch effects. Again, if you have bibliographic references, please share them with me.

Thank you so much for your help. Sandra

edger limma • 787 views

I removed the DESeq2 tag, as it seems like you are interested in using edgeR here from reading over your question. Note that when you add a package tag it directly emails the developers.

@gordon-smyth
WEHI, Melbourne, Australia

You've added both limma and edgeR tags to your question. You could use either package to analyse the data, but I will assume that edgeR is your focus.

Some packages use the likelihood ratio test (LRT) and others use the quasi-likelihood (QL) F-test. Which is better, and why?

edgeR offers both LRT and QL tests. We recommend the QL approach because it gives the most rigorous FDR control.

Can I use the QL F-test if I have replicates?

Yes of course. What would stop you?

Please share with me any bibliographic references that would help me understand it.

Just type ?glmQLFTest and follow the references. Or follow the workflow:

https://bioconductor.org/packages/release/workflows/vignettes/RnaSeqGeneEdgeRQL/inst/doc/edgeRQL.html
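For orientation, the two testing pipelines discussed above might be run side by side roughly as follows. This is only a minimal sketch, not taken from the workflow: the counts are simulated, and the object names (`population`, `hostplant`, `counts`, `fit_ql`, and so on) are made up for illustration. The edgeR functions themselves (`DGEList`, `filterByExpr`, `calcNormFactors`, `estimateDisp`, `glmQLFit`, `glmQLFTest`, `glmFit`, `glmLRT`, `topTags`) are the standard ones.

```r
library(edgeR)

set.seed(1)

# Hypothetical layout matching the question: 2 populations x 3 host plants x 2 replicates
population <- factor(rep(c("North", "South"), each = 6))
hostplant  <- factor(rep(rep(c("Host1", "Host2", "Host3"), each = 2), times = 2))

# Simulated counts for illustration only (1000 genes, no true differential expression)
counts <- matrix(rnbinom(1000 * 12, mu = 100, size = 10), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("S", 1:12)))

# Additive design: columns are (Intercept), populationSouth, hostplantHost2, hostplantHost3
design <- model.matrix(~ population + hostplant)

y <- DGEList(counts = counts)
keep <- filterByExpr(y, design)          # drop genes with too few counts to be informative
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)
y <- estimateDisp(y, design)

# Quasi-likelihood pipeline: test for any difference among the three host plants
fit_ql <- glmQLFit(y, design)
res_ql <- glmQLFTest(fit_ql, coef = 3:4)

# Likelihood ratio pipeline: same hypothesis, for comparison
fit_lrt <- glmFit(y, design)
res_lrt <- glmLRT(fit_lrt, coef = 3:4)

topTags(res_ql)
topTags(res_lrt)
```

Both calls test the same two host-plant coefficients; only the fitting and testing functions differ, so switching from LRT to QL is a small change to an existing script.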

My interest is to determine the DGE between host plants, but I am wondering whether population has an effect on DGE. We are thinking of doing a two-way ANOVA to test the effect of the two factors, but I'm not sure if it is the right approach.

There's no need for complicated two-way ANOVA models; it's really simpler than that. If you think that the differences between the host plants could be different for the North and South populations, then fit a single factor with six levels (North.Host1, North.Host2, North.Host3, South.Host1, South.Host2, South.Host3). If instead you only want to correct for baseline differences between the two populations, then use ~population+hostplant.
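To make the two options concrete, here is a small base-R sketch of both design matrices for the 12-sample layout described in the question. The variable names and the `make_contrast` helper are hypothetical and exist only for this illustration; the sample ordering is an assumption.

```r
# Hypothetical sample annotation: 2 populations x 3 host plants x 2 replicates
population <- factor(rep(c("North", "South"), each = 6))
hostplant  <- factor(rep(rep(c("Host1", "Host2", "Host3"), each = 2), times = 2))

# Option A: a single factor with six levels, one coefficient per group
group <- factor(paste(population, hostplant, sep = "."))
design_group <- model.matrix(~ 0 + group)
colnames(design_group) <- levels(group)

# Option B: additive model, host plant effects adjusted for a population baseline
design_add <- model.matrix(~ population + hostplant)

# Residual degrees of freedom left for estimating variability
nrow(design_group) - ncol(design_group)  # 12 samples - 6 coefficients
nrow(design_add) - ncol(design_add)      # 12 samples - 4 coefficients

# Hypothetical helper: build a named contrast vector for Option A
make_contrast <- function(plus, minus) {
  v <- setNames(numeric(ncol(design_group)), colnames(design_group))
  v[plus] <- 1
  v[minus] <- -1
  v
}

# Host2 vs Host1 within each population
h2_vs_h1_north <- make_contrast("North.Host2", "North.Host1")
h2_vs_h1_south <- make_contrast("South.Host2", "South.Host1")

# Is the Host2 vs Host1 difference itself different between populations?
h2_vs_h1_diff <- h2_vs_h1_south - h2_vs_h1_north
```

With Option A, vectors like these could be passed to the `contrast` argument of `glmQLFTest`. With Option B, host plant comparisons correspond to the `hostplantHost2` and `hostplantHost3` coefficients, and the model assumes that the host plant differences are the same in both populations.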

as in the models used for batch effects.

Why would you need to correct for batch effects? Your description of the experiment does not suggest any need for batch correction. You have only 2 replicates per group, so your ability to correct for batch effects is pretty limited.

@james-w-macdonald-5106
United States

Have you read the edgeR User's Guide? There are 11 citations listed in section 1.2. Plus there are any number of examples that should be, um, explanatory.

In addition, this support site is intended for people who have technical questions about the usage of Bioconductor packages. It's beyond the scope of this site to act as a bibliographic reference generator for those who seem not to want to do the legwork themselves.


Dear Dr. MacDonald.

I'm so sorry that my questions sounded silly, or as if I didn't want to do the work. I really feel that I've read quite a lot about it, but I still have some doubts about the appropriate models to use in my analyses. It is my first time on Bioconductor, and I may have failed to explain my questions well, or perhaps this is not the place for this type of question. Anyway, I will try to clarify the point.

I have read the edgeR User's Guide. Although I am not an expert, I thought that my experiment could be treated as in section 3.4.3 (Batch effects). I ran the example in section 4.2 with the example data and with my own data. My results make sense biologically and show a correlation between samples from the same population (greater than the treatment effect). However, I am still unsure whether I am using the correct design (~ Population + HostPlant), or whether there is a more appropriate approach, perhaps involving covariates, that accounts for the possible effect of population on DGE when my interest is in the differences between host plants.

As for the bibliographic references, I'm sorry again. I just thought that someone who knows more about the subject might point me to some key papers that would help me understand more deeply the points that I still do not understand, or explain them in simple terms. I am not an expert on the subject and I am only seeking help by all possible means. I have read some of the citations that appear in the edgeR User's Guide, and I keep rereading most of them to really understand what I am doing.

Finally, I included the DESeq2 tag because I found that batch effects can be taken into account with similar models (http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html).

Again, sorry for the inconvenience. It was not my intention to seem lazy about the work; I just seek to understand and be sure of my approaches and analyses. I will continue reading, but all suggestions are welcome :)

Best wishes, Sandra
