Nested Interaction in edgeR and composition bias
@chapdelainev-22259

Hello,

The question I would like to ask is: when edgeR's GLM method is used to test a nested interaction between two explanatory variables, one a continuous variable (unrelated to genotype) and the other genotype, is this interaction affected by compositional biases?

The logic of my reasoning is as follows: any potential compositional biases (such as differing sequences, slightly different gene lengths, and normalisation) should not be affected differently by the continuous variable.

Are these assumptions correct?

Experimental design:

Row   Condition   Species
1     0           0
2     0           1
3     0           0
4     0           1
...
10    1.5         1
11    1.5         0
12    1.5         1
@gordon-smyth
WEHI, Melbourne, Australia

I assume the technical biases you are asking about are differences in GC content or gene length for the same gene between the two species.

Yes, the technical biases should, in principle, cancel out of any interaction term. That would be so whether the interaction is species × factor or species × covariate.
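The sketch below is a minimal numerical illustration of why this cancellation happens. It is a simplification under stated assumptions and does not reproduce edgeR's fitting procedure:

- It uses ordinary least squares on simulated log-scale values in place of edgeR's negative binomial GLM. Both are linear in the coefficients on the log scale.
- It treats the species-specific technical bias (for example a gene-length or GC difference between orthologs) as a constant multiplicative factor on counts. That makes it an additive shift on the log scale for every sample of one species.
- The middle condition value of 0.75, the replicate layout and the true coefficients are all hypothetical.

```python
import numpy as np

# Hypothetical layout: 2 species x 3 condition values x 2 replicates.
# The middle condition value 0.75 is an assumption for illustration.
condition = np.array([0.0, 0.0, 0.75, 0.75, 1.5, 1.5] * 2)
species = np.array([0] * 6 + [1] * 6)

# Columns: intercept, species, condition (covariate), species:condition interaction
design = np.column_stack([np.ones(12), species, condition, species * condition])

rng = np.random.default_rng(1)
# Simulated log-expression for one gene, without any technical bias
log_expr = (5 + 0.5 * species + 0.8 * condition
            - 0.3 * species * condition + rng.normal(0, 0.1, 12))

# Species-specific technical bias: the same log-scale shift for every species-1 sample,
# i.e. it does not depend on the condition covariate
bias = 1.2
log_expr_biased = log_expr + bias * species

coef_clean, *_ = np.linalg.lstsq(design, log_expr, rcond=None)
coef_biased, *_ = np.linalg.lstsq(design, log_expr_biased, rcond=None)

# Only the species main effect absorbs the bias; the interaction coefficient is unchanged.
# Expected: approximately [0, 1.2, 0, 0], up to floating-point rounding.
print(coef_biased - coef_clean)
```

The bias term `bias * species` is exactly a multiple of the species column of the design matrix, so the fit can only absorb it into the species coefficient. The cancellation would fail if the bias itself varied with condition within a species.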

Yunshun Chen
@yunshun-chen-5451
Australia

I am not sure what your question is.

If you are concerned about composition biases between samples, then the edgeR scale normalization (e.g. TMM) would take care of it. It has nothing to do with the interaction between your explanatory variables.

If your question is about how to incorporate a continuous variable into the design matrix, then it depends on the number of time points you have in your data. If your data only has two time points, as shown in your design, then you can simply treat it as a two-level factor and proceed with the standard edgeR DE analysis pipeline.
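The sketch below contrasts the two ways of coding the condition variable. It is written in Python/NumPy and is not edgeR code. In edgeR the design matrix would normally be built with R's `model.matrix`, for example `model.matrix(~ species * factor(condition))`.

It assumes three condition values, 0, a hypothetical 0.75, and 1.5, with two replicates per species and value.

```python
import numpy as np

# Hypothetical layout: 2 species x 3 condition levels x 2 replicates
condition = np.array([0.0, 0.0, 0.75, 0.75, 1.5, 1.5] * 2)
species = np.array([0] * 6 + [1] * 6)

levels = np.unique(condition)  # sorted: 0.0, 0.75, 1.5; the first level is the baseline

# Factor (treatment/dummy) coding: one indicator column per non-baseline level
cond_dummies = np.column_stack([(condition == lev).astype(float) for lev in levels[1:]])
design_factor = np.column_stack([
    np.ones(len(condition)),          # intercept: species 0 at the baseline condition
    species,                          # species main effect
    cond_dummies,                     # condition effects relative to the baseline
    species[:, None] * cond_dummies,  # species-by-condition interactions
])

# Covariate coding: a single slope for condition plus one interaction slope
design_covariate = np.column_stack([
    np.ones(len(condition)), species, condition, species * condition,
])

print(design_factor.shape, design_covariate.shape)  # (12, 6) (12, 4)
```

The two codings differ in what they assume and what they cost:

- **Factor coding** spends six coefficients (one per species × condition cell). It makes no assumption about how expression changes across condition levels.
- **Covariate coding** uses four coefficients. It assumes log-expression changes linearly with the condition value.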


Thank you for your speedy answer.

As a clarification, my question is more about edgeR's assumptions being broken in this situation. From my understanding, edgeR's process assumes that properties such as GC content and gene length are equal across the samples being compared (my use of "compositional biases" was indeed wrong). Because different species are being compared here, these assumptions are somewhat broken.

However, in the case of the analysis of an interaction (as described in my initial question), my reasoning is that such biases do not influence the outcome. GC content and length do not vary with one of the explanatory variables, so the influence of these biases should not change with that variable. By this logic, the analysis of that interaction is not affected by those biases.

P.S. The design matrix provided is indeed incomplete, as there are three values for the condition. However, I don't think this has any bearing on the answer.

Thank you.
