DESeq2: analysing PolyA+ and Ribo0 RNA-seq together with a per-year batch effect
ZheFrench ▴ 60
@zhefrench-11689
Last seen 3 months ago
France

Hi

I have one Ribo0 RNA-seq dataset generated in 2015 and 2016. I analyse it using design ~ year + condition (there is a strong batch effect per year).

I have another, new PolyA+ RNA-seq dataset generated in 2017. I analyse it using design ~ condition.

For now I analyse them separately and intersect/merge the results depending on the question.

1 - I was wondering whether it would be a good idea to analyse them together using:

design ~ year + condition + typeExperiment

so that typeExperiment accounts for the fact that there are both Ribo0 and PolyA+ samples.

2 - Also, does the order of the factors in the design matter?

Thanks

deseq2 • 860 views
@mikelove
Last seen 6 hours ago
United States

If you search the site here I have a few answers on this. Let me know if you have further questions after reading those posts.


Sorry, but I'm not sure I understand the take-home message. I know there are countless questions about design for DESeq2, so sorry to bother you again with this. In fact, I am more interested in doing a PCA with all these samples. I want to check that they group well together per condition. Usually I use this command:

rld <- rlog(dds, blind=FALSE) 

But here, since I'm not sure about the design formula, it is probably a better idea to use rld <- rlog(dds, blind=TRUE) before making the heatmap / PCA from DESeq2.

Back to my question. From what I understood:

Combine typeExperiment and year into a single variable, and then apply contrasts between conditions:

dds$group <- factor(paste0(dds$year, dds$typeExperiment))

design ~ condition + group

The results will depend on the number of replicates. If there are not enough residual degrees of freedom to estimate the dispersion, DESeq2 will automatically use a design of ~ 1 for estimating the dispersion (and will print a message about this). This means that the analysis is underpowered to detect differences...

In my case :

Year 2015 -> Ribo0 -> 3 conditions  -> 1 biological replicate per condition

Year 2016 -> Ribo0 -> 3 conditions  -> 2 biological replicates per condition

Year 2017 -> PolyA+ -> 3 conditions  -> 3 biological replicates per condition
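To make the point about residual degrees of freedom concrete, here is a minimal base R sketch that rebuilds the sample layout listed above and counts samples minus estimable coefficients for each candidate design. The condition names A, B and C and the helper `residual_df` are placeholders introduced for illustration; only the years, library types and replicate numbers come from the post.

```r
# Sample layout from the post: 1, 2 and 3 replicates per condition
# in 2015, 2016 and 2017. Condition names A/B/C are placeholders.
year <- rep(c("2015", "2016", "2017"), times = c(3, 6, 9))
condition <- unlist(lapply(1:3, function(n) rep(c("A", "B", "C"), each = n)))
samples <- data.frame(
  year = factor(year),
  typeExperiment = factor(ifelse(year == "2017", "PolyA+", "Ribo0")),
  condition = factor(condition)
)
# Combined batch variable, as in the post.
samples$group <- factor(paste0(samples$year, samples$typeExperiment))

# Hypothetical helper: samples minus the rank of the design matrix.
residual_df <- function(formula, data) {
  mm <- model.matrix(formula, data)
  nrow(mm) - qr(mm)$rank
}

ribo <- droplevels(samples[samples$typeExperiment == "Ribo0", ])
polya <- droplevels(samples[samples$typeExperiment == "PolyA+", ])

df_combined <- residual_df(~ condition + group, samples)
df_ribo <- residual_df(~ year + condition, ribo)
df_polya <- residual_df(~ condition, polya)
print(c(combined = df_combined, ribo = df_ribo, polya = df_polya))
```

With this layout every design leaves some residual degrees of freedom, so the count alone does not rule out any of them. Note that `group` here carries exactly the same information as `year`, because each year used a single library type.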


hi, 

If I understand, you have a strong year effect, but you also have all the PolyA samples from one year and all the Ribo0 samples from the two other years? This makes it really difficult to include year in the design without adding stronger assumptions to the model. (It also means that any differences due to ribo / polyA are confounded with any technical differences in 2017 vs the other two years.) To think about how this is a problem for the statistical model: if the two years are discrepant, should the numerator of the ribo / polyA ratio come from the 2015 or the 2016 observations?
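The confounding described here can be checked directly on the design matrix. The following is a sketch in base R, assuming the sample layout given earlier in the thread and placeholder condition names A, B and C. It compares the number of coefficients requested by ~ year + condition + typeExperiment with the rank of the resulting matrix.

```r
# Sample layout from the thread; condition names A/B/C are placeholders.
year <- rep(c("2015", "2016", "2017"), times = c(3, 6, 9))
condition <- unlist(lapply(1:3, function(n) rep(c("A", "B", "C"), each = n)))
samples <- data.frame(
  year = factor(year),
  typeExperiment = factor(ifelse(year == "2017", "PolyA+", "Ribo0")),
  condition = factor(condition)
)

# Every year maps to exactly one library type.
print(table(samples$year, samples$typeExperiment))

# Design matrix for the proposed combined model.
full <- model.matrix(~ year + condition + typeExperiment, data = samples)
n_coef <- ncol(full)      # coefficients the formula asks for
n_rank <- qr(full)$rank   # coefficients that can actually be estimated
print(c(coefficients = n_coef, rank = n_rank))

# The Ribo0 indicator is an exact linear combination of other columns:
# Ribo0 = intercept - year2017.
ribo_col <- full[, "typeExperimentRibo0"]
print(all(ribo_col == full[, "(Intercept)"] - full[, "year2017"]))
```

The rank is one less than the number of columns, so the library-type effect cannot be separated from the 2017 effect with this formula, and DESeq2 would be expected to reject such a design as not full rank.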


Yep, that's it. So I analysed Ribo0 and PolyA separately. For the first analysis, with Ribo0, I used year in the design and then ran contrasts between conditions. For PolyA, I removed year. Then I merge/intersect the lists of DE genes. I'm fine with that.

But I would just like to have a PCA plot with all these samples, and I don't know whether the design can influence the plot. Does it?

rld <- rlog(dds, blind=TRUE) 

blind=TRUE means you don't make any assumptions about the design. So it should not affect the normalization, and the PCA plot should not be biased by prior information about the samples.


blind=TRUE ignores the design; all differences between samples are taken as part of the biological and technical variation.
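Once the values are transformed, the PCA step itself does not use the design. As a rough illustration, the base R sketch below runs a PCA on a simulated matrix that stands in for the transformed values (genes in rows, samples in columns). The data, the size of the year shift and the choice of 500 genes are all assumptions made for the example, not results from the datasets in this thread.

```r
# Simulated stand-in for transformed expression values (NOT real data):
# 1000 genes x 18 samples, with the year layout from the thread.
set.seed(1)
year <- factor(rep(c("2015", "2016", "2017"), times = c(3, 6, 9)))
log_expr <- matrix(rnorm(1000 * 18), nrow = 1000)

# Hypothetical batch effect: shift the first 200 genes by year.
shift <- c("2015" = 0, "2016" = 2, "2017" = 4)[as.character(year)]
log_expr[1:200, ] <- sweep(log_expr[1:200, ], 2, shift, "+")

# Keep the most variable genes, then run PCA with samples as rows.
ntop <- 500
gene_var <- apply(log_expr, 1, var)
top <- order(gene_var, decreasing = TRUE)[seq_len(ntop)]
pca <- prcomp(t(log_expr[top, ]))
pct_var <- pca$sdev^2 / sum(pca$sdev^2)

# Colour samples by year to see whether batches dominate the first PCs.
if (interactive()) {
  plot(pca$x[, 1], pca$x[, 2], col = as.integer(year), pch = 19,
       xlab = sprintf("PC1 (%.0f%%)", 100 * pct_var[1]),
       ylab = sprintf("PC2 (%.0f%%)", 100 * pct_var[2]))
  legend("topright", legend = levels(year), col = seq_along(levels(year)), pch = 19)
}
```

In this simulation the first component mostly tracks year, which is the kind of pattern a strong batch effect would be expected to produce on a plot of all samples together.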
