Question

DESeq2: Analysing PolyA+ & Ribo0 RNA-seq together with a batch effect per year

ZheFrench ▴ 50

@zhefrench-11689

Hi

I have one Ribo0 RNA-seq dataset generated in 2015 and 2016. I analyse it using design ~ year + condition (there is a strong batch effect per year).

I have another, new PolyA+ RNA-seq dataset generated in 2017. I analyse it using design ~ condition.

For now I analyse them separately and try to intersect/merge the results depending on the situation/question.

1 - I was wondering whether it would be a good idea to analyse them together using:

design ~ year + condition + typeExperiment

with typeExperiment accounting for the fact that there are both Ribo0 and PolyA+ samples.

2 - Also, does the order of the factors in the design matter?

Thanks

deseq2 • 728 views

ADD COMMENT • link updated 6.9 years ago by Michael Love 41k • written 6.9 years ago by ZheFrench ▴ 50

score 1 · Accepted Answer · 2017-05-29

Michael Love 41k

@mikelove

If you search the site here I have a few answers on this. Let me know if you have further questions after reading those posts.

ADD COMMENT • link 6.9 years ago Michael Love 41k

Here's a post I made on the topic

DESeq2 testing ratio of ratios (RIP-Seq, CLIP-Seq, ribosomal profiling)

ADD REPLY • link 6.9 years ago Michael Love 41k

Sorry, but I'm not sure I understand the take-home message. I know there are countless questions about design for DESeq2, sorry to bother you again with that. In fact, I am more interested in doing a PCA with all those samples. I want to check that they group well together per condition. Usually I use this command:

rld <- rlog(dds, blind=FALSE)

But here, as I'm not sure about the design formula, it is probably a better idea to use rld <- rlog(dds, blind=TRUE) before making the heatmap / PCA with DESeq2.

Back to my question. From what I understood:

Combine typeExperiment and year into another variable, and then apply contrasts between conditions:

dds$group <- factor(paste0(dds$year, dds$typeExperiment))

design ~ condition + group

The results will depend on the number of replicates. If there are not enough residual degrees of freedom to estimate the dispersion, DESeq2 will automatically use a design of ~ 1 for estimating the dispersion (and will print a message about this). This means that the analysis is underpowered to detect differences...

In my case :

Year 2015 -> Ribo0 -> 3 conditions -> 1 biological replicate per condition

Year 2016 -> Ribo0 -> 3 conditions -> 2 biological replicates per condition

Year 2017 -> PolyA+ -> 3 conditions -> 3 biological replicates per condition
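A minimal base R sketch of what this layout implies for design ~ condition + group. The condition labels A/B/C and the sample order are assumptions made for illustration; only the 3 / 6 / 9 sample counts come from the layout above. It counts the residual degrees of freedom and shows that, with this layout, group carries the same information as year.

```r
# Sample table rebuilt from the layout above: 18 samples in total.
# Condition labels "A", "B", "C" are hypothetical placeholders.
samples <- data.frame(
  year = factor(rep(c("2015", "2016", "2017"), times = c(3, 6, 9))),
  typeExperiment = factor(rep(c("Ribo0", "PolyA+"), times = c(9, 9)),
                          levels = c("Ribo0", "PolyA+")),
  condition = factor(rep(c("A", "B", "C"), times = 6))
)

# Combined factor, as in dds$group <- factor(paste0(dds$year, dds$typeExperiment))
samples$group <- factor(paste0(samples$year, samples$typeExperiment))

# Each year maps to exactly one group level, so group is just year relabelled.
print(table(samples$group, samples$year))

# Design matrix for ~ condition + group and its residual degrees of freedom.
design <- model.matrix(~ condition + group, data = samples)
residual_df <- nrow(design) - qr(design)$rank
print(residual_df)
```

With these counts the design matrix has 5 coefficients for 18 samples, so residual degrees of freedom are not the limiting factor here; the issue is what the group coefficient can be interpreted as.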

ADD REPLY • link 6.9 years ago ZheFrench ▴ 50

Hi,

If I understand correctly, you have a strong year effect, but you also have all the PolyA samples from one year and all the Ribo0 samples from the two other years? This makes it really difficult to include year in the design without adding stronger assumptions to the model. (It also means that any differences due to ribo / polyA are confounded with any technical differences in 2017 vs the other two years.) To think about how this is a problem for the statistical model: if the two years are discrepant, should the numerator of the ribo / polyA ratio come from the 2015 or the 2016 observations?
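The confounding can be checked with a small base R sketch before any DESeq2 call. The sample table below is an assumption rebuilt from the 3 / 6 / 9 layout given earlier in the thread, with hypothetical condition labels A/B/C. It shows that, for design ~ year + condition + typeExperiment, the PolyA+ indicator column is identical to the 2017 indicator column, so the model matrix is not full rank.

```r
# Hypothetical sample table: 2015 and 2016 are Ribo0, 2017 is PolyA+.
samples <- data.frame(
  year = factor(rep(c("2015", "2016", "2017"), times = c(3, 6, 9))),
  typeExperiment = factor(rep(c("Ribo0", "PolyA+"), times = c(9, 9)),
                          levels = c("Ribo0", "PolyA+")),
  condition = factor(rep(c("A", "B", "C"), times = 6))
)

design_full <- model.matrix(~ year + condition + typeExperiment, data = samples)

# Locate the two indicator columns by name pattern.
year2017_col <- design_full[, "year2017"]
type_col <- design_full[, grep("^typeExperiment", colnames(design_full))]

# The two columns are identical: the effects cannot be separated.
confounded <- all(year2017_col == type_col)

# Rank is lower than the number of coefficients, i.e. not full rank.
n_coef <- ncol(design_full)
design_rank <- qr(design_full)$rank
print(c(coefficients = n_coef, rank = design_rank))
```

A rank lower than the number of coefficients means one of the effects cannot be estimated separately from the data; a design like this is expected to be rejected as not full rank when fitting.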

ADD REPLY • link 6.9 years ago Michael Love 41k

Yep, that's it. So I analysed Ribo0 and PolyA+ separately. For the first analysis, with Ribo0, I used year in the design and then made contrasts between conditions. For PolyA+, I removed year. Then I merge/intersect the lists of DE genes. I'm fine with that.

But I would just like to have a PCA plot with all these samples, and I don't know whether the design can influence the plot. Does it?

rld <- rlog(dds, blind=TRUE)

blind=TRUE means you don't make any assumptions about the design. So it should not affect the normalization, and the PCA plot should not be biased by prior information on the samples.

ADD REPLY • link 6.9 years ago ZheFrench ▴ 50

blind=TRUE ignores the design: all differences between samples are taken as part of the biological and technical variation.
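A rough sketch of a blind PCA across all samples, assuming DESeq2 is installed. The counts are simulated with makeExampleDESeqDataSet and the year labels are hypothetical stand-ins for the 3 / 6 / 9 layout in this thread, so the resulting plot says nothing about the real data; it only illustrates the calls.

```r
library(DESeq2)

set.seed(1)
# Simulated counts stand in for the real data (assumption for illustration).
dds <- makeExampleDESeqDataSet(n = 500, m = 18)

# Hypothetical year labels mirroring the 3 / 6 / 9 sample layout.
dds$year <- factor(rep(c("2015", "2016", "2017"), times = c(3, 6, 9)))

# blind = TRUE: the transformation is computed without using the design.
rld <- rlog(dds, blind = TRUE)

# Return the PCA coordinates together with condition and year, so that one
# can see which factor drives the separation between samples.
pca_data <- plotPCA(rld, intgroup = c("condition", "year"), returnData = TRUE)
head(pca_data)
```

Calling plotPCA without returnData = TRUE draws the plot directly, with points coloured by the combination of the intgroup variables.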

ADD REPLY • link 6.9 years ago Michael Love 41k