Hi
I have one Ribo0 RNA-seq dataset generated in 2015 and 2016. I analyse it using design ~ year + condition (there is a strong batch effect per year).
I have another, newer PolyA+ RNA-seq dataset generated in 2017. I analyse it using design ~ condition.
For now I treat them separately and try to intersect/merge results depending on the situation/question.
1 - I was wondering if it would be a good idea to analyse them together using:
design ~ year + condition + typeExperiment
so that typeExperiment accounts for the difference between Ribo0 and PolyA+.
2 - Also, does the order of the factors in the design matter?
Thanks
Here's a post I made on the topic
DESeq2 testing ratio of ratios (RIP-Seq, CLIP-Seq, ribosomal profiling)
Sorry, but I'm not sure I understand the take-home message. I know there are endless questions about design for DESeq2, sorry to bother you again with that. In fact, I am more interested in doing a PCA with all of these samples. I want to check that they fit well together per condition. Usually I use this command:
rld <- rlog(dds, blind=FALSE)
But here, as I'm not sure about the design formula, it is probably a better idea to use rld <- rlog(dds, blind=TRUE) before doing the heatmap / PCA from DESeq2.
Back to my question. From what I understood:
Combine typeExperiment and year into a single variable, and then apply contrasts between conditions:
dds$group <- factor(paste0(dds$year, dds$typeExperiment))
design ~ condition + group
The results will depend on the number of replicates. If there are not enough residual degrees of freedom to estimate the dispersion, DESeq2 will automatically use a design of ~ 1 for estimating the dispersion (and will print a message about this). This means that the analysis is underpowered to detect differences...
In my case :
Year 2015 -> Ribo0 -> 3 conditions -> 1 biological replicate per condition
Year 2016 -> Ribo0 -> 3 conditions -> 2 biological replicates per condition
Year 2017 -> PolyA+ -> 3 conditions -> 3 biological replicates per condition
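A quick way to see what the group variable and the residual degrees of freedom look like for this layout, as a base R sketch. The condition labels A, B, C are placeholders (the real names are not given here), and the sample sheet is rebuilt by hand rather than taken from a real dds object:

```r
# Sample sheet for the layout listed above.
# A, B, C are placeholder condition names (assumption).
coldata <- data.frame(
  year = rep(c("2015", "2016", "2017"), times = c(3, 6, 9)),
  typeExperiment = rep(c("Ribo0", "PolyA"), times = c(9, 9)),
  condition = c(c("A", "B", "C"),
                rep(c("A", "B", "C"), each = 2),
                rep(c("A", "B", "C"), each = 3))
)
coldata$condition <- factor(coldata$condition)
coldata$group <- factor(paste0(coldata$year, coldata$typeExperiment))

# One level per year, because each year used a single protocol
levels(coldata$group)

# Residual degrees of freedom: samples minus estimable coefficients
mm <- model.matrix(~ condition + group, data = coldata)
nrow(mm) - qr(mm)$rank
```

With this layout, group carries the same information as year alone, so ~ condition + group is equivalent to ~ condition + year fitted on all 18 samples.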
hi,
If I understand, you have a strong year effect, but you also have all the PolyA samples from one year and all the Ribo0 samples from two other years? This makes it really difficult to include year in the design without adding stronger assumptions to the model. (It also means that any differences due to ribo / polyA are confounded with any technical differences in 2017 vs the other two years.) To think about how this is a problem for the statistical model: if the two years are discrepant, should the numerator of the ribo / polyA ratio come from the 2015 or the 2016 observations?
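The confounding can be checked directly on the sample sheet. Below is a minimal base R sketch, assuming the layout described in this thread and placeholder condition labels A, B, C. It compares the number of coefficients requested by ~ year + condition + typeExperiment with the rank of the model matrix:

```r
# Sample sheet rebuilt from the layout in this thread.
# A, B, C are placeholder condition names (assumption).
coldata <- data.frame(
  year = factor(rep(c("2015", "2016", "2017"), times = c(3, 6, 9))),
  typeExperiment = factor(rep(c("Ribo0", "PolyA"), times = c(9, 9))),
  condition = factor(c(c("A", "B", "C"),
                       rep(c("A", "B", "C"), each = 2),
                       rep(c("A", "B", "C"), each = 3)))
)

# Each year appears under exactly one protocol
table(coldata$year, coldata$typeExperiment)

mm <- model.matrix(~ year + condition + typeExperiment, data = coldata)
ncol(mm)      # coefficients requested by the formula
qr(mm)$rank   # coefficients that can actually be estimated (fewer than ncol(mm))
```

Because the typeExperiment column is a linear combination of the intercept and the year columns, the matrix is not full rank, and DESeq2 stops with a "model matrix is not full rank" error for a design like this.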
Yep, that's it. So I analysed Ribo0 and PolyA separately. For the first analysis, with Ribo0, I used year in the design and then did contrasts between conditions. For PolyA, I removed year. Then I merge/intersect the lists of DE genes. I'm fine with that.
But I would just like to have a PCA plot with all these samples, and I don't know whether the design can influence the plot. Does it?
rld <- rlog(dds, blind=TRUE)
blind=TRUE means you don't make any assumptions about the design. So the design should not affect the normalization, and the PCA plot should not be biased by prior information on the samples.
blind=TRUE ignores the design, all differences between samples are taken as part of the biological and technical variation.
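As an illustration, here is a minimal sketch of a blind rlog followed by a PCA, using simulated counts from makeExampleDESeqDataSet instead of the real data. The typeExperiment column added below is a made-up label for the simulated samples, only there to show how several variables can be passed to intgroup:

```r
library(DESeq2)

set.seed(1)
# Simulated dataset: 1000 genes, 6 samples, with a 'condition' column (A/B)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Hypothetical protocol label, for illustration only
dds$typeExperiment <- factor(rep(c("Ribo0", "PolyA"), times = 3))

# blind = TRUE: the transformation does not use the design formula
rld <- rlog(dds, blind = TRUE)

# Label points by condition and protocol
plotPCA(rld, intgroup = c("condition", "typeExperiment"))
```

The variables in intgroup only label the points; they do not change the principal components themselves.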