Potential outlier/high variance
1
0
Entering edit mode
@ama86t-13745
Last seen 23 days ago
New Zealand

Hi

I been analysing some bulk rnaseq data of a non-model plant, looking at two treatments and two tissue stages. We used transcripts and salmon counts for looking at DEGs. Looking through the pca, cooks and dispersion plot (see below) – I been wondering if there is a concern of potential outlier samples/high variance as I notice one sample from each stage is away from the other two samples. The biological replicates are from three different trees – so there will be an element of biological variance due to that. So to account for that I have tried to fit two surrogate variables using sva (Note: I have also filtered some of the low count samples). Another concern was the dispersion plot – noticed the fitted line was one but this improves when a ‘local fit’ has been carried out. So I been wondering if it’s safe to go ahead and analyse the DEGs as is and potential bias I may be carrying forward. So was wondering if anyone could advice on this. Any thoughts would be much appreciated. Thanks in advance

#Create deseq object and run deseq
ddsTxi <- DESeqDataSetFromTximport(txi,colData = samples, design = ~ trt + Tissue)
ddsTxi
dds <- DESeq(ddsTxi)
#check data with plots
vsd <- vst(dds,blind=FALSE)
plotPCA(vsd,intgroup=c("trt","Tissue"))
#look at cooks distances
cooks <- log10(assays(dds)[["cooks"]])
boxplot(cooks, range=0, las=2)
nrow(dds)
keep <- rowSums(counts(dds) >= 10) >= 3 #at least3 samples with a count of 10 or higher
dds_filt <- dds[keep,]
dds_filt_loc <- estimateDispersions(dds_filt, fitType = "local")
nrow(dds_filt)
#percentage removed
nrow(dds_filt)/ nrow(dds)
vsd <- vst(dds_filt,blind=FALSE)
plotPCA(vsd,intgroup=c("trt","Tissue"))
plotDispEsts(dds_filt, main="Dispersn plot")
plotDispEsts(dds_filt_loc,main="local fit dispersion")

#SVA
dat  <- counts(dds_filt, normalized = TRUE)
idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]
mod  <- model.matrix(~ trt + Tissue, colData(dds_filt))
mod0 <- model.matrix(~   1, colData(dds_filt))
svseq <- svaseq(dat, mod, mod0, n.sv = 2)
svseq$sv par(mfrow = c(2, 1), mar = c(3,5,3,1)) for (i in 1:2) { stripchart(svseq$sv[, i] ~ dds_filt$trt, vertical = TRUE, main = paste0("SV", i)) abline(h = 0) }  deseq DESeq2 • 236 views ADD COMMENT 0 Entering edit mode @mikelove Last seen 5 hours ago United States Sorry, for some reason I missed this post last week. "The biological replicates are from three different trees" Do you have the information about the tree? Are the three trees driving the three groups in the PCA plot above? If you have the tree information, you should include in the design as a covariate. ADD COMMENT 0 Entering edit mode Thanks for the reply, no worries. There is a possibility the trees are driving some of the grouping (though ideally we would like to see two groupings not three) - the plants are of wild population - non clonal. Though they are of same species, we are suspecting they may be of different genotypes. Hence trying to remove some the unwanted variation using sva. But through that I didn't see a major improvement (see stripchart below). Any suggestions would be much appreciated ADD REPLY 0 Entering edit mode Did you try refitting the model while controlling for the SVs? Eg put the two SVs in the design, and test on your variables of interest. This should help you focus your tests on the explainable part of the count variation. ADD REPLY 0 Entering edit mode Yep, so far that's how I have carried forward with the analysis - I also used a local fit based on the dispersion plot. However I haven't seen any improvement with the groupings seen in PCA plot. Is there anything more I could do? ddssva <- dds_filt ddssva$SV1 <- svseq$sv[,1] ddssva$SV2 <- svseq\$sv[,2]
#do design with D_T combined - treatment and tissue
design(ddssva) <- ~ SV1 + SV2 + D_T
dds_sva_locF <- DESeq(ddssva,fitType = "local")
dds_sva_locF
resultsNames(dds_sva_locF)

1
Entering edit mode

Haven't seen improvement in PCA plot -- see the FAQ in vignette on this topic.