Hi,
I have been using DEXSeq with transcript counts from salmon (as suggested in e.g. : https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0862-3 ) to investigate DTU between conditions case vs control.
I followed a workflow described in: https://f1000research.com/articles/7-952/v3 .
When I plotted a volcano plot with the p-values from the DEXSeq result object and its estimated log2fold change (from DEXSeq::estimateExonFoldChange
), i saw some extreme p-value outliers.
When I plotted the counts (fitted and observed) I saw that there was no visible change at all between conditions. When I looked at the pvalues of DRIMSeq for those transcripts, they were far from significant (~0.5).
I saw that those transcripts (among others) were labelled by DEXSeq as dispOutliers
(which I assume means dispersion outliers?!).
I attached the volcano plots and a rds
object which holds the 6 outlier transcripts and the mcols from DEXSeq.
What could be the reason for these extreme p-values when there clearly seems to be no significance.
I aimed at remodelling everything by hand for those outliers to see if I get similar lrt statistics but I am not completely sure which values to use from mcols(DEXSeqobj)
to retrieve correct variable coefficients for the samples.
volcano Plot: https://drive.google.com/file/d/12ohJQfxL15W7R11K8ew-qR_J-Y4Tf4vJ/view?usp=sharing
rds obj: https://drive.google.com/file/d/1M9mm5AqsWG2tuhK1Vbb5cBjHpDbLhex9/view?usp=sharing
SessionInfo: https://drive.google.com/file/d/1h-WgD8BLep1DuKfZPrbLF-iY9R4ybAAD/view?usp=sharing
countplot for most extreme pvalue: https://drive.google.com/file/d/1h0Ch_5v-OMLkypU9KdotPDt-Ei0NEQCl/view?usp=sharing
( I extracted the fitted counts as follows:
fitted.common.scale = as.data.frame( t(t(assays(DEXSeqObj)[["mu"]][,1:49])/DEXSeqObj$sizeFactor))
colnames(fitted.common.scale) <- sampleAnnotation(DEXSeqObj)$sample_id
Im not sure if this is the right way to do it, I referred to this post : https://support.bioconductor.org/p/60567/ (which was about DESeq2))
DEXSeq workflow:
covariates<-lapply(covariates,function(c){paste0("+ ",as.character(c),":exon")})
design<-formula(paste("~sample + exon ",stringi::stri_flatten(covariates)," + condition:exon"))
print("DESGIN:")
print(design)
reduced_design<-formula(paste("~sample + exon ",stringi::stri_flatten(covariates)))
print("DESGIN reduced:")
print(reduced_design)
sample.data <- DRIMSeq::samples(d)
count.data <- round(as.matrix(DRIMSeq::counts(d)[,-c(1:2)]))
dx <- DEXSeqDataSet(countData=count.data,
sampleData=sample.data,
design=design,
featureID=counts(d)$feature_id,groupID=counts(d)$gene_id)
#~sample + exon + rin:exon + condition:exon,
system.time({
dx <- estimateSizeFactors(dx)
dx <- estimateDispersions(dx,quiet=FALSE,maxit=500)
dx <- testForDEU(dx,reduced=reduced_design)
dx <- estimateExonFoldChanges(dx, fitExpToVar="condition")
[1] "DESGIN:" ~sample + exon + rin:exon + ageyears:exon + sex:exon + cohort:exon + oligodendrocytes:exon + condition:exon [1] "DESGIN red:" ~sample + exon + rin:exon + ageyears:exon + sex:exon + cohort:exon + oligodendrocytes:exon converting counts to integer mode the design formula contains a numeric variable with integer values,
specifying a model with increasing fold change for higher values.
did you mean for this to be a factor? if so, first convert this variable to a factor using the factor() function
Can you try without e.g.
rin:exon
? I don't typically put RIN in as a covariate.yes, I did try. I also changed the volcano plot to use the condition coefficient as l2fc instead of the result of DEXSeq::estimeExonFoldChange. There still remain some outliers but its not the same transcripts. Here you can find both volcano plots. The one resulting from analysis with and one without rin as covariate. We found that our main axis of variation is significantly associated with RIN and adding it as a covariate has been suggested by other studies at least for DGE analyisis (https://www.biostars.org/p/312942/https://bmcbiol.biomedcentral.com/articles/10.1186/1741-7007-12-42 , https://www.pnas.org/content/pnas/114/27/7130.full.pdf )