Hi there,
I have been stuck with this analysis, that in first place seems a quite simple experiment but the results that gave me are weird and i dont know how to resolve this.
Its not my first time that i have problems with outliers that the DESeq2 not filter, i tried have a solid answer for my problem, but with no sucess.
So the experiment, or more likely the model that i use is ~RIN+group.
I know about the continuous variable the automatic filtering is off (ref: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#approach-to-count-outliers) in those cases Love et al recommend to use the assays(dds)[["cooks"]]
"perform a manual visualization and filtering if necessary" but i am suck!
My experiment is:
RIN group
DRG-PNI-1 7.7 SN_Injury
DRG-PNI-2 7.9 SN_Injury
DRG-PNI-4 6.9 SN_Injury
DRG-PNI-5 6.8 SN_Injury
DRG-PNIsh-1 6.9 SN_Sham
DRG-PNIsh-2 8.2 SN_Sham
DRG-PNIsh-3 7 SN_Sham
DRG-PNIsh-4 8.2 SN_Sham
MN-PNI-2 8.7 MN_Injury
MN-PNI-3 8 MN_Injury
MN-PNI-4 7.5 MN_Injury
MN-PNI-5 8.1 MN_Injury
MN-PNIsh-1 7.2 MN_Sham
MN-PNIsh-2 7.2 MN_Sham
MN-PNIsh-3 7.4 MN_Sham
My code is:
Code should be placed in three backticks as shown below
data$group<-relevel(data$group, ref="SN_Sham")
dds<-DESeqDataSetFromMatrix(countData = table, colData = data, design= ~RIN + group)
dds$RIN <- dds$RIN / sd(dds$RIN)
dds<-estimateSizeFactors(dds, controlGenes=index)
keep <- rowSums(counts(dds) >= 10) >= 4
ddsClean <- dds[keep,]
ddsClean <- estimateDispersions(ddsClean)
dds <- nbinomWaldTest(ddsClean, maxit=10000)
#table(mcols(dds)$betaConv)
keep2<-which(mcols(dds)$betaConv=="TRUE")
dds<-dds[keep2,]
dds
res <- results(dds, name=c("group_SN_Sham_vs_MN_Sham"), pAdjustMethod="fdr", alpha=0.05)
In this comparison, i have 2 genes that are considered significative:
log2 fold change (MLE): group SN Injury vs SN Sham
Wald test p-value: group SN Injury vs SN Sham
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
ENSRNOT00000007365|ENSRNOG00000005286|Coch 13.6013 -11.7896 3.68177 -3.20215 0.00136407 0.0342551
ENSRNOT00000019133|ENSRNOG00000014233|Krt19 21.8576 -15.3058 3.79164 -4.03673 5.42012e-05 0.00283855
But looking for the matrix of counts all samples from the this comparison have zero counts:
DRG-PNI-1 DRG-PNI-2 DRG-PNI-4 DRG-PNI-5 DRG-PNIsh-1 DRG-PNIsh-2 DRG-PNIsh-3 DRG-PNIsh-4 MN-PNI-2 MN-PNI-3 MN-PNI-4 MN-PNI-5 MN-PNIsh-1 MN-PNIsh-2 MN-PNIsh-3
ENSRNOT00000007365|ENSRNOG00000005286|Coch 0 0 0 0 0 0 0 0 109 13 10 7 0 0 289
ENSRNOT00000019133|ENSRNOG00000014233|Krt19 0 0 0 0 0 0 0 0 14 195 138 50 0 0 246
I will be very appreciate if anyone help me.
Thanks in advance, Andreia
Thanks Michael for your reply.
I checked the colData and everything is ok.
I tested the approach from an older post [DESeq2] Cook's distance flagging for continuous variables and i couldnt get any idea which was the best threshold to apply, so i end up after talking with my boss that i will filtering manually after the results() the genes that have rowsums < 5 (for the comparison samples) and remove the genes that have counts in only one sample.
Not very secure of this approach, but i really need to move on...
So when you make plotCounts, what do you see? DESeq2 will always give you a 0 LFC for a gene when two groups that have 0, unless there is a sample assignment bug. What version of DESeq2 do you have, see sessionInfo().