mary.cho753
Suppose you have 4 treated and 4 untreated samples.
Usually, if all 8 columns are zero, DESeq2 outputs NA for baseMean, log2FoldChange, lfcSE, stat and pvalue.
However, in a few cases I see numbers for baseMean, log2FoldChange, lfcSE, stat and pvalue (though padj is NA) even when all 8 columns are zero.
Is this possible?
Thanks.
Thanks for your reply. I have at least figured out what happened.
I have 16 samples and I normalize them together (at least I think I am normalizing them together).
I then contrast conditions ac (samples 1,2,3,4) and an (samples 5,6,7,8).
If a row has raw counts of zero in those samples (0,0,0,0,0,0,0,0), I expect NA for baseMean, log2FoldChange, lfcSE, stat and pvalue. This is true the majority of the time.
However, if sample 15 has a normalized value of 3.6 (for example 0,0,0,0,0,0,0,0,0,0,0,0,0,0,3.6,0), then I get values for baseMean, log2FoldChange, lfcSE, stat and pvalue even when comparing ac and an with the contrast argument.
I thought I was comparing (0,0,0,0,0,0,0,0) and not (0,0,0,0,0,0,0,0,0,0,0,0,0,0,3.6,0). Please see the code below, and let me know if what happened is not clear. Is there a mistake in my code or design table, or is this just something I can ignore? Thanks for your quick reply.
Design table
####################################################################
library("DESeq2")
setwd("C:/dseq/data")
getwd()
# Read the raw count table and the sample sheet
countData <- read.table("C:/dseq/data/my_data.csv", header = TRUE, sep = ",")
head(countData)
colData1 <- read.table("C:/dseq/data/cond.csv", header = TRUE, sep = ",")
head(colData1)
colData <- data.frame(colData1)[, c("id", "condition")]
# All samples listed in colData are modelled (and normalized) together
dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~condition)
dds$condition <- factor(dds$condition, levels = c("ac", "an", "zc", "zn"))
ddsresults <- DESeq(dds)
# Extract the an vs ac comparison (log2 fold change of an over ac)
test <- results(ddsresults, contrast = c("condition", "an", "ac"))
plotMA(test, main = "DESeq2", ylim = c(-6, 6))
write.csv(test, "results_ac_vs_an.csv")
I think you can safely ignore this. As long as the counts for a gene are not all zero across every sample in the dataset, DESeq2 does compute statistics, because the mean is not equal to zero. You can filter these genes out manually if you like.
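To illustrate the manual filtering mentioned above, here is a minimal base R sketch. The count matrix, gene names and sample names are made up for illustration, and it assumes 4 samples per condition as in the question. It flags genes whose counts are zero in every sample of the two compared groups, even though another group has reads.

```r
# Toy count matrix: 3 genes x 16 samples (hypothetical numbers).
condition <- factor(rep(c("ac", "an", "zc", "zn"), each = 4))
cts <- rbind(
  geneA = rep(0, 16),               # zero in every sample
  geneB = c(rep(0, 14), 4, 0),      # zero in ac and an, a few reads in sample 15
  geneC = c(5, 8, 6, 7, 20, 25, 22, 19, 3, 4, 2, 5, 9, 8, 7, 6)
)
colnames(cts) <- paste0("s", 1:16)

# The mean over all 16 samples is non-zero for geneB, which is why statistics
# get computed for it (DESeq2's baseMean uses normalized counts; raw counts
# are used here only to keep the sketch self-contained).
rowMeans(cts)

# Genes with no reads at all in the two groups being contrasted
in_contrast <- condition %in% c("ac", "an")
zero_in_contrast <- rowSums(cts[, in_contrast, drop = FALSE]) == 0

# Genes with no reads in any sample
zero_everywhere <- rowSums(cts) == 0
```

If the results table has the same row order as the count matrix, something like `test[!zero_in_contrast, ]` would then drop these genes before writing the CSV.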
Hi Michael,
I have a similar situation. Let's say I use test <- results(ddsresults, contrast=c("condition","an","ac")) and get a very low padj (<0.0001) for such a gene. Normally I consider a gene differentially expressed if its padj < 0.05, but that does not make sense in this case. Could you please suggest how to deal with this?
Thanks!
What is the issue again?
Are you using an LRT or Wald test here?
If you use a Wald test the LFC will be zero and the pvalue equal to 1.
Hi Michael,
My experiment has two genotypes, two tissues, five time points and three biological replicates, so in total 60 samples. And I combined the genotype, tissue and time point as one factor (group in the design) for comparison.
sample_info
my code for DE analysis is
The txi object was created with the tximport package, and Salmon was used for read quantification. I compare the first two groups.
The padj values of these genes are smaller than 0.05. However, when I look at the counts, all the count values of these six genes in the first six samples are zero, so they should not be DEGs in this comparison.
Could you please give me a clue about what is happening here? I can send you the example data and code if you need them. Thanks
If you specify the comparison as a contrast in results() it will just set these Wald test pvalues to 1. Or you can filter out these genes based on small shrunken LFC, e.g. filtering with x[abs(x$log2FoldChange) > 0.1,]. Or you could use svalue=TRUE in the lfcShrink function as well, which would then give a large s-value (aggregate false sign rate) to these genes. For example:
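A rough sketch of the three options follows. It is not the poster's data: it uses simulated counts from makeExampleDESeqDataSet, relabels the samples into four hypothetical groups (ac, an, zc, zn), and forces the first gene to zero in ac and an. The shrinkage part assumes the apeglm package is installed and is skipped otherwise.

```r
library(DESeq2)

# Simulated data (assumption: 16 samples, 4 per group), not the poster's data
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 16)
dds$condition <- factor(rep(c("ac", "an", "zc", "zn"), each = 4))

# Force gene 1 to be all zero in ac and an but expressed in zc and zn
cts <- counts(dds)
cts[1, ] <- c(rep(0L, 8), rep(50L, 8))
counts(dds) <- cts

dds <- DESeq(dds, quiet = TRUE)

# Option 1: build the table with 'contrast'; per the reply above, a gene that
# is all zero in both compared groups should get LFC 0 and Wald p-value 1
res <- results(dds, contrast = c("condition", "an", "ac"))
res[1, c("log2FoldChange", "pvalue")]

if (requireNamespace("apeglm", quietly = TRUE)) {
  # Option 2: filter on the shrunken LFC (which() drops NA rows)
  shr <- lfcShrink(dds, coef = "condition_an_vs_ac", type = "apeglm", quiet = TRUE)
  shr_filtered <- shr[which(abs(shr$log2FoldChange) > 0.1), ]

  # Option 3: ask for s-values instead of p-values
  shr_sval <- lfcShrink(dds, coef = "condition_an_vs_ac", type = "apeglm",
                        svalue = TRUE, quiet = TRUE)
  shr_sval[1, ]
}
```

The 0.1 cutoff is just the value from the reply, not a recommended threshold.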
Hi Michael,
Thank you very much for the quick reply!
All your suggestions work, and I chose the first one because it is the easiest for me.
One thing I noticed is that if I use coef in the lfcShrink function, I also have to specify the res= argument, otherwise it still gives a very low padj value (which I think also makes sense, but normally I don't specify this if I already provide coef).
Yes, this is because lfcShrink only produces LFC, posterior SD and svalues (optionally). The table with pvalue has to be either passed in as res, or built internally with a call to results().
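As a sketch of that last point, again on simulated data with hypothetical group labels and assuming apeglm is installed, one can build the table with contrast first and hand it to lfcShrink via res, so that the p-values come from that table while the LFC and SE come from the shrinkage.

```r
library(DESeq2)

# Simulated data (assumption: 16 samples, 4 per group), not the poster's data
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 16)
dds$condition <- factor(rep(c("ac", "an", "zc", "zn"), each = 4))
cts <- counts(dds)
cts[1, ] <- c(rep(0L, 8), rep(50L, 8))   # gene 1: zero in ac and an only
counts(dds) <- cts
dds <- DESeq(dds, quiet = TRUE)

# Table built with 'contrast', which handles the all-zero-in-both-groups case
res_contrast <- results(dds, contrast = c("condition", "an", "ac"))

if (requireNamespace("apeglm", quietly = TRUE)) {
  # Without res=, lfcShrink builds its own table internally from coef
  shr_default <- lfcShrink(dds, coef = "condition_an_vs_ac",
                           type = "apeglm", quiet = TRUE)

  # With res=, pvalue and padj are carried over from res_contrast
  shr_with_res <- lfcShrink(dds, coef = "condition_an_vs_ac",
                            res = res_contrast, type = "apeglm", quiet = TRUE)

  # Compare the p-values for gene 1 under the two calls
  c(default = shr_default$pvalue[1], with_res = shr_with_res$pvalue[1])
}
```

Note that coef and the contrast passed to results() need to describe the same comparison (here an over ac, with ac as the reference level), otherwise the LFC and p-value columns would refer to different comparisons.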