Question

FDR in DESeq2

0

Entering edit mode

Kisun Pokharel ▴ 10

@kisun-pokharel-5960

Last seen 9.6 years ago

Hello Mike, I tried to modify your code for my data but ended up with FDR values which are always same i.e 0.9999609. Could you please tell me where I am doing wrong in the following code? You have explanined the pasila package but how will it be when I have two plain text files, where first column contains ensembl ids and second column contains counts? I have no solid R background and your help would be appreciated. Thanks! directory<-"~/Kisun/Data/Untrimmed/Deseq/extdata" sampleFiles<-list.files(directory) sampleCondition<-c("Untreated","Treated") sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition) ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition) colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, levels=c("Untreated", "Treated")) ddsHTSeq dds<-DESeq(ddsHTSeq) res<-results(dds) res<-res[order(res$FDR),] head(res) Kind Regards Kisun -- Kisun Pokharel, M.Sc. MTT Agrifood Research Finland Biotechnology and Food Research Myllytie 1, FI-31600 Jokioinen +358503614763 mtt.fi/experts/kisun_pokharel[1] twitter.com/kisunpokharel[2] -------- [1] mtt.fi/experts/kisun_pokharel [2] twitter.com/kisunpokharel [[alternative HTML version deleted]]

• 2.7k views

ADD COMMENT • link updated 10.9 years ago by Michael Love 41k • written 10.9 years ago by Kisun Pokharel ▴ 10

score 2 · Answer 1 · 2013-05-28

hi Kisun, The short answer is that an experiment with no replicates is underpowered to find differences. Under the null hypothesis, the p-values â â should be distributed uniformly from 0 to 1. So if your p-values are distributed â â approximately â â uniformly from 0 to 1, which you can check with: hist(res$pvalue) then it makes sense that the FDR will be near 1. The fact that many FDRs can be all the same is a consequence of the Benjamini Hochberg method, see the discussion here: http://permalink.gmane.org/gmane.science.biology.informatics.conductor /47450 With only two samples (control and treatment), the dispersion is estimated by putting the two samples together as if they were replicates. So true differences, if there are any, between control and treatment will drive up the dispersion estimates and reduce the likelihood of rejection of the null hypothesis. This is mentioned in ?estimateDispersions: â â estimateDispersions checks for the case of an analysis with as many samples as the number â â of coefï¬cients to ï¬t, and will temporarily substitute a design formula ~ 1 for the purposes of â â dispersion estimation. This treats the samples as replicates for the purpose of dispersion estimation. â â â â As mentioned in the DESeq paper: "While one may not want to draw strong conclusions from such â â an analysis, it may still be useful for exploration and hypothesis generation." ââ âYou can also check ?plotMA which allows you to visualize the log2 fold changes over the mean of read counts.â âA side note, the "/extdata" part is only a necessary part of the R package directory structure, where the data is stored which is processed by the vignettes. You can have your count files in any directory you wish.â Mike On Tue, May 28, 2013 at 1:42 PM, Kisun Pokharel <kisun.pokharel@helsinki.fi> wrote: > > Hello Mike, > > > > I tried to modify your code for my data but ended up with FDR values which are > > always same i.e 0.9999609. Could you please tell me where I am doing wrong in > > the following code? You have explanined the pasila package but how will it be > > when I have two plain text files, where first column contains ensembl ids and > > second column contains counts? I have no solid R background and your help > > would be appreciated. > > Thanks! > > > > > > directory<-"~/Kisun/Data/Untrimmed/Deseq/extdata" > > sampleFiles<-list.files(directory) > > sampleCondition<-c("Untreated","Treated") > > sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, > > condition=sampleCondition) > > ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, > > directory=directory, design=~condition) > > colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, > > levels=c("Untreated", "Treated")) > > ddsHTSeq > > dds<-DESeq(ddsHTSeq) > > res<-results(dds) > > res<-res[order(res$FDR),] > > head(res) > > > > Kind Regards > > Kisun > > > > -- > > Kisun Pokharel, M.Sc. > > MTT Agrifood Research Finland > > Biotechnology and Food Research > > Myllytie 1, FI-31600 Jokioinen > > +358503614763 > > mtt.fi/experts/kisun_pokharel > > twitter.com/kisunpokharel [[alternative HTML version deleted]]