Search
Question: FDR in DESeq2
0
gravatar for Kisun Pokharel
5.5 years ago by
Kisun Pokharel10 wrote:
Hello Mike, I tried to modify your code for my data but ended up with FDR values which are always same i.e 0.9999609. Could you please tell me where I am doing wrong in the following code? You have explanined the pasila package but how will it be when I have two plain text files, where first column contains ensembl ids and second column contains counts? I have no solid R background and your help would be appreciated. Thanks! directory<-"~/Kisun/Data/Untrimmed/Deseq/extdata" sampleFiles<-list.files(directory) sampleCondition<-c("Untreated","Treated") sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition) ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition) colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, levels=c("Untreated", "Treated")) ddsHTSeq dds<-DESeq(ddsHTSeq) res<-results(dds) res<-res[order(res$FDR),] head(res) Kind Regards Kisun -- Kisun Pokharel, M.Sc. MTT Agrifood Research Finland Biotechnology and Food Research Myllytie 1, FI-31600 Jokioinen +358503614763 mtt.fi/experts/kisun_pokharel[1] twitter.com/kisunpokharel[2] -------- [1] mtt.fi/experts/kisun_pokharel [2] twitter.com/kisunpokharel [[alternative HTML version deleted]]
ADD COMMENTlink modified 5.5 years ago by Michael Love20k • written 5.5 years ago by Kisun Pokharel10
2
gravatar for Michael Love
5.5 years ago by
Michael Love20k
United States
Michael Love20k wrote:
hi Kisun, The short answer is that an experiment with no replicates is underpowered to find differences. Under the null hypothesis, the p-values ​ ​ should be distributed uniformly from 0 to 1. So if your p-values are distributed ​ ​ approximately ​ ​ uniformly from 0 to 1, which you can check with: hist(res$pvalue) then it makes sense that the FDR will be near 1. The fact that many FDRs can be all the same is a consequence of the Benjamini Hochberg method, see the discussion here: http://permalink.gmane.org/gmane.science.biology.informatics.conductor /47450 With only two samples (control and treatment), the dispersion is estimated by putting the two samples together as if they were replicates. So true differences, if there are any, between control and treatment will drive up the dispersion estimates and reduce the likelihood of rejection of the null hypothesis. This is mentioned in ?estimateDispersions: ​ ​ estimateDispersions checks for the case of an analysis with as many samples as the number ​ ​ of coefficients to fit, and will temporarily substitute a design formula ~ 1 for the purposes of ​ ​ dispersion estimation. This treats the samples as replicates for the purpose of dispersion estimation. ​ ​ ​ ​ As mentioned in the DESeq paper: "While one may not want to draw strong conclusions from such ​ ​ an analysis, it may still be useful for exploration and hypothesis generation." ​​ ​You can also check ?plotMA which allows you to visualize the log2 fold changes over the mean of read counts.​ ​A side note, the "/extdata" part is only a necessary part of the R package directory structure, where the data is stored which is processed by the vignettes. You can have your count files in any directory you wish.​ Mike On Tue, May 28, 2013 at 1:42 PM, Kisun Pokharel <kisun.pokharel@helsinki.fi> wrote: > > Hello Mike, > > > > I tried to modify your code for my data but ended up with FDR values which are > > always same i.e 0.9999609. Could you please tell me where I am doing wrong in > > the following code? You have explanined the pasila package but how will it be > > when I have two plain text files, where first column contains ensembl ids and > > second column contains counts? I have no solid R background and your help > > would be appreciated. > > Thanks! > > > > > > directory<-"~/Kisun/Data/Untrimmed/Deseq/extdata" > > sampleFiles<-list.files(directory) > > sampleCondition<-c("Untreated","Treated") > > sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, > > condition=sampleCondition) > > ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, > > directory=directory, design=~condition) > > colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, > > levels=c("Untreated", "Treated")) > > ddsHTSeq > > dds<-DESeq(ddsHTSeq) > > res<-results(dds) > > res<-res[order(res$FDR),] > > head(res) > > > > Kind Regards > > Kisun > > > > -- > > Kisun Pokharel, M.Sc. > > MTT Agrifood Research Finland > > Biotechnology and Food Research > > Myllytie 1, FI-31600 Jokioinen > > +358503614763 > > mtt.fi/experts/kisun_pokharel > > twitter.com/kisunpokharel [[alternative HTML version deleted]]
ADD COMMENTlink written 5.5 years ago by Michael Love20k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 360 users visited in the last hour