Getting DESeq results on just one gene
@fong-chun-chan-5706
Last seen 9.6 years ago
Hi,

This is somewhat of an odd request to run DESeq. Basically, I am interested in finding whether a single gene is differentially expressed between two groups of RNA-seq libraries. I could run DESeq just to get the differential expression of that one gene, but that seems computationally expensive for testing a single gene.

I understand that DESeq needs to run estimateSizeFactors() and estimateDispersions() on the entire library using all the reads. But is there a way in nbinomTest() to restrict the analysis to just one gene? One way I was thinking of was to create another CountDataSet with just that one gene, and then fill in the size-factor and dispersion slots with the data from the whole CountDataSet.

Has anyone else tried to do this?

Thanks,

Fong
@steve-lianoglou-2771
Last seen 14 months ago
United States
Hi,

On Friday, February 1, 2013, Fong Chun Chan wrote:

> This is somewhat of an odd request to run DESeq. Basically, I am interested in
> finding whether a single gene is differentially expressed between two
> groups of RNA-seq libraries. I could run DESeq just to get the differential
> expression of that one gene. But that seems to be computationally expensive
> just to find whether a single gene is differentially expressed.

Have you tried doing it on all of your data? I suspect it won't take much time at all.

Is there another reason that you want to be blind to the rest of the changes that are happening in your data?

> I understand that DESeq needs to estimateSizeFactors() and
> estimateDispersions() on the entire library using all the reads. But is
> there a way in the nbinomTest() to just restrict the analyses to just one
> gene? One way I was thinking was to create another CountDataSet with just
> that one gene, and then fill in the slots of sizeFactors and dispersions
> with the data from the whole CountDataSet.
>
> Has anyone else tried to do this?

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Hi Steve,

A little background on the biological question should explain this odd request. Basically, I am trying to see whether copy number aberrations for a given gene across n samples affect its expression level (defined as cis-correlated). So for each gene, I have x cases with a deletion/amplification and y cases that are neutral. Essentially, I just want to know whether the cases that have a deletion/amplification have different expression from those that are neutral. It's a classical differential expression test between two groups, but I am only interested in the expression of that gene in the context of cases with aberration vs. neutral.

So while I could run DESeq for just those two groups to find the expression of one gene, what if I wanted to investigate the cis-correlation across all genes (say my search space is 15000)? Would I have to run DESeq 15000 times, with a different grouping of samples each time, just to get the cis-correlation status of these 15000 genes? That seems computationally expensive.

This is why I was wondering if I could run DESeq for just one gene, while still using the information across all samples for estimateSizeFactors, estimateDispersions, etc. This would reduce complexity and save significant computational time.

Fong
Hi Fong

1. Yes, you can subset to just a single gene. You would run estimateDispersions on the whole data set, with method="blind", because you cannot divide up your data yet (as the copy-number genotype changes from gene to gene). For the test, just subset to a single gene with something like:

    cds1 <- cds[ mygene, ]
    conditions(cds1) <- copy_number_genotype_for_mygene
    nbinomTest( cds1, "A", "B" )

I haven't tested this. So if subsetting to a _single_ row does not work, ask again, because these "drop=FALSE" extra options often tend to be missing when you need them.

2. As you know, DESeq is optimized for working with small sample numbers. Once you have many samples, a non-parametric, permutation-based test often gives better results. You didn't say how many samples you have, but given that you won't have much power otherwise, I guess it will be rather more than a dozen. And then, I would tend to also use a permutation-based test rather than DESeq.

In fact, I did a very similar analysis a while ago: We determined copy numbers of genes in the HapMap subjects and then wanted to know whether there is dosage compensation or not, i.e., whether or not subjects with a duplicate copy of a given gene also have twice the expression. Maybe have a look at our paper:

"Relating CNVs to transcriptome data at fine resolution: Assessment of the effect of variant size, type, and overlap with functional regions"
Andreas Schlattl, Simon Anders, Sebastian M. Waszak, Wolfgang Huber, Jan O. Korbel.
Genome Research 21 (2011) 2004-2013; doi: 10.1101/gr.122614.111

Simon
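For readers wondering what the permutation-based alternative from point 2 could look like, here is a minimal sketch in base R. It is not taken from the paper above or from DESeq. The function name `perm_test_gene`, the test statistic (difference in mean log2 expression), and the toy numbers are all assumptions made for illustration. It also assumes the counts for the gene have already been divided by size factors estimated on the full data set.

```r
# Hypothetical helper (not part of DESeq): two-sided permutation test for one
# gene, comparing samples with a copy-number aberration against neutral ones.
#   expr:  numeric vector of size-factor-normalized counts for the gene
#   group: factor with exactly two levels, one entry per sample
perm_test_gene <- function(expr, group, n_perm = 10000, seed = 1) {
  stopifnot(length(expr) == length(group), nlevels(group) == 2)
  set.seed(seed)  # fixed seed so the sketch is reproducible
  y <- log2(expr + 1)  # log scale so a few large counts do not dominate
  # Test statistic: difference in mean log expression between the two levels
  stat <- function(g) {
    mean(y[g == levels(group)[1]]) - mean(y[g == levels(group)[2]])
  }
  observed <- stat(group)
  # Shuffle the group labels to build the null distribution
  permuted <- replicate(n_perm, stat(sample(group)))
  # Add 1 to numerator and denominator so the p-value is never exactly zero
  p_value <- (sum(abs(permuted) >= abs(observed)) + 1) / (n_perm + 1)
  list(statistic = observed, p.value = p_value)
}

# Toy data (made up): 8 neutral samples, 6 with an amplification
expr  <- c(100, 120, 95, 110, 105, 98, 115, 102,
           210, 190, 230, 205, 220, 198)
group <- factor(rep(c("neutral", "amplified"), times = c(8, 6)))
res <- perm_test_gene(expr, group)
```

Because the grouping is passed in per call, the same function could in principle be applied gene by gene with a different copy-number genotype each time. P-values from many genes would still need a multiple-testing correction, for example with `p.adjust`.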