Can DESeq do this?
Pet Chiang ▴ 30
@pet-chiang-6154
Last seen 7.7 years ago
I am working on metagenomic data sets. I have annotated my metagenome against the COG database, and I would like to use DESeq to look for overabundant genes at my site.

Here is the problem: I have only one site (one metagenome). I would like to compare it to several other sites, none of which has replicates either.

The count data set looks like this (values are counts):

function name   my site   site1 (from US)   site 2 (from Japan)   site (from Iceland)   .....
COG1            2         6                 9                     9
COG2            5         5                 8                     8
COG3            7         9                 8                     0
.....

I want to find out whether any COG function at my site is over-represented, meaning that its gene counts are higher than at the other sites.

However, I am not sure whether DESeq can do this. If it can, how should I set up the groups?

Best regards,
Ben
@mikelove
Last seen 3 hours ago
United States
Hi Ben,

One problem with this comparison is that, without replicates from at least one site, the statistical methods have no way of assessing the biological and technical variability of the experiment. Just as with a t-test, whether a count of 2 is really lower than a count of 6 depends on how much variability we expect from sampling again and again. For more information, read the paragraph "Experiments without replicates..." in ?DESeq.

Were the samples from the different sites prepared and sequenced at the same facility?

As for the technical aspects of using DESeq/edgeR for metagenomics, Joey McMurdie has comprehensive instructions here: http://joey711.github.io/phyloseq/

Mike
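To make the point about missing replicates concrete, here is a minimal sketch in plain Python (not DESeq itself). It uses the toy counts from the question, shows that a within-site variance cannot be computed from a single observation, and then ranks the COG functions by a log2 ratio of "my site" against the mean of the other sites. The names `log2_ratio_vs_others` and `pseudocount`, the pseudocount value of 0.5, and the normalization by column totals are all assumptions made for illustration. The resulting ranking is exploratory only; it carries no p-values.

```python
import math
import statistics

# Toy counts copied from the question (rows: COG functions, columns: sites).
sites = ["my_site", "US", "Japan", "Iceland"]
counts = {
    "COG1": [2, 6, 9, 9],
    "COG2": [5, 5, 8, 8],
    "COG3": [7, 9, 8, 0],
}

# With one observation per site, the sample variance is undefined,
# so there is no estimate of within-site variability to test against.
try:
    statistics.variance([counts["COG1"][0]])
    variance_available = True
except statistics.StatisticsError:
    variance_available = False

# Crude library-size normalization (assumption): divide by per-site totals.
# In real data the totals would be taken over all COG functions.
totals = [sum(row[j] for row in counts.values()) for j in range(len(sites))]


def log2_ratio_vs_others(row, pseudocount=0.5):
    """log2 of (proportion at my site) / (mean proportion at the other sites).

    The pseudocount (hypothetical choice) avoids taking log of zero.
    """
    props = [(c + pseudocount) / t for c, t in zip(row, totals)]
    return math.log2(props[0] / statistics.mean(props[1:]))


# Rank functions from most to least enriched at my site (exploration only).
ranked = sorted(
    ((name, log2_ratio_vs_others(row)) for name, row in counts.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, lfc in ranked:
    print(f"{name}\t{lfc:+.2f}")
```

A ranking like this can suggest which functions to look at more closely, but it cannot say whether any difference exceeds what repeated sampling of the same site would produce.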
Hi Michael,

No, they were sequenced at different places. So you don't recommend using DESeq for this analysis?

Ben
It's not that I don't recommend DESeq; it's that I don't recommend a differential expression (DE) analysis of this data for anything but exploration / hypothesis generation (we say as much in the paragraph I referred to above), especially considering that the sequencing facilities are different for the different samples. This is referred to as a batch effect, and in this case it is perfectly confounded with the condition of interest: the site of sampling.

Suppose you were to take the same sample, prepare different libraries or perform sequencing at different facilities in the US, Japan, etc., and then perform statistical testing: you would often find many significant differences. So your dataset is a mix of batch effects and biologically interesting differences, which cannot be disentangled because they are perfectly confounded.

Mike
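The confounding described above can be checked mechanically from a sample sheet. The following is a minimal sketch in plain Python, assuming each sample is described by a `site` and a `facility` label. The facility names and the helper `perfectly_confounded` are hypothetical; the thread only states that the sites were sequenced at different places.

```python
from collections import defaultdict


def perfectly_confounded(samples, factor_a, factor_b):
    """Return True if the two factors map one-to-one onto each other.

    When every level of factor_a occurs with exactly one level of factor_b
    and vice versa, their effects cannot be separated.
    """
    a_to_b = defaultdict(set)
    b_to_a = defaultdict(set)
    for sample in samples:
        a_to_b[sample[factor_a]].add(sample[factor_b])
        b_to_a[sample[factor_b]].add(sample[factor_a])
    return all(len(v) == 1 for v in a_to_b.values()) and all(
        len(v) == 1 for v in b_to_a.values()
    )


# Design as described in the thread: one sample per site, each sequenced
# at its own facility (facility labels are made up for illustration).
thread_design = [
    {"site": "my_site", "facility": "facility_A"},
    {"site": "US", "facility": "facility_B"},
    {"site": "Japan", "facility": "facility_C"},
    {"site": "Iceland", "facility": "facility_D"},
]

# Hypothetical alternative: all sites prepared and sequenced in one place.
single_facility_design = [
    {"site": s["site"], "facility": "facility_A"} for s in thread_design
]

print(perfectly_confounded(thread_design, "site", "facility"))
print(perfectly_confounded(single_facility_design, "site", "facility"))
```

In the first design, any difference between sites is also a difference between facilities, so no model can tell the two apart. In the second, the facility is constant across sites, so it cannot explain between-site differences, although the lack of replicates remains a separate problem.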