Differential expression in more than 2 samples using NGS?
Xiaohui Wu ▴ 280
@xiaohui-wu-4141
Last seen 9.6 years ago
Hi all,

I have about 30 libraries of SBS data (millions of 20 nt tags each) and want to analyze the differences among libraries. Many of these tags fall in intergenic regions.

For gene regions, I think I can use DESeq or edgeR to find the DE genes. But it seems that DESeq and edgeR can only deal with two samples. Is there any package that can compare multiple samples at once, for example to find genes that are highly expressed in one or a few libraries but not in the others?

For intergenic tags, I think I should first use a peak detection package to find peaks in the intergenic regions, then treat these peaks as genes to find DE regions.

Is there a peak detection package for NGS data, and a package for DE analysis among multiple libraries?

Thank you!

Regards,
Xiaohui
edgeR DESeq • 1.6k views
@martin-morgan-1513
Last seen 6 days ago
United States
On 08/24/2010 09:49 AM, Xiaohui Wu wrote:
> Is there any peak detection package for NGS? and package for DE
> analysis among multiple libs?

If your starting point is BAM files of ungapped alignments and you're looking for flexibility in peak calling, you might start with Rsamtools::scanBam() to extract the position and width of each alignment, manipulate that into a GRanges object, and use IRanges::coverage(), IRanges::slice() and friends to identify and summarize peaks.

It's unclear whether you mean more than two samples (handled by edgeR and DESeq, I think) or more than one factor with two levels. In the latter case, one approach is to use the normalization and transformation methods offered by either package (e.g., getVarianceStabilizedData from DESeq, I think) and to analyze the transformed data with standard R methods, in the hope that the data are normal and homoscedastic enough.

Hopefully others will answer with better advice.

Martin

> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
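The coverage-then-slice idea in the reply above can be illustrated without any Bioconductor dependency. The following is only a conceptual sketch in plain Python, not the Rsamtools/IRanges API: the function names `coverage` and `slice_peaks`, the `(start, width)` tuple layout, and the threshold parameter `lower` are all assumptions made for illustration. It computes per-base read depth on one sequence and then reports every maximal run of positions whose depth is at or above the threshold.

```python
from typing import List, Tuple


def coverage(alignments: List[Tuple[int, int]], length: int) -> List[int]:
    """Per-base depth for ungapped alignments given as (start, width), 1-based.

    Hypothetical helper: mimics the idea of a coverage vector, not IRanges itself.
    """
    # Difference array: +1 where an alignment starts, -1 just past its end.
    diff = [0] * (length + 2)
    for start, width in alignments:
        end = start + width - 1
        if start < 1 or end > length:
            raise ValueError("alignment falls outside the sequence")
        diff[start] += 1
        diff[end + 1] -= 1
    depth = []
    running = 0
    for pos in range(1, length + 1):
        running += diff[pos]
        depth.append(running)
    return depth


def slice_peaks(depth: List[int], lower: int) -> List[Tuple[int, int, int]]:
    """Return (start, end, max_depth) for each maximal run with depth >= lower.

    Coordinates are 1-based and inclusive.
    """
    peaks = []
    run_start = None
    for pos, d in enumerate(depth, start=1):
        if d >= lower:
            if run_start is None:
                run_start = pos
        elif run_start is not None:
            # The run covers positions run_start..pos-1.
            peaks.append((run_start, pos - 1, max(depth[run_start - 1:pos - 1])))
            run_start = None
    if run_start is not None:
        peaks.append((run_start, len(depth), max(depth[run_start - 1:])))
    return peaks


# Toy data (made up): four 20 nt tags on a 150 bp sequence.
tags = [(1, 20), (5, 20), (10, 20), (100, 20)]
depth = coverage(tags, 150)
peaks = slice_peaks(depth, lower=2)
```

The threshold is a choice the analyst has to make; regions found this way could then be counted per library and passed to a count-based DE method, as the original question proposes.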
Xiaohui Wu ▴ 280
@xiaohui-wu-4141
Last seen 9.6 years ago
Hi Martin,

Thank you very much for your response. I'm reading the chipseq manual now; it introduces the peak detection process you suggested, using functions like slice().

What I mean by multiple samples is this: for example, I have 8 libraries for 4 tissues, with two replicates per tissue, and I want to know which genes are DE among these 4 tissues. If I have to compare two tissues at a time, then for 4 tissues I need C(4,2) = 6 comparisons to find the DE genes between every pair of tissues. So I want to know whether there is a tool that can compare many samples at once.

Xiaohui
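The design described above (8 libraries, 4 tissues, 2 replicates each) and the resulting C(4,2) = 6 pairwise comparisons can be written out explicitly. This is a small Python sketch; the library and tissue names are placeholders invented for illustration, and the helper `pairwise_comparisons` is hypothetical rather than part of any package.

```python
from itertools import combinations
from math import comb

# Hypothetical design: each library is assigned to the tissue it came from.
library_to_tissue = {
    "lib1": "tissueA", "lib2": "tissueA",
    "lib3": "tissueB", "lib4": "tissueB",
    "lib5": "tissueC", "lib6": "tissueC",
    "lib7": "tissueD", "lib8": "tissueD",
}


def pairwise_comparisons(groups):
    """All unordered pairs of groups, in a stable order."""
    return list(combinations(groups, 2))


# Replicates collapse into groups: comparisons are between tissues, not libraries.
tissues = sorted(set(library_to_tissue.values()))
pairs = pairwise_comparisons(tissues)

# The number of pairs is the binomial coefficient C(n, 2).
n_pairs = comb(len(tissues), 2)
```

The point of the mapping is that the two replicates of a tissue form one group, so the comparisons run over 4 groups rather than 8 libraries.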
Hi Xiaohui

You could look at the segmentSeq package as an alternative to a peak-finding package, particularly if your intergenic data are flat, or flatter than ChIP-seq data.

baySeq (Hardcastle and Kelly, BMC Bioinformatics 2010, 11:422) will allow you to look for differential expression in more complex designs with multiple samples.

Cheers
Krys
Xiaohui Wu ▴ 280
@xiaohui-wu-4141
Last seen 9.6 years ago
Hi Krys,

Thank you very much! It seems segmentSeq and baySeq are well suited to my problem. I'll give them a try.

Xiaohui
Xiaohui Wu ▴ 280
@xiaohui-wu-4141
Last seen 9.6 years ago
Hi Gordon,

Thank you for your response. Yes, you are right, maybe I was unclear about sample versus condition. By 'one sample' I meant one condition with multiple replicates. Do you mean I need to compare two conditions at a time using edgeR, and then draw conclusions about the final DE genes among all conditions?

Xiaohui

From: Gordon K Smyth
Sent: 2010-08-25 19:56:19
To: Bioconductor mailing list
Cc: Wu, Xiaohui Ms.
Subject: [BioC] Differential expression in more than 2 samples using NGS?

Dear Xiaohui,

I suspect you mean more than 2 groups or conditions rather than 2 samples. edgeR already handles any number of groups. If you want to find genes highly expressed in one condition but not in the others, surely you need to make pairwise comparisons between the conditions, and that is exactly what edgeR does.

In the next month, we will be adding linear model capabilities to edgeR, but it sounds to me as if the package will already address your problem as it is.

Best wishes
Gordon
Dear Xiaohui,

Yes. Suppose you want to show that a particular gene is up-regulated in condition 3 relative to conditions 1, 2 and 4. It seems to me that you have to compare conditions 3 vs 1, 3 vs 2 and 3 vs 4 in order to establish this.

Best wishes
Gordon

On Wed, 25 Aug 2010, Xiaohui Wu wrote:
> Do you mean I need to compare two conditions each time using edgeR, and
> then make the conclusions about the final DE genes among all conditions?
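The comparisons named in the answer above (3 vs 1, 3 vs 2, 3 vs 4) can be listed for any target condition, and the per-comparison outcomes combined. The Python sketch below is only an illustration under stated assumptions: `calls` is a hypothetical dictionary of yes/no results that would come from a separate DE test for each pair (for example one edgeR comparison per pair), and requiring every comparison to be positive is one simple combination rule, not something the thread prescribes. It does not address multiple-testing adjustment across comparisons.

```python
def comparisons_for_target(target, conditions):
    """Pairs (target, other) needed to show the target differs from every other condition."""
    return [(target, other) for other in conditions if other != target]


def up_in_target_only(calls, target, conditions):
    """True if the gene was called up in the target against every other condition.

    `calls` maps (target, other) -> bool; its values are assumed to come from
    a separate DE test per pair and are not computed here.
    """
    return all(calls[pair] for pair in comparisons_for_target(target, conditions))


conditions = [1, 2, 3, 4]
needed = comparisons_for_target(3, conditions)

# Made-up calls for one gene, purely to show how the results would be combined.
calls_gene_x = {(3, 1): True, (3, 2): True, (3, 4): True}
calls_gene_y = {(3, 1): True, (3, 2): False, (3, 4): True}
gene_x_specific = up_in_target_only(calls_gene_x, 3, conditions)
gene_y_specific = up_in_target_only(calls_gene_y, 3, conditions)
```

With four conditions this needs three comparisons per target condition, a subset of the six pairwise comparisons discussed earlier in the thread.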
Xiaohui Wu ▴ 280
@xiaohui-wu-4141
Last seen 9.6 years ago
Thank you Gordon, now I see.

Regards,
Xiaohui