Analysing RNA-Seq data using DESeq package


@kayilai-suryavadhan-mu-student-4856


I downloaded the DESeq package for RNA-Seq analysis of soybean genes. The package is really helpful and easy to use. Thanks! I have a small question and would be grateful if you could help me figure it out.

The package works fine for gene data with whole-number (integer) values. How can I run the analysis for decimal data, given that newCountDataSet does not allow me to input decimal data? It would be great if you could help me through this.

Thanks and regards,
Suryavadhan

DESeq DESeq • 1.8k views

ADD COMMENT • link updated 12.6 years ago by Steve Lianoglou ★ 13k • written 12.6 years ago by Kayilai, Suryavadhan MU-Student ▴ 20


Steve Lianoglou ★ 13k

@steve-lianoglou-2771


Hi Suryavadhan,

On Tue, Sep 13, 2011 at 12:41 PM, Kayilai, Suryavadhan (MU-Student) <skhx5 at mail.missouri.edu> wrote:
> The package works fine for the gene data with whole number or integer values. How can I run the analysis for decimal data as newCountDataSet does not allow me to input decimal data.

It doesn't let you put in non-integer data because the models DESeq uses to test for significance assume count data, that is, the number of reads that align to a given region, which can only ever be integers.

What type of data are you trying to put in that has decimal values, anyway? What does it represent?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD COMMENT • link 12.6 years ago Steve Lianoglou ★ 13k
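
Editorial sketch (not part of the original posts): before building a count table for DESeq, it can help to see exactly which entries are not whole numbers. The snippet below is plain Python rather than R, and the function name `non_integer_entries` and the example table are hypothetical, chosen only for illustration.

```python
def non_integer_entries(table):
    """Return (gene, sample_index, value) for every entry that is not a
    non-negative whole number.

    `table` maps a gene name to a list of values, one per sample.
    """
    bad = []
    for gene, row in table.items():
        for j, value in enumerate(row):
            # A valid read count is >= 0 and has no fractional part.
            if value < 0 or value != int(value):
                bad.append((gene, j, value))
    return bad


# Hypothetical example: geneB carries normalized (fractional) values.
example_table = {"geneA": [12, 0, 7], "geneB": [3.5, 4, 9.25]}
offending = non_integer_entries(example_table)
print(offending)
```

Any entry reported here would need to be traced back to a raw read count rather than a normalized or apportioned value.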


In reply to Steve Lianoglou's message of Thursday, September 15, 2011, 9:42 AM:

For the 6 sequenced samples, we ran alignments to get expression estimates. The protocol is to align the reads, then count the number of reads falling within the boundaries of the annotated genes, then normalize with respect to the number of reads aligning in each sample (not the sample length).

The analysis also attempts to capture the non-uniquely aligning reads by estimating the unique read counts for each gene, then apportioning the ambiguously aligning reads among the potential sources based on the ratios of read counts among those sources established by the less ambiguous reads (i.e., the first round of apportioning assigns 2-mapped reads based on the unique alignments, then 3-mapped reads are apportioned based on the adjusted read counts, and so on).

So, in the attached file you'll see three sets of columns for the samples: those headed "unique" give the per-million-reads-aligned normalized values for each sample's uniquely aligned reads, "apportioned" uses the adjusted values as described above, and "total" gives the number of reads aligned to the gene models without regard to their uniqueness. Note that in all cases, we consider only reads mapping to no more than 5 locations.

Hence the values are in non-integer form. Kindly help me through this.

Suryavadhan

ADD REPLY • link 12.6 years ago Kayilai, Suryavadhan MU-Student ▴ 20
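
Editorial sketch (not part of the original posts): the apportioning scheme described above can be written out roughly as follows, which also makes clear why the resulting values are fractional. This is one reading of the description, not the poster's actual pipeline. The function name `apportion`, the input layout, and the even-split fallback for reads whose targets all have zero count are assumptions made for illustration.

```python
def apportion(unique_counts, multi_reads, max_hits=5):
    """Distribute multi-mapped reads among genes in proportion to current counts.

    unique_counts: dict gene -> number of uniquely aligned reads
    multi_reads:   list of tuples, each holding the genes one ambiguous read hits
    max_hits:      reads hitting more locations than this are ignored
    """
    adjusted = {g: float(c) for g, c in unique_counts.items()}
    # Round k handles the k-mapped reads: 2-mapped first, then 3-mapped, ...
    for k in range(2, max_hits + 1):
        # Shares in this round are based on the counts from the previous round.
        baseline = dict(adjusted)
        for targets in multi_reads:
            if len(targets) != k:
                continue
            total = sum(baseline.get(g, 0.0) for g in targets)
            for g in targets:
                if total > 0:
                    share = baseline.get(g, 0.0) / total
                else:
                    # Assumed fallback: split evenly if no target has reads yet.
                    share = 1.0 / k
                adjusted[g] = adjusted.get(g, 0.0) + share
    return adjusted


# Hypothetical input: four reads hit both A and B, two reads hit A, B and C.
unique = {"A": 30, "B": 10, "C": 0}
ambiguous = [("A", "B")] * 4 + [("A", "B", "C")] * 2
print(apportion(unique, ambiguous))
```

With this input, gene A ends at 34.5 and gene B at 11.5: each ambiguous read contributes a fraction of a read to each candidate gene, so the totals are no longer integer counts.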


Dear Suryavadhan,

For normalisation between samples, please use the method described in the DESeq vignette, rather than the (information-losing) method described below. For the non-unique reads, DESeq has no provision for fuzzy or fractional alignments. You'll have to make a choice and provide actual counts.

Hope this helps,
Wolfgang

On Sep/15/11 6:58 PM, Kayilai, Suryavadhan (MU-Student) wrote:
> The protocol is to align the reads, then count the number of reads falling within the boundaries of the annotated genes, then normalize with respect to the number of reads aligning in each sample (not the sample length). [...]

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber

ADD REPLY • link 12.6 years ago Wolfgang Huber ★ 13k
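
Editorial sketch (not part of the original posts): the between-sample normalisation in DESeq is based on per-sample size factors computed as a median of ratios, with the raw integer counts left untouched. The plain-Python function below illustrates that idea only. It is not the package's code, and the name `size_factors` and the example counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict gene -> list of raw integer counts, one per sample.
    Returns one size factor per sample.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        # Genes with a zero count have a zero geometric mean and are skipped.
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The median ratio per sample is robust to a few strongly changed genes.
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical raw counts: sample 2 was sequenced about twice as deeply.
raw = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10], "g4": [0, 3]}
print(size_factors(raw))
```

For this input the two factors come out near 0.71 and 1.41. Because the counts themselves are kept as integers and the factors are carried alongside them, the information about how many reads supported each gene is not lost, unlike with a per-million rescaling.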

