DESeq - Estimating Dispersion with Technical Replicates


@andres-eduardo-rodriguez-cubillos-5486

Last seen 9.6 years ago

Good day everyone,

My name is Andrés. I'm from Universidad de los Andes located in Bogota D.C. (Colombia) and am currently using the DESeq package to analyze differential gene expression between two experimental conditions.

I attach an example of the countData format I'm using to run the analysis in DESeq. Each column represents a treatment, or condition, that has the mean counts of two technical replicates; each row represents the FPKMs (count reads) obtained from CuffCompare after our RNA-seq data was processed through Bowtie and Cufflinks.

In our experiment we used a technical replicate for each condition and, according to the user guide provided by Simon Anders, we must sum up their counts to get a single column corresponding to a unique biological replicate. At the end I end up with two columns: each one representing a condition that has the mean counts from the two technical replicates of that condition. It's important to say that we do not have any biological replicates, only technical replicates.

Everything appears to be going fine until we try to estimate the dispersion of the normalized counts... an error message appears indicating that "X must be an array of at least two dimensions". I attach my results and the error message.

I hope you can help us solve this issue. We're thinking this error might be related to the fact that we only have one column for each condition: one for "treated" and one for "untreated". In the countData from the guide I see there's more than one column for each condition: "treated2", "treated3" and "untreated3", "untreated4" (Guide: Analysing RNA-seq Data with the DESeq Package from 2012-03-16). However, if we only want to compare two conditions that only have technical replicates, we can only produce one column per condition because we must sum up both technical replicates into one column.

We'll be glad to hear from you and appreciate any advice you can give us.

Best regards,

Andrés Rodríguez
LAMFU
Universidad de los Andes
Bogotá D.C.
Colombia

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: countData Format.txt
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20120915/487378ab/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Technical Replicates Error Message.txt
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20120915/487378ab/attachment-0001.txt>

GLAD DESeq • 1.5k views

updated 11.6 years ago by Wolfgang Huber ★ 13k • written 11.6 years ago by Andres Eduardo Rodriguez Cubillos ▴ 20


Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 14 months ago

United States

Hi,

Section 3.3 of the DESeq vignette of the current released version is about working without any replicates. Perhaps that will sort you out?

http://www.bioconductor.org/packages/release/bioc/vignettes/DESeq/inst/doc/DESeq.pdf

HTH,
Steve

On Saturday, September 15, 2012, Andres Eduardo Rodriguez Cubillos wrote:
> [...]

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

11.6 years ago • Steve Lianoglou ★ 13k


Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 18 days ago

EMBL European Molecular Biology Laborat…

Dear Andrés,

thank you for your report. From the error message that you sent, it seems that you are using an older version of DESeq. Can you update to the latest version (ideally [1], but at least the latest release, version 1.8.3)?

You say "according to the user guide provided by Simon Anders, we must *sum up* their counts to get a single column ... I end up with two columns: each one representing a condition that has the *mean* counts from the two technical replicates...." Please do use the sum, not the mean.

Then, please consider the vignette [2], which addresses your use case in Section 3.3, "Working without any replicates", and recommends this code:

    cds2 = estimateDispersions( cds2, method="blind", sharingMode="fit-only" )

If your problem persists, please send the output of 'sessionInfo()' in your next report (so we don't need to chase after problems that have already been fixed).

[1] http://www.bioconductor.org/packages/devel/bioc/html/DESeq.html
[2] http://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq/inst/doc/DESeq.pdf

Best wishes
Wolfgang

Sep/15/12 6:14 AM, Andres Eduardo Rodriguez Cubillos scripsit:
> [...]

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
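
For readers following along, here is a minimal sketch of the two pieces of advice above: summing (rather than averaging) the technical replicates of each condition, then estimating dispersions "blind" when each condition has only one column. The gene names, run names, and counts are made up for illustration, and the helper `run_blind_dispersions()` is a hypothetical wrapper, not part of DESeq. It assumes the original DESeq package (not DESeq2) is installed and that the input holds raw integer read counts.

```r
# Hypothetical per-run counts: two technical replicates (runs) per condition.
# All names and numbers here are invented for illustration.
run_counts <- matrix(
  c( 10L,  12L, 30L,  28L,
      0L,   1L,  5L,   7L,
    200L, 180L, 90L, 110L),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("geneA", "geneB", "geneC"),
    c("untreated_run1", "untreated_run2", "treated_run1", "treated_run2")
  )
)

# Condition of each run (one entry per column of run_counts).
run_condition <- factor(
  c("untreated", "untreated", "treated", "treated"),
  levels = c("untreated", "treated")
)

# Sum, do not average, the technical replicates of each condition.
# rowsum() adds rows by group, so transpose, sum, and transpose back.
summed_counts <- t(rowsum(t(run_counts), group = run_condition))
storage.mode(summed_counts) <- "integer"  # keep the counts as integers

# Hypothetical wrapper around the call recommended in the reply above.
# It is only defined here, not called: the toy matrix is far too small
# for a meaningful dispersion fit, so call it on real count data.
run_blind_dispersions <- function(count_matrix, conditions) {
  library(DESeq)  # the original DESeq package, assumed to be installed
  cds <- newCountDataSet(count_matrix, conditions)
  cds <- estimateSizeFactors(cds)
  estimateDispersions(cds, method = "blind", sharingMode = "fit-only")
}
```

On real data the wrapper would be called as `cds <- run_blind_dispersions(summed_counts, factor(colnames(summed_counts)))`. Summing keeps each column as a total read count for its library, whereas averaging shrinks the counts and can produce non-integer values.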

11.6 years ago • Wolfgang Huber ★ 13k
