DEXSeq package - read.HTSeqCount function error


Matteo Carrara ▴ 20

@matteo-carrara-5734

Last seen 11.4 years ago

Dear Alejandro,

thank you for the quick reply. After you mentioned the coherence of the input, I dug further into it. It looks like the problem was quite trivial: I accidentally left the "paired" option of the counting script at its default, even though my datasets were all paired-end. I have run the script again with the correct parameters and the loading now completes successfully.

Your question about the samples is far from unrelated, since it affects the next steps of the analysis. When I wrote the message I was just testing the function, so I limited the loading to a single dataset. The real design comprises two different conditions with two biological replicates, each of them with two technical replicates, for a total of four datasets per condition. I decided to use DEXSeq precisely to assess differential exon usage.

Thank you again.

Best,
--
Matteo Carrara
PhD Student in Complex Systems for Life Sciences
Department of Biotechnology and Health Sciences
MBC - Molecular Biotechnology Center
via Nizza, 52 Torino
ITALY

On Tue, Jan 29, 2013 at 6:49 PM, Alejandro Reyes <alejandro.reyes at embl.de> wrote:
>
> Dear Matteo Carrara,
>
> Could you add to this e-mail the first 10 and last 10 lines of your input files (both counts and annotation files produced by the python scripts)?
>
> On unrelated topics, I noticed that you have only one sample, what exactly do you want to do with DEXSeq in this case?
> Note that DEXSeq is designed to test for differences in exon usage between different conditions with replicates.
>
> Best wishes,
> Alejandro Reyes
>
>
>> Hello,
>>
>> I have been trying to learn how to perform a differential expression
>> analysis of RNA-seq data using the DEXSeq package lately and I encountered
>> an unexpected behaviour in the function read.HTSeqCount: the function fails
>> to load the file obtained from the python script "dexseq_count.py" with the
>> following error message:
>>
>> Error in strsplit(rownames(dcounts), ":") : non-character argument
>>
>> I would really appreciate any pointers that might help me correct my code
>> or my input files.
>>
>> Here is what I have done:
>> - downloaded the mm9 GTF gene set from www.ensembl.org and ran the script
>> "dexseq_prepare_annotation.py"
>> - mapped my raw RNA-seq reads on the mm9 genome using tophat, converting
>> the output to sorted SAM format
>> - ran the script "dexseq_count.py" using the "flattened" GTF and the SAM
>> file obtained before
>> - loaded the dataset in R using the function read.HTSeqCount() as follows:
>>
>> --------------------------------------
>> > library(DEXSeq)
>> > wt <- read.HTSeqCount("./wt_mapped.counts", "WT",
>>        flattenedfile="./flattened_mm9.gtf")
>>
>> Error in strsplit(rownames(dcounts), ":") : non-character argument
>> --------------------------------------
>>
>> As far as I could understand, the "pasilla" package, used for the examples
>> in the vignette, provides a counts file under the name
>> "pasilla_gene_counts.tsv". Loading that file, however, results in the same
>> error message.
>>
>> All I could do was pinpoint the source of the error in the code of the
>> function, although that did not help me find a solution or a
>> workaround:
>> After creating the data frame "dcounts" storing the counts and setting the
>> row names, that same data frame is subset:
>>
>> dcounts <- dcounts[substr(rownames(dcounts), 1, 1) != "_", ]
>>
>> This code, however, changes the object dcounts in such a way that the
>> "rownames()" function returns NULL. The next statement is then bound to
>> fail since it requires rownames(dcounts) to be a character vector:
>>
>> genesrle <- sapply(strsplit(rownames(dcounts), ":"), "[[", 1)
>>
>> I am running R 2.15.2 and DEXSeq 1.4.0 from Bioconductor version 2.11,
>> but I was able to reproduce this on the devel version of R (2013-01-22
>> r61734) using DEXSeq_1.5.6 from Bioconductor version 2.12.
>>
>> ---------------------------------
>> > sessionInfo()
>>
>> R version 2.15.2 (2012-10-26)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] BiocInstaller_1.8.3 DEXSeq_1.4.0 Biobase_2.18.0
>> [4] BiocGenerics_0.4.0
>>
>> loaded via a namespace (and not attached):
>> [1] biomaRt_2.14.0 hwriter_1.3 RCurl_1.95-3 statmod_1.4.16 stringr_0.6.2
>> [6] tools_2.15.2 XML_3.95-0.1
>> --------------------------------
>>
>> Thank you in advance for any help you can provide.
>> Best Regards,
>
> --
> Matteo Carrara
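The symptom described in the original question (rownames() returning NULL right after the subsetting step) matches a general base R behaviour: subsetting a one-column data frame with `[rows, ]` drops it to a plain vector unless `drop = FALSE` is given. The sketch below only illustrates that behaviour on a toy table. The exon IDs and counts are invented, and it is an assumption, not a confirmed diagnosis, that this is what happens inside read.HTSeqCount when a single sample is loaded.

```r
# Toy stand-in for a single-sample count table
# (exon IDs and counts are made up for illustration).
dcounts <- data.frame(
  count = c(10L, 0L, 5L, 3L),
  row.names = c("geneA:001", "geneA:002", "geneB:001", "_ambiguous")
)

# Same filter as in the question: discard rows whose name starts with "_".
keep <- substr(rownames(dcounts), 1, 1) != "_"

# Default drop = TRUE: a one-column data frame collapses to a plain vector,
# so the row names are lost and rownames() returns NULL.
dropped <- dcounts[keep, ]
is.null(rownames(dropped))

# strsplit() on NULL fails; capture the error message instead of stopping.
msg <- tryCatch(strsplit(rownames(dropped), ":"),
                error = function(e) conditionMessage(e))

# drop = FALSE keeps the data frame structure and its row names,
# so the gene IDs can be extracted as intended.
kept <- dcounts[keep, , drop = FALSE]
genes <- sapply(strsplit(rownames(kept), ":"), "[[", 1)
```

With two or more count columns the default subsetting keeps the data frame, so this particular failure mode would only show up for a one-column table.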

Annotation DEXSeq • 1.3k views

updated 13.0 years ago by Steve Lianoglou ★ 13k • written 13.0 years ago by Matteo Carrara ▴ 20


Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 11 weeks ago

United States

Hi,

On Friday, February 1, 2013, Matteo Carrara wrote:

> Dear Alejandro,
>
> thank you for the quick reply. After you mentioned the coherence of
> the input, I dug more in it. It looks like the problem was quite
> trivial: I accidentally left the "paired" option of the counting
> script to the default, even though my datasets were all paired-end. I
> have run the script again with the correct parameters and the loading
> ends successfully.
>
> Your question about the samples is far from being unrelated, since it
> has impact on the next steps of the analysis.
> When I wrote the message I was just testing the function, so I limited
> the loading to a single dataset. The real design comprises two
> different conditions with two biological replicates, each of them with
> two technical replicates, for a total of four datasets per condition.

Keep in mind that the usual recommendation is to simply add the counts from technical replicates together rather than treating them as separate replicates, so effectively you will have two replicates per condition.

HTH,
-Steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
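As a rough illustration of the advice above, the following base R sketch sums technical replicates into one column per biological sample. The count matrix, the exon IDs, and the column naming scheme (condition, biological replicate, technical replicate) are all hypothetical; real data would come from the per-run count files.

```r
# Toy exon-level count matrix: rows are exons, columns are sequencing runs.
# Column names encode condition, biological replicate and technical replicate;
# all values are invented for illustration.
counts <- matrix(
  1:24, nrow = 3,
  dimnames = list(
    c("geneA:001", "geneA:002", "geneB:001"),
    c("ctrl_b1_t1", "ctrl_b1_t2", "ctrl_b2_t1", "ctrl_b2_t2",
      "trt_b1_t1",  "trt_b1_t2",  "trt_b2_t1",  "trt_b2_t2")
  )
)

# Biological sample each run belongs to (strip the technical-replicate suffix).
sample_id <- sub("_t[0-9]+$", "", colnames(counts))

# rowsum() sums rows by group, so transpose, sum per sample, transpose back.
collapsed <- t(rowsum(t(counts), group = sample_id))

# One column per biological replicate: two per condition, four in total.
colnames(collapsed)
```

The collapsed table then has two replicates per condition, which matches the effective design described in the reply.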

13.0 years ago • Steve Lianoglou ★ 13k
