Question

DEXSeq package - read.HTSeqCount function error


Matteo Carrara ▴ 20


Hello,

I have been trying to learn how to perform a differential expression analysis of RNA-seq data using the DEXSeq package, and I encountered unexpected behaviour in the function read.HTSeqCount: it fails to load the file produced by the Python script "dexseq_count.py", with the following error message:

Error in strsplit(rownames(dcounts), ":") : non-character argument

I would really appreciate any pointers that might help me correct my code or my input files.

Here is what I have done:
- downloaded the mm9 GTF gene set from www.ensembl.org and ran the script "dexseq_prepare_annotation.py" on it
- mapped my raw RNA-seq reads to the mm9 genome using tophat, converting the output to sorted SAM format
- ran the script "dexseq_count.py" using the "flattened" GTF and the SAM file obtained before
- loaded the dataset in R using the function read.HTSeqCount() as follows:

--------------------------------------
> library(DEXSeq)
> wt<-read.HTSeqCount("./wt_mapped.counts", "WT", flattenedfile="./flattened_mm9.gtf")
Error in strsplit(rownames(dcounts), ":") : non-character argument
--------------------------------------

As far as I could understand, the "pasilla" package, used for the examples in the vignette, provides a counts file named "pasilla_gene_counts.tsv". Loading that file, however, results in the same error message.

All I could do was pinpoint the source of the error in the code of the function, although that did not help me find a solution or a workaround. After creating the data frame "dcounts", which stores the counts, and setting its row names, the function subsets that same data frame:

dcounts <- dcounts[substr(rownames(dcounts), 1, 1) != "_", ]

This statement, however, changes the object dcounts in such a way that rownames() returns NULL. The next statement is then bound to fail, since it requires rownames(dcounts) to be a character vector:

genesrle <- sapply(strsplit(rownames(dcounts), ":"), "[[", 1)

I am running R 2.15.2 and DEXSeq 1.4.0 from Bioconductor version 2.11, but I was able to reproduce this on the devel version of R (2013-01-22 r61734) using DEXSeq_1.5.6 from Bioconductor version 2.12.

---------------------------------
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] BiocInstaller_1.8.3 DEXSeq_1.4.0 Biobase_2.18.0
[4] BiocGenerics_0.4.0

loaded via a namespace (and not attached):
[1] biomaRt_2.14.0 hwriter_1.3 RCurl_1.95-3 statmod_1.4.16 stringr_0.6.2
[6] tools_2.15.2 XML_3.95-0.1
--------------------------------

Thank you in advance for any help you can provide.

Best Regards,
--
Matteo Carrara
PhD Student in Complex Systems for Life Sciences
Department of Biotechnology and Health Science
MBC - Molecular Biotechnology Center
via Nizza, 52
Torino
ITALY
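The behaviour described in the question (row names becoming NULL after the subsetting step) can be reproduced in base R without DEXSeq. The sketch below is only an illustration under one assumption that the post does not state: that "dcounts" holds a single count column, as it might when one counts file is read. The toy IDs and counts are made up. In that case `[ , ]` simplifies the result to a plain vector, and passing `drop = FALSE` is one possible way to keep the data frame.

```r
# Toy stand-in for dcounts; IDs and counts are invented for illustration.
# Assumption: a single count column, as when only one sample is loaded.
dcounts <- data.frame(
  count = c(3L, 0L, 7L, 12L),
  row.names = c("geneA:001", "geneA:002", "geneB:001", "_ambiguous")
)
keep <- substr(rownames(dcounts), 1, 1) != "_"

# With one column, row subsetting defaults to drop = TRUE and returns a
# plain vector, so the row names are lost.
dropped <- dcounts[keep, ]
rownames_lost <- is.null(rownames(dropped))

# strsplit() on NULL fails; capture the message instead of stopping.
err <- tryCatch(
  strsplit(rownames(dropped), ":"),
  error = function(e) conditionMessage(e)
)

# drop = FALSE keeps the data frame structure and its row names.
kept <- dcounts[keep, , drop = FALSE]
genes <- sapply(strsplit(rownames(kept), ":"), "[[", 1)
```

Whether this is the actual cause inside read.HTSeqCount would need to be checked against the package source; the sketch only shows that the two statements quoted in the question fail in this way for a one-column data frame.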


score 0 · Answer 1 · 2013-01-29

Dear Matteo Carrara,

Could you add to this e-mail the first 10 and last 10 lines of your input files (both the counts file and the annotation file produced by the Python scripts)?

On an unrelated topic, I noticed that you have only one sample. What exactly do you want to do with DEXSeq in this case? Note that DEXSeq is designed to test for differences in exon usage between different conditions with replicates.

Best wishes,
Alejandro Reyes
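The first and last 10 lines requested in the answer can be pulled out in R with base functions. This is a minimal sketch; `show_ends` is a hypothetical helper name, not part of DEXSeq, and the demo runs on a temporary file with made-up content so that it is self-contained.

```r
# Hypothetical helper: return the first and last n lines of a text file.
show_ends <- function(path, n = 10) {
  lines <- readLines(path)
  list(first = head(lines, n), last = tail(lines, n))
}

# Self-contained demo on a temporary file with 25 invented count lines.
demo_path <- tempfile(fileext = ".counts")
writeLines(sprintf("geneA:%03d\t%d", 1:25, 1:25), demo_path)
ends <- show_ends(demo_path)
```

For the files named in the question, the same helper would be called as show_ends("./wt_mapped.counts") and show_ends("./flattened_mm9.gtf"). Note that readLines() loads the whole file, which may be slow for a large annotation file.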