easyRNASeq - error reading single chromosome bam

0

Entering edit mode

Lescai, Francesco ▴ 380

@lescai-francesco-5078

Last seen 6.2 years ago

Denmark

Hi there, I'm preparing a tutorial, therefore I sliced two RNASeq bam files from chromosome one, in order to reduce the file size. However, when I count the reads as it follows I have an error. thanks, Francesco > countDataSet <- easyRNASeq(filesDirectory=getwd(), + organism="HSapiens", + chr.sizes=as.list(seqlengths(Hsapiens)), + readLength=100L, + annotationMethod="biomaRt", + format="bam", + count="exons", + filenames=bamfiles, + normalize=TRUE, + outputFormat="DESeq", + conditions=conditions, + fitType="local" + ) Checking arguments... Fetching annotations... Summarizing counts... Processing SRR349689_1.fastq.novo.new.chr1.bam Error in FUN(c("GL000229.1", "GL000200.1", "GL000228.1", "GL000241.1", : (list) object cannot be coerced to type 'integer' I also tried selecting the chromosome with chr.sel=c("1") or chr.sel=c("chr1") but in both cases I get the error Make sure that your file is valid, that your 'chr.sel' (if provided) contains valid values; i.e. values as found in the file, not as returned by 'RNAseq'. The alignment has been done using the ENSEMBL reference, i.e. with chromosome listed as "1". sessionInfo() R Under development (unstable) (2012-01-20 r58146) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] C/en_US.UTF-8/C/C/C/C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] GenomicFeatures_1.6.7 AnnotationDbi_1.17.15 easyRNASeq_1.1.5 [4] ShortRead_1.13.13 latticeExtra_0.6-19 RColorBrewer_1.0-5 [7] Rsamtools_1.7.27 genomeIntervals_1.11.0 intervals_0.13.3 [10] DESeq_1.7.6 locfit_1.5-6 lattice_0.20-0 [13] akima_0.5-7 BiocInstaller_1.3.7 Biobase_2.15.3 [16] edgeR_2.5.9 limma_3.11.11 biomaRt_2.11.1 [19] BSgenome.Hsapiens.UCSC.hg19_1.3.17 BSgenome_1.23.2 Biostrings_2.23.6 [22] GenomicRanges_1.7.16 IRanges_1.13.22 BiocGenerics_0.1.4 loaded via a namespace (and not attached): [1] DBI_0.2-5 RCurl_1.9-5 RSQLite_0.11.1 XML_3.8-0 annotate_1.33.2 [6] bitops_1.0-4.1 genefilter_1.37.0 geneplotter_1.33.1 grid_2.15.0 hwriter_1.3 [11] rtracklayer_1.15.7 splines_2.15.0 survival_2.36-10 tools_2.15.0 xtable_1.6-0 [16] zlibbioc_1.1.1 ---------------------------------------------------------------------- ----------- Francesco Lescai, PhD, EDBT Senior Research Associate in Genome Analysis University College London Faculty of Population Health Sciences Dept. Genes, Development & Disease ICH - Molecular Medicine Unit, GOSgene team 30 Guilford Street WC1N 1EH London UK email: f.lescai@ucl.ac.uk<mailto:f.lescai@ucl.ac.uk> phone: +44.(0)207.905.2274 [ext: 2274] ---------------------------------------------------------------------- ---------- [[alternative HTML version deleted]]

RNASeq Alignment BSgenome BSgenome RNASeq Alignment BSgenome BSgenome • 1.6k views

ADD COMMENT • link updated 12.8 years ago by delhomme@embl.de ★ 1.2k • written 12.8 years ago by Lescai, Francesco ▴ 380

0

Entering edit mode

delhomme@embl.de ★ 1.2k

@delhommeemblde-3232

Last seen 10.2 years ago

Hi Francesco, I'd need more info to help you. Can you post the header of your bam files and the content of as.list(seqlengths(Hsapiens)). In the meanwhile can you run easyRNASeq with normalize set to FALSE, no conditions and the outputFormat set to "RNAseq"? That will give you back the RNAseq object and we can have a look at what is happening in there. I get the feeling that it has to do with the exon names being converted into integers (through the use of a factor maybe) but I thought I got that bug squashed already... Cheers, Nico --------------------------------------------------------------- Nicolas Delhomme Genome Biology Computational Support European Molecular Biology Laboratory Tel: +49 6221 387 8310 Email: nicolas.delhomme at embl.de Meyerhofstrasse 1 - Postfach 10.2209 69102 Heidelberg, Germany --------------------------------------------------------------- On 7 Feb 2012, at 13:37, Lescai, Francesco wrote: > Hi there, > I'm preparing a tutorial, therefore I sliced two RNASeq bam files from chromosome one, in order to reduce the file size. > > However, when I count the reads as it follows I have an error. > thanks, > Francesco > > >> countDataSet <- easyRNASeq(filesDirectory=getwd(), > + organism="HSapiens", > + chr.sizes=as.list(seqlengths(Hsapiens)), > + readLength=100L, > + annotationMethod="biomaRt", > + format="bam", > + count="exons", > + filenames=bamfiles, > + normalize=TRUE, > + outputFormat="DESeq", > + conditions=conditions, > + fitType="local" > + ) > Checking arguments... > Fetching annotations... > Summarizing counts... > Processing SRR349689_1.fastq.novo.new.chr1.bam > Error in FUN(c("GL000229.1", "GL000200.1", "GL000228.1", "GL000241.1", : > (list) object cannot be coerced to type 'integer' > > I also tried selecting the chromosome with chr.sel=c("1") or chr.sel=c("chr1") but in both cases I get the error > Make sure that your file is valid, that your 'chr.sel' (if provided) contains valid values; i.e. values as found in the file, not as returned by 'RNAseq'. > > The alignment has been done using the ENSEMBL reference, i.e. with chromosome listed as "1". > > sessionInfo() > R Under development (unstable) (2012-01-20 r58146) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] C/en_US.UTF-8/C/C/C/C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] GenomicFeatures_1.6.7 AnnotationDbi_1.17.15 easyRNASeq_1.1.5 > [4] ShortRead_1.13.13 latticeExtra_0.6-19 RColorBrewer_1.0-5 > [7] Rsamtools_1.7.27 genomeIntervals_1.11.0 intervals_0.13.3 > [10] DESeq_1.7.6 locfit_1.5-6 lattice_0.20-0 > [13] akima_0.5-7 BiocInstaller_1.3.7 Biobase_2.15.3 > [16] edgeR_2.5.9 limma_3.11.11 biomaRt_2.11.1 > [19] BSgenome.Hsapiens.UCSC.hg19_1.3.17 BSgenome_1.23.2 Biostrings_2.23.6 > [22] GenomicRanges_1.7.16 IRanges_1.13.22 BiocGenerics_0.1.4 > > loaded via a namespace (and not attached): > [1] DBI_0.2-5 RCurl_1.9-5 RSQLite_0.11.1 XML_3.8-0 annotate_1.33.2 > [6] bitops_1.0-4.1 genefilter_1.37.0 geneplotter_1.33.1 grid_2.15.0 hwriter_1.3 > [11] rtracklayer_1.15.7 splines_2.15.0 survival_2.36-10 tools_2.15.0 xtable_1.6-0 > [16] zlibbioc_1.1.1 > > > -------------------------------------------------------------------- ------------- > Francesco Lescai, PhD, EDBT > Senior Research Associate in Genome Analysis > University College London > Faculty of Population Health Sciences > Dept. Genes, Development & Disease > ICH - Molecular Medicine Unit, GOSgene team > 30 Guilford Street > WC1N 1EH London UK > > email: f.lescai at ucl.ac.uk<mailto:f.lescai at="" ucl.ac.uk=""> > phone: +44.(0)207.905.2274 > [ext: 2274] > -------------------------------------------------------------------- ------------ > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 12.8 years ago delhomme@embl.de ★ 1.2k

0

Entering edit mode

Hi Nico, thanks for your quick answer. Here it is the information you ask, it might be quite lengthy. Unfortunately I get the same error in generating the RNAseq object as well. > countDataSet <- easyRNASeq(filesDirectory=getwd(), + organism="HSapiens", + chr.sizes=as.list(seqlengths(Hsapiens)), + readLength=100L, + annotationMethod="biomaRt", + format="bam", + count="exons", + filenames=bamfiles, + normalize=FALSE, + outputFormat="RNAseq", + fitType="local" + ) Checking arguments... Fetching annotations... Summarizing counts... Processing SRR349689_1.fastq.novo.new.chr1.bam Error in FUN(c("GL000229.1", "GL000200.1", "GL000228.1", "GL000241.1", : (list) object cannot be coerced to type 'integer' In addition: There were 50 or more warnings (use warnings() to see the first 50) I also tried using a local annotation file, generated this way hg19.tx <- makeTranscriptDbFromUCSC( genome="hg19", tablename="refGene") exon.range<-exons(hg19.tx) gAnnot <- RangedData(exon.range) colnames(gAnnot)[2]<-"exon" save(gAnnot,file="gAnnot.rda") but then the error is the same. > countDataSet <- easyRNASeq(filesDirectory=getwd(), + organism="HSapiens", + chr.sizes=as.list(seqlengths(Hsapiens)), + readLength=100L, + annotationMethod="rda", + annotationFile="gAnnot.rda", + format="bam", + count="exons", + filenames=bamfiles, + normalize=TRUE, + outputFormat="DESeq", + conditions=conditions, + fitType="local" + ) Checking arguments... Fetching annotations... Summarizing counts... Processing SRR349689_1.fastq.novo.new.chr1.bam Error in FUN(c("GL000229.1", "GL000200.1", "GL000228.1", "GL000241.1", : (list) object cannot be coerced to type 'integer' The .bam file has been generated following Novocraft instructions, from a combined reference of the genome, the transcriptome and the junctions and then recomposed with USeq. (http://www.novocraft.com/wiki/tiki-index.php?page=RNASeq+analysis%3A+ mRNA+and+the+Spliceosome&structure=Novocraft+Technologies&page_ref_id= 35) is there any other information/file that could help in tracking the issue? thanks, Francesco ---bam header--- samtools view -H SRR349689_1.fastq.novo.new.chr1.bam @HD VN:1.0 SO:unsorted @PG ID:SamTranscriptomeParser CL: args -f SRR349689_1.fastq.novo.sam -u -a 100000 -n 100000 -s SRR349689_1.fastq.novo_converted.sam @RG ID:unknownReadGroup SM:unknownSample @SQ SN:GL000229.1 AS:trascriptome.nix LN:119339 @SQ SN:GL000200.1 AS:trascriptome.nix LN:277914 @SQ SN:GL000228.1 AS:trascriptome.nix LN:228349 @SQ SN:GL000241.1 AS:trascriptome.nix LN:136814 @SQ SN:GL000219.1 AS:trascriptome.nix LN:220315 @SQ SN:GL000191.1 AS:trascriptome.nix LN:202431 @SQ SN:GL000242.1 AS:trascriptome.nix LN:142826 @SQ SN:GL000243.1 AS:trascriptome.nix LN:106738 @SQ SN:22 AS:trascriptome.nix LN:51342124 @SQ SN:GL000245.1 AS:trascriptome.nix LN:132396 @SQ SN:MT AS:trascriptome.nix LN:116551 @SQ SN:3 AS:trascriptome.nix LN:198058734 @SQ SN:GL000217.1 AS:trascriptome.nix LN:224716 @SQ SN:2 AS:trascriptome.nix LN:243284564 @SQ SN:1 AS:trascriptome.nix LN:249336112 @SQ SN:7 AS:trascriptome.nix LN:159196479 @SQ SN:6 AS:trascriptome.nix LN:171154887 @SQ SN:GL000215.1 AS:trascriptome.nix LN:232888 @SQ SN:5 AS:trascriptome.nix LN:181002586 @SQ SN:4 AS:trascriptome.nix LN:191137854 @SQ SN:9 AS:trascriptome.nix LN:141252793 @SQ SN:GL000244.1 AS:trascriptome.nix LN:129122 @SQ SN:GL000216.1 AS:trascriptome.nix LN:230528 @SQ SN:8 AS:trascriptome.nix LN:146387700 @SQ SN:GL000218.1 AS:trascriptome.nix LN:197377 @SQ SN:GL000248.1 AS:trascriptome.nix LN:124693 @SQ SN:GL000224.1 AS:trascriptome.nix LN:226253 @SQ SN:GL000210.1 AS:trascriptome.nix LN:123989 @SQ SN:19 AS:trascriptome.nix LN:59216288 @SQ SN:17 AS:trascriptome.nix LN:81293300 @SQ SN:18 AS:trascriptome.nix LN:78015394 @SQ SN:GL000194.1 AS:trascriptome.nix LN:209503 @SQ SN:15 AS:trascriptome.nix LN:102616513 @SQ SN:16 AS:trascriptome.nix LN:90388655 @SQ SN:13 AS:trascriptome.nix LN:115207096 @SQ SN:14 AS:trascriptome.nix LN:107386086 @SQ SN:GL000195.1 AS:trascriptome.nix LN:278899 @SQ SN:11 AS:trascriptome.nix LN:134990552 @SQ SN:12 AS:trascriptome.nix LN:133927113 @SQ SN:GL000225.1 AS:trascriptome.nix LN:273676 @SQ SN:21 AS:trascriptome.nix LN:48217233 @SQ SN:20 AS:trascriptome.nix LN:63026254 @SQ SN:GL000193.1 AS:trascriptome.nix LN:214851 @SQ SN:GL000246.1 AS:trascriptome.nix LN:126826 @SQ SN:GL000237.1 AS:trascriptome.nix LN:117774 @SQ SN:GL000204.1 AS:trascriptome.nix LN:180839 @SQ SN:Y AS:trascriptome.nix LN:59133001 @SQ SN:GL000247.1 AS:trascriptome.nix LN:135428 @SQ SN:X AS:trascriptome.nix LN:155357811 @SQ SN:GL000205.1 AS:trascriptome.nix LN:219192 @SQ SN:GL000192.1 AS:trascriptome.nix LN:644506 @SQ SN:GL000227.1 AS:trascriptome.nix LN:203665 @SQ SN:GL000197.1 AS:trascriptome.nix LN:132711 @SQ SN:GL000236.1 AS:trascriptome.nix LN:126779 @SQ SN:GL000211.1 AS:trascriptome.nix LN:262752 @SQ SN:GL000240.1 AS:trascriptome.nix LN:139915 @SQ SN:GL000239.1 AS:trascriptome.nix LN:123498 @SQ SN:GL000232.1 AS:trascriptome.nix LN:102363 @SQ SN:GL000212.1 AS:trascriptome.nix LN:282254 @SQ SN:GL000238.1 AS:trascriptome.nix LN:132164 @SQ SN:GL000233.1 AS:trascriptome.nix LN:108194 @SQ SN:GL000249.1 AS:trascriptome.nix LN:137299 @SQ SN:GL000223.1 AS:trascriptome.nix LN:280265 @SQ SN:GL000199.1 AS:trascriptome.nix LN:241559 @SQ SN:10 AS:trascriptome.nix LN:135619864 @SQ SN:GL000209.1 AS:trascriptome.nix LN:167344 @SQ SN:GL000196.1 AS:trascriptome.nix LN:132047 @SQ SN:GL000202.1 AS:trascriptome.nix LN:45488 @SQ SN:GL000220.1 AS:trascriptome.nix LN:261756 @SQ SN:GL000214.1 AS:trascriptome.nix LN:224473 @SQ SN:GL000198.1 AS:trascriptome.nix LN:91607 @SQ SN:GL000234.1 AS:trascriptome.nix LN:32013 @SQ SN:GL000213.1 AS:trascriptome.nix LN:237085 @SQ SN:GL000221.1 AS:trascriptome.nix LN:182753 @SQ SN:GL000222.1 AS:trascriptome.nix LN:264765 @SQ SN:GL000206.1 AS:trascriptome.nix LN:138000 ----------lengths----------------- $chr1 [1] 249250621 $chr2 [1] 243199373 $chr3 [1] 198022430 $chr4 [1] 191154276 $chr5 [1] 180915260 $chr6 [1] 171115067 $chr7 [1] 159138663 $chr8 [1] 146364022 $chr9 [1] 141213431 $chr10 [1] 135534747 $chr11 [1] 135006516 $chr12 [1] 133851895 $chr13 [1] 115169878 $chr14 [1] 107349540 $chr15 [1] 102531392 $chr16 [1] 90354753 $chr17 [1] 81195210 $chr18 [1] 78077248 $chr19 [1] 59128983 $chr20 [1] 63025520 $chr21 [1] 48129895 $chr22 [1] 51304566 $chrX [1] 155270560 $chrY [1] 59373566 $chrM [1] 16571 $chr1_gl000191_random [1] 106433 $chr1_gl000192_random [1] 547496 $chr4_ctg9_hap1 [1] 590426 $chr4_gl000193_random [1] 189789 $chr4_gl000194_random [1] 191469 $chr6_apd_hap1 [1] 4622290 $chr6_cox_hap2 [1] 4795371 $chr6_dbb_hap3 [1] 4610396 $chr6_mann_hap4 [1] 4683263 $chr6_mcf_hap5 [1] 4833398 $chr6_qbl_hap6 [1] 4611984 $chr6_ssto_hap7 [1] 4928567 $chr7_gl000195_random [1] 182896 $chr8_gl000196_random [1] 38914 $chr8_gl000197_random [1] 37175 $chr9_gl000198_random [1] 90085 $chr9_gl000199_random [1] 169874 $chr9_gl000200_random [1] 187035 $chr9_gl000201_random [1] 36148 $chr11_gl000202_random [1] 40103 $chr17_ctg5_hap1 [1] 1680828 $chr17_gl000203_random [1] 37498 $chr17_gl000204_random [1] 81310 $chr17_gl000205_random [1] 174588 $chr17_gl000206_random [1] 41001 $chr18_gl000207_random [1] 4262 $chr19_gl000208_random [1] 92689 $chr19_gl000209_random [1] 159169 $chr21_gl000210_random [1] 27682 $chrUn_gl000211 [1] 166566 $chrUn_gl000212 [1] 186858 $chrUn_gl000213 [1] 164239 $chrUn_gl000214 [1] 137718 $chrUn_gl000215 [1] 172545 $chrUn_gl000216 [1] 172294 $chrUn_gl000217 [1] 172149 $chrUn_gl000218 [1] 161147 $chrUn_gl000219 [1] 179198 $chrUn_gl000220 [1] 161802 $chrUn_gl000221 [1] 155397 $chrUn_gl000222 [1] 186861 $chrUn_gl000223 [1] 180455 $chrUn_gl000224 [1] 179693 $chrUn_gl000225 [1] 211173 $chrUn_gl000226 [1] 15008 $chrUn_gl000227 [1] 128374 $chrUn_gl000228 [1] 129120 $chrUn_gl000229 [1] 19913 $chrUn_gl000230 [1] 43691 $chrUn_gl000231 [1] 27386 $chrUn_gl000232 [1] 40652 $chrUn_gl000233 [1] 45941 $chrUn_gl000234 [1] 40531 $chrUn_gl000235 [1] 34474 $chrUn_gl000236 [1] 41934 $chrUn_gl000237 [1] 45867 $chrUn_gl000238 [1] 39939 $chrUn_gl000239 [1] 33824 $chrUn_gl000240 [1] 41933 $chrUn_gl000241 [1] 42152 $chrUn_gl000242 [1] 43523 $chrUn_gl000243 [1] 43341 $chrUn_gl000244 [1] 39929 $chrUn_gl000245 [1] 36651 $chrUn_gl000246 [1] 38154 $chrUn_gl000247 [1] 36422 $chrUn_gl000248 [1] 39786 $chrUn_gl000249 [1] 38502 On 7 Feb 2012, at 13:00, Nicolas Delhomme wrote: Hi Francesco, I'd need more info to help you. Can you post the header of your bam files and the content of as.list(seqlengths(Hsapiens)). In the meanwhile can you run easyRNASeq with normalize set to FALSE, no conditions and the outputFormat set to "RNAseq"? That will give you back the RNAseq object and we can have a look at what is happening in there. I get the feeling that it has to do with the exon names being converted into integers (through the use of a factor maybe) but I thought I got that bug squashed already... Cheers, Nico --------------------------------------------------------------- Nicolas Delhomme Genome Biology Computational Support European Molecular Biology Laboratory Tel: +49 6221 387 8310 Email: nicolas.delhomme@embl.de<mailto:nicolas.delhomme@embl.de> Meyerhofstrasse 1 - Postfach 10.2209 69102 Heidelberg, Germany --------------------------------------------------------------- On 7 Feb 2012, at 13:37, Lescai, Francesco wrote: Hi there, I'm preparing a tutorial, therefore I sliced two RNASeq bam files from chromosome one, in order to reduce the file size. However, when I count the reads as it follows I have an error. thanks, Francesco countDataSet <- easyRNASeq(filesDirectory=getwd(), + organism="HSapiens", + chr.sizes=as.list(seqlengths(Hsapiens)), + readLength=100L, + annotationMethod="biomaRt", + format="bam", + count="exons", + filenames=bamfiles, + normalize=TRUE, + outputFormat="DESeq", + conditions=conditions, + fitType="local" + ) Checking arguments... Fetching annotations... Summarizing counts... Processing SRR349689_1.fastq.novo.new.chr1.bam Error in FUN(c("GL000229.1", "GL000200.1", "GL000228.1", "GL000241.1", : (list) object cannot be coerced to type 'integer' I also tried selecting the chromosome with chr.sel=c("1") or chr.sel=c("chr1") but in both cases I get the error Make sure that your file is valid, that your 'chr.sel' (if provided) contains valid values; i.e. values as found in the file, not as returned by 'RNAseq'. The alignment has been done using the ENSEMBL reference, i.e. with chromosome listed as "1". sessionInfo() R Under development (unstable) (2012-01-20 r58146) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] C/en_US.UTF-8/C/C/C/C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] GenomicFeatures_1.6.7 AnnotationDbi_1.17.15 easyRNASeq_1.1.5 [4] ShortRead_1.13.13 latticeExtra_0.6-19 RColorBrewer_1.0-5 [7] Rsamtools_1.7.27 genomeIntervals_1.11.0 intervals_0.13.3 [10] DESeq_1.7.6 locfit_1.5-6 lattice_0.20-0 [13] akima_0.5-7 BiocInstaller_1.3.7 Biobase_2.15.3 [16] edgeR_2.5.9 limma_3.11.11 biomaRt_2.11.1 [19] BSgenome.Hsapiens.UCSC.hg19_1.3.17 BSgenome_1.23.2 Biostrings_2.23.6 [22] GenomicRanges_1.7.16 IRanges_1.13.22 BiocGenerics_0.1.4 loaded via a namespace (and not attached): [1] DBI_0.2-5 RCurl_1.9-5 RSQLite_0.11.1 XML_3.8-0 annotate_1.33.2 [6] bitops_1.0-4.1 genefilter_1.37.0 geneplotter_1.33.1 grid_2.15.0 hwriter_1.3 [11] rtracklayer_1.15.7 splines_2.15.0 survival_2.36-10 tools_2.15.0 xtable_1.6-0 [16] zlibbioc_1.1.1 ---------------------------------------------------------------------- ----------- Francesco Lescai, PhD, EDBT Senior Research Associate in Genome Analysis University College London Faculty of Population Health Sciences Dept. Genes, Development & Disease ICH - Molecular Medicine Unit, GOSgene team 30 Guilford Street WC1N 1EH London UK email: f.lescai@ucl.ac.uk<mailto:f.lescai@ucl.ac.uk><mailto:f.lescai@u cl.ac.uk=""> phone: +44.(0)207.905.2274 [ext: 2274] ---------------------------------------------------------------------- ---------- [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ---------------------------------------------------------------------- ----------- Francesco Lescai, PhD, EDBT Senior Research Associate in Genome Analysis University College London Faculty of Population Health Sciences Dept. Genes, Development & Disease ICH - Molecular Medicine Unit, GOSgene team 30 Guilford Street WC1N 1EH London UK email: f.lescai@ucl.ac.uk<mailto:f.lescai@ucl.ac.uk> phone: +44.(0)207.905.2274 [ext: 2274] ---------------------------------------------------------------------- ---------- [[alternative HTML version deleted]]

ADD REPLY • link 12.8 years ago Lescai, Francesco ▴ 380

0

Entering edit mode

Hi Francesco, The problem is that the name of the chromosome in the chr.size and annotation objects are not the same as in your bam file . Some warnings must say so, look in the 50+ you got by typing warnings(). Why is it important that the names are identical? 1) between the annotation and the bam file: when assigning the reads to the gene, is it important that they both share the same reference, i.e. that they can be located on the same chromosomes. 2) between the annotation, bam file and the chr.size: imagine the case where you have genes present in your annotation that are located at the end of the chromosomes and for which you get no reads. Without defining the chr.size, the coverage vector generated using the reads would stop before the end of the chromosomes. Trying to access the coverage information for these genes would then simply fail. In your case, chromosomes are called GL000225.1 or 22 in your bam files whereas they are probably called chr1, chr2, etc... in your annotation and chr.sizes objects. In such case, I'm trying to guess the right names and emit warnings. Can you please post the warnings so that I double-check this? Finally, the fitType="local" arguments is one for DESeq, so you can drop it as well here, not that it would have any effect when not using DESeq normalization. Btw, I've adapted easyRNASeq to the new DESeq version which object structure changed in the current development release,but it seems you've got the latest release :-) Cheers, Nico --------------------------------------------------------------- Nicolas Delhomme Genome Biology Computational Support European Molecular Biology Laboratory Tel: +49 6221 387 8310 Email: nicolas.delhomme at embl.de Meyerhofstrasse 1 - Postfach 10.2209 69102 Heidelberg, Germany --------------------------------------------------------------- On 7 Feb 2012, at 15:07, Lescai, Francesco wrote: > Hi Nico, > thanks for your quick answer. > Here it is the information you ask, it might be quite lengthy. > Unfortunately I get the same error in generating the RNAseq object as well. > > > countDataSet <- easyRNASeq(filesDirectory=getwd(), > + organism="HSapiens", > + chr.sizes=as.list(seqlengths(Hsapiens)), > + readLength=100L, > + annotationMethod="biomaRt", > + format="bam", > + count="exons", > + filenames=bamfiles, > + normalize=FALSE, > + outputFormat="RNAseq", > + fitType="local" > + ) > Checking arguments... > Fetching annotations... > Summarizing counts... > Processing SRR349689_1.fastq.novo.new.chr1.bam > Error in FUN(c("GL000229.1", "GL000200.1", "GL000228.1", "GL000241.1", : > (list) object cannot be coerced to type 'integer' > In addition: There were 50 or more warnings (use warnings() to see the first 50) > > I also tried using a local annotation file, generated this way > > hg19.tx <- makeTranscriptDbFromUCSC( > genome="hg19", > tablename="refGene") > exon.range<-exons(hg19.tx) > gAnnot <- RangedData(exon.range) > colnames(gAnnot)[2]<-"exon" > save(gAnnot,file="gAnnot.rda") > > but then the error is the same. > > > countDataSet <- easyRNASeq(filesDirectory=getwd(), > + organism="HSapiens", > + chr.sizes=as.list(seqlengths(Hsapiens)), > + readLength=100L, > + annotationMethod="rda", > + annotationFile="gAnnot.rda", > + format="bam", > + count="exons", > + filenames=bamfiles, > + normalize=TRUE, > + outputFormat="DESeq", > + conditions=conditions, > + fitType="local" > + ) > Checking arguments... > Fetching annotations... > Summarizing counts... > Processing SRR349689_1.fastq.novo.new.chr1.bam > Error in FUN(c("GL000229.1", "GL000200.1", "GL000228.1", "GL000241.1", : > (list) object cannot be coerced to type 'integer' > > > The .bam file has been generated following Novocraft instructions, from a combined reference of the genome, the transcriptome and the junctions and then recomposed with USeq. > (http://www.novocraft.com/wiki/tiki-index.php?page=RNASeq+analysis%3 A+mRNA+and+the+Spliceosome&structure=Novocraft+Technologies&page_ref_i d=35) > > is there any other information/file that could help in tracking the issue? > > thanks, > Francesco > > > > ---bam header--- > samtools view -H SRR349689_1.fastq.novo.new.chr1.bam > @HD VN:1.0 > SO:unsorted > @PG ID:SamTranscriptomeParser > CL: args -f SRR349689_1.fastq.novo.sam -u -a 100000 -n 100000 -s SRR349689_1.fastq.novo_converted.sam > @RG ID:unknownReadGroup > SM:unknownSample > @SQ SN:GL000229.1 > AS:trascriptome.nix LN:119339 > @SQ SN:GL000200.1 > AS:trascriptome.nix LN:277914 > @SQ SN:GL000228.1 > AS:trascriptome.nix LN:228349 > @SQ SN:GL000241.1 > AS:trascriptome.nix LN:136814 > @SQ SN:GL000219.1 > AS:trascriptome.nix LN:220315 > @SQ SN:GL000191.1 > AS:trascriptome.nix LN:202431 > @SQ SN:GL000242.1 > AS:trascriptome.nix LN:142826 > @SQ SN:GL000243.1 > AS:trascriptome.nix LN:106738 > @SQ SN:22 > AS:trascriptome.nix LN:51342124 > @SQ SN:GL000245.1 > AS:trascriptome.nix LN:132396 > @SQ SN:MT > AS:trascriptome.nix LN:116551 > @SQ SN:3 > AS:trascriptome.nix LN:198058734 > @SQ SN:GL000217.1 > AS:trascriptome.nix LN:224716 > @SQ SN:2 > AS:trascriptome.nix LN:243284564 > @SQ SN:1 > AS:trascriptome.nix LN:249336112 > @SQ SN:7 > AS:trascriptome.nix LN:159196479 > @SQ SN:6 > AS:trascriptome.nix LN:171154887 > @SQ SN:GL000215.1 > AS:trascriptome.nix LN:232888 > @SQ SN:5 > AS:trascriptome.nix LN:181002586 > @SQ SN:4 > AS:trascriptome.nix LN:191137854 > @SQ SN:9 > AS:trascriptome.nix LN:141252793 > @SQ SN:GL000244.1 > AS:trascriptome.nix LN:129122 > @SQ SN:GL000216.1 > AS:trascriptome.nix LN:230528 > @SQ SN:8 > AS:trascriptome.nix LN:146387700 > @SQ SN:GL000218.1 > AS:trascriptome.nix LN:197377 > @SQ SN:GL000248.1 > AS:trascriptome.nix LN:124693 > @SQ SN:GL000224.1 > AS:trascriptome.nix LN:226253 > @SQ SN:GL000210.1 > AS:trascriptome.nix LN:123989 > @SQ SN:19 > AS:trascriptome.nix LN:59216288 > @SQ SN:17 > AS:trascriptome.nix LN:81293300 > @SQ SN:18 > AS:trascriptome.nix LN:78015394 > @SQ SN:GL000194.1 > AS:trascriptome.nix LN:209503 > @SQ SN:15 > AS:trascriptome.nix LN:102616513 > @SQ SN:16 > AS:trascriptome.nix LN:90388655 > @SQ SN:13 > AS:trascriptome.nix LN:115207096 > @SQ SN:14 > AS:trascriptome.nix LN:107386086 > @SQ SN:GL000195.1 > AS:trascriptome.nix LN:278899 > @SQ SN:11 > AS:trascriptome.nix LN:134990552 > @SQ SN:12 > AS:trascriptome.nix LN:133927113 > @SQ SN:GL000225.1 > AS:trascriptome.nix LN:273676 > @SQ SN:21 > AS:trascriptome.nix LN:48217233 > @SQ SN:20 > AS:trascriptome.nix LN:63026254 > @SQ SN:GL000193.1 > AS:trascriptome.nix LN:214851 > @SQ SN:GL000246.1 > AS:trascriptome.nix LN:126826 > @SQ SN:GL000237.1 > AS:trascriptome.nix LN:117774 > @SQ SN:GL000204.1 > AS:trascriptome.nix LN:180839 > @SQ SN:Y > AS:trascriptome.nix LN:59133001 > @SQ SN:GL000247.1 > AS:trascriptome.nix LN:135428 > @SQ SN:X > AS:trascriptome.nix LN:155357811 > @SQ SN:GL000205.1 > AS:trascriptome.nix LN:219192 > @SQ SN:GL000192.1 > AS:trascriptome.nix LN:644506 > @SQ SN:GL000227.1 > AS:trascriptome.nix LN:203665 > @SQ SN:GL000197.1 > AS:trascriptome.nix LN:132711 > @SQ SN:GL000236.1 > AS:trascriptome.nix LN:126779 > @SQ SN:GL000211.1 > AS:trascriptome.nix LN:262752 > @SQ SN:GL000240.1 > AS:trascriptome.nix LN:139915 > @SQ SN:GL000239.1 > AS:trascriptome.nix LN:123498 > @SQ SN:GL000232.1 > AS:trascriptome.nix LN:102363 > @SQ SN:GL000212.1 > AS:trascriptome.nix LN:282254 > @SQ SN:GL000238.1 > AS:trascriptome.nix LN:132164 > @SQ SN:GL000233.1 > AS:trascriptome.nix LN:108194 > @SQ SN:GL000249.1 > AS:trascriptome.nix LN:137299 > @SQ SN:GL000223.1 > AS:trascriptome.nix LN:280265 > @SQ SN:GL000199.1 > AS:trascriptome.nix LN:241559 > @SQ SN:10 > AS:trascriptome.nix LN:135619864 > @SQ SN:GL000209.1 > AS:trascriptome.nix LN:167344 > @SQ SN:GL000196.1 > AS:trascriptome.nix LN:132047 > @SQ SN:GL000202.1 > AS:trascriptome.nix LN:45488 > @SQ SN:GL000220.1 > AS:trascriptome.nix LN:261756 > @SQ SN:GL000214.1 > AS:trascriptome.nix LN:224473 > @SQ SN:GL000198.1 > AS:trascriptome.nix LN:91607 > @SQ SN:GL000234.1 > AS:trascriptome.nix LN:32013 > @SQ SN:GL000213.1 > AS:trascriptome.nix LN:237085 > @SQ SN:GL000221.1 > AS:trascriptome.nix LN:182753 > @SQ SN:GL000222.1 > AS:trascriptome.nix LN:264765 > @SQ SN:GL000206.1 > AS:trascriptome.nix LN:138000 > > > ----------lengths----------------- > $chr1 > [1] 249250621 > > $chr2 > [1] 243199373 > > $chr3 > [1] 198022430 > > $chr4 > [1] 191154276 > > $chr5 > [1] 180915260 > > $chr6 > [1] 171115067 > > $chr7 > [1] 159138663 > > $chr8 > [1] 146364022 > > $chr9 > [1] 141213431 > > $chr10 > [1] 135534747 > > $chr11 > [1] 135006516 > > $chr12 > [1] 133851895 > > $chr13 > [1] 115169878 > > $chr14 > [1] 107349540 > > $chr15 > [1] 102531392 > > $chr16 > [1] 90354753 > > $chr17 > [1] 81195210 > > $chr18 > [1] 78077248 > > $chr19 > [1] 59128983 > > $chr20 > [1] 63025520 > > $chr21 > [1] 48129895 > > $chr22 > [1] 51304566 > > $chrX > [1] 155270560 > > $chrY > [1] 59373566 > > $chrM > [1] 16571 > > $chr1_gl000191_random > [1] 106433 > > $chr1_gl000192_random > [1] 547496 > > $chr4_ctg9_hap1 > [1] 590426 > > $chr4_gl000193_random > [1] 189789 > > $chr4_gl000194_random > [1] 191469 > > $chr6_apd_hap1 > [1] 4622290 > > $chr6_cox_hap2 > [1] 4795371 > > $chr6_dbb_hap3 > [1] 4610396 > > $chr6_mann_hap4 > [1] 4683263 > > $chr6_mcf_hap5 > [1] 4833398 > > $chr6_qbl_hap6 > [1] 4611984 > > $chr6_ssto_hap7 > [1] 4928567 > > $chr7_gl000195_random > [1] 182896 > > $chr8_gl000196_random > [1] 38914 > > $chr8_gl000197_random > [1] 37175 > > $chr9_gl000198_random > [1] 90085 > > $chr9_gl000199_random > [1] 169874 > > $chr9_gl000200_random > [1] 187035 > > $chr9_gl000201_random > [1] 36148 > > $chr11_gl000202_random > [1] 40103 > > $chr17_ctg5_hap1 > [1] 1680828 > > $chr17_gl000203_random > [1] 37498 > > $chr17_gl000204_random > [1] 81310 > > $chr17_gl000205_random > [1] 174588 > > $chr17_gl000206_random > [1] 41001 > > $chr18_gl000207_random > [1] 4262 > > $chr19_gl000208_random > [1] 92689 > > $chr19_gl000209_random > [1] 159169 > > $chr21_gl000210_random > [1] 27682 > > $chrUn_gl000211 > [1] 166566 > > $chrUn_gl000212 > [1] 186858 > > $chrUn_gl000213 > [1] 164239 > > $chrUn_gl000214 > [1] 137718 > > $chrUn_gl000215 > [1] 172545 > > $chrUn_gl000216 > [1] 172294 > > $chrUn_gl000217 > [1] 172149 > > $chrUn_gl000218 > [1] 161147 > > $chrUn_gl000219 > [1] 179198 > > $chrUn_gl000220 > [1] 161802 > > $chrUn_gl000221 > [1] 155397 > > $chrUn_gl000222 > [1] 186861 > > $chrUn_gl000223 > [1] 180455 > > $chrUn_gl000224 > [1] 179693 > > $chrUn_gl000225 > [1] 211173 > > $chrUn_gl000226 > [1] 15008 > > $chrUn_gl000227 > [1] 128374 > > $chrUn_gl000228 > [1] 129120 > > $chrUn_gl000229 > [1] 19913 > > $chrUn_gl000230 > [1] 43691 > > $chrUn_gl000231 > [1] 27386 > > $chrUn_gl000232 > [1] 40652 > > $chrUn_gl000233 > [1] 45941 > > $chrUn_gl000234 > [1] 40531 > > $chrUn_gl000235 > [1] 34474 > > $chrUn_gl000236 > [1] 41934 > > $chrUn_gl000237 > [1] 45867 > > $chrUn_gl000238 > [1] 39939 > > $chrUn_gl000239 > [1] 33824 > > $chrUn_gl000240 > [1] 41933 > > $chrUn_gl000241 > [1] 42152 > > $chrUn_gl000242 > [1] 43523 > > $chrUn_gl000243 > [1] 43341 > > $chrUn_gl000244 > [1] 39929 > > $chrUn_gl000245 > [1] 36651 > > $chrUn_gl000246 > [1] 38154 > > $chrUn_gl000247 > [1] 36422 > > $chrUn_gl000248 > [1] 39786 > > $chrUn_gl000249 > [1] 38502 > > > > On 7 Feb 2012, at 13:00, Nicolas Delhomme wrote: > >> Hi Francesco, >> >> I'd need more info to help you. >> >> Can you post the header of your bam files and the content of as.list(seqlengths(Hsapiens)). >> >> In the meanwhile can you run easyRNASeq with normalize set to FALSE, no conditions and the outputFormat set to "RNAseq"? >> >> That will give you back the RNAseq object and we can have a look at what is happening in there. >> >> I get the feeling that it has to do with the exon names being converted into integers (through the use of a factor maybe) but I thought I got that bug squashed already... >> >> Cheers, >> >> Nico >> >> --------------------------------------------------------------- >> Nicolas Delhomme >> >> Genome Biology Computational Support >> >> European Molecular Biology Laboratory >> >> Tel: +49 6221 387 8310 >> Email: nicolas.delhomme at embl.de >> Meyerhofstrasse 1 - Postfach 10.2209 >> 69102 Heidelberg, Germany >> --------------------------------------------------------------- >> >> >> >> >> >> On 7 Feb 2012, at 13:37, Lescai, Francesco wrote: >> >>> Hi there, >>> I'm preparing a tutorial, therefore I sliced two RNASeq bam files from chromosome one, in order to reduce the file size. >>> >>> However, when I count the reads as it follows I have an error. >>> thanks, >>> Francesco >>> >>> >>>> countDataSet <- easyRNASeq(filesDirectory=getwd(), >>> + organism="HSapiens", >>> + chr.sizes=as.list(seqlengths(Hsapiens)), >>> + readLength=100L, >>> + annotationMethod="biomaRt", >>> + format="bam", >>> + count="exons", >>> + filenames=bamfiles, >>> + normalize=TRUE, >>> + outputFormat="DESeq", >>> + conditions=conditions, >>> + fitType="local" >>> + ) >>> Checking arguments... >>> Fetching annotations... >>> Summarizing counts... >>> Processing SRR349689_1.fastq.novo.new.chr1.bam >>> Error in FUN(c("GL000229.1", "GL000200.1", "GL000228.1", "GL000241.1", : >>> (list) object cannot be coerced to type 'integer' >>> >>> I also tried selecting the chromosome with chr.sel=c("1") or chr.sel=c("chr1") but in both cases I get the error >>> Make sure that your file is valid, that your 'chr.sel' (if provided) contains valid values; i.e. values as found in the file, not as returned by 'RNAseq'. >>> >>> The alignment has been done using the ENSEMBL reference, i.e. with chromosome listed as "1". >>> >>> sessionInfo() >>> R Under development (unstable) (2012-01-20 r58146) >>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >>> >>> locale: >>> [1] C/en_US.UTF-8/C/C/C/C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] GenomicFeatures_1.6.7 AnnotationDbi_1.17.15 easyRNASeq_1.1.5 >>> [4] ShortRead_1.13.13 latticeExtra_0.6-19 RColorBrewer_1.0-5 >>> [7] Rsamtools_1.7.27 genomeIntervals_1.11.0 intervals_0.13.3 >>> [10] DESeq_1.7.6 locfit_1.5-6 lattice_0.20-0 >>> [13] akima_0.5-7 BiocInstaller_1.3.7 Biobase_2.15.3 >>> [16] edgeR_2.5.9 limma_3.11.11 biomaRt_2.11.1 >>> [19] BSgenome.Hsapiens.UCSC.hg19_1.3.17 BSgenome_1.23.2 Biostrings_2.23.6 >>> [22] GenomicRanges_1.7.16 IRanges_1.13.22 BiocGenerics_0.1.4 >>> >>> loaded via a namespace (and not attached): >>> [1] DBI_0.2-5 RCurl_1.9-5 RSQLite_0.11.1 XML_3.8-0 annotate_1.33.2 >>> [6] bitops_1.0-4.1 genefilter_1.37.0 geneplotter_1.33.1 grid_2.15.0 hwriter_1.3 >>> [11] rtracklayer_1.15.7 splines_2.15.0 survival_2.36-10 tools_2.15.0 xtable_1.6-0 >>> [16] zlibbioc_1.1.1 >>> >>> >>> ------------------------------------------------------------------ --------------- >>> Francesco Lescai, PhD, EDBT >>> Senior Research Associate in Genome Analysis >>> University College London >>> Faculty of Population Health Sciences >>> Dept. Genes, Development & Disease >>> ICH - Molecular Medicine Unit, GOSgene team >>> 30 Guilford Street >>> WC1N 1EH London UK >>> >>> email: f.lescai at ucl.ac.uk<mailto:f.lescai at="" ucl.ac.uk=""> >>> phone: +44.(0)207.905.2274 >>> [ext: 2274] >>> ------------------------------------------------------------------ -------------- >>> >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > > -------------------------------------------------------------------- ------------- > Francesco Lescai, PhD, EDBT > Senior Research Associate in Genome Analysis > University College London > Faculty of Population Health Sciences > Dept. Genes, Development & Disease > ICH - Molecular Medicine Unit, GOSgene team > 30 Guilford Street > WC1N 1EH London UK > > email: f.lescai at ucl.ac.uk > phone: +44.(0)207.905.2274 > [ext: 2274] > -------------------------------------------------------------------- ------------ >

ADD REPLY • link 12.8 years ago delhomme@embl.de ★ 1.2k

0

Entering edit mode

Uhm, that was my first guess, having an alignment with the ENSEMBL reference with a process using UCSC formats. All the warnings look like the same Error in FUN(c("GL000229.1", "GL000200.1", "GL000228.1", "GL000241.1", : (list) object cannot be coerced to type 'integer' In addition: There were 50 or more warnings (use warnings() to see the first 50) > warnings() Warning messages: 1: 'matchMatrix' is deprecated. Use 'as.matrix' instead. See help("Deprecated") 2: 'matchMatrix' is deprecated. Use 'as.matrix' instead. See help("Deprecated") 3: 'matchMatrix' is deprecated. Use 'as.matrix' instead. Now, the point is: I should be able to easily adjust this somewhere, should I? Wouldn't be economical running another alignment just to add a "chr". I thought using chr.map could serve to this purpose but I was wrong, I bet I didn't understand its function. therefore, I tried to fix the chromosome names in the annotation this way exon.range<-exons(hg19.tx) gAnnot <- RangedData(exon.range) names(gAnnot)=gsub("^chr","",names(gAnnot),perl=T) colnames(gAnnot)[2]<-"exon" save(gAnnot,file="gAnnot.rda") chr.sizes=as.list(seqlengths(Hsapiens)) names(chr.sizes)=gsub("^chr","",names(chr.sizes)) but then I still have an error > countDataSet <- easyRNASeq(filesDirectory=getwd(), + organism="HSapiens", + chr.sizes=chr.sizes, + chr.map=chr.map, + readLength=100L, + annotationMethod="rda", + annotationFile="gAnnot.rda", + format="bam", + count="exons", + filenames=bamfiles, + normalize=TRUE, + outputFormat="DESeq", + conditions=conditions, + fitType="local" + ) Checking arguments... Fetching annotations... Summarizing counts... Processing SRR349689_1.fastq.novo.new.chr1.bam Error in .local(x, shift, width, weight, ...) : 'x' and 'width' have mismatching names In addition: There were 50 or more warnings (use warnings() to see the first 50) > warnings() Warning messages: 1: In easyRNASeq(filesDirectory = getwd(), organism = "HSapiens", ... : You enforce UCSC chromosome conventions, however the provided chromosome size list is not compliant. Correcting it. 2: In .convertToUCSC(names(chrSize(obj)), organismName(obj), ... : No function implemented to convert the names for the organism: HSapiens . If you need to provide your own mapping, use the 'custom' argument. 3: 'matchMatrix' is deprecated. Use 'as.matrix' instead. See help("Deprecated") 4: 'matchMatrix' is deprecated. Use 'as.matrix' instead. See help("Deprecated") 5: 'matchMatrix' is deprecated. which is the same error I get if I output the format RNAseq. what do you suggest? thanks, Francesco On 7 Feb 2012, at 14:46, Nicolas Delhomme wrote: Hi Francesco, The problem is that the name of the chromosome in the chr.size and annotation objects are not the same as in your bam file . Some warnings must say so, look in the 50+ you got by typing warnings(). Why is it important that the names are identical? 1) between the annotation and the bam file: when assigning the reads to the gene, is it important that they both share the same reference, i.e. that they can be located on the same chromosomes. 2) between the annotation, bam file and the chr.size: imagine the case where you have genes present in your annotation that are located at the end of the chromosomes and for which you get no reads. Without defining the chr.size, the coverage vector generated using the reads would stop before the end of the chromosomes. Trying to access the coverage information for these genes would then simply fail. In your case, chromosomes are called GL000225.1 or 22 in your bam files whereas they are probably called chr1, chr2, etc... in your annotation and chr.sizes objects. In such case, I'm trying to guess the right names and emit warnings. Can you please post the warnings so that I double-check this? Finally, the fitType="local" arguments is one for DESeq, so you can drop it as well here, not that it would have any effect when not using DESeq normalization. Btw, I've adapted easyRNASeq to the new DESeq version which object structure changed in the current development release,but it seems you've got the latest release :-) Cheers, Nico --------------------------------------------------------------- Nicolas Delhomme Genome Biology Computational Support European Molecular Biology Laboratory Tel: +49 6221 387 8310 Email: nicolas.delhomme@embl.de<mailto:nicolas.delhomme@embl.de> Meyerhofstrasse 1 - Postfach 10.2209 69102 Heidelberg, Germany --------------------------------------------------------------- On 7 Feb 2012, at 15:07, Lescai, Francesco wrote: Hi Nico, thanks for your quick answer. Here it is the information you ask, it might be quite lengthy. Unfortunately I get the same error in generating the RNAseq object as well. countDataSet <- easyRNASeq(filesDirectory=getwd(), + organism="HSapiens", + chr.sizes=as.list(seqlengths(Hsapiens)), + readLength=100L, + annotationMethod="biomaRt", + format="bam", + count="exons", + filenames=bamfiles, + normalize=FALSE, + outputFormat="RNAseq", + fitType="local" + ) Checking arguments... Fetching annotations... Summarizing counts... Processing SRR349689_1.fastq.novo.new.chr1.bam Error in FUN(c("GL000229.1", "GL000200.1", "GL000228.1", "GL000241.1", : (list) object cannot be coerced to type 'integer' In addition: There were 50 or more warnings (use warnings() to see the first 50) I also tried using a local annotation file, generated this way hg19.tx <- makeTranscriptDbFromUCSC( genome="hg19", tablename="refGene") exon.range<-exons(hg19.tx) gAnnot <- RangedData(exon.range) colnames(gAnnot)[2]<-"exon" save(gAnnot,file="gAnnot.rda") but then the error is the same. countDataSet <- easyRNASeq(filesDirectory=getwd(), + organism="HSapiens", + chr.sizes=as.list(seqlengths(Hsapiens)), + readLength=100L, + annotationMethod="rda", + annotationFile="gAnnot.rda", + format="bam", + count="exons", + filenames=bamfiles, + normalize=TRUE, + outputFormat="DESeq", + conditions=conditions, + fitType="local" + ) Checking arguments... Fetching annotations... Summarizing counts... Processing SRR349689_1.fastq.novo.new.chr1.bam Error in FUN(c("GL000229.1", "GL000200.1", "GL000228.1", "GL000241.1", : (list) object cannot be coerced to type 'integer' The .bam file has been generated following Novocraft instructions, from a combined reference of the genome, the transcriptome and the junctions and then recomposed with USeq. (http://www.novocraft.com/wiki/tiki-index.php?page=RNASeq+analysis%3A+ mRNA+and+the+Spliceosome&structure=Novocraft+Technologies&page_ref_id= 35) is there any other information/file that could help in tracking the issue? thanks, Francesco ---bam header--- samtools view -H SRR349689_1.fastq.novo.new.chr1.bam @HD VN:1.0 SO:unsorted @PG ID:SamTranscriptomeParser CL: args -f SRR349689_1.fastq.novo.sam -u -a 100000 -n 100000 -s SRR349689_1.fastq.novo_converted.sam @RG ID:unknownReadGroup SM:unknownSample @SQ SN:GL000229.1 AS:trascriptome.nix LN:119339 @SQ SN:GL000200.1 AS:trascriptome.nix LN:277914 @SQ SN:GL000228.1 AS:trascriptome.nix LN:228349 @SQ SN:GL000241.1 AS:trascriptome.nix LN:136814 @SQ SN:GL000219.1 AS:trascriptome.nix LN:220315 @SQ SN:GL000191.1 AS:trascriptome.nix LN:202431 @SQ SN:GL000242.1 AS:trascriptome.nix LN:142826 @SQ SN:GL000243.1 AS:trascriptome.nix LN:106738 @SQ SN:22 AS:trascriptome.nix LN:51342124 @SQ SN:GL000245.1 AS:trascriptome.nix LN:132396 @SQ SN:MT AS:trascriptome.nix LN:116551 @SQ SN:3 AS:trascriptome.nix LN:198058734 @SQ SN:GL000217.1 AS:trascriptome.nix LN:224716 @SQ SN:2 AS:trascriptome.nix LN:243284564 @SQ SN:1 AS:trascriptome.nix LN:249336112 @SQ SN:7 AS:trascriptome.nix LN:159196479 @SQ SN:6 AS:trascriptome.nix LN:171154887 @SQ SN:GL000215.1 AS:trascriptome.nix LN:232888 @SQ SN:5 AS:trascriptome.nix LN:181002586 @SQ SN:4 AS:trascriptome.nix LN:191137854 @SQ SN:9 AS:trascriptome.nix LN:141252793 @SQ SN:GL000244.1 AS:trascriptome.nix LN:129122 @SQ SN:GL000216.1 AS:trascriptome.nix LN:230528 @SQ SN:8 AS:trascriptome.nix LN:146387700 @SQ SN:GL000218.1 AS:trascriptome.nix LN:197377 @SQ SN:GL000248.1 AS:trascriptome.nix LN:124693 @SQ SN:GL000224.1 AS:trascriptome.nix LN:226253 @SQ SN:GL000210.1 AS:trascriptome.nix LN:123989 @SQ SN:19 AS:trascriptome.nix LN:59216288 @SQ SN:17 AS:trascriptome.nix LN:81293300 @SQ SN:18 AS:trascriptome.nix LN:78015394 @SQ SN:GL000194.1 AS:trascriptome.nix LN:209503 @SQ SN:15 AS:trascriptome.nix LN:102616513 @SQ SN:16 AS:trascriptome.nix LN:90388655 @SQ SN:13 AS:trascriptome.nix LN:115207096 @SQ SN:14 AS:trascriptome.nix LN:107386086 @SQ SN:GL000195.1 AS:trascriptome.nix LN:278899 @SQ SN:11 AS:trascriptome.nix LN:134990552 @SQ SN:12 AS:trascriptome.nix LN:133927113 @SQ SN:GL000225.1 AS:trascriptome.nix LN:273676 @SQ SN:21 AS:trascriptome.nix LN:48217233 @SQ SN:20 AS:trascriptome.nix LN:63026254 @SQ SN:GL000193.1 AS:trascriptome.nix LN:214851 @SQ SN:GL000246.1 AS:trascriptome.nix LN:126826 @SQ SN:GL000237.1 AS:trascriptome.nix LN:117774 @SQ SN:GL000204.1 AS:trascriptome.nix LN:180839 @SQ SN:Y AS:trascriptome.nix LN:59133001 @SQ SN:GL000247.1 AS:trascriptome.nix LN:135428 @SQ SN:X AS:trascriptome.nix LN:155357811 @SQ SN:GL000205.1 AS:trascriptome.nix LN:219192 @SQ SN:GL000192.1 AS:trascriptome.nix LN:644506 @SQ SN:GL000227.1 AS:trascriptome.nix LN:203665 @SQ SN:GL000197.1 AS:trascriptome.nix LN:132711 @SQ SN:GL000236.1 AS:trascriptome.nix LN:126779 @SQ SN:GL000211.1 AS:trascriptome.nix LN:262752 @SQ SN:GL000240.1 AS:trascriptome.nix LN:139915 @SQ SN:GL000239.1 AS:trascriptome.nix LN:123498 @SQ SN:GL000232.1 AS:trascriptome.nix LN:102363 @SQ SN:GL000212.1 AS:trascriptome.nix LN:282254 @SQ SN:GL000238.1 AS:trascriptome.nix LN:132164 @SQ SN:GL000233.1 AS:trascriptome.nix LN:108194 @SQ SN:GL000249.1 AS:trascriptome.nix LN:137299 @SQ SN:GL000223.1 AS:trascriptome.nix LN:280265 @SQ SN:GL000199.1 AS:trascriptome.nix LN:241559 @SQ SN:10 AS:trascriptome.nix LN:135619864 @SQ SN:GL000209.1 AS:trascriptome.nix LN:167344 @SQ SN:GL000196.1 AS:trascriptome.nix LN:132047 @SQ SN:GL000202.1 AS:trascriptome.nix LN:45488 @SQ SN:GL000220.1 AS:trascriptome.nix LN:261756 @SQ SN:GL000214.1 AS:trascriptome.nix LN:224473 @SQ SN:GL000198.1 AS:trascriptome.nix LN:91607 @SQ SN:GL000234.1 AS:trascriptome.nix LN:32013 @SQ SN:GL000213.1 AS:trascriptome.nix LN:237085 @SQ SN:GL000221.1 AS:trascriptome.nix LN:182753 @SQ SN:GL000222.1 AS:trascriptome.nix LN:264765 @SQ SN:GL000206.1 AS:trascriptome.nix LN:138000 ----------lengths----------------- $chr1 [1] 249250621 $chr2 [1] 243199373 $chr3 [1] 198022430 $chr4 [1] 191154276 $chr5 [1] 180915260 $chr6 [1] 171115067 $chr7 [1] 159138663 $chr8 [1] 146364022 $chr9 [1] 141213431 $chr10 [1] 135534747 $chr11 [1] 135006516 $chr12 [1] 133851895 $chr13 [1] 115169878 $chr14 [1] 107349540 $chr15 [1] 102531392 $chr16 [1] 90354753 $chr17 [1] 81195210 $chr18 [1] 78077248 $chr19 [1] 59128983 $chr20 [1] 63025520 $chr21 [1] 48129895 $chr22 [1] 51304566 $chrX [1] 155270560 $chrY [1] 59373566 $chrM [1] 16571 $chr1_gl000191_random [1] 106433 $chr1_gl000192_random [1] 547496 $chr4_ctg9_hap1 [1] 590426 $chr4_gl000193_random [1] 189789 $chr4_gl000194_random [1] 191469 $chr6_apd_hap1 [1] 4622290 $chr6_cox_hap2 [1] 4795371 $chr6_dbb_hap3 [1] 4610396 $chr6_mann_hap4 [1] 4683263 $chr6_mcf_hap5 [1] 4833398 $chr6_qbl_hap6 [1] 4611984 $chr6_ssto_hap7 [1] 4928567 $chr7_gl000195_random [1] 182896 $chr8_gl000196_random [1] 38914 $chr8_gl000197_random [1] 37175 $chr9_gl000198_random [1] 90085 $chr9_gl000199_random [1] 169874 $chr9_gl000200_random [1] 187035 $chr9_gl000201_random [1] 36148 $chr11_gl000202_random [1] 40103 $chr17_ctg5_hap1 [1] 1680828 $chr17_gl000203_random [1] 37498 $chr17_gl000204_random [1] 81310 $chr17_gl000205_random [1] 174588 $chr17_gl000206_random [1] 41001 $chr18_gl000207_random [1] 4262 $chr19_gl000208_random [1] 92689 $chr19_gl000209_random [1] 159169 $chr21_gl000210_random [1] 27682 $chrUn_gl000211 [1] 166566 $chrUn_gl000212 [1] 186858 $chrUn_gl000213 [1] 164239 $chrUn_gl000214 [1] 137718 $chrUn_gl000215 [1] 172545 $chrUn_gl000216 [1] 172294 $chrUn_gl000217 [1] 172149 $chrUn_gl000218 [1] 161147 $chrUn_gl000219 [1] 179198 $chrUn_gl000220 [1] 161802 $chrUn_gl000221 [1] 155397 $chrUn_gl000222 [1] 186861 $chrUn_gl000223 [1] 180455 $chrUn_gl000224 [1] 179693 $chrUn_gl000225 [1] 211173 $chrUn_gl000226 [1] 15008 $chrUn_gl000227 [1] 128374 $chrUn_gl000228 [1] 129120 $chrUn_gl000229 [1] 19913 $chrUn_gl000230 [1] 43691 $chrUn_gl000231 [1] 27386 $chrUn_gl000232 [1] 40652 $chrUn_gl000233 [1] 45941 $chrUn_gl000234 [1] 40531 $chrUn_gl000235 [1] 34474 $chrUn_gl000236 [1] 41934 $chrUn_gl000237 [1] 45867 $chrUn_gl000238 [1] 39939 $chrUn_gl000239 [1] 33824 $chrUn_gl000240 [1] 41933 $chrUn_gl000241 [1] 42152 $chrUn_gl000242 [1] 43523 $chrUn_gl000243 [1] 43341 $chrUn_gl000244 [1] 39929 $chrUn_gl000245 [1] 36651 $chrUn_gl000246 [1] 38154 $chrUn_gl000247 [1] 36422 $chrUn_gl000248 [1] 39786 $chrUn_gl000249 [1] 38502 On 7 Feb 2012, at 13:00, Nicolas Delhomme wrote: Hi Francesco, I'd need more info to help you. Can you post the header of your bam files and the content of as.list(seqlengths(Hsapiens)). In the meanwhile can you run easyRNASeq with normalize set to FALSE, no conditions and the outputFormat set to "RNAseq"? That will give you back the RNAseq object and we can have a look at what is happening in there. I get the feeling that it has to do with the exon names being converted into integers (through the use of a factor maybe) but I thought I got that bug squashed already... Cheers, Nico --------------------------------------------------------------- Nicolas Delhomme Genome Biology Computational Support European Molecular Biology Laboratory Tel: +49 6221 387 8310 Email: nicolas.delhomme@embl.de<mailto:nicolas.delhomme@embl.de> Meyerhofstrasse 1 - Postfach 10.2209 69102 Heidelberg, Germany --------------------------------------------------------------- On 7 Feb 2012, at 13:37, Lescai, Francesco wrote: Hi there, I'm preparing a tutorial, therefore I sliced two RNASeq bam files from chromosome one, in order to reduce the file size. However, when I count the reads as it follows I have an error. thanks, Francesco countDataSet <- easyRNASeq(filesDirectory=getwd(), + organism="HSapiens", + chr.sizes=as.list(seqlengths(Hsapiens)), + readLength=100L, + annotationMethod="biomaRt", + format="bam", + count="exons", + filenames=bamfiles, + normalize=TRUE, + outputFormat="DESeq", + conditions=conditions, + fitType="local" + ) Checking arguments... Fetching annotations... Summarizing counts... Processing SRR349689_1.fastq.novo.new.chr1.bam Error in FUN(c("GL000229.1", "GL000200.1", "GL000228.1", "GL000241.1", : (list) object cannot be coerced to type 'integer' I also tried selecting the chromosome with chr.sel=c("1") or chr.sel=c("chr1") but in both cases I get the error Make sure that your file is valid, that your 'chr.sel' (if provided) contains valid values; i.e. values as found in the file, not as returned by 'RNAseq'. The alignment has been done using the ENSEMBL reference, i.e. with chromosome listed as "1". sessionInfo() R Under development (unstable) (2012-01-20 r58146) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] C/en_US.UTF-8/C/C/C/C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] GenomicFeatures_1.6.7 AnnotationDbi_1.17.15 easyRNASeq_1.1.5 [4] ShortRead_1.13.13 latticeExtra_0.6-19 RColorBrewer_1.0-5 [7] Rsamtools_1.7.27 genomeIntervals_1.11.0 intervals_0.13.3 [10] DESeq_1.7.6 locfit_1.5-6 lattice_0.20-0 [13] akima_0.5-7 BiocInstaller_1.3.7 Biobase_2.15.3 [16] edgeR_2.5.9 limma_3.11.11 biomaRt_2.11.1 [19] BSgenome.Hsapiens.UCSC.hg19_1.3.17 BSgenome_1.23.2 Biostrings_2.23.6 [22] GenomicRanges_1.7.16 IRanges_1.13.22 BiocGenerics_0.1.4 loaded via a namespace (and not attached): [1] DBI_0.2-5 RCurl_1.9-5 RSQLite_0.11.1 XML_3.8-0 annotate_1.33.2 [6] bitops_1.0-4.1 genefilter_1.37.0 geneplotter_1.33.1 grid_2.15.0 hwriter_1.3 [11] rtracklayer_1.15.7 splines_2.15.0 survival_2.36-10 tools_2.15.0 xtable_1.6-0 [16] zlibbioc_1.1.1 ---------------------------------------------------------------------- ----------- Francesco Lescai, PhD, EDBT Senior Research Associate in Genome Analysis University College London Faculty of Population Health Sciences Dept. Genes, Development & Disease ICH - Molecular Medicine Unit, GOSgene team 30 Guilford Street WC1N 1EH London UK email: f.lescai@ucl.ac.uk<mailto:f.lescai@ucl.ac.uk><mailto:f.lescai@u cl.ac.uk=""> phone: +44.(0)207.905.2274 [ext: 2274] ---------------------------------------------------------------------- ---------- [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ---------------------------------------------------------------------- ----------- Francesco Lescai, PhD, EDBT Senior Research Associate in Genome Analysis University College London Faculty of Population Health Sciences Dept. Genes, Development & Disease ICH - Molecular Medicine Unit, GOSgene team 30 Guilford Street WC1N 1EH London UK email: f.lescai@ucl.ac.uk<mailto:f.lescai@ucl.ac.uk> phone: +44.(0)207.905.2274 [ext: 2274] ---------------------------------------------------------------------- ---------- ---------------------------------------------------------------------- ----------- Francesco Lescai, PhD, EDBT Senior Research Associate in Genome Analysis University College London Faculty of Population Health Sciences Dept. Genes, Development & Disease ICH - Molecular Medicine Unit, GOSgene team 30 Guilford Street WC1N 1EH London UK email: f.lescai@ucl.ac.uk<mailto:f.lescai@ucl.ac.uk> phone: +44.(0)207.905.2274 [ext: 2274] ---------------------------------------------------------------------- ---------- [[alternative HTML version deleted]]

ADD REPLY • link 12.8 years ago Lescai, Francesco ▴ 380

0

Entering edit mode

Hi Francesco, > Uhm, that was my first guess, having an alignment with the ENSEMBL reference with a process using UCSC formats. Yes, but that should still have worked fine. I'm quite sure that I'd guess the names right. > > All the warnings look like the same > Error in FUN(c("GL000229.1", "GL000200.1", "GL000228.1", "GL000241.1", : > (list) object cannot be coerced to type 'integer' > In addition: There were 50 or more warnings (use warnings() to see the first 50) > > warnings() > Warning messages: > 1: 'matchMatrix' is deprecated. > Use 'as.matrix' instead. > See help("Deprecated") > 2: 'matchMatrix' is deprecated. > Use 'as.matrix' instead. > See help("Deprecated") > 3: 'matchMatrix' is deprecated. > Use 'as.matrix' instead. > > Now, the point is: I should be able to easily adjust this somewhere, should I? > Wouldn't be economical running another alignment just to add a "chr". > > I thought using chr.map could serve to this purpose but I was wrong, I bet I didn't understand its function. > No, I'm sure you got it correctly. From what you describe that's exactly its purpose. Anyway, thanks to the warning, I just realized that Herv? (Pages, from Bioc) change some function signature. I.e. the findOverlaps does not return a matchMatrix anymore and this is what makes easyRNASeq fail... The easiest for me would probably be to use your data excerpt to fix that problem. Do you see any way I could download them? We can set-up that off-list of course. Cheers, Nico > therefore, I tried to fix the chromosome names in the annotation this way > > exon.range<-exons(hg19.tx) > gAnnot <- RangedData(exon.range) > names(gAnnot)=gsub("^chr","",names(gAnnot),perl=T) > colnames(gAnnot)[2]<-"exon" > save(gAnnot,file="gAnnot.rda") > > chr.sizes=as.list(seqlengths(Hsapiens)) > names(chr.sizes)=gsub("^chr","",names(chr.sizes)) > > but then I still have an error > > > countDataSet <- easyRNASeq(filesDirectory=getwd(), > + organism="HSapiens", > + chr.sizes=chr.sizes, > + chr.map=chr.map, > + readLength=100L, > + annotationMethod="rda", > + annotationFile="gAnnot.rda", > + format="bam", > + count="exons", > + filenames=bamfiles, > + normalize=TRUE, > + outputFormat="DESeq", > + conditions=conditions, > + fitType="local" > + ) > Checking arguments... > Fetching annotations... > Summarizing counts... > Processing SRR349689_1.fastq.novo.new.chr1.bam > Error in .local(x, shift, width, weight, ...) : > 'x' and 'width' have mismatching names > In addition: There were 50 or more warnings (use warnings() to see the first 50) > > warnings() > Warning messages: > 1: In easyRNASeq(filesDirectory = getwd(), organism = "HSapiens", ... : > You enforce UCSC chromosome conventions, however the provided chromosome size list is not compliant. Correcting it. > 2: In .convertToUCSC(names(chrSize(obj)), organismName(obj), ... : > No function implemented to convert the names for the organism: HSapiens . > If you need to provide your own mapping, use the 'custom' argument. > 3: 'matchMatrix' is deprecated. > Use 'as.matrix' instead. > See help("Deprecated") > 4: 'matchMatrix' is deprecated. > Use 'as.matrix' instead. > See help("Deprecated") > 5: 'matchMatrix' is deprecated. > > which is the same error I get if I output the format RNAseq. > > what do you suggest? > > thanks, > Francesco > > > > On 7 Feb 2012, at 14:46, Nicolas Delhomme wrote: > >> Hi Francesco, >> >> The problem is that the name of the chromosome in the chr.size and annotation objects are not the same as in your bam file . Some warnings must say so, look in the 50+ you got by typing warnings(). >> >> Why is it important that the names are identical? >> >> 1) between the annotation and the bam file: when assigning the reads to the gene, is it important that they both share the same reference, i.e. that they can be located on the same chromosomes. >> >> 2) between the annotation, bam file and the chr.size: imagine the case where you have genes present in your annotation that are located at the end of the chromosomes and for which you get no reads. Without defining the chr.size, the coverage vector generated using the reads would stop before the end of the chromosomes. Trying to access the coverage information for these genes would then simply fail. >> >> In your case, chromosomes are called GL000225.1 or 22 in your bam files whereas they are probably called chr1, chr2, etc... in your annotation and chr.sizes objects. In such case, I'm trying to guess the right names and emit warnings. Can you please post the warnings so that I double-check this? >> >> Finally, the fitType="local" arguments is one for DESeq, so you can drop it as well here, not that it would have any effect when not using DESeq normalization. >> >> Btw, I've adapted easyRNASeq to the new DESeq version which object structure changed in the current development release,but it seems you've got the latest release :-) >> >> Cheers, >> >> Nico >> >> --------------------------------------------------------------- >> Nicolas Delhomme >> >> Genome Biology Computational Support >> >> European Molecular Biology Laboratory >> >> Tel: +49 6221 387 8310 >> Email: nicolas.delhomme at embl.de >> Meyerhofstrasse 1 - Postfach 10.2209 >> 69102 Heidelberg, Germany >> --------------------------------------------------------------- >> >> >> >> >> >> On 7 Feb 2012, at 15:07, Lescai, Francesco wrote: >> >>> Hi Nico, >>> thanks for your quick answer. >>> Here it is the information you ask, it might be quite lengthy. >>> Unfortunately I get the same error in generating the RNAseq object as well. >>> >>>> countDataSet <- easyRNASeq(filesDirectory=getwd(), >>> + organism="HSapiens", >>> + chr.sizes=as.list(seqlengths(Hsapiens)), >>> + readLength=100L, >>> + annotationMethod="biomaRt", >>> + format="bam", >>> + count="exons", >>> + filenames=bamfiles, >>> + normalize=FALSE, >>> + outputFormat="RNAseq", >>> + fitType="local" >>> + ) >>> Checking arguments... >>> Fetching annotations... >>> Summarizing counts... >>> Processing SRR349689_1.fastq.novo.new.chr1.bam >>> Error in FUN(c("GL000229.1", "GL000200.1", "GL000228.1", "GL000241.1", : >>> (list) object cannot be coerced to type 'integer' >>> In addition: There were 50 or more warnings (use warnings() to see the first 50) >>> >>> I also tried using a local annotation file, generated this way >>> >>> hg19.tx <- makeTranscriptDbFromUCSC( >>> genome="hg19", >>> tablename="refGene") >>> exon.range<-exons(hg19.tx) >>> gAnnot <- RangedData(exon.range) >>> colnames(gAnnot)[2]<-"exon" >>> save(gAnnot,file="gAnnot.rda") >>> >>> but then the error is the same. >>> >>>> countDataSet <- easyRNASeq(filesDirectory=getwd(), >>> + organism="HSapiens", >>> + chr.sizes=as.list(seqlengths(Hsapiens)), >>> + readLength=100L, >>> + annotationMethod="rda", >>> + annotationFile="gAnnot.rda", >>> + format="bam", >>> + count="exons", >>> + filenames=bamfiles, >>> + normalize=TRUE, >>> + outputFormat="DESeq", >>> + conditions=conditions, >>> + fitType="local" >>> + ) >>> Checking arguments... >>> Fetching annotations... >>> Summarizing counts... >>> Processing SRR349689_1.fastq.novo.new.chr1.bam >>> Error in FUN(c("GL000229.1", "GL000200.1", "GL000228.1", "GL000241.1", : >>> (list) object cannot be coerced to type 'integer' >>> >>> >>> The .bam file has been generated following Novocraft instructions, from a combined reference of the genome, the transcriptome and the junctions and then recomposed with USeq. >>> (http://www.novocraft.com/wiki/tiki-index.php?page=RNASeq+analysis %3A+mRNA+and+the+Spliceosome&structure=Novocraft+Technologies&page_ref _id=35) >>> >>> is there any other information/file that could help in tracking the issue? >>> >>> thanks, >>> Francesco >>> >>> >>> >>> ---bam header--- >>> samtools view -H SRR349689_1.fastq.novo.new.chr1.bam >>> @HD VN:1.0 >>> SO:unsorted >>> @PG ID:SamTranscriptomeParser >>> CL: args -f SRR349689_1.fastq.novo.sam -u -a 100000 -n 100000 -s SRR349689_1.fastq.novo_converted.sam >>> @RG ID:unknownReadGroup >>> SM:unknownSample >>> @SQ SN:GL000229.1 >>> AS:trascriptome.nix LN:119339 >>> @SQ SN:GL000200.1 >>> AS:trascriptome.nix LN:277914 >>> @SQ SN:GL000228.1 >>> AS:trascriptome.nix LN:228349 >>> @SQ SN:GL000241.1 >>> AS:trascriptome.nix LN:136814 >>> @SQ SN:GL000219.1 >>> AS:trascriptome.nix LN:220315 >>> @SQ SN:GL000191.1 >>> AS:trascriptome.nix LN:202431 >>> @SQ SN:GL000242.1 >>> AS:trascriptome.nix LN:142826 >>> @SQ SN:GL000243.1 >>> AS:trascriptome.nix LN:106738 >>> @SQ SN:22 >>> AS:trascriptome.nix LN:51342124 >>> @SQ SN:GL000245.1 >>> AS:trascriptome.nix LN:132396 >>> @SQ SN:MT >>> AS:trascriptome.nix LN:116551 >>> @SQ SN:3 >>> AS:trascriptome.nix LN:198058734 >>> @SQ SN:GL000217.1 >>> AS:trascriptome.nix LN:224716 >>> @SQ SN:2 >>> AS:trascriptome.nix LN:243284564 >>> @SQ SN:1 >>> AS:trascriptome.nix LN:249336112 >>> @SQ SN:7 >>> AS:trascriptome.nix LN:159196479 >>> @SQ SN:6 >>> AS:trascriptome.nix LN:171154887 >>> @SQ SN:GL000215.1 >>> AS:trascriptome.nix LN:232888 >>> @SQ SN:5 >>> AS:trascriptome.nix LN:181002586 >>> @SQ SN:4 >>> AS:trascriptome.nix LN:191137854 >>> @SQ SN:9 >>> AS:trascriptome.nix LN:141252793 >>> @SQ SN:GL000244.1 >>> AS:trascriptome.nix LN:129122 >>> @SQ SN:GL000216.1 >>> AS:trascriptome.nix LN:230528 >>> @SQ SN:8 >>> AS:trascriptome.nix LN:146387700 >>> @SQ SN:GL000218.1 >>> AS:trascriptome.nix LN:197377 >>> @SQ SN:GL000248.1 >>> AS:trascriptome.nix LN:124693 >>> @SQ SN:GL000224.1 >>> AS:trascriptome.nix LN:226253 >>> @SQ SN:GL000210.1 >>> AS:trascriptome.nix LN:123989 >>> @SQ SN:19 >>> AS:trascriptome.nix LN:59216288 >>> @SQ SN:17 >>> AS:trascriptome.nix LN:81293300 >>> @SQ SN:18 >>> AS:trascriptome.nix LN:78015394 >>> @SQ SN:GL000194.1 >>> AS:trascriptome.nix LN:209503 >>> @SQ SN:15 >>> AS:trascriptome.nix LN:102616513 >>> @SQ SN:16 >>> AS:trascriptome.nix LN:90388655 >>> @SQ SN:13 >>> AS:trascriptome.nix LN:115207096 >>> @SQ SN:14 >>> AS:trascriptome.nix LN:107386086 >>> @SQ SN:GL000195.1 >>> AS:trascriptome.nix LN:278899 >>> @SQ SN:11 >>> AS:trascriptome.nix LN:134990552 >>> @SQ SN:12 >>> AS:trascriptome.nix LN:133927113 >>> @SQ SN:GL000225.1 >>> AS:trascriptome.nix LN:273676 >>> @SQ SN:21 >>> AS:trascriptome.nix LN:48217233 >>> @SQ SN:20 >>> AS:trascriptome.nix LN:63026254 >>> @SQ SN:GL000193.1 >>> AS:trascriptome.nix LN:214851 >>> @SQ SN:GL000246.1 >>> AS:trascriptome.nix LN:126826 >>> @SQ SN:GL000237.1 >>> AS:trascriptome.nix LN:117774 >>> @SQ SN:GL000204.1 >>> AS:trascriptome.nix LN:180839 >>> @SQ SN:Y >>> AS:trascriptome.nix LN:59133001 >>> @SQ SN:GL000247.1 >>> AS:trascriptome.nix LN:135428 >>> @SQ SN:X >>> AS:trascriptome.nix LN:155357811 >>> @SQ SN:GL000205.1 >>> AS:trascriptome.nix LN:219192 >>> @SQ SN:GL000192.1 >>> AS:trascriptome.nix LN:644506 >>> @SQ SN:GL000227.1 >>> AS:trascriptome.nix LN:203665 >>> @SQ SN:GL000197.1 >>> AS:trascriptome.nix LN:132711 >>> @SQ SN:GL000236.1 >>> AS:trascriptome.nix LN:126779 >>> @SQ SN:GL000211.1 >>> AS:trascriptome.nix LN:262752 >>> @SQ SN:GL000240.1 >>> AS:trascriptome.nix LN:139915 >>> @SQ SN:GL000239.1 >>> AS:trascriptome.nix LN:123498 >>> @SQ SN:GL000232.1 >>> AS:trascriptome.nix LN:102363 >>> @SQ SN:GL000212.1 >>> AS:trascriptome.nix LN:282254 >>> @SQ SN:GL000238.1 >>> AS:trascriptome.nix LN:132164 >>> @SQ SN:GL000233.1 >>> AS:trascriptome.nix LN:108194 >>> @SQ SN:GL000249.1 >>> AS:trascriptome.nix LN:137299 >>> @SQ SN:GL000223.1 >>> AS:trascriptome.nix LN:280265 >>> @SQ SN:GL000199.1 >>> AS:trascriptome.nix LN:241559 >>> @SQ SN:10 >>> AS:trascriptome.nix LN:135619864 >>> @SQ SN:GL000209.1 >>> AS:trascriptome.nix LN:167344 >>> @SQ SN:GL000196.1 >>> AS:trascriptome.nix LN:132047 >>> @SQ SN:GL000202.1 >>> AS:trascriptome.nix LN:45488 >>> @SQ SN:GL000220.1 >>> AS:trascriptome.nix LN:261756 >>> @SQ SN:GL000214.1 >>> AS:trascriptome.nix LN:224473 >>> @SQ SN:GL000198.1 >>> AS:trascriptome.nix LN:91607 >>> @SQ SN:GL000234.1 >>> AS:trascriptome.nix LN:32013 >>> @SQ SN:GL000213.1 >>> AS:trascriptome.nix LN:237085 >>> @SQ SN:GL000221.1 >>> AS:trascriptome.nix LN:182753 >>> @SQ SN:GL000222.1 >>> AS:trascriptome.nix LN:264765 >>> @SQ SN:GL000206.1 >>> AS:trascriptome.nix LN:138000 >>> >>> >>> ----------lengths----------------- >>> $chr1 >>> [1] 249250621 >>> >>> $chr2 >>> [1] 243199373 >>> >>> $chr3 >>> [1] 198022430 >>> >>> $chr4 >>> [1] 191154276 >>> >>> $chr5 >>> [1] 180915260 >>> >>> $chr6 >>> [1] 171115067 >>> >>> $chr7 >>> [1] 159138663 >>> >>> $chr8 >>> [1] 146364022 >>> >>> $chr9 >>> [1] 141213431 >>> >>> $chr10 >>> [1] 135534747 >>> >>> $chr11 >>> [1] 135006516 >>> >>> $chr12 >>> [1] 133851895 >>> >>> $chr13 >>> [1] 115169878 >>> >>> $chr14 >>> [1] 107349540 >>> >>> $chr15 >>> [1] 102531392 >>> >>> $chr16 >>> [1] 90354753 >>> >>> $chr17 >>> [1] 81195210 >>> >>> $chr18 >>> [1] 78077248 >>> >>> $chr19 >>> [1] 59128983 >>> >>> $chr20 >>> [1] 63025520 >>> >>> $chr21 >>> [1] 48129895 >>> >>> $chr22 >>> [1] 51304566 >>> >>> $chrX >>> [1] 155270560 >>> >>> $chrY >>> [1] 59373566 >>> >>> $chrM >>> [1] 16571 >>> >>> $chr1_gl000191_random >>> [1] 106433 >>> >>> $chr1_gl000192_random >>> [1] 547496 >>> >>> $chr4_ctg9_hap1 >>> [1] 590426 >>> >>> $chr4_gl000193_random >>> [1] 189789 >>> >>> $chr4_gl000194_random >>> [1] 191469 >>> >>> $chr6_apd_hap1 >>> [1] 4622290 >>> >>> $chr6_cox_hap2 >>> [1] 4795371 >>> >>> $chr6_dbb_hap3 >>> [1] 4610396 >>> >>> $chr6_mann_hap4 >>> [1] 4683263 >>> >>> $chr6_mcf_hap5 >>> [1] 4833398 >>> >>> $chr6_qbl_hap6 >>> [1] 4611984 >>> >>> $chr6_ssto_hap7 >>> [1] 4928567 >>> >>> $chr7_gl000195_random >>> [1] 182896 >>> >>> $chr8_gl000196_random >>> [1] 38914 >>> >>> $chr8_gl000197_random >>> [1] 37175 >>> >>> $chr9_gl000198_random >>> [1] 90085 >>> >>> $chr9_gl000199_random >>> [1] 169874 >>> >>> $chr9_gl000200_random >>> [1] 187035 >>> >>> $chr9_gl000201_random >>> [1] 36148 >>> >>> $chr11_gl000202_random >>> [1] 40103 >>> >>> $chr17_ctg5_hap1 >>> [1] 1680828 >>> >>> $chr17_gl000203_random >>> [1] 37498 >>> >>> $chr17_gl000204_random >>> [1] 81310 >>> >>> $chr17_gl000205_random >>> [1] 174588 >>> >>> $chr17_gl000206_random >>> [1] 41001 >>> >>> $chr18_gl000207_random >>> [1] 4262 >>> >>> $chr19_gl000208_random >>> [1] 92689 >>> >>> $chr19_gl000209_random >>> [1] 159169 >>> >>> $chr21_gl000210_random >>> [1] 27682 >>> >>> $chrUn_gl000211 >>> [1] 166566 >>> >>> $chrUn_gl000212 >>> [1] 186858 >>> >>> $chrUn_gl000213 >>> [1] 164239 >>> >>> $chrUn_gl000214 >>> [1] 137718 >>> >>> $chrUn_gl000215 >>> [1] 172545 >>> >>> $chrUn_gl000216 >>> [1] 172294 >>> >>> $chrUn_gl000217 >>> [1] 172149 >>> >>> $chrUn_gl000218 >>> [1] 161147 >>> >>> $chrUn_gl000219 >>> [1] 179198 >>> >>> $chrUn_gl000220 >>> [1] 161802 >>> >>> $chrUn_gl000221 >>> [1] 155397 >>> >>> $chrUn_gl000222 >>> [1] 186861 >>> >>> $chrUn_gl000223 >>> [1] 180455 >>> >>> $chrUn_gl000224 >>> [1] 179693 >>> >>> $chrUn_gl000225 >>> [1] 211173 >>> >>> $chrUn_gl000226 >>> [1] 15008 >>> >>> $chrUn_gl000227 >>> [1] 128374 >>> >>> $chrUn_gl000228 >>> [1] 129120 >>> >>> $chrUn_gl000229 >>> [1] 19913 >>> >>> $chrUn_gl000230 >>> [1] 43691 >>> >>> $chrUn_gl000231 >>> [1] 27386 >>> >>> $chrUn_gl000232 >>> [1] 40652 >>> >>> $chrUn_gl000233 >>> [1] 45941 >>> >>> $chrUn_gl000234 >>> [1] 40531 >>> >>> $chrUn_gl000235 >>> [1] 34474 >>> >>> $chrUn_gl000236 >>> [1] 41934 >>> >>> $chrUn_gl000237 >>> [1] 45867 >>> >>> $chrUn_gl000238 >>> [1] 39939 >>> >>> $chrUn_gl000239 >>> [1] 33824 >>> >>> $chrUn_gl000240 >>> [1] 41933 >>> >>> $chrUn_gl000241 >>> [1] 42152 >>> >>> $chrUn_gl000242 >>> [1] 43523 >>> >>> $chrUn_gl000243 >>> [1] 43341 >>> >>> $chrUn_gl000244 >>> [1] 39929 >>> >>> $chrUn_gl000245 >>> [1] 36651 >>> >>> $chrUn_gl000246 >>> [1] 38154 >>> >>> $chrUn_gl000247 >>> [1] 36422 >>> >>> $chrUn_gl000248 >>> [1] 39786 >>> >>> $chrUn_gl000249 >>> [1] 38502 >>> >>> >>> >>> On 7 Feb 2012, at 13:00, Nicolas Delhomme wrote: >>> >>>> Hi Francesco, >>>> >>>> I'd need more info to help you. >>>> >>>> Can you post the header of your bam files and the content of as.list(seqlengths(Hsapiens)). >>>> >>>> In the meanwhile can you run easyRNASeq with normalize set to FALSE, no conditions and the outputFormat set to "RNAseq"? >>>> >>>> That will give you back the RNAseq object and we can have a look at what is happening in there. >>>> >>>> I get the feeling that it has to do with the exon names being converted into integers (through the use of a factor maybe) but I thought I got that bug squashed already... >>>> >>>> Cheers, >>>> >>>> Nico >>>> >>>> --------------------------------------------------------------- >>>> Nicolas Delhomme >>>> >>>> Genome Biology Computational Support >>>> >>>> European Molecular Biology Laboratory >>>> >>>> Tel: +49 6221 387 8310 >>>> Email: nicolas.delhomme at embl.de >>>> Meyerhofstrasse 1 - Postfach 10.2209 >>>> 69102 Heidelberg, Germany >>>> --------------------------------------------------------------- >>>> >>>> >>>> >>>> >>>> >>>> On 7 Feb 2012, at 13:37, Lescai, Francesco wrote: >>>> >>>>> Hi there, >>>>> I'm preparing a tutorial, therefore I sliced two RNASeq bam files from chromosome one, in order to reduce the file size. >>>>> >>>>> However, when I count the reads as it follows I have an error. >>>>> thanks, >>>>> Francesco >>>>> >>>>> >>>>>> countDataSet <- easyRNASeq(filesDirectory=getwd(), >>>>> + organism="HSapiens", >>>>> + chr.sizes=as.list(seqlengths(Hsapiens)), >>>>> + readLength=100L, >>>>> + annotationMethod="biomaRt", >>>>> + format="bam", >>>>> + count="exons", >>>>> + filenames=bamfiles, >>>>> + normalize=TRUE, >>>>> + outputFormat="DESeq", >>>>> + conditions=conditions, >>>>> + fitType="local" >>>>> + ) >>>>> Checking arguments... >>>>> Fetching annotations... >>>>> Summarizing counts... >>>>> Processing SRR349689_1.fastq.novo.new.chr1.bam >>>>> Error in FUN(c("GL000229.1", "GL000200.1", "GL000228.1", "GL000241.1", : >>>>> (list) object cannot be coerced to type 'integer' >>>>> >>>>> I also tried selecting the chromosome with chr.sel=c("1") or chr.sel=c("chr1") but in both cases I get the error >>>>> Make sure that your file is valid, that your 'chr.sel' (if provided) contains valid values; i.e. values as found in the file, not as returned by 'RNAseq'. >>>>> >>>>> The alignment has been done using the ENSEMBL reference, i.e. with chromosome listed as "1". >>>>> >>>>> sessionInfo() >>>>> R Under development (unstable) (2012-01-20 r58146) >>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >>>>> >>>>> locale: >>>>> [1] C/en_US.UTF-8/C/C/C/C >>>>> >>>>> attached base packages: >>>>> [1] stats graphics grDevices utils datasets methods base >>>>> >>>>> other attached packages: >>>>> [1] GenomicFeatures_1.6.7 AnnotationDbi_1.17.15 easyRNASeq_1.1.5 >>>>> [4] ShortRead_1.13.13 latticeExtra_0.6-19 RColorBrewer_1.0-5 >>>>> [7] Rsamtools_1.7.27 genomeIntervals_1.11.0 intervals_0.13.3 >>>>> [10] DESeq_1.7.6 locfit_1.5-6 lattice_0.20-0 >>>>> [13] akima_0.5-7 BiocInstaller_1.3.7 Biobase_2.15.3 >>>>> [16] edgeR_2.5.9 limma_3.11.11 biomaRt_2.11.1 >>>>> [19] BSgenome.Hsapiens.UCSC.hg19_1.3.17 BSgenome_1.23.2 Biostrings_2.23.6 >>>>> [22] GenomicRanges_1.7.16 IRanges_1.13.22 BiocGenerics_0.1.4 >>>>> >>>>> loaded via a namespace (and not attached): >>>>> [1] DBI_0.2-5 RCurl_1.9-5 RSQLite_0.11.1 XML_3.8-0 annotate_1.33.2 >>>>> [6] bitops_1.0-4.1 genefilter_1.37.0 geneplotter_1.33.1 grid_2.15.0 hwriter_1.3 >>>>> [11] rtracklayer_1.15.7 splines_2.15.0 survival_2.36-10 tools_2.15.0 xtable_1.6-0 >>>>> [16] zlibbioc_1.1.1 >>>>> >>>>> >>>>> ---------------------------------------------------------------- ----------------- >>>>> Francesco Lescai, PhD, EDBT >>>>> Senior Research Associate in Genome Analysis >>>>> University College London >>>>> Faculty of Population Health Sciences >>>>> Dept. Genes, Development & Disease >>>>> ICH - Molecular Medicine Unit, GOSgene team >>>>> 30 Guilford Street >>>>> WC1N 1EH London UK >>>>> >>>>> email: f.lescai at ucl.ac.uk<mailto:f.lescai at="" ucl.ac.uk=""> >>>>> phone: +44.(0)207.905.2274 >>>>> [ext: 2274] >>>>> ---------------------------------------------------------------- ---------------- >>>>> >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>>> >>> >>> ------------------------------------------------------------------ --------------- >>> Francesco Lescai, PhD, EDBT >>> Senior Research Associate in Genome Analysis >>> University College London >>> Faculty of Population Health Sciences >>> Dept. Genes, Development & Disease >>> ICH - Molecular Medicine Unit, GOSgene team >>> 30 Guilford Street >>> WC1N 1EH London UK >>> >>> email: f.lescai at ucl.ac.uk >>> phone: +44.(0)207.905.2274 >>> [ext: 2274] >>> ------------------------------------------------------------------ -------------- >>> >> >> > > -------------------------------------------------------------------- ------------- > Francesco Lescai, PhD, EDBT > Senior Research Associate in Genome Analysis > University College London > Faculty of Population Health Sciences > Dept. Genes, Development & Disease > ICH - Molecular Medicine Unit, GOSgene team > 30 Guilford Street > WC1N 1EH London UK > > email: f.lescai at ucl.ac.uk > phone: +44.(0)207.905.2274 > [ext: 2274] > -------------------------------------------------------------------- ------------ >

ADD REPLY • link 12.8 years ago delhomme@embl.de ★ 1.2k

Login before adding your answer.