countOverlaps gives an empty SummarizeExperiments

0

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 10.1 years ago

Hi expert from the list, I am a newcomer trying to perform Arabidopsis RNASeq with DESeq I am stuck at countOvelaps. It gives me the following warning in all the try outs i have performed so far Warning message: In if (all(strand(reads) == "*")) ignore.strand <- TRUE : the condition has length > 1 and only the first element will be used The file is created but empty >CountOverlaps_output class: SummarizedExperiment dim: 33602 1 exptData(0): assays(1): counts rownames(33602): AT1G01010 AT1G01020 ... ATMG01400 ATMG01410 rowData metadata column names(0): colnames(1): reads colData names(2): object records as I understand exptData is empty and there the counts should appear my .bams are paired end aligned with tophat2 usin TAIR10 as reference , They are imported to R as >Col_2 <- readGAlignmentPairsFromBam("Col_2.bam") >Col_2 GAlignmentPairs with 13746673 alignment pairs and 0 metadata columns: seqnames strand : ranges -- ranges <rle> <rle> : <iranges> -- <iranges> [1] Chr1 + : [3656, 3745] -- [3667, 3755] [2] Chr1 + : [3659, 3748] -- [3666, 3754] ... ... ... ... ... ... ... [13746672] mitochondria + : [366403, 366492] -- [366505, 366593] [13746673] mitochondria - : [366648, 366737] -- [366557, 366645] --- seqlengths: Chr1 Chr2 Chr3 ... chloroplast mitochondria 30427671 19698289 23459830 ... 154478 366924 The reference genome TAIR10 comes form biomart Previously, I also tried to made my own one using Rtracklayer import, as suggested by Valerie in the Counting with summarizeOverlaps vignnete, but I found biomart more accurate >library(TxDb.Athaliana.BioMart.plantsmart19) >TAIR10_GRL<- exonsBy (TxDb.Athaliana.BioMart.plantsmart19, "gene") # Added correct Chr sizes and the same names than the samples >seqlengths(TAIR10_GRL) <- c(30427671,19698289,23459830,18585056,26975502,154478,366924) #chr (1, 2, 3, 4, 5, C, M) >seqlevels(TAIR10_GRL) <-c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "chloroplast", "mitochondria") > TAIR10_GRL GRangesList of length 33602: $AT1G01010 GRanges with 6 ranges and 2 metadata columns: seqnames ranges strand | exon_id exon_name <rle> <iranges> <rle> | <integer> <character> [1] Chr1 [3631, 3913] + | 1 AT1G01010-E.1 [2] Chr1 [3996, 4276] + | 2 AT1G01010-E.2 [3] Chr1 [4486, 4605] + | 3 AT1G01010-E.3 [4] Chr1 [4706, 5095] + | 4 AT1G01010-E.4 [5] Chr1 [5174, 5326] + | 5 AT1G01010-E.5 [6] Chr1 [5439, 5899] + | 6 AT1G01010-E.6 ... <33601 more elements> --- seqlengths: Chr1 Chr2 Chr3 Chr4 Chr5 chloroplast mitochondria 30427671 19698289 23459830 18585056 26975502 154478 366924 I have tried the following commands to get the summarizedExperiment object: >col2_cd<-summarizeOverlaps(TAIR10_GRL, Col_2, mode="Union", singleEnd=F) as a previous Arabidopsis user asked on the forum, and realized, I also tried to make ignore.strand=T without getting a different output >col_2cd<-summarizeOverlaps(TAIR10_GRL, Col_2, mode="Union", ignore.strand=T, singleEnd=F) I have also tried, with the same empty list output >fls <- c("~/bams/accepted_hits_Col_2.bam", "~/bams/accepted_hits_Col_3.bam", "~/bams/accepted_hits_M_1.bam", "~/bams/accepted_hits_M_2.bam", "~/bams/accepted_hits_M_3.bam") >BFL <- BamFileList(fls, index = character()) >SumExp<-summarizeOverlaps(TAIR10_GRL, BFL, mode="Union", singleEnd=F) When I used Htseq (As suggested by Simon Anders in the DESeq Vignette) I get thelist, but I would like to implement the whole pipeline with bioconductor it seems to me that i am making a very basic mistake, any idea or suggestion will be greatly appreciated Kind regards -- output of sessionInfo(): R version 3.0.3 (2014-03-06) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] easyRNASeq_1.8.7 ShortRead_1.20.0 edgeR_3.4.2 [4] limma_3.18.9 biomaRt_2.18.0 genomeIntervals_1.18.0 [7] intervals_0.14.0 multicore_0.1-7 DESeq_1.14.0 [10] lattice_0.20-27 locfit_1.5-9.1 Biobase_2.22.0 [13] Rsamtools_1.14.2 Biostrings_2.30.1 GenomicRanges_1.14.4 [16] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0 loaded via a namespace (and not attached): [1] annotate_1.40.1 AnnotationDbi_1.24.0 bitops_1.0-6 [4] DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0 [7] grid_3.0.3 hwriter_1.3 latticeExtra_0.6-26 [10] LSD_2.5 RColorBrewer_1.0-5 RCurl_1.95-4.1 [13] RSQLite_0.11.4 splines_3.0.3 stats4_3.0.3 [16] survival_2.37-7 XML_3.98-1.1 xtable_1.7-3 [19] zlibbioc_1.8.0 -- Sent via the guest posting facility at bioconductor.org.

RNASeq Alignment biomaRt rtracklayer DESeq RNASeq Alignment biomaRt rtracklayer DESeq • 1.2k views

ADD COMMENT • link updated 10.5 years ago by Martin Morgan 25k • written 10.5 years ago by Guest User ★ 13k

0

Entering edit mode

Martin Morgan 25k

@martin-morgan-1513

Last seen 10 weeks ago

United States

On 04/01/2014 05:54 AM, Rafael Viana [guest] wrote: > > Hi expert from the list, > > I am a newcomer trying to perform Arabidopsis RNASeq with DESeq > I am stuck at countOvelaps. It gives me the following warning in all the try outs i have performed so far > > Warning message: > In if (all(strand(reads) == "*")) ignore.strand <- TRUE : > the condition has length > 1 and only the first element will be used > > The file is created but empty > >> CountOverlaps_output > class: SummarizedExperiment > dim: 33602 1 > exptData(0): > assays(1): counts > rownames(33602): AT1G01010 AT1G01020 ... ATMG01400 ATMG01410 > rowData metadata column names(0): > colnames(1): reads > colData names(2): object records > > as I understand exptData is empty and there the counts should appear Hi Rafael -- the counts are actually in the 'assays', e.g., assays(CountOverlaps_output)[[1]] (short-cut when there's just a single assay -- assay(CountOverlaps_output)). See the 'summarizeOverlaps' vignette http://bioconductor.org/packages/release/bioc/html/GenomicRanges.html or (for the next release, scheduled Apr 14) http://bioconductor.org/packages/devel/bioc/html/GenomicAlignments.htm l and the help page for the class ?"SummarizedExperiment-class". 'exptData' is meant for overall data pertinent to the experiment, e.g., a publication, the lab & date where the data came from, etc. Martin > > > my .bams are paired end aligned with tophat2 usin TAIR10 as reference , They are imported to R as > >> Col_2 <- readGAlignmentPairsFromBam("Col_2.bam") >> Col_2 > > GAlignmentPairs with 13746673 alignment pairs and 0 metadata columns: > seqnames strand : ranges -- ranges > <rle> <rle> : <iranges> -- <iranges> > [1] Chr1 + : [3656, 3745] -- [3667, 3755] > [2] Chr1 + : [3659, 3748] -- [3666, 3754] > ... ... ... ... ... ... ... > [13746672] mitochondria + : [366403, 366492] -- [366505, 366593] > [13746673] mitochondria - : [366648, 366737] -- [366557, 366645] > --- > seqlengths: > Chr1 Chr2 Chr3 ... chloroplast mitochondria > 30427671 19698289 23459830 ... 154478 366924 > > The reference genome TAIR10 comes form biomart > Previously, I also tried to made my own one using Rtracklayer import, as suggested by Valerie in the Counting with summarizeOverlaps vignnete, but I found biomart more accurate > >> library(TxDb.Athaliana.BioMart.plantsmart19) >> TAIR10_GRL<- exonsBy (TxDb.Athaliana.BioMart.plantsmart19, "gene") > # Added correct Chr sizes and the same names than the samples >> seqlengths(TAIR10_GRL) <- c(30427671,19698289,23459830,18585056,26975502,154478,366924) #chr (1, 2, 3, 4, 5, C, M) >> seqlevels(TAIR10_GRL) <-c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "chloroplast", "mitochondria") > >> TAIR10_GRL > GRangesList of length 33602: > $AT1G01010 > GRanges with 6 ranges and 2 metadata columns: > seqnames ranges strand | exon_id exon_name > <rle> <iranges> <rle> | <integer> <character> > [1] Chr1 [3631, 3913] + | 1 AT1G01010-E.1 > [2] Chr1 [3996, 4276] + | 2 AT1G01010-E.2 > [3] Chr1 [4486, 4605] + | 3 AT1G01010-E.3 > [4] Chr1 [4706, 5095] + | 4 AT1G01010-E.4 > [5] Chr1 [5174, 5326] + | 5 AT1G01010-E.5 > [6] Chr1 [5439, 5899] + | 6 AT1G01010-E.6 > ... > <33601 more elements> > --- > seqlengths: > Chr1 Chr2 Chr3 Chr4 Chr5 chloroplast mitochondria > 30427671 19698289 23459830 18585056 26975502 154478 366924 > > > I have tried the following commands to get the summarizedExperiment object: > > >> col2_cd<-summarizeOverlaps(TAIR10_GRL, Col_2, mode="Union", singleEnd=F) > > as a previous Arabidopsis user asked on the forum, and realized, I also tried to make ignore.strand=T without getting a different output >> col_2cd<-summarizeOverlaps(TAIR10_GRL, Col_2, mode="Union", ignore.strand=T, singleEnd=F) > > > I have also tried, with the same empty list output >> fls <- c("~/bams/accepted_hits_Col_2.bam", "~/bams/accepted_hits_Col_3.bam", > "~/bams/accepted_hits_M_1.bam", "~/bams/accepted_hits_M_2.bam", "~/bams/accepted_hits_M_3.bam") >> BFL <- BamFileList(fls, index = character()) >> SumExp<-summarizeOverlaps(TAIR10_GRL, BFL, mode="Union", singleEnd=F) > > When I used Htseq (As suggested by Simon Anders in the DESeq Vignette) I get thelist, but I would like to implement the whole pipeline with bioconductor > it seems to me that i am making a very basic mistake, any idea or suggestion will be greatly appreciated > > > Kind regards > > > > > -- output of sessionInfo(): > > R version 3.0.3 (2014-03-06) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] easyRNASeq_1.8.7 ShortRead_1.20.0 edgeR_3.4.2 > [4] limma_3.18.9 biomaRt_2.18.0 genomeIntervals_1.18.0 > [7] intervals_0.14.0 multicore_0.1-7 DESeq_1.14.0 > [10] lattice_0.20-27 locfit_1.5-9.1 Biobase_2.22.0 > [13] Rsamtools_1.14.2 Biostrings_2.30.1 GenomicRanges_1.14.4 > [16] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0 > > loaded via a namespace (and not attached): > [1] annotate_1.40.1 AnnotationDbi_1.24.0 bitops_1.0-6 > [4] DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0 > [7] grid_3.0.3 hwriter_1.3 latticeExtra_0.6-26 > [10] LSD_2.5 RColorBrewer_1.0-5 RCurl_1.95-4.1 > [13] RSQLite_0.11.4 splines_3.0.3 stats4_3.0.3 > [16] survival_2.37-7 XML_3.98-1.1 xtable_1.7-3 > [19] zlibbioc_1.8.0 > > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793

ADD COMMENT • link 10.5 years ago Martin Morgan 25k

Login before adding your answer.