countOverlaps gives an empty SummarizeExperiments
1
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
Hi expert from the list, I am a newcomer trying to perform Arabidopsis RNASeq with DESeq I am stuck at countOvelaps. It gives me the following warning in all the try outs i have performed so far Warning message: In if (all(strand(reads) == "*")) ignore.strand <- TRUE : the condition has length > 1 and only the first element will be used The file is created but empty >CountOverlaps_output class: SummarizedExperiment dim: 33602 1 exptData(0): assays(1): counts rownames(33602): AT1G01010 AT1G01020 ... ATMG01400 ATMG01410 rowData metadata column names(0): colnames(1): reads colData names(2): object records as I understand exptData is empty and there the counts should appear my .bams are paired end aligned with tophat2 usin TAIR10 as reference , They are imported to R as >Col_2 <- readGAlignmentPairsFromBam("Col_2.bam") >Col_2 GAlignmentPairs with 13746673 alignment pairs and 0 metadata columns: seqnames strand : ranges -- ranges <rle> <rle> : <iranges> -- <iranges> [1] Chr1 + : [3656, 3745] -- [3667, 3755] [2] Chr1 + : [3659, 3748] -- [3666, 3754] ... ... ... ... ... ... ... [13746672] mitochondria + : [366403, 366492] -- [366505, 366593] [13746673] mitochondria - : [366648, 366737] -- [366557, 366645] --- seqlengths: Chr1 Chr2 Chr3 ... chloroplast mitochondria 30427671 19698289 23459830 ... 154478 366924 The reference genome TAIR10 comes form biomart Previously, I also tried to made my own one using Rtracklayer import, as suggested by Valerie in the Counting with summarizeOverlaps vignnete, but I found biomart more accurate >library(TxDb.Athaliana.BioMart.plantsmart19) >TAIR10_GRL<- exonsBy (TxDb.Athaliana.BioMart.plantsmart19, "gene") # Added correct Chr sizes and the same names than the samples >seqlengths(TAIR10_GRL) <- c(30427671,19698289,23459830,18585056,26975502,154478,366924) #chr (1, 2, 3, 4, 5, C, M) >seqlevels(TAIR10_GRL) <-c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "chloroplast", "mitochondria") > TAIR10_GRL GRangesList of length 33602: $AT1G01010 GRanges with 6 ranges and 2 metadata columns: seqnames ranges strand | exon_id exon_name <rle> <iranges> <rle> | <integer> <character> [1] Chr1 [3631, 3913] + | 1 AT1G01010-E.1 [2] Chr1 [3996, 4276] + | 2 AT1G01010-E.2 [3] Chr1 [4486, 4605] + | 3 AT1G01010-E.3 [4] Chr1 [4706, 5095] + | 4 AT1G01010-E.4 [5] Chr1 [5174, 5326] + | 5 AT1G01010-E.5 [6] Chr1 [5439, 5899] + | 6 AT1G01010-E.6 ... <33601 more elements> --- seqlengths: Chr1 Chr2 Chr3 Chr4 Chr5 chloroplast mitochondria 30427671 19698289 23459830 18585056 26975502 154478 366924 I have tried the following commands to get the summarizedExperiment object: >col2_cd<-summarizeOverlaps(TAIR10_GRL, Col_2, mode="Union", singleEnd=F) as a previous Arabidopsis user asked on the forum, and realized, I also tried to make ignore.strand=T without getting a different output >col_2cd<-summarizeOverlaps(TAIR10_GRL, Col_2, mode="Union", ignore.strand=T, singleEnd=F) I have also tried, with the same empty list output >fls <- c("~/bams/accepted_hits_Col_2.bam", "~/bams/accepted_hits_Col_3.bam", "~/bams/accepted_hits_M_1.bam", "~/bams/accepted_hits_M_2.bam", "~/bams/accepted_hits_M_3.bam") >BFL <- BamFileList(fls, index = character()) >SumExp<-summarizeOverlaps(TAIR10_GRL, BFL, mode="Union", singleEnd=F) When I used Htseq (As suggested by Simon Anders in the DESeq Vignette) I get thelist, but I would like to implement the whole pipeline with bioconductor it seems to me that i am making a very basic mistake, any idea or suggestion will be greatly appreciated Kind regards -- output of sessionInfo(): R version 3.0.3 (2014-03-06) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] easyRNASeq_1.8.7 ShortRead_1.20.0 edgeR_3.4.2 [4] limma_3.18.9 biomaRt_2.18.0 genomeIntervals_1.18.0 [7] intervals_0.14.0 multicore_0.1-7 DESeq_1.14.0 [10] lattice_0.20-27 locfit_1.5-9.1 Biobase_2.22.0 [13] Rsamtools_1.14.2 Biostrings_2.30.1 GenomicRanges_1.14.4 [16] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0 loaded via a namespace (and not attached): [1] annotate_1.40.1 AnnotationDbi_1.24.0 bitops_1.0-6 [4] DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0 [7] grid_3.0.3 hwriter_1.3 latticeExtra_0.6-26 [10] LSD_2.5 RColorBrewer_1.0-5 RCurl_1.95-4.1 [13] RSQLite_0.11.4 splines_3.0.3 stats4_3.0.3 [16] survival_2.37-7 XML_3.98-1.1 xtable_1.7-3 [19] zlibbioc_1.8.0 -- Sent via the guest posting facility at bioconductor.org.
RNASeq Alignment biomaRt rtracklayer DESeq RNASeq Alignment biomaRt rtracklayer DESeq • 1.1k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 6 weeks ago
United States
On 04/01/2014 05:54 AM, Rafael Viana [guest] wrote: > > Hi expert from the list, > > I am a newcomer trying to perform Arabidopsis RNASeq with DESeq > I am stuck at countOvelaps. It gives me the following warning in all the try outs i have performed so far > > Warning message: > In if (all(strand(reads) == "*")) ignore.strand <- TRUE : > the condition has length > 1 and only the first element will be used > > The file is created but empty > >> CountOverlaps_output > class: SummarizedExperiment > dim: 33602 1 > exptData(0): > assays(1): counts > rownames(33602): AT1G01010 AT1G01020 ... ATMG01400 ATMG01410 > rowData metadata column names(0): > colnames(1): reads > colData names(2): object records > > as I understand exptData is empty and there the counts should appear Hi Rafael -- the counts are actually in the 'assays', e.g., assays(CountOverlaps_output)[[1]] (short-cut when there's just a single assay -- assay(CountOverlaps_output)). See the 'summarizeOverlaps' vignette http://bioconductor.org/packages/release/bioc/html/GenomicRanges.html or (for the next release, scheduled Apr 14) http://bioconductor.org/packages/devel/bioc/html/GenomicAlignments.htm l and the help page for the class ?"SummarizedExperiment-class". 'exptData' is meant for overall data pertinent to the experiment, e.g., a publication, the lab & date where the data came from, etc. Martin > > > my .bams are paired end aligned with tophat2 usin TAIR10 as reference , They are imported to R as > >> Col_2 <- readGAlignmentPairsFromBam("Col_2.bam") >> Col_2 > > GAlignmentPairs with 13746673 alignment pairs and 0 metadata columns: > seqnames strand : ranges -- ranges > <rle> <rle> : <iranges> -- <iranges> > [1] Chr1 + : [3656, 3745] -- [3667, 3755] > [2] Chr1 + : [3659, 3748] -- [3666, 3754] > ... ... ... ... ... ... ... > [13746672] mitochondria + : [366403, 366492] -- [366505, 366593] > [13746673] mitochondria - : [366648, 366737] -- [366557, 366645] > --- > seqlengths: > Chr1 Chr2 Chr3 ... chloroplast mitochondria > 30427671 19698289 23459830 ... 154478 366924 > > The reference genome TAIR10 comes form biomart > Previously, I also tried to made my own one using Rtracklayer import, as suggested by Valerie in the Counting with summarizeOverlaps vignnete, but I found biomart more accurate > >> library(TxDb.Athaliana.BioMart.plantsmart19) >> TAIR10_GRL<- exonsBy (TxDb.Athaliana.BioMart.plantsmart19, "gene") > # Added correct Chr sizes and the same names than the samples >> seqlengths(TAIR10_GRL) <- c(30427671,19698289,23459830,18585056,26975502,154478,366924) #chr (1, 2, 3, 4, 5, C, M) >> seqlevels(TAIR10_GRL) <-c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "chloroplast", "mitochondria") > >> TAIR10_GRL > GRangesList of length 33602: > $AT1G01010 > GRanges with 6 ranges and 2 metadata columns: > seqnames ranges strand | exon_id exon_name > <rle> <iranges> <rle> | <integer> <character> > [1] Chr1 [3631, 3913] + | 1 AT1G01010-E.1 > [2] Chr1 [3996, 4276] + | 2 AT1G01010-E.2 > [3] Chr1 [4486, 4605] + | 3 AT1G01010-E.3 > [4] Chr1 [4706, 5095] + | 4 AT1G01010-E.4 > [5] Chr1 [5174, 5326] + | 5 AT1G01010-E.5 > [6] Chr1 [5439, 5899] + | 6 AT1G01010-E.6 > ... > <33601 more elements> > --- > seqlengths: > Chr1 Chr2 Chr3 Chr4 Chr5 chloroplast mitochondria > 30427671 19698289 23459830 18585056 26975502 154478 366924 > > > I have tried the following commands to get the summarizedExperiment object: > > >> col2_cd<-summarizeOverlaps(TAIR10_GRL, Col_2, mode="Union", singleEnd=F) > > as a previous Arabidopsis user asked on the forum, and realized, I also tried to make ignore.strand=T without getting a different output >> col_2cd<-summarizeOverlaps(TAIR10_GRL, Col_2, mode="Union", ignore.strand=T, singleEnd=F) > > > I have also tried, with the same empty list output >> fls <- c("~/bams/accepted_hits_Col_2.bam", "~/bams/accepted_hits_Col_3.bam", > "~/bams/accepted_hits_M_1.bam", "~/bams/accepted_hits_M_2.bam", "~/bams/accepted_hits_M_3.bam") >> BFL <- BamFileList(fls, index = character()) >> SumExp<-summarizeOverlaps(TAIR10_GRL, BFL, mode="Union", singleEnd=F) > > When I used Htseq (As suggested by Simon Anders in the DESeq Vignette) I get thelist, but I would like to implement the whole pipeline with bioconductor > it seems to me that i am making a very basic mistake, any idea or suggestion will be greatly appreciated > > > Kind regards > > > > > -- output of sessionInfo(): > > R version 3.0.3 (2014-03-06) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] easyRNASeq_1.8.7 ShortRead_1.20.0 edgeR_3.4.2 > [4] limma_3.18.9 biomaRt_2.18.0 genomeIntervals_1.18.0 > [7] intervals_0.14.0 multicore_0.1-7 DESeq_1.14.0 > [10] lattice_0.20-27 locfit_1.5-9.1 Biobase_2.22.0 > [13] Rsamtools_1.14.2 Biostrings_2.30.1 GenomicRanges_1.14.4 > [16] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0 > > loaded via a namespace (and not attached): > [1] annotate_1.40.1 AnnotationDbi_1.24.0 bitops_1.0-6 > [4] DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0 > [7] grid_3.0.3 hwriter_1.3 latticeExtra_0.6-26 > [10] LSD_2.5 RColorBrewer_1.0-5 RCurl_1.95-4.1 > [13] RSQLite_0.11.4 splines_3.0.3 stats4_3.0.3 > [16] survival_2.37-7 XML_3.98-1.1 xtable_1.7-3 > [19] zlibbioc_1.8.0 > > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT

Login before adding your answer.

Traffic: 785 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6