Entering edit mode
Guest User
★
13k
@guest-user-4897
Last seen 10.1 years ago
Hi expert from the list,
I am a newcomer trying to perform Arabidopsis RNASeq with DESeq
I am stuck at countOvelaps. It gives me the following warning in all
the try outs i have performed so far
Warning message:
In if (all(strand(reads) == "*")) ignore.strand <- TRUE :
the condition has length > 1 and only the first element will be used
The file is created but empty
>CountOverlaps_output
class: SummarizedExperiment
dim: 33602 1
exptData(0):
assays(1): counts
rownames(33602): AT1G01010 AT1G01020 ... ATMG01400 ATMG01410
rowData metadata column names(0):
colnames(1): reads
colData names(2): object records
as I understand exptData is empty and there the counts should appear
my .bams are paired end aligned with tophat2 usin TAIR10 as reference
, They are imported to R as
>Col_2 <- readGAlignmentPairsFromBam("Col_2.bam")
>Col_2
GAlignmentPairs with 13746673 alignment pairs and 0 metadata columns:
seqnames strand : ranges --
ranges
<rle> <rle> : <iranges> --
<iranges>
[1] Chr1 + : [3656, 3745] -- [3667,
3755]
[2] Chr1 + : [3659, 3748] -- [3666,
3754]
... ... ... ... ... ...
...
[13746672] mitochondria + : [366403, 366492] -- [366505,
366593]
[13746673] mitochondria - : [366648, 366737] -- [366557,
366645]
---
seqlengths:
Chr1 Chr2 Chr3 ... chloroplast mitochondria
30427671 19698289 23459830 ... 154478 366924
The reference genome TAIR10 comes form biomart
Previously, I also tried to made my own one using Rtracklayer import,
as suggested by Valerie in the Counting with summarizeOverlaps
vignnete, but I found biomart more accurate
>library(TxDb.Athaliana.BioMart.plantsmart19)
>TAIR10_GRL<- exonsBy (TxDb.Athaliana.BioMart.plantsmart19, "gene")
# Added correct Chr sizes and the same names than the samples
>seqlengths(TAIR10_GRL) <-
c(30427671,19698289,23459830,18585056,26975502,154478,366924) #chr
(1, 2, 3, 4, 5, C, M)
>seqlevels(TAIR10_GRL) <-c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5",
"chloroplast", "mitochondria")
> TAIR10_GRL
GRangesList of length 33602:
$AT1G01010
GRanges with 6 ranges and 2 metadata columns:
seqnames ranges strand | exon_id exon_name
<rle> <iranges> <rle> | <integer> <character>
[1] Chr1 [3631, 3913] + | 1 AT1G01010-E.1
[2] Chr1 [3996, 4276] + | 2 AT1G01010-E.2
[3] Chr1 [4486, 4605] + | 3 AT1G01010-E.3
[4] Chr1 [4706, 5095] + | 4 AT1G01010-E.4
[5] Chr1 [5174, 5326] + | 5 AT1G01010-E.5
[6] Chr1 [5439, 5899] + | 6 AT1G01010-E.6
...
<33601 more elements>
---
seqlengths:
Chr1 Chr2 Chr3 Chr4 Chr5
chloroplast mitochondria
30427671 19698289 23459830 18585056 26975502
154478 366924
I have tried the following commands to get the summarizedExperiment
object:
>col2_cd<-summarizeOverlaps(TAIR10_GRL, Col_2, mode="Union",
singleEnd=F)
as a previous Arabidopsis user asked on the forum, and realized, I
also tried to make ignore.strand=T without getting a different output
>col_2cd<-summarizeOverlaps(TAIR10_GRL, Col_2, mode="Union",
ignore.strand=T, singleEnd=F)
I have also tried, with the same empty list output
>fls <- c("~/bams/accepted_hits_Col_2.bam",
"~/bams/accepted_hits_Col_3.bam",
"~/bams/accepted_hits_M_1.bam",
"~/bams/accepted_hits_M_2.bam", "~/bams/accepted_hits_M_3.bam")
>BFL <- BamFileList(fls, index = character())
>SumExp<-summarizeOverlaps(TAIR10_GRL, BFL, mode="Union", singleEnd=F)
When I used Htseq (As suggested by Simon Anders in the DESeq Vignette)
I get thelist, but I would like to implement the whole pipeline with
bioconductor
it seems to me that i am making a very basic mistake, any idea or
suggestion will be greatly appreciated
Kind regards
-- output of sessionInfo():
R version 3.0.3 (2014-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods
[8] base
other attached packages:
[1] easyRNASeq_1.8.7 ShortRead_1.20.0 edgeR_3.4.2
[4] limma_3.18.9 biomaRt_2.18.0
genomeIntervals_1.18.0
[7] intervals_0.14.0 multicore_0.1-7 DESeq_1.14.0
[10] lattice_0.20-27 locfit_1.5-9.1 Biobase_2.22.0
[13] Rsamtools_1.14.2 Biostrings_2.30.1
GenomicRanges_1.14.4
[16] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] annotate_1.40.1 AnnotationDbi_1.24.0 bitops_1.0-6
[4] DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0
[7] grid_3.0.3 hwriter_1.3 latticeExtra_0.6-26
[10] LSD_2.5 RColorBrewer_1.0-5 RCurl_1.95-4.1
[13] RSQLite_0.11.4 splines_3.0.3 stats4_3.0.3
[16] survival_2.37-7 XML_3.98-1.1 xtable_1.7-3
[19] zlibbioc_1.8.0
--
Sent via the guest posting facility at bioconductor.org.