summarizeOverlaps error with hg38
1
0
Entering edit mode
Tarca, Adi ▴ 570
@tarca-adi-1500
Last seen 4 months ago
United States

Hi all,

I try to use summarizeOverlaps function on a GRangesList and get the error below. 

 

 

library(GenomicFeatures)
library(GenomicRanges)
library(Rsamtools)

library(GenomicAlignments)

library(TxDb.Hsapiens.UCSC.hg38.knownGene)

txdb2 <- TxDb.Hsapiens.UCSC.hg38.knownGene

ex=exonsBy(txdb2,"gene")

 

> head(ex)                                  
GRangesList object of length 6:             
$1                                          
GRanges object with 15 ranges and 2 metadata columns:
       seqnames               ranges strand   |   exon_id   exon_name
          <Rle>            <IRanges>  <Rle>   | <integer> <character>
   [1]    chr19 [58346806, 58347029]      -   |    264625        <NA>
   [2]    chr19 [58347353, 58347640]      -   |    264626        <NA>
   [3]    chr19 [58348466, 58349128]      -   |    264627        <NA>
   [4]    chr19 [58349568, 58350651]      -   |    264628        <NA>
   [5]    chr19 [58350370, 58350651]      -   |    264629        <NA>
   ...      ...                  ...    ... ...       ...         ...
  [11]    chr19 [58357585, 58357649]      -   |    264636        <NA>
  [12]    chr19 [58357952, 58358286]      -   |    264637        <NA>
  [13]    chr19 [58358489, 58358585]      -   |    264638        <NA>
  [14]    chr19 [58359197, 58359323]      -   |    264639        <NA>
  [15]    chr19 [58362677, 58362848]      -   |    264640        <NA>

 

bamdir <-"/home/adiatarca/arun/HG19BAMS/" 


 fls <- paste(bamdir,setdiff(dir(bamdir,pattern="*.bam"),dir(bamdir,pattern="*.bam.bai")),sep="")
 bamlst <- BamFileList(fls,yieldSize=1000000)
 register(MulticoreParam(workers = 20))

 genehits <- summarizeOverlaps(ex, bamlst, mode="Union",singleEnd=TRUE, ignore.strand=FALSE)

Error in validObject(.Object) :
  invalid class "€œSummarizedExperiment"€ object: 'rowRanges' length differs from 'assays' nrow

 

 

However, if the txdb2 object below would be obtained via 

txdb2=makeTxDbFromUCSC(
             genome="hg19",
             tablename="knownGene")

the function does not fail. Unfortunately makeTxDbFromUCSC for knownGene does not seem to work with hg38.

Thanks for any suggestions,

Adi

 

 

 

 

> sessionInfo()                                                                                    
R version 3.2.2 (2015-08-14)                                                                       
Platform: x86_64-pc-linux-gnu (64-bit)                                                             
Running under: Red Hat Enterprise Linux Server release 6.7 (Santiago)                              

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] GenomicAlignments_1.4.2
 [2] TxDb.Hsapiens.UCSC.hg38.knownGene_3.1.2
 [3] Rsamtools_1.20.5
 [4] Biostrings_2.36.4
 [5] XVector_0.8.0
 [6] GenomicFeatures_1.20.6
 [7] AnnotationDbi_1.30.1
 [8] Biobase_2.28.0
 [9] GenomicRanges_1.20.8
[10] GenomeInfoDb_1.4.3
[11] IRanges_2.2.9
[12] S4Vectors_0.6.6
[13] BiocGenerics_0.14.0

loaded via a namespace (and not attached):
 [1] zlibbioc_1.14.0      BiocParallel_1.2.22  tools_3.2.2
 [4] DBI_0.3.1            lambda.r_1.1.7       futile.logger_1.4.1
 [7] rtracklayer_1.28.10  futile.options_1.0.0 bitops_1.0-6
[10] RCurl_1.95-4.7       biomaRt_2.24.1       RSQLite_1.0.0
[13] XML_3.98-1.3

 

summarizeOverlaps • 1.5k views
ADD COMMENT
1
Entering edit mode

Are your BAM files aligned against the hg19 or hg38 reference genome? Judging from the name of the data directory, it seems to be hg19. If this is the case, the two coordinate systems, i.e. reference genome coordinates, would differ, which may be the cause of the cryptic error message.

ADD REPLY
0
Entering edit mode

Thanks Julian, this sounds like a good reason for the error. 

ADD REPLY
0
Entering edit mode

Glad to hear that it helped. If your alignments are for hg19, then you would want to compare it to the hg19 gene annotation anyway - which seems to work for you already. You may also ask the maintainers of the summarizeOverlaps function if they want to add a more informative error message, in case the genomes/seqlengths do not match.

ADD REPLY
0
Entering edit mode
@herve-pages-1542
Last seen 1 day ago
Seattle, WA, United States

Hi Adi,

Other people have reported issues that seem similar to yours before (see Problem with summarizeOverlaps function and Default Rapidr invalid class “SummarizedExperiment”). Unfortunately, I was never able to reproduce them. Any chance you can provide a minimal self-contained reproducible example?

Thanks in advance,

H.

PS: Starting with GenomicFeatures 1.22.12, makeTxDbFromUCSC("hg38", "knownGene") is supported. Note that it retrieves the UCSC "GENCODE v22" track (starting with hg38 there is no more "UCSC Genes" track, see this thread on bioc-devel for more information).

ADD COMMENT

Login before adding your answer.

Traffic: 698 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6