getGeneLengthAndGCContent: "zero or more than one input sequence"
@markebbert-14120
Last seen 6.4 years ago

Hi,

I'm trying to use getGeneLengthAndGCContent to normalize some RNASeq data. My data was aligned to hg38 and I used featureCounts to aggregate by Ensembl gene ID (GRCh38 v. 87). I used the following call:

> hsa.len.gc <- getGeneLengthAndGCContent(id=rownames(counts.no.sex), org="hsa", mode=c("biomart")) 

I received the following error:

NAs produced by integer overflow
Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW),  : 
  zero or more than one input sequence

Oddly, when I ran it a second time, the message changed slightly, but the result was the same:

Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW),  : 
  zero or more than one input sequence
In addition: Warning message:
In nchar(str, "bytes") * 4L : NAs produced by integer overflow
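
For context on the warning text: "NAs produced by integer overflow" is the message base R emits when integer arithmetic exceeds `.Machine$integer.max`. The sketch below only reproduces that message in isolation. The value `600000000L` is an arbitrary illustration, not the actual string length inside the failing call.

```r
# R integers are 32-bit, so the largest representable value is 2147483647.
max_int <- .Machine$integer.max

# Illustrative value only: any integer above max_int / 4 overflows when multiplied by 4L.
n_bytes <- 600000000L

# Integer * integer overflows to NA, with the warning "NAs produced by integer overflow".
overflowed <- suppressWarnings(n_bytes * 4L)

# Converting to double first avoids the overflow.
as_double <- as.numeric(n_bytes) * 4
```

Here `overflowed` is `NA`, while `as_double` holds the expected product.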

I then switched to org.db mode with the following call to see if it could map Ensembl IDs:

> hsa.len.gc <- getGeneLengthAndGCContent(id=rownames(counts.no.sex), org="hg38", mode=c("org.db")) 

This completed without errors, but most of the genes came back as NA:

> summary(hsa.len.gc)
     length             gc       
 Min.   :    41   Min.   :0.20   
 1st Qu.:  1800   1st Qu.:0.45   
 Median :  3582   Median :0.51   
 Mean   :  4566   Mean   :0.51   
 3rd Qu.:  6144   3rd Qu.:0.57   
 Max.   :156366   Max.   :0.93   
 NA's   :33901    NA's   :33901

It seems I need to get the biomart mode working. I suspect the issue is related to many:1 mappings. Does anyone know how to fix this?

Really appreciate your help. 

Reproducible example set

Here is a link to the smallest subset of Ensembl IDs I could get to fail: https://www.dropbox.com/s/gthmo1rb5lcrbvr/gene_ids.txt?dl=0

> tmp<-read.delim("gene_ids.txt", header=FALSE)
> head(tmp)
               V1
1 ENSG00000243477
2 ENSG00000114378
3 ENSG00000068001
4 ENSG00000114383
5 ENSG00000068028
6 ENSG00000281358
> hsa.len.gc <- getGeneLengthAndGCContent(id=tmp$V1, org="hsa", mode=c("biomart"))
Connecting to BioMart ...
Downloading sequences ...
This may take a few minutes ...
Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW),  : 
  zero or more than one input sequence
edaseq normalization hg38 biomart org.db • 2.7k views
It's hard to say what's going on without knowing what your row names are.

Can you please provide an example so that we can reproduce and diagnose the problem? For instance, would you be able to share the row names that you're using? How many are there? Is the error still there if you apply the function to only the first 10 genes? 100? 1000?

Please share the smallest possible reproducible example that produces the error.
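
One way to narrow a large ID vector down to a small failing subset is repeated halving. The sketch below is a minimal illustration in base R: `shrink_failing` and `toy_fails` are hypothetical names, and the toy predicate stands in for the real call so the example runs without a BioMart connection.

```r
# Hypothetical helper: repeatedly keep whichever half of `ids` still fails.
# `fails` is any function that returns TRUE when the call errors on the given IDs.
shrink_failing <- function(ids, fails) {
  while (length(ids) > 1) {
    half <- seq_len(length(ids) %/% 2)
    first <- ids[half]
    second <- ids[-half]
    if (fails(first)) {
      ids <- first
    } else if (fails(second)) {
      ids <- second
    } else {
      break  # the failure needs IDs from both halves, so stop shrinking
    }
  }
  ids
}

# Toy predicate for illustration only: "fails" whenever a marked ID is present.
toy_fails <- function(ids) "ENSG_BAD" %in% ids

ids <- c(paste0("ENSG_OK", 1:7), "ENSG_BAD", paste0("ENSG_OK", 8:12))
minimal <- shrink_failing(ids, toy_fails)
```

In practice, `fails` could wrap the real call, for example by checking whether `try(getGeneLengthAndGCContent(ids, "hsa", mode="biomart"), silent=TRUE)` inherits from `"try-error"`. Each probe then queries BioMart, so the search can be slow.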

@daviderisso, I apologize for the slow response. I've been trying to generate a *small* reproducible example. So far, the smallest failing group I've been able to find is 5000 Ensembl gene IDs. Let me see if I can narrow it down further.

@daviderisso, I updated the post to include the smallest subset (with code) I could get to fail. Will that work?

davide risso ▴ 980
@davide-risso-5075
Last seen 8 months ago
University of Padova

Hi Mark,

I've just tested your code and it works on my machine.

Here's what I did:

library(EDASeq)
tmp <- read.delim("gene_ids.txt", header=FALSE)
hsa.len.gc <- getGeneLengthAndGCContent(id=tmp$V1, org="hsa", mode=c("biomart"))

and here is a summary of the resulting object:

> summary(hsa.len.gc)
     length            gc        
 Min.   :   23   Min.   :0.1633  
 1st Qu.:  406   1st Qu.:0.3942  
 Median :  897   Median :0.4347  
 Mean   : 2341   Mean   :0.4477  
 3rd Qu.: 3089   3rd Qu.:0.4910  
 Max.   :42646   Max.   :0.8636  
 NA's   :40      NA's   :40

Are you using the latest versions of EDASeq and biomaRt? I'm using EDASeq 2.12.0 and biomaRt 2.34.0.

R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] EDASeq_2.12.0              ShortRead_1.36.0          
 [3] GenomicAlignments_1.14.1   SummarizedExperiment_1.8.0
 [5] DelayedArray_0.4.1         matrixStats_0.52.2        
 [7] Rsamtools_1.30.0           GenomicRanges_1.30.0      
 [9] GenomeInfoDb_1.14.0        Biostrings_2.46.0         
[11] XVector_0.18.0             IRanges_2.12.0            
[13] S4Vectors_0.16.0           BiocParallel_1.12.0       
[15] Biobase_2.38.0             BiocGenerics_0.24.0       

loaded via a namespace (and not attached):
 [1] genefilter_1.60.0       progress_1.1.2          splines_3.4.2          
 [4] lattice_0.20-35         rtracklayer_1.38.0      GenomicFeatures_1.30.0 
 [7] blob_1.1.0              XML_3.98-1.9            survival_2.41-3        
[10] rlang_0.1.4             R.oo_1.21.0             DBI_0.7                
[13] R.utils_2.6.0           bit64_0.9-7             aroma.light_3.8.0      
[16] RColorBrewer_1.1-2      GenomeInfoDbData_0.99.1 stringr_1.2.0          
[19] zlibbioc_1.24.0         hwriter_1.3.2           R.methodsS3_1.7.1      
[22] memoise_1.1.0           latticeExtra_0.6-28     geneplotter_1.56.0     
[25] biomaRt_2.34.0          AnnotationDbi_1.40.0    Rcpp_0.12.14           
[28] xtable_1.8-2            annotate_1.56.1         bit_1.1-12             
[31] RMySQL_0.10.13          digest_0.6.12           stringi_1.1.6          
[34] DESeq_1.30.0            grid_3.4.2              tools_3.4.2            
[37] bitops_1.0-6            magrittr_1.5            RCurl_1.95-4.8         
[40] RSQLite_2.0             tibble_1.3.4            Matrix_1.2-12          
[43] prettyunits_1.0.2       assertthat_0.2.0        R6_2.2.2               
[46] compiler_3.4.2
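
To compare a local installation against the versions reported as working in this answer (EDASeq 2.12.0 and biomaRt 2.34.0), something like the following base R sketch could be used. The helper name `check_versions` is hypothetical, and treating "at least this version" as sufficient is an assumption.

```r
# Versions reported as working in this answer.
known_good <- c(EDASeq = "2.12.0", biomaRt = "2.34.0")

# Hypothetical helper: TRUE if the installed version is at least the required one,
# FALSE if it is older, NA if the package is not installed.
check_versions <- function(required) {
  vapply(names(required), function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) return(NA)
    packageVersion(pkg) >= package_version(required[[pkg]])
  }, logical(1))
}

status <- check_versions(known_good)
print(status)
```

A `FALSE` or `NA` entry suggests updating or installing that package before retrying the call.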
My apologies. It seems the package versions were the problem. I'm not sure why, because I installed everything fresh just a few months ago.
