Question: getGeneLengthAndGCContent: "zero or more than one input sequence"
16 months ago, mark.ebbert wrote:

Hi,

I'm trying to use getGeneLengthAndGCContent to normalize some RNASeq data. My data was aligned to hg38 and I used featureCounts to aggregate by Ensembl gene ID (GRCh38 v. 87). I used the following call:

> hsa.len.gc <- getGeneLengthAndGCContent(id=rownames(counts.no.sex), org="hsa", mode=c("biomart")) 

I received the following error:

NAs produced by integer overflow
Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW),  : 
  zero or more than one input sequence

Oddly, when I ran it a second time, the output changed slightly, but the error was the same:

Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW),  : 
  zero or more than one input sequence
In addition: Warning message:
In nchar(str, "bytes") * 4L : NAs produced by integer overflow

I then switched to org.db mode with the following call to see if it could map Ensembl IDs:

> hsa.len.gc <- getGeneLengthAndGCContent(id=rownames(counts.no.sex), org="hg38", mode=c("org.db")) 

This completed without errors, but most of the genes came back as NA:

> summary(hsa.len.gc)
     length             gc       
 Min.   :    41   Min.   :0.20   
 1st Qu.:  1800   1st Qu.:0.45   
 Median :  3582   Median :0.51   
 Mean   :  4566   Mean   :0.51   
 3rd Qu.:  6144   3rd Qu.:0.57   
 Max.   :156366   Max.   :0.93   
 NA's   :33901    NA's   :33901

It seems I need to get the biomart mode working. I suspect the issue is related to many:1 mappings. Does anyone know how to fix this?

Really appreciate your help. 

Reproducible example set

Here is a link to the smallest subset of Ensembl IDs I could get to fail: https://www.dropbox.com/s/gthmo1rb5lcrbvr/gene_ids.txt?dl=0

> tmp<-read.delim("gene_ids.txt", header=FALSE)
> head(tmp)
               V1
1 ENSG00000243477
2 ENSG00000114378
3 ENSG00000068001
4 ENSG00000114383
5 ENSG00000068028
6 ENSG00000281358
> hsa.len.gc <- getGeneLengthAndGCContent(id=tmp$V1, org="hsa", mode=c("biomart"))
Connecting to BioMart ...
Downloading sequences ...
This may take a few minutes ...
Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW),  : 
  zero or more than one input sequence
modified 15 months ago by davide risso • written 16 months ago by mark.ebbert

It's hard to say what's going on without knowing what your row names are.

Can you please provide an example for us to reproduce it and diagnose the problem? For instance, would you be able to share the row names that you're using? How many are there? Is the error still there if you apply the function to only the first 10 genes? 100? 1000?

Please share the smallest possible reproducible example that produces the error.
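
One way to narrow a long ID list down is to halve it repeatedly and keep whichever half still fails. The following is only a minimal sketch of that idea: `shrink_failing_ids` and `fails_with_error` are hypothetical helper names, not part of EDASeq, and the approach assumes the error is triggered by a small group of IDs rather than by the total size of the request.

```r
# Hypothetical helper (not part of EDASeq): shrink a vector of IDs to a
# smaller subset on which `fails(ids)` still returns TRUE.
shrink_failing_ids <- function(ids, fails) {
  stopifnot(fails(ids))            # the full set must fail to begin with
  while (length(ids) > 1) {
    half   <- seq_len(length(ids) %/% 2)
    first  <- ids[half]
    second <- ids[-half]
    if (fails(first)) {
      ids <- first
    } else if (fails(second)) {
      ids <- second
    } else {
      break                        # failure needs IDs from both halves
    }
  }
  ids
}

# Wrap any function of the IDs so that an error is reported as TRUE.
fails_with_error <- function(f) {
  function(ids) inherits(try(f(ids), silent = TRUE), "try-error")
}

# Intended use (needs EDASeq and a network connection, so left commented out):
# fails <- fails_with_error(function(ids)
#   EDASeq::getGeneLengthAndGCContent(id = ids, org = "hsa", mode = "biomart"))
# small <- shrink_failing_ids(as.character(rownames(counts.no.sex)), fails)
```

Each step sends a new query, so this can be slow on a large list; stopping early with a subset of a few hundred IDs is usually enough to share.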

modified 16 months ago • written 16 months ago by davide risso

@daviderisso, I apologize for the slow response. I've been trying to generate a *small* reproducible example. So far, the smallest group I've been able to find is 5000 Ensembl gene IDs. Let me see if I can narrow it down more.

written 15 months ago by mark.ebbert

@daviderisso, I updated the post to include the smallest subset (with code) I could get to fail. Will that work?

written 15 months ago by mark.ebbert
Answer: getGeneLengthAndGCContent: "zero or more than one input sequence"
15 months ago, davide risso (Weill Cornell Medicine) wrote:

Hi Mark,

I've just tested your code and it works on my machine.

Here's what I did:

library(EDASeq)
tmp <- read.delim("gene_ids.txt", header=FALSE)
hsa.len.gc <- getGeneLengthAndGCContent(id=tmp$V1, org="hsa", mode=c("biomart"))

and here is the resulting object:

> summary(hsa.len.gc)
     length            gc        
 Min.   :   23   Min.   :0.1633  
 1st Qu.:  406   1st Qu.:0.3942  
 Median :  897   Median :0.4347  
 Mean   : 2341   Mean   :0.4477  
 3rd Qu.: 3089   3rd Qu.:0.4910  
 Max.   :42646   Max.   :0.8636  
 NA's   :40      NA's   :40

Are you using the latest versions of EDASeq and biomaRt? I'm using EDASeq 2.12.0 and biomaRt 2.34.0.

R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] EDASeq_2.12.0              ShortRead_1.36.0          
 [3] GenomicAlignments_1.14.1   SummarizedExperiment_1.8.0
 [5] DelayedArray_0.4.1         matrixStats_0.52.2        
 [7] Rsamtools_1.30.0           GenomicRanges_1.30.0      
 [9] GenomeInfoDb_1.14.0        Biostrings_2.46.0         
[11] XVector_0.18.0             IRanges_2.12.0            
[13] S4Vectors_0.16.0           BiocParallel_1.12.0       
[15] Biobase_2.38.0             BiocGenerics_0.24.0       

loaded via a namespace (and not attached):
 [1] genefilter_1.60.0       progress_1.1.2          splines_3.4.2          
 [4] lattice_0.20-35         rtracklayer_1.38.0      GenomicFeatures_1.30.0 
 [7] blob_1.1.0              XML_3.98-1.9            survival_2.41-3        
[10] rlang_0.1.4             R.oo_1.21.0             DBI_0.7                
[13] R.utils_2.6.0           bit64_0.9-7             aroma.light_3.8.0      
[16] RColorBrewer_1.1-2      GenomeInfoDbData_0.99.1 stringr_1.2.0          
[19] zlibbioc_1.24.0         hwriter_1.3.2           R.methodsS3_1.7.1      
[22] memoise_1.1.0           latticeExtra_0.6-28     geneplotter_1.56.0     
[25] biomaRt_2.34.0          AnnotationDbi_1.40.0    Rcpp_0.12.14           
[28] xtable_1.8-2            annotate_1.56.1         bit_1.1-12             
[31] RMySQL_0.10.13          digest_0.6.12           stringi_1.1.6          
[34] DESeq_1.30.0            grid_3.4.2              tools_3.4.2            
[37] bitops_1.0-6            magrittr_1.5            RCurl_1.95-4.8         
[40] RSQLite_2.0             tibble_1.3.4            Matrix_1.2-12          
[43] prettyunits_1.0.2       assertthat_0.2.0        R6_2.2.2               
[46] compiler_3.4.2
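
To compare a local installation against the versions listed above, something like the following could be used. It is a rough sketch: `check_versions` is a hypothetical helper, and treating the versions from this session as minimums is an assumption, since the thread does not say which release fixed the error.

```r
# Hypothetical helper: report the installed version of each package and
# whether it meets a minimum. NA means the package is not installed.
check_versions <- function(minimums) {
  installed <- vapply(names(minimums), function(pkg) {
    tryCatch(as.character(utils::packageVersion(pkg)),
             error = function(e) NA_character_)
  }, character(1))
  ok <- mapply(function(have, need) {
    !is.na(have) && package_version(have) >= package_version(need)
  }, installed, minimums)
  data.frame(package = names(minimums), installed = unname(installed),
             required = unname(minimums), ok = unname(ok),
             row.names = NULL, stringsAsFactors = FALSE)
}

# Versions reported as working in this answer, used here as assumed minimums.
check_versions(c(EDASeq = "2.12.0", biomaRt = "2.34.0"))
```

If a row shows `ok = FALSE`, updating the Bioconductor packages and posting the output of `sessionInfo()` makes it much easier to compare setups.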
written 15 months ago by davide risso

My apologies. It seems the package version was the problem. I'm not sure why, because I installed everything fresh just a few months ago.

written 15 months ago by mark.ebbert