Question

Error when running goseq (error in makespline)

0

Entering edit mode

suzy.stiegelmeyer@syngenta.com ▴ 40

@suzystiegelmeyersyngentacom-5940

Last seen 9.1 years ago

United States

Hi,

I am getting the following error when running nullp

Error in if (min(fv) < lower_bound) fv = fv - min(fv) + lower_bound :
missing value where TRUE/FALSE needed

I've traced the problem to makespline.R line 53. pcls(G) is return NaNs. FYI, line 17 (if(hi<=low) {) is true so the reflection occurs. If I change the nKnots parameter from 6 to 7, there are no errors. I don't know if this modification is a fix for the true problem. Am I using nullp correctly?

Here is the code:

genes=rep(0,length(allgenes))
  names(genes)=allgenes
  genes[gsubset]=1
pwf=nullp(genes,bias.data=biasdata$length,plot.fit=F)

Input data. I know this isn't enough data to debug the problem but I'm not sure how to attach it to this message. Sorry.

> table(genes)
genes
    0     1 
43951    22 
> biasdata$length[genes==1]
 [1] 10137  5278  3742  6833  1150  4303  6000  6294  4182  7910  3593 10336  2842  4232  2797  3267  2865  1208  6328  5968
[21] 21117  3467

Thanks in advance!

Suzy

Here's my sessionInfo():

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] mgcv_1.8-5              nlme_3.1-118            RCytoscape_1.16.0       XMLRPC_0.3-0            graph_1.44.0           
 [6] goseq_1.18.0            AnnotationDbi_1.28.1    RSQLite_1.0.0           DBI_0.3.1               geneLenDataBase_1.1.1  
[11] BiasedUrn_1.06.1        gtools_3.4.1            rgl_0.95.1158           WGCNA_1.41-1            flashClust_1.01-2      
[16] dynamicTreeCut_1.62     gplots_2.15.0           EDASeq_2.0.0            ShortRead_1.24.0        GenomicAlignments_1.2.1
[21] Rsamtools_1.18.2        GenomicRanges_1.18.3    GenomeInfoDb_1.2.3      Biostrings_2.34.0       XVector_0.6.0          
[26] IRanges_2.0.0           S4Vectors_0.4.0         BiocParallel_1.0.0      Biobase_2.26.0          BiocGenerics_0.12.1    
[31] edgeR_3.8.5             limma_3.22.1           

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3        annotate_1.44.0        aroma.light_2.2.0      base64enc_0.1-2        BatchJobs_1.5         
 [6] BBmisc_1.8             biomaRt_2.22.0         bitops_1.0-6           brew_1.0-6             caTools_1.17.1        
[11] checkmate_1.5.0        cluster_1.15.3         codetools_0.2-9        DESeq_1.18.0           digest_0.6.4          
[16] doParallel_1.0.8       fail_1.2               foreach_1.4.2          foreign_0.8-61         Formula_1.1-2         
[21] gdata_2.13.3           genefilter_1.48.1      geneplotter_1.44.0     GenomicFeatures_1.18.2 GO.db_3.0.0           
[26] grid_3.1.2             Hmisc_3.14-6           hwriter_1.3.2          impute_1.40.0          iterators_1.0.7       
[31] KernSmooth_2.23-13     lattice_0.20-29        latticeExtra_0.6-26    Matrix_1.1-4           matrixStats_0.10.3    
[36] nnet_7.3-8             plyr_1.8.1             R.methodsS3_1.6.1      R.oo_1.18.0            R.utils_1.34.0        
[41] RColorBrewer_1.0-5     Rcpp_0.11.3            RCurl_1.95-4.4         reshape_0.8.5          rpart_4.1-8           
[46] rtracklayer_1.26.2     sendmailR_1.2-1        splines_3.1.2          stringr_0.6.2          survival_2.37-7       
[51] tools_3.1.2            XML_3.98-1.1           xtable_1.7-4           zlibbioc_1.12.0

software error • 1.1k views

ADD COMMENT • link updated 9.1 years ago by Gordon Smyth 50k • written 9.1 years ago by suzy.stiegelmeyer@syngenta.com ▴ 40

score 0 · Answer 1 · 2015-03-19

Hi Suzy,

It looks as though you don’t have enough differentially expressed genes to reliably fit the bias distribution (22 is a bit low. Usually there would be 100s to 1000s). In your situation I would do one of the following two things:

- depending on how you defined the differentially expressed genes, you could loosen the definition to include more genes. You may want to do this anyway because it’s unlikely you will find any GO enrichment with just 22 genes.

- or you could assume there is no length bias and run goseq with method=“Hypergeometric"

Cheers,

Nadia.

score 0 · Answer 2 · 2015-03-22

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 3 hours ago

WEHI, Melbourne, Australia

Did you use edgeR or limma to get the differentially expressed genes? If so, you might try to goana() function, which gives some of the same functionality as the goseq package but without needing the nullp() function. It will work even for very small numbers of DE genes. Try typing ?goana.DGELRT or ?goana.MArrayLM.

ADD COMMENT • link 9.1 years ago Gordon Smyth 50k