Question

goseq, warning and strange plot from nullp

0

Entering edit mode

boczniak767 ▴ 740

@maciej-jonczyk-3945

Last seen 3 months ago

Poland

HI,

I'm getting following warning from nullp function from goseq package.

pwf=nullp(gene.vector1, bias.data=length.vec)
Komunikat ostrzegawczy:
W poleceniu 'pcls(G)': initial point very close to some inequality constraints

I've read here goseq: In pcls(G) : initial point very close to some inequality constraints that it is okay if the plot looks reasonably. In my case it is not,

plot

I'm working on maize so I have to prepare input files manually, here is a summary of the files

> gene.vector1[1:5]
Zm00001eb015280 Zm00001eb000610 Zm00001eb033210 Zm00001eb044610 Zm00001eb014360 
              0               0               0               0               0 
> length.vec[1:5]
[1] 1624  385  990 1182  386
> length(gene.vector1)
[1] 44303
> length(length.vec)
[1] 44303

I have 746 DE genes.

Should I be concerned with such result?

Here is my setup

> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=pl_PL.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=pl_PL.UTF-8        LC_COLLATE=pl_PL.UTF-8    
 [5] LC_MONETARY=pl_PL.UTF-8    LC_MESSAGES=pl_PL.UTF-8   
 [7] LC_PAPER=pl_PL.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Warsaw
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] rtracklayer_1.60.0     GenomicRanges_1.52.0   GenomeInfoDb_1.36.1   
[4] IRanges_2.34.1         S4Vectors_0.38.1       BiocGenerics_0.46.0   
[7] goseq_1.52.0           geneLenDataBase_1.36.0 BiasedUrn_2.0.10      

loaded via a namespace (and not attached):
 [1] KEGGREST_1.40.0             SummarizedExperiment_1.30.2
 [3] rjson_0.2.21                Biobase_2.60.0             
 [5] lattice_0.22-5              vctrs_0.6.5                
 [7] tools_4.3.2                 bitops_1.0-7               
 [9] generics_0.1.3              curl_5.0.1                 
[11] parallel_4.3.2              tibble_3.2.1               
[13] fansi_1.0.6                 AnnotationDbi_1.62.2       
[15] RSQLite_2.3.1               blob_1.2.4                 
[17] pkgconfig_2.0.3             Matrix_1.6-5               
[19] dbplyr_2.3.3                lifecycle_1.0.4            
[21] GenomeInfoDbData_1.2.10     compiler_4.3.2             
[23] stringr_1.5.0               Rsamtools_2.16.0           
[25] Biostrings_2.68.1           progress_1.2.2             
[27] codetools_0.2-19            RCurl_1.98-1.12            
[29] yaml_2.3.7                  GO.db_3.17.0               
[31] pillar_1.9.0                crayon_1.5.2               
[33] BiocParallel_1.34.2         DelayedArray_0.26.6        
[35] cachem_1.0.8                nlme_3.1-163               
[37] tidyselect_1.2.0            digest_0.6.34              
[39] stringi_1.7.12              dplyr_1.1.2                
[41] restfulr_0.0.15             splines_4.3.2              
[43] biomaRt_2.56.1              fastmap_1.1.1              
[45] grid_4.3.2                  cli_3.6.2                  
[47] magrittr_2.0.3              S4Arrays_1.0.4             
[49] GenomicFeatures_1.52.1      utf8_1.2.4                 
[51] XML_3.99-0.14               rappdirs_0.3.3             
[53] filelock_1.0.2              prettyunits_1.1.1          
[55] bit64_4.0.5                 XVector_0.40.0             
[57] httr_1.4.6                  matrixStats_1.0.0          
[59] bit_4.0.5                   png_0.1-8                  
[61] hms_1.1.3                   memoise_2.0.1              
[63] BiocIO_1.10.0               BiocFileCache_2.8.0        
[65] mgcv_1.9-1                  rlang_1.1.3                
[67] glue_1.7.0                  DBI_1.1.3                  
[69] xml2_1.3.5                  R6_2.5.1                   
[71] MatrixGenerics_1.12.2       GenomicAlignments_1.36.0   
[73] zlibbioc_1.46.0

goseq • 1.4k views

ADD COMMENT • link 23 months ago boczniak767 ▴ 740

score 0 · Answer 1 · 2024-01-31

0

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 21 minutes ago

WEHI, Melbourne, Australia

The plot looks a good as is possible given the data. It appears that goseq is working correctly and the warning can be ignored.

ADD COMMENT • link 23 months ago Gordon Smyth 53k

0

Entering edit mode

Thanks, but the points suggest something like parabola and the plot has standard flat shape after initial rising.

ADD REPLY • link 23 months ago boczniak767 ▴ 740

1

Entering edit mode

The fact that the points look like a parabola suggests a problem with your data or the model you have fitted. It is not a problem caused by goseq.

goseq always fits monotonic increasing curves because that is the only shape of curve that can be caused by gene-length bias. In this case, the fitted curve that you see is the best possible monotonic curve compatible with your data. goseq does not fit parabolic curves because it is scientifically impossible for such a curve to arise from gene-length bias. The principles of goseq are explained in the documentation and published paper.