goseq, warning and strange plot from nullp
1
0
Entering edit mode
boczniak767 ▴ 740
@maciej-jonczyk-3945
Last seen 11 weeks ago
Poland

HI,

I'm getting following warning from nullp function from goseq package.

pwf=nullp(gene.vector1, bias.data=length.vec)
Komunikat ostrzegawczy:
W poleceniu 'pcls(G)': initial point very close to some inequality constraints

I've read here goseq: In pcls(G) : initial point very close to some inequality constraints that it is okay if the plot looks reasonably. In my case it is not,

plot

I'm working on maize so I have to prepare input files manually, here is a summary of the files

> gene.vector1[1:5]
Zm00001eb015280 Zm00001eb000610 Zm00001eb033210 Zm00001eb044610 Zm00001eb014360 
              0               0               0               0               0 
> length.vec[1:5]
[1] 1624  385  990 1182  386
> length(gene.vector1)
[1] 44303
> length(length.vec)
[1] 44303

I have 746 DE genes.

Should I be concerned with such result?

Here is my setup

> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=pl_PL.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=pl_PL.UTF-8        LC_COLLATE=pl_PL.UTF-8    
 [5] LC_MONETARY=pl_PL.UTF-8    LC_MESSAGES=pl_PL.UTF-8   
 [7] LC_PAPER=pl_PL.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Warsaw
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] rtracklayer_1.60.0     GenomicRanges_1.52.0   GenomeInfoDb_1.36.1   
[4] IRanges_2.34.1         S4Vectors_0.38.1       BiocGenerics_0.46.0   
[7] goseq_1.52.0           geneLenDataBase_1.36.0 BiasedUrn_2.0.10      

loaded via a namespace (and not attached):
 [1] KEGGREST_1.40.0             SummarizedExperiment_1.30.2
 [3] rjson_0.2.21                Biobase_2.60.0             
 [5] lattice_0.22-5              vctrs_0.6.5                
 [7] tools_4.3.2                 bitops_1.0-7               
 [9] generics_0.1.3              curl_5.0.1                 
[11] parallel_4.3.2              tibble_3.2.1               
[13] fansi_1.0.6                 AnnotationDbi_1.62.2       
[15] RSQLite_2.3.1               blob_1.2.4                 
[17] pkgconfig_2.0.3             Matrix_1.6-5               
[19] dbplyr_2.3.3                lifecycle_1.0.4            
[21] GenomeInfoDbData_1.2.10     compiler_4.3.2             
[23] stringr_1.5.0               Rsamtools_2.16.0           
[25] Biostrings_2.68.1           progress_1.2.2             
[27] codetools_0.2-19            RCurl_1.98-1.12            
[29] yaml_2.3.7                  GO.db_3.17.0               
[31] pillar_1.9.0                crayon_1.5.2               
[33] BiocParallel_1.34.2         DelayedArray_0.26.6        
[35] cachem_1.0.8                nlme_3.1-163               
[37] tidyselect_1.2.0            digest_0.6.34              
[39] stringi_1.7.12              dplyr_1.1.2                
[41] restfulr_0.0.15             splines_4.3.2              
[43] biomaRt_2.56.1              fastmap_1.1.1              
[45] grid_4.3.2                  cli_3.6.2                  
[47] magrittr_2.0.3              S4Arrays_1.0.4             
[49] GenomicFeatures_1.52.1      utf8_1.2.4                 
[51] XML_3.99-0.14               rappdirs_0.3.3             
[53] filelock_1.0.2              prettyunits_1.1.1          
[55] bit64_4.0.5                 XVector_0.40.0             
[57] httr_1.4.6                  matrixStats_1.0.0          
[59] bit_4.0.5                   png_0.1-8                  
[61] hms_1.1.3                   memoise_2.0.1              
[63] BiocIO_1.10.0               BiocFileCache_2.8.0        
[65] mgcv_1.9-1                  rlang_1.1.3                
[67] glue_1.7.0                  DBI_1.1.3                  
[69] xml2_1.3.5                  R6_2.5.1                   
[71] MatrixGenerics_1.12.2       GenomicAlignments_1.36.0   
[73] zlibbioc_1.46.0
goseq • 829 views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 3 hours ago
WEHI, Melbourne, Australia

The plot looks a good as is possible given the data. It appears that goseq is working correctly and the warning can be ignored.

ADD COMMENT
0
Entering edit mode

Thanks, but the points suggest something like parabola and the plot has standard flat shape after initial rising.

ADD REPLY
1
Entering edit mode

The fact that the points look like a parabola suggests a problem with your data or the model you have fitted. It is not a problem caused by goseq.

goseq always fits monotonic increasing curves because that is the only shape of curve that can be caused by gene-length bias. In this case, the fitted curve that you see is the best possible monotonic curve compatible with your data. goseq does not fit parabolic curves because it is scientifically impossible for such a curve to arise from gene-length bias. The principles of goseq are explained in the documentation and published paper.

ADD REPLY
0
Entering edit mode

Thank you, I'm happy that this is only test data.

ADD REPLY

Login before adding your answer.

Traffic: 427 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6