Question: Problems with GOseq
2.2 years ago by
webquelzinhablue10 wrote:

I am writing because of some warnings that appeared when running the nullp command. It seems GOseq cannot find the gene lengths for my data ('hg38', 'ensGene') in genLenDataBase. I installed the TxDb.Hsapiens.UCSC.hg38.knownGene package and it seemed to help a little. However, some warnings still appear (see below). Could you please help me solve this problem? I did not use all the differentially expressed genes. Instead, for the analysis, I used a list of DEGs of interest plus all non-DEGs.

> pwf=nullp(genes,'hg38','ensGene', bias.data=NULL, plot.fit = TRUE)
Can't find hg38/ensGene length data in genLenDataBase...
Found the annotaion package, TxDb.Hsapiens.UCSC.hg38.knownGene
Trying to get the gene lengths from it.
Warning messages:
1: In library() :
libraries ‘/usr/local/lib/R/site-library’, ‘/usr/lib/R/site-library’ contain no packages
2: In getlength(names(DEgenes), genome, id) :
More than 40% of gene names specified did not match the gene names for genome hg38 and ID ensGene.  No length data will be available for these genes.
Gene names which failed to match were: ENSG00000002079, ENSG00000018607, ENSG00000020219, ENSG00000067601, ENSG00000078319, ENSG00000083622, ENSG00000088340, ENSG00000093100, ENSG00000101278, ENSG00000101898
Required gene names are: ENSG00000000003, ENSG00000000005, ENSG00000000419, ENSG00000000457, ENSG00000000460, ENSG00000000938, ENSG00000000971, ENSG00000001036, ENSG00000001084, ENSG00000001167
3: In pcls(G) : initial point very close to some inequality constraints

Thank you!

modified 2.2 years ago by Nadia Davidson270 • written 2.2 years ago by webquelzinhablue10
2.2 years ago by
Australia

Hi,

This is probably happening because there are fewer "knownGene"s than "ensGene"s. How does your pwf graph look? You may still have enough gene lengths for the bias weighting. Otherwise I would suggest that you use all your genes (you could try treating the DEGs not of interest as non-DEGs). You could also pull the gene lengths out of the count tables if you used featureCounts etc. to generate them. You can then supply the gene lengths to nullp through its bias.data argument.
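
As a rough sketch of the featureCounts route, something like the following might work. It assumes the standard featureCounts table layout (a "#" comment line, then Geneid, Chr, Start, End, Strand, Length and one column per sample). The helper names read_gene_lengths and make_gene_vector are made up for illustration, and the lengths and counts in the demo file are invented, not real values.

```r
# Hypothetical helper: named vector of gene lengths from a featureCounts table.
# Assumes columns "Geneid" and "Length" and that comment lines start with "#".
read_gene_lengths <- function(path) {
  fc <- read.delim(path, comment.char = "#", stringsAsFactors = FALSE)
  # Drop a trailing version suffix (e.g. ".12") if the annotation used one
  ids <- sub("\\.[0-9]+$", "", fc$Geneid)
  setNames(as.numeric(fc$Length), ids)
}

# Hypothetical helper: named 0/1 vector over ALL assayed genes.
# 1 = DEG of interest, 0 = everything else (including DEGs not of interest).
make_gene_vector <- function(all_ids, de_ids) {
  setNames(as.integer(all_ids %in% de_ids), all_ids)
}

# Tiny demo table with invented lengths and counts, only to show the shapes
demo_file <- tempfile(fileext = ".txt")
writeLines(c(
  "# Program:featureCounts (demo data, not real output)",
  paste("Geneid", "Chr", "Start", "End", "Strand", "Length", "sample1", sep = "\t"),
  paste("ENSG00000000003", "chrX", 1, 1000, "-", 1000, 10, sep = "\t"),
  paste("ENSG00000000005", "chrX", 1, 2000, "+", 2000, 0, sep = "\t"),
  paste("ENSG00000000419.12", "chr20", 1, 3000, "-", 3000, 25, sep = "\t")
), demo_file)

gene_lengths <- read_gene_lengths(demo_file)
genes <- make_gene_vector(names(gene_lengths), de_ids = "ENSG00000000003")

# With goseq installed and a full-sized gene list, the lengths are passed
# through bias.data in the same order as the gene vector:
# pwf <- goseq::nullp(genes, "hg38", "ensGene",
#                     bias.data = gene_lengths[names(genes)], plot.fit = TRUE)
```

The nullp call is left commented out because the three-gene demo is far too small to fit the weighting function. The point is only that bias.data has to line up with the gene vector, one length per gene in the same order.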

Cheers,