Problems with goseq


webquelzinhablue


I am writing because of some warnings that appeared when I ran nullp. It seems goseq cannot find the gene lengths for my data ('hg38', 'ensGene') in geneLenDataBase. I installed the TxDb.Hsapiens.UCSC.hg38.knownGene package, and that seemed to help a little. However, some warnings still appear (see below). Could you please help me solve these problems? Note that I did not use all of the differentially expressed genes; instead, I used the DEGs I am interested in plus all non-DEGs.

> pwf=nullp(genes,'hg38','ensGene', bias.data=NULL, plot.fit = TRUE)
Can't find hg38/ensGene length data in genLenDataBase...
Found the annotaion package, TxDb.Hsapiens.UCSC.hg38.knownGene
Trying to get the gene lengths from it.
Warning messages:
1: In library() :
  libraries ‘/usr/local/lib/R/site-library’, ‘/usr/lib/R/site-library’ contain no packages
2: In getlength(names(DEgenes), genome, id) :
  More than 40% of gene names specified did not match the gene names for genome hg38 and ID ensGene.  No length data will be available for these genes.
	Gene names which failed to match were: ENSG00000002079, ENSG00000018607, ENSG00000020219, ENSG00000067601, ENSG00000078319, ENSG00000083622, ENSG00000088340, ENSG00000093100, ENSG00000101278, ENSG00000101898
	Required gene names are: ENSG00000000003, ENSG00000000005, ENSG00000000419, ENSG00000000457, ENSG00000000460, ENSG00000000938, ENSG00000000971, ENSG00000001036, ENSG00000001084, ENSG00000001167
3: In pcls(G) : initial point very close to some inequality constraints

Thank you!

Updated 7.6 years ago by Nadia Davidson • written 7.6 years ago by webquelzinhablue

Answer · 2016-09-20

Hi,

This is probably happening because there are fewer "knownGene"s than "ensGene"s. How does your pwf graph look? You may still have enough gene lengths for the bias weighting. Otherwise, I would suggest that you use all your genes (you could try treating the DEGs you are not interested in as non-DEGs). You could also pull the gene lengths out of the count tables if you used featureCounts or a similar tool to generate them, and then supply those lengths to nullp through its bias.data argument.
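A minimal sketch of the last two suggestions, assuming the counts came from Rsubread::featureCounts(), whose returned list includes an `annotation` data frame with `GeneID` and `Length` columns. The helper names `build_goseq_inputs` and `run_nullp`, the vector `degs_of_interest`, and the toy table (IDs GENE1 to GENE4 with made-up lengths) are all hypothetical and only illustrate the shape of the inputs nullp expects: a named 0/1 vector over every gene, plus lengths in the same order.

```r
# Sketch only. Assumption: `fc_annotation` mimics the `annotation` data frame
# returned by Rsubread::featureCounts() (columns GeneID and Length).
build_goseq_inputs <- function(fc_annotation, degs_of_interest) {
  all_genes <- as.character(fc_annotation$GeneID)
  stopifnot(!anyDuplicated(all_genes))

  # Use every counted gene; DEGs that are not of interest are coded 0,
  # i.e. treated as non-DE, as suggested in the answer.
  genes <- as.integer(all_genes %in% degs_of_interest)
  names(genes) <- all_genes

  # Gene lengths in exactly the same order as `genes`, for bias.data.
  lengths <- fc_annotation$Length
  names(lengths) <- all_genes

  list(genes = genes, bias.data = lengths)
}

# Hypothetical wrapper for real data. Because bias.data is not NULL,
# nullp() should not need geneLenDataBase or a TxDb package for lengths.
run_nullp <- function(inputs) {
  pwf <- goseq::nullp(inputs$genes, "hg38", "ensGene",
                      bias.data = inputs$bias.data, plot.fit = TRUE)
  # Share of genes with no length data (NA in pwf$bias.data).
  message("Genes without length data: ",
          round(100 * mean(is.na(pwf$bias.data)), 1), "%")
  pwf
}

# Toy table: made-up IDs and lengths, for illustration only.
fc_annotation <- data.frame(
  GeneID = c("GENE1", "GENE2", "GENE3", "GENE4"),
  Length = c(1000L, 2500L, 800L, 4000L)
)
inputs <- build_goseq_inputs(fc_annotation, degs_of_interest = c("GENE2", "GENE4"))
print(inputs$genes)
```

The toy table is far too small to fit a meaningful probability weighting function, so only the input-building step is run on it. On a real featureCounts result you would call `pwf <- run_nullp(inputs)`, which draws the fit because plot.fit = TRUE; goseq::plotPWF(pwf) redraws it later.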

Cheers,

Nadia.