Question

danRer10 is missing in geneLenDataBase

Mehmet Ilyas Cosacak • 0

@mehmet-ilyas-cosacak-9020

Last seen 6.1 years ago

Germany/Dresden/ CRTD - DZNE

danRer10 is missing from "geneLenDataBase". In my analyses I always use the current database releases. danRer6 is available, but it does not work for me.

I am using goseq for gene ontology analysis and got the error below:

Can't find danRer10/ensGene length data in genLenDataBase...
Loading required package: rtracklayer
Trying to download from UCSC. This might take a couple of minutes.
Error in getlength(names(DEgenes), genome, id) :
  The gene names specified do not match the gene names for genome danRer10 and ID ensGene.
        Gene names given were: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
        Required gene names are: ENSDART00000145068.2, ENSDART00000010526.9 ...

genelendatabase goseq • 1.7k views

updated 8.1 years ago by Nadia Davidson ▴ 310 • written 8.2 years ago by Mehmet Ilyas Cosacak • 0

score 0 · Answer 1 · 2016-03-04

Nadia Davidson ▴ 310

@nadia-davidson-5739

Last seen 5.0 years ago

Australia

Hi,

From the error message, it looks like the gene names might be the problem (e.g. 1, 2, 3 instead of valid Ensembl gene IDs). Are you sure you named your vector of DE genes correctly?
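As a minimal sketch of what a correctly named vector looks like (the gene IDs and DE calls here are hypothetical, chosen only for illustration), the names should be gene identifiers rather than positions such as 1, 2, 3:

```r
# Hypothetical gene IDs and DE calls, for illustration only.
all_genes <- c("ENSDARG00000000001", "ENSDARG00000000002",
               "ENSDARG00000000018", "ENSDARG00000000019")
de_genes  <- c("ENSDARG00000000002", "ENSDARG00000000019")

# 1 = differentially expressed, 0 = not; one entry per assayed gene.
gene_vector <- as.integer(all_genes %in% de_genes)

# Attach the gene IDs as names so each 0/1 value can be matched to its gene.
names(gene_vector) <- all_genes

print(gene_vector)
```

Checking `head(names(gene_vector))` before running the analysis is a quick way to confirm that the names are the IDs you expect.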

Cheers,

Nadia.

8.1 years ago • Nadia Davidson ▴ 310

Hi Nadia,

Thanks for the reply. Yes, you are definitely right: the gene names were wrong.

I am using Ensembl gene IDs (e.g., ENSDARG00000000019, ...) because I count the reads mapped to each gene from RNA-Seq data with featureCounts. However, goseq asks for transcript IDs, as you can see in the error message below. Do you have any suggestions for that?

thanks,

ilyas.

Can't find danRer10/ensGene length data in genLenDataBase...
Loading required package: rtracklayer
Trying to download from UCSC. This might take a couple of minutes.
Error in getlength(names(DEgenes), genome, id) :
  The gene names specified do not match the gene names for genome danRer10 and ID ensGene.
        Gene names given were: ENSDARG00000000001, ENSDARG00000000002, ENSDARG00000000018, ENSDARG00000000019, ENSDARG00000000068, ENSDARG00000000069, ENSDARG00000000086, ENSDARG00000000103, ENSDARG00000000142, ENSDARG00000000151
        Required gene names are: ENSDART00000145068.2, ENSDART00000010526.9, ENSDART00000103262.3, ENSDART00000169619.1, ENSDART00000171901.1, ENSDART00000109714.3, ENSDART00000138740.2, ENSDART00000101306.5, ENSDART00000137609.3, ENSDART00000150353.2
8.1 years ago • Mehmet Ilyas Cosacak • 0

Hi Ilyas,

It should be expecting gene names, not transcript IDs, so we will fix this in the next release of goseq. We're also planning to update geneLenDataBase (finally) with the next release. If you need a solution sooner than that and have the gene lengths in your count table (e.g. as output by featureCounts or other programs), you can pass them to the nullp function through its bias.data parameter.
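A rough sketch of preparing those lengths, assuming a featureCounts-style table with `Geneid` and `Length` columns (the table contents and variable names here are made up for illustration). The point is to look lengths up by gene ID so that they line up with the DE vector:

```r
# Hypothetical featureCounts-style table; IDs and lengths are made up.
counts <- data.frame(
  Geneid = c("ENSDARG00000000018", "ENSDARG00000000001", "ENSDARG00000000002"),
  Length = c(2500L, 1200L, 3100L),
  stringsAsFactors = FALSE
)

# Named 0/1 vector of DE calls, deliberately in a different order than the table.
gene_vector <- c(ENSDARG00000000001 = 0L,
                 ENSDARG00000000002 = 1L,
                 ENSDARG00000000018 = 0L)

# Index lengths by gene ID so bias_data has the same order as gene_vector.
gene_lengths <- setNames(counts$Length, counts$Geneid)
bias_data <- gene_lengths[names(gene_vector)]

# Fail early if any gene in the DE vector has no length.
stopifnot(!anyNA(bias_data))

# With goseq installed and a realistically sized gene set, the call would be:
# pwf <- goseq::nullp(gene_vector, bias.data = bias_data)
```

Indexing by name rather than by position guards against the count table and the DE vector being sorted differently.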

Cheers,

Nadia.

8.1 years ago • Nadia Davidson ▴ 310

Hi Nadia,

Thank you very much!

For the moment I have worked around the problem as shown below. It might be helpful to others!

library(GenomicFeatures)  # makeTxDbFromBiomart(), transcriptsBy()
library(goseq)            # nullp(), getgo(), goseq()

theSampling <- 10000
txdb <- makeTxDbFromBiomart(biomart="ENSEMBL_MART_ENSEMBL", dataset = "drerio_gene_ensembl", transcript_ids = NULL, circ_seqs=DEFAULT_CIRC_SEQS, filters="", id_prefix="ensembl_", host="www.ensembl.org", port = 80, taxonomyId = NA, miRBaseBuild = NA)

txsByGene <- transcriptsBy(txdb, "gene")
# Median genomic width of each gene's transcripts, named by Ensembl gene ID
lengthData <- median(width(txsByGene))
uniGeneList <- resdata$Row.names  # resdata: data.frame built from "res <- results(dds)"
# fdr_threshold and lfc_threshold are user-chosen cutoffs defined elsewhere
sigDEGenes <- resdata$Row.names[resdata$padj <= fdr_threshold & abs(resdata$log2FoldChange) >= lfc_threshold]
selGenes <- as.integer(uniGeneList %in% sigDEGenes)
names(selGenes) <- uniGeneList
# bias.data needs one entry per gene in selGenes, in the same order
pwf <- nullp(selGenes, "danRer10", "ensGene", bias.data = lengthData[names(selGenes)])
gene2cats <- getgo(names(selGenes), "danRer10", "ensGene")
ontos <- c("BP", "CC", "MF")
GO.wall <- goseq(pwf, "danRer10", "ensGene", gene2cat = gene2cats, test.cats = c("GO:CC", "GO:BP", "GO:MF"), method = "Wallenius", repcnt = theSampling, use_genes_without_cat = TRUE)

GO.samp <- goseq(pwf, "danRer10", "ensGene", gene2cat = gene2cats, method = "Sampling", repcnt = theSampling, use_genes_without_cat = TRUE)
GO.nobias <- goseq(pwf, "danRer10", "ensGene", gene2cat = gene2cats, method = "Hypergeometric", use_genes_without_cat = TRUE)

Updating goseq and geneLenDataBase will make this easier for R beginners.

best,

ilyas.

8.1 years ago • Mehmet Ilyas Cosacak • 0