danRer10 is missing in geneLenDataBase
@mehmet-ilyas-cosacak-9020
Germany/Dresden/ CRTD - DZNE

danRer10 is missing from "geneLenDataBase". In my analyses I always use the current database releases. "danRer6" is available, but it does not work for me.

I am using goseq for gene ontology analyses and got the error below:

Can't find danRer10/ensGene length data in genLenDataBase...
Loading required package: rtracklayer
Trying to download from UCSC. This might take a couple of minutes.
Error in getlength(names(DEgenes), genome, id) :
  The gene names specified do not match the gene names for genome danRer10 and ID ensGene.
        Gene names given were: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
        Required gene names are: ENSDART00000145068.2, ENSDART00000010526.9 ...
@nadia-davidson-5739
Australia

Hi,

From the error message, it looks like the gene names might be the problem (e.g. 1, 2, 3 etc. instead of valid Ensembl gene IDs). Are you sure you named your vector of DE genes correctly?
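
A minimal sketch of what a correctly named vector looks like, assuming a handful of made-up gene IDs and a hypothetical set of DE genes (none of these values come from the real data set). goseq reads the gene IDs from names() of the 0/1 vector, so names such as 1, 2, 3 often mean the IDs were never attached:

```r
# Hypothetical inputs: every assayed gene, and the subset called DE.
all_genes <- c("ENSDARG00000000001", "ENSDARG00000000002",
               "ENSDARG00000000018", "ENSDARG00000000019")
de_genes  <- c("ENSDARG00000000002", "ENSDARG00000000019")

# 1 = differentially expressed, 0 = not; the names carry the gene IDs.
gene_vector <- as.integer(all_genes %in% de_genes)
names(gene_vector) <- all_genes

# Quick check before handing the vector to goseq.
print(head(names(gene_vector)))
print(table(gene_vector))
```

If names(gene_vector) prints NULL or plain row numbers, the vector was probably built from a data frame without carrying over the ID column.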

 

Cheers,

Nadia.


Hi Nadia,

Thanks for the reply. Yes, you are definitely right, the gene names were wrong.

I am using Ensembl gene IDs (e.g., ENSDARG00000000019, ...) because I count the reads mapped to each gene from RNA-Seq data with featureCounts. However, goseq asks for transcript IDs, as you can see in the error message below. Do you have any suggestions for that?
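
A quick way to confirm this kind of mismatch is to check the ID prefixes on both sides. The sketch below is only illustrative: the two helper functions are hypothetical, and the ID vectors are copied in shape from the error message rather than taken from the full data.

```r
# Example IDs in the same shape as those in the error message.
given    <- c("ENSDARG00000000001", "ENSDARG00000000002")
required <- c("ENSDART00000145068.2", "ENSDART00000010526.9")

# Zebrafish Ensembl gene IDs start with ENSDARG, transcript IDs with ENSDART.
id_type <- function(ids) {
  ifelse(grepl("^ENSDARG", ids), "gene",
         ifelse(grepl("^ENSDART", ids), "transcript", "unknown"))
}

# Drop a trailing version suffix such as ".2" before comparing IDs.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

print(table(id_type(given)))
print(table(id_type(required)))
print(strip_version(required))
```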

thanks,

ilyas.

Can't find danRer10/ensGene length data in genLenDataBase...
Loading required package: rtracklayer
Trying to download from UCSC. This might take a couple of minutes.
Error in getlength(names(DEgenes), genome, id) :
  The gene names specified do not match the gene names for genome danRer10 and ID ensGene.
        Gene names given were: ENSDARG00000000001, ENSDARG00000000002, ENSDARG00000000018, ENSDARG00000000019, ENSDARG00000000068, ENSDARG00000000069, ENSDARG00000000086, ENSDARG00000000103, ENSDARG00000000142, ENSDARG00000000151
        Required gene names are: ENSDART00000145068.2, ENSDART00000010526.9, ENSDART00000103262.3, ENSDART00000169619.1, ENSDART00000171901.1, ENSDART00000109714.3, ENSDART00000138740.2, ENSDART00000101306.5, ENSDART00000137609.3, ENSDART00000150353.2

Hi Ilyas,

It should be expecting gene names and not transcript IDs, so we will fix this in the next release of goseq. We're also planning on updating geneLenDataBase (finally) with the next release. If you are looking for a faster solution and have the gene lengths in your count table (e.g. as output by featureCounts or other programs), you can pass them to the nullp function through its bias.data parameter.
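
A minimal sketch of that workaround, assuming a featureCounts-style annotation table with GeneID and Length columns (the column names and all values here are assumptions for illustration). The key step is reordering the lengths so that each one lines up with the matching gene in the 0/1 vector:

```r
# Hypothetical featureCounts-style annotation: one row per gene with its length.
fc_annotation <- data.frame(
  GeneID = c("ENSDARG00000000019", "ENSDARG00000000001", "ENSDARG00000000002"),
  Length = c(2500, 1200, 3400),
  stringsAsFactors = FALSE
)

# Named 0/1 vector of DE status (made-up values).
gene_vector <- c(ENSDARG00000000001 = 0L,
                 ENSDARG00000000002 = 1L,
                 ENSDARG00000000019 = 1L)

# Reorder the lengths so element i belongs to gene i of gene_vector.
idx <- match(names(gene_vector), fc_annotation$GeneID)
stopifnot(!anyNA(idx))  # every gene must have a length in this sketch
gene_lengths <- fc_annotation$Length[idx]
names(gene_lengths) <- names(gene_vector)
print(gene_lengths)

# With goseq installed, the lengths would then be supplied like this:
# pwf <- goseq::nullp(gene_vector, bias.data = gene_lengths)
```

When bias.data is supplied this way, nullp does not need to look up lengths for the genome, so the missing danRer10 entry in geneLenDataBase no longer matters for this step.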

 

Cheers,

Nadia.

 

 


Hi Nadia,

Thank you very much!

For the moment I have solved the problem as shown below. It might be helpful to others!

library(GenomicFeatures)  # makeTxDbFromBiomart(), transcriptsBy()
library(goseq)            # nullp(), getgo(), goseq()

theSampling <- 10000  # number of resamplings for method = "Sampling"
# Build a zebrafish transcript database from Ensembl BioMart.
txdb <- makeTxDbFromBiomart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "drerio_gene_ensembl", transcript_ids = NULL, circ_seqs = DEFAULT_CIRC_SEQS, filters = "", id_prefix = "ensembl_", host = "www.ensembl.org", port = 80, taxonomyId = NA, miRBaseBuild = NA)

# Median transcript length per gene, named by Ensembl gene ID.
txsByGene <- transcriptsBy(txdb, "gene")
lengthData <- median(width(txsByGene))
uniGeneList <- resdata$Row.names  # data.frame built from "res <- results(dds)"
# Align lengths to the gene list; genes absent from txdb become NA.
lengthData <- lengthData[uniGeneList]
sigDEGenes <- resdata$Row.names[resdata$padj <= fdr_threshold & abs(resdata$log2FoldChange) >= lfc_threshold]
# Named 0/1 vector: 1 = significant DE gene, 0 = not.
selGenes <- as.integer(uniGeneList %in% sigDEGenes)
names(selGenes) <- uniGeneList
pwf <- nullp(selGenes, "danRer10", "ensGene", bias.data = lengthData)
gene2cats <- getgo(names(selGenes), "danRer10", "ensGene")
ontos <- c("BP", "CC", "MF")
# "enseGene" was a typo for "ensGene".
GO.wall <- goseq(pwf, "danRer10", "ensGene", gene2cat = gene2cats, test.cats = c("GO:CC", "GO:BP", "GO:MF"), method = "Wallenius", repcnt = theSampling, use_genes_without_cat = TRUE)

GO.samp <- goseq(pwf, "danRer10", "ensGene", gene2cat = gene2cats, method = "Sampling", repcnt = theSampling, use_genes_without_cat = TRUE)
GO.nobias <- goseq(pwf, "danRer10", "ensGene", gene2cat = gene2cats, method = "Hypergeometric", use_genes_without_cat = TRUE)

Updating goseq and geneLenDataBase will make this easier for R beginners.

best,

ilyas.

 


 
