TxDb.Mmusculus.UCSC.mm9.knownGene has exons with a gene_id, but genes() returns no such gene
tonja.r (@tonjar-7565), United Kingdom

I have found something strange in this annotation package: some exons are assigned to a gene_id, but no gene with that gene_id appears in the output of genes().

library(TxDb.Mmusculus.UCSC.mm9.knownGene)
mm9 <- TxDb.Mmusculus.UCSC.mm9.knownGene

The exons include several assigned to gene ID 100038977:

exon <- exons(mm9)
gene_id_exons = select(mm9, keys=as.character(exon$exon_id), columns = c("GENEID"), keytype = "EXONID")
> gene_id_exons[which(gene_id_exons$GENEID == "100038977"),][1:4,]
       EXONID    GENEID
243122 241618 100038977
243123 241619 100038977
243124 241620 100038977
243125 241621 100038977

However, genes() does not return a gene with that ID:

gene <- genes(mm9)
> which(gene$gene_id == "100038977")
integer(0)

Why are there exons belonging to gene 100038977 (Gm1993) when no such gene is listed by genes()?


The same happens with gene IDs 100039550 (Gm10486), 100039890 (Gm15093), 100039939 (Gm2506), 100040048 (Ccl27b), 100040631, etc.

@james-w-macdonald-5106, United States
> select(TxDb.Mmusculus.UCSC.mm9.knownGene, "100038977", c("GENEID","EXONID"), "GENEID")
      GENEID EXONID
1  100038977 241626
2  100038977 241625
3  100038977 241624
4  100038977 241623
5  100038977 241622
6  100038977 241621
7  100038977 241620
8  100038977 241619
9  100038977 241618
10 100038977 245890
11 100038977 245891
12 100038977 245892
13 100038977 245893
14 100038977 245894
15 100038977 245895
16 100038977 245896
17 100038977 245897
18 100038977 245898
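One way to see why this gene is special is to look at where its exons actually sit. The sketch below is an illustration, not part of the original answer; it assumes the TxDb package (and therefore GenomicFeatures/GenomicRanges) is installed, and uses exonsBy(), which groups exons by gene without dropping any gene. The variable names txdb, ex_by_gene and gm1993 are hypothetical.

```r
library(TxDb.Mmusculus.UCSC.mm9.knownGene)
txdb <- TxDb.Mmusculus.UCSC.mm9.knownGene

# Group all exons by Entrez gene ID; unlike the default genes(), this keeps every gene
ex_by_gene <- exonsBy(txdb, by = "gene")
gm1993 <- ex_by_gene[["100038977"]]

# Tabulate the chromosome/strand combinations that this gene's exons fall on
table(as.character(seqnames(gm1993)), as.character(strand(gm1993)))
```

If the exons are split across more than one sequence (here chrX and chrX_random), the gene cannot be summarised by a single genomic range.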

> gns <- genes(TxDb.Mmusculus.UCSC.mm9.knownGene)

> gns["100038977",]
Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript contains invalid names

> gns <- genes(TxDb.Mmusculus.UCSC.mm9.knownGene, single.strand.genes.only = FALSE)
> gns["100038977",]
GRangesList object of length 1:
$100038977
GRanges object with 2 ranges and 0 metadata columns:
         seqnames               ranges strand
            <Rle>            <IRanges>  <Rle>
  [1]        chrX [24975422, 24997578]      -
  [2] chrX_random [ 1375833,  1397989]      +
