dataset TxDb.Mmusculus.UCSC.mm9.knownGene shows a different gene_id than the Genome Browser
1
0
Entering edit mode
tonja.r ▴ 80

I extracted exons from TxDb.Mmusculus.UCSC.mm9.knownGene and annotated them with gene symbols from org.Mm.eg.db. As described in my earlier post (Rsubread, featureCounts misses some exons), some of these exons are not included in the featureCounts analysis from the Rsubread package. In addition, according to the Genome Browser, some of the exons do not belong to the gene that these packages assign them to.


 

library(TxDb.Mmusculus.UCSC.mm9.knownGene)
library(org.Mm.eg.db)

mm9 = TxDb.Mmusculus.UCSC.mm9.knownGene
exon = exons(mm9)
exon_ranges = ranges(exon)

# Map every exon to its gene ID (Entrez) and UCSC transcript name
gene_id_exons = select(mm9, keys=as.character(exon$exon_id), columns=c("GENEID","TXNAME"), keytype="EXONID")
colnames(gene_id_exons) = c("EXONID","ENTREZID","TXNAME")
# Look up the gene symbol for each Entrez ID
symbol = select(org.Mm.eg.db, keys=as.character(unique(gene_id_exons$ENTREZID)), keytype="ENTREZID",
                columns="SYMBOL")
gene_id_exons = merge(gene_id_exons, symbol, all.x=TRUE)
# Combine exon coordinates with the gene and transcript annotation
exon_info = data.frame(START = start(exon_ranges), END = end(exon_ranges), CHR = seqnames(exon), STRAND = strand(exon), EXONID = exon$exon_id)
exon_info = merge(exon_info, gene_id_exons, all.x=TRUE)

> subset(exon_info, ENTREZID == 497097)
      EXONID   START     END  CHR STRAND ENTREZID     TXNAME SYMBOL
14642   7584 3195985 3197398 chr1      -   497097 uc007aet.1   Xkr4
14643   7585 3203520 3205713 chr1      -   497097 uc007aet.1   Xkr4
14644   7586 3204563 3207049 chr1      -   497097 uc007aeu.1   Xkr4
14645   7587 3411783 3411982 chr1      -   497097 uc007aeu.1   Xkr4
14646   7588 3638392 3640590 chr1      -   497097 uc007aev.1   Xkr4
14647   7589 3648928 3648985 chr1      -   497097 uc007aev.1   Xkr4
14648   7590 3660633 3661579 chr1      -   497097 uc007aeu.1   Xkr4

 

There are 3 transcripts of gene Xkr4: uc007aet.1, uc007aeu.1 and uc007aev.1.
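
As a quick check of the table above, here is a minimal base-R sketch that re-enters only the printed Xkr4 rows by hand and summarises them per transcript. The object names (xkr4, exons_per_tx, tx_span) are made up for illustration, and the sketch does not query the annotation packages themselves.

```r
# Rows copied from the subset(exon_info, ENTREZID == 497097) output above
xkr4 <- data.frame(
  EXONID = 7584:7590,
  START  = c(3195985, 3203520, 3204563, 3411783, 3638392, 3648928, 3660633),
  END    = c(3197398, 3205713, 3207049, 3411982, 3640590, 3648985, 3661579),
  TXNAME = c("uc007aet.1", "uc007aet.1", "uc007aeu.1", "uc007aeu.1",
             "uc007aev.1", "uc007aev.1", "uc007aeu.1"),
  stringsAsFactors = FALSE
)

# Number of exons listed for each transcript
exons_per_tx <- table(xkr4$TXNAME)

# Genomic span (lowest start, highest end) covered by each transcript's exons
tx_start <- tapply(xkr4$START, xkr4$TXNAME, min)
tx_end   <- tapply(xkr4$END,   xkr4$TXNAME, max)

tx_span <- data.frame(
  TXNAME  = names(tx_start),
  START   = as.vector(tx_start),
  END     = as.vector(tx_end),
  N_EXONS = as.vector(exons_per_tx)
)
print(tx_span)
```

In these rows the span of uc007aeu.1 overlaps uc007aet.1 and fully contains uc007aev.1, so the three transcripts share one locus even though the Genome Browser labels them differently.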

The Genome Browser gives me the following information: Mouse Gene mKIAA1889 (uc007aet.1), Mouse Gene Xkr4 (uc007aeu.1), Mouse Gene AK149000 (uc007aev.1). However, RefSeq says all 3 are Xkr4.

I do not know why, but only one of these transcripts is included in the in-built annotation that featureCounts from the Rsubread package uses (see my earlier post: Rsubread, featureCounts misses some exons). And why does the Genome Browser show them as three different genes?

@james-w-macdonald-5106

This isn't the correct place for this question. If you wonder why UCSC is doing something different from NCBI, shouldn't you be asking them?

Do note that we are simply supplying data from public repositories. And given that there are multiple groups that are doing related (but slightly different) things to annotate thousands of genes from various genomes, it is inevitable that there will be differences between the various groups.

 
