Question

Unexpected results from annotatr

xie186 • 0

@xie186-11029

Last seen 3.3 years ago

USA

Hi Raymond @rcavalca,

I was using R-3.4 and annotatr_1.4.0. I randomly checked one region (chr19 43510901 43511100). This region is assigned to two categories (1to5kb and introns) of the same gene (Got1).

I don't understand how a region can be assigned to both 1to5kb and introns of the same gene. Could you help me with this? Thanks.

This is the R script ("test.R") I was using:

input = "test.txt"

library(annotatr)
#annots =c('mm10_cpg_islands', 'mm10_cpg_shores', 'mm10_cpg_shelves', 'mm10_cpg_inter')
annots = c('mm10_genes_promoters',
    'mm10_genes_5UTRs',
    'mm10_genes_3UTRs',
    'mm10_genes_cds',
    'mm10_genes_introns',
    'mm10_genes_1to5kb',
    'mm10_genes_intergenic')
# Build the annotations (a single GRanges object)
annotations = build_annotations(genome = 'mm10', annotations = annots)


extraCols = c(diff_meth = 'character', mu0 = 'character', mu1 = 'character')

dm_regions = read_regions(con = input,
                          genome = 'mm10', extraCols = extraCols, format = 'bed',
                          rename_name = 'DM_status', rename_score = 'pval')

cat("Flag\n")
dm_annotated = annotate_regions(
  regions = dm_regions,
  annotations = annotations,
  ignore.strand = TRUE,
  quiet = FALSE)

# Write the annotated regions to a tab-delimited file named after the input
med_file = gsub("\\.txt$", "_geneCDSAnno.tab", input)
write.table(dm_annotated, file = med_file, sep = "\t", row.names = FALSE, quote = FALSE)

sessionInfo()

This is the input file "test.txt":

chr19    43510900    43511100    DMW    0.000334260998602243    *    -24.1241458253842    *    *

This is the output:

seqnames    start    end    width    strand    DM_status    pval    diff_meth    mu0    mu1    annot.seqnames    annot.start    annot.end    annot.width    annot.strand    annot.id    annot.tx_id    annot.gene_id    annot.symbol    annot.type
chr19    43510901    43511100    200    *    DMW    0.000334260998602243    -24.1241458253842    *    *    chr19    43508435    43512434    4000    -    1to5kb:60288    uc008hoj.1    14718    Got1    mm10_genes_1to5kb
chr19    43510901    43511100    200    *    DMW    0.000334260998602243    -24.1241458253842    *    *    chr19    43507994    43515680    7687    -    intron:419960    uc012bmb.1    14718    Got1    mm10_genes_introns

I also have an IGV screenshot here: https://photos.app.goo.gl/JBwZ3cygEVa4pOap1

annotatr • 1.3k views

ADD COMMENT • link updated 6.9 years ago by rcavalca ▴ 140 • written 6.9 years ago by xie186 • 0

score 2 · Accepted Answer · 2018-02-20

rcavalca ▴ 140

@rcavalca-7718

Last seen 6.0 years ago

United States

Hello,

The annotations in annotatr include all the transcripts for a particular gene, and each transcript contributes its own set of features. In this case, Got1 has an annotation for knownGene transcript uc008hoj.1 as 1to5kb upstream of a TSS, and that annotation overlaps an intron of a second Got1 transcript, uc012bmb.1. Your input region intersects both features, so it is reported once for each.

The IGV screenshot you provided is no doubt correct, but it does not seem to show all the possible transcripts for the gene, which I think explains the confusion.
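One way to check this directly from the annotated output is sketched below in base R. It is only an illustration: the `hits` data frame is hand-copied from the two output rows in the question rather than produced by annotatr, and the variable names (`hits`, `query_start`, `query_end`, `tx_per_gene`) are made up for the example.

```r
# The two annotation rows reported for the query region,
# copied by hand from the output in the question (not computed here).
hits <- data.frame(
  annot.start  = c(43508435, 43507994),
  annot.end    = c(43512434, 43515680),
  annot.tx_id  = c("uc008hoj.1", "uc012bmb.1"),
  annot.symbol = c("Got1", "Got1"),
  annot.type   = c("mm10_genes_1to5kb", "mm10_genes_introns"),
  stringsAsFactors = FALSE
)

# The query region, using the 1-based coordinates shown in the output.
query_start <- 43510901
query_end   <- 43511100

# Two closed intervals overlap when each starts at or before the other ends.
hits$overlaps_query <- hits$annot.start <= query_end &
  hits$annot.end >= query_start

# Count distinct transcripts per gene symbol. More than one means the
# annotation types may come from different transcript models of the gene.
tx_per_gene <- tapply(hits$annot.tx_id, hits$annot.symbol,
                      function(x) length(unique(x)))

print(hits[, c("annot.tx_id", "annot.type", "overlaps_query")])
print(tx_per_gene)
```

Both rows overlap the query, and they carry different `annot.tx_id` values for the same `annot.symbol`. The same per-gene count should work on the real result after converting it with `data.frame(dm_annotated)`, assuming the `annot.*` column names match those in the output above.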

Hope that helps,

Raymond

ADD COMMENT • link 6.9 years ago rcavalca ▴ 140

Got it. Thanks for your help. I found the Ensembl page for this gene (http://useast.ensembl.org/Mus_musculus/Gene/Summary?g=ENSMUSG00000025190;r=19:43499752-43524605). It does indeed have multiple transcripts with different TSSs.

ADD REPLY • link 6.9 years ago xie186 • 0