Question: Unexpected results from annotatr
xie186 wrote:

Hi Raymond @rcavalca,

I was using R-3.4 and annotatr_1.4.0. I randomly checked one region (chr19 43510901 43511100). This region is assigned to two categories (1to5kb and introns) of the same gene (Got1).

I don't understand how a region can be assigned to both 1to5kb and introns of the same gene. Could you help me with this? Thanks.

This is the R script, "test.R", that I was using:

input = "test.txt"

library(annotatr)
#annots =c('mm10_cpg_islands', 'mm10_cpg_shores', 'mm10_cpg_shelves', 'mm10_cpg_inter')
annots = c('mm10_genes_promoters',
    'mm10_genes_5UTRs',
    'mm10_genes_3UTRs',
    'mm10_genes_cds',
    'mm10_genes_introns',
    'mm10_genes_1to5kb',
    'mm10_genes_intergenic')
# Build the annotations (a single GRanges object)
annotations = build_annotations(genome = 'mm10', annotations = annots)


extraCols = c(diff_meth = 'character', mu0 = 'character', mu1 = 'character')

dm_regions = read_regions(con = input,
                          genome = 'mm10', extraCols = extraCols, format = 'bed',
                          rename_name = 'DM_status', rename_score = 'pval')

cat("Flag\n")
dm_annotated = annotate_regions(
  regions = dm_regions,
  annotations = annotations,
  ignore.strand = TRUE,
  quiet = FALSE)

# Derive the output file name; the dot is escaped so it matches a literal "."
med_file = gsub("\\.txt$", "_geneCDSAnno.tab", input)
write.table(dm_annotated, file = med_file, sep = "\t", row.names = FALSE, quote = FALSE)

sessionInfo()

This is the input file "test.txt":

chr19    43510900    43511100    DMW    0.000334260998602243    *    -24.1241458253842    *    *

This is the output:

seqnames    start    end    width    strand    DM_status    pval    diff_meth    mu0    mu1    annot.seqnames    annot.start    annot.end    annot.width    annot.strand    annot.id    annot.tx_id    annot.gene_id    annot.symbol    annot.type
chr19    43510901    43511100    200    *    DMW    0.000334260998602243    -24.1241458253842    *    *    chr19    43508435    43512434    4000    -    1to5kb:60288    uc008hoj.1    14718    Got1    mm10_genes_1to5kb
chr19    43510901    43511100    200    *    DMW    0.000334260998602243    -24.1241458253842    *    *    chr19    43507994    43515680    7687    -    intron:419960    uc012bmb.1    14718    Got1    mm10_genes_introns

I also took an IGV screenshot: https://photos.app.goo.gl/JBwZ3cygEVa4pOap1

rcavalca wrote:

Hello,

The annotations in annotatr are built from every transcript of a gene, not from a single representative one. In this case, the knownGene transcript uc008hoj.1 of Got1 contributes a 1to5kb annotation (the region 1 to 5 kb upstream of its TSS), and that region overlaps an intron of another Got1 transcript, uc012bmb.1. Your input region intersects both annotations, so it is reported once for each.
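
To make the overlap concrete, here is a minimal base-R sketch that checks the input region against the two annotation intervals reported in the output above. It is not annotatr's own code: the data frame `annots` and the column `overlaps_region` are hypothetical names used only for illustration, and the coordinates are copied from the output table in the question.

```r
# Input region, using the 1-based coordinates shown in the output table.
region <- c(start = 43510901, end = 43511100)

# Hypothetical data frame holding the two Got1 annotations that were reported.
annots <- data.frame(
  tx_id = c("uc008hoj.1", "uc012bmb.1"),
  type  = c("mm10_genes_1to5kb", "mm10_genes_introns"),
  start = c(43508435, 43507994),
  end   = c(43512434, 43515680),
  stringsAsFactors = FALSE
)

# Two closed intervals overlap when each one starts before the other ends.
annots$overlaps_region <- annots$start <= region[["end"]] &
  annots$end >= region[["start"]]

print(annots)
```

Each row comes from a different transcript, so the two annotation types do not contradict each other; the check is done per annotation, not per gene.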

The screenshot you provided from IGV is no doubt correct, but does not seem to include all the possible transcripts for the gene, hence the confusion, I think.
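
One way to see which transcripts are behind each annotation, without relying on a genome browser, is to tabulate the `annot.tx_id` and `annot.type` columns of the annotated output. The sketch below uses base R on a small hand-built data frame; `annotated`, `tx_per_gene`, and `tx_by_type` are hypothetical names, and the values are copied from the output in the question. With real data, the assumption is that the same columns are available after converting the result with `data.frame(dm_annotated)`.

```r
# Hypothetical stand-in for data.frame(dm_annotated), keeping only the
# columns needed here; values are copied from the output in the question.
annotated <- data.frame(
  seqnames     = "chr19",
  start        = 43510901,
  end          = 43511100,
  annot.tx_id  = c("uc008hoj.1", "uc012bmb.1"),
  annot.symbol = "Got1",
  annot.type   = c("mm10_genes_1to5kb", "mm10_genes_introns"),
  stringsAsFactors = FALSE
)

# Count the distinct transcripts contributing annotations for each gene symbol.
tx_per_gene <- tapply(annotated$annot.tx_id, annotated$annot.symbol,
                      function(x) length(unique(x)))

# Cross-tabulate transcript against annotation type.
tx_by_type <- table(annotated$annot.tx_id, annotated$annot.type)

print(tx_per_gene)
print(tx_by_type)
```

A gene with more than one contributing transcript is a candidate for this kind of apparently conflicting annotation.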

Hope that helps,

Raymond

xie186 wrote:

Got it, thanks for your help. I found the Ensembl page for this gene (http://useast.ensembl.org/Mus_musculus/Gene/Summary?g=ENSMUSG00000025190;r=19:43499752-43524605). It does indeed have multiple transcripts with different TSSs.