Hi Raymond @rcavalca,
I was using R-3.4 and annotatr_1.4.0. I randomly checked one region (chr19 43510901 43511100). This region is assigned to two categories (1to5kb and introns) of the same gene (Got1).
I don't understand how a region can be assigned to both 1to5kb and introns of the same gene. Could you help me with this? Thanks.
This is the R code ("test.R") I was using:
input = "test.txt"
library(annotatr)

#annots = c('mm10_cpg_islands', 'mm10_cpg_shores', 'mm10_cpg_shelves', 'mm10_cpg_inter')
annots = c('mm10_genes_promoters',
    'mm10_genes_5UTRs',
    'mm10_genes_3UTRs',
    'mm10_genes_cds',
    'mm10_genes_introns',
    'mm10_genes_1to5kb',
    'mm10_genes_intergenic')

# Build the annotations (a single GRanges object)
annotations = build_annotations(genome = 'mm10', annotations = annots)

# Types of the three extra columns that follow the standard BED6 fields
extraCols = c(diff_meth = 'character', mu0 = 'character', mu1 = 'character')
dm_regions = read_regions(con = input,
    genome = 'mm10', extraCols = extraCols, format = 'bed',
    rename_name = 'DM_status', rename_score = 'pval')
cat("Flag\n")

# Intersect the regions with the annotations, ignoring strand
dm_annotated = annotate_regions(
    regions = dm_regions,
    annotations = annotations,
    ignore.strand = TRUE,
    quiet = FALSE)

# Escape the dot so only a literal ".txt" suffix is replaced
med_file = gsub("\\.txt$", "_geneCDSAnno.tab", input)
write.table(dm_annotated, file = med_file, sep = "\t", row.names = FALSE, quote = FALSE)
sessionInfo()
This is the input file "test.txt":
chr19 43510900 43511100 DMW 0.000334260998602243 * -24.1241458253842 * *
This is the output:
seqnames start end width strand DM_status pval diff_meth mu0 mu1 annot.seqnames annot.start annot.end annot.width annot.strand annot.id annot.tx_id annot.gene_id annot.symbol annot.type
chr19 43510901 43511100 200 * DMW 0.000334260998602243 -24.1241458253842 * * chr19 43508435 43512434 4000 - 1to5kb:60288 uc008hoj.1 14718 Got1 mm10_genes_1to5kb
chr19 43510901 43511100 200 * DMW 0.000334260998602243 -24.1241458253842 * * chr19 43507994 43515680 7687 - intron:419960 uc012bmb.1 14718 Got1 mm10_genes_introns
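A side note on the coordinates: the start in the output (43510901) is one more than the start in "test.txt" (43510900). That is consistent with the input being read as BED (0-based start, end-exclusive) and reported as 1-based inclusive ranges. A minimal base-R sketch of that arithmetic, with variable names that are only illustrative:

```r
# Values copied from the input line in test.txt (BED: 0-based start, exclusive end)
bed_start <- 43510900
bed_end   <- 43511100

# Equivalent 1-based, inclusive range as shown in the output
gr_start <- bed_start + 1
gr_end   <- bed_end
gr_width <- gr_end - gr_start + 1

cat(gr_start, gr_end, gr_width, "\n")
```

So the region itself is the same in both files; only the coordinate convention differs.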
I also took an IGV screenshot: https://photos.app.goo.gl/JBwZ3cygEVa4pOap1
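Got it. Thanks for your help. I found the Ensembl page for this gene (http://useast.ensembl.org/Mus_musculus/Gene/Summary?g=ENSMUSG00000025190;r=19:43499752-43524605). It does have multiple transcripts with different TSSs, so the same region can fall 1 to 5 kb upstream of one transcript's TSS (uc008hoj.1) and inside an intron of another (uc012bmb.1).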
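For anyone else who runs into this, here is a small base-R check of the two rows in my output. It is only a sketch: the coordinates are copied from the output above, the data frame and column names are my own, and it uses a plain closed-interval overlap test rather than anything from annotatr.

```r
# Query region from the output (1-based, inclusive)
region_start <- 43510901
region_end   <- 43511100

# The two annotation rows reported for Got1, one per transcript
annots <- data.frame(
  type  = c("mm10_genes_1to5kb", "mm10_genes_introns"),
  tx_id = c("uc008hoj.1", "uc012bmb.1"),
  start = c(43508435, 43507994),
  end   = c(43512434, 43515680),
  stringsAsFactors = FALSE
)

# Two closed intervals overlap when each one starts before the other ends
annots$overlaps <- annots$start <= region_end & annots$end >= region_start

print(annots)
```

Both rows overlap the region, and each row belongs to a different transcript ID, which is why the gene shows up under two annotation types.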