Annotation by ChIPpeakAnno tool
Arunima

I am currently doing an analysis that requires annotating 100 kb regions from DNA-sequencing data. I used the ChIPpeakAnno tool to annotate them, and I have the following question.

Can anyone explain how the tool assigns a 100 kb region to a promoter region? I know the distance is calculated from the TSS. However, I do not understand how the promoter annotation was assigned to the 100 kb regions. I have attached a sample file here.

I used the following functions in R to do the annotation:


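library(ChIPpeakAnno)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

# annotate each region with both the nearest and the overlapping features
overlaps.anno <- annotatePeakInBatch(gr3, AnnotationData = annoData, output = "both")

# summarize how the regions are distributed over genomic element classes
aCR <- assignChromosomeRegion(gr3, nucleotideLevel = FALSE,
                              precedence = c("Promoters", "immediateDownstream",
                                             "fiveUTRs", "threeUTRs",
                                             "Exons", "Introns"),
                              TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene)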



Hi Arunima,

Can you post the output of your sessionInfo() and let us know which annoData you used? The output from annotatePeakInBatch should include additional information, such as distance information.

I suggest you set PeakLocForDistance = "endMinusStart" for strand-specific annotation. With this parameter setting, the end of the peak will be used for calculating the distance to features on the plus strand, and the start of the peak will be used for calculating the distance to features on the minus strand.
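
To make the strand-aware choice concrete, here is a minimal base-R sketch of which peak coordinate each PeakLocForDistance setting would pick. The helper `peak_reference` is hypothetical and written only for illustration; it is not the package's internal code, the "middle" case is simplified, and treating unstranded features like minus-strand ones is an assumption of this sketch.

```r
# Hypothetical helper (not part of ChIPpeakAnno): returns the peak coordinate
# that serves as the reference point for the distance to a feature.
peak_reference <- function(peak_start, peak_end, feature_strand,
                           mode = c("start", "middle", "end", "endMinusStart")) {
  mode <- match.arg(mode)
  switch(mode,
    start  = peak_start,
    middle = (peak_start + peak_end) / 2,  # simplified midpoint, no rounding
    end    = peak_end,
    # strand-aware choice: peak end for plus-strand features,
    # peak start otherwise (assumption: "*" is handled like "-")
    endMinusStart = if (feature_strand == "+") peak_end else peak_start
  )
}

# A made-up 100 kb region next to a plus-strand and a minus-strand feature
ref_plus  <- peak_reference(1000, 101000, "+", "endMinusStart")
ref_minus <- peak_reference(1000, 101000, "-", "endMinusStart")
```

In the real call, the setting would be passed as annotatePeakInBatch(gr3, AnnotationData = annoData, output = "both", PeakLocForDistance = "endMinusStart"), provided the installed version of the package supports that value.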

Please type ?annotatePeakInBatch for a more detailed description of the other parameters.

Best regards,

Julie


Hi Julie

I tried to set PeakLocForDistance = "endMinusStart", but it throws an error. It only allows me to use "start", "middle", or "end". It does not give me the option to use "endMinusStart".

Thanks, Arunima


Hi Arunima,

Best regards,

Julie


Hi Julie

Thank you so much for your response. I was able to update R and re-install the package. I used the following code to annotate the 100 kb regions, and the graph is attached below.

genomicElementDistribution(gr3, TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene, promoterRegion = c(upstream = 2000, downstream = 500), geneDownstream = c(upstream = 0, downstream = 2000))

I have the following questions regarding the graph below. The regions have been grouped into 3 different classes: gene level, exon/intron/intergenic, and exon level.

1. Why are they grouped into 3 different classes?
2. How is gene level defined?
3. How is exon/intron/intergenic defined?
4. How is exon level defined?

The gene level is further grouped into promoter, downstream, and gene body.

5. What does gene body mean?

6. What does downstream mean?

I just have one last question. How does the tool annotate the peaks associated with the promoter region of a gene? I am trying to annotate 100 kb regions. In the figure below, the FGR gene has been annotated as promoter for the peak. However, the peak includes not only the promoter region but also other regions.

Thanks, Arunima


Hi Arunima,

All the annotation information and definitions are provided by the input. For example, the TxDb you provided contains the definitions of exons, introns, genes, and TSSs. The other parameters, such as promoterRegion and geneDownstream, specify the promoter region, the downstream region, and the gene body.

In your example, you defined the promoter region as the region between 2000 bp upstream and 500 bp downstream of the TSS.
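
As a rough illustration of that definition, the base-R sketch below builds the promoter window around a TSS and checks whether a 100 kb region overlaps it. The helpers `promoter_window` and `overlaps` and all coordinates are made up for this example. The exact boundary convention used by the package may differ by a base, and the idea that a region is labeled "promoter" as soon as it overlaps the window is an assumption about overlap-based assignment, not a statement of the package's internals.

```r
# Hypothetical helper: strand-aware promoter window around a TSS.
# "Upstream" points to lower coordinates on "+" and higher coordinates on "-".
promoter_window <- function(tss, strand, upstream = 2000, downstream = 500) {
  if (strand == "+") {
    c(start = tss - upstream, end = tss + downstream)
  } else {
    c(start = tss - downstream, end = tss + upstream)
  }
}

# TRUE when two closed intervals share at least one base
overlaps <- function(a, b) {
  unname(a["start"] <= b["end"] && b["start"] <= a["end"])
}

region   <- c(start = 27900000, end = 28000000)            # made-up 100 kb region
promoter <- promoter_window(tss = 27950000, strand = "-")  # made-up TSS

hits_promoter  <- overlaps(region, promoter)
region_width   <- unname(region["end"] - region["start"] + 1)
promoter_width <- unname(promoter["end"] - promoter["start"] + 1)
```

Here `hits_promoter` is TRUE even though the promoter window is only a small fraction of the region's width. Under an overlap-based assignment, that would be why a 100 kb region can carry a promoter label while also covering exons, introns, and intergenic sequence.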

Best regards,

Julie


Hello Dr. Julie

Thank you so much for your response. It is indeed a very helpful tool for annotation. The attached table gives a much clearer understanding of the annotation.

Arunima