Question: ChIPpeakAnno, MACS format annotation
2.1 years ago, Khademul Islam wrote:

Hi,

I have just installed the latest ChIPpeakAnno and tried the example code and data from the vignette, but I got an error. I get the same error with my own data. How can I solve this?

Another question: when it annotates peaks to the nearest TSS, does it use the summit or the start position from the MACS file?


https://bioconductor.org/packages/devel/bioc/vignettes/ChIPpeakAnno/inst/doc/ChIPpeakAnno.html

macs <- system.file("extdata", "MACS_peaks.xls", package="ChIPpeakAnno")

macsOutput <- toGRanges(macs, format="MACS")

duplicated or NA names found. Rename all the names by numbers.

Many thanks,

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 24 (Workstation Edition)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8   
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8  
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                
 [9] LC_ADDRESS=C               LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C      

attached base packages:
 [1] stats4    parallel  grid      stats     graphics  grDevices utils   
 [8] datasets  methods   base    

other attached packages:
 [1] EnsDb.Hsapiens.v75_2.1.0 ensembldb_1.6.2          GenomicFeatures_1.26.0 
 [4] AnnotationDbi_1.36.0     Biobase_2.34.0           ChIPpeakAnno_3.8.9     
 [7] VennDiagram_1.6.17       futile.logger_1.4.3      GenomicRanges_1.26.1   
[10] GenomeInfoDb_1.10.1      Biostrings_2.42.1        XVector_0.14.0         
[13] IRanges_2.8.1            S4Vectors_0.12.1         BiocGenerics_0.20.0    

Answer: ChIPpeakAnno, MACS format annotation
2.1 years ago, Ou, Jianhong (United States) wrote:

Hi,

Thanks for selecting ChIPpeakAnno as your annotation tool.

Regarding your first question: that is a warning, not an error. I am considering changing it to a message. It tells you that the function could not find peak names, or that there are duplicated peak names. In that case the toGRanges function automatically assigns a name to each peak.
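As a rough illustration of what that message means, here is a base R sketch. It is not the actual toGRanges code; the example names and the `peak%03d` numbering scheme are made up for illustration.

```r
# Hypothetical peak names: one duplicate and one missing value.
peak_names <- c("peak_a", "peak_b", "peak_a", NA)

# The condition the message describes: any duplicated or NA name.
needs_rename <- anyDuplicated(peak_names) > 0 || anyNA(peak_names)

if (needs_rename) {
  message("duplicated or NA names found. Rename all the names by numbers.")
  # Made-up numbering scheme; the names the package assigns may look different.
  peak_names <- sprintf("peak%03d", seq_along(peak_names))
}

print(peak_names)
```

After the rename every peak has a unique, non-missing name, which is why the message can usually be ignored unless the original names are needed later.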

By default, when it annotates peaks to the nearest TSS, it uses the peak start position for the calculation.
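To see why the choice of reference point matters, here is a small base R sketch. All coordinates are hypothetical, the summit is assumed to be an absolute position, and the sign convention (peak location minus TSS) is an assumption for illustration only.

```r
# Hypothetical coordinates, for illustration only.
peak_start  <- 1000L
peak_summit <- 1450L   # absolute summit position (assumed)
tss         <- 1500L   # TSS of a hypothetical plus-strand gene

# Signed distance measured from the peak start.
dist_from_start <- peak_start - tss

# The same distance measured from the summit instead.
dist_from_summit <- peak_summit - tss

print(c(start = dist_from_start, summit = dist_from_summit))
```

For a wide peak the two distances can differ by hundreds of bases. The annotatePeakInBatch help page documents a PeakLocForDistance argument for choosing the reference point; check it for the options available in your version.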

Let me know if you still have any question.


Hi,

I am trying to make a custom annotation file to use with ChIPpeakAnno. I am starting with an Ensembl GTF file. The following command gives the error: duplicated or NA names found. Rename all the names by numbers.

annoData <- toGRanges(gff, format="GFF")

Which part of the GTF file does it not like?

If I run annotatePeakInBatch using this file:

annotatedPeak <- annotatePeakInBatch(myPeakList=peaks, AnnotationData=annoData, ignore.strand=TRUE)

I get the error:

Error in `rownames<-`(`*tmp*`, value = c("(-73.9,5e+03]", "(5e+03,9.99e+03]", : invalid rownames length
In addition: Warning message:
In annotatePeakInBatch(myPeakList = peaks, AnnotationData = annoData, : not all the seqnames of myPeakList is in the AnnotationData.

Could someone please explain what this means and what I need to change?

Thank you!

Reply written 3 months ago by Lucy

Hi,

You mentioned that you downloaded the annotation file in GTF format from Ensembl. If so, calling toGRanges with format = "GFF" is not correct, since the GTF format differs from the GFF format. To keep your code unchanged, could you please download the annotation file in GFF format instead? Alternatively, assuming you are interested in human gene annotation, you can get the annotation with the following code.

library(EnsDb.Hsapiens.v86)
annoData <- toGRanges(EnsDb.Hsapiens.v86, feature="gene")

Best regards, Julie

Reply written 3 months ago by Julie Zhu

Thank you Julie.

I wasn't sure whether I could use the GFF option as Ensembl states that "The GTF (General Transfer Format) is identical to GFF version 2" https://www.ensembl.org/info/website/upload/gff.html

I have a matched RNA-seq dataset for which I used the Ensembl GTF file for annotation, so I would like to use the exact same annotation version for my peak data. If I download the equivalent GFF file, does this contain all of the same information as the GTF file?

Reply written 3 months ago by Lucy

Lucy,

Thanks for the clarification!

Could you please post a few lines of the GTF annotation file you used for analyzing your RNA-seq dataset? Thanks!

Best regards,

Julie

Reply written 3 months ago by Julie Zhu
#!genome-build GRCh38.p12
#!genome-version GRCh38
#!genome-date 2013-12
#!genome-build-accession NCBI:GCA_000001405.27
#!genebuild-last-updated 2018-07
chr1    havana  gene    11869   14409   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";
chr1    havana  transcript  11869   14409   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; tag "basic"; transcript_support_level "1";
chr1    havana  exon    11869   12227   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00002234944"; exon_version "1"; tag "basic"; transcript_support_level "1";
chr1    havana  exon    12613   12721   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00003582793"; exon_version "1"; tag "basic"; transcript_support_level "1";
chr1    havana  exon    13221   14409   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "3"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00002312635"; exon_version "1"; tag "basic"; transcript_support_level "1";
chr1    havana  transcript  12010   13670   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-201"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; tag "basic"; transcript_support_level "NA";
chr1    havana  exon    12010   12057   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-201"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; exon_id "ENSE00001948541"; exon_version "1"; tag "basic"; transcript_support_level "NA";
chr1    havana  exon    12179   12227   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; exon_number "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-201"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; exon_id "ENSE00001671638"; exon_version "2"; tag "basic"; transcript_support_level "NA";
Reply written 3 months ago by Lucy

Sorry that isn't very easy to read! I would be happy to send you the file if it is easier.

Reply written 3 months ago by Lucy

Lucy,

Please send me the GTF file (julie.zhu@umassmed.edu). Thanks!

BTW, I just noticed that you continued an old thread that is about the MACS format. Could you please start a new thread titled "ChIPpeakAnno::toGRanges GTF format" instead, to facilitate future searches? Thanks!

Best,

Julie

Reply written 3 months ago by Julie Zhu
Answer: ChIPpeakAnno, MACS format annotation
3 months ago, Julie Zhu (United States) wrote:

Lucy, please try the following code snippet to import the GTF file hg38_200000.gtf.

library(refGenome)

gtf = ensemblGenome()

read.gtf(gtf, filename = "hg38_200000.gtf")

genes = gtf@ev$genes[ ,c("geneid","genename", "start", "end", "strand", "seqid")]

annoData <- toGRanges(genes, format="others", colNames=c("names", "gene_name", "start", "end", "strand", "space"))

# Convert peaks file to GRanges object

peaks <- toGRanges("peaks_counts.bed", format="BED", header=FALSE)

peaks <- peaks[width(peaks) >0]

annotatedPeak <- annotatePeakInBatch(myPeakList=peaks, AnnotationData=annoData, ignore.strand=TRUE)
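
For readers who cannot install refGenome, the gene table could in principle be built with base R alone. This is only a sketch under assumptions: the two input lines are shortened copies of the GTF excerpt posted earlier in this thread, `get_attr` is a hypothetical helper, and the output column names simply mirror the colNames used in the toGRanges call above.

```r
# Two GTF lines from the excerpt in this thread (attribute lists shortened).
gtf_lines <- c(
  'chr1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1";',
  'chr1\thavana\texon\t11869\t12227\t.\t+\t.\tgene_id "ENSG00000223972"; gene_version "5"; exon_number "1"; gene_name "DDX11L1";'
)

# Read the nine tab-separated GTF columns; quote = "" keeps the attribute quotes.
gtf <- read.delim(text = gtf_lines, header = FALSE, quote = "",
                  comment.char = "#", stringsAsFactors = FALSE,
                  col.names = c("seqid", "source", "feature", "start", "end",
                                "score", "strand", "frame", "attributes"))

# Hypothetical helper: extract one quoted attribute value from column 9.
get_attr <- function(attributes, key) {
  pattern <- paste0('.*\\b', key, ' "([^"]*)".*')
  found <- grepl(pattern, attributes, perl = TRUE)
  ifelse(found, sub(pattern, "\\1", attributes, perl = TRUE), NA_character_)
}

# Keep only gene-level rows so that each gene ID appears once.
genes <- gtf[gtf$feature == "gene", ]

gene_table <- data.frame(
  names     = get_attr(genes$attributes, "gene_id"),
  gene_name = get_attr(genes$attributes, "gene_name"),
  start     = genes$start,
  end       = genes$end,
  strand    = genes$strand,
  space     = genes$seqid,
  stringsAsFactors = FALSE
)

print(gene_table)
```

With a real file, `text = gtf_lines` would be replaced by the file path, and the resulting table could then be passed to toGRanges with format="others" in the same way as above.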

Best regards, Julie
