ChIPseeker/Question about ''annoatePeak' function and for Bioconductor AnnotationData Packages for L.japonicus
2
0
Entering edit mode
egbastakis • 0
@egbastakis-17559
Last seen 3.4 years ago

Hello,

I am currently trying to perform part of my ChIPseq analysis with the ChIPseeker program. 

I would like to use the ''annotatePeak'' function but for that apart for the bed file with my peaks (derived from MACS), which I have it, I need two more files, the ''MeSH.Miy.eg.db'', which I also  have it (from https://www.bioconductor.org/packages/devel/data/annotation/), and a ``TxDb.Lotus japonicus.knownGene`` file that I cannot find it somewhere.

Could anyone help with that?

Best,

Manolis

commands:

This is an example of the command that I should follow (from the manual of the ChIPseeker):

>peakAnno <- annotatePeak(files[[4]], tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="org.Hs.eg.db")

# txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

I would like to do the same but with Lotus japonicus.

chipseeker Bioconductor AnnotationData Packages Lotus japonicus ChIPseq • 2.4k views
ADD COMMENT
3
Entering edit mode
@james-w-macdonald-5106
Last seen 13 hours ago
United States

The TxDb packages contain information about where the genes for your species are found in the genome. You presumably have a genome, as you would have needed one to get this far. At this point you will need a GFF file that describes the genomic locations, based on the genome you used. You can then use makeTxDbFromGFF in the GenomicFeatures package to make the TxDb.

ADD COMMENT
0
Entering edit mode

Hi James,

Many thanks, it did work great with your advise.

Nevertheless, I stuck again little bit later...

After I executed the annotatePeak command this is what I got:

-----------------------------------------

History session starts:

> peakAnno <- annotatePeak(files[[1]], tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="MeSH.Miy.eg.db")

>> loading peak file... 2018-09-28 11:09:22 

>> preparing features information... 2018-09-28 11:09:23 

>> identifying nearest features... 2018-09-28 11:09:24 

>> calculating distance from peak to TSS... 2018-09-28 11:09:25 

>> assigning genomic annotation... 2018-09-28 11:09:25 

>> adding gene annotation... 2018-09-28 11:09:35 

>> assigning chromosome lengths 2018-09-28 11:09:35 

>> done... 2018-09-28 11:09:35 

Warning message:

In annotatePeak(files[[1]], tssRegion = c(-3000, 3000), TxDb = txdb,  :

  Unknown ID type, gene annotation will not be added...

> plotAnnoPie(peakAnno)

> plotAnnoBar(peakAnno)

> plotAnnoPie(peakAnno)

> vennpie(peakAnno)

> upsetplot(peakAnno)

> upsetplot(peakAnno, vennpie=TRUE)

History session ends:

--------------------------------------------------

So, I have gotten this a warning message, which actually tells that I have no list of my annotated peaks! 

Though, I was able afterwords to execute several commands (), in order to produce the corresponding graphs. 

I do need this list as a csv file to see the affiliation of my peaks with the corresponding genes but also for the continuation of my analysis as well (when I start using the package ''ReactomePA'' and the command 

>pathway1 <- enrichPathway(as.data.frame(peakAnno)$geneId) from  of the ChIPseeker manual till the end of it).

Could you please help me with that issue as well?

Greetings,

Manolis

P.S.: Please see below part of the gff3, which I have used for my annotation: 

          V1                   V2   V3   V4   V5  V6 V7 V8
1 Lj3.0_chrC       protein_coding gene  519 1371   .  .  .
2 Lj3.0_chrC       protein_coding mRNA  519 1371   .  -  .
3 Lj3.0_chrC       protein_coding exon  519 1371 100  -  .
4 Lj3.0_chrC       protein_coding  CDS  520 1365   .  -  0
5 Lj3.0_chrC processed_transcript gene 1323 1431   .  .  .
6 Lj3.0_chrC processed_transcript mRNA 1323 1431   .  +  .
                                                                                               V9
1       ID=Ljchlorog3v0000020;Name=NODE_65561_length_799_cov_134.377975.path1;Type=protein_coding
2 ID=Ljchlorog3v0000020.1;Parent=Ljchlorog3v0000020;Name=Ljchlorog3v0000020.1;Type=protein_coding
3                  ID=Ljchlorog3v0000020.1.exon.1;Parent=Ljchlorog3v0000020.1;Type=protein_coding
4                   ID=Ljchlorog3v0000020.1.CDS.1;Parent=Ljchlorog3v0000020.1;Type=protein_coding
5                  ID=Ljchlorog3v0000030;Name=NODE_120091_length_55_cov_10.472727.path1;Type=NULL
6           ID=Ljchlorog3v0000030.1;Parent=Ljchlorog3v0000030;Name=Ljchlorog3v0000030.1;Type=NULL

ADD REPLY
0
Entering edit mode

Hi James,

Many thanks, it did work great with your advise.

Nevertheless, I stuck again little bit later...

After I executed the annotatePeak command this is what I got:

-----------------------------------------

History session starts:

> peakAnno <- annotatePeak(files[[1]], tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="MeSH.Miy.eg.db")

>> loading peak file... 2018-09-28 11:09:22 

>> preparing features information... 2018-09-28 11:09:23 

>> identifying nearest features... 2018-09-28 11:09:24 

>> calculating distance from peak to TSS... 2018-09-28 11:09:25 

>> assigning genomic annotation... 2018-09-28 11:09:25 

>> adding gene annotation... 2018-09-28 11:09:35 

>> assigning chromosome lengths 2018-09-28 11:09:35 

>> done... 2018-09-28 11:09:35 

Warning message:

In annotatePeak(files[[1]], tssRegion = c(-3000, 3000), TxDb = txdb,  :

  Unknown ID type, gene annotation will not be added...

> plotAnnoPie(peakAnno)

> plotAnnoBar(peakAnno)

> plotAnnoPie(peakAnno)

> vennpie(peakAnno)

> upsetplot(peakAnno)

> upsetplot(peakAnno, vennpie=TRUE)

History session ends:

--------------------------------------------------

So, I have gotten this a warning message, which actually tells that I have no list of my annotated peaks! 

Though, I was able afterwords to execute several commands (), in order to produce the corresponding graphs. 

I do need this list as a csv file to see the affiliation of my peaks with the corresponding genes but also for the continuation of my analysis as well (when I start using the package ''ReactomePA'' and the command 

>pathway1 <- enrichPathway(as.data.frame(peakAnno)$geneId) from  of the ChIPseeker manual till the end of it).

Could you please help me with that issue as well?

Greetings,

Manolis

P.S.: Please see below part of the gff3, which I have used for my annotation: 

          V1                   V2   V3   V4   V5  V6 V7 V8
1 Lj3.0_chrC       protein_coding gene  519 1371   .  .  .
2 Lj3.0_chrC       protein_coding mRNA  519 1371   .  -  .
3 Lj3.0_chrC       protein_coding exon  519 1371 100  -  .
4 Lj3.0_chrC       protein_coding  CDS  520 1365   .  -  0
5 Lj3.0_chrC processed_transcript gene 1323 1431   .  .  .
6 Lj3.0_chrC processed_transcript mRNA 1323 1431   .  +  .
                                                                                               V9
1       ID=Ljchlorog3v0000020;Name=NODE_65561_length_799_cov_134.377975.path1;Type=protein_coding
2 ID=Ljchlorog3v0000020.1;Parent=Ljchlorog3v0000020;Name=Ljchlorog3v0000020.1;Type=protein_coding
3                  ID=Ljchlorog3v0000020.1.exon.1;Parent=Ljchlorog3v0000020.1;Type=protein_coding
4                   ID=Ljchlorog3v0000020.1.CDS.1;Parent=Ljchlorog3v0000020.1;Type=protein_coding
5                  ID=Ljchlorog3v0000030;Name=NODE_120091_length_55_cov_10.472727.path1;Type=NULL
6           ID=Ljchlorog3v0000030.1;Parent=Ljchlorog3v0000030;Name=Ljchlorog3v0000030.1;Type=NULL

ADD REPLY
0
Entering edit mode
@james-w-macdonald-5106
Last seen 13 hours ago
United States

The MeSH.Miy.eg.db package isn't a conventional OrgDb package, which is what ChIPseeker is expecting. That package only has one table that maps gendoo IDs to MESH IDs, so won't be particularly helpful.

The IDs you have in your GFF appear to be LotusBase IDs, so for ChIPseeker to annotate things you will probably need an OrdDb that maps those IDs to whatever annotation ChIPseeker is normally parsing from the OrgDb package. Unfortunately there doesn't seem to be much data out there for L. japonicus (NCBI has somewhere around 780 Gene IDs, but probably more RefSeq or GI numbers), so you are in the position where there isn't much existing infrastructure, so to a large extent you are on your own.

But do note that the annoDb argument is optional, and according to the help page all it adds is the Gene ID, Symbol, and Gene Name for the nearest gene. You could probably get that yourself by hand.

ADD COMMENT

Login before adding your answer.

Traffic: 702 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6