Hi Federico,
The feature column is the rownames of annotationData. so you could
try:
rownames(RDucscmm9)<-RDucscmm9$tx_name
before annotatePeakInBatch.
Yours sincerely,
Jianhong Ou
jianhong.ou@umassmed.edu<mailto:jianhong.ou@umassmed.edu>
On Mar 29, 2012, at 3:12 PM, Federico Marini wrote:
Hi Jianhong,
I tried the intermediate step you suggested and it actually returned
no error, just the warning message:
Warning message:
In annotatePeakInBatch(peaksRDt[1:10, ], AnnotationData = RDucscmm9,
:
Please use select instead of multiple!
but this is minor. So, first of all thank you one more time for
solving this issue. Just one more thing, and then I think I really am
good to go: the table I downloaded should ideally report the NM_ and
NR_ identifiers for the transcripts, but in the final output what I
see is this (in bold the column of interest)
RangedData with 14 rows and 9 value columns across 1 space
space ranges | peak
strand
<factor> <iranges> | <character>
<character>
MACS_peak_1 01511 chr1 [3024414, 3024497] | MACS_peak_1
-
MACS_peak_10 01515 chr1 [4346000, 4346141] | MACS_peak_10
-
MACS_peak_2 01511 chr1 [3115427, 3115503] | MACS_peak_2
-
MACS_peak_3 01511 chr1 [3145725, 3145836] | MACS_peak_3
-
MACS_peak_4 01511 chr1 [3332349, 3332698] | MACS_peak_4
-
MACS_peak_5 01513 chr1 [3566535, 3566719] | MACS_peak_5
-
MACS_peak_6 01512 chr1 [3726359, 3726485] | MACS_peak_6
-
MACS_peak_7 01512 chr1 [3795201, 3795271] | MACS_peak_7
-
MACS_peak_8 01515 chr1 [4204115, 4204357] | MACS_peak_8
-
MACS_peak_9 01515 chr1 [4283090, 4283190] | MACS_peak_9
-
MACS_peak_10 01514 chr1 [4346000, 4346141] | MACS_peak_10
-
MACS_peak_4 01512 chr1 [3332349, 3332698] | MACS_peak_4
-
MACS_peak_5 01512 chr1 [3566535, 3566719] | MACS_peak_5
-
MACS_peak_9 01514 chr1 [4283090, 4283190] | MACS_peak_9
-
feature start_position end_position
insideFeature
<character> <numeric> <numeric>
<character>
MACS_peak_1 01511 01511 3195985 3205713
downstream
MACS_peak_10 01515 01515 4333588 4350395
inside
MACS_peak_2 01511 01511 3195985 3205713
downstream
MACS_peak_3 01511 01511 3195985 3205713
downstream
MACS_peak_4 01511 01511 3195985 3205713
upstream
MACS_peak_5 01513 01513 3638392 3648985
downstream
MACS_peak_6 01512 01512 3204563 3661579
upstream
MACS_peak_7 01512 01512 3204563 3661579
upstream
MACS_peak_8 01515 01515 4333588 4350395
downstream
MACS_peak_9 01515 01515 4333588 4350395
downstream
MACS_peak_10 01514 01514 4280927 4399322
inside
MACS_peak_4 01512 01512 3204563 3661579
inside
MACS_peak_5 01512 01512 3204563 3661579
inside
MACS_peak_9 01514 01514 4280927 4399322
inside
distancetoFeature shortestDistance
fromOverlappingOrNearest
<numeric> <numeric>
<character>
MACS_peak_1 01511 181299 171488
NearestStart
MACS_peak_10 01515 4395 4254
NearestStart
MACS_peak_2 01511 90286 80482
NearestStart
MACS_peak_3 01511 59988 50149
NearestStart
MACS_peak_4 01511 -126636 126636
NearestStart
MACS_peak_5 01513 82450 71673
NearestStart
MACS_peak_6 01512 -64780 64780
NearestStart
MACS_peak_7 01512 -133622 133622
NearestStart
MACS_peak_8 01515 146280 129231
NearestStart
MACS_peak_9 01515 67305 50398
NearestStart
MACS_peak_10 01514 53322 53181
Overlapping
MACS_peak_4 01512 329230 127786
Overlapping
MACS_peak_5 01512 95044 94860
Overlapping
MACS_peak_9 01514 116232 2163
Overlapping
So, no(t yet) the TranscriptID as I was planning (while it appears
regularly ENSMUSG00000064842 or others when I use the TSS you
provided). Is some step still missing? I also tried playing around
with the vals and columns parameters in the transcripts step, but it
didn't work.
Best regards,
Federico
On 29/03/12 18:52, Ou, Jianhong wrote:
Hi Federico,
Could you please have a try
RDucscmm9$strand<-as(RDucscmm9$strand,"factor")
before using annotatePeakInBatch, and tell me how about the results.
Yours sincerely,
Jianhong Ou
jianhong.ou@umassmed.edu<mailto:jianhong.ou@umassmed.edu>
On Mar 29, 2012, at 12:42 PM, Zhu, Lihua (Julie) wrote:
Dear Federico,
The following code works. So RDucscmm9<- as(ucscmm9, "RangedData")
should work as well. It complains about ranges being NA.
Jianhong will help you to debug and get back to you. Thanks!
gr <-
GRanges(seqnames =
Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
ranges =
IRanges(1:10, width = 10:1, names = head(letters,10)),
strand =
Rle(strand(c("-", "+", "*", "+", "-")),
c(1, 2, 2, 3, 2)),
score = 1:10,
GC = seq(1, 0, length=10))
as(gr, RangedData)
Best regards,
Julie
On 3/29/12 11:27 AM, "Federico Marini" <f.marini@imb- mainz.de<x-msg:="" 458="" f.marini@imb-mainz.de="">> wrote:
Dear Julie,
Thanks first of all for the positive and prompt answer.
I just have one additional question: when retrieving the TranscriptDb
object, these are the steps I took.
source("
http://bioconductor.org/biocLite.R"
<http: bioconductor.org="" bioclite.r=""> )
biocLite("GenomicFeatures")
library(GenomicFeatures)
supportedUCSCtables()
mm9KG <-makeTranscriptDbFromUCSC(genome = "mm9",tablename =
"knownGene")
ucscmm9<- transcripts(mm9KG)
annotPeaksUCSC<-annotatePeakInBatch(peaksRDt[1:10,],AnnotationData=ucs
cmm9,output="both",multiple=FALSE,maxgap=0)
This returns an error since annotatePeakInBatch requires a RangedData
object for the annotation, but my "ucscmm9" is a GRanges object;
still,
http://wiki.fhcrc.org/bioc/Transcript_Annotation_API_Spec
specified that I would have a RangedData object as the output for the
transcripts function - but apparently it didn't work like that.
I thought one way to make it work is to us
RDucscmm9<- as (ucscmm9, "RangedData")
but I get this as an output
Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE =
"IRanges") :
solving row 1: range cannot be determined from the supplied
arguments (too many NAs)
I tried googling around for this, but sadly no solution was found
(yet). Is there a tested and smart way to do this - and get something
which is under all aspects comparable to the TSS.mouse.NCBIM37
annotation?
Thanks again for the attention.
Best regards,
Federico
On 29/03/12 16:43, Zhu, Lihua (Julie) wrote:
Re: ChIPpeakAnno - Request of information Dear Federico,
Yes. You can ChIPpeakAnno to annotate your peaks with refseq
genes/transcripts. You could also use any custom annotation file you
have.
One option for obtaining the refseq genes/transcripts would be using
GenomicFeatures package. I am ccing Bioconductor listserv for others
to chime in.
library(GenomicFeatures)
supportedUCSCtables()
Best regards,
Julie
On 3/29/12 3:58 AM, "Federico Marini" <f.marini@imb- mainz.de<x-msg:="" 458="" f.marini@imb-mainz.de="">> wrote:
Dear Prof. Zhu,
My name is Federico Marini, and I started recently my activity as
PhD student at the Institute of Molecular Biology in Mainz, Germany.
I contact you to request additional information regarding one of
your published works, namely "ChIPpeakAnno: a Bioconductor package to
annotate ChIP-seq and ChIP-chip data", which I had the pleasure to
read in detail. I would like to get perform the annotation referring
to the refseq genes/transcripts, instead of the ensembl ones. Could it
be possible to obtain a similar object for this, and if so, what could
be the best/smartest way to do it?
Since it could be a very precious resource for the needs of our lab,
this is why I chose to contact you directly.
Thanks in advance for your time and attention.
Best regards,
Federico Marini
--
Federico Marini
Signaling to Chromatin Laboratory
Institute of Molecular Biology gGmbH (IMB)
http://www.imb-mainz.de<http: www.imb-mainz.de=""/>
Tel: +49 (0) 6131 39 21462
[[alternative HTML version deleted]]