Metagenomic analysis: Aligning ChIPseq data on TSS
4
3
Entering edit mode
@patrickschordert-6745
Last seen 9.6 years ago
United States

Hi everybody,

I am fairly new to NGS analysis, so this might be an easy question to answer, but I�ve been desperately trying to get it to work with no success for several days now.

I have three ChIP-seq datasets on which I am trying to do metagenomic analysis. Basically, I would like to average and plot the patterns of methylation marks around given features, in my case transcription start sites (TSS). For now, I have done the following steps:
- Read in my bam files (that correspond to peaks) into variables called: aligns1, 2, 3
- Reformatted these into coverage objects called: cov1, 2, 3
- Gotten the TSS coordinates using biomaRt and saved these in an variable called: martReply

This is were I get stuck. I have tried to follow the tutorial found on the pdf "EMBL course on Short Read analysis with Bioconductor: An exercice with coverage vectors� (23 June 2009), but there seems like some functions are deprecated since it was written. I get the following error:

Error in as.vector(subseq(coverage, winCentres[i] + left, winCentres[i] +  :
error in evaluating the argument 'x' in selecting a method for function 'as.vector': Error in (function (classes, fdef, mtable)  :
unable to find an inherited method for function �subseq� for signature �"SimpleRleList"�

I would appreciate any help with this.
Thanks a lot in advance,

Patrick

# Read in alignments from bam files
aligns1 <- readGAlignmentsFromBam(samplespath[1])
aligns2 <- readGAlignmentsFromBam(samplespath[2])
aligns3 <- readGAlignmentsFromBam(samplespath[3])

# Convert to coverage
cov1 <- coverage(aligns1)
cov2 <- coverage(aligns2)
cov3 <- coverage(aligns3)

# Get TSS from Biomart
library("biomaRt")
library("doBy")
mart <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
martReply <- getBM( attributes=c( "chromosome_name", "transcript_start", "transcript_end", "strand" ), mart=mart )
head(martReply)
tss <- ifelse(martReply$strand == 1, martReply$transcript_start, martReply$transcript_end)

deltaConvolve <- function(coverage, winCentres, revStrand,left, right){
result = rep( 0, right - left + 1 )
for(i in seq(along=winCentres)){
   range = IRanges(winCentres[i]+left, winCentres[i]+right)
   v = as.vector(Views(coverage, range))
   if(revStrand[i])
     v <- rev(v)
   result <- result+v   
}
return(result)
}

win = c(-500, 500)
H3K4me3 <- deltaConvolve(coverage=cov1, winCentres=tss, martReply$strand == -1, left=win[1], right=win[2])
metagenomics chipseq biomart shortread convert biomaRt • 2.0k views
ADD COMMENT
1
Entering edit mode
@herve-pages-1542
Last seen 18 hours ago
Seattle, WA, United States

Hi Patrick,

Not sure which version of R/Bioconductor you're using, you didn't show the output of sessionInfo(). Would be helpful to know. Anyway, please make sure you use the latest (i.e. R-3.1 and BioC 2.14).

The call to as.vector() inside the deltaConvolve() function maybe used to work but won't work with a recent version of Bioconductor because Views(coverage, range) is an RleViewsList object and as.vector() doesn't work on these objects (I'm actually surprised it ever did). If I understand correctly what the real intent of this line is

  v = as.vector(Views(coverage, range))

then it should rather be done with

  v = unlist((endoapply(coverage, `[`, range)), use.names=FALSE)

Hope this helps.

ADD COMMENT
1
Entering edit mode
@herve-pages-1542
Last seen 18 hours ago
Seattle, WA, United States

Thanks for sending your sessionInfo(). I took for granted that you were running exactly the code you showed in your original post but that doesn't seem to be the case (if that was the case, endoapply() would be called on an RleList, not an Rle()). We absolutely need to see what code you run exactly that produces the error you report in order to help. And this code should be reproducible. For example: 

library(GenomicAlignments)
bamfile <- system.file("extdata", "ex1.bam", package="Rsamtools")
gal1 <- readGAlignmentsFromBam(bamfile)
cov1 <- coverage(gal1)

deltaConvolve <- function(coverage, winCentres, revStrand,left, right)
{
  result = rep( 0, right - left + 1 )
  for(i in seq(along=winCentres)){
    range = IRanges(winCentres[i]+left, winCentres[i]+right)
    v = unlist((endoapply(coverage, `[`, range)), use.names=FALSE)
    if(revStrand[i])
      v <- rev(v)
    result <- result+v   
  }
  return(result)
}

Then:

> deltaConvolve(cov1, winCentres=c(100, 500), revStrand=c(TRUE, FALSE), left=-10, right=10)
numeric-Rle of length 42 with 32 runs
  Lengths:  1  1  2  1  1  1  1  2  1  1  1 ...  3  2  2  1  1  1  1  2  1  2
  Values : 65 66 65 67 66 67 66 67 68 69 75 ... 52 51 55 51 53 56 59 58 59 61

Works for me :-)

But maybe you want to reconsider your approach. I'm not familiar with the tutorial you're following (EMBL course on Short Read analysis with Bioconductor: An exercice with coverage vectors, from 23 June 2009), but it's possible that it's a little bit outdated and/or obsolete. So many things have changed in Bioconductor in 5 years. There are a lot of good packages these days that will do the hard work for you. Did you have a look at them?

http://bioconductor.org/packages/release/BiocViews.html#___ChIPSeq

Hard for me to recommend one in particular. It all depends on what you want to do. If, for whatever reason, you absolutely need to stick to this tutorial from 2009, maybe you'll have better luck by contacting its author directly.

Cheers,

H.

ADD COMMENT
0
Entering edit mode
@patrickschordert-6745
Last seen 9.6 years ago
United States

Thanks Hervé,

You are probably right. I have tried your method but am still getting the following error message:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘endoapply’ for signature ‘"Rle"’

See below for the sessionInfo which I had forgotten to put.

R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

Bioconductor version 2.14 (BiocInstaller 1.14.2)

 

Thanks for any additional help, even if it is a whole different way of doing things.

 

 

ADD COMMENT
0
Entering edit mode
@patrickschordert-6745
Last seen 9.6 years ago
United States

Yes, you are absolutely right and I don't need to stick with this particular approach. The problem that I am facing is that there are so many different packages out there that all seem great, but which are hard to get around.

It might sound stupid, but I am simply trying to (for now) explore my data. I did the normalization (by the corresponding input) for each of my three samples and the peak calling via SPP. I know have bed files that correspond to peaks and I am trying to compare simple things like the width/length/height of my peaks for each conditions. I read them in as RangedData objects.

 

Now, I would like to intersect these with TSS that I have gotten from UCSC with the following commands:

library("biomaRt")

mart <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")

martReply <- getBM( attributes=c( "chromosome_name", "transcript_start", "transcript_end", "strand" ), mart=mart )

 

Ideally, this would allow me to 'visualize' the patterns of each mark over TSSs. I hope this is clear.

Any hints or specific packages you could point me to?

Thanks a lot for the help.

p.

ADD COMMENT
0
Entering edit mode

Have you looked at the ChIPpeakAnno or ChIPseeker packages? A good starting point is to go to their landing pages and to read their description (generally a short summary of their functionalities). If that sounds promising, then keep going and have a look at their vignettes. You might find that they provide tools for doing exactly what you are trying to do.

If you really want to stick to the "do it myself approach", it's going to be more work and will require that you learn the basics of many important containers like GRanges, GRangesList, TranscriptDb (renamed TxDb in BioC 3.0), RleList, RleViewsList, etc... and methods to operate on these objects.  It might be worth it. If you run into some problems or need help with a specific piece of code, please ask and try to be as specific as you can (and show the exact code you're having problem with + the output you get). Note that the use of RangedData objects is now discouraged. We recommend that people use GRanges objects instead if they can. This is one of the many things that have changed in the last 5 years. However, some popular packages dealing with ChIPseq data still use RangedData objects so it's OK in that context.

Hope this helps,

H.

ADD REPLY

Login before adding your answer.

Traffic: 812 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6