Dear Bioc gurus,
I am new to using R tools for ChIP-seq analysis and am seeking advice
on the best way to handle a data set I have. Below are the file formats
I have and what I would like to do with them:
1) Using Samtools, I created BED files (about 5 GB) from the BAM files
(3-4 GB).
2) I want to read the BED files (or the BAM files) into R.
3) I want to make quality-control plots (for example, the number of
duplicated reads across the samples; the ChIP-seq processing differed
for some of the samples, and I want to know what bias that introduces).
4) I want to retrieve specific genomic regions for exploration and for
visualization of reads/peaks in the context of genomic annotations (I
guess this means having the data in a format that I can work with using
GenomicRanges).
I am doing all this on a cluster with fairly good memory capacity
(about 18 GB, or at least I think that counts as 'good'). I went
through the mailing list and found some very useful discussions on
reading BED/BAM files:
https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2011-March/001900.html
https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2011-September/002242.html
I thought BED files would be easy to work with because they already
hold the data in a format that I understand (chromosome, start, end,
tags). I tried the 'import' function from rtracklayer, as suggested in
the discussions linked above, to read the BED file. However, it didn't
work because I ran out of memory.
From the discussions, it seems an alternative is to use Rsamtools to
read the BAM files. Before I try Rsamtools, I would be happy to get
some advice on whether I am on the right track with it, and on whether
any other packages/tools have built-in functions to achieve what I want
with the data.
Thanks for your time.
Sincerely,
Hari Easwaran