Question

chip-seq data

Hi, I am wondering how to prepare input files in a range format, such as a BED file, for the BayesPeak and chipseq packages, as well as input files for RNA-seq packages. Assume the BAM files have already been generated with BWA and samtools. Thanks. John

ChIPSeq chipseq BayesPeak • 2.7k views

John linux-user, 13.4 years ago

score 0 · Answer 1 · 2012-09-17

Hi John,

I would try the Rsamtools package. You'd need something like this (warning, untested code):

    library(Rsamtools)
    bamFile <- "path/to/Bamfile.bam"
    # Keep mapped reads only: unmapped reads have no position (pos is NA)
    p <- ScanBamParam(flag=scanBamFlag(isUnmappedQuery=FALSE),
                      what=c("rname", "strand", "pos", "qwidth"))
    bam <- scanBam(bamFile, param=p)[[1]]

BayesPeak accepts data.frames or RangedData objects. I would suggest the easiest thing to do is construct a RangedData:

    library(IRanges)
    # One range per read: leftmost mapped position plus read width
    IR <- IRanges(start=bam[["pos"]], width=bam[["qwidth"]])
    x <- RangedData(ranges=IR, strand=bam[["strand"]], space=bam[["rname"]])

chipseq accepts GRanges by preference:

    library(GenomicRanges)
    y <- GRanges(seqnames=bam[["rname"]], ranges=IR, strand=bam[["strand"]])

There may be a faster/cleverer way of doing it, but this should work.

Jonathan

Jonathan Cairns, 13.4 years ago

Hi Jonathan,

Thanks for your response and the code; it saves me a lot of time searching the web. However, if I build a table with Python or another scripting language and then load that table into R for the statistics, how should I decide the range (e.g. start and end) when I count the reads at each position across the chromosomes/genome? Can you give me more suggestions?

Thanks. Best, John

John linux-user, 13.4 years ago

Hi John,

I'm afraid I don't understand your question. It sounds like you are trying to bin the reads? This shouldn't be necessary, as both packages do this for you. Was that your intended query?

Jonathan

Jonathan Cairns, 13.4 years ago
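
For readers unsure what "binning the reads" means here, the following is a minimal Python sketch of fixed-width binning. It is an illustration only and is not claimed to be what BayesPeak or chipseq do internally; the function name `bin_reads`, the 100 bp bin width, and the toy read positions are all assumptions made for the example.

```python
from collections import Counter


def bin_reads(reads, bin_width=100):
    """Count reads per fixed-width bin, keyed by (chrom, bin_start).

    `reads` is an iterable of (chrom, start) pairs with 0-based starts.
    Each read is assigned to the single bin that contains its start.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = Counter()
    for chrom, start in reads:
        # Round the start down to the nearest multiple of bin_width.
        bin_start = (start // bin_width) * bin_width
        counts[(chrom, bin_start)] += 1
    return counts


# Toy input (hypothetical positions, not real data).
reads = [("chr1", 6557), ("chr1", 6580), ("chr1", 6601), ("chr10", 9454)]
counts = bin_reads(reads, bin_width=100)
for (chrom, bin_start), n in sorted(counts.items()):
    print(chrom, bin_start, bin_start + 100, n)
```

The point of the reply above still stands: if the per-read ranges are passed to the packages, this step does not need to be done by hand.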

Hi Jonathan,

Thanks for your response. I would just like to use Python instead of R to generate the IRanges data. I looked over the introductory material on RNA-seq data, and it seems that it simply counts the reads that overlap the annotated gene regions, as in the code below. I am wondering whether something similar happens for ChIP-seq data. Thanks.

John

    # txdb: a transcript annotation (TxDb) object, assumed to exist already
    gnModel <- exonsBy(txdb, "gene")
    counter <- function(fl, gnModel) {
        # renamed readGAlignments() in later Bioconductor releases
        aln <- readGappedAlignments(fl)
        strand(aln) <- "*"  # for strand-blind sample prep protocol
        # number of genes each read overlaps
        hits <- countOverlaps(aln, gnModel)
        # per gene, count only the reads that overlap exactly one gene
        counts <- countOverlaps(gnModel, aln[hits==1])
        names(counts) <- names(gnModel)
        counts
    }

John linux-user, 13.4 years ago
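
Since the poster would rather work in Python, the counting logic of the R snippet above can be sketched in plain Python roughly as follows. This is a simplified illustration under several assumptions: reads and exons are plain `(chrom, start, end)` tuples with 1-based inclusive coordinates, reads are treated as single ungapped intervals, strand is ignored, and the names `overlaps` and `count_unique_hits` are made up for the example. The nested loops are quadratic, so this is for understanding the idea rather than for genome-scale data.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two 1-based, inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def count_unique_hits(reads, genes):
    """Count reads per gene, using only reads that overlap exactly one gene.

    reads: list of (chrom, start, end) tuples.
    genes: dict mapping gene name -> list of (chrom, start, end) exons.
    """
    counts = {name: 0 for name in genes}
    for chrom, start, end in reads:
        # Genes with at least one exon overlapping this read.
        hit_genes = [
            name
            for name, exons in genes.items()
            if any(c == chrom and overlaps(start, end, s, e) for c, s, e in exons)
        ]
        # Ambiguous reads (two or more genes) and unassigned reads are dropped.
        if len(hit_genes) == 1:
            counts[hit_genes[0]] += 1
    return counts


# Toy annotation and reads (hypothetical coordinates).
genes = {
    "geneA": [("chr1", 100, 200), ("chr1", 300, 400)],
    "geneB": [("chr1", 380, 500)],
}
reads = [
    ("chr1", 150, 185),  # geneA only
    ("chr1", 390, 425),  # overlaps geneA and geneB, so it is dropped
    ("chr1", 450, 485),  # geneB only
    ("chr2", 150, 185),  # no gene on this chromosome
]
print(count_unique_hits(reads, genes))
```

Note that this only works because the regions are known in advance, which is the case for annotated genes in RNA-seq.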

Hi,

In RNA-seq, one knows where the regions of interest (i.e. exons) are, so binning is straightforward. No such database of "regions of interest" exists for ChIP-seq; hence the need for peak-calling algorithms to find them.

IRanges/RangedData/GRanges etc. are internal R objects, so you'll have a hard time constructing such a thing in Python. If disk space is a major issue, you could try creating a .bed file from your .bam file, and then read that in with e.g. read.bed() in BayesPeak, or import() in rtracklayer.

J

Jonathan Cairns, 13.4 years ago
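
As a rough illustration of the BAM-to-BED route suggested above, here is a minimal Python sketch that turns SAM text lines into BED6 lines, one line per mapped read. It assumes the input is the tab-separated text that `samtools view` prints for a BAM file; the function name `sam_line_to_bed` is hypothetical, and writing MAPQ into the BED score column is just one possible choice. The conversion relies on two format facts: SAM positions are 1-based while BED starts are 0-based with an exclusive end, and the reverse strand is marked by FLAG bit 0x10.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that consume bases on the reference.
REF_CONSUMING = set("MDN=X")


def sam_line_to_bed(line):
    """Convert one SAM alignment line to a BED6 line.

    Returns None for header lines, malformed lines, and unmapped reads.
    """
    if line.startswith("@"):
        return None  # SAM header line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return None  # fewer than the 11 mandatory SAM fields
    qname, rname, mapq, cigar = fields[0], fields[2], fields[4], fields[5]
    flag, pos = int(fields[1]), int(fields[3])
    if flag & 0x4 or rname == "*" or cigar == "*":
        return None  # unmapped read: no usable position
    # Length of the alignment on the reference, from the CIGAR string.
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    start = pos - 1          # 1-based SAM POS -> 0-based BED start
    end = start + ref_len    # BED end is exclusive
    strand = "-" if flag & 0x10 else "+"
    return "\t".join([rname, str(start), str(end), qname, mapq, strand])


# Two made-up alignment lines for demonstration.
example = [
    "read1\t0\tchr1\t6557\t37\t36M\t*\t0\t0\t" + "A" * 36 + "\t" + "I" * 36,
    "read2\t16\tchr10\t9454\t20\t30M2D6M\t*\t0\t0\t" + "C" * 36 + "\t" + "I" * 36,
]
for sam_line in example:
    print(sam_line_to_bed(sam_line))
```

Existing tools such as `bedtools bamtobed` do the same job and are likely a better choice in practice; the sketch is only meant to show where the start, end, and strand columns come from.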

Hi Jonathan,

Your clarification is great. How to create the BED file, and what format it should have, is exactly the question I wanted to ask: for example, should reads be counted at each base position or in each region? If in each region, how do I decide the length of each region? Two specific examples of the two formats are below. It would be easy to count reads in format 1, but in format 2 it would be hard to determine the range. Thanks for further suggestions.

Best, John

Format 1:

    chr    start   end      reads
    chr1   6557    6557     233
    ch10   9454    94545    100

Format 2:

    chr    start   end      reads
    chr1   6557    8567     2333
    ch10   9454    194595   1000

John linux-user, 13.4 years ago

see: http://genome.ucsc.edu/FAQ/FAQformat.html#format1 - each region should represent a single mapped read. Format 2 is insufficient to determine the original read locations. In fact, so is format 1 as presented; I assume the 5 on the end of 94545 is a typo. Format 1 is also missing "strand", so if you have the original .bam files, I'd suggest starting from those and sticking to the bed format outlined above. How to perform bam -> bed file conversion in python is not a bioconductor-related question and is therefore outside of the scope of this mailing list. J ________________________________________ From: John linux-user [johnlinuxuser@yahoo.com] Sent: 17 September 2012 17:37 To: Jonathan Cairns; bioconductor at r-project.org Subject: Re: [BioC] ChiPseq input files? Hi Jonathan, Your clarification is great and how to create the bed file and what format the bed file would be is the exact question I like to ask, e.g counting reads in each base position or in each regions. If in each regions, how to decide the length of each region? Two specific example below for two formats. It would be easy to count reads in format1, but if format2, it would be hard to determine the range. Thanks for further suggestions. Best, John format 1, chr start end reads chr1,6557,6557, 233 ch10,9454,94545,100 format 2, chr start end reads chr1, 6557,8567, 2333 ch10,9454,194595,1000 ________________________________ From: Jonathan Cairns <jonathan.cairns@cancer.org.uk> To: John linux-user <johnlinuxuser at="" yahoo.com="">; "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> Sent: Monday, September 17, 2012 12:14 PM Subject: RE: [BioC] ChiPseq input files? Hi, In RNA-seq, one knows where the regions of interest (i.e. exons) are, so binning is straightforward. No such database of "regions of interest" exists for ChIP-seq. Hence, peak-caller algorithms, to find them. 
IRanges/RangedData/GRanges etc are internal R objects, so you'll have a hard time constructing such a thing in python. If disk space is a major issue, you could try creating a .bed file from your .bam file, and then read that in with e.g. read.bed() in BayesPeak, or import() in rtracklayer. J ________________________________________ From: John linux-user [johnlinuxuser@yahoo.com<mailto:johnlinuxuser@yahoo.com>] Sent: 17 September 2012 16:55 To: Jonathan Cairns; bioconductor at r-project.org<mailto:bioconductor at="" r-project.org=""> Subject: Re: [BioC] ChiPseq input files? Hi Jonathan, Thanks for your response. I just liked to use python instead of R to generate these IRange data. I looked over the introduction part of RNA-seq data and it seemed that it just counted the read hits overlapped the annotated gene regions as coded below, and I am wondering if it was the similar things occurred for chip-seq data. Thanks. John gnModel <- exonsBy(txdb, "gene") counter <- function(fl, gnModel) { aln <- readGappedAlignments(fl) strand(aln) <- "*" # for strand-blind sample prep protocol hits <- countOverlaps(aln, gnModel) counts <- countOverlaps(gnModel, aln[hits==1]) names(counts) <- names(gnModel) counts } ________________________________ From: Jonathan Cairns <jonathan.cairns@cancer.org.uk<mailto:jonathan.cairns@cancer.org.uk>> To: John linux-user <johnlinuxuser at="" yahoo.com<mailto:johnlinuxuser="" at="" yahoo.com="">>; "bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org<mailto:bioconductor="" at="" r-project.org="">> Sent: Monday, September 17, 2012 11:35 AM Subject: RE: [BioC] ChiPseq input files? Hi John, I'm afraid I don't understand your question. It sounds like you are trying to bin the reads? This shouldn't be necessary, as both packages do this for you. Was that your intended query? 
Jonathan

________________________________________
From: John linux-user [johnlinuxuser@yahoo.com]
Sent: 17 September 2012 15:59
To: Jonathan Cairns; bioconductor at r-project.org
Subject: Re: [BioC] ChiPseq input files?

Hi Jonathan,

Thanks for your response and code. That saves me a lot of time searching the web. Your answers are great! But if I try to create a table using Python or other scripts and then input the table to R for statistics, how can I decide the range (e.g. start and end) when I count the reads at each position across the chromosomes/genome? Can you give me more suggestions?

Thanks.

Best,
John
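
On the question of how to decide start and end when going through a .bed file: a rough sketch in base R, not tested against either package. It assumes per-read fields shaped like the scanBam() output discussed earlier in the thread (rname, pos, qwidth, strand). The `reads` table and its values are made up for illustration, and using qwidth as the interval width ignores gaps and clipping in the alignment. The one fixed point is the coordinate convention: SAM/BAM positions are 1-based, whereas BED starts are 0-based and BED ends are exclusive.

```r
# Hypothetical per-read fields (illustrative values, not real data)
reads <- data.frame(
  rname  = c("chr1", "chr1", "chr2"),
  pos    = c(100L, 250L, 40L),   # 1-based leftmost aligned position
  qwidth = c(36L, 36L, 50L),     # read length, used here as the interval width
  strand = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# Six-column BED table: chrom, chromStart, chromEnd, name, score, strand
bed <- data.frame(
  chrom      = reads$rname,
  chromStart = reads$pos - 1L,                 # shift to 0-based start
  chromEnd   = reads$pos - 1L + reads$qwidth,  # exclusive end
  name       = ".",
  score      = 0L,
  strand     = reads$strand,
  stringsAsFactors = FALSE
)

# Write tab-separated text with no header and no quoting
bed_path <- tempfile(fileext = ".bed")
write.table(bed, bed_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

The same arithmetic applies if the table is produced in Python or any other scripting language; the resulting file could then be read with read.bed() or import() as mentioned at the top of this reply.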

ADD REPLY • link 13.4 years ago Jonathan Cairns ▴ 130