How does MEDIPS handle multiple mapping of a read

0

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 11.4 years ago

Hi, I was wondering if someone can tell me what happens in MEDIPS a particular read maps to multiple locations on the genome? I have just aligned my MeDIPS-seq data using BWA and only about 40% have a mapping quality of above 30. Does MEDIPS only take into account unique reads and throws out the rest? I looked at the MEDIPS manual (http://www.bioconductor.org/packages/re lease/bioc/vignettes/MEDIPS/inst/doc/MEDIPS.pdf) and can't find any information on this. I read on various forums and many people suggest just using unique reads. However, for ChIP-seq, it is suggested that one could use CSEM so that information from multiple mapping is not completely lost. Thanks in advance for your help. Allen -- output of sessionInfo(): error -- Sent via the guest posting facility at bioconductor.org.

MEDIPS MEDIPS • 2.6k views

ADD COMMENT • link updated 11.4 years ago by Lukas Chavez ▴ 570 • written 11.4 years ago by Guest User ★ 13k

0

Entering edit mode

Davis, Wade ▴ 350

@davis-wade-2803

Last seen 11.4 years ago

Hi Allen, It seems there are two somewhat related issues that are relevant here: multi-mapped reads (i.e., NOT uniquely MAPPABLE read) and duplicate reads (i.e., not the only read with a given same start/stop). I like to avoid the term "unique" by itself because it can be ambiguous as you can see. To my knowledge, MEDIPS doesn't address the multi-mapped reads; I think many would argue that is something best handled by the aligner or afterward by filtering your bam files based on certain flags. Specifically, it is more efficient for the aligner to implement whatever you desire (e.g., spreading them around uniformly or based on some estimated probability) during alignment than for MEDIPS to do it. However, MEDIPS does deal with duplicate reads via the uniq flag, as noted in the vignette: "MEDIPS will replace all reads which map to exactly the same start and end positions on the same strand by only one representative: > uniq=TRUE" See ?MEDIPS.createSet Most of what I have read about for ChiP/MeDIP lead us to only count duplicate reads once (i.e., use uniq=TRUE). We looked at a number of samples analyzed both ways and looked at the Irreproducible Discovery Rate (IDR) plots and found better reproducibility using this approach. Another paper suggests using the duplicates or discarding them depending on the purpose: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcb i.1003326 I hope this helps a little, Wade -----Original Message----- From: Allen [guest] [mailto:guest@bioconductor.org] Sent: Tuesday, August 26, 2014 4:50 AM To: bioconductor at r-project.org; patcksa at nus.edu.sg Cc: MEDIPS Maintainer Subject: [BioC] How does MEDIPS handle multiple mapping of a read Hi, I was wondering if someone can tell me what happens in MEDIPS a particular read maps to multiple locations on the genome? I have just aligned my MeDIPS-seq data using BWA and only about 40% have a mapping quality of above 30. Does MEDIPS only take into account unique reads and throws out the rest? I looked at the MEDIPS manual (http://www.bioconductor.org/packages/re lease/bioc/vignettes/MEDIPS/inst/doc/MEDIPS.pdf) and can't find any information on this. I read on various forums and many people suggest just using unique reads. However, for ChIP-seq, it is suggested that one could use CSEM so that information from multiple mapping is not completely lost. Thanks in advance for your help. Allen -- output of sessionInfo(): error -- Sent via the guest posting facility at bioconductor.org.

ADD COMMENT • link 11.4 years ago Davis, Wade ▴ 350

0

Entering edit mode

Dear Allen and Wade, indeed "multiple mappers" and reads that align to exactly the same position (sometimes named "clonal" or "stacked" reads) are two different things and need to be discussed separately. Wade has given a great overview about "stacked reads" by referring to the uniq=TRUE parameter of the MEDIPS package, sharing his results with the IDR, and by pointing to a study that further discusses this topic. Here, I would like to add that some tools provide the opportunity to estimate a global or local maximum of "stacked" reads. MEDIPS currently only allows for considering one representative per genomic position and strand or all reads. It is somewhere on my ToDo list to add a third option allowing a certain maximum of stacked reads. Reads that align to several positions in the genome ("multiple mapper") are common in MeDIP experiments, because of high abundance of methylation at repetitive regions. MEDIPS uses functions of the Rsamtools package to import mapping results from a bam file. Personally, I do not have experience using bam files including multiple alignments per read, because I am typically reporting only mapping results for unique mappers. However, i f a bam file contains multiple mappers, all hits will be imported into MEDIPS. This behavior is determined by the internal MEDIPS function getGRange(). The relevant line is scanFlag = scanBamFlag(isUnmappedQuery = F) Here, all unmapped reads are neglected in the scanBamFlag constructor. Among others, the scanBamFlag constructor has the parameter isNotPrimaryRead which is NA by default and the parameter is not changed by MEDIPS. Consequently, all alignments are returned regardless of their primary status. More information on the primary status of an alignment ( http://www.bioconductor.org/packages/release/bioc/manuals/Rsamtools/ma n/Rsamtools.pdf ): As far as I remember, I previously encountered problems when importing CSEM alignment results into MEDIPS, but this is a while ago. What you can always do is to convert your mapping results/ bam file into a plain bed file, regardless of the presence of multiple alignments per read and regardless of the appropriate use of flags of any alignment tool. MEDIPS will then consider each given entry as a mapping hit and calculate the genomic coverage at genome wide windows accordingly. Best regards, Lukas On Tue, Aug 26, 2014 at 9:28 AM, Davis, Wade <davisjwa at="" health.missouri.edu=""> wrote: > Hi Allen, > It seems there are two somewhat related issues that are relevant here: > multi-mapped reads (i.e., NOT uniquely MAPPABLE read) and duplicate reads > (i.e., not the only read with a given same start/stop). I like to avoid the > term "unique" by itself because it can be ambiguous as you can see. > > To my knowledge, MEDIPS doesn't address the multi-mapped reads; I think > many would argue that is something best handled by the aligner or afterward > by filtering your bam files based on certain flags. Specifically, it is > more efficient for the aligner to implement whatever you desire (e.g., > spreading them around uniformly or based on some estimated probability) > during alignment than for MEDIPS to do it. > > However, MEDIPS does deal with duplicate reads via the uniq flag, as noted > in the vignette: > "MEDIPS will replace all reads which map to exactly the same start and end > positions on the same strand by only one representative: > > uniq=TRUE" > > See ?MEDIPS.createSet > > Most of what I have read about for ChiP/MeDIP lead us to only count > duplicate reads once (i.e., use uniq=TRUE). We looked at a number of > samples analyzed both ways and looked at the Irreproducible Discovery Rate > (IDR) plots and found better reproducibility using this approach. Another > paper suggests using the duplicates or discarding them depending on the > purpose: > > http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.p cbi.1003326 > > I hope this helps a little, > Wade > > -----Original Message----- > From: Allen [guest] [mailto:guest at bioconductor.org] > Sent: Tuesday, August 26, 2014 4:50 AM > To: bioconductor at r-project.org; patcksa at nus.edu.sg > Cc: MEDIPS Maintainer > Subject: [BioC] How does MEDIPS handle multiple mapping of a read > > Hi, > I was wondering if someone can tell me what happens in MEDIPS a particular > read maps to multiple locations on the genome? I have just aligned my > MeDIPS-seq data using BWA and only about 40% have a mapping quality of > above 30. Does MEDIPS only take into account unique reads and throws out > the rest? > > I looked at the MEDIPS manual ( > http://www.bioconductor.org/packages/release/bioc/vignettes/MEDIPS/i nst/doc/MEDIPS.pdf) > and can't find any information on this. > > I read on various forums and many people suggest just using unique reads. > However, for ChIP-seq, it is suggested that one could use CSEM so that > information from multiple mapping is not completely lost. > > Thanks in advance for your help. > > Allen > > > > -- output of sessionInfo(): > > error > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD REPLY • link 11.4 years ago Lukas Chavez ▴ 570

0

Entering edit mode

Dear Lukas and Wade. Thanks for your replies. It is very much appreciated. Lukas, you mentioned that the internal MEDIPS function getGRange() is set so as to only ignore all unmapped reads and this is done through scanBamFlag. I was wondering if it is possible for me to modify the getGRange() function so as to only use unique single-mapped reads or do I really need to filter my bam file as Wade suggested. Wade, thanks for highlighting "uniq=TRUE" for duplicate/stacked reads. I actually had not considered this would be problem because (correct me if I am wrong) I thought that the number of reads for a peak was an indicator of the degree of methylation, much like the quantitation of methylation given by beta or M values for arrays. So, my question is why would one want to remove such stacked reads? I look forward to hearing to your answers and thank you for your time and assistance. best, Allen ________________________________________ From: Lukas Chavez [lukas.chavez.mailings@googlemail.com] Sent: Wednesday, August 27, 2014 3:50 AM To: Davis, Wade Cc: Allen [guest]; bioconductor at r-project.org; Chong Kim San Allen; MEDIPS Maintainer Subject: Re: [BioC] How does MEDIPS handle multiple mapping of a read Dear Allen and Wade, indeed "multiple mappers" and reads that align to exactly the same position (sometimes named "clonal" or "stacked" reads) are two different things and need to be discussed separately. Wade has given a great overview about "stacked reads" by referring to the uniq=TRUE parameter of the MEDIPS package, sharing his results with the IDR, and by pointing to a study that further discusses this topic. Here, I would like to add that some tools provide the opportunity to estimate a global or local maximum of "stacked" reads. MEDIPS currently only allows for considering one representative per genomic position and strand or all reads. It is somewhere on my ToDo list to add a third option allowing a certain maximum of stacked reads. Reads that align to several positions in the genome ("multiple mapper") are common in MeDIP experiments, because of high abundance of methylation at repetitive regions. MEDIPS uses functions of the Rsamtools package to import mapping results from a bam file. Personally, I do not have experience using bam files including multiple alignments per read, because I am typically reporting only mapping results for unique mappers. However, if a bam file contains multiple mappers, all hits will be imported into MEDIPS. This behavior is determined by the internal MEDIPS function getGRange(). The relevant line is scanFlag = scanBamFlag(isUnmappedQuery = F) Here, all unmapped reads are neglected in the scanBamFlag constructor. Among others, the scanBamFlag constructor has the parameter isNotPrimaryRead which is NA by default and the parameter is not changed by MEDIPS. Consequently, all alignments are returned regardless of their primary status. More information on the primary status of an alignment (http://www.bioconductor.org/packages/release/b ioc/manuals/Rsamtools/man/Rsamtools.pdf): As far as I remember, I previously encountered problems when importing CSEM alignment results into MEDIPS, but this is a while ago. What you can always do is to convert your mapping results/ bam file into a plain bed file, regardless of the presence of multiple alignments per read and regardless of the appropriate use of flags of any alignment tool. MEDIPS will then consider each given entry as a mapping hit and calculate the genomic coverage at genome wide windows accordingly. Best regards, Lukas On Tue, Aug 26, 2014 at 9:28 AM, Davis, Wade <davisjwa at="" health.missouri.edu<mailto:davisjwa="" at="" health.missouri.edu="">> wrote: Hi Allen, It seems there are two somewhat related issues that are relevant here: multi-mapped reads (i.e., NOT uniquely MAPPABLE read) and duplicate reads (i.e., not the only read with a given same start/stop). I like to avoid the term "unique" by itself because it can be ambiguous as you can see. To my knowledge, MEDIPS doesn't address the multi-mapped reads; I think many would argue that is something best handled by the aligner or afterward by filtering your bam files based on certain flags. Specifically, it is more efficient for the aligner to implement whatever you desire (e.g., spreading them around uniformly or based on some estimated probability) during alignment than for MEDIPS to do it. However, MEDIPS does deal with duplicate reads via the uniq flag, as noted in the vignette: "MEDIPS will replace all reads which map to exactly the same start and end positions on the same strand by only one representative: > uniq=TRUE" See ?MEDIPS.createSet Most of what I have read about for ChiP/MeDIP lead us to only count duplicate reads once (i.e., use uniq=TRUE). We looked at a number of samples analyzed both ways and looked at the Irreproducible Discovery Rate (IDR) plots and found better reproducibility using this approach. Another paper suggests using the duplicates or discarding them depending on the purpose: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcb i.1003326 I hope this helps a little, Wade -----Original Message----- From: Allen [guest] [mailto:guest@bioconductor.org<mailto:guest@bioconductor.org>] Sent: Tuesday, August 26, 2014 4:50 AM To: bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">; patcksa at nus.edu.sg<mailto:patcksa at="" nus.edu.sg=""> Cc: MEDIPS Maintainer Subject: [BioC] How does MEDIPS handle multiple mapping of a read Hi, I was wondering if someone can tell me what happens in MEDIPS a particular read maps to multiple locations on the genome? I have just aligned my MeDIPS-seq data using BWA and only about 40% have a mapping quality of above 30. Does MEDIPS only take into account unique reads and throws out the rest? I looked at the MEDIPS manual (http://www.bioconductor.org/packages/re lease/bioc/vignettes/MEDIPS/inst/doc/MEDIPS.pdf) and can't find any information on this. I read on various forums and many people suggest just using unique reads. However, for ChIP-seq, it is suggested that one could use CSEM so that information from multiple mapping is not completely lost. Thanks in advance for your help. Allen -- output of sessionInfo(): error -- Sent via the guest posting facility at bioconductor.org<http: bioconductor.org="">. _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org<mailto:bioconductor at="" r-project.org=""> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor Important: This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately; you should not copy or use it for any purpose, nor disclose its contents to any other person. Thank you.

ADD REPLY • link 11.4 years ago Chong Kim San Allen ▴ 70

0

Entering edit mode

Lukas Chavez ▴ 570

@lukas-chavez-5781

Last seen 7.9 years ago

USA/La Jolla/UCSD

Dear Allen, it is certainly possible to modify the getGRange() according to your specific needs, please see the description of the scanBamFlag constructor in the Rsamtools manual. The getGRange() function is part of the file MEDIPS.readRegionsFile.R. However, I do not know how straight forward it will be to source and override functions in a session. In case you intend to ignore multiple mappers in your downstream analysis, you might want to suppress them in the output of the alignment tool and use MEDIPS as it is. Stacked reads might be artifacts due to PCR amplification, but they can also occur in case of over-sequencing. Ambitious PCR amplification might result in an enrichment of reads at exactly the same position not related to an enrichment of the actual mark of interest (e.g. mC). All the best, Lukas On Wed, Aug 27, 2014 at 10:07 AM, Chong Kim San Allen <patcksa at="" nus.edu.sg=""> wrote: > Dear Lukas and Wade. > Thanks for your replies. It is very much appreciated. > > Lukas, you mentioned that the internal MEDIPS function getGRange() is set > so as to only ignore all unmapped reads and this is done through > scanBamFlag. I was wondering if it is possible for me to modify the > getGRange() function so as to only use unique single-mapped reads or do I > really need to filter my bam file as Wade suggested. > > Wade, thanks for highlighting "uniq=TRUE" for duplicate/stacked reads. I > actually had not considered this would be problem because (correct me if I > am wrong) I thought that the number of reads for a peak was an indicator of > the degree of methylation, much like the quantitation of methylation given > by beta or M values for arrays. So, my question is why would one want to > remove such stacked reads? > > I look forward to hearing to your answers and thank you for your time and > assistance. > > best, > Allen > ________________________________________ > From: Lukas Chavez [lukas.chavez.mailings at googlemail.com] > Sent: Wednesday, August 27, 2014 3:50 AM > To: Davis, Wade > Cc: Allen [guest]; bioconductor at r-project.org; Chong Kim San Allen; > MEDIPS Maintainer > Subject: Re: [BioC] How does MEDIPS handle multiple mapping of a read > > Dear Allen and Wade, > > indeed "multiple mappers" and reads that align to exactly the same > position (sometimes named "clonal" or "stacked" reads) are two different > things and need to be discussed separately. Wade has given a great overview > about "stacked reads" by referring to the uniq=TRUE parameter of the MEDIPS > package, sharing his results with the IDR, and by pointing to a study that > further discusses this topic. Here, I would like to add that some tools > provide the opportunity to estimate a global or local maximum of "stacked" > reads. MEDIPS currently only allows for considering one representative per > genomic position and strand or all reads. It is somewhere on my ToDo list > to add a third option allowing a certain maximum of stacked reads. > > Reads that align to several positions in the genome ("multiple mapper") > are common in MeDIP experiments, because of high abundance of methylation > at repetitive regions. MEDIPS uses functions of the Rsamtools package to > import mapping results from a bam file. Personally, I do not have > experience using bam files including multiple alignments per read, because > I am typically reporting only mapping results for unique mappers. However, > if a bam file contains multiple mappers, all hits will be imported into > MEDIPS. This behavior is determined by the internal MEDIPS function > getGRange(). The relevant line is > > scanFlag = scanBamFlag(isUnmappedQuery = F) > > Here, all unmapped reads are neglected in the scanBamFlag constructor. > Among others, the scanBamFlag constructor has the parameter > isNotPrimaryRead which is NA by default and the parameter is not changed by > MEDIPS. Consequently, all alignments are returned regardless of their > primary status. More information on the primary status of an alignment ( > http://www.bioconductor.org/packages/release/bioc/manuals/Rsamtools/ man/Rsamtools.pdf > ): > > As far as I remember, I previously encountered problems when importing > CSEM alignment results into MEDIPS, but this is a while ago. What you can > always do is to convert your mapping results/ bam file into a plain bed > file, regardless of the presence of multiple alignments per read and > regardless of the appropriate use of flags of any alignment tool. MEDIPS > will then consider each given entry as a mapping hit and calculate the > genomic coverage at genome wide windows accordingly. > > Best regards, > Lukas > > > > On Tue, Aug 26, 2014 at 9:28 AM, Davis, Wade <davisjwa at="" health.missouri.edu=""> <mailto:davisjwa at="" health.missouri.edu="">> wrote: > Hi Allen, > It seems there are two somewhat related issues that are relevant here: > multi-mapped reads (i.e., NOT uniquely MAPPABLE read) and duplicate reads > (i.e., not the only read with a given same start/stop). I like to avoid the > term "unique" by itself because it can be ambiguous as you can see. > > To my knowledge, MEDIPS doesn't address the multi-mapped reads; I think > many would argue that is something best handled by the aligner or afterward > by filtering your bam files based on certain flags. Specifically, it is > more efficient for the aligner to implement whatever you desire (e.g., > spreading them around uniformly or based on some estimated probability) > during alignment than for MEDIPS to do it. > > However, MEDIPS does deal with duplicate reads via the uniq flag, as noted > in the vignette: > "MEDIPS will replace all reads which map to exactly the same start and end > positions on the same strand by only one representative: > > uniq=TRUE" > > See ?MEDIPS.createSet > > Most of what I have read about for ChiP/MeDIP lead us to only count > duplicate reads once (i.e., use uniq=TRUE). We looked at a number of > samples analyzed both ways and looked at the Irreproducible Discovery Rate > (IDR) plots and found better reproducibility using this approach. Another > paper suggests using the duplicates or discarding them depending on the > purpose: > > http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.p cbi.1003326 > > I hope this helps a little, > Wade > > -----Original Message----- > From: Allen [guest] [mailto:guest at bioconductor.org<mailto:> guest at bioconductor.org>] > Sent: Tuesday, August 26, 2014 4:50 AM > To: bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">; > patcksa at nus.edu.sg<mailto:patcksa at="" nus.edu.sg=""> > Cc: MEDIPS Maintainer > Subject: [BioC] How does MEDIPS handle multiple mapping of a read > > Hi, > I was wondering if someone can tell me what happens in MEDIPS a particular > read maps to multiple locations on the genome? I have just aligned my > MeDIPS-seq data using BWA and only about 40% have a mapping quality of > above 30. Does MEDIPS only take into account unique reads and throws out > the rest? > > I looked at the MEDIPS manual ( > http://www.bioconductor.org/packages/release/bioc/vignettes/MEDIPS/i nst/doc/MEDIPS.pdf) > and can't find any information on this. > > I read on various forums and many people suggest just using unique reads. > However, for ChIP-seq, it is suggested that one could use CSEM so that > information from multiple mapping is not completely lost. > > Thanks in advance for your help. > > Allen > > > > -- output of sessionInfo(): > > error > > -- > Sent via the guest posting facility at bioconductor.org< > http://bioconductor.org>. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org<mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > Important: This email is confidential and may be privileged. If you are > not the intended recipient, please delete it and notify us immediately; you > should not copy or use it for any purpose, nor disclose its contents to any > other person. Thank you. > [[alternative HTML version deleted]]

ADD COMMENT • link 11.4 years ago Lukas Chavez ▴ 570

0

Entering edit mode

Dear Lukas Thank you so much for your explanation. I can now understand the problem present by stacked reads. Have a great weekend! Allen ________________________________________ From: Lukas Chavez [lukas.chavez.mailings@googlemail.com] Sent: Thursday, August 28, 2014 6:48 AM To: Chong Kim San Allen Cc: Davis, Wade; Allen [guest]; bioconductor at r-project.org; MEDIPS Maintainer Subject: Re: [BioC] How does MEDIPS handle multiple mapping of a read Dear Allen, it is certainly possible to modify the getGRange() according to your specific needs, please see the description of the scanBamFlag constructor in the Rsamtools manual. The getGRange() function is part of the file MEDIPS.readRegionsFile.R. However, I do not know how straight forward it will be to source and override functions in a session. In case you intend to ignore multiple mappers in your downstream analysis, you might want to suppress them in the output of the alignment tool and use MEDIPS as it is. Stacked reads might be artifacts due to PCR amplification, but they can also occur in case of over-sequencing. Ambitious PCR amplification might result in an enrichment of reads at exactly the same position not related to an enrichment of the actual mark of interest (e.g. mC). All the best, Lukas On Wed, Aug 27, 2014 at 10:07 AM, Chong Kim San Allen <patcksa at="" nus.edu.sg<mailto:patcksa="" at="" nus.edu.sg="">> wrote: Dear Lukas and Wade. Thanks for your replies. It is very much appreciated. Lukas, you mentioned that the internal MEDIPS function getGRange() is set so as to only ignore all unmapped reads and this is done through scanBamFlag. I was wondering if it is possible for me to modify the getGRange() function so as to only use unique single-mapped reads or do I really need to filter my bam file as Wade suggested. Wade, thanks for highlighting "uniq=TRUE" for duplicate/stacked reads. I actually had not considered this would be problem because (correct me if I am wrong) I thought that the number of reads for a peak was an indicator of the degree of methylation, much like the quantitation of methylation given by beta or M values for arrays. So, my question is why would one want to remove such stacked reads? I look forward to hearing to your answers and thank you for your time and assistance. best, Allen ________________________________________ From: Lukas Chavez [lukas.chavez.mailings@googlemail.com<mailto:lukas. chavez.mailings@googlemail.com="">] Sent: Wednesday, August 27, 2014 3:50 AM To: Davis, Wade Cc: Allen [guest]; bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">; Chong Kim San Allen; MEDIPS Maintainer Subject: Re: [BioC] How does MEDIPS handle multiple mapping of a read Dear Allen and Wade, indeed "multiple mappers" and reads that align to exactly the same position (sometimes named "clonal" or "stacked" reads) are two different things and need to be discussed separately. Wade has given a great overview about "stacked reads" by referring to the uniq=TRUE parameter of the MEDIPS package, sharing his results with the IDR, and by pointing to a study that further discusses this topic. Here, I would like to add that some tools provide the opportunity to estimate a global or local maximum of "stacked" reads. MEDIPS currently only allows for considering one representative per genomic position and strand or all reads. It is somewhere on my ToDo list to add a third option allowing a certain maximum of stacked reads. Reads that align to several positions in the genome ("multiple mapper") are common in MeDIP experiments, because of high abundance of methylation at repetitive regions. MEDIPS uses functions of the Rsamtools package to import mapping results from a bam file. Personally, I do not have experience using bam files including multiple alignments per read, because I am typically reporting only mapping results for unique mappers. However, if a bam file contains multiple mappers, all hits will be imported into MEDIPS. This behavior is determined by the internal MEDIPS function getGRange(). The relevant line is scanFlag = scanBamFlag(isUnmappedQuery = F) Here, all unmapped reads are neglected in the scanBamFlag constructor. Among others, the scanBamFlag constructor has the parameter isNotPrimaryRead which is NA by default and the parameter is not changed by MEDIPS. Consequently, all alignments are returned regardless of their primary status. More information on the primary status of an alignment (http://www.bioconductor.org/packages/release/b ioc/manuals/Rsamtools/man/Rsamtools.pdf): As far as I remember, I previously encountered problems when importing CSEM alignment results into MEDIPS, but this is a while ago. What you can always do is to convert your mapping results/ bam file into a plain bed file, regardless of the presence of multiple alignments per read and regardless of the appropriate use of flags of any alignment tool. MEDIPS will then consider each given entry as a mapping hit and calculate the genomic coverage at genome wide windows accordingly. Best regards, Lukas On Tue, Aug 26, 2014 at 9:28 AM, Davis, Wade <davisjwa at="" health.missouri.edu<mailto:davisjwa="" at="" health.missouri.edu=""><mailto:davisjwa at="" health.missouri.edu<mailto:davisjwa="" at="" health.missouri.edu="">>> wrote: Hi Allen, It seems there are two somewhat related issues that are relevant here: multi-mapped reads (i.e., NOT uniquely MAPPABLE read) and duplicate reads (i.e., not the only read with a given same start/stop). I like to avoid the term "unique" by itself because it can be ambiguous as you can see. To my knowledge, MEDIPS doesn't address the multi-mapped reads; I think many would argue that is something best handled by the aligner or afterward by filtering your bam files based on certain flags. Specifically, it is more efficient for the aligner to implement whatever you desire (e.g., spreading them around uniformly or based on some estimated probability) during alignment than for MEDIPS to do it. However, MEDIPS does deal with duplicate reads via the uniq flag, as noted in the vignette: "MEDIPS will replace all reads which map to exactly the same start and end positions on the same strand by only one representative: > uniq=TRUE" See ?MEDIPS.createSet Most of what I have read about for ChiP/MeDIP lead us to only count duplicate reads once (i.e., use uniq=TRUE). We looked at a number of samples analyzed both ways and looked at the Irreproducible Discovery Rate (IDR) plots and found better reproducibility using this approach. Another paper suggests using the duplicates or discarding them depending on the purpose: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcb i.1003326 I hope this helps a little, Wade -----Original Message----- From: Allen [guest] [mailto:guest@bioconductor.org<mailto:guest@biocon ductor.org=""><mailto:guest@bioconductor.org<mailto:guest@bioconductor.or g="">>] Sent: Tuesday, August 26, 2014 4:50 AM To: bioconductor at r-project.org<mailto:bioconductor at="" r-project.org=""><mailto:bioconductor at="" r-project.org<mailto:bioconductor="" at="" r-project.org="">>; patcksa at nus.edu.sg<mailto:patcksa at="" nus.edu.sg=""><mailto:patcksa at="" nus.edu.sg<mailto:patcksa="" at="" nus.edu.sg="">> Cc: MEDIPS Maintainer Subject: [BioC] How does MEDIPS handle multiple mapping of a read Hi, I was wondering if someone can tell me what happens in MEDIPS a particular read maps to multiple locations on the genome? I have just aligned my MeDIPS-seq data using BWA and only about 40% have a mapping quality of above 30. Does MEDIPS only take into account unique reads and throws out the rest? I looked at the MEDIPS manual (http://www.bioconductor.org/packages/re lease/bioc/vignettes/MEDIPS/inst/doc/MEDIPS.pdf) and can't find any information on this. I read on various forums and many people suggest just using unique reads. However, for ChIP-seq, it is suggested that one could use CSEM so that information from multiple mapping is not completely lost. Thanks in advance for your help. Allen -- output of sessionInfo(): error -- Sent via the guest posting facility at bioconductor.org<http: bioconductor.org=""><http: bioconductor.org="">. _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org<mailto:bioconductor at="" r-project.org=""><mailto:bioconductor at="" r-project.org<mailto:bioconductor="" at="" r-project.org="">> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor Important: This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately; you should not copy or use it for any purpose, nor disclose its contents to any other person. Thank you. Important: This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately; you should not copy or use it for any purpose, nor disclose its contents to any other person. Thank you.

ADD REPLY • link 11.4 years ago Chong Kim San Allen ▴ 70

Login before adding your answer.