Segfault in MEDIPS MEDIPS.methylProfiling() with ROI file
1
0
Entering edit mode
@stephen-turner-4916
Last seen 3.5 years ago
United States
Lukas, Sorry for the late response to this one. I missed your response on the list previously. I checked that the IDs are unique, and I'm still getting the segfault. I've updated the ROI file here: Let me ask you a related question about the MEDIPS workflow. Would it make sense in your opinion to use MEDIPS.methylProfiling() on my MeDIP samples only to get the rms/ams scores for sliding windows or regions of interest for each biological replicate, and export those to a table for downstream analysis of differential methylation in a tool like DESeq/edgeR/limma that can utilize biological replicates? Thanks, Stephen On Thu, Feb 21, 2013 at 4:29 PM, Lukas Chavez <lukas.chavez.mailings at="" googlemail.com=""> wrote: > > Hi Stephen, > > thank you for the additional details about the segfault error. You have > convinced me that your genomic regions appear to be fine. The only thing > that came to my attention is that your IDs are not unique. The > MEDIPS.methylProfiling() function is calling the C function roiprofile.c > which has problems with non-unique IDs for the ROIs. It would be great, if > you can once more try your analysis with unique IDs (e.g. just adding a > counter to each ID)? > > One more thing: the majority of your regions of interest are very large (up > to 2,960,899 nt). In my opinion, a single averaged methylation value for > such a long region is difficult to interpret. MeDIP signals typically vary > in a much smaller resolution along the chromosomes and a single methylation > as well as CpG density value for a long stretch will be probably just an > average of many local enriched regions and many non enriched region. > However, as I do not know what exactly you are analyzing your approach might > be reasonable and I hope we get the segfault fixed. > > As I mentioned previously, the soon to be released update does not depend on > C functions anymore but makes extensive use of the GenomicRanges and other > packages so that the current segfault error will be avoided anyway. > > Thank you, > Lukas > > > On Wed, Feb 20, 2013 at 10:24 AM, Stephen Turner <vustephen at="" gmail.com=""> > wrote: >> >> Lukas, >> >> I checked and there are no positions where start>end position. I'm >> getting my regions of interest by using biomaRt to map Illumina WG6v2 >> identifiers to position and gene name, and using this as my ROI file. >> >> Here's how I'm getting the position information: >> >> ######################### >> library(biomaRt) >> genes <- read.table("genes.txt", T) # genes$illumina contains the wg6v2 id >> mart <- useMart("ensembl", dataset="mmusculus_gene_ensembl") >> listAttributes(mart) >> attributes <- c("illumina_mousewg_6_v2", "chromosome_name", >> "start_position","end_position", "ensembl_gene_id", >> "external_gene_id", "description") >> genes_annotated <- getBM(attributes=attributes, >> filters="illumina_mousewg_6_v2", values=genes$illumina, mart=mart, >> uniqueRows=T) >> roi_file <- subset(genes_annotated, chromosome_name!="Y", >> select=c(chromosome_name, start_position, end_position, >> external_gene_id)) >> roi_file$chromosome_name <- paste("chr", roi_file$chromosome_name, sep="") >> subset(roi_file, end_position<start_position) #returns="" nothing="">> write.table(roi_file, file="ROI_file.txt", row=F, col=F, quote=F, >> sep="\t") >> ######################### >> >> You can get my ROI file here: >> >> >> >> The error it throws suggests region #379 is the problem. That >> corresponds to chr1:135720061-135752232 (ENSMUSG00000026421), which >> isn't outside the length of the chromosome. Not sure what the problem >> is. Here's the error below: >> >> ######################### >> > dmrGenes <- MEDIPS.methylProfiling(data1=CONTROL.SET, data2=BPA.SET, >> > ROI_file="ROI_file.txt", select=2) >> Preprocessing... >> Reading ROIs... >> Extract data according to given ROI... >> Differential methylation will be calculated on the ROI data set >> Analysed 379 / 2893 >> *** caught segfault *** >> address 0x2ad79ec9ce98, cause 'memory not mapped' >> >> Traceback: >> 1: .Call("roiprofile", input, as.numeric(select), as.matrix(ROI2), >> as.integer(chr_binposition), data1, data2, environment(wilcox.test), >> wilcox.test, environment(var), var, environment(math), math, >> t.test, environment(t.test), as.numeric(factor(chr_names(data1)))) >> 2: withCallingHandlers(expr, warning = function(w) >> invokeRestart("muffleWarning")) >> 3: suppressWarnings(.Call("roiprofile", input, as.numeric(select), >> as.matrix(ROI2), as.integer(chr_binposition), data1, data2, >> environment(wilcox.test), wilcox.test, environment(var), var, >> environment(math), math, t.test, environment(t.test), >> as.numeric(factor(chr_names(data1))))) >> 4: MEDIPS.methylProfiling(data1 = CONTROL.SET, data2 = BPA.SET, >> ROI_file = "ROI_file.txt", select = 2) >> aborting ... >> ######################### >> >> Thanks for any help you can provide! >> >> Stephen >> >> On Tue, Feb 19, 2013 at 3:18 PM, Lukas Chavez >> <lukas.chavez.mailings at="" googlemail.com=""> wrote: >> > >> > Hi Stephen, >> > >> > please excuse my late respond. >> > >> > Indeed, the segfault occurs whenever the currently processed genomic >> > region >> > is of negative length (i.e. the start coordinate is larger than the end >> > coordinate), the coordinates are outside of the length of the >> > chromosome, or >> > the chromosome of the ROI is not represented by the regions used as >> > input >> > for creating the MEDIPS SETs. Typically, the error can be avoided by >> > correcting the ROI file. Although you have already checked problematic >> > genomic regions, I would be more than happy, if you once more check >> > these >> > regions (and also the immediate neighbors) with respect to the >> > constraints I >> > just mentioned. >> > >> > Please let me know, if your ROI file appears to be fine. In this case I >> > will >> > try to back-track the error message. However, please note that I have >> > extensively revised the MEDIPS package (also avoiding segfault errors) >> > which >> > I intend to update as soon as possible, especially in advance of the >> > next >> > Bioconductor release. I strongly recommend to switch to the new version >> > as >> > soon as available. >> > >> > Thank you and all the best, >> > Lukas >> > >> > >> > >> > >> > >> > On Fri, Feb 15, 2013 at 9:23 AM, Stephen Turner <vustephen at="" gmail.com=""> >> > wrote: >> >> >> >> Lukas, and others: >> >> >> >> I'm trying to use the MEDIPS package to look for differentially >> >> methylated regions, supplying a regions of interest file (essentially >> >> a bed file). I was able to successfully run MEDIPS.methylProfiling >> >> supplying the frame_size=500 argument to look for DMRs in 500-bp >> >> windows. Now I'd like to supply my own regions of interest to look for >> >> DMR around genes that are differentially expressed from microarray. >> >> >> >> I get the following segfault: >> >> >> >> ############### >> >> > dmrGenes <- MEDIPS.methylProfiling(data1=CONTROL.SET, data2=BPA.SET, >> >> > ROI_file="ROI_file.txt", select=2) >> >> Preprocessing... >> >> Reading ROIs... >> >> Extract data according to given ROI... >> >> Differential methylation will be calculated on the ROI data set >> >> Analysed 379 / 2893 >> >> *** caught segfault *** >> >> address 0x2b61ee9d5e98, cause 'memory not mapped' >> >> >> >> Traceback: >> >> 1: .Call("roiprofile", input, as.numeric(select), as.matrix(ROI2), >> >> as.integer(chr_binposition), data1, data2, environment(wilcox.test), >> >> wilcox.test, environment(var), var, environment(math), math, >> >> t.test, environment(t.test), as.numeric(factor(chr_names(data1)))) >> >> 2: withCallingHandlers(expr, warning = function(w) >> >> invokeRestart("muffleWarning")) >> >> 3: suppressWarnings(.Call("roiprofile", input, as.numeric(select), >> >> as.matrix(ROI2), as.integer(chr_binposition), data1, data2, >> >> environment(wilcox.test), wilcox.test, environment(var), var, >> >> environment(math), math, t.test, environment(t.test), >> >> as.numeric(factor(chr_names(data1))))) >> >> 4: MEDIPS.methylProfiling(data1 = CONTROL.SET, data2 = BPA.SET, >> >> ROI_file = "ROI_file.txt", select = 2) >> >> aborting ... >> >> ############### >> >> >> >> I ran this on a machine with 128GB RAM, so I know that wasn't the >> >> problem. It looks like the segfault was happening with line 379 in the >> >> sample above. I went into the regions of interest (ROI) file >> >> containing gene coordinates. Nothing looked weird about this line, but >> >> I deleted it anyway. When re-running, I still get segfaults, just at >> >> different positions. >> >> >> >> Thanks for any insight you might have. >> >> Stephen >> >> >> >> >> >> > sessionInfo() >> >> R version 2.15.2 (2012-10-26) >> >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> >> >> locale: >> >> [1] C >> >> >> >> attached base packages: >> >> [1] stats graphics grDevices utils datasets methods base >> >> >> >> other attached packages: >> >> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.19 MEDIPS_1.8.0 >> >> [3] BSgenome_1.26.1 Biostrings_2.26.3 >> >> [5] GenomicRanges_1.10.6 IRanges_1.16.4 >> >> [7] BiocGenerics_0.4.0 >> >> >> >> loaded via a namespace (and not attached): >> >> [1] gtools_2.7.0 parallel_2.15.2 stats4_2.15.2 >> >> >> >> _______________________________________________ >> >> Bioconductor mailing list >> >> Bioconductor at r-project.org >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> Search the archives: >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > >> > > >
0
Entering edit mode
Lukas Chavez ▴ 570
@lukas-chavez-5781
Last seen 3.9 years ago
USA/La Jolla/UCSD
Hello, please let me grep up your email as an opportunity to announce the immediate availability of a major MEDIPS update. The deeply revised MEDIPS package is available as version 1.9.3 in the devel branch working with the current development version of R (3.0.0 alpha (2013-03-09 r62188) -- "Unsuffered Consequences") and will be available as MEDIPS 1.10.0 with the release of Bioconductor 2.12 at April 4th 2013. I strongly recommend to switch to MEDIPS 1.9.3 or higher (also the described segfault will not occur any more). The new MEDIPS version is capable of processing an arbitrary number of replicates per group (control and treatment) as well as two corresponding groups of Input replicates, if available. MEDIPS will read your bam or bed files and calculates genome wide coverage at genome wide windows (counts and the relative methylation score, if applicable). At the same time, MEDIPS will calculate differential coverage between two groups by employing the sophisticated statistical framework of the edgeR package (if selected). Please note, in order to calculate genome wide differential coverage using edgeR, MEDIPS transfers the raw counts (and not the rms values) to the edgeR environment, because of the discrete character of the underlying negative binomial models. For more information about further functionality of the MEDIPS update, please see the manual that comes with the updated package. I am looking forward to any comments and suggestions. Thank you, Lukas On Thu, Mar 14, 2013 at 8:29 AM, Stephen Turner <vustephen@gmail.com> wrote: > Lukas, > > Sorry for the late response to this one. I missed your response on the > list previously. I checked that the IDs are unique, and I'm still > getting the segfault. I've updated the ROI file here: > > > > Let me ask you a related question about the MEDIPS workflow. Would it > make sense in your opinion to use MEDIPS.methylProfiling() on my MeDIP > samples only to get the rms/ams scores for sliding windows or regions > of interest for each biological replicate, and export those to a table > for downstream analysis of differential methylation in a tool like > DESeq/edgeR/limma that can utilize biological replicates? > > Thanks, > > Stephen > > On Thu, Feb 21, 2013 at 4:29 PM, Lukas Chavez > <lukas.chavez.mailings@googlemail.com> wrote: > > > > Hi Stephen, > > > > thank you for the additional details about the segfault error. You have > > convinced me that your genomic regions appear to be fine. The only thing > > that came to my attention is that your IDs are not unique. The > > MEDIPS.methylProfiling() function is calling the C function roiprofile.c > > which has problems with non-unique IDs for the ROIs. It would be great, > if > > you can once more try your analysis with unique IDs (e.g. just adding a > > counter to each ID)? > > > > One more thing: the majority of your regions of interest are very large > (up > > to 2,960,899 nt). In my opinion, a single averaged methylation value for > > such a long region is difficult to interpret. MeDIP signals typically > vary > > in a much smaller resolution along the chromosomes and a single > methylation > > as well as CpG density value for a long stretch will be probably just an > > average of many local enriched regions and many non enriched region. > > However, as I do not know what exactly you are analyzing your approach > might > > be reasonable and I hope we get the segfault fixed. > > > > As I mentioned previously, the soon to be released update does not > depend on > > C functions anymore but makes extensive use of the GenomicRanges and > other > > packages so that the current segfault error will be avoided anyway. > > > > Thank you, > > Lukas > > > > > > On Wed, Feb 20, 2013 at 10:24 AM, Stephen Turner <vustephen@gmail.com> > > wrote: > >> > >> Lukas, > >> > >> I checked and there are no positions where start>end position. I'm > >> getting my regions of interest by using biomaRt to map Illumina WG6v2 > >> identifiers to position and gene name, and using this as my ROI file. > >> > >> Here's how I'm getting the position information: > >> > >> ######################### > >> library(biomaRt) > >> genes <- read.table("genes.txt", T) # genes$illumina contains the wg6v2 > id > >> mart <- useMart("ensembl", dataset="mmusculus_gene_ensembl") > >> listAttributes(mart) > >> attributes <- c("illumina_mousewg_6_v2", "chromosome_name", > >> "start_position","end_position", "ensembl_gene_id", > >> "external_gene_id", "description") > >> genes_annotated <- getBM(attributes=attributes, > >> filters="illumina_mousewg_6_v2", values=genes$illumina, mart=mart, > >> uniqueRows=T) > >> roi_file <- subset(genes_annotated, chromosome_name!="Y", > >> select=c(chromosome_name, start_position, end_position, > >> external_gene_id)) > >> roi_file$chromosome_name <- paste("chr", roi_file$chromosome_name, > sep="") > >> subset(roi_file, end_position<start_position) #returns="" nothing=""> >> write.table(roi_file, file="ROI_file.txt", row=F, col=F, quote=F, > >> sep="\t") > >> ######################### > >> > >> You can get my ROI file here: > >> > >> > >> > >> The error it throws suggests region #379 is the problem. That > >> corresponds to chr1:135720061-135752232 (ENSMUSG00000026421), which > >> isn't outside the length of the chromosome. Not sure what the problem > >> is. Here's the error below: > >> > >> ######################### > >> > dmrGenes <- MEDIPS.methylProfiling(data1=CONTROL.SET, data2=BPA.SET, > >> > ROI_file="ROI_file.txt", select=2) > >> Preprocessing... > >> Reading ROIs... > >> Extract data according to given ROI... > >> Differential methylation will be calculated on the ROI data set > >> Analysed 379 / 2893 > >> *** caught segfault *** > >> address 0x2ad79ec9ce98, cause 'memory not mapped' > >> > >> Traceback: > >> 1: .Call("roiprofile", input, as.numeric(select), as.matrix(ROI2), > >> as.integer(chr_binposition), data1, data2, environment(wilcox.test), > >> wilcox.test, environment(var), var, environment(math), math, > >> t.test, environment(t.test), as.numeric(factor(chr_names(data1)))) > >> 2: withCallingHandlers(expr, warning = function(w) > >> invokeRestart("muffleWarning")) > >> 3: suppressWarnings(.Call("roiprofile", input, as.numeric(select), > >> as.matrix(ROI2), as.integer(chr_binposition), data1, data2, > >> environment(wilcox.test), wilcox.test, environment(var), var, > >> environment(math), math, t.test, environment(t.test), > >> as.numeric(factor(chr_names(data1))))) > >> 4: MEDIPS.methylProfiling(data1 = CONTROL.SET, data2 = BPA.SET, > >> ROI_file = "ROI_file.txt", select = 2) > >> aborting ... > >> ######################### > >> > >> Thanks for any help you can provide! > >> > >> Stephen > >> > >> On Tue, Feb 19, 2013 at 3:18 PM, Lukas Chavez > >> <lukas.chavez.mailings@googlemail.com> wrote: > >> > > >> > Hi Stephen, > >> > > >> > please excuse my late respond. > >> > > >> > Indeed, the segfault occurs whenever the currently processed genomic > >> > region > >> > is of negative length (i.e. the start coordinate is larger than the > end > >> > coordinate), the coordinates are outside of the length of the > >> > chromosome, or > >> > the chromosome of the ROI is not represented by the regions used as > >> > input > >> > for creating the MEDIPS SETs. Typically, the error can be avoided by > >> > correcting the ROI file. Although you have already checked problematic > >> > genomic regions, I would be more than happy, if you once more check > >> > these > >> > regions (and also the immediate neighbors) with respect to the > >> > constraints I > >> > just mentioned. > >> > > >> > Please let me know, if your ROI file appears to be fine. In this case > I > >> > will > >> > try to back-track the error message. However, please note that I have > >> > extensively revised the MEDIPS package (also avoiding segfault errors) > >> > which > >> > I intend to update as soon as possible, especially in advance of the > >> > next > >> > Bioconductor release. I strongly recommend to switch to the new > version > >> > as > >> > soon as available. > >> > > >> > Thank you and all the best, > >> > Lukas > >> > > >> > > >> > > >> > > >> > > >> > On Fri, Feb 15, 2013 at 9:23 AM, Stephen Turner <vustephen@gmail.com> > >> > wrote: > >> >> > >> >> Lukas, and others: > >> >> > >> >> I'm trying to use the MEDIPS package to look for differentially > >> >> methylated regions, supplying a regions of interest file (essentially > >> >> a bed file). I was able to successfully run MEDIPS.methylProfiling > >> >> supplying the frame_size=500 argument to look for DMRs in 500-bp > >> >> windows. Now I'd like to supply my own regions of interest to look > for > >> >> DMR around genes that are differentially expressed from microarray. > >> >> > >> >> I get the following segfault: > >> >> > >> >> ############### > >> >> > dmrGenes <- MEDIPS.methylProfiling(data1=CONTROL.SET, > data2=BPA.SET, > >> >> > ROI_file="ROI_file.txt", select=2) > >> >> Preprocessing... > >> >> Reading ROIs... > >> >> Extract data according to given ROI... > >> >> Differential methylation will be calculated on the ROI data set > >> >> Analysed 379 / 2893 > >> >> *** caught segfault *** > >> >> address 0x2b61ee9d5e98, cause 'memory not mapped' > >> >> > >> >> Traceback: > >> >> 1: .Call("roiprofile", input, as.numeric(select), as.matrix(ROI2), > >> >> as.integer(chr_binposition), data1, data2, environment(wilcox.test), > >> >> wilcox.test, environment(var), var, environment(math), math, > >> >> t.test, environment(t.test), as.numeric(factor(chr_names(data1)))) > >> >> 2: withCallingHandlers(expr, warning = function(w) > >> >> invokeRestart("muffleWarning")) > >> >> 3: suppressWarnings(.Call("roiprofile", input, as.numeric(select), > >> >> as.matrix(ROI2), as.integer(chr_binposition), data1, data2, > >> >> environment(wilcox.test), wilcox.test, environment(var), var, > >> >> environment(math), math, t.test, environment(t.test), > >> >> as.numeric(factor(chr_names(data1))))) > >> >> 4: MEDIPS.methylProfiling(data1 = CONTROL.SET, data2 = BPA.SET, > >> >> ROI_file = "ROI_file.txt", select = 2) > >> >> aborting ... > >> >> ############### > >> >> > >> >> I ran this on a machine with 128GB RAM, so I know that wasn't the > >> >> problem. It looks like the segfault was happening with line 379 in > the > >> >> sample above. I went into the regions of interest (ROI) file > >> >> containing gene coordinates. Nothing looked weird about this line, > but > >> >> I deleted it anyway. When re-running, I still get segfaults, just at > >> >> different positions. > >> >> > >> >> Thanks for any insight you might have. > >> >> Stephen > >> >> > >> >> > >> >> > sessionInfo() > >> >> R version 2.15.2 (2012-10-26) > >> >> Platform: x86_64-unknown-linux-gnu (64-bit) > >> >> > >> >> locale: > >> >> [1] C > >> >> > >> >> attached base packages: > >> >> [1] stats graphics grDevices utils datasets methods base > >> >> > >> >> other attached packages: > >> >> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.19 MEDIPS_1.8.0 > >> >> [3] BSgenome_1.26.1 Biostrings_2.26.3 > >> >> [5] GenomicRanges_1.10.6 IRanges_1.16.4 > >> >> [7] BiocGenerics_0.4.0 > >> >> > >> >> loaded via a namespace (and not attached): > >> >> [1] gtools_2.7.0 parallel_2.15.2 stats4_2.15.2 > >> >> > >> >> _______________________________________________ > >> >> Bioconductor mailing list > >> >> Bioconductor@r-project.org > >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> >> Search the archives: > >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > >> > > > > > > [[alternative HTML version deleted]]