MEDIPS error extending reads
1
0
Entering edit mode
@stephen-turner-4916
Last seen 5.7 years ago
United States
Lukas (and others): I'm using the new version of MEDIPS (1.10.0) and I'm getting an error when trying to use MEDIPS.createSet(): MEDIPS.createSet(file="myalnment.bam", BSgenome="BSgenome.Mmusculus.UCSC.mm9", extend=300, shift=0, uniq=T, window_size=100) Here's the error I get: Reading bam alignment myalnment.bam Total number of imported short reads: 6150717 Extending reads... Error in regions$stop[regions$strand == "+"] = regions$stop[regions$strand == : NAs are not allowed in subscripted assignments Not sure what to do here. Thanks for any help. Stephen > sessionInfo() R version 3.0.0 (2013-04-03) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] BSgenome.Mmusculus.UCSC.mm9_1.3.19 MEDIPS_1.10.0 [3] DNAcopy_1.34.0 BSgenome_1.28.0 [5] Biostrings_2.28.0 GenomicRanges_1.12.1 [7] IRanges_1.18.0 BiocGenerics_0.6.0 [9] BiocInstaller_1.10.0 loaded via a namespace (and not attached): [1] RCurl_1.95-4.1 Rsamtools_1.12.0 XML_3.96-1.1 biomaRt_2.16.0 [5] bitops_1.0-5 gtools_2.7.1 stats4_3.0.0 tools_3.0.0 [9] zlibbioc_1.6.0
Alignment BSgenome BSgenome MEDIPS Alignment BSgenome BSgenome MEDIPS • 1.6k views
ADD COMMENT
0
Entering edit mode
Lukas Chavez ▴ 570
@lukas-chavez-5781
Last seen 6.1 years ago
USA/La Jolla/UCSD
Hi Stephen, I was able to reproduce the error. It appears that you have unmapped reads in your bam file which we always filter out during conversion of sam to bam files using samtools view -S -F 4 -b -o out.bam in.sam (single end sequening data). In case you filter your bam files accordingly, the error message should disappear. However, I will add an additional flag scanBamFlag(isUnmappedQuery = F) to the according scanBam() function in dev. Therefore, unmapped reads in a given bam file should be ignored in the future (MEDIPS version >=1.11.1). Lukas [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Thanks Lukas. That solved that problem. I now have another problem. I'm trying to look at regions of interest as before. After I've called MEDIPS.meth(), I then try to run: dmr_rois <- MEDIPS.selectROIs(results=dmr_all, rois=roi_file, columns=c("counts"), summarize=TRUE) where roi_file is created using read.csv() on this file of ROIs: I'm now getting the error: Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 2602, 0 I'm guessing this has something to do with my ROI file, but I'm having trouble figuring out what to do. I have 2611 ROIs, not 2602. I've double checked that I have no duplicates - these both return 2611: length(unique(roi_file$ID)) length(unique(with(roi_file, paste(chr, start, stop)))) There may be some overlapping regions, but I'm unsure how to best handle this. Could this be causing the error? Thanks, Stephen On Mon, Apr 8, 2013 at 3:58 PM, Lukas Chavez <lukas.chavez.mailings at="" googlemail.com=""> wrote: > > Hi Stephen, > > I was able to reproduce the error. It appears that you have unmapped reads > in your bam file which we always filter out during conversion of sam to bam > files using samtools view -S -F 4 -b -o out.bam in.sam (single end sequening > data). In case you filter your bam files accordingly, the error message > should disappear. However, I will add an additional flag > scanBamFlag(isUnmappedQuery = F) to the according scanBam() function in dev. > Therefore, unmapped reads in a given bam file should be ignored in the > future (MEDIPS version >=1.11.1). > > Lukas
ADD REPLY
0
Entering edit mode
Hi Stephen, we have slightly modified the MEDIPS.selectROIs() function and I forgot to update the manual accordingly, please excuse any inconveniences. It is not possible to simply specify columns=c("counts") any more but concrete column names must be given. In case you still want to summarize all 'count' columns of a result table 'resultTable' you have to specify something like: columns = colnames(resultTable)[grep("counts", colnames(resultTable))] Another example is shown via ?MEDIPS.selectROIs: columns=names(resultTable)[grep("counts|rpkm|logFC",names(resultTable) )] Please let me know, if this works for you. Two more things: 1) In addition to the MEDIPS.createSet() function, there is a MEDIPS.createROIset() function not described in the manual (see also ?MEDIPS.createROIset). In case you want to calculate coverage, rms, and/or differential coverage for selected regions only, you might want to try this function. Here, the coverage is only calculated at the given regions instead of genome wide windows. Please note, normalization, CpG calibration etc. are calculated based on the given regions only and I have not yet investigated a minimal number of genomic regions so that the applied models remain reasonable. 2) Your annotation file contains genomic regions for chrs 1 to 19 plus the X chromosome. Therefore, I assume that this refers to the mouse genome. I have processed some sequencing data (bam files) from mouse mapped against mm9 and specified BSgenome.Mmusculus.UCSC.mm9 in MEDIPS. When specifying the counts parameter of the MEDIPS.selectROIs() as described above (summarize = T), I receive only 2602 entries although there are 2611 genomic regions in your annotation file (as reported in your error message). See below the annotation IDs that are not returned by the MEDIPS.selectROIs() function. I have checked the first three of them and the genomic coordinates are outside of the length of the chromosomes. Your annotations might fit to mm10. I recommend to check, if your annotations and your bam files refer to the same genome version. All the best, Lukas ### chr10 130111841 130112788 Olfr823 --> chr10 length: 129993255 (mm9) chr15 103503298 103530052 Pde1b --> chr15 length: 103494974 (mm9) chr2 181763332 181827797 Myt1 --> chr2 length: 181748087 (mm9) Polr3k Vwa1 Agrn Plekhn1 Frmpd4 CR536618.1 On Tue, Apr 9, 2013 at 7:33 AM, Stephen Turner <vustephen@gmail.com> wrote: > Thanks Lukas. That solved that problem. I now have another problem. > I'm trying to look at regions of interest as before. After I've called > MEDIPS.meth(), I then try to run: > > dmr_rois <- MEDIPS.selectROIs(results=dmr_all, rois=roi_file, > columns=c("counts"), summarize=TRUE) > > where roi_file is created using read.csv() on this file of ROIs: > > > I'm now getting the error: > > Error in data.frame(..., check.names = FALSE) : > arguments imply differing number of rows: 2602, 0 > > I'm guessing this has something to do with my ROI file, but I'm having > trouble figuring out what to do. I have 2611 ROIs, not 2602. I've > double checked that I have no duplicates - these both return 2611: > > length(unique(roi_file$ID)) > length(unique(with(roi_file, paste(chr, start, stop)))) > > There may be some overlapping regions, but I'm unsure how to best > handle this. Could this be causing the error? > > Thanks, > > Stephen > > On Mon, Apr 8, 2013 at 3:58 PM, Lukas Chavez > <lukas.chavez.mailings@googlemail.com> wrote: > > > > Hi Stephen, > > > > I was able to reproduce the error. It appears that you have unmapped > reads > > in your bam file which we always filter out during conversion of sam to > bam > > files using samtools view -S -F 4 -b -o out.bam in.sam (single end > sequening > > data). In case you filter your bam files accordingly, the error message > > should disappear. However, I will add an additional flag > > scanBamFlag(isUnmappedQuery = F) to the according scanBam() function in > dev. > > Therefore, unmapped reads in a given bam file should be ignored in the > > future (MEDIPS version >=1.11.1). > > > > Lukas > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Lukas, Thanks for the quick response. I went back and mapped my IDs using mm9, and MEDIPS.selectROIs() worked without a problem. However, MEDIPS.createROIset() is what I was looking for - to conduct differential methylation on gene regions of interest. I'm able to create my MEDIPSroiSets, but when I run: dmr_rois <- MEDIPS.meth(MSet1=ctlset.roi, MSet2=bpaset.roi, CSet=CSet.roi, p.adj="fdr", diff.method="edgeR", minRowSum=10) I get the following error: Preprocessing MEDIPS SET 1 in MSet1... Calculating calibration curve... Error in if (is.null(max_signal_index) & mean_signal[i] < mean_signal[i - : missing value where TRUE/FALSE needed Not sure what to do here. Looking at my CSet.roi created from the control MEDIPSroiSet above looks like this: > CSet.roi S4 Object of class COUPLINGset ======================================= Sequence pattern: CG Genome: BSgenome.Mmusculus.UCSC.mm9 Chromosomes: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrM chrX chrY Chromosome lengths: 197195432 181748087 159599783 155630120 152537259 149517037 152524553 131738871 124076172 129993255 121843856 121257530 120284312 125194864 103494974 98319150 95272651 90772031 61342430 16299 166650296 15902555 Number of sequence pattern in genome: 21342779 Window size: -1 My control MEDIPSroiSet (ctlset.roi) looks like this. I'm wondering if it's the NA's in the seqlengths that are causing the problem: > ctlset.roi [[1]] S4 Object of class MEDIPSset ======================================= Regions file: nounmapped.m54_ctl_fem_medip_r1.bam File path: /.automount/nas8-s.itc.virginia.edu/export/vol46/uvabx/pro jects/connelly/bpa/newmedips Genome: BSgenome.Mmusculus.UCSC.mm9 Number of regions: 18106486 Chromosomes: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrM chrX chrY Chromosome lengths: 197195432 181748087 159599783 155630120 152537259 149517037 152524553 131738871 124076172 129993255 121843856 121257530 120284312 125194864 103494974 98319150 95272651 90772031 61342430 16299 166650296 15902555 Number of bins per ROI: 1 Reads extended to: 0 Reads shifted by: 0 Parameter uniq: TRUE ROIs: GRanges with 2508 ranges and 0 metadata columns: seqnames ranges strand <rle> <iranges> <rle> Mrpl15 chr1 [ 4763287, 4775820] * Sntg1 chr1 [ 8351556, 9289959] * A830018L16Rik chr1 [11404186, 11965982] * Sulf1 chr1 [12708626, 12850452] * Prdm14 chr1 [13103538, 13117244] * ... ... ... ... Il13ra2 chrX [143818019, 143863735] * Sh3kbp1 chrX [156065204, 156416001] * Txlng chrX [159216851, 159267386] * Pir chrX [160707303, 160810943] * Frmpd4 chrX [163909241, 165015163] * --- seqlengths: chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 chr9 chrX NA NA NA NA NA NA ... NA NA NA NA NA NA [[2]] S4 Object of class MEDIPSset ======================================= Regions file: nounmapped.m58_ctl_mal_medip_r1.bam File path: /.automount/nas8-s.itc.virginia.edu/export/vol46/uvabx/pro jects/connelly/bpa/newmedips Genome: BSgenome.Mmusculus.UCSC.mm9 Number of regions: 16441823 Chromosomes: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrM chrX chrY Chromosome lengths: 197195432 181748087 159599783 155630120 152537259 149517037 152524553 131738871 124076172 129993255 121843856 121257530 120284312 125194864 103494974 98319150 95272651 90772031 61342430 16299 166650296 15902555 Number of bins per ROI: 1 Reads extended to: 0 Reads shifted by: 0 Parameter uniq: TRUE ROIs: GRanges with 2508 ranges and 0 metadata columns: seqnames ranges strand <rle> <iranges> <rle> Mrpl15 chr1 [ 4763287, 4775820] * Sntg1 chr1 [ 8351556, 9289959] * A830018L16Rik chr1 [11404186, 11965982] * Sulf1 chr1 [12708626, 12850452] * Prdm14 chr1 [13103538, 13117244] * ... ... ... ... Il13ra2 chrX [143818019, 143863735] * Sh3kbp1 chrX [156065204, 156416001] * Txlng chrX [159216851, 159267386] * Pir chrX [160707303, 160810943] * Frmpd4 chrX [163909241, 165015163] * --- seqlengths: chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 chr9 chrX NA NA NA NA NA NA ... NA NA NA NA NA NA Thanks for your continuing help! Stephen > sessionInfo() R version 3.0.0 (2013-04-03) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] BSgenome.Mmusculus.UCSC.mm9_1.3.19 MEDIPS_1.10.0 [3] DNAcopy_1.34.0 BSgenome_1.28.0 [5] Biostrings_2.28.0 GenomicRanges_1.12.1 [7] IRanges_1.18.0 BiocGenerics_0.6.0 [9] BiocInstaller_1.10.0 loaded via a namespace (and not attached): [1] RCurl_1.95-4.1 Rsamtools_1.12.0 XML_3.96-1.1 biomaRt_2.16.0 [5] bitops_1.0-5 gtools_2.7.1 stats4_3.0.0 tools_3.0.0 [9] zlibbioc_1.6.0 On Wed, Apr 10, 2013 at 11:03 PM, Lukas Chavez <lukas.chavez.mailings at="" googlemail.com=""> wrote: > > Hi Stephen, > > we have slightly modified the MEDIPS.selectROIs() function and I forgot to > update the manual accordingly, please excuse any inconveniences. > > It is not possible to simply specify columns=c("counts") any more but > concrete column names must be given. In case you still want to summarize all > 'count' columns of a result table 'resultTable' you have to specify > something like: > > columns = colnames(resultTable)[grep("counts", colnames(resultTable))] > > Another example is shown via ?MEDIPS.selectROIs: > columns=names(resultTable)[grep("counts|rpkm|logFC",names(resultTabl e))] > > Please let me know, if this works for you. > > Two more things: > 1) In addition to the MEDIPS.createSet() function, there is a > MEDIPS.createROIset() function not described in the manual (see also > ?MEDIPS.createROIset). In case you want to calculate coverage, rms, and/or > differential coverage for selected regions only, you might want to try this > function. Here, the coverage is only calculated at the given regions instead > of genome wide windows. Please note, normalization, CpG calibration etc. are > calculated based on the given regions only and I have not yet investigated a > minimal number of genomic regions so that the applied models remain > reasonable. > > 2) Your annotation file contains genomic regions for chrs 1 to 19 plus the X > chromosome. Therefore, I assume that this refers to the mouse genome. I have > processed some sequencing data (bam files) from mouse mapped against mm9 and > specified BSgenome.Mmusculus.UCSC.mm9 in MEDIPS. When specifying the counts > parameter of the MEDIPS.selectROIs() as described above (summarize = T), I > receive only 2602 entries although there are 2611 genomic regions in your > annotation file (as reported in your error message). See below the > annotation IDs that are not returned by the MEDIPS.selectROIs() function. I > have checked the first three of them and the genomic coordinates are outside > of the length of the chromosomes. Your annotations might fit to mm10. I > recommend to check, if your annotations and your bam files refer to the same > genome version. > > All the best, > Lukas > > ### > chr10 130111841 130112788 Olfr823 --> chr10 length: 129993255 (mm9) > chr15 103503298 103530052 Pde1b --> chr15 length: 103494974 (mm9) > chr2 181763332 181827797 Myt1 --> chr2 length: 181748087 (mm9) > Polr3k > Vwa1 > Agrn > Plekhn1 > Frmpd4 > CR536618.1 > > > On Tue, Apr 9, 2013 at 7:33 AM, Stephen Turner <vustephen at="" gmail.com=""> wrote: >> >> Thanks Lukas. That solved that problem. I now have another problem. >> I'm trying to look at regions of interest as before. After I've called >> MEDIPS.meth(), I then try to run: >> >> dmr_rois <- MEDIPS.selectROIs(results=dmr_all, rois=roi_file, >> columns=c("counts"), summarize=TRUE) >> >> where roi_file is created using read.csv() on this file of ROIs: >> >> >> I'm now getting the error: >> >> Error in data.frame(..., check.names = FALSE) : >> arguments imply differing number of rows: 2602, 0 >> >> I'm guessing this has something to do with my ROI file, but I'm having >> trouble figuring out what to do. I have 2611 ROIs, not 2602. I've >> double checked that I have no duplicates - these both return 2611: >> >> length(unique(roi_file$ID)) >> length(unique(with(roi_file, paste(chr, start, stop)))) >> >> There may be some overlapping regions, but I'm unsure how to best >> handle this. Could this be causing the error? >> >> Thanks, >> >> Stephen >> >> On Mon, Apr 8, 2013 at 3:58 PM, Lukas Chavez >> <lukas.chavez.mailings at="" googlemail.com=""> wrote: >> > >> > Hi Stephen, >> > >> > I was able to reproduce the error. It appears that you have unmapped >> > reads >> > in your bam file which we always filter out during conversion of sam to >> > bam >> > files using samtools view -S -F 4 -b -o out.bam in.sam (single end >> > sequening >> > data). In case you filter your bam files accordingly, the error message >> > should disappear. However, I will add an additional flag >> > scanBamFlag(isUnmappedQuery = F) to the according scanBam() function in >> > dev. >> > Therefore, unmapped reads in a given bam file should be ignored in the >> > future (MEDIPS version >=1.11.1). >> > >> > Lukas > >
ADD REPLY
0
Entering edit mode
Hi Stephen, I am glad to hear that genome wide analysis of your data works fine. I assume that you will be able to calculate differential coverage for you regions of interest (using MEDIPS.createROIset instead of MEDIPS.createSet) by specifying the parameter MeDIP=FALSE in the MEDIPS.meth() function. The NA's in the seqlengths should not be the problem. The error occurs when the calibration curve is calculated. I assume that the amount and characteristic of your selected regions of interest is not suitable for modeling the dependency of CpG density and MeDIP-seq signals as it applies for genome wide small windows. CpG density normalization can be avoided as mentioned above and, thus, no rms values will be calculated. However, differential coverage will still be calculated. Please let me know, if your error message will remain anyway. Nevertheless, let me emphasize again that I have not investigated a necessary minimal and representative number of genomic regions of interest so that the applied models- CpG density normalization and differential coverage- remain reasonable. This becomes a more general question that goes beyond the support I can provide here. All the best, Lukas On Thu, Apr 11, 2013 at 8:08 AM, Stephen Turner <vustephen@gmail.com> wrote: > Lukas, > > Thanks for the quick response. I went back and mapped my IDs using > mm9, and MEDIPS.selectROIs() worked without a problem. > > However, MEDIPS.createROIset() is what I was looking for - to conduct > differential methylation on gene regions of interest. I'm able to > create my MEDIPSroiSets, but when I run: > > dmr_rois <- MEDIPS.meth(MSet1=ctlset.roi, MSet2=bpaset.roi, > CSet=CSet.roi, p.adj="fdr", diff.method="edgeR", minRowSum=10) > > I get the following error: > > Preprocessing MEDIPS SET 1 in MSet1... > Calculating calibration curve... > Error in if (is.null(max_signal_index) & mean_signal[i] < mean_signal[i - > : > missing value where TRUE/FALSE needed > > Not sure what to do here. Looking at my CSet.roi created from the > control MEDIPSroiSet above looks like this: > > > CSet.roi > S4 Object of class COUPLINGset > ======================================= > Sequence pattern: CG > Genome: BSgenome.Mmusculus.UCSC.mm9 > Chromosomes: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 > chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrM chrX chrY > Chromosome lengths: 197195432 181748087 159599783 155630120 152537259 > 149517037 152524553 131738871 124076172 129993255 121843856 121257530 > 120284312 125194864 103494974 98319150 95272651 90772031 61342430 > 16299 166650296 15902555 > Number of sequence pattern in genome: 21342779 > Window size: -1 > > My control MEDIPSroiSet (ctlset.roi) looks like this. I'm wondering if > it's the NA's in the seqlengths that are causing the problem: > > > ctlset.roi > [[1]] > S4 Object of class MEDIPSset > ======================================= > Regions file: nounmapped.m54_ctl_fem_medip_r1.bam > File path: /.automount/ > nas8-s.itc.virginia.edu/export/vol46/uvabx/projects/connelly/bpa/new medips > Genome: BSgenome.Mmusculus.UCSC.mm9 > Number of regions: 18106486 > Chromosomes: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 > chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrM chrX chrY > Chromosome lengths: 197195432 181748087 159599783 155630120 152537259 > 149517037 152524553 131738871 124076172 129993255 121843856 121257530 > 120284312 125194864 103494974 98319150 95272651 90772031 61342430 > 16299 166650296 15902555 > Number of bins per ROI: 1 > Reads extended to: 0 > Reads shifted by: 0 > Parameter uniq: TRUE > ROIs: > GRanges with 2508 ranges and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > Mrpl15 chr1 [ 4763287, 4775820] * > Sntg1 chr1 [ 8351556, 9289959] * > A830018L16Rik chr1 [11404186, 11965982] * > Sulf1 chr1 [12708626, 12850452] * > Prdm14 chr1 [13103538, 13117244] * > ... ... ... ... > Il13ra2 chrX [143818019, 143863735] * > Sh3kbp1 chrX [156065204, 156416001] * > Txlng chrX [159216851, 159267386] * > Pir chrX [160707303, 160810943] * > Frmpd4 chrX [163909241, 165015163] * > --- > seqlengths: > chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 chr9 > chrX > NA NA NA NA NA NA ... NA NA NA NA NA > NA > > [[2]] > S4 Object of class MEDIPSset > ======================================= > Regions file: nounmapped.m58_ctl_mal_medip_r1.bam > File path: /.automount/ > nas8-s.itc.virginia.edu/export/vol46/uvabx/projects/connelly/bpa/new medips > Genome: BSgenome.Mmusculus.UCSC.mm9 > Number of regions: 16441823 > Chromosomes: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 > chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrM chrX chrY > Chromosome lengths: 197195432 181748087 159599783 155630120 152537259 > 149517037 152524553 131738871 124076172 129993255 121843856 121257530 > 120284312 125194864 103494974 98319150 95272651 90772031 61342430 > 16299 166650296 15902555 > Number of bins per ROI: 1 > Reads extended to: 0 > Reads shifted by: 0 > Parameter uniq: TRUE > ROIs: > GRanges with 2508 ranges and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > Mrpl15 chr1 [ 4763287, 4775820] * > Sntg1 chr1 [ 8351556, 9289959] * > A830018L16Rik chr1 [11404186, 11965982] * > Sulf1 chr1 [12708626, 12850452] * > Prdm14 chr1 [13103538, 13117244] * > ... ... ... ... > Il13ra2 chrX [143818019, 143863735] * > Sh3kbp1 chrX [156065204, 156416001] * > Txlng chrX [159216851, 159267386] * > Pir chrX [160707303, 160810943] * > Frmpd4 chrX [163909241, 165015163] * > --- > seqlengths: > chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 chr9 > chrX > NA NA NA NA NA NA ... NA NA NA NA NA > NA > > Thanks for your continuing help! > > Stephen > > > sessionInfo() > R version 3.0.0 (2013-04-03) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] BSgenome.Mmusculus.UCSC.mm9_1.3.19 MEDIPS_1.10.0 > [3] DNAcopy_1.34.0 BSgenome_1.28.0 > [5] Biostrings_2.28.0 GenomicRanges_1.12.1 > [7] IRanges_1.18.0 BiocGenerics_0.6.0 > [9] BiocInstaller_1.10.0 > > loaded via a namespace (and not attached): > [1] RCurl_1.95-4.1 Rsamtools_1.12.0 XML_3.96-1.1 biomaRt_2.16.0 > [5] bitops_1.0-5 gtools_2.7.1 stats4_3.0.0 tools_3.0.0 > [9] zlibbioc_1.6.0 > > On Wed, Apr 10, 2013 at 11:03 PM, Lukas Chavez > <lukas.chavez.mailings@googlemail.com> wrote: > > > > Hi Stephen, > > > > we have slightly modified the MEDIPS.selectROIs() function and I forgot > to > > update the manual accordingly, please excuse any inconveniences. > > > > It is not possible to simply specify columns=c("counts") any more but > > concrete column names must be given. In case you still want to summarize > all > > 'count' columns of a result table 'resultTable' you have to specify > > something like: > > > > columns = colnames(resultTable)[grep("counts", colnames(resultTable))] > > > > Another example is shown via ?MEDIPS.selectROIs: > > columns=names(resultTable)[grep("counts|rpkm|logFC",names(resultTa ble))] > > > > Please let me know, if this works for you. > > > > Two more things: > > 1) In addition to the MEDIPS.createSet() function, there is a > > MEDIPS.createROIset() function not described in the manual (see also > > ?MEDIPS.createROIset). In case you want to calculate coverage, rms, > and/or > > differential coverage for selected regions only, you might want to try > this > > function. Here, the coverage is only calculated at the given regions > instead > > of genome wide windows. Please note, normalization, CpG calibration etc. > are > > calculated based on the given regions only and I have not yet > investigated a > > minimal number of genomic regions so that the applied models remain > > reasonable. > > > > 2) Your annotation file contains genomic regions for chrs 1 to 19 plus > the X > > chromosome. Therefore, I assume that this refers to the mouse genome. I > have > > processed some sequencing data (bam files) from mouse mapped against mm9 > and > > specified BSgenome.Mmusculus.UCSC.mm9 in MEDIPS. When specifying the > counts > > parameter of the MEDIPS.selectROIs() as described above (summarize = T), > I > > receive only 2602 entries although there are 2611 genomic regions in your > > annotation file (as reported in your error message). See below the > > annotation IDs that are not returned by the MEDIPS.selectROIs() > function. I > > have checked the first three of them and the genomic coordinates are > outside > > of the length of the chromosomes. Your annotations might fit to mm10. I > > recommend to check, if your annotations and your bam files refer to the > same > > genome version. > > > > All the best, > > Lukas > > > > ### > > chr10 130111841 130112788 Olfr823 --> chr10 length: 129993255 (mm9) > > chr15 103503298 103530052 Pde1b --> chr15 length: 103494974 (mm9) > > chr2 181763332 181827797 Myt1 --> chr2 length: 181748087 (mm9) > > Polr3k > > Vwa1 > > Agrn > > Plekhn1 > > Frmpd4 > > CR536618.1 > > > > > > On Tue, Apr 9, 2013 at 7:33 AM, Stephen Turner <vustephen@gmail.com> > wrote: > >> > >> Thanks Lukas. That solved that problem. I now have another problem. > >> I'm trying to look at regions of interest as before. After I've called > >> MEDIPS.meth(), I then try to run: > >> > >> dmr_rois <- MEDIPS.selectROIs(results=dmr_all, rois=roi_file, > >> columns=c("counts"), summarize=TRUE) > >> > >> where roi_file is created using read.csv() on this file of ROIs: > >> > >> > >> I'm now getting the error: > >> > >> Error in data.frame(..., check.names = FALSE) : > >> arguments imply differing number of rows: 2602, 0 > >> > >> I'm guessing this has something to do with my ROI file, but I'm having > >> trouble figuring out what to do. I have 2611 ROIs, not 2602. I've > >> double checked that I have no duplicates - these both return 2611: > >> > >> length(unique(roi_file$ID)) > >> length(unique(with(roi_file, paste(chr, start, stop)))) > >> > >> There may be some overlapping regions, but I'm unsure how to best > >> handle this. Could this be causing the error? > >> > >> Thanks, > >> > >> Stephen > >> > >> On Mon, Apr 8, 2013 at 3:58 PM, Lukas Chavez > >> <lukas.chavez.mailings@googlemail.com> wrote: > >> > > >> > Hi Stephen, > >> > > >> > I was able to reproduce the error. It appears that you have unmapped > >> > reads > >> > in your bam file which we always filter out during conversion of sam > to > >> > bam > >> > files using samtools view -S -F 4 -b -o out.bam in.sam (single end > >> > sequening > >> > data). In case you filter your bam files accordingly, the error > message > >> > should disappear. However, I will add an additional flag > >> > scanBamFlag(isUnmappedQuery = F) to the according scanBam() function > in > >> > dev. > >> > Therefore, unmapped reads in a given bam file should be ignored in the > >> > future (MEDIPS version >=1.11.1). > >> > > >> > Lukas > > > > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Lukas, Thanks for the help and the cautionary note about this design. One more question for you: if I want to test these gene regions in windows (kind of combining the functionality of MEDIPS.meth(..., window_size=500) with a MEDIPSroiSet class, will that window_size argument do what I would expect - calculate DMR in 500bp windows only across the regions of interest? If not, would a reasonable strategy be to filter out alignments that are not in my regions of interest, then running MEDIPS.meth() with a minRowSum and window_size argument? Thanks, Stephen On Thu, Apr 11, 2013 at 2:35 PM, Lukas Chavez <lukas.chavez.mailings at="" googlemail.com=""> wrote: > Hi Stephen, > > I am glad to hear that genome wide analysis of your data works fine. > > I assume that you will be able to calculate differential coverage for you > regions of interest (using MEDIPS.createROIset instead of MEDIPS.createSet) > by specifying the parameter MeDIP=FALSE in the MEDIPS.meth() function. > > The NA's in the seqlengths should not be the problem. The error occurs when > the calibration curve is calculated. I assume that the amount and > characteristic of your selected regions of interest is not suitable for > modeling the dependency of CpG density and MeDIP-seq signals as it applies > for genome wide small windows. CpG density normalization can be avoided as > mentioned above and, thus, no rms values will be calculated. However, > differential coverage will still be calculated. Please let me know, if your > error message will remain anyway. > > Nevertheless, let me emphasize again that I have not investigated a > necessary minimal and representative number of genomic regions of interest > so that the applied models- CpG density normalization and differential > coverage- remain reasonable. This becomes a more general question that goes > beyond the support I can provide here. > > All the best, > Lukas > > > > > On Thu, Apr 11, 2013 at 8:08 AM, Stephen Turner <vustephen at="" gmail.com=""> wrote: >> >> Lukas, >> >> Thanks for the quick response. I went back and mapped my IDs using >> mm9, and MEDIPS.selectROIs() worked without a problem. >> >> However, MEDIPS.createROIset() is what I was looking for - to conduct >> differential methylation on gene regions of interest. I'm able to >> create my MEDIPSroiSets, but when I run: >> >> dmr_rois <- MEDIPS.meth(MSet1=ctlset.roi, MSet2=bpaset.roi, >> CSet=CSet.roi, p.adj="fdr", diff.method="edgeR", minRowSum=10) >> >> I get the following error: >> >> Preprocessing MEDIPS SET 1 in MSet1... >> Calculating calibration curve... >> Error in if (is.null(max_signal_index) & mean_signal[i] < mean_signal[i - >> : >> missing value where TRUE/FALSE needed >> >> Not sure what to do here. Looking at my CSet.roi created from the >> control MEDIPSroiSet above looks like this: >> >> > CSet.roi >> S4 Object of class COUPLINGset >> ======================================= >> Sequence pattern: CG >> Genome: BSgenome.Mmusculus.UCSC.mm9 >> Chromosomes: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 >> chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrM chrX chrY >> Chromosome lengths: 197195432 181748087 159599783 155630120 152537259 >> 149517037 152524553 131738871 124076172 129993255 121843856 121257530 >> 120284312 125194864 103494974 98319150 95272651 90772031 61342430 >> 16299 166650296 15902555 >> Number of sequence pattern in genome: 21342779 >> Window size: -1 >> >> My control MEDIPSroiSet (ctlset.roi) looks like this. I'm wondering if >> it's the NA's in the seqlengths that are causing the problem: >> >> > ctlset.roi >> [[1]] >> S4 Object of class MEDIPSset >> ======================================= >> Regions file: nounmapped.m54_ctl_fem_medip_r1.bam >> File path: >> /.automount/nas8-s.itc.virginia.edu/export/vol46/uvabx/projects/con nelly/bpa/newmedips >> Genome: BSgenome.Mmusculus.UCSC.mm9 >> Number of regions: 18106486 >> Chromosomes: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 >> chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrM chrX chrY >> Chromosome lengths: 197195432 181748087 159599783 155630120 152537259 >> 149517037 152524553 131738871 124076172 129993255 121843856 121257530 >> 120284312 125194864 103494974 98319150 95272651 90772031 61342430 >> 16299 166650296 15902555 >> Number of bins per ROI: 1 >> Reads extended to: 0 >> Reads shifted by: 0 >> Parameter uniq: TRUE >> ROIs: >> GRanges with 2508 ranges and 0 metadata columns: >> seqnames ranges strand >> <rle> <iranges> <rle> >> Mrpl15 chr1 [ 4763287, 4775820] * >> Sntg1 chr1 [ 8351556, 9289959] * >> A830018L16Rik chr1 [11404186, 11965982] * >> Sulf1 chr1 [12708626, 12850452] * >> Prdm14 chr1 [13103538, 13117244] * >> ... ... ... ... >> Il13ra2 chrX [143818019, 143863735] * >> Sh3kbp1 chrX [156065204, 156416001] * >> Txlng chrX [159216851, 159267386] * >> Pir chrX [160707303, 160810943] * >> Frmpd4 chrX [163909241, 165015163] * >> --- >> seqlengths: >> chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 chr9 >> chrX >> NA NA NA NA NA NA ... NA NA NA NA NA >> NA >> >> [[2]] >> S4 Object of class MEDIPSset >> ======================================= >> Regions file: nounmapped.m58_ctl_mal_medip_r1.bam >> File path: >> /.automount/nas8-s.itc.virginia.edu/export/vol46/uvabx/projects/con nelly/bpa/newmedips >> Genome: BSgenome.Mmusculus.UCSC.mm9 >> Number of regions: 16441823 >> Chromosomes: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 >> chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrM chrX chrY >> Chromosome lengths: 197195432 181748087 159599783 155630120 152537259 >> 149517037 152524553 131738871 124076172 129993255 121843856 121257530 >> 120284312 125194864 103494974 98319150 95272651 90772031 61342430 >> 16299 166650296 15902555 >> Number of bins per ROI: 1 >> Reads extended to: 0 >> Reads shifted by: 0 >> Parameter uniq: TRUE >> ROIs: >> GRanges with 2508 ranges and 0 metadata columns: >> seqnames ranges strand >> <rle> <iranges> <rle> >> Mrpl15 chr1 [ 4763287, 4775820] * >> Sntg1 chr1 [ 8351556, 9289959] * >> A830018L16Rik chr1 [11404186, 11965982] * >> Sulf1 chr1 [12708626, 12850452] * >> Prdm14 chr1 [13103538, 13117244] * >> ... ... ... ... >> Il13ra2 chrX [143818019, 143863735] * >> Sh3kbp1 chrX [156065204, 156416001] * >> Txlng chrX [159216851, 159267386] * >> Pir chrX [160707303, 160810943] * >> Frmpd4 chrX [163909241, 165015163] * >> --- >> seqlengths: >> chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 chr9 >> chrX >> NA NA NA NA NA NA ... NA NA NA NA NA >> NA >> >> Thanks for your continuing help! >> >> Stephen >> >> > sessionInfo() >> R version 3.0.0 (2013-04-03) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.19 MEDIPS_1.10.0 >> [3] DNAcopy_1.34.0 BSgenome_1.28.0 >> [5] Biostrings_2.28.0 GenomicRanges_1.12.1 >> [7] IRanges_1.18.0 BiocGenerics_0.6.0 >> [9] BiocInstaller_1.10.0 >> >> loaded via a namespace (and not attached): >> [1] RCurl_1.95-4.1 Rsamtools_1.12.0 XML_3.96-1.1 biomaRt_2.16.0 >> [5] bitops_1.0-5 gtools_2.7.1 stats4_3.0.0 tools_3.0.0 >> [9] zlibbioc_1.6.0 >> >> On Wed, Apr 10, 2013 at 11:03 PM, Lukas Chavez >> <lukas.chavez.mailings at="" googlemail.com=""> wrote: >> > >> > Hi Stephen, >> > >> > we have slightly modified the MEDIPS.selectROIs() function and I forgot >> > to >> > update the manual accordingly, please excuse any inconveniences. >> > >> > It is not possible to simply specify columns=c("counts") any more but >> > concrete column names must be given. In case you still want to summarize >> > all >> > 'count' columns of a result table 'resultTable' you have to specify >> > something like: >> > >> > columns = colnames(resultTable)[grep("counts", colnames(resultTable))] >> > >> > Another example is shown via ?MEDIPS.selectROIs: >> > columns=names(resultTable)[grep("counts|rpkm|logFC",names(resultT able))] >> > >> > Please let me know, if this works for you. >> > >> > Two more things: >> > 1) In addition to the MEDIPS.createSet() function, there is a >> > MEDIPS.createROIset() function not described in the manual (see also >> > ?MEDIPS.createROIset). In case you want to calculate coverage, rms, >> > and/or >> > differential coverage for selected regions only, you might want to try >> > this >> > function. Here, the coverage is only calculated at the given regions >> > instead >> > of genome wide windows. Please note, normalization, CpG calibration etc. >> > are >> > calculated based on the given regions only and I have not yet >> > investigated a >> > minimal number of genomic regions so that the applied models remain >> > reasonable. >> > >> > 2) Your annotation file contains genomic regions for chrs 1 to 19 plus >> > the X >> > chromosome. Therefore, I assume that this refers to the mouse genome. I >> > have >> > processed some sequencing data (bam files) from mouse mapped against mm9 >> > and >> > specified BSgenome.Mmusculus.UCSC.mm9 in MEDIPS. When specifying the >> > counts >> > parameter of the MEDIPS.selectROIs() as described above (summarize = T), >> > I >> > receive only 2602 entries although there are 2611 genomic regions in >> > your >> > annotation file (as reported in your error message). See below the >> > annotation IDs that are not returned by the MEDIPS.selectROIs() >> > function. I >> > have checked the first three of them and the genomic coordinates are >> > outside >> > of the length of the chromosomes. Your annotations might fit to mm10. I >> > recommend to check, if your annotations and your bam files refer to the >> > same >> > genome version. >> > >> > All the best, >> > Lukas >> > >> > ### >> > chr10 130111841 130112788 Olfr823 --> chr10 length: 129993255 (mm9) >> > chr15 103503298 103530052 Pde1b --> chr15 length: 103494974 (mm9) >> > chr2 181763332 181827797 Myt1 --> chr2 length: 181748087 (mm9) >> > Polr3k >> > Vwa1 >> > Agrn >> > Plekhn1 >> > Frmpd4 >> > CR536618.1 >> > >> > >> > On Tue, Apr 9, 2013 at 7:33 AM, Stephen Turner <vustephen at="" gmail.com=""> >> > wrote: >> >> >> >> Thanks Lukas. That solved that problem. I now have another problem. >> >> I'm trying to look at regions of interest as before. After I've called >> >> MEDIPS.meth(), I then try to run: >> >> >> >> dmr_rois <- MEDIPS.selectROIs(results=dmr_all, rois=roi_file, >> >> columns=c("counts"), summarize=TRUE) >> >> >> >> where roi_file is created using read.csv() on this file of ROIs: >> >> >> >> >> >> I'm now getting the error: >> >> >> >> Error in data.frame(..., check.names = FALSE) : >> >> arguments imply differing number of rows: 2602, 0 >> >> >> >> I'm guessing this has something to do with my ROI file, but I'm having >> >> trouble figuring out what to do. I have 2611 ROIs, not 2602. I've >> >> double checked that I have no duplicates - these both return 2611: >> >> >> >> length(unique(roi_file$ID)) >> >> length(unique(with(roi_file, paste(chr, start, stop)))) >> >> >> >> There may be some overlapping regions, but I'm unsure how to best >> >> handle this. Could this be causing the error? >> >> >> >> Thanks, >> >> >> >> Stephen >> >> >> >> On Mon, Apr 8, 2013 at 3:58 PM, Lukas Chavez >> >> <lukas.chavez.mailings at="" googlemail.com=""> wrote: >> >> > >> >> > Hi Stephen, >> >> > >> >> > I was able to reproduce the error. It appears that you have unmapped >> >> > reads >> >> > in your bam file which we always filter out during conversion of sam >> >> > to >> >> > bam >> >> > files using samtools view -S -F 4 -b -o out.bam in.sam (single end >> >> > sequening >> >> > data). In case you filter your bam files accordingly, the error >> >> > message >> >> > should disappear. However, I will add an additional flag >> >> > scanBamFlag(isUnmappedQuery = F) to the according scanBam() function >> >> > in >> >> > dev. >> >> > Therefore, unmapped reads in a given bam file should be ignored in >> >> > the >> >> > future (MEDIPS version >=1.11.1). >> >> > >> >> > Lukas >> > >> > > >
ADD REPLY
0
Entering edit mode
Hi Stephen, the MEDIPS.createROIset() function and the according MEDIPSroiSet class do not have a window_size parameter like the MEDIPS.createSet() function. Instead, there is the parameter bn (default 1) which stands for 'bin number'. Each given ROI will be divided into an according number of bins and coverage/ rms/ differential coverage will be calculated at these bins when the MEDIPS.meth() function is applied. The resulting length per bin depends on the length of the ROI and on the given parameter bn. >If not, would a reasonable strategy be to filter out alignments that are not in my regions of interest, then running MEDIPS.meth() with a minRowSum and window_size argument? This approach will reduce the runtime for calculating differential coverage, because all the windows without any data (according to minRowSum) will not be tested. In addition, there will be an according impact on the adjusted p-value, because of the reduced number of performed tests. The result table will still be genome wide. I think I prefer the MEDIPSroiSet over this approach. However, why don't you calculate genome wide coverage / differential coverage at small windows and test afterwards, if you can connect regions with significant differential coverage to your regions of interest? Lukas On Mon, Apr 15, 2013 at 11:49 AM, Stephen Turner <vustephen@gmail.com>wrote: > Lukas, > > Thanks for the help and the cautionary note about this design. > > One more question for you: if I want to test these gene regions in > windows (kind of combining the functionality of MEDIPS.meth(..., > window_size=500) with a MEDIPSroiSet class, will that window_size > argument do what I would expect - calculate DMR in 500bp windows only > across the regions of interest? > > If not, would a reasonable strategy be to filter out alignments that > are not in my regions of interest, then running MEDIPS.meth() with a > minRowSum and window_size argument? > > Thanks, > > Stephen > > On Thu, Apr 11, 2013 at 2:35 PM, Lukas Chavez > <lukas.chavez.mailings@googlemail.com> wrote: > > Hi Stephen, > > > > I am glad to hear that genome wide analysis of your data works fine. > > > > I assume that you will be able to calculate differential coverage for you > > regions of interest (using MEDIPS.createROIset instead of > MEDIPS.createSet) > > by specifying the parameter MeDIP=FALSE in the MEDIPS.meth() function. > > > > The NA's in the seqlengths should not be the problem. The error occurs > when > > the calibration curve is calculated. I assume that the amount and > > characteristic of your selected regions of interest is not suitable for > > modeling the dependency of CpG density and MeDIP-seq signals as it > applies > > for genome wide small windows. CpG density normalization can be avoided > as > > mentioned above and, thus, no rms values will be calculated. However, > > differential coverage will still be calculated. Please let me know, if > your > > error message will remain anyway. > > > > Nevertheless, let me emphasize again that I have not investigated a > > necessary minimal and representative number of genomic regions of > interest > > so that the applied models- CpG density normalization and differential > > coverage- remain reasonable. This becomes a more general question that > goes > > beyond the support I can provide here. > > > > All the best, > > Lukas > > > > > > > > > > On Thu, Apr 11, 2013 at 8:08 AM, Stephen Turner <vustephen@gmail.com> > wrote: > >> > >> Lukas, > >> > >> Thanks for the quick response. I went back and mapped my IDs using > >> mm9, and MEDIPS.selectROIs() worked without a problem. > >> > >> However, MEDIPS.createROIset() is what I was looking for - to conduct > >> differential methylation on gene regions of interest. I'm able to > >> create my MEDIPSroiSets, but when I run: > >> > >> dmr_rois <- MEDIPS.meth(MSet1=ctlset.roi, MSet2=bpaset.roi, > >> CSet=CSet.roi, p.adj="fdr", diff.method="edgeR", minRowSum=10) > >> > >> I get the following error: > >> > >> Preprocessing MEDIPS SET 1 in MSet1... > >> Calculating calibration curve... > >> Error in if (is.null(max_signal_index) & mean_signal[i] < mean_signal[i > - > >> : > >> missing value where TRUE/FALSE needed > >> > >> Not sure what to do here. Looking at my CSet.roi created from the > >> control MEDIPSroiSet above looks like this: > >> > >> > CSet.roi > >> S4 Object of class COUPLINGset > >> ======================================= > >> Sequence pattern: CG > >> Genome: BSgenome.Mmusculus.UCSC.mm9 > >> Chromosomes: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 > >> chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrM chrX chrY > >> Chromosome lengths: 197195432 181748087 159599783 155630120 152537259 > >> 149517037 152524553 131738871 124076172 129993255 121843856 121257530 > >> 120284312 125194864 103494974 98319150 95272651 90772031 61342430 > >> 16299 166650296 15902555 > >> Number of sequence pattern in genome: 21342779 > >> Window size: -1 > >> > >> My control MEDIPSroiSet (ctlset.roi) looks like this. I'm wondering if > >> it's the NA's in the seqlengths that are causing the problem: > >> > >> > ctlset.roi > >> [[1]] > >> S4 Object of class MEDIPSset > >> ======================================= > >> Regions file: nounmapped.m54_ctl_fem_medip_r1.bam > >> File path: > >> /.automount/ > nas8-s.itc.virginia.edu/export/vol46/uvabx/projects/connelly/bpa/new medips > >> Genome: BSgenome.Mmusculus.UCSC.mm9 > >> Number of regions: 18106486 > >> Chromosomes: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 > >> chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrM chrX chrY > >> Chromosome lengths: 197195432 181748087 159599783 155630120 152537259 > >> 149517037 152524553 131738871 124076172 129993255 121843856 121257530 > >> 120284312 125194864 103494974 98319150 95272651 90772031 61342430 > >> 16299 166650296 15902555 > >> Number of bins per ROI: 1 > >> Reads extended to: 0 > >> Reads shifted by: 0 > >> Parameter uniq: TRUE > >> ROIs: > >> GRanges with 2508 ranges and 0 metadata columns: > >> seqnames ranges strand > >> <rle> <iranges> <rle> > >> Mrpl15 chr1 [ 4763287, 4775820] * > >> Sntg1 chr1 [ 8351556, 9289959] * > >> A830018L16Rik chr1 [11404186, 11965982] * > >> Sulf1 chr1 [12708626, 12850452] * > >> Prdm14 chr1 [13103538, 13117244] * > >> ... ... ... ... > >> Il13ra2 chrX [143818019, 143863735] * > >> Sh3kbp1 chrX [156065204, 156416001] * > >> Txlng chrX [159216851, 159267386] * > >> Pir chrX [160707303, 160810943] * > >> Frmpd4 chrX [163909241, 165015163] * > >> --- > >> seqlengths: > >> chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 chr9 > >> chrX > >> NA NA NA NA NA NA ... NA NA NA NA NA > >> NA > >> > >> [[2]] > >> S4 Object of class MEDIPSset > >> ======================================= > >> Regions file: nounmapped.m58_ctl_mal_medip_r1.bam > >> File path: > >> /.automount/ > nas8-s.itc.virginia.edu/export/vol46/uvabx/projects/connelly/bpa/new medips > >> Genome: BSgenome.Mmusculus.UCSC.mm9 > >> Number of regions: 16441823 > >> Chromosomes: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 > >> chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrM chrX chrY > >> Chromosome lengths: 197195432 181748087 159599783 155630120 152537259 > >> 149517037 152524553 131738871 124076172 129993255 121843856 121257530 > >> 120284312 125194864 103494974 98319150 95272651 90772031 61342430 > >> 16299 166650296 15902555 > >> Number of bins per ROI: 1 > >> Reads extended to: 0 > >> Reads shifted by: 0 > >> Parameter uniq: TRUE > >> ROIs: > >> GRanges with 2508 ranges and 0 metadata columns: > >> seqnames ranges strand > >> <rle> <iranges> <rle> > >> Mrpl15 chr1 [ 4763287, 4775820] * > >> Sntg1 chr1 [ 8351556, 9289959] * > >> A830018L16Rik chr1 [11404186, 11965982] * > >> Sulf1 chr1 [12708626, 12850452] * > >> Prdm14 chr1 [13103538, 13117244] * > >> ... ... ... ... > >> Il13ra2 chrX [143818019, 143863735] * > >> Sh3kbp1 chrX [156065204, 156416001] * > >> Txlng chrX [159216851, 159267386] * > >> Pir chrX [160707303, 160810943] * > >> Frmpd4 chrX [163909241, 165015163] * > >> --- > >> seqlengths: > >> chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 chr9 > >> chrX > >> NA NA NA NA NA NA ... NA NA NA NA NA > >> NA > >> > >> Thanks for your continuing help! > >> > >> Stephen > >> > >> > sessionInfo() > >> R version 3.0.0 (2013-04-03) > >> Platform: x86_64-unknown-linux-gnu (64-bit) > >> > >> locale: > >> [1] C > >> > >> attached base packages: > >> [1] parallel stats graphics grDevices utils datasets methods > >> [8] base > >> > >> other attached packages: > >> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.19 MEDIPS_1.10.0 > >> [3] DNAcopy_1.34.0 BSgenome_1.28.0 > >> [5] Biostrings_2.28.0 GenomicRanges_1.12.1 > >> [7] IRanges_1.18.0 BiocGenerics_0.6.0 > >> [9] BiocInstaller_1.10.0 > >> > >> loaded via a namespace (and not attached): > >> [1] RCurl_1.95-4.1 Rsamtools_1.12.0 XML_3.96-1.1 biomaRt_2.16.0 > >> [5] bitops_1.0-5 gtools_2.7.1 stats4_3.0.0 tools_3.0.0 > >> [9] zlibbioc_1.6.0 > >> > >> On Wed, Apr 10, 2013 at 11:03 PM, Lukas Chavez > >> <lukas.chavez.mailings@googlemail.com> wrote: > >> > > >> > Hi Stephen, > >> > > >> > we have slightly modified the MEDIPS.selectROIs() function and I > forgot > >> > to > >> > update the manual accordingly, please excuse any inconveniences. > >> > > >> > It is not possible to simply specify columns=c("counts") any more but > >> > concrete column names must be given. In case you still want to > summarize > >> > all > >> > 'count' columns of a result table 'resultTable' you have to specify > >> > something like: > >> > > >> > columns = colnames(resultTable)[grep("counts", colnames(resultTable))] > >> > > >> > Another example is shown via ?MEDIPS.selectROIs: > >> > > columns=names(resultTable)[grep("counts|rpkm|logFC",names(resultTabl e))] > >> > > >> > Please let me know, if this works for you. > >> > > >> > Two more things: > >> > 1) In addition to the MEDIPS.createSet() function, there is a > >> > MEDIPS.createROIset() function not described in the manual (see also > >> > ?MEDIPS.createROIset). In case you want to calculate coverage, rms, > >> > and/or > >> > differential coverage for selected regions only, you might want to try > >> > this > >> > function. Here, the coverage is only calculated at the given regions > >> > instead > >> > of genome wide windows. Please note, normalization, CpG calibration > etc. > >> > are > >> > calculated based on the given regions only and I have not yet > >> > investigated a > >> > minimal number of genomic regions so that the applied models remain > >> > reasonable. > >> > > >> > 2) Your annotation file contains genomic regions for chrs 1 to 19 plus > >> > the X > >> > chromosome. Therefore, I assume that this refers to the mouse genome. > I > >> > have > >> > processed some sequencing data (bam files) from mouse mapped against > mm9 > >> > and > >> > specified BSgenome.Mmusculus.UCSC.mm9 in MEDIPS. When specifying the > >> > counts > >> > parameter of the MEDIPS.selectROIs() as described above (summarize = > T), > >> > I > >> > receive only 2602 entries although there are 2611 genomic regions in > >> > your > >> > annotation file (as reported in your error message). See below the > >> > annotation IDs that are not returned by the MEDIPS.selectROIs() > >> > function. I > >> > have checked the first three of them and the genomic coordinates are > >> > outside > >> > of the length of the chromosomes. Your annotations might fit to mm10. > I > >> > recommend to check, if your annotations and your bam files refer to > the > >> > same > >> > genome version. > >> > > >> > All the best, > >> > Lukas > >> > > >> > ### > >> > chr10 130111841 130112788 Olfr823 --> chr10 length: 129993255 (mm9) > >> > chr15 103503298 103530052 Pde1b --> chr15 length: 103494974 (mm9) > >> > chr2 181763332 181827797 Myt1 --> chr2 length: 181748087 (mm9) > >> > Polr3k > >> > Vwa1 > >> > Agrn > >> > Plekhn1 > >> > Frmpd4 > >> > CR536618.1 > >> > > >> > > >> > On Tue, Apr 9, 2013 at 7:33 AM, Stephen Turner <vustephen@gmail.com> > >> > wrote: > >> >> > >> >> Thanks Lukas. That solved that problem. I now have another problem. > >> >> I'm trying to look at regions of interest as before. After I've > called > >> >> MEDIPS.meth(), I then try to run: > >> >> > >> >> dmr_rois <- MEDIPS.selectROIs(results=dmr_all, rois=roi_file, > >> >> columns=c("counts"), summarize=TRUE) > >> >> > >> >> where roi_file is created using read.csv() on this file of ROIs: > >> >> > >> >> > >> >> I'm now getting the error: > >> >> > >> >> Error in data.frame(..., check.names = FALSE) : > >> >> arguments imply differing number of rows: 2602, 0 > >> >> > >> >> I'm guessing this has something to do with my ROI file, but I'm > having > >> >> trouble figuring out what to do. I have 2611 ROIs, not 2602. I've > >> >> double checked that I have no duplicates - these both return 2611: > >> >> > >> >> length(unique(roi_file$ID)) > >> >> length(unique(with(roi_file, paste(chr, start, stop)))) > >> >> > >> >> There may be some overlapping regions, but I'm unsure how to best > >> >> handle this. Could this be causing the error? > >> >> > >> >> Thanks, > >> >> > >> >> Stephen > >> >> > >> >> On Mon, Apr 8, 2013 at 3:58 PM, Lukas Chavez > >> >> <lukas.chavez.mailings@googlemail.com> wrote: > >> >> > > >> >> > Hi Stephen, > >> >> > > >> >> > I was able to reproduce the error. It appears that you have > unmapped > >> >> > reads > >> >> > in your bam file which we always filter out during conversion of > sam > >> >> > to > >> >> > bam > >> >> > files using samtools view -S -F 4 -b -o out.bam in.sam (single end > >> >> > sequening > >> >> > data). In case you filter your bam files accordingly, the error > >> >> > message > >> >> > should disappear. However, I will add an additional flag > >> >> > scanBamFlag(isUnmappedQuery = F) to the according scanBam() > function > >> >> > in > >> >> > dev. > >> >> > Therefore, unmapped reads in a given bam file should be ignored in > >> >> > the > >> >> > future (MEDIPS version >=1.11.1). > >> >> > > >> >> > Lukas > >> > > >> > > > > > > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 938 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6