Hi,

I'm currently working with my MeDIP-seq data using MEDIPS in R. I have 18 IP samples (no Input) in 3 groups with PE50. For mapping, I have used bwa for Illumina with default settings in Galaxy.

**Arguments for creating MEDIPS SET: **

BSgenome <- 'BSgenome.Mmusculus.UCSC.mm9'

uniq <- 1e-3

ws <- 500

chr.select <- c("chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19")

paired = TRUE

However, when using MEDIPS.meth to calculate differential coverage between two groups (Control_MeDIP and LPS_MeDIP), I get the following warning:

> mr.edgeR_Control_LPS <- MEDIPS.meth(MSet1 = Control_MeDIP, MSet2 = LPS_MeDIP, chr = chr.select, p.adj = "BH", diff.method = "ttest", MeDIP = F, CNV = F, minRowSum = 10, diffnorm = "rpkm")

Calculating genomic coordinates...

Creating Granges object for genome wide windows...

Preprocessing MEDIPS SET 1 in MSet1...

Preprocessing MEDIPS SET 2 in MSet1...

Preprocessing MEDIPS SET 3 in MSet1...

Preprocessing MEDIPS SET 4 in MSet1...

Preprocessing MEDIPS SET 5 in MSet1...

Preprocessing MEDIPS SET 6 in MSet1...

Preprocessing MEDIPS SET 1 in MSet2...

Preprocessing MEDIPS SET 2 in MSet2...

Preprocessing MEDIPS SET 3 in MSet2...

Preprocessing MEDIPS SET 4 in MSet2...

Preprocessing MEDIPS SET 5 in MSet2...

Preprocessing MEDIPS SET 6 in MSet2...

Extracting data for chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 ...

4944695 windows on chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19

Differential coverage analysis...

Extracting count windows with at least 10 reads...

Calculating score for 573068 windows...

Adjusting p.values for multiple testing...

Please note, log2 ratios are reported as log2(MSet1/MSet2).

Creating results table...

Adding differential coverage results...

**Warning** messages:

1: In err0^4/(na0 - 1) :

longer object length is not a multiple of shorter object length

2: In err1^4/(na1 - 1) :

longer object length is not a multiple of shorter object length

3: In (err0^2 + err1^2)^2/(err0^4/(na0 - 1) + err1^4/(na1 - 1)) :

longer object length is not a multiple of shorter object length

4: In df[fi] <- (err0^2 + err1^2)^2/(err0^4/(na0 - 1) + err1^4/(na1 - :

number of items to replace is not a multiple of replacement length

If I increase minRowSum to 30 or change diff.method to "edgeR" I get rid of the warning, but since I have six replicates in each MEDIPS sets I would like to use "ttest", and I would like to decrease minRowSum to below 30.

Any suggestions why I get this warning, and how I can use "ttest" without having to increase minRowSum to 30?

Thanks!

Stine

**Output from sessionInfo():**

R version 3.3.1 (2016-06-21)

Platform: x86_64-w64-mingw32/x64 (64-bit)

Running under: Windows >= 8 x64 (build 9200)

locale:

[1] LC_COLLATE=Danish_Denmark.1252 LC_CTYPE=Danish_Denmark.1252 LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C

[5] LC_TIME=Danish_Denmark.1252

attached base packages:

[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:

[1] BSgenome.Mmusculus.UCSC.mm9_1.4.0 MEDIPS_1.23.2 Rsamtools_1.25.1 BiocInstaller_1.23.6

[5] BSgenome_1.41.2 rtracklayer_1.33.11 Biostrings_2.41.4 XVector_0.13.7

[9] GenomicRanges_1.25.93 GenomeInfoDb_1.9.4 IRanges_2.7.12 S4Vectors_0.11.10

[13] BiocGenerics_0.19.2

loaded via a namespace (and not attached):

[1] AnnotationDbi_1.35.4 DNAcopy_1.47.1 edgeR_3.15.2 zlibbioc_1.19.0

[5] GenomicAlignments_1.9.6 BiocParallel_1.7.6 lattice_0.20-33 tools_3.3.1

[9] SummarizedExperiment_1.3.81 grid_3.3.1 Biobase_2.33.0 DBI_0.5

[13] gtools_3.5.0 preprocessCore_1.35.0 Matrix_1.2-6 bitops_1.0-6

[17] biomaRt_2.29.2 RCurl_1.95-4.8 RSQLite_1.0.0 limma_3.29.17

[21] locfit_1.5-9.1 XML_3.98-1.4

**Output from traceback():**

9: q2qnbinom(y, input.mean = input.mean, output.mean = output.mean,

dispersion = dispersion)

8: equalizeLibSizes.default(y, group = group, dispersion = disp,

lib.size = lib.size)

7: equalizeLibSizes(y, group = group, dispersion = disp, lib.size = lib.size)

6: estimateCommonDisp.default(y$counts, group = group, lib.size = lib.size,

tol = tol, rowsum.filter = rowsum.filter, verbose = verbose,

...)

5: estimateCommonDisp(y$counts, group = group, lib.size = lib.size,

tol = tol, rowsum.filter = rowsum.filter, verbose = verbose,

...)

4: estimateCommonDisp.DGEList(d)

3: edgeR::estimateCommonDisp(d)

2: MEDIPS.diffMeth(base = base, values = counts.medip, diff.method = "edgeR",

nMSets1 = nMSets1, nMSets2 = nMSets2, p.adj = p.adj, n.r.M1 = n.r.M1,

n.r.M2 = n.r.M2, MeDIP = MeDIP, minRowSum = minRowSum, diffnorm = diffnorm)

1: MEDIPS.meth(MSet1 = Control_MeDIP, MSet2 = LPS_MeDIP, chr = chr.select,

p.adj = "BH", diff.method = "edgeR", MeDIP = F, CNV = F,

minRowSum = 30, diffnorm = "none")

Dear Lukas,

Thanks for your reply.

I’ve looked in to it in more details in order to provid the needed information.

The warning comes up after the “Adding differential coverage results…”, and I do get a result table, as I describe in detail here:

If I run the following command:

TEST1MEDIPS.meth(MSet1 = Control_MeDIP, MSet2 = LPS_MeDIP, chr = chr.select, p.adj = "BH",

diff.method = "ttest", MeDIP = F, CNV = F,minRowSum = 10, diffnorm = "rpkm")I get the warning, but I also get a result table, and all the columns are there – score.log2.ratio, score.p.value, score.adj.p.value and score.

573,068 windows are tested, and the lowest score.p.value is 3.2025e-06 giving a score.adj.p.value of 0.9640.

If I instead run:

TEST2MEDIPS.meth(MSet1 = Control_MeDIP, MSet2 = LPS_MeDIP, chr = chr.select, p.adj = "BH",

diff.method = "ttest", MeDIP = F, CNV = F,minRowSum = 30, diffnorm = "rpkm")I avoid the warning, and get a result table with the same columns as in TEST1.

Only 4603 windows are tested, and the lowest score.p.value is 0.002602 giving a score.adj.p.value of 0.7023.

If I then change the method for calculating differential coverage to edgeR instead of ttest, it does not give a warning and result table is made with the columns edgeR.logFC, edgeR.logCPM, edgeR.p.value and edgeR.adj.p.value as expected.:

TEST3:MEDIPS.meth(MSet1 = Control_MeDIP, MSet2 = LPS_MeDIP, chr = chr.select, p.adj = "BH",

diff.method = "edgeR", MeDIP = F, CNV = F,minRowSum = 10, diffnorm = "tmm")However, I have a few question to this one:

1) 4,549,704 windows are tested here, out of a total number of 4,944,695 observations (window size = 500 bp, mm9 genome). I thought the number of tested windows were decided from the minRowSum value, but here I have the same number, 10, as in TEST1 where only 573,068 windows were tested?

2) The lowest edgeR.p.value in TEST3 is 1.149812e-07 giving an edgeR.adj.p.value of 0.5231. The second lowest edgeR.p.value is 5.577571e-07, but this gives an edgeR.adj.p.value of 1!? Can this be correct?

As a note, TEST3 takes much longer to finish than both TEST1 and TEST2, warning or not.

Hope the above information can help clarify why I get the warning.

Best regards,

Stine

Dear Lukas,

I checked the p.values and adjusted p values. As expected the p.values are identical within a window, and the adjusted p values are different using the ttest with different minRowSum.

It makes sense with the different numbers of windows then. Maybe I can increase the minRowSum when using edgeR to lower the number of tested windows.

As I can conclude, it seems like I can use the result table generated using ttest despite the warning.

Thanks for your help!

Best regards,

Stine