I ran following sets of commands for getting differential methylation profiles & CNVs :
library(MEDIPS)
library(BSgenome.Mmusculus.UCSC.mm9)
BSgenome="BSgenome.Mmusculus.UCSC.mm9"
uniq=1e-3
extend=300
shift=0
ws=100
WT_Input = MEDIPS.createSet(file = "/Medip-seq_project/Alignment_BAMs_and_BEDs/3-WT-Input_S3.bam", BSgenome = BSgenome, extend = extend, shift = shift, uniq = uniq, window_size = ws, chr.select=NULL, paired = T)
WT_5mC = MEDIPS.createSet(file = "/Medip-seq_project/Alignment_BAMs_and_BEDs/1-WT-5mC_S1.bam", BSgenome = BSgenome, extend = extend, shift = shift, uniq = uniq, window_size = ws, chr.select=NULL, paired = T)
LshKO_Input = MEDIPS.createSet(file = "/Medip-seq_project/Alignment_BAMs_and_BEDs/4-LshKO-Input_S4.bam", BSgenome = BSgenome, extend = extend, shift = shift, uniq = uniq, window_size = ws, chr.select=NULL, paired = T)
LshKO_5mC = MEDIPS.createSet(file = "/Medip-seq_project/Alignment_BAMs_and_BEDs/2-LshKO-5mC_S2.bam", BSgenome = BSgenome, extend = extend, shift = shift, uniq = uniq, window_size = ws, chr.select=NULL, paired = T) #chr.select=NULL doesn't affect further steps.
CS1 = MEDIPS.couplingVector(pattern = "CG", refObj = WT_Input)
mr.edgeR = MEDIPS.meth(MSet1 = WT_5mC, MSet2 = LshKO_5mC, CSet = CS1, ISet1 = WT_Input , ISet2 = LshKO_Input, p.adj = "bonferroni", diff.method = "edgeR", MeDIP = T, CNV = T, minRowSum = 10)
I got this error :
Creating results table...
Adding differential coverage results...
Adding CNV results...
Error in cnv.combined[i, 1]:cnv.combined[i, 2] : NA/NaN argument
In addition: Warning messages:
1: In MEDIPS.rms(MSet1[[i]], CSet, ccObj = ccObj) : NaNs produced
2: In MEDIPS.rms(MSet2[[i]], CSet, ccObj = ccObj) : NaNs produced
What does this error means? If I run function MEDIPS.meth with "CNV = F" jsut to get differntial methylation profiles then it's coming fine. But how to get CNVs?
Please help me out. Thank you :)
Hi Matthias,
Thanks a lot. It worked the way you told me. Sub-setting the chromosome set using -> chr.select=paste0("chr",c(1:19,"X",Y")) was the thing that was needed. Now MEDIPS.meth is also working with 'CNV = T'.
Can MEDIPS be used to get CNVs for one MSet say MSet1 ? :
mr.edgeR_LshKO = MEDIPS.meth(MSet1 = LshKO_5mC, CSet = CS1, ISet1 = LshKO_Input, p.adj = "bonferroni", diff.method = "edgeR", MeDIP = T, CNV = T, minRowSum = 10)
Running this above commands throws error : Cannot perform CNV analysis- please specify two groups of INPUT SETs!
Probably I'm not getting how CNVs' log2 ratios are calculated.
Regards,
Anupriya
Hi Anupriya,
within MEDIPS, the calculation of CNV works by comparing the local coverage of the input sample(s) that contain CNVs (ISet2) to the control input sample(s) (ISet1). Therefore it is required to specify both sets.
However, probably the read coverage is not sufficient to run the CNV detection on 100 base windows. Typical values for the CNV analysis would be 100000 to 1000000 base windows. In MEDIPS, this would be done by initially specifying CNV=FALSE, and than use MEDIPS.addCNV().
I again want to encourage you to have a look at the QSEA package, which is has a much more integrated and convenient way of inferring, using, and depicting CNV information.
Hi Matthias,
I've one more doubt. After getting annotations with MEDIPS.getAnnotation function I'm getting some annotations that are having only Ensembl enzyme/protein/transcript ID (no geneIDs) :
chr start stop chr.1 start.1 stop.1 CF X1.WT.5mC_S1.bam.counts X2.LshKO.5mC_S2.bam.counts X1.WT.5mC_S1.bam.rpkm X2.LshKO.5mC_S2.bam.rpkm X1.WT.5mC_S1.bam.rms X2.LshKO.5mC_S2.bam.rms X3.WT.Input_S3.bam.counts X4.LshKO.Input_S4.bam.counts X3.WT.Input_S3.bam.rpkm X4.LshKO.Input_S4.bam.rpkm MSets1.counts.mean MSets1.rpkm.mean MSets1.rms.mean MSets2.counts.mean MSets2.rpkm.mean MSets2.rms.mean edgeR.logFC edgeR.logCPM edgeR.p.value edgeR.adj.p.value CNV.log2.ratio DMR_ID 1_id 1_id 2_id 1_id 2_id 3_id 4_id 5_id 6_id
chrX 130684601 130684700 chrX 130684601 130684700 1 146 194 177.27 317.53 0.55 0.62 38 32 5.85 9.77 146 177.27 0.55 194 317.53 0.62 -0.7990199711 5.3553704235 7.23115947184283E-07 0.0214613582 -0.0736 ID_90 NA NA NA ENSMUSE00000546399 NA NA NA NA NA
chrX 130684701 130684800 chrX 130684701 130684800 2 147 196 178.48 320.8 0.55 0.61 39 31 6 9.47 147 178.48 0.55 196 320.8 0.61 -0.8039704999 5.3682293686 4.81168180090305E-07 0.0142805904 -0.0736 ID_90 NA NA NA ENSMUSE00000546399 NA NA NA NA NA
chr9 3033801 3033900 chr9 3033801 3033900 2 3094 1929 3756.66 3157.3 0.85 0.83 9978 5311 1535.81 1622.26 3094 3756.66 0.85 1929 3157.3 0.83 0.2920778945 9.1580107642 3.16711714802772E-10 9.39968698363148E-06 -0.0316 ID_49 NA TSS_ENSMUST00000170073 NA NA NA NA NA NA NA
Commands used:
anno.mart.gene = MEDIPS.getAnnotation(host="may2012.archive.ensembl.org",dataset=c("mmusculus_gene_ensembl")[1],annotation=c("TSS","EXON","GENE"),tssSz=c(-1000,500),chr=chr.select1) #for mm9 annotations.
roi_new = MEDIPS.setAnnotation(regions = rois, annotation = anno.mart.gene)
How is it possible not to get geneIDs associated with transcripts/protein IDs?
As mentioned by you, I'll be trying QSEA package after I'm done with MEDIPS.
Thank you,
Anupriya
Unfortunately, MEDIPS.getAnnotation does not fetch gene names, and the function cannot be configured, such that it does fetch them. You could either modify the function, or match the ids and change the resulting output tables manually.