Control probe QC plots for Infinium 450k chips (lumi, minfi...)
1
0
Entering edit mode
Jon Manning ▴ 90
@jon-manning-5708
Last seen 7.7 years ago
Hi Bioconductors, I'm trying to find out if there are good existing ways in Bioconductor of plotting out the QC data from Illumina Methylation 450k chips. I want to plot the control probe intensities separately for green and red channels, label the individual probe types properly (such as 'DNP (High)', 'Biotin (Med)' for the staining controls), and ideally indicate expected (qualitative) intensities. I've actually done this myself, but want to know if I've missed the BioC way. The detail: I have 'sample methylation profile', 'control probe profile' etc files (not the IDAT files). The way Lumi and friends read the controlData (retrievable from a controlData slot in the methyLumiM object), probe detail seems to be lost, such that for the staining controls ('DNP (High)', 'Biotin (Med)') I just get identifiers like STAINING, STAINING.1. The results of methylumi's qcplot() function are then supremely un-useful. Likewise Minfi's QC plotting functions seem from the docs not to label individual control types (though I don't have the IDAT data so Minfi isn't very useful to me anyway). I needed to finish my work, so I brewed my own method using ggplot() and cross-referencing the control data and annotation from the manifest (outside of Lumi et al), but I feel this is something others will have done and that I've therefore missed something. Any pointers? Many thanks, Jon Manning -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Annotation probe lumi minfi Annotation probe lumi minfi • 2.1k views
ADD COMMENT
0
Entering edit mode
Tim Triche ★ 4.2k
@tim-triche-3561
Last seen 3.6 years ago
United States
heh. yeah I wrote something like this a long time ago for methylumi... not sure if there's an issue with the Final Report Format parser or what, but normally the control probe names are preserved. Anyways, here's the fugly old code, which I might as well update in methylumi in SVN. If a type of probe isn't found, the plot will try and search by name for the specified controls. If it finds those, it will break them out by name (the assumption here is that the user knows what they are doing if they ask for e.g. bisulfite conversion probes). You might also look into shinyMethyl, which is another way of approaching this. There are coercions within methylumi that will take a MethyLumiSet or a MethyLumiM object and turn it into a MethylSet or a RGChannelSet, so presumably it would be possible to use shinyMethyl (which builds on minfi) for QC. I happen to think that minfi has a nice clean codebase and makes it relatively easy to build upon, but that's just me. Fugly old code follows... library(methylumi) PBLs <- readRDS('PBLs.rds') ## a MethyLumiSet I had lying around ## caveat: I have always read in data from IDATs, so I didn't test the parser here ## examples: ## qc.probe.plot(PBLs, controltype='DNP') ## finds the probes by name ## qc.probe.plot(PBLs, controltype='bisulfite') ## finds the probes by type ## qc.probe.plot(PBLs, by.type=T, controltype='bisulfite') ## splits the probes by type ## avert your eyes, this code predates my learning how to be hygienic in R... qc.probe.plot <- function (obj, controltype="negnorm", log2=T, by.type=F, ...) { require("ggplot2") require("reshape2") require("scales") by.probe <- FALSE log2_trans <- log_trans(base = 2) if (class(obj) %in% c("MethyLumiSet", "MethyLumiM")) { qc <- controlData(obj) if (!identical(sampleNames(qc), sampleNames(obj))) { sampleNames(qc) <- sampleNames(obj) } } else if (class(obj) == "MethyLumiOOB" & tolower(controltype) == "oob") { qc <- obj } else if (class(obj) == "MethyLumiQC") { qc <- obj } else { stop("Don't know how to QC this data you've given me...") } if (tolower(controltype) == "negnorm" || missing(controltype)) { rows <- grep("(Negative|Norm)", fData(qc)$Type, ignore.case = T) } else { rows <- grep(paste("^", controltype, sep = ""), fData(qc)$Type, ignore.case = T) if(length(rows) < 1) { ## try looking by name instead rows <- grep(paste("^", controltype, sep = ""), featureNames(qc), ignore.case = T) by.probe <- TRUE } } if (tolower(controltype) == "oob") { dat <- intensities.OOB.allelic(obj) rownames(dat$Cy5$M) <- paste(rownames(dat$Cy5$M), "M", sep = "_") rownames(dat$Cy5$U) <- paste(rownames(dat$Cy5$U), "U", sep = "_") rownames(dat$Cy3$M) <- paste(rownames(dat$Cy3$M), "M", sep = "_") rownames(dat$Cy3$U) <- paste(rownames(dat$Cy3$U), "U", sep = "_") dat$Cy5 <- rbind(dat$Cy5$M, dat$Cy5$U) dat$Cy3 <- rbind(dat$Cy3$M, dat$Cy3$U) names(dat) = gsub("Cy3", "green", gsub("Cy5", "red", names(dat))) dat <- lapply(dat, function(d) { datum = data.frame(d) colnames(datum) = gsub("^X", "", colnames(datum)) m.probes = grepl("_M$", rownames(d)) datum$probe = as.factor(gsub("_(M|U)$", "", rownames(datum))) datum$type = "unmethylated" datum$type[which(m.probes)] = "methylated" datum$type = as.factor(datum$type) return(datum) }) dat$green$channel = "Cy5" dat$red$channel = "Cy3" dat.frame <- rbind(dat$red, dat$green) dat.frame$channel <- as.factor(dat.frame$channel) a.title <- "Out-of-band probe intensity plot" } else { probes <- featureNames(qc)[rows] type <- as.factor(fData(qc)$Type[rows]) if (tolower(controltype) == "negnorm") { if ("NORM_C" %in% unique(fData(qc)$Type)) { colour.settings <- c(NEGATIVE = "darkgray", NORM_A = "red", NORM_T = "darkred", NORM_C = "green", NORM_G = "darkgreen") type = factor(type, levels = names(colour.settings)) shape.settings <- c(20, 20, 20, 20, 20) } else { colour.settings <- c("darkgray", "darkgreen", "red") shape.settings <- c(20, 20, 20) } } dat <- c(red = "unmethylated", green = "methylated") Cy5 <- as.data.frame(assayDataElement(qc, dat[1])[rows, ]) Cy5$channel <- "Cy5 (Red)" Cy5$probe <- as.factor(probes) Cy3 <- as.data.frame(assayDataElement(qc, dat[2])[rows, ]) Cy3$channel <- "Cy3 (Green)" Cy3$probe <- as.factor(probes) Cy3$type <- Cy5$type <- type dat.frame <- rbind(Cy5, Cy3) dat.frame$channel <- as.factor(dat.frame$channel) a.title <- paste(controltype, "control probe plot") if (tolower(controltype) == "negnorm") { a.title <- "Negative & normalization control probes" } } more.args = list(...) if ("extra" %in% names(more.args)) a.title = paste(a.title, more.args[["extra"]]) qc <- melt(dat.frame, id = c("probe", "channel", "type")) geometry <- ifelse(tolower(controltype) == "negnorm", ifelse(tolower(controltype) == "oob", "boxplot", "jitter"), "point") if (tolower(controltype) == "oob") { qc$grouping = paste(qc$variable, qc$type, sep = ".") p <- ggplot2::qplot(data = qc, colour = type, fill = type, group = grouping, x = variable, y = value, geom = "boxplot", main = a.title, xlab = "Sample", ylab = "Intensities") + coord_flip() if (log2) p <- p + scale_x_continuous(trans = "log2", limits = c(2^4, 2^16)) else p <- p + scale_x_continuous(limits = c(2^4, 2^16)) } else { if(by.probe) { p <- ggplot2::qplot(data = qc, colour = probe, shape = probe, x = value, y = variable, geom = geometry, main = a.title, ylab = "Sample", xlab = "Intensities") } else { p <- ggplot2::qplot(data = qc, colour = type, shape = type, x = value, y = variable, geom = geometry, main = a.title, ylab = "Sample", xlab = "Intensities") } p <- p + scale_y_discrete(limits = rev(sampleNames(obj))) if (log2) p <- p + scale_x_continuous(trans="log2", limits=c(2^4, 2^16)) else p <- p + scale_x_continuous(limits = c(2^4, 2^16)) } if (by.type) { p <- p + facet_grid(type ~ channel) } else if(by.probe) { p <- p + facet_grid(. ~ channel) } else { p <- p + facet_grid(. ~ channel) } if (tolower(controltype) == "negnorm") { p <- p + scale_colour_manual(values = colour.settings) p <- p + scale_shape_manual(values = shape.settings) } p <- p + theme_bw() return(p) } On Fri, Aug 23, 2013 at 4:05 AM, Jon Manning <jonathan.manning@ed.ac.uk>wrote: > Hi Bioconductors, > > I'm trying to find out if there are good existing ways in Bioconductor of > plotting out the QC data from Illumina Methylation 450k chips. I want to > plot the control probe intensities separately for green and red channels, > label the individual probe types properly (such as 'DNP (High)', 'Biotin > (Med)' for the staining controls), and ideally indicate expected > (qualitative) intensities. I've actually done this myself, but want to know > if I've missed the BioC way. > > The detail: > > I have 'sample methylation profile', 'control probe profile' etc files > (not the IDAT files). The way Lumi and friends read the controlData > (retrievable from a controlData slot in the methyLumiM object), probe > detail seems to be lost, such that for the staining controls ('DNP (High)', > 'Biotin (Med)') I just get identifiers like STAINING, STAINING.1. The > results of methylumi's qcplot() function are then supremely un- useful. > Likewise Minfi's QC plotting functions seem from the docs not to label > individual control types (though I don't have the IDAT data so Minfi isn't > very useful to me anyway). > > I needed to finish my work, so I brewed my own method using ggplot() and > cross-referencing the control data and annotation from the manifest > (outside of Lumi et al), but I feel this is something others will have done > and that I've therefore missed something. Any pointers? > > Many thanks, > > Jon Manning > > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > ______________________________**_________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.et="" hz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: http://news.gmane.org/gmane.** > science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor=""> > -- *He that would live in peace and at ease, * *Must not speak all he knows, nor judge all he sees.* * * Benjamin Franklin, Poor Richard's Almanack<http: archive.org="" details="" poorrichardsalma00franrich=""> [[alternative HTML version deleted]]
ADD COMMENT

Login before adding your answer.

Traffic: 856 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6