Question

Minfi returning incorrect beta values

0

Entering edit mode

ezrabekele17 • 0

@ezrabekele17-13225

Last seen 7.7 years ago

I'm trying to link GEOquery and minfi.

Specifically I want to obtain beta values from the idat files on GEOquery. I was following this guide: https://kasperdanielhansen.github.io/genbioconductor/html/minfi.html, up until the preprocessing part. Meaning that I was able to obtain the RGset. Then I used my own preprocessing code to obtain the beta values. However, I cross checked them with the beta values that were on GEO and they weren't consistent.

For example, the accession number I used was GSE68777. So I went to that study on GEO and clicked on the first sample: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1681154. Then I scrolled down and clicked "Download full table" to download the samples and beta values in a text file.

Then I typed head(beta) and chose the first sample. Then I did command F in the text file for that sample and it's value there wasn't the same as the value from the beta data table. Hopefully you can help find the error.

Here is the code I'm using:

library(GEOquery)
library(minfi)
library("IlluminaHumanMethylation450kanno.ilmn12.hg19")
library("IlluminaHumanMethylation450kmanifest")

#Code copied from 450k Guide

getGEOSuppFiles("GSE68777")
untar("GSE68777/GSE68777_RAW.tar", exdir = "GSE68777/idat")
head(list.files("GSE68777/idat", pattern = "idat"))
idatFiles <- list.files("GSE68777/idat", pattern = "idat.gz$", full = TRUE)
sapply(idatFiles, gunzip, overwrite = TRUE)
rgSet <- read.metharray.exp("GSE68777/idat")
geoMat <- getGEO("GSE68777")
pD.all <- pData(geoMat[[1]])
pD <- pD.all[, c("title", "geo_accession", "characteristics_ch1.1", "characteristics_ch1.2")]
names(pD)[c(3,4)] <- c("group", "sex")
pD$group <- sub("^diagnosis: ", "", pD$group)
pD$sex <- sub("^Sex: ", "", pD$sex)
sampleNames(rgSet) <- sub(".*_5", "5", sampleNames(rgSet))
rownames(pD) <- pD$title
pD <- pD[sampleNames(rgSet),]
pData(rgSet) <- pD

minfi illumina illumina450k methylation R • 1.7k views

ADD COMMENT • link 7.7 years ago ezrabekele17 • 0

score 1 · Answer 1 · 2017-07-11

1

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 13 hours ago

United States

The submission in question was uploaded in 2015, so the authors likely used a version of minfi from two or more years ago. You, on the other hand are using (presumably) a more current version. In addition, the submitters say that 'they used minfi' to analyze their data, without saying exactly what that means.

It's not inconceivable that the minfi package has changed in the intervening period (it has, in fact), and it is also not inconceivable that what you have done is not exactly what the original authors did. So it's not really surprising that you get different values, although I would be surprised if they were much different.

ADD COMMENT • link 7.7 years ago James W. MacDonald 68k

0

Entering edit mode

Do you think the beta values I'm getting now are more accurate and that I should just continue using the code of the newer minfi version?

ADD REPLY • link 7.7 years ago ezrabekele17 • 0

0

Entering edit mode

I have no idea. In this context, accurate means 'more closely represents the true underlying methylation that is being estimated'. Without knowing what the true underlying methylation is, how would anybody know if the new values are better than the old?

That said, some of the changes to minfi are intended to better control for known technical artifacts, and are intended to be improvements over previous code. So I think it would be a reasonable assumption that newer versions of minfi are better than older versions for various reasons, and would produce better results. But that is a non-specific 'better', as I don't know what exactly the submitters did to produce their data, and I haven't perused your code, so I can't say something like 'Well you used <some improved method goes here> and that should make your estimates better because of <some rationale that the improved method is better>.'

As the person analyzing the data, it's actually your job to know these things and have a reason for doing whatever it is that you are doing. Relying on some dude on a support site is not really a good strategy, as in an ideal world you would be able to tell your boss/collaborator/whatever what you did and why.

ADD REPLY • link 7.7 years ago James W. MacDonald 68k