Question

Normalization of IDAT files from Illumina HumanMethylation450 BeadChip without .bgx file

0

Sarah • 0

@345ed09e

Last seen 14 months ago

India

I am working on an Illumina HumanMethylation450 BeadChip dataset (GEO accession GSE86829) and also have to perform a similar normalization on other Illumina HumanMethylation 850k data. Although normalised files are available, the project requires me to normalise from scratch so that all datasets are processed uniformly. I have found R code for normalizing IDAT files, but it also requires a .bgx manifest file. The code is posted below.

library(limma)
x <- read.idat(idatfiles, bgxfile)
y <- neqc(x)

My problem is that this dataset, like many others I am working on, does not have a .bgx file; instead, the manifest file is provided as .bpm or .csv along with the GSE86829_RAW.tar file. I am new to processing microarray data and to R libraries. Please help with code that normalizes all the IDAT files for every sample in a directory and writes the output as a single .txt file with one sample per column. Basically, I need an equivalent of what ReadAffy does for CEL files: something similar for IDAT files that does not need a .bgx file and works with a .txt/.csv manifest file.

library(affy)
celpath = "/mnt/store_room/Dataset1/processing/validation/GSE65663_RAW"
data = ReadAffy(celfile.path=celpath)

Alternatively, please suggest a different tool that can process IDAT files easily without much coding.

IlluminaHumanMethylation450BeadChip illuminaio normalize450K • 2.3k views

ADD COMMENT • link 2.1 years ago • updated 2.0 years ago Sarah • 0

score 3 · Answer 1 · 2022-04-15

3

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 8 hours ago

United States

It's easy with minfi and GEOquery:

> library(GEOquery)
> library(minfi)
> getGEOSuppFiles("GSE86829")
> setwd("GSE86829/")
## minfi won't use gzipped files
> sapply(dir(".", "gz$"), gunzip)
## at this point you could generate a targets file, but that might be more work than it is worth. You can just read the files in directly, however.
> z <- read.metharray.exp(".")
Warning message:
In readChar(con, nchars = n) : truncating string with embedded nuls
> z
class: RGChannelSet 
dim: 622399 15 
metadata(0):
assays(2): Green Red
rownames(622399): 10600313 10600322 ... 74810490 74810492
rowData names(0):
colnames(15): GSM2309154_6264509024_R01C02 GSM2309155_6264509024_R02C02
  ... GSM2309167_200190110117_R01C02 GSM2309168_200190110117_R03C02
colData names(0):
Annotation
  array: IlluminaHumanMethylation450k
  annotation: ilmn12.hg19

## at this point you could add the colData to describe the experiment (probably useful)
> colData(z) <- DataFrame(Sample = rep(c("LNCaP","PrEC","CAF","NAF","Guthrie"), c(2,2,3,3,5)))
> z
class: RGChannelSet 
dim: 622399 15 
metadata(0):
assays(2): Green Red
rownames(622399): 10600313 10600322 ... 74810490 74810492
rowData names(0):
colnames: NULL
colData names(1): Sample
Annotation
  array: IlluminaHumanMethylation450k
  annotation: ilmn12.hg19
> colData(z)
DataFrame with 15 rows and 1 column
         Sample
    <character>
1         LNCaP
2         LNCaP
3          PrEC
4          PrEC
5           CAF
...         ...
11      Guthrie
12      Guthrie
13      Guthrie
14      Guthrie
15      Guthrie
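
Note that in the output above the sample names appear to be dropped (colnames: NULL) once colData is replaced wholesale. A minimal sketch of one way to avoid this, assuming z is the RGChannelSet read in above: add the annotation as a single column instead, which leaves the existing sample names in place. The variable name groups is only illustrative, and the assignment is skipped if no object called z exists in the session.

```r
# Group labels in the same order and counts as used above
groups <- rep(c("LNCaP", "PrEC", "CAF", "NAF", "Guthrie"), c(2, 2, 3, 3, 5))

# Assumption: `z` is the RGChannelSet returned by read.metharray.exp(".")
if (exists("z")) {
  stopifnot(length(groups) == ncol(z))
  z$Sample <- groups  # adds one colData column; colnames(z) are left untouched
}
```

After this, colnames(z) should still list the GSM sample identifiers, which is what you want as column headers in any later export.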

At this point you are at step 2.3 of this workflow, which is a useful explainer for the remaining steps.

ADD COMMENT • link 2.1 years ago James W. MacDonald 65k

0

Following the workflow link you provided, I tried to proceed through its steps:

> rgSet <- read.metharray.exp(".")
Warning message:
In readChar(con, nchars = n) : truncating string with embedded nuls
> rgSet
class: RGChannelSet 
dim: 622399 15 
metadata(0):
assays(2): Green Red
rownames(622399): 10600313 10600322 ... 74810490 74810492
rowData names(0):
colnames(15): GSM2309154_6264509024_R01C02 GSM2309155_6264509024_R02C02
  ... GSM2309167_200190110117_R01C02 GSM2309168_200190110117_R03C02
colData names(0):
Annotation
  array: IlluminaHumanMethylation450k
  annotation: ilmn12.hg19
> mSetSq <- preprocessQuantile(rgSet)
[preprocessQuantile] Mapping to genome.
[preprocessQuantile] Fixing outliers.
[preprocessQuantile] Quantile normalizing.

It seems to have worked up to this point. However, I need the normalised values in a matrix with sample names as columns and probe IDs as rows, so I tried the code below, but it throws the following error:

> write.matrix(mSetSq, file = "/mnt/store_room/DatasetM/GSE86829_RAW", sep = "/t")
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'as.matrix': no method for coercing this S4 class to a vector

Similarly with write.table:

> write.table(mSetSq, file='/mnt/store_room/DatasetM')
Error in as.vector(x) : no method for coercing this S4 class to a vector

ADD REPLY • link 2.0 years ago Sarah • 0

1

I believe the write.matrix function is from the MASS package, and is intended to write a matrix or data.frame to a file. But that's not what you have, so it is not unexpected that it should fail. I would also note that having just the normalized values (which normalized values, btw? There are two sets!) in a matrix is almost surely not what you want. Part of the analysis of methylation data requires you to know where each CpG is located, and the GenomicRatioSet that is generated by preprocessQuantile contains that information. If you export the 'normalized values' to a file, you lose that information.
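
If a flat file is still needed, here is a minimal sketch of one way to get it (this is an illustration, not the workflow's own code). The two sets of values are presumably the beta values and the M-values, which minfi exposes through getBeta() and getM(). The helper name write_matrix_tsv and the output file names are hypothetical, and the minfi part only runs if an object called mSetSq already exists in the session. The probe coordinates are written to a separate file so that the CpG locations are not lost.

```r
# Hypothetical helper: write a numeric matrix (probes in rows, samples in
# columns) to a tab-delimited text file, keeping row and column names.
write_matrix_tsv <- function(m, file) {
  stopifnot(is.matrix(m))
  # col.names = NA writes an empty header cell above the row-name column
  write.table(m, file = file, sep = "\t", quote = FALSE, col.names = NA)
  invisible(file)
}

# Assumption: `mSetSq` is the GenomicRatioSet returned by preprocessQuantile().
if (exists("mSetSq")) {
  beta <- minfi::getBeta(mSetSq)  # methylation proportions between 0 and 1
  mval <- minfi::getM(mSetSq)     # M-values (logit2 of beta)
  write_matrix_tsv(beta, "GSE86829_beta.txt")
  write_matrix_tsv(mval, "GSE86829_Mvalues.txt")

  # Probe coordinates, so the CpG locations can be matched back by probe ID
  coords <- as.data.frame(GenomicRanges::granges(mSetSq))
  write.table(coords, "GSE86829_probe_coordinates.txt",
              sep = "\t", quote = FALSE, col.names = NA)
}
```

Note that the separator must be written as "\t" (backslash), not "/t". The exported text files are convenient for inspection, but the GenomicRatioSet itself remains the better input for downstream Bioconductor analysis.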

The minfi package is intended to generate objects that are then useful for analysis using other packages, and the workflow I pointed you to gives a very good explanation of how they all fit together. Is there some reason that the workflow is not useful for your purposes?

ADD REPLY • link 2.0 years ago James W. MacDonald 65k

0

ADD REPLY • link 2.0 years ago Sarah • 0

score 1 · Answer 2 · 2022-04-14

1

Gordon Smyth 50k

@gordon-smyth

Last seen 32 minutes ago

WEHI, Melbourne, Australia

The limma code is for Illumina expression arrays, which use bgx files. You are wanting to read Illumina methylation arrays, which use bpm or csv annotation files instead of bgx.

I suggest that you investigate Bioconductor packages designed for methylation arrays, especially the minfi or methylationArrayAnalysis packages.

ADD COMMENT • link 2.1 years ago Gordon Smyth 50k