QC and normalization of 450K data from CSVs instead of IDATs
Simone


I have got some 450K data in CSV format, containing the following information (column names):  

CSV file 1:
SampleN.AVG_Beta, SampleN.Intensity, SampleN.Signal_A, SampleN.Signal_B

CSV file 2:
SampleN.Signal_Red, SampleN.Signal_Grn, SampleN.Pval

CSV file 3:
SampleID, Sample_Well, Sample_plate, Sentrix_ID, Sentrix_Position, [phenotype data...]

I have been able to create a MethyLumiSet from these data. However, to be able to normalize the data with ChAMP or minfi I would need an RGChannelSet and MethylSet for most normalization methods. When trying to convert the MethyLumiSet into an RGChannelSet I get the following error:

> methyl <- methylumiR(filename="data/mydata.csv", sampleDescriptions=annot)
> myrgset <- as(methyl, "RGChannelSet")
Error in methylumiToMinfi(from) :
  Cannot construct an RGChannelSet without full (OOB) intensities

I cannot get hold of out-of-band intensities or the IDATs of these data. I know I could simply perform PBC or standard quantile normalization on beta values. But I was wondering if there was a way of taking advantage of the fact that I have got more information than just beta values (i.e. red and green channel intensities, etc), for quality assessment and normalization procedures, even though I do not have OOB intensities. Is there a way of doing so in minfi or ChAMP, or do you have any other suggestions about how to deal with this?

Best wishes,

450K CSV normalization minfi ChAMP
Yuan Tian

Hello Simone:

I am not sure where you got these three CSV files, but ChAMP does NOT need a MethyLumiSet object; I am sure about that. ChAMP functions accept a plain beta matrix plus the phenotype data, and that is all. So I suspect you can work on your CSV files directly: extract the beta matrix, the intensity matrix, etc. from them, then use ChAMP for normalization and analysis. I guess this information is in your CSV file 1.
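A minimal sketch of that extraction in base R, assuming the `SampleN.<field>` column naming described in the question. The toy data frame, the `TargetID` probe ID column, and the helper `extract_field` are hypothetical stand-ins; with real data the frame would come from `read.csv()` and the ID column name would need checking.

```r
# Toy stand-in for CSV file 1; with real data this would come from read.csv().
# The probe ID column name ("TargetID") is an assumption.
csv1 <- data.frame(
  TargetID         = c("cg00000029", "cg00000108", "cg00000109"),
  Sample1.AVG_Beta = c(0.80, 0.10, 0.55),
  Sample1.Signal_A = c(1000, 9000, 4000),
  Sample1.Signal_B = c(4400, 1011, 5011),
  Sample2.AVG_Beta = c(0.75, 0.20, 0.50),
  Sample2.Signal_A = c(1200, 8000, 4950),
  Sample2.Signal_B = c(3900, 2025, 5050),
  check.names = FALSE
)

# Hypothetical helper: collect every "<sample>.<field>" column into one
# probes x samples matrix, with probe IDs as row names and sample names
# (the part before ".<field>") as column names.
extract_field <- function(df, field, id_col = "TargetID") {
  suffix <- paste0(".", field)
  cols <- names(df)[endsWith(names(df), suffix)]
  m <- as.matrix(df[, cols, drop = FALSE])
  rownames(m) <- df[[id_col]]
  colnames(m) <- substr(cols, 1, nchar(cols) - nchar(suffix))
  m
}

mybeta   <- extract_field(csv1, "AVG_Beta")
signal_a <- extract_field(csv1, "Signal_A")
signal_b <- extract_field(csv1, "Signal_B")
```

The resulting `mybeta` is a plain numeric matrix of the kind that can be passed as the `beta` argument to ChAMP functions.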

Maybe you could paste a couple of example rows from your CSV files here, so that more people can help you.


Yuan Tian


Thanks for your reply, Yuan. I know that ChAMP does not need a MethyLumiSet and that I can run

norm_pbc <- champ.norm(beta=mybeta, method="PBC", arraytype="450K")

on just the beta values. The same for BMIQ.

But Functional Normalization would require an rgSet, and SWAN normalization both an rgSet and a MethylSet.

Importantly, I was also thinking that it might make sense to make use of the additional data I have got (Signal_A, Signal_B, Signal_Grn, Signal_Red, ...) for performing some QC (as far as possible without having the IDATs?).
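Some of that QC can be sketched in base R without IDATs. The example below assumes probes x samples matrices have already been extracted from the CSV files, that `Pval` is a detection p-value, and that `Signal_A` is the unmethylated and `Signal_B` the methylated signal, as is usual for GenomeStudio exports. These are assumptions to verify against the actual export. The toy values, the 0.01 threshold, and the offset of 100 in the beta formula are illustrative choices, not values taken from these files.

```r
# Toy probes x samples matrices standing in for values read from the CSV files.
probes  <- c("cg00000029", "cg00000108", "cg00000109", "cg00000165")
samples <- c("Sample1", "Sample2")
dn <- list(probes, samples)
signal_a <- matrix(c(1000, 9000, 4000, 300, 1200, 8000, 4950, 150),
                   nrow = 4, dimnames = dn)  # assumed unmethylated
signal_b <- matrix(c(4400, 1011, 5011, 200, 3900, 2025, 5050, 100),
                   nrow = 4, dimnames = dn)  # assumed methylated
pval     <- matrix(c(0, 0, 0.001, 0.20, 0, 0.004, 0, 0.35),
                   nrow = 4, dimnames = dn)  # assumed detection p-values

# 1. Fraction of probes per sample that fail the detection threshold.
failed <- pval > 0.01
frac_failed <- colMeans(failed)

# 2. Median log2 intensity per sample; samples with unusually low medians
#    compared with the rest of the batch deserve a closer look.
med_unmeth <- apply(log2(signal_a), 2, median)
med_meth   <- apply(log2(signal_b), 2, median)

# 3. Recompute beta from the signals and mask unreliable measurements
#    before any normalization on beta values.
beta <- signal_b / (signal_a + signal_b + 100)
beta[failed] <- NA

qc <- data.frame(frac_failed, med_unmeth, med_meth)
print(qc)
```

Comparing the recomputed `beta` with the exported `AVG_Beta` column is also a quick way to check which of `Signal_A` and `Signal_B` is the methylated signal.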

I had already described the full information content of the CSV files in my original post. I think these files were generated using GenomeStudio, but I do not know for sure. Which additional information would you require? Please let me know.

Hmm, I see. Yes, Functional Normalization currently requires an S4 object produced by minfi's reading functions. I was thinking of the BMIQ solution. I cannot see an easy solution right now. Maybe some hacking of minfi's code would work. Your data look very similar to an intermediate stage between the IDAT files and an RGChannelSet.

Best,
Yuan Tian
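One possible route that sidesteps the RGChannelSet, offered as a sketch under assumptions rather than a confirmed solution: minfi exports a `MethylSet()` constructor that takes methylated and unmethylated intensity matrices directly. The example assumes `Signal_B` is the methylated and `Signal_A` the unmethylated signal, uses toy matrices in place of the real data, and only runs the minfi part if the package is installed. Methods that rely on control probes or out-of-band intensities, such as Functional Normalization, would still be out of reach.

```r
# Toy probes x samples matrices standing in for the Signal_B / Signal_A columns.
probes  <- c("cg00000029", "cg00000108", "cg00000109")
samples <- c("Sample1", "Sample2")
dn <- list(probes, samples)
meth   <- matrix(c(4400, 1011, 5011, 3900, 2025, 5050),
                 nrow = 3, dimnames = dn)  # Signal_B, assumed methylated
unmeth <- matrix(c(1000, 9000, 4000, 1200, 8000, 4950),
                 nrow = 3, dimnames = dn)  # Signal_A, assumed unmethylated

mset <- NULL
if (requireNamespace("minfi", quietly = TRUE)) {
  # Build a MethylSet directly from the two intensity matrices.
  mset <- minfi::MethylSet(
    Meth = meth,
    Unmeth = unmeth,
    annotation = c(array = "IlluminaHumanMethylation450k",
                   annotation = "ilmn12.hg19")
  )
  # Median log2 methylated / unmethylated intensity per sample.
  print(minfi::getQC(mset))
}
```

If this works on the real data, `minfi::preprocessQuantile()` is documented to accept a MethylSet as well as an RGChannelSet, although it additionally needs the matching annotation package to map probes to the genome.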
