Question: QC and normalization of 450K data from CSVs instead of IDATs
9 months ago by
Simone170
Simone170 wrote:

Hello,

I have got some 450K data in CSV format, containing the following information (column names):

CSV file 1:
SampleN.AVG_Beta, SampleN.Intensity, SampleN.Signal_A, SampleN.Signal_B

CSV file 2:
SampleN.Signal_Red, SampleN.Signal_Grn, SampleN.Pval

CSV file 3:
SampleID, Sample_Well, Sample_plate, Sentrix_ID, Sentrix_Position, [phenotype data...]

I have been able to create a MethyLumiSet from these data. However, to be able to normalize the data with ChAMP or minfi I would need an RGChannelSet and MethylSet for most normalization methods. When trying to convert the MethyLumiSet into an RGChannelSet I get the following error:

> methyl <- methylumiR(filename="data/mydata.csv", sampleDescriptions=annot)
> myrgset <- as(methyl, "RGChannelSet")
Error in methylumiToMinfi(from) :
Cannot construct an RGChannelSet without full (OOB) intensities

I cannot get hold of out-of-band intensities or the IDATs of these data. I know I could simply perform PBC or standard quantile normalization on beta values. But I was wondering if there was a way of taking advantage of the fact that I have got more information than just beta values (i.e. red and green channel intensities, etc), for quality assessment and normalization procedures, even though I do not have OOB intensities. Is there a way of doing so in minfi or ChAMP, or do you have any other suggestions about how to deal with this?

Best wishes,
Simone

Answer: QC and normalization of 450K data from CSVs instead of IDATs
9 months ago by
Yuan Tian60
Shanghai Institute for Biology Science, Shanghai, China
Yuan Tian60 wrote:

Hello Simone:

I am not sure where you got these three CSV files, but ChAMP does NOT need a MethyLumiSet object, I am sure about that. All ChAMP functions accept a plain beta matrix plus the phenotype data, that's all. So I suspect you can work directly from your CSV files and extract the beta matrix, the intensity matrix, etc. from them, then use ChAMP directly for normalization, analysis and so on. I guess this information is in your CSV file 1.
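
As a rough illustration of that extraction step, here is a minimal base R sketch. It assumes the CSV is "wide", with one column per sample and measure named as in the question (SampleN.AVG_Beta, SampleN.Intensity, ...). The probe identifier column name `TargetID`, the helper `extract_measure()` and the toy values are assumptions made for illustration; real data would come from `read.csv()`.

```r
# Toy stand-in for "CSV file 1"; all values are invented for illustration.
# "TargetID" is an assumed name for the probe identifier column.
csv1 <- data.frame(
  TargetID          = c("probe1", "probe2", "probe3"),
  Sample1.AVG_Beta  = c(0.52, 0.91, 0.08),
  Sample1.Intensity = c(5200, 8100, 4300),
  Sample1.Signal_A  = c(2450, 720, 3900),
  Sample1.Signal_B  = c(2750, 7380, 400),
  Sample2.AVG_Beta  = c(0.47, 0.88, 0.12),
  Sample2.Intensity = c(4900, 7600, 4100),
  Sample2.Signal_A  = c(2550, 900, 3600),
  Sample2.Signal_B  = c(2350, 6700, 500),
  stringsAsFactors  = FALSE
)

# Hypothetical helper: collect every "<sample>.<measure>" column into a
# probes x samples matrix, with probe IDs as row names.
extract_measure <- function(df, measure, id_col = "TargetID") {
  suffix <- paste0(".", measure)
  cols <- names(df)[endsWith(names(df), suffix)]
  if (length(cols) == 0) stop("no columns end with ", suffix)
  m <- as.matrix(df[, cols, drop = FALSE])
  rownames(m) <- df[[id_col]]
  # Strip the measure suffix so that column names are plain sample names
  colnames(m) <- substr(cols, 1, nchar(cols) - nchar(suffix))
  m
}

mybeta      <- extract_measure(csv1, "AVG_Beta")
myintensity <- extract_measure(csv1, "Intensity")
print(mybeta)
```

The resulting `mybeta` matrix is the kind of object that can be passed as the `beta` argument of `champ.norm()`.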

Maybe you can paste a couple of example rows from your CSV files here, so that more people can help you.

Best

Yuan Tian

Thanks for your reply, Yuan. I know that ChAMP does not need a MethyLumiSet and that I can run

norm_pbc <- champ.norm(beta=mybeta, method="PBC", arraytype="450K")

on just the beta values. The same goes for BMIQ.

But for FunctionalNormalization I would need an rgSet, and for SWAN normalization both an rgSet and a MethylSet.
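
For what it is worth, a minfi MethylSet (unlike an RGChannelSet) can be built directly from methylated and unmethylated signal matrices, so it does not depend on OOB intensities. The sketch below assumes that Signal_B holds the methylated and Signal_A the unmethylated signal, which is the usual GenomeStudio convention but should be checked for these files. The matrices are toy values with placeholder probe names. The minfi part only runs if the package is installed, and this does not help with SWAN or functional normalization, which still need an rgSet.

```r
# Toy probes x samples matrices; all values are invented for illustration.
# Assumption: Signal_B = methylated, Signal_A = unmethylated.
probes  <- c("probe1", "probe2", "probe3")
samples <- c("Sample1", "Sample2")
meth <- matrix(c(2750, 7380, 400, 2350, 6700, 500),
               nrow = 3, dimnames = list(probes, samples))
unmeth <- matrix(c(2450, 720, 3900, 2550, 900, 3600),
                 nrow = 3, dimnames = list(probes, samples))

# Both matrices must describe the same probes and samples, in the same order
stopifnot(identical(dimnames(meth), dimnames(unmeth)))

if (requireNamespace("minfi", quietly = TRUE)) {
  mset <- minfi::MethylSet(
    Meth = meth,
    Unmeth = unmeth,
    annotation = c(array = "IlluminaHumanMethylation450k",
                   annotation = "ilmn12.hg19")
  )
  # Per-sample median log2 methylated/unmethylated intensities
  print(minfi::getQC(mset))
}
```

With real data the row names would have to be the actual 450K probe IDs, otherwise any step that looks up the array annotation will not find them.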

Importantly, I was also thinking that it might make sense to make use of the additional data I have got (Signal_A, Signal_B, Signal_Grn, Signal_Red, ...) for performing some QC (as far as possible without having the IDATs?).
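
To make the kind of QC I have in mind concrete, here is a minimal base R sketch on toy matrices. It assumes that SampleN.Pval is a detection p-value and that Signal_B is the methylated and Signal_A the unmethylated signal; the 0.01 and 5% cutoffs are arbitrary choices for illustration, not recommendations.

```r
# Toy probes x samples matrices; all values are invented for illustration.
probes  <- paste0("probe", 1:4)
samples <- c("Sample1", "Sample2")
signal_a <- matrix(c(2400, 700, 3900, 1500, 300, 150, 250, 200),
                   nrow = 4, dimnames = list(probes, samples))  # assumed unmethylated
signal_b <- matrix(c(2800, 7400, 400, 1600, 350, 500, 100, 220),
                   nrow = 4, dimnames = list(probes, samples))  # assumed methylated
pval <- matrix(c(0.001, 0.0005, 0.002, 0.0001, 0.2, 0.03, 0.001, 0.5),
               nrow = 4, dimnames = list(probes, samples))      # assumed detection p-values

# 1. Fraction of probes per sample that fail the detection p-value cutoff
failed_frac <- colMeans(pval > 0.01)
flagged <- names(failed_frac)[failed_frac > 0.05]

# 2. Median log2 intensity per sample and channel; samples with low values
#    in both are candidates for poor overall signal
m_med <- log2(apply(signal_b, 2, median))
u_med <- log2(apply(signal_a, 2, median))

# 3. Beta values recomputed from the signals with Illumina's offset of 100,
#    which can be compared against the exported AVG_Beta column
beta_calc <- signal_b / (signal_a + signal_b + 100)

print(data.frame(failed_frac, m_med, u_med))
print(flagged)
```

In this toy example Sample2 is flagged, because three of its four probes exceed the p-value cutoff and its median intensities are much lower than those of Sample1.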

I had already described the full content of the CSV files in my original post. I think these files were generated using GenomeStudio, but I do not know for sure. Which additional information would you need? Please let me know.