QC and normalization of 450K data from CSVs instead of IDATs
Simone

Hello,

I have got some 450K data in CSV format, containing the following information (column names):

CSV file 1:
SampleN.AVG_Beta, SampleN.Intensity, SampleN.Signal_A, SampleN.Signal_B

CSV file 2:
SampleN.Signal_Red, SampleN.Signal_Grn, SampleN.Pval

CSV file 3:
SampleID, Sample_Well, Sample_plate, Sentrix_ID, Sentrix_Position, [phenotype data...]

I have been able to create a MethyLumiSet from these data. However, to be able to normalize the data with ChAMP or minfi I would need an RGChannelSet and MethylSet for most normalization methods. When trying to convert the MethyLumiSet into an RGChannelSet I get the following error:

> methyl <- methylumiR(filename="data/mydata.csv", sampleDescriptions=annot)
> myrgset <- as(methyl, "RGChannelSet")
Error in methylumiToMinfi(from) :
Cannot construct an RGChannelSet without full (OOB) intensities

I cannot get hold of out-of-band intensities or the IDATs of these data. I know I could simply perform PBC or standard quantile normalization on beta values. But I was wondering if there was a way of taking advantage of the fact that I have got more information than just beta values (i.e. red and green channel intensities, etc), for quality assessment and normalization procedures, even though I do not have OOB intensities. Is there a way of doing so in minfi or ChAMP, or do you have any other suggestions about how to deal with this?

Best wishes,
Simone

Yuan Tian

Hello Simone:

I am not sure where you got these three CSV files, but ChAMP does NOT need a MethyLumiSet object, I am sure about that. All ChAMP functions accept just a beta matrix and a list of phenotypes, that's all. So I suspect you can work directly from your CSV files: extract the beta matrix, the intensity matrix, etc. from them, then use ChAMP directly for normalization, analysis... I guess this information is in your CSV file 1.
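
A minimal sketch of that extraction in base R, assuming the columns are named as in the question (`SampleN.AVG_Beta`, `SampleN.Intensity`, ...) and that one column holds the probe IDs. The column name `TargetID`, the helper `extract_matrix`, and the toy values are hypothetical; in practice `csv1` would come from `read.csv()` on the real file.

```r
# Toy stand-in for CSV file 1. With real data this would be something like
#   csv1 <- read.csv("data/mydata.csv", check.names = FALSE)
csv1 <- data.frame(
  TargetID          = c("cg00000029", "cg00000108", "cg00000109"),
  Sample1.AVG_Beta  = c(0.81, 0.12, 0.55),
  Sample1.Intensity = c(5200, 4100, 3900),
  Sample2.AVG_Beta  = c(0.79, 0.15, 0.60),
  Sample2.Intensity = c(4800, 4300, 3500),
  check.names = FALSE
)

# Hypothetical helper: collect every column ending in ".<suffix>" into a
# probes x samples matrix, stripping the suffix from the sample names.
extract_matrix <- function(df, suffix, id_col = "TargetID") {
  pattern <- paste0("\\.", suffix, "$")
  cols <- grep(pattern, names(df), value = TRUE)
  stopifnot(length(cols) > 0)
  m <- as.matrix(df[, cols, drop = FALSE])
  rownames(m) <- as.character(df[[id_col]])
  colnames(m) <- sub(pattern, "", cols)
  m
}

mybeta      <- extract_matrix(csv1, "AVG_Beta")
myintensity <- extract_matrix(csv1, "Intensity")
```

The resulting `mybeta` is a plain numeric matrix with probes in rows and samples in columns, which is the shape a beta-only workflow expects.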

Maybe you can paste a couple of example rows from your CSV files here, so that more people can help you.

Best

Yuan Tian

Thanks for your reply, Yuan. I know that ChAMP does not need a MethyLumiSet and that I can run

norm_pbc <- champ.norm(beta=mybeta, method="PBC", arraytype="450K")

on just the beta values. The same for BMIQ.

But for functional normalization I would require an rgSet, and for SWAN normalization both an rgSet and a MethylSet.

Importantly, I was also thinking that it might make sense to make use of the additional data I have got (Signal_A, Signal_B, Signal_Grn, Signal_Red, ...) for performing some QC (as far as possible without having the IDATs?).
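
As a rough sketch of what such QC could look like without IDATs, the detection p-values and the Signal_A/Signal_B columns allow at least two per-sample checks: the fraction of probes failing detection, and the median log2 intensity of each signal. The example uses base R only, on simulated matrices. The 0.01 p-value cutoff, the 5% failure threshold, and the reading of Signal_A as the unmethylated and Signal_B as the methylated signal are assumptions that should be checked against the actual export.

```r
set.seed(1)
probes  <- paste0("cg", sprintf("%08d", 1:200))
samples <- c("Sample1", "Sample2", "Sample3")
dims    <- list(probes, samples)

# Simulated probes x samples matrices standing in for the CSV columns.
signal_a <- matrix(rlnorm(600, meanlog = 8), 200, 3, dimnames = dims)  # assumed unmethylated
signal_b <- matrix(rlnorm(600, meanlog = 8), 200, 3, dimnames = dims)  # assumed methylated
pval     <- matrix(runif(600, 0, 0.005), 200, 3, dimnames = dims)

# Make Sample3 look like a poor-quality sample: weak signal, many failed probes.
signal_a[, "Sample3"] <- signal_a[, "Sample3"] / 20
signal_b[, "Sample3"] <- signal_b[, "Sample3"] / 20
pval[1:40, "Sample3"] <- 0.5

# Check 1: fraction of probes per sample with detection p-value above the cutoff.
failed      <- pval > 0.01
frac_failed <- colMeans(failed)

# Check 2: median log2 intensity per sample for each signal.
med_unmeth <- apply(log2(signal_a), 2, median)
med_meth   <- apply(log2(signal_b), 2, median)

qc <- data.frame(frac_failed, med_unmeth, med_meth)
qc$flag <- qc$frac_failed > 0.05   # assumed threshold for flagging a sample
print(qc)

# Stand-in for the AVG_Beta columns (beta with an offset of 100), with
# unreliable measurements masked before any beta-based normalization.
beta <- signal_b / (signal_a + signal_b + 100)
beta[failed] <- NA
```

Samples with a high failure fraction or clearly lower median intensities than the rest would be candidates for exclusion, and the masked beta matrix can then go into a beta-only normalization.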

I had already described the full content of the CSV files in my original post. I think these files were generated using GenomeStudio, but I do not know for sure. Which additional information would you require? Please let me know.

Hmm... I see. Yes, currently functional normalization does require an S4 object from minfi's reading functions. I was thinking of the BMIQ solution. I cannot see any easy solution right now. Maybe some hacking on minfi's code would work. Your data look very much like an intermediate stage between the IDAT files and an RGChannelSet.

Best

Yuan Tian