Question

QC and normalization of 450K data from CSVs instead of IDATs

Simone ▴ 180

@simone-5854

Hello,

I have got some 450K data in CSV format, containing the following information (column names):

CSV file 1:
SampleN.AVG_Beta, SampleN.Intensity, SampleN.Signal_A, SampleN.Signal_B

CSV file 2:
SampleN.Signal_Red, SampleN.Signal_Grn, SampleN.Pval

CSV file 3:
SampleID, Sample_Well, Sample_plate, Sentrix_ID, Sentrix_Position, [phenotype data...]

I have been able to create a MethyLumiSet from these data. However, to be able to normalize the data with ChAMP or minfi I would need an RGChannelSet and MethylSet for most normalization methods. When trying to convert the MethyLumiSet into an RGChannelSet I get the following error:

> methyl <- methylumiR(filename="data/mydata.csv", sampleDescriptions=annot)
> myrgset <- as(methyl, "RGChannelSet")
Error in methylumiToMinfi(from) :
  Cannot construct an RGChannelSet without full (OOB) intensities

I cannot get hold of out-of-band intensities or the IDATs of these data. I know I could simply perform PBC or standard quantile normalization on beta values. But I was wondering if there was a way of taking advantage of the fact that I have got more information than just beta values (i.e. red and green channel intensities, etc), for quality assessment and normalization procedures, even though I do not have OOB intensities. Is there a way of doing so in minfi or ChAMP, or do you have any other suggestions about how to deal with this?

Best wishes,
Simone

450K CSV normalization minfi ChAMP • 1.3k views

ADD COMMENT • link updated 5.9 years ago by Yuan Tian ▴ 280 • written 5.9 years ago by Simone ▴ 180

score 0 · Answer 1 · 2018-06-08

Yuan Tian ▴ 280

@yuan-tian-13904

United Kingdom

Hello Simone:

I am not sure where you got these three CSV files, but ChAMP does NOT need a MethyLumiSet object, I am sure about that. All ChAMP functions accept a plain beta matrix plus a list of phenotypes, that's all. So I suspect you can work directly from your CSV files and extract, e.g., the beta matrix and the intensity matrix from them, then use ChAMP directly for normalization, analysis... I guess this information is in your CSV file 1.

Maybe you can paste a couple of example rows from your CSV files here, so that more people can help you.
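A minimal sketch of that extraction in base R, assuming the wide layout described in the question (one row per probe, columns named `SampleN.AVG_Beta`, `SampleN.Intensity`, and so on). The probe-ID column name `TargetID`, the toy values, and the helper `extract_matrix` are made up for illustration; the real file may differ.

```r
# Toy stand-in for "CSV file 1": one row per probe, four columns per sample.
# The probe-ID column name (TargetID) and all values are assumptions.
csv_text <- "TargetID,S1.AVG_Beta,S1.Intensity,S1.Signal_A,S1.Signal_B,S2.AVG_Beta,S2.Intensity,S2.Signal_A,S2.Signal_B
cg00000029,0.80,5000,900,4100,0.75,4800,1100,3700
cg00000108,0.10,6000,5400,600,0.15,5500,4600,900"
dat <- read.csv(text = csv_text, row.names = 1, check.names = FALSE)

# Hypothetical helper: collect every column ending in ".<suffix>" into a
# probes x samples matrix and strip the suffix from the column names.
extract_matrix <- function(df, suffix) {
  pattern <- paste0("\\.", suffix, "$")
  cols <- grep(pattern, colnames(df), value = TRUE)
  m <- as.matrix(df[, cols, drop = FALSE])
  colnames(m) <- sub(pattern, "", cols)
  m
}

beta      <- extract_matrix(dat, "AVG_Beta")
intensity <- extract_matrix(dat, "Intensity")
```

With a real file, `read.csv(text = csv_text, ...)` would become `read.csv("data/mydata.csv", ...)`, and `beta` is then the kind of matrix that `champ.norm(beta=...)` takes.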

Best

Yuan Tian

ADD COMMENT • link 5.9 years ago Yuan Tian ▴ 280

Thanks for your reply, Yuan. I know that ChAMP does not need a MethyLumiSet and that I can run

norm_pbc <- champ.norm(beta=mybeta, method="PBC", arraytype="450K")

on just the beta values. The same for BMIQ.

But for functional normalization I would need an rgSet, and for SWAN normalization both an rgSet and a MethylSet.

Importantly, I was also thinking that it might make sense to make use of the additional data I have got (Signal_A, Signal_B, Signal_Grn, Signal_Red, ...) for performing some QC (as far as possible without having the IDATs?).
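As a rough sketch of the kind of QC I have in mind, something like the following could be computed per sample from those columns alone, loosely in the spirit of the median-intensity summary that minfi's `getQC` reports. It assumes that Signal_A is the unmethylated and Signal_B the methylated signal (to be checked against how the export was generated); the toy values and the 0.01 and 0.05 cut-offs are arbitrary assumptions, not recommendations.

```r
# Toy probes x samples matrices standing in for the CSV columns.
probes  <- paste0("cg", 1:6)
samples <- c("S1", "S2")
mk <- function(x) matrix(x, nrow = 6, dimnames = list(probes, samples))

# Assumption: Signal_A = unmethylated, Signal_B = methylated.
signal_a <- mk(c(900, 5400, 800, 5000, 700, 4800,  60, 50, 40, 70, 55, 45))
signal_b <- mk(c(4100, 600, 4300, 500, 3900, 700,  50, 60, 45, 40, 65, 50))
pval     <- mk(c(0, 0, 0.001, 0, 0, 0.002,         0.2, 0.5, 0.03, 0.4, 0, 0.6))

# Per-sample summaries: median log2 intensities and the fraction of probes
# whose detection p-value exceeds 0.01 (cut-off chosen for illustration).
qc <- data.frame(
  median_log2_unmeth = apply(log2(signal_a), 2, median),
  median_log2_meth   = apply(log2(signal_b), 2, median),
  frac_failed        = colMeans(pval > 0.01)
)

# Flag samples with more than 5% failed probes (again an arbitrary cut-off).
qc$flag <- qc$frac_failed > 0.05
print(qc)
```

Here the toy sample S2 has low intensities in both channels and mostly non-significant detection p-values, so it is the one that gets flagged.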

I had already described the full content of the CSV files in my original post. I think these files were generated with GenomeStudio, but I do not know for sure. Which additional information would you need? Please let me know.

ADD REPLY • link 5.9 years ago Simone ▴ 180

Em... I see. Yes, currently functional normalization does indeed call for the S4 object produced by minfi's reading function. I was thinking of the BMIQ solution. Um... I cannot see any easy solution right now. Maybe some hacking on minfi's code would work. Your data look very much like a stage between the IDAT files and an RGChannelSet.

Best

Yuan Tian

ADD REPLY • link 5.9 years ago Yuan Tian ▴ 280