DNA methylation analysis without raw data IDAT files
I'm interested in working with DNA methylation data. I have downloaded TCGA data from the GDC harmonised archive, but it contains no IDAT files.

IDAT files are available only in the GDC legacy archive.

The data frame "data" has 485577 probes as rows and 439 columns. The columns include chromosome, start position, end position and gene, followed by one column per sample holding the value for each probe.

For example, it looks like this:

        Chr    Start    End    Gene    GeneType    TranscriptID    TCGA-DD-A3A3-01A    TCGA-G3-AAV1-01A    TCGA-DD-AACX-01A    TCGA-DD-A4NI-01A    TCGA-G3-AAV4-01A    TCGA-DD-A1EG-11A
    cg00000029    chr16    53434200    53434201    RBL2    protein_coding    ENST00000262133.9    0.550913627    0.390846294    0.210664637    0.329930064    0.193362596    0.309831311
    cg00000108    chr3    37417715    37417716    C3orf35    lincRNA    ENST00000328376.8    NA    NA    NA    NA    NA    NA
    cg00000109    chr3    172198247    172198248    FNDC3B    protein_coding    ENST00000336824.7    NA    NA    NA    NA    NA    NA
    cg00000165    chr1    90729117    90729118    .    .    .    0.570880538    0.074518375    0.174949392    0.136944673    0.064590585    0.151404705
    cg00000236    chr8    42405776    42405777    VDAC3    protein_coding    ENST00000022615.7    0.914067333    0.845768766    0.901394742    0.922730081    0.910097231    0.887756996

I have seen many R packages such as minfi, ChAMP, missMethyl, etc. But all of these packages can be used only with IDAT files, and I don't know how to do the methylation analysis with data in a data frame.

Any help is appreciated. 

?readTCGA in minfi may be helpful here.

Hi James,

I tried reading the methylation data with readTCGA, but it is not working. Maybe I'm doing something wrong.

The methylation data is in a data frame "df", with probes as rows and columns for chromosome, start position, end position, gene and the samples, as described in my question.


Calling readTCGA on it gave the following error:

Error in readLines(filename, n = 2) : 'con' is not a connection

Could you please show me an example? Thank you.



Most help pages have an example, and readTCGA is no different. You can't just use a function without reading the help page, and if you had read the help page you wouldn't have tried to do what you did.

If you are planning to get anywhere with R and/or Bioconductor, you will need to become more self-sufficient. You either need to learn how to figure things out for yourself, by reading the help pages and vignettes and googling for answers, or you need to find somebody local who has the skills to do the work for you. Just trying something random and then asking for help on this support site isn't a good long-term strategy.

Actually, ChAMP can run its analysis from a plain beta matrix and a phenotype vector; it does not rely on IDAT files. So if you can extract the beta values from your file into a proper matrix, you can then use ChAMP for all subsequent analysis. However, ChAMP does not provide a function for loading your file format into an R session.
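
The extraction step can be done in base R. Here is a minimal sketch, assuming a data frame shaped like the one in the question: probe IDs as row names, annotation columns first, then one column per TCGA barcode. The small "data" object is a toy stand-in built from a few of the posted values, and the names is_sample, beta, code and pheno are only illustrative. It also assumes every sample is either a primary tumour (barcode sample-type code 01) or a solid tissue normal (code 11); other codes would need their own handling.

```r
# Toy stand-in for the data frame in the question (assumption: probe IDs are
# the row names and sample columns are named by TCGA barcode).
data <- data.frame(
  Chr   = c("chr16", "chr3", "chr1"),
  Start = c(53434200, 37417715, 90729117),
  End   = c(53434201, 37417716, 90729118),
  Gene  = c("RBL2", "C3orf35", "."),
  "TCGA-DD-A3A3-01A" = c(0.550913627, NA, 0.570880538),
  "TCGA-G3-AAV1-01A" = c(0.390846294, NA, 0.074518375),
  "TCGA-DD-A1EG-11A" = c(0.309831311, NA, 0.151404705),
  row.names   = c("cg00000029", "cg00000108", "cg00000165"),
  check.names = FALSE  # keep the hyphens in the barcodes
)

# Sample columns are the ones whose names look like TCGA barcodes.
is_sample <- grepl("^TCGA-", colnames(data))

# Numeric probes-by-samples matrix of beta values; row names carry over.
beta <- as.matrix(data[, is_sample, drop = FALSE])

# Drop probes that are NA in every sample (cg00000108 here).
beta <- beta[rowSums(!is.na(beta)) > 0, , drop = FALSE]

# Phenotype vector from characters 14-15 of the barcode:
# "01" = primary tumour, "11" = solid tissue normal.
code  <- substr(colnames(beta), 14, 15)
pheno <- ifelse(code == "11", "Normal", "Tumor")
```

With the real data, beta and pheno should then be in the shape that ChAMP functions taking beta and pheno arguments expect (for example champ.DMP), but check each function's help page for how it handles remaining NA values and which array type it assumes.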


Yes, I didn't find anything in ChAMP for reading methylation data from a matrix. Do you know of any other functions?


