Hi all,
This question is directed to those who may have experience with the QSEA R package, which proceeds from MEDIPS.
I have MeDIP-seq data which I hope to begin analyzing very shortly, but unfortunately I'm not too familiar with processing genomic data. I've been able to following the tutorial for QSEA up until the normalization section, which suggests the use of a previously validated dataset. 450k TCGA data from lung is used in this example, and I such I presumed to use Head and Neck for mine. However I'm unable to figure out how to average the beta values within 500bp windows across all samples, as described within the tutorial, which looks like this:
I believe it may have been done by GRanges, or another R package which utilizes GRanges, but I am unable to generate the same object format. I've tried packages such as minfi, GRanges, but I lack the hands-on knowledge to further process the TCGA data after import.
I was wondering if anyone has any suggestions as to what R packages or software I can use to produce a similar R object.
Thank you,
Justin
Thank you for all the useful information! I'll take a look at the script and try to reproduce the tables to gain some more experience. I'll definitely contact you should I have difficulties and require the table you already prepared, very convenient! :)
Thanks again,
Justin
Hi Matthias,
Thanks for the detail steps. I generated the R table for COAD samples.
When I run the command: wd=findOverlaps(tcgacoad450kmeth, getRegions(qseaSet), select="first") I get some NULL values in wd.
When I pass this to the addEnrichmentParameters method I get below error: Error: subscript contains NAs
Shall I remove those windows from wd before running the addEnrichmentParameters?
Thanks
Gaurav
Hi Gaurav,
like that its difficult to judge whether this is normal and acceptable. Can you provide more details, that make it easier to reproduce the issue? Where exactly did the NAs (or NULLs?) appear? If you have a commented script, that contains the function output, that would be best.
Best, Matthias
Hi Matthias,
I was able to resolve this issue by selecting only those chromosomes in tcgacoad450kmeth that are in qseaSet. Any extra regions or windows in tcgacoad450kmeth object will give this error.
I am getting values > 1 in output of getOffset(qseaSet, scale="fraction"). Ideally, these values should be less than 1 as per the tutorial. What could be the reason for >1 numbers.
Regards
Gaurav
Hi Gaurav,
large estimated offset values around 1 indicate that the coverage in regions without any CpG are as high as in regions with CpGs. This suggests that the MeDIP enrichment was not effective or the samples are completely demethylated. Do you see this for all your samples?
Hi Matthias,
Yes, I am observing for all samples.
01-NT 04-NT 05-NT 08-NT 09-NT 10-NT 11-NT 12-NT 14-NT 0.9169249 1.1974449 1.1248954 1.0219849 0.9956591 0.5468642 0.9655438 1.1318748 1.2267099 22-NT 26-NT 38-NT 39-NT 40-NT 46-NT 50-NT 01-TT 04-TT 1.5692617 1.0610539 1.3977801 0.6516493 0.8128844 1.2260762 1.1285125 1.2004545 1.1669367 05-TT 08-TT 09-TT 10-TT 11-TT 12-TT 14-TT 22-TT 26-TT 1.0152239 1.0229919 1.2373184 0.7241613 1.2339220 1.4789610 1.2667688 1.1665162 1.0011935 38-TT 39-TT 40-TT 46-TT 50-TT 1.5121940 0.6874239 0.8598436 1.2930325 0.9949078
So, is the data has no enrichment or shall I same other signal matrix?
Data indicate that there is no enrichment, however, of course its worth digging into it, e.g. by looking at regions with / without expected enrichment in a genome browser. Of course there is a chance that the selected calibration regions are unfortunate.
Data indicate that there is no enrichment, however, of course its worth digging into it, e.g. by looking at regions with / without expected enrichment in a genome browser. Of course there is a chance that the selected calibration regions are unfortunate.
Hi Matthias, Is it possible for you to share the full result for LUAD/LUSC somewhere (not just the chr20-22), rather than me generating it again?
Thanks, Simon
Sorry, I missed that comment. Do you still need the TCGA calibration data?
Yes please Matthias, that would be appreciated.
See my new answer for the link to the data for 35 TCGA cohorts
Hi Matthias,
Is it possible to go through the workflow (stated above in your answer) in R? I am not familiar with the perl universe. I have an in-house 450K prostate cancer data set, which I would like to apply in the the calibration step, and as all my data map to hg19, I cannot use your TCGA cohort tables.
Best, Karoline
Sure, but can you please make a new thread or answer this one, as this comments get quite confusing. Please specify what you already tried and what did not work. The scripts should work without knowing perl as they work with command line options.