
qsea non-TCGA 450K Human Methylation Calibration Dataset


wildist • 0


Hello,

I am using qsea to study the methylome in paediatric cancer. Since TCGA does not provide calibration data for this, I have to create it myself. Following the tutorial example, I created a GRanges object containing all 450K probe data. After executing qseaSet=addEnrichmentParameters(qseaSet, enrichmentPattern="CpG", windowIdx=wd, signal=signal), I get the following error:

Error in estimateEnrichmentLM(qs, windowIdx = windowIdx, signal = signal, : number of samples (20) does not match number of columns (6) in "signal".

I separately re-ran the tutorial and cross-checked the variable "signal"; its format is the same as mine. I have no idea what the number of columns (6) refers to. Could anyone help me with this problem?

Thank you,

Walter


Written 6.0 years ago by wildist • updated 6.0 years ago by Matthias Lienhard


Hello,

The error has been solved, but I have run into another issue. I successfully added the normalization file for the calculation. I then compared the data with and without normalization, and both gave me the same number of windows (500 bp each). I used nfpkm instead of beta as the norm_method. Is this normal? I was expecting the total number of windows to change after using the 450K data for normalization.

Thank you

Walter

Reply by wildist • 6.0 years ago

score 0 · Answer 1 · 2018-05-08


Matthias Lienhard ▴ 240


Max Planck Institute for molecular Gene…

Hi Walter,

"signal" can either be

a single numeric value (e.g. 1 for 100% methylation) which is used for all selected windows and all samples, or
a numeric vector of the same length as "windowIdx" (the selected windows), or
a matrix, where each row corresponds to a selected window and each column to a sample.

It seems to me that you provided a matrix, which (judging from your comment) now has the correct dimensions.
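To illustrate the three accepted forms of "signal", here is a minimal base-R sketch. The sizes (20 samples, 6 selected windows) mirror the error message above, and the window indices are made up. The helper check_signal() is hypothetical and not part of qsea; it only mimics the kind of dimension check that produced the error. For the matrix form, the columns are presumably expected in the same order as getSampleNames(qseaSet); that is an assumption worth checking in ?addEnrichmentParameters.

```r
# Hypothetical sizes: 20 samples in the qseaSet, 6 selected calibration windows.
n_samples <- 20
windowIdx <- c(101L, 205L, 310L, 415L, 520L, 625L)  # made-up window indices

# Form 1: one value for all windows and all samples (1 = 100% methylation).
signal_scalar <- 1

# Form 2: one value per selected window, shared by all samples.
signal_vector <- seq(0.2, 0.9, length.out = length(windowIdx))

# Form 3: rows = selected windows, columns = samples.
signal_matrix <- matrix(0.5, nrow = length(windowIdx), ncol = n_samples)

# Hypothetical helper (not qsea code): TRUE if the shape of signal is acceptable.
check_signal <- function(signal, windowIdx, n_samples) {
  if (is.matrix(signal)) {
    return(nrow(signal) == length(windowIdx) && ncol(signal) == n_samples)
  }
  length(signal) == 1 || length(signal) == length(windowIdx)
}

check_signal(signal_scalar, windowIdx, n_samples)  # TRUE
check_signal(signal_vector, windowIdx, n_samples)  # TRUE
check_signal(signal_matrix, windowIdx, n_samples)  # TRUE

# A matrix with only 6 columns fails for 20 samples, as in the error above.
bad_matrix <- matrix(0.5, nrow = length(windowIdx), ncol = 6)
check_signal(bad_matrix, windowIdx, n_samples)     # FALSE
```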

The calibration (either "blind" or with additional data) enables the estimation of beta values, which, for example, allow comparison with 450K data. It does not change the number of genomic windows: that number is determined by the window size (specified in the first step, createQseaSet) and the genome size.
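As a rough illustration of why calibration cannot change the window count, the sketch below assumes the genome is tiled into non-overlapping windows of the chosen size, with the last window of each chromosome truncated, so the count per chromosome is ceiling(length / window size). That tiling rule is an assumption for illustration; qsea's exact tiling, and any chromosome selection made in createQseaSet, may differ. The lengths are the hg19 lengths of chr21 and chr22, used only as an example.

```r
# Count non-overlapping windows per chromosome; the last, shorter window still
# counts, hence rounding up.
count_windows <- function(chrom_lengths, window_size) {
  ceiling(chrom_lengths / window_size)
}

# hg19 chromosome lengths in bp, used only as an example.
chrom_lengths <- c(chr21 = 48129895, chr22 = 51304566)

n <- count_windows(chrom_lengths, window_size = 500)
n        # chr21 96260, chr22 102610
sum(n)   # 198870

# Only chromosome lengths and window size enter the count; adding enrichment
# parameters or switching norm_methods (nfpkm vs beta) does not change it.
```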

Best, Matthias

Answer by Matthias Lienhard • 6.0 years ago


Hello Matthias,

Thank you for your reply. We have finally completed the first set of analyses using qsea, and we used bisulfite sequencing to confirm some of the DMRs. It really works. In the next stage, I want to use clustering to see whether there are any molecular subtypes of the disease. Is there a way to get the beta values of all regions for the cluster analysis?

Best,

Walter

Reply by wildist • 5.8 years ago


Hi Walter,

Great that you were able to validate the QSEA results! Sure, you can extract all beta values using the makeTable function. Just leave the parameters "ROIs" and "keep" empty, and make sure to set norm_methods="beta" and samples=getSampleNames(qseaSet).
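A minimal sketch of that call, assuming a qseaSet object named qseaSet already exists in the session; the guard keeps the script runnable where it does not. The helper beta_matrix() is hypothetical, and it assumes makeTable names the value columns "<sample>_beta". Check names(tab) on your own output before relying on that pattern.

```r
# Hypothetical helper: keep only the "<sample>_beta" columns as a numeric
# matrix with one row per window and one column per sample.
beta_matrix <- function(tab) {
  cols <- grep("_beta$", names(tab), value = TRUE)
  m <- as.matrix(tab[, cols, drop = FALSE])
  colnames(m) <- sub("_beta$", "", cols)
  m
}

if (requireNamespace("qsea", quietly = TRUE) && exists("qseaSet")) {
  # ROIs and keep are left unset, so all windows are returned.
  tab <- qsea::makeTable(qseaSet,
                         norm_methods = "beta",
                         samples = qsea::getSampleNames(qseaSet))
  betas <- beta_matrix(tab)  # input for clustering
}
```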

Note that for regions with low enrichment signal (due to low CpG density in the region), estimates become unreliable and are set to NA. You can control the threshold with the minEnrichment parameter; for example, minEnrichment=0 gives you values for all regions, though some of them with low confidence. You may also want to consider the credibility interval (CI) of the estimates, which can be produced by setting norm_methods=c("beta", "betaLB", "betaUB") for a 95% CI or norm_methods=c("beta", "q5", "q95") for a 90% CI. You can adapt the numbers to obtain other interval boundaries; betaLB is an alias for q2.5 and betaUB for q97.5.
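One way to act on this before clustering, sketched in base R under several assumptions. It drops windows with any NA beta, drops windows whose 95% interval (betaUB minus betaLB) is wider than a hypothetical cutoff of 0.3 in any sample, and then clusters the samples hierarchically on Euclidean distances. The "<sample>_beta", "<sample>_betaLB" and "<sample>_betaUB" column names are assumed, the cutoff is arbitrary, and the small table is toy data rather than qsea output.

```r
# Toy stand-in for makeTable(qseaSet, norm_methods = c("beta", "betaLB", "betaUB"),
#   samples = getSampleNames(qseaSet), minEnrichment = 0).
tab <- data.frame(
  S1_beta   = c(0.10, 0.80, 0.50, NA),
  S2_beta   = c(0.15, 0.85, 0.40, 0.30),
  S3_beta   = c(0.90, 0.20, 0.45, 0.35),
  S1_betaLB = c(0.05, 0.70, 0.10, NA),
  S2_betaLB = c(0.10, 0.75, 0.30, 0.20),
  S3_betaLB = c(0.85, 0.10, 0.35, 0.25),
  S1_betaUB = c(0.15, 0.90, 0.90, NA),
  S2_betaUB = c(0.20, 0.95, 0.50, 0.40),
  S3_betaUB = c(0.95, 0.30, 0.55, 0.45)
)
samples <- c("S1", "S2", "S3")

beta <- as.matrix(tab[paste0(samples, "_beta")])
colnames(beta) <- samples
ci_width <- as.matrix(tab[paste0(samples, "_betaUB")]) -
  as.matrix(tab[paste0(samples, "_betaLB")])

max_width <- 0.3  # hypothetical cutoff on the 95% interval width
# Keep windows with no NA and a narrow interval in every sample.
keep <- rowSums(is.na(beta)) == 0 & apply(ci_width, 1, max) <= max_width

beta_kept <- beta[keep, , drop = FALSE]
hc <- hclust(dist(t(beta_kept)))  # transpose so that samples are clustered
groups <- cutree(hc, k = 2)
groups  # S1 1, S2 1, S3 2
```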

see also: ?normMethod

Reply by Matthias Lienhard • 5.8 years ago


Solved. Thank you

Best, Walter

Reply by wildist • 5.1 years ago


Hello Matthias,

After updating qsea to 1.10, I ran into a problem loading BAM files. It happens in the addCoverage step, which gives the following error:

    [W::bam_hdr_read] bgzf_check_EOF: No error
    Error in lapply(ReadsL, "names<-", fields) : 
    'names' attribute [4] must be the same length as the vector [0]

I tried both newly generated and old BAM files (which previously worked), and the error occurs seemingly at random. Could you help me figure this out?

Thank you for your help.

Regards, Walter

Reply by wildist • 5.0 years ago


Hi Walter, thanks for letting me know. However, I cannot reproduce the error here. Can you please confirm that you updated all packages to Bioconductor 3.6? By "this error randomly happened", do you mean that it does not always occur, but only with specific data?
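To answer the version question, the installed versions can be printed from R. This sketch uses base R's packageVersion(), plus BiocManager::version() if BiocManager is installed. BiocManager postdates parts of this thread (older setups used BiocInstaller), so treat that second call as an assumption about your installation. The package list, including Rsamtools for BAM reading, is also an assumption about what is relevant here.

```r
# Print installed versions of packages involved in reading BAM files, if present.
pkgs <- c("qsea", "Rsamtools", "GenomicRanges")
installed <- pkgs[vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
for (p in installed) {
  cat(p, as.character(packageVersion(p)), "\n")
}

# Bioconductor release, only if the BiocManager package is available.
if (requireNamespace("BiocManager", quietly = TRUE)) {
  print(BiocManager::version())
}

R.version.string
```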

Anyway, if this error persists, please open another thread as this seems to be unrelated to the previous issues. Best, Matthias

Reply by Matthias Lienhard • 5.0 years ago


Hello Matthias,

Thank you for your reply. The problem persists. I will start another thread after summarising what we have found.

Best, Walter

Reply by wildist • 4.9 years ago