Question: qsea non-TCGA 450K Human Methylation Calibration Dataset
13 months ago by
wildist wrote:


I am using qsea to study the methylome in paediatric cancer. Since TCGA does not provide calibration data for this, I have to build it myself. Following the tutorial example, I created a GRanges object containing all 450k probe data. After executing qseaSet=addEnrichmentParameters(qseaSet, enrichmentPattern="CpG", windowIdx=wd, signal=signal), I get the following error:

   Error in estimateEnrichmentLM(qs, windowIdx = windowIdx, signal = signal,  : number of samples (20) does not match number of columns (6) in "signal".

I separately ran the tutorial again and cross-checked the variable "signal"; the format is the same. I do not understand what the number of columns (6) refers to. Could anyone help me with this problem?

Thank you,


modified 13 months ago by Matthias Lienhard • written 13 months ago by wildist


The error has been solved, but I have encountered another issue. I successfully added the normalization file for the calculation. I then compared the data with and without normalization, and both give the same number of windows (500 bp each). I used nfpkm instead of beta as the norm_method. Is this normal? I was expecting the total number of windows to change after using 450k data to normalize.


Thank you


modified 13 months ago • written 13 months ago by wildist
Answer: qsea non-TCGA 450K Human Methylation Calibration Dataset
13 months ago by
Matthias Lienhard (Max Planck Institute for Molecular Genetics, Berlin, Germany) wrote:

Hi Walter,

"signal" can either be

  • a single numeric value (e.g. 1 for 100% methylation) which is used for all selected windows and all samples, or
  • a numeric vector of the same length as "windowIdx" (the selected windows), or
  • a matrix, where each row corresponds to a selected window and each column to a sample.

It seems to me that you provided a matrix, and now (per your comment) it has the correct dimensions.
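The three accepted shapes can be checked before calling addEnrichmentParameters. The following is a minimal sketch in base R; check_signal_shape is a hypothetical helper (it is not part of qsea), and the window and sample counts are illustrative. qsea's own internal checks may differ in detail.

```r
# Hypothetical helper (NOT part of qsea): classify the shape of a "signal"
# argument against the number of selected windows and the number of samples.
check_signal_shape <- function(signal, n_windows, n_samples) {
  if (is.matrix(signal)) {
    # Matrix: one row per selected window, one column per sample
    ok <- nrow(signal) == n_windows && ncol(signal) == n_samples
    return(if (ok) "matrix" else "mismatch")
  }
  # Single value, reused for all windows and all samples
  if (is.numeric(signal) && length(signal) == 1L) return("scalar")
  # Vector: one value per selected window
  if (is.numeric(signal) && length(signal) == n_windows) return("vector")
  "mismatch"
}

n_windows <- 100L  # stands in for length(windowIdx); illustrative
n_samples <- 20L   # number of samples, as in the error message above

check_signal_shape(1, n_windows, n_samples)
check_signal_shape(runif(n_windows), n_windows, n_samples)
check_signal_shape(matrix(0.5, n_windows, n_samples), n_windows, n_samples)
# A 6-column matrix for 20 samples reproduces the situation in the error
check_signal_shape(matrix(0.5, n_windows, 6L), n_windows, n_samples)
```

A "mismatch" result for a matrix usually means the columns do not line up one-to-one with the samples in the qseaSet.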

The calibration (either "blind" or with additional data) enables the estimation of beta-values, which, for example, allow comparison with 450k data. It does not change the number of genomic windows. The number of windows is determined by the window size (specified in the first step, createQseaSet) and the genome size.
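As a rough illustration of why the window count is independent of calibration, the arithmetic can be sketched in base R. The chromosome names and lengths below are made up, and rounding the last partial window up is an assumption; qsea may treat chromosome ends differently.

```r
# Illustrative chromosome lengths in bp (made-up values, not a real genome)
chr_lengths <- c(chrA = 1000000, chrB = 250250)
window_size <- 500  # same window size as mentioned in the question

# Approximate number of fixed-size windows per chromosome.
# Assumption: a trailing partial window is counted as one window.
n_windows <- ceiling(chr_lengths / window_size)
total_windows <- sum(n_windows)
total_windows
```

Nothing in this calculation depends on the samples or on the calibration data, which is why the count stays the same with and without normalization.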

Best, Matthias

modified 13 months ago • written 13 months ago by Matthias Lienhard

Hello Matthias,

Thank you for your reply. We finally completed the first set of analyses using qsea, and bisulfite sequencing was used to confirm some DMRs. It really works. In the next stage, I want to use a clustering method to see whether the disease has any molecular subtypes. Is there a way to get the beta values of all regions for cluster analysis?




written 11 months ago by wildist

Hi Walter,

Great that you were able to validate the QSEA results! Sure, you can extract all beta values using the makeTable function. Just leave the parameters "ROIs" and "keep" empty, and make sure to select norm_methods="beta" and samples=getSampleNames(qseaSet).

Note that for regions with low enrichment signal (due to low CpG density in the region), estimates become unreliable and are set to NA. You can control the threshold with the minEnrichment parameter, e.g. minEnrichment=0 gives you values for all regions, although some at low confidence. You may also want to consider the credibility interval (CI) of the estimates, which can be produced by setting norm_methods=c("beta", "betaLB", "betaUB") for a 95% CI or norm_methods=c("beta", "q5", "q95") for a 90% CI. You can adapt the numbers of the interval boundaries. betaLB is an alias for q2.5 and betaUB for q97.5.

see also: ?normMethod
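Put together, the call described above might look like the sketch below. The wrapper name extract_beta_table and its min_enrichment argument are hypothetical; only makeTable, getSampleNames, and the norm_methods, samples and minEnrichment parameters come from the description above. It assumes the qsea package is installed and that qseaSet has already been calibrated.

```r
# Sketch only: needs the qsea package and an existing, calibrated qseaSet.
# extract_beta_table is a hypothetical wrapper, not a qsea function.
extract_beta_table <- function(qseaSet, min_enrichment) {
  qsea::makeTable(
    qseaSet,
    # beta estimate plus the bounds of the 95% credibility interval
    norm_methods = c("beta", "betaLB", "betaUB"),
    # all samples in the set; "ROIs" and "keep" are left empty on purpose
    samples = qsea::getSampleNames(qseaSet),
    # windows below this enrichment threshold are reported as NA
    minEnrichment = min_enrichment
  )
}

# Usage, assuming `qseaSet` exists in the session:
# beta_tab <- extract_beta_table(qseaSet, min_enrichment = 0)
```

With min_enrichment = 0 the table has values for all regions, so for clustering it may be worth filtering on the width of the interval afterwards.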

written 11 months ago by Matthias Lienhard

Solved. Thank you

Best, Walter

modified 10 weeks ago • written 10 weeks ago by wildist

Hello Matthias,

After updating qsea to 1,10, I ran into a problem loading BAM files into the analysis. It happens in the addCoverage step, with the following error:

    [W::bam_hdr_read] bgzf_check_EOF: No error
    Error in lapply(ReadsL, "names<-", fields) : 
    'names' attribute [4] must be the same length as the vector [0]

I tried both newly generated and old BAM files (which previously worked), and the error occurs randomly. Could you help me figure it out?

    Thank you for your help.

Regards, Walter

written 5 weeks ago by wildist

Hi Walter, thanks for letting me know. However, I cannot reproduce the error here. Can you please confirm that you updated all packages to Bioconductor 3.6? By "this error randomly happened", do you mean that it does not always occur, but only with specific data?

Anyway, if this error persists, please open another thread as this seems to be unrelated to the previous issues. Best, Matthias
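For the new thread, a short report of the installed versions would help. Below is a minimal base R sketch; report_versions is a hypothetical helper, and the package names passed to it are only a guess at what is relevant here (Rsamtools is assumed to be involved in BAM reading). Including the output of sessionInfo() is also useful.

```r
# Hypothetical helper: return the installed version of each package,
# or NA if the package is not installed.
report_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

# Package names are illustrative assumptions; adjust as needed
report_versions(c("qsea", "Rsamtools", "BiocGenerics"))
R.version.string
```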

written 4 weeks ago by Matthias Lienhard

Hello Matthias,

Thank you for your reply. The problem persists. I will start another thread after summarising what we find.

Best, Walter

written 28 days ago by wildist