Using limma for the analysis of qPCR data
1
0
Entering edit mode
enricoferrero ▴ 580
@enricoferrero-6037
Last seen 9 days ago
Switzerland

Hello,

The limma user's guide states that:

The linear model and differential expression functions are applicable to data from any quantitative gene expression technology including microarrays, RNA-seq and quantitative PCR.

However, that's the only time qPCR gets mentioned, and I haven't found examples on how to work with this type of data.

For background, qPCR data comes in the form of cycle threshold (Ct) values, which are inversely correlated to the gene expression (the more the gene is expressed, the fewer cycles it will need to reach the threshold).

These Ct values are then typically normalized by subtracting the Ct values of reference/housekeeping genes to derive ∆Ct values:

∆Ct = Ct (gene of interest) – Ct (housekeeping gene)

Then one can calculate the difference across conditions of interest as ∆∆Ct values:

∆∆Ct = ∆Ct (treated sample) – ∆Ct (untreated sample)

And fold changes are calculated as:

2^-∆∆Ct

But this only works for very simple experimental designs/contrasts (e.g.: treated vs untreated), while I have a more complex study design that would benefit from the statistical modelling capabilities of limma.

So, here's my question: how should I process the qPCR Ct data for use in limma? Ultimately I'd like to transform the Ct values in expression values that I can fit into an ExpressionSet object and feed to lmFit() and downstream functions.

Thank you!

NormqPCR EasyqpcR qPCR HTqPCR limma • 626 views
2
Entering edit mode
@gordon-smyth
Last seen 8 hours ago
WEHI, Melbourne, Australia

limma can analye either Ct or deltaCt values directly.

deltaCt values is the simplest. Just compute y <- 40 - deltaCt, then you have log2-expression values. There's no need for any other transformation and no need for an ExpressionSet. Just input the matrix of y directly into limma as if they were microarry log-expression values.

Alternatively one can analysis Ct values and use limma's normalization procedures to improve on house-keeping gene normalization, as suggested here:

DE analysis of PCR array [was: dataset dim for siggenes]

In this approach, we compute y <- 40 - Ct and use cyclicLoessNormalization with the housekeeping gene (or genes) as controls. This approach is better than deltaCt when there are multiple house-keeping genes.

0
Entering edit mode

Thank you Gordon!

A couple follow-up questions:

• What's your intuition to know that y represents log2-expression values (and not linear expression values, to be log2-transformed)?
• Can you elaborate/provide an example on using normalizeCyclicLoess() for normalization of Ct values? I do have Ct values for 3 housekeeping genes, but I'm not sure how I would feed these to the function.
• Would you still recommend adding trend = TRUE and robust = TRUE as arguments in eBayes() to control heteroscedasicity and outliers?
1
Entering edit mode

Expression doubles during each cycle of PCR. Ct counts number of cycles needed to reach a threshold. It follows from a couple of lines of math that differences in Ct are on log2 scale.

For cycllc loess, x is the y matrix including house keep genes. weights is a vector of length nrow(x) equal to 50 for house-keeping probes and 1 otherwise.

I would probably not use trend=TRUE or robust=TRUE unless you have PCR for a lot of genes.

0
Entering edit mode

Dear Dr. Smyth:

I want to follow up this topic for limma: I have used limma method for many occasions of microarray datasets in the past, but besides microarrays, RNA-seq (with voom) and quantitative PCR data as mentioned here. What about other types of high-thoughput data that can be used for limma analysis? Is there a way or general principle/rule to assess whether the data types/distribution etc is suitable for limma analysis to derive differential features between interested contrasting groups? for example, one dataset I wish to use limma was derived from high-throughput assays, and had been scaled between around -2 and 1 (and we can essentially treat the data as log2 intensity like in microrray and high values mean highly expressed etc for the same probe after normalized), since more posiitve and more negative have opposite biological meanings. Another dataset we have is compound labeling high throghput assays for interested protein at intended sites/pockets as percentage levels of successful labeling, data ranged from 0 to 100%. highly lableing % certainly is favored as for good compounds. Maybe I shall log transform the percentage data if can be used for limma analysis?

I was wondering about the possiblity of using limma in these datasets. Any advice would be highly appreciated! Thanks a lot in advance! I shall say I had got great advices from you in the past, really appreciated! Best Ming