Question

Get back to individual measures to calculate coefficients like the lmFit fucntion (limma package)

0

Entering edit mode

stephane.cauchi ▴ 10

@stephanecauchi-12521

Last seen 7.1 years ago

Hi !

I need to compare the transcriptome of 5 cases and 5 controls using one-color Agilent microarrays. Weights were computed by the following function:

#My weight function
myFlagFun <- function(x) {
#Weight only strongly positive spots 1, everything else 0
present <- x$gIsPosAndSignif == 1
probe <- x$ControlType == 0
manual <- x$IsManualFlag == 0
strong <- x$gIsWellAboveBG == 1
y <- as.numeric(present & probe & manual & strong)

#Weight weak spots 0.5

weak <- strong == FALSE
weak <- (present & probe & manual & weak)
weak <- grep(TRUE,weak)
y[weak] <- 0.5

#Weight flagged spots 0.5

sat <- x$gIsSaturated == 0
xdr <- x$gIsLowPMTScaledUp == 0
featureOL1 <- x$gIsFeatNonUnifOL == 0
featureOL2 <- x$gIsFeatPopnOL == 0
flagged <- (sat & xdr & featureOL1 & featureOL2)
flagged <- grep(FALSE, flagged)
good <- grep(TRUE, y==1)
flagged <- intersect(flagged, good)
y[flagged] <- 0.5
y
}

Then I used the following script:

RG <- read.maimages(targets, source="agilent",green.only=TRUE, wt.fun=myFlagFun)

RGb <- backgroundCorrect(RG, method="none")

MA <- normalizeBetweenArrays(RGb, method="quantile")

MA.ave <- avereps(MA, ID=MA$genes$ProbeName)

f <- factor(targets$Condition, levels = unique(targets$Condition))
design <- model.matrix(~0 + f)
colnames(design) <- levels(f)

fit <- lmFit(MA.ave, design)

If I understood correctly, the weights are used in gene-wise weighted least squares regression to estimate the coefficients in the linear model. So they will affect the logFC values. My problem is that I need to get back to normalized individual measures so that their means are in line with coefficients calculated by the lmFit function.

Here is an example of what I get on a single gene:

In controls:

MA.ave values	5.61618123133188	5.0746766862945	4.8972404255748	4.90207357931074	5,68229237143083
MA.ave weights	1	0.5	0.5	0	1

The corresponding coefficient is

5.42814405

In cases

MA.ave values	5.18784690943957	5.46597446450407	5.19967234483636	5.18982455888002	5.53605290024021
MA.ave weights	0.5	0.5	0.5	0	0.5

The corresponding coefficient is

5.34738665475505

Please let me know how I can get back to individual normalized measures taking into account the attributed weights ? Their means should be similar to coefficients reported in the fit object.

Thank you very much for your help,

Best Regards,

Stephane

limma lmfit weight individual • 1.2k views

ADD COMMENT • link 7.1 years ago stephane.cauchi ▴ 10

score 0 · Answer 1 · 2017-03-07

If I understand you correctly, you want to compute some "normalised" value that accounts for the weight of each observation. In general, this is not possible, as the weights scale the contribution of each observation to the sum of squares and thus only make sense in the context of least-squares regression. However, you can get something for group means. Consider the weighted mean for your first example:

y <- c(5.61618123133188, 5.0746766862945, 4.8972404255748, 4.90207357931074, 5.68229237143083)
w <- c(1, 0.5, 0.5, 0, 1)
weighted.mean(y, w) # Gives 5.428144

The weighted mean of each group is calculated as sum(y*w)/sum(w), so you could define a modified expression value as:

z <- y*w/mean(w)
mean(z) # Gives 5.428144

Of course, var(z) will not be the sample variance, and z itself is largely gibberish when you look at its values - for example, a weight of zero gets a modified expression of zero, which is misleading as the corresponding value in y is decidedly non-zero. But, as requested, the (weighted) mean is the same as what you would get from performing weighted linear regression via lmFit.

I don't know what you intend to use z for, but it is far better to feed both the original observations and their associated weights into downstream procedures that can handle weights. If a procedure cannot handle weights, then supplying y without modification is almost certainly better than using z. The values of z are only "correct" in the sense that they give the same weighted mean as you would get from weighted linear regression - trying to compute any other statistic will probably give (wildly!) wrong results.

score 0 · Answer 2 · 2017-03-07

0

Entering edit mode

stephane.cauchi ▴ 10

@stephanecauchi-12521

Last seen 7.1 years ago

Thank you very much Aaron,

Actually, if I don't take into account the attributed weights the resulting logFC may be quite different from what I get with lmFit. When using the above example, I get -0.0807573981440699 if the weights are taken into account and 0.0813813767914962 if they are not. Even though the logFC is not significant we get opposite results. I told you that I would like to compare 5 cases and 5 controls but this is la little more complicated. I would like to compare two conditions over time using the TTCA R package. I would need to use normalized individual data to run the analysis. For each time point, I'm afraid that if I don't take into account these weighted corrections, discrepancies between limma and TTCA may lead to different results. If you have a better approach to handle this issue, please let me know what to do.

Thank you again for your time.

ADD COMMENT • link 7.1 years ago stephane.cauchi ▴ 10

0

Entering edit mode

Well, obviously the weights have some effect on the regression, otherwise we wouldn't need to use them. If TTCA does linear regression but doesn't accept weights, then there's no general solution for modifying the observations to mimic the effect of the weights in the linear model fit. Even if TTCA does accept weights, it may not be using the weights in the same way, e.g., if it's not doing linear regression. I would ignore the weights and accept the fact that the TTCA analysis may not be optimal. It's not like TTCA and limma would agree perfectly anyway, so you'll have to deal with different results regardless of what you do.

P.S. Respond to answers using "add comment" or "add reply" rather than adding a new answer.

ADD REPLY • link 7.1 years ago Aaron Lun ★ 28k