Fitting a continuous variable with batch effect using Limma. Also a question about normalization.
1
0
Entering edit mode
@261fa811
Last seen 9 days ago
United States

Hello, I have some RNA-Seq samples. These samples have no replicates, but I have lots of samples. What I am trying to do is find a trend based on a continuous variable (let's say age). I am basically using edgeR's cpm function with log as TRUE and prior count of 3. The first question is about Normalization. Is there a better way? Should I try the following? Find genes with the lowest variance, and use that to normalize the data. Which method would be better? Edger mentions this in the no-replicate section.

My main question here is how to use the ns() function when there is a batch effect. I also want to adjust for gender. Here is what I am doing, and I am wondering if there is a better way to do this. For example, can I use voom here? For the ns() function, I am using just one covariate.

X<-ns(x_ordered,df=5)
design2<-model.matrix(~X+gender_ordered+batch_ordered)
fit2<-lmFit(dat_ordered,design2)
fit2<-eBayes(fit2,trend=TRUE)
fitted_spline<-topTable(fit2,coef=2:6,n=Inf)


I am also trying a linear model as follows:

design=model.matrix(~x_ordered+gender_ordered+batch_ordered)
fit<-lmFit(dat_ordered,design)
fit<-eBayes(fit,trend = TRUE)
fitted_lm<-topTable(fit,coef=2,n=Inf)


Do I need to use the p-adjusted here? I am seeing nothing based on p-adjusted, but many based on just the p-values. This applies to both spline and linear model.

Also, using R's lm function, I am basically getting the same data.

I must say though that when I try a linear mixed effects using linear model, I am getting better fit in terms of the p-adjusted. Here, I am treating the batch effect as a random effect and everything else as fixed.
Then I am using lmertest to find significance in terms of the p-value. Finally, I am adjusting the p-value using the p.adjust() function. I am doing this without Limma, but I am wondering if there is a way to do this using Limma or similar packages.

  model<-lmer(y~x+gender+(1|batches),data=new_dat)


The problem with above is that it is a linear model. Is there a way to combine the above with spline.

limma edgeR rna • 298 views
0
Entering edit mode
@gordon-smyth
Last seen 47 minutes ago
WEHI, Melbourne, Australia

Normalization, batch correction, random effects, voom and p-value adjustment all work the same in limma when you have a continuous variate as when you don't. There is no extra complication -- it all works exactly the same way as usual.

Yes, you need to normalize, using either TMM or quantile normalization or whatever. So far you haven't described any normalization. No you can't normalize using low variance genes and the edgeR Guide doesn't suggest that. The edgeR Guide only mentions the use of control genes for dispersion estimation in the absence of replicates. The control genes are chosen on prior considerations, not from the data, so they're not low variance genes and they're not used for normalization.

Yes, you can use voom instead of limma-trend, same as usual.

Yes, you need to adjust for multiple testing.

You can fit batch as a random effect using duplicateCorrelation.

0
Entering edit mode

Hello Gordon, Thank you for your help.

For no replicates, how should I normalize the data? I thought TMM does not work without grouping? I decided to do cpm(log plus 1) to find an overall trend. My data is age data. Certainly, we have grouped the data, and did the normalization that way. We can group by, let's say, 10-20 yrs as one group, 20-30 as another group. We have generated data for that, and compared groups that way.
But what if I want to find the overall trend? Let's say, what gene trends up or down as you age?

0
Entering edit mode

Normalization is completely independent of whether you have replicates and it does not use groupings. And you do have replicates anyway. Your data all seems quite standard to me.

0
Entering edit mode

Hello Gordon, Thank you for the answer. In that case, this should not be very difficult. Nirad