Question

sva: get a sv for RIN scores

0

Entering edit mode

Lina ▴ 10

@hakusen03-10599

Last seen 3 months ago

United States

Hello,

Experiment background: 5 genotypes (Factor A) 32 subjects 3 sampling times (Factor B) 9 plant samples per subject We are interested in comparing the changes in gene expression between genotypes after accounting for expression before infestation. We ran a variance partition analysis and found that RIN has a median of 5%, so we are considering to include RIN scores as a covariate in our analysis. Previous posts mentioned that using a surrogate variable for RIN scores works better in RNAseq analysis instead of using the original RIN scores as a covariate. I have some questions on how to set up SVA to get the surrogate variable.

Do we set up the null model with our without RIN score?

mod = model.matrix(~0 + genotype + time.dpi + genotype*time.dpi + RIN_score, data= samples)
mod0 = model.matrix(~1, data= samples)
or
mod0 = model.matrix(~1 + RIN_score, data= samples)

Once we get the sv, do we include it in the model as modsv = model.matrix( ~ 0 + genotype + time.dpi + genotype*time.dpi + SV1)?

Below is the code we have run so far. We would appreciate any feedback on it. We noticed that the cor$consensus value dropped from 0.1 to 0.07 when including SV1 instead of RIN scores. Is this expected? Will it be better to not include RIN score in the model?

y <- DGEList(counts=x.2, genes=annotation)
design <- model.matrix(~ 0 + genotype + time.dpi + genotype*time.dpi + RIN_score)
keep <- filterByExpr(y, design)
y <- y[keep,,keep.lib.sizes=FALSE] 
y <- calcNormFactors(y, method="TMM")
y <- estimateDisp(y, design, robust=TRUE)

v <- voom(y, design)
cor<-duplicateCorrelation(v, design, block = subject)
cor$consensus
v <- voom(y, desig, block = subject, correlation = cor$consensus)
cor<-duplicateCorrelation(v, design, block = subject)
cor$consensus


# variance partition analysis step
form <- ~ (1| genotype) + (1| time.dpi) + (1| genotype:time.dpi) + (1 | subject)  + RIN_score 
vp <- fitExtractVarPartModel(v, form, samples)
plotVarPart(sortCols(vp))

# sva to get an sv for RIN score

cpm.tab <- cpm(y, normalized.lib.sizes=TRUE) 

mod = model.matrix(~0 + genotype + time.dpi + genotype*time.dpi + RIN_score, data= samples)
mod0 = model.matrix(~1, data= samples)

nsv = num.sv(cpm.tab, mod, method="be")
sva1 = svaseq(cpm.tab, mod, mod0, n.sv= nsv)
colnames(sva1$sv) <- "SVA1"
samples.sva = cbind(samples, sva1$sv)

y <- DGEList(counts=x.2, genes = annotation)
modsv <- model.matrix(~0 + genotype + time.dpi + genotype*time.dpi + samples.sva$SVA1)
keep <- filterByExpr(y, modsv)
y <- y[keep,,keep.lib.sizes=FALSE] 
table(keep)
y <- calcNormFactors(y, method="TMM")
y <- estimateDisp(y, modsv, robust=TRUE)

# possible way to analyze gene expression data

v <- voom(y, modsv)
cor<-duplicateCorrelation(v, modsv, block = subject)
cor$consensus
v <- voom(y, modsv, block = subject, correlation = cor$consensus)
cor<-duplicateCorrelation(v, modsv, block = subject)
cor$consensus

fit<-lmFit(v, modsv, block= subject, correlation=cor$consensus)
fit<-eBayes(fit)
summary(decideTests(fit))


# include your problematic code here with any corresponding output 
# please also include the results of running the following in an R session 

sessionInfo( )

limma sva variancePartition voom RNASeq • 578 views

ADD COMMENT • link updated 3 months ago by Gordon Smyth 52k • written 3 months ago by Lina ▴ 10

score 0 · Answer 1 · 2024-12-06

Surrogate variables are not constructed from known variates, so one doesn't get surrogate variables for RIN scores. I am guessing that the suggestions that you have read have probably been to use SVA or RUV instead of RIN. Anyway, if you are not planning to include RIN in the final DE analysis, then it must not be in the models input to svaseq either.

You would find it useful to use voomLmFit instead of the multi-step procedure you are currently using with voom. voomLmFit iterates the duplicate correlation automatically, as well as having other advantages. You might also try using voomLmFit's sample weights with sample.weights=TRUE, as an alternative to surrogate variables.