sva: get a sv for RIN scores
1
0
Entering edit mode
Lina ▴ 10
@hakusen03-10599
Last seen 1 day ago
United States

Hello,

Experiment background: 5 genotypes (Factor A) 32 subjects 3 sampling times (Factor B) 9 plant samples per subject We are interested in comparing the changes in gene expression between genotypes after accounting for expression before infestation. We ran a variance partition analysis and found that RIN has a median of 5%, so we are considering to include RIN scores as a covariate in our analysis. Previous posts mentioned that using a surrogate variable for RIN scores works better in RNAseq analysis instead of using the original RIN scores as a covariate. I have some questions on how to set up SVA to get the surrogate variable.

  1. Do we set up the null model with our without RIN score?

    mod = model.matrix(~0 + genotype + time.dpi + genotype*time.dpi + RIN_score, data= samples)
    mod0 = model.matrix(~1, data= samples)
    or
    mod0 = model.matrix(~1 + RIN_score, data= samples)
    
  2. Once we get the sv, do we include it in the model as modsv = model.matrix( ~ 0 + genotype + time.dpi + genotype*time.dpi + SV1)?

Below is the code we have run so far. We would appreciate any feedback on it. We noticed that the cor$consensus value dropped from 0.1 to 0.07 when including SV1 instead of RIN scores. Is this expected? Will it be better to not include RIN score in the model?

y <- DGEList(counts=x.2, genes=annotation)
design <- model.matrix(~ 0 + genotype + time.dpi + genotype*time.dpi + RIN_score)
keep <- filterByExpr(y, design)
y <- y[keep,,keep.lib.sizes=FALSE] 
y <- calcNormFactors(y, method="TMM")
y <- estimateDisp(y, design, robust=TRUE)

v <- voom(y, design)
cor<-duplicateCorrelation(v, design, block = subject)
cor$consensus
v <- voom(y, desig, block = subject, correlation = cor$consensus)
cor<-duplicateCorrelation(v, design, block = subject)
cor$consensus


# variance partition analysis step
form <- ~ (1| genotype) + (1| time.dpi) + (1| genotype:time.dpi) + (1 | subject)  + RIN_score 
vp <- fitExtractVarPartModel(v, form, samples)
plotVarPart(sortCols(vp))

# sva to get an sv for RIN score

cpm.tab <- cpm(y, normalized.lib.sizes=TRUE) 

mod = model.matrix(~0 + genotype + time.dpi + genotype*time.dpi + RIN_score, data= samples)
mod0 = model.matrix(~1, data= samples)

nsv = num.sv(cpm.tab, mod, method="be")
sva1 = svaseq(cpm.tab, mod, mod0, n.sv= nsv)
colnames(sva1$sv) <- "SVA1"
samples.sva = cbind(samples, sva1$sv)

y <- DGEList(counts=x.2, genes = annotation)
modsv <- model.matrix(~0 + genotype + time.dpi + genotype*time.dpi + samples.sva$SVA1)
keep <- filterByExpr(y, modsv)
y <- y[keep,,keep.lib.sizes=FALSE] 
table(keep)
y <- calcNormFactors(y, method="TMM")
y <- estimateDisp(y, modsv, robust=TRUE)

# possible way to analyze gene expression data

v <- voom(y, modsv)
cor<-duplicateCorrelation(v, modsv, block = subject)
cor$consensus
v <- voom(y, modsv, block = subject, correlation = cor$consensus)
cor<-duplicateCorrelation(v, modsv, block = subject)
cor$consensus

fit<-lmFit(v, modsv, block= subject, correlation=cor$consensus)
fit<-eBayes(fit)
summary(decideTests(fit))

# include your problematic code here with any corresponding output 
# please also include the results of running the following in an R session 

sessionInfo( )
limma sva variancePartition voom RNASeq • 419 views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 3 hours ago
WEHI, Melbourne, Australia

Surrogate variables are not constructed from known variates, so one doesn't get surrogate variables for RIN scores. I am guessing that the suggestions that you have read have probably been to use SVA or RUV instead of RIN. Anyway, if you are not planning to include RIN in the final DE analysis, then it must not be in the models input to svaseq either.

You would find it useful to use voomLmFit instead of the multi-step procedure you are currently using with voom. voomLmFit iterates the duplicate correlation automatically, as well as having other advantages. You might also try using voomLmFit's sample weights with sample.weights=TRUE, as an alternative to surrogate variables.

ADD COMMENT
0
Entering edit mode

Agreed. I'll add that with small sample size, a covariate can explain some fraction of expression variation even under the null.

ADD REPLY

Login before adding your answer.

Traffic: 929 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6