**480**wrote:

Dear all,

i am aiming to integrate limma and sva for RNAseq analysis, and i would appreciate please an advice on the following R code (it is below, is does it look correct ?), and your insights on 2 questions :

1 -- could SVASEQ work on logCPM values ? It works with raw counts, however I am receiving errors when using logCPM or CPM in svaseq function.

2 -- if SVA computes the surrogate variable based on RAW counts, and LIMMA (typically) uses logCPM, is it OK to do : fit <- lmFit(logCPM, modSv); where modSV has been estimated based on RAW counts ?

The R code that I am using is below :

`library("edgeR")`

library("sva")

`pheno <- read.delim("pheno.data", row.names="Samples")`

`#> pheno`

# sample subject experiment

#siCTL1 1 1 siCTL

#siCTL2 2 2 siCTL

#siCTL3 3 3 siCTL

#siX1 4 1 siX

#siX2 5 2 siX

#siX3 6 3 siX

`eset <- as.matrix(read.delim("eset.data", row.names="Gene"))`

`#> head(eset)`

# siCTL1 siCTL2 siCTL3 siX1 siX2 siX3

#Xkr4 51 0 0 50 0 0

#Rp1 6 0 0 1 0 0

#Sox17 373 0 0 206 0 0

#Mrpl15 973 2158 1251 1079 1749 1293

#Lypla1 6419 4172 882 5016 3001 1398

#Tcea1 2681 1942 717 2213 1404 929

`### keeping only non-zero values in eset`

nonzero_rows = apply(eset, 1, function(x) !all(x==0) )

eset_nonzero <- eset[nonzero_rows,]

`### setting up the FACTOR GROUP`

group <- factor(pheno$experiment)

group <- relevel(group, ref="siCTL")

`### setting up the FACTOR SUBJECT`

subject <- factor(pheno$subject)

#subject <- factor(c(1,2,3,1,2,3))

`## setting up the DESIGN MATRIX for EDGER/LIMMA`

design <- model.matrix(~group+subject, data=pheno)

`## estimating size factors on raw counts`

## running svaseq on normalized counts

`### slightly more filtering based on CPM values.`

`y <- DGEList(counts=eset_nonzero, group=group)`

`keep <- rowSums(cpm(y) > 0.5) >= 6`

y <- y[keep,,keep.lib.sizes=FALSE]

`y <- calcNormFactors(y)`

logCPM <- cpm(y,log=TRUE,prior.count=3)

plotMDS(logCPM)

`### the code related to SVA ....`

`mod = model.matrix(~group+subject, data=pheno)`

mod0 = model.matrix(~1, data=pheno)

`n.sv = num.sv(eset_nonzero, mod, method="leek")`

## n.sv = num.sv(logCPM, mod, method="leek")

## n.sv = num.sv(eset_nonzero, mod, method="be)

`svobj = svaseq(eset_nonzero, mod, mod0, n.sv=n.sv)`

plot(svobj$sv,pch=19,col="blue")

`modSv = cbind(mod, svobj$sv)`

`# mod0Sv = cbind(mod0, svobj$sv)`

# pValuesSv = f.pvalue(eset_nonzero,modSv,mod0Sv)

# qValuesSv = p.adjust(pValuesSv,method="fdr")

`## instead of :`

## fit <- lmFit(logCPM, design)

`fit <- lmFit(logCPM, modSv)`

fit <- eBayes(fit, trend=TRUE, robust=TRUE)

`results <- topTable(fit, coef=2, adjust="fdr", number=Inf)`

write.table(results, file="zz.be.results.edgeR.txt",

sep="\t", eol="\n", row.names=TRUE, col.names=TRUE)