Search
Question: RUVg with differential transcript expression?
0
5 months ago by
Clemson
Brian Gudenas0 wrote:

Hello all,

I am curious if the RUVg normalization method (ERCC spike in normalization) is applicable to transcript-level counts for differential transcript expression, I know RUVseq was developed for gene-level counts but it seems like it would still be OK for transcript level count analysis?

My  idea is to use Kallisto + Sleuth for DTE analysis coupled with RUVg spike in normalization since my samples are from different RNA library protocols (Poly A+, Poly A-, total RNA).

so = sleuth_prep(s2c_tmp, transformation_function = function(x) log2(x+0.5))

Expr = so$obs_norm %>% dplyr::select(target_id, sample, est_counts) Emat = tidyr::spread(Expr, key = sample, est_counts) ## need to round est_counts to integer for RUVseq Emat = round(Emat) # Filter transcripts filter <- apply(Emat, 1, function(x) length(x[x > 5]) >= 2) ##remove non-expressed transcripts filtered <- Emat[filter,] spikes <- rownames(filtered)[grep("^ERCC", rownames(filtered))] x <- as.factor(s2c_tmp$condition)

set <- newSeqExpressionSet(as.matrix(filtered),
phenoData = data.frame(x, row.names=colnames(filtered)))

## USE RUVg to normalize samples based on control spikeins
set1 <- RUVg(set, spikes, k=1)
## get normalization factors
wts = pData(set1)[ , 2:ncol(pData(set1))]

so$sample_to_covariates$W1 = wts

## Use W1 (RUVg factors) as batch
so <- sleuth_fit(so, ~ W1 + condition, 'full')
so <- sleuth_fit(so, ~ W1, 'reduced')
## Wald test for Log 2 Fold-change
so = sleuth_wt(so, "conditionX")

Does this approach/ code seem reasonable?

modified 5 months ago by davide risso720 • written 5 months ago by Brian Gudenas0
2
5 months ago by
davide risso720
Weill Cornell Medicine
davide risso720 wrote:

Hi Brian,

I have no direct experience with applying RUVSeq factors to a Sleuth analysis, but your code looks reasonable to me. I would examine the distribution of p-values with and without RUV factors in the model to see if it helped (you want to see a uniform distribution plus a spike at 0 corresponding to DE genes).

Since you know the source of unwanted variation, another potentially useful approach is to include directly the indicator of the protocol as a covariate in the model. I would try both approaches and see which gives more reasonable results.

Hope this helps.

Davide