Question: RUVg with differential transcript expression?
gravatar for Brian Gudenas
11 months ago by
Brian Gudenas0 wrote:

Hello all,

I am curious if the RUVg normalization method (ERCC spike in normalization) is applicable to transcript-level counts for differential transcript expression, I know RUVseq was developed for gene-level counts but it seems like it would still be OK for transcript level count analysis?

My  idea is to use Kallisto + Sleuth for DTE analysis coupled with RUVg spike in normalization since my samples are from different RNA library protocols (Poly A+, Poly A-, total RNA).

 so = sleuth_prep(s2c_tmp, transformation_function = function(x) log2(x+0.5))

 Expr = so$obs_norm %>%  dplyr::select(target_id, sample, est_counts)
 Emat = tidyr::spread(Expr, key  = sample, est_counts)

 ## need to round est_counts to integer for RUVseq
 Emat = round(Emat)

 # Filter transcripts
 filter <- apply(Emat, 1, function(x) length(x[x > 5]) >= 2) ##remove non-expressed transcripts
 filtered <- Emat[filter,]

 spikes <- rownames(filtered)[grep("^ERCC", rownames(filtered))]
 x <- as.factor(s2c_tmp$condition)

 set <- newSeqExpressionSet(as.matrix(filtered),
                             phenoData = data.frame(x, row.names=colnames(filtered)))

  ## USE RUVg to normalize samples based on control spikeins
  set1 <- RUVg(set, spikes, k=1)
  ## get normalization factors
  wts = pData(set1)[ , 2:ncol(pData(set1))]

  so$sample_to_covariates$W1 = wts

  ## Use W1 (RUVg factors) as batch
  so <- sleuth_fit(so, ~ W1 + condition, 'full')
  so <- sleuth_fit(so, ~ W1, 'reduced')
  ## Wald test for Log 2 Fold-change
  so = sleuth_wt(so, "conditionX")


Does this approach/ code seem reasonable?

ADD COMMENTlink modified 11 months ago by davide risso780 • written 11 months ago by Brian Gudenas0
gravatar for davide risso
11 months ago by
davide risso780
Weill Cornell Medicine
davide risso780 wrote:

Hi Brian,

I have no direct experience with applying RUVSeq factors to a Sleuth analysis, but your code looks reasonable to me. I would examine the distribution of p-values with and without RUV factors in the model to see if it helped (you want to see a uniform distribution plus a spike at 0 corresponding to DE genes).

Since you know the source of unwanted variation, another potentially useful approach is to include directly the indicator of the protocol as a covariate in the model. I would try both approaches and see which gives more reasonable results.

Hope this helps.


ADD COMMENTlink written 11 months ago by davide risso780

Thanks for your help Davide, just wanted to post a quick follow-up if this may help others. I checked the p-value distributions and  sleuth + RUVg does appear uniform with a spike a 0, so it does appear to work.  I also tried your second approach, solely including the protocol as a covariate, similar p-value distribution, however i found about 25% more differentially expressed transcripts (qval < 0.05) than the Sleuth + RUVg method, so i went with this approach. 

ADD REPLYlink written 11 months ago by Brian Gudenas0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 175 users visited in the last hour