Question

RUVg with differential transcript expression?

0

Entering edit mode

Brian Gudenas • 0

@brian-gudenas-14726

Last seen 6.2 years ago

Clemson

Hello all,

I am curious if the RUVg normalization method (ERCC spike in normalization) is applicable to transcript-level counts for differential transcript expression, I know RUVseq was developed for gene-level counts but it seems like it would still be OK for transcript level count analysis?

My idea is to use Kallisto + Sleuth for DTE analysis coupled with RUVg spike in normalization since my samples are from different RNA library protocols (Poly A+, Poly A-, total RNA).

 so = sleuth_prep(s2c_tmp, transformation_function = function(x) log2(x+0.5))

 Expr = so$obs_norm %>%  dplyr::select(target_id, sample, est_counts)
 Emat = tidyr::spread(Expr, key  = sample, est_counts)

 ## need to round est_counts to integer for RUVseq
 Emat = round(Emat)

 # Filter transcripts
 filter <- apply(Emat, 1, function(x) length(x[x > 5]) >= 2) ##remove non-expressed transcripts
 filtered <- Emat[filter,]

 spikes <- rownames(filtered)[grep("^ERCC", rownames(filtered))]
 x <- as.factor(s2c_tmp$condition)

 set <- newSeqExpressionSet(as.matrix(filtered),
                             phenoData = data.frame(x, row.names=colnames(filtered)))

  ## USE RUVg to normalize samples based on control spikeins
  set1 <- RUVg(set, spikes, k=1)
  ## get normalization factors
  wts = pData(set1)[ , 2:ncol(pData(set1))]

  so$sample_to_covariates$W1 = wts

  ## Use W1 (RUVg factors) as batch
  so <- sleuth_fit(so, ~ W1 + condition, 'full')
  so <- sleuth_fit(so, ~ W1, 'reduced')
  ## Wald test for Log 2 Fold-change
  so = sleuth_wt(so, "conditionX")

Does this approach/ code seem reasonable?

ruvseq sleuth • 2.0k views

ADD COMMENT • link updated 6.3 years ago by davide risso ▴ 950 • written 6.3 years ago by Brian Gudenas • 0

score 2 · Accepted Answer · 2018-01-10

2

Entering edit mode

davide risso ▴ 950

@davide-risso-5075

Last seen 5 weeks ago

University of Padova

Hi Brian,

I have no direct experience with applying RUVSeq factors to a Sleuth analysis, but your code looks reasonable to me. I would examine the distribution of p-values with and without RUV factors in the model to see if it helped (you want to see a uniform distribution plus a spike at 0 corresponding to DE genes).

Since you know the source of unwanted variation, another potentially useful approach is to include directly the indicator of the protocol as a covariate in the model. I would try both approaches and see which gives more reasonable results.

Hope this helps.

Davide

ADD COMMENT • link 6.3 years ago davide risso ▴ 950

0

Entering edit mode

Thanks for your help Davide, just wanted to post a quick follow-up if this may help others. I checked the p-value distributions and sleuth + RUVg does appear uniform with a spike a 0, so it does appear to work. I also tried your second approach, solely including the protocol as a covariate, similar p-value distribution, however i found about 25% more differentially expressed transcripts (qval < 0.05) than the Sleuth + RUVg method, so i went with this approach.

ADD REPLY • link 6.3 years ago Brian Gudenas • 0