I am trying to examine protein activity changes in diseased vs healthy skin using VIPER. I pooled 92 diseased and 82 healthy RNA-Seq samples and used ARACNe to reverse engineer a skin-specific transcriptional network from the combined set.
Question 1: Is it OK to build the transcriptional network from both healthy and diseased samples together?
Following the VIPER vignette, I am able to generate protein activity scores for each sample and compare activity levels for specific proteins:
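library(viper)

# emat is an expression matrix (genes x samples); regulons comes from the ARACNe network
vpres <- viper(emat, regulons, verbose = TRUE)
dim(vpres)
# 7066 174 - includes scores for diseased and healthy samples
protein_id <- 'LAG3'
# disease_samples and healthy_samples select the columns of each group
t.test(vpres[protein_id, disease_samples], vpres[protein_id, healthy_samples])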
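To make the comparison concrete, here is a minimal sketch of the same two-sample test applied to every row instead of one protein, with a Benjamini-Hochberg correction. It runs on simulated data rather than my real scores: `toy_vpres`, `toy_disease`, `toy_healthy`, the seed, and the artificial shift added to `P1` are all assumptions made up for illustration.

```r
set.seed(1)  # arbitrary seed so the toy example is reproducible

# Toy stand-in for vpres: 50 "proteins" x 20 samples (hypothetical data)
toy_vpres <- matrix(rnorm(50 * 20), nrow = 50,
                    dimnames = list(paste0("P", 1:50), paste0("S", 1:20)))
toy_disease <- 1:10    # columns treated as diseased
toy_healthy <- 11:20   # columns treated as healthy

# Shift one protein upward in the diseased group so there is a signal to find
toy_vpres["P1", toy_disease] <- toy_vpres["P1", toy_disease] + 5

# Welch two-sample t-test for each row (protein)
pvals <- apply(toy_vpres, 1, function(x) {
  t.test(x[toy_disease], x[toy_healthy])$p.value
})

# Benjamini-Hochberg adjustment across all proteins
padj <- p.adjust(pvals, method = "BH")
head(sort(padj))
```

With the real data, `toy_vpres` would be replaced by `vpres` and the two index vectors by `disease_samples` and `healthy_samples`.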
The vignette also says that results are more accurate when a null model is estimated by sample permutation. Following the vignette's code:
# These are subsets of emat with the relevant samples
diseaseMat <- emat[, diseaseIdx]
healthyMat <- emat[, healthyIdx]
vpsig <- viperSignature(diseaseMat, healthyMat, cores = 2)
vpres2 <- viper(vpsig, regulons, verbose = TRUE)
dim(vpres2)
# 7066 92 - only includes scores for disease samples
Now vpres2 only has values for the diseased samples, not the healthy ones. This prevents me from using a two-sample t-test to compare diseased vs healthy protein activity.
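As far as I can tell, the healthy columns disappear because the signature scores each diseased sample relative to the healthy reference, so the reference samples are used up rather than scored themselves. The sketch below is only my simplified picture of that idea in base R, assuming a per-gene z-score against the reference; it is not viper's actual implementation, it leaves out the permutation null entirely, and all names and data in it (`toy_disease_mat`, `toy_healthy_mat`, `toy_signature`) are hypothetical.

```r
set.seed(2)  # arbitrary seed for the toy data

# Toy expression matrices: 30 genes, 6 diseased and 8 healthy samples (hypothetical)
toy_disease_mat <- matrix(rnorm(30 * 6, mean = 1), nrow = 30)
toy_healthy_mat <- matrix(rnorm(30 * 8), nrow = 30)

# Per-gene mean and standard deviation of the healthy reference
ref_mean <- rowMeans(toy_healthy_mat)
ref_sd   <- apply(toy_healthy_mat, 1, sd)

# Each diseased sample becomes a z-score relative to the healthy reference;
# the vectors have one entry per gene, so they recycle down each column
toy_signature <- (toy_disease_mat - ref_mean) / ref_sd

dim(toy_signature)
# 30 6 - one column per diseased sample, none for the healthy reference
```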
Question 2: How do I compare protein activity in healthy vs diseased tissue when using a permutation-based null model?