I am trying to examine changes in protein activity between diseased and healthy skin using VIPER. I reverse-engineered a skin-specific transcriptional network with ARACNe, using the 92 diseased and 82 healthy RNA-Seq samples combined.
Question 1: Is it OK to build the transcriptional network from both healthy and diseased samples together?
Following the VIPER vignette, I can generate protein activity scores for each sample and compare activity levels for specific proteins:
```r
library(viper)

# emat is an expression matrix (genes x samples)
vpres <- viper(emat, regulons, verbose = TRUE)
dim(vpres)  # 7066 174 - includes scores for diseased and healthy samples

protein_id <- 'LAG3'
t.test(vpres[protein_id, disease_samples], vpres[protein_id, healthy_samples])
```
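The same per-protein t-test can be run over every row of the activity matrix, with a multiple-testing correction across proteins. The sketch below is illustrative only: it uses a small simulated matrix in place of `vpres`, and the sample names `D1..D92`, `H1..H82` and protein names `P1..P50` are hypothetical stand-ins.

```r
# Simulated stand-in for a VIPER activity matrix (proteins x samples); not real data
set.seed(1)
n_prot <- 50
disease_samples <- paste0("D", 1:92)   # hypothetical disease sample IDs
healthy_samples <- paste0("H", 1:82)   # hypothetical healthy sample IDs
vpres <- matrix(rnorm(n_prot * 174), nrow = n_prot,
                dimnames = list(paste0("P", 1:n_prot),
                                c(disease_samples, healthy_samples)))
# Make the first 5 proteins more active in disease, so there is something to find
vpres[1:5, disease_samples] <- vpres[1:5, disease_samples] + 1.5

# Welch two-sample t-test (the t.test default) for every protein
res <- t(apply(vpres, 1, function(x) {
  tt <- t.test(x[disease_samples], x[healthy_samples])
  c(statistic = unname(tt$statistic), p.value = tt$p.value)
}))
res <- as.data.frame(res)

# Benjamini-Hochberg FDR across all proteins tested
res$fdr <- p.adjust(res$p.value, method = "BH")
res <- res[order(res$p.value), ]
head(res)
```

With the real `vpres` from the post, only the simulation block would be dropped; the `apply`/`p.adjust` part works on any proteins x samples matrix whose column names include the two sample vectors.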
The vignette also notes that results are more accurate when the signature is computed against a null model estimated by permuting samples. Following the vignette's code:
```r
# These are subsets of emat with the relevant samples
diseaseMat <- emat[, diseaseIdx]
healthyMat <- emat[, healthyIdx]

vpsig <- viperSignature(diseaseMat, healthyMat, cores = 2)
vpres2 <- viper(vpsig, regulons, verbose = TRUE)
dim(vpres2)  # 7066 92 - only includes scores for disease samples
```
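To make the idea of a sample-permutation null concrete, here is a toy base-R sketch for a single gene on simulated data: the observed difference between groups is compared with a null distribution obtained by shuffling the group labels. It is a conceptual illustration only and is not claimed to reproduce what `viperSignature` does internally; the variable names are hypothetical.

```r
# Simulated expression of one gene; not real data
set.seed(42)
disease <- rnorm(92, mean = 1)   # hypothetical disease samples
healthy <- rnorm(82, mean = 0)   # hypothetical healthy samples
observed <- mean(disease) - mean(healthy)

pooled <- c(disease, healthy)
n_d <- length(disease)
n_perm <- 1000

# Null distribution: randomly reassign samples to the two groups and
# recompute the difference in means each time
null_diff <- replicate(n_perm, {
  idx <- sample(length(pooled), n_d)
  mean(pooled[idx]) - mean(pooled[-idx])
})

# Observed difference as a z-score relative to the permutation null
z <- (observed - mean(null_diff)) / sd(null_diff)
# Two-sided empirical p-value, with +1 so it is never exactly zero
p_emp <- (sum(abs(null_diff) >= abs(observed)) + 1) / (n_perm + 1)
c(observed = observed, z = z, p_emp = p_emp)
```

Because the null is built from the two groups together, each permutation yields one statistic for the comparison rather than a score per sample, which is worth keeping in mind when relating this idea to per-sample activity scores.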
vpres2 only contains scores for the diseased samples, not the healthy ones, so I cannot use a two-sample t-test to compare protein activity between diseased and healthy samples.
Question 2: How do I compare protein activity in healthy vs diseased tissue when using a permutation-based null model?