Hi,
I am working with an RNA-Seq dataset (one sample per condition per subject). There are six subjects in total: three unrelated healthy subjects, one patient, and two relatives. Each subject has three treated samples plus one untreated sample. Using edgeR, I calculated log2FC values for each subject and treatment relative to the untreated sample of the same subject. I then subsetted the log2FC values of each subject (Treatment 1, Treatment 2, and Treatment 3 vs. untreated) and combined them into a single data.frame in R (see the example below).
How do I identify the genes with |log2FC| >= 1 in at least 2 of the 3 unrelated healthy subjects, and treat them as in vitro treatment-responsive genes? Additionally, I would like to calculate the residual response.
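A minimal base-R sketch of the filter, assuming responsiveness is judged separately for each treatment and that the columns follow the `<subject>_<treatment>.logFC` naming of the data frame in this post. The helper `responsive_genes()` and its arguments are hypothetical; the values are the healthy-control columns of the six printed genes, so the result only illustrates the logic.

```r
# Healthy-control log2FC values for Gene_15 to Gene_20, copied from the
# tail(log2FC) output shown in this post
hc_lfc <- data.frame(
  HC1_Treatment_1.logFC = c(0.07164503, -0.28451267, -0.03066383, -0.48784162, 2.00000000, 0.95276316),
  HC1_Treatment_2.logFC = c(0.59904274, 0.62414548, -0.30650261, -0.15523340, 0.19152352, 0.07279556),
  HC1_Treatment_3.logFC = c(0.0278425, -0.1684542, -0.2355185, -0.1797719, -0.9262081, 3.0000000),
  HC2_Treatment_1.logFC = c(0.2327490, 0.5390979, -0.2327766, 0.1934787, 0.1658384, 4.0000000),
  HC2_Treatment_2.logFC = c(0.23485943, 0.25679592, -0.07059749, -0.13324728, 1.90000000, 5.60000000),
  HC2_Treatment_3.logFC = c(0.4010577, 0.2962973, -0.3062303, 0.0931452, -0.1275337, 0.2044699),
  HC3_Treatment_1.logFC = c(-0.03304623, -0.21044532, 0.10053437, -0.11581915, -0.82945255, 0.16254931),
  HC3_Treatment_2.logFC = c(0.254568488, 0.407651132, -0.000781975, 0.069750175, 0.218905487, 4.000000000),
  HC3_Treatment_3.logFC = c(0.27022068, 0.07366016, 0.18696837, 0.20076677, 5.00000000, -0.35071090),
  row.names = paste0("Gene_", 15:20)
)

# Hypothetical helper: genes with |log2FC| >= cutoff in at least `min_hc`
# healthy controls for one treatment (assumes <subject>_<treatment>.logFC names)
responsive_genes <- function(lfc, treatment, hc = c("HC1", "HC2", "HC3"),
                             cutoff = 1, min_hc = 2) {
  cols <- paste0(hc, "_", treatment, ".logFC")
  # Logical matrix: TRUE where a control passes the cut-off for that gene
  passes <- abs(as.matrix(lfc[, cols, drop = FALSE])) >= cutoff
  # Number of passing controls per gene (NA is counted as not passing)
  n_pass <- rowSums(passes, na.rm = TRUE)
  rownames(lfc)[n_pass >= min_hc]
}

# One character vector of responsive genes per treatment
treatments <- paste0("Treatment_", 1:3)
resp <- lapply(setNames(treatments, treatments), responsive_genes, lfc = hc_lfc)
print(resp)
```

On these six genes only Gene_20 passes, and only for Treatment 2 (HC2 and HC3). Counting with rowSums() over a logical matrix keeps the filter vectorised; `cutoff` and `min_hc` can be changed to try other thresholds.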
Note: The residual response of each subject is calculated from the responsive genes defined above, i.e. the genes passing the filter in at least 2 of the 3 healthy subjects:
residual response (%) = (number of responsive genes in a subject / total number of responsive genes in healthy controls) × 100
Residual responses are described in PubMed: 31784499, 34427831, 34214472, 30143481. We would normally first establish the normal response from the three healthy subjects and then check how the patient and the two relatives deviate from it.
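One possible way to turn the formula into code, under the assumption that "number of responsive genes in a subject" means the number of healthy-control responsive genes that also reach |log2FC| >= 1 in that subject for the same treatment. `residual_response()` is a hypothetical helper that returns NA when a treatment has no reference genes. Only the Treatment 2 columns of the six printed genes are included here.

```r
# Treatment 2 log2FC values for Gene_15 to Gene_20, copied from this post
t2_lfc <- data.frame(
  HC1_Treatment_2.logFC = c(0.59904274, 0.62414548, -0.30650261, -0.15523340, 0.19152352, 0.07279556),
  HC2_Treatment_2.logFC = c(0.23485943, 0.25679592, -0.07059749, -0.13324728, 1.90000000, 5.60000000),
  HC3_Treatment_2.logFC = c(0.254568488, 0.407651132, -0.000781975, 0.069750175, 0.218905487, 4.000000000),
  P1_Treatment_2.logFC  = c(-0.5171820, -0.3491658, 0.8360276, 0.8489729, 0.4492499, 0.1065795),
  R1_Treatment_2.logFC  = c(-0.07008209, 1.11764324, -0.36306320, 0.42302656, -0.06769135, -0.50945501),
  R2_Treatment_2.logFC  = c(0.05105085, 0.39302658, -0.65545389, -0.19068632, -0.98295489, -0.29869977),
  row.names = paste0("Gene_", 15:20)
)

# Hypothetical helper: residual response (%) per subject for one treatment
residual_response <- function(lfc, treatment, subjects,
                              hc = c("HC1", "HC2", "HC3"),
                              cutoff = 1, min_hc = 2) {
  # Step 1: reference set = genes passing the cut-off in >= min_hc controls
  hc_cols <- paste0(hc, "_", treatment, ".logFC")
  n_hc <- rowSums(abs(as.matrix(lfc[, hc_cols, drop = FALSE])) >= cutoff,
                  na.rm = TRUE)
  ref_genes <- rownames(lfc)[n_hc >= min_hc]
  if (length(ref_genes) == 0) {
    # No reference genes: the percentage is undefined
    return(setNames(rep(NA_real_, length(subjects)), subjects))
  }
  # Step 2 (assumption): % of reference genes that also pass in the subject
  sapply(setNames(subjects, subjects), function(s) {
    x <- lfc[ref_genes, paste0(s, "_", treatment, ".logFC")]
    100 * sum(abs(x) >= cutoff, na.rm = TRUE) / length(ref_genes)
  })
}

rr <- residual_response(t2_lfc, "Treatment_2",
                        subjects = c("HC1", "HC2", "HC3", "P1", "R1", "R2"))
print(rr)
```

With only these six genes the Treatment 2 reference set is just Gene_20, so HC2 and HC3 score 100 % while HC1, P1, R1 and R2 score 0 %; the percentages only become informative on the full gene list.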
Data
print(tail(log2FC))
HC1_Treatment_1.logFC HC1_Treatment_2.logFC HC1_Treatment_3.logFC
Gene_15 0.07164503 0.59904274 0.0278425
Gene_16 -0.28451267 0.62414548 -0.1684542
Gene_17 -0.03066383 -0.30650261 -0.2355185
Gene_18 -0.48784162 -0.15523340 -0.1797719
Gene_19 2.00000000 0.19152352 -0.9262081
Gene_20 0.95276316 0.07279556 3.0000000
HC2_Treatment_1.logFC HC2_Treatment_2.logFC HC2_Treatment_3.logFC
Gene_15 0.2327490 0.23485943 0.4010577
Gene_16 0.5390979 0.25679592 0.2962973
Gene_17 -0.2327766 -0.07059749 -0.3062303
Gene_18 0.1934787 -0.13324728 0.0931452
Gene_19 0.1658384 1.90000000 -0.1275337
Gene_20 4.0000000 5.60000000 0.2044699
HC3_Treatment_1.logFC HC3_Treatment_2.logFC HC3_Treatment_3.logFC
Gene_15 -0.03304623 0.254568488 0.27022068
Gene_16 -0.21044532 0.407651132 0.07366016
Gene_17 0.10053437 -0.000781975 0.18696837
Gene_18 -0.11581915 0.069750175 0.20076677
Gene_19 -0.82945255 0.218905487 5.00000000
Gene_20 0.16254931 4.000000000 -0.35071090
P1_Treatment_1.logFC P1_Treatment_2.logFC P1_Treatment_3.logFC
Gene_15 -0.2997141 -0.5171820 -0.466957656
Gene_16 0.3445041 -0.3491658 -0.010136887
Gene_17 0.9525266 0.8360276 0.255081070
Gene_18 1.0563388 0.8489729 0.530710073
Gene_19 6.0000000 0.4492499 0.009571011
Gene_20 6.0000000 0.1065795 0.624807496
R1_Treatment_1.logFC R1_Treatment_2.logFC R1_Treatment_3.logFC
Gene_15 -0.03307696 -0.07008209 0.006218129
Gene_16 0.66286592 1.11764324 0.451419358
Gene_17 -0.09582274 -0.36306320 -0.379897186
Gene_18 0.47976491 0.42302656 -0.164057640
Gene_19 4.40000000 -0.06769135 6.800000000
Gene_20 5.00000000 -0.50945501 0.060202575
R2_Treatment_1.logFC R2_Treatment_2.logFC R2_Treatment_3.logFC
Gene_15 0.3330158 0.05105085 0.07517517
Gene_16 0.5680208 0.39302658 0.20795389
Gene_17 0.1331289 -0.65545389 -0.27831143
Gene_18 0.2029048 -0.19068632 0.27433957
Gene_19 0.5384385 -0.98295489 -0.05461810
Gene_20 6.0000000 -0.29869977 -0.38437364
Code: filter criteria |log2FC| >= 1
# Load dplyr first: summarise(), across() and contains() come from it
library(dplyr)
# User-defined function: number of non-NA values with |log2FC| >= 1
my_filter <- function(xx) {
  sum(abs(xx) >= 1, na.rm = TRUE)
}
# Apply it to every healthy-control column (names containing "HC").
# This gives one count per column (genes passing per subject/treatment),
# not the number of healthy subjects passing per gene.
log2FC %>%
  summarise(across(contains("HC"), my_filter)) %>%
  as.data.frame(row.names = "counts") -> log2FC_v1
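Because summarise(across(...)) collapses each column to a single number, log2FC_v1 holds genes passing per subject and treatment rather than healthy subjects passing per gene. A sketch of the per-gene count in the same dplyr style, assuming tidyr and tibble are installed and that subject names contain no underscore; `lfc_demo` is a hypothetical name holding the healthy-control values for Gene_19 and Gene_20 from the printout.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Healthy-control log2FC values for Gene_19 and Gene_20, copied from this post
lfc_demo <- data.frame(
  HC1_Treatment_1.logFC = c(2.00000000, 0.95276316),
  HC1_Treatment_2.logFC = c(0.19152352, 0.07279556),
  HC1_Treatment_3.logFC = c(-0.9262081, 3.0000000),
  HC2_Treatment_1.logFC = c(0.1658384, 4.0000000),
  HC2_Treatment_2.logFC = c(1.90000000, 5.60000000),
  HC2_Treatment_3.logFC = c(-0.1275337, 0.2044699),
  HC3_Treatment_1.logFC = c(-0.82945255, 0.16254931),
  HC3_Treatment_2.logFC = c(0.218905487, 4.000000000),
  HC3_Treatment_3.logFC = c(5.00000000, -0.35071090),
  row.names = c("Gene_19", "Gene_20")
)

hc_counts <- lfc_demo %>%
  rownames_to_column("gene") %>%
  # Split "HC1_Treatment_1.logFC" into subject = "HC1", treatment = "Treatment_1"
  pivot_longer(-gene,
               names_to = c("subject", "treatment"),
               names_pattern = "^([^_]+)_(Treatment_[0-9]+)\\.logFC$",
               values_to = "logFC") %>%
  filter(subject %in% c("HC1", "HC2", "HC3")) %>%
  # Number of healthy controls with |log2FC| >= 1, per gene and treatment
  group_by(gene, treatment) %>%
  summarise(n_hc = sum(abs(logFC) >= 1, na.rm = TRUE), .groups = "drop")

# Responsive genes: at least 2 of the 3 healthy controls pass
responsive <- hc_counts %>% filter(n_hc >= 2)
print(responsive)
```

Here only Gene_20 with Treatment_2 remains (n_hc = 2); every other gene/treatment pair has a single passing control.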
Best Regards,
Toufiq