Dear All,
I would like to clarify some doubts and get opinions on the RNA-seq data analysis I have been doing.
I have RNA-seq data from a paired experiment, i.e. I have treated and untreated data for the same tissue, sequenced in the same batch. I have data from 40 different biological samples (40 treated and 40 untreated), with more to come. The data tend to be heterogeneous, so I am trying to get a higher number of replicates to identify the true biological signal.
Now, as this is a paired design and both members of each pair are sequenced in the same batch, I do not expect batch effects to be a concern. But I need to do some exploratory data analysis to see whether all the samples show a similar response or whether there is a subset of samples that behaves differently (for reasons such as degradation, stress, etc.). To identify different responses in this large panel of samples, I am doing a clustering analysis along with PCA. The problem with clustering on the normalised counts is that when the counts of the treated library of one sample closely match those of the untreated library of another sample, the two libraries tend to cluster together. I want to cluster the samples based on their response to treatment, i.e. I don't care about the counts themselves but about whether expression has changed within a sample with respect to treatment.
So I have started to work on the log2 fold change (calculated from normTransform() of DESeq2) of each pair. The log2 fold change represents the change in expression within each sample, and these log2 fold changes can be compared among samples, so that the change in expression of genes could tell me which samples respond to treatment. Instead of working on normalised counts, I am therefore working on a matrix of log2 fold changes for the 40 biological replicates.
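A minimal sketch of this kind of per-pair log2 fold change matrix, written in plain Python rather than R for illustration. It assumes the fold change is the difference of log2(normalised count + 1) between the treated and untreated library of the same sample, which mirrors the default log2(n + 1) transform of normTransform(). The sample names, counts, and the helper names `paired_log2fc` and `pearson` are hypothetical and not part of DESeq2.

```python
import math

def paired_log2fc(treated, untreated, pseudocount=1.0):
    """Per-gene log2 fold change for one pair: log2(t + pc) - log2(u + pc)."""
    return [math.log2(t + pseudocount) - math.log2(u + pseudocount)
            for t, u in zip(treated, untreated)]

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical normalised counts for 4 genes: sample -> (treated, untreated).
# P1 and P2 differ about 10x in count level but respond the same way;
# P3 shows essentially no response in the first two genes.
pairs = {
    "P1": ([200, 50, 10, 80], [100, 100, 10, 80]),
    "P2": ([2100, 480, 95, 790], [1000, 1000, 100, 800]),
    "P3": ([100, 100, 12, 78], [100, 100, 10, 80]),
}

# One vector of log2 fold changes per sample (the columns of the matrix).
lfc = {name: paired_log2fc(t, u) for name, (t, u) in pairs.items()}

# Sample-to-sample similarity of the response, independent of count level.
corr = {(a, b): pearson(lfc[a], lfc[b])
        for a in lfc for b in lfc if a < b}

for (a, b), r in sorted(corr.items()):
    print(f"{a} vs {b}: r = {r:.3f}")
```

In this toy example P1 and P2 correlate strongly despite very different count levels, while P3 does not, which is the behaviour wanted from clustering on the response rather than on the counts. A correlation or distance matrix built this way could then be passed to hierarchical clustering or PCA.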
I do not know if this is the best way to do exploratory data analysis, i.e. on a matrix of pairwise log2 fold changes. The log2 fold changes tend to be very small. Are there any better ways to identify the samples that show a different response to treatment? I am using the default normalisation for the DE analysis. Should I consider any other normalisation methods for a paired design? Any input is greatly appreciated.
Do you have replicates for the tissues, or is it a single pair per tissue?
I have updated the post. I have 40 biological replicates (from different patients) of the same tissue, with treated and untreated data. I need to check which samples show a similar response, and whether there are any non-responders or samples that respond differently (due to stress, or the age/gender/BMI of the patients, etc.).