Higher Dimensional RNASeq Clustering Significance
James • 0
@73ef4518
Last seen 3 days ago
United States

Looking at the principal components of our RNASeq data, there is clear separation between the diseased and control samples. However, this separation appears only in the 5th principal component, which accounts for just 0.45% of the variance. There is no clear separation in the earlier components (PC1 to PC4), which mostly show batch separation.

How can I statistically leverage the genes associated with this PC when they aren't differentially expressed in DESeq2? I've attached an image of the plot.

DESeq2 RNASeq PrincipalComponent pcaExplorer • 141 views

You could perform a GO enrichment analysis on the genes that contribute most to the variation along PC5, but there is indeed only a very small difference between the disease and control samples. Have you tried running a GSEA?
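
One way to pick the genes that contribute most to PC5 is to rank them by the size of their loading on that component. The sketch below is only an illustration in plain Python: it assumes you have exported the per-sample PC5 scores from your PCA tool, and that the PCA was run on gene-centered, unscaled expression values. Under that assumption, the dot product of each centered gene with the score vector is proportional to that gene's loading. The function name, gene names and numbers are all hypothetical.

```python
def rank_genes_by_pc(expr, scores, top_n=2):
    """Rank genes by |centered expression . PC scores|.

    expr   : dict mapping gene name -> list of expression values (one per sample)
    scores : list of per-sample scores on the component of interest (e.g. PC5)
    Assumes the scores come from a PCA on the same gene-centered, unscaled data,
    so the dot product is proportional to the gene's loading.
    """
    contributions = {}
    for gene, values in expr.items():
        mean = sum(values) / len(values)
        contributions[gene] = sum((v - mean) * s for v, s in zip(values, scores))
    ranked = sorted(contributions, key=lambda g: abs(contributions[g]), reverse=True)
    return ranked[:top_n], contributions


# Hypothetical toy input: 3 genes, 4 samples, made-up PC scores.
toy_expr = {
    "geneA": [1.0, 1.0, 3.0, 3.0],
    "geneB": [5.0, 5.0, 5.0, 5.0],
    "geneC": [3.0, 2.0, 2.0, 1.0],
}
toy_scores = [-1.0, -1.0, 1.0, 1.0]

top_genes, contrib = rank_genes_by_pc(toy_expr, toy_scores, top_n=2)
print(top_genes)
```

The resulting gene list (or the full signed ranking, for GSEA) could then be passed to whichever enrichment tool you prefer.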

@james-w-macdonald-5106
Last seen 18 hours ago
United States

The conventional answer is to adjust for batch and other unobserved variability in the linear model, e.g. by including a batch factor and probably additional surrogate variables (estimated with svaseq from the sva package), presuming that batch is orthogonal to your variable of interest.
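
To illustrate why adjusting for batch matters, here is a toy sketch in plain Python for a single gene. It is not DESeq2's negative binomial model, just ordinary least squares on made-up log-scale values. It estimates the condition coefficient of y ~ batch + condition by centering both y and condition within each batch (the Frisch-Waugh result), and compares that with the unadjusted fit. All names and numbers are hypothetical. In DESeq2 the analogous step would be a design such as ~ batch + condition, with any surrogate variables added as further terms.

```python
from collections import defaultdict
from math import sqrt


def center_within(values, groups):
    """Subtract each group's mean from its members."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def condition_effect(y, cond, groups):
    """Coefficient and t-statistic for cond in the OLS fit y ~ groups + cond."""
    yr = center_within(y, groups)
    xr = center_within(cond, groups)
    sxx = sum(x * x for x in xr)
    beta = sum(x * v for x, v in zip(xr, yr)) / sxx
    rss = sum((v - beta * x) ** 2 for x, v in zip(xr, yr))
    df = len(y) - len(set(groups)) - 1  # samples minus group means minus cond
    se = sqrt(rss / df / sxx)
    return beta, beta / se


# Hypothetical gene: large batch shift (+3), small condition shift (+0.5).
batch = ["A", "A", "A", "A", "B", "B", "B", "B"]
cond = [0, 0, 1, 1, 0, 0, 1, 1]  # 0 = control, 1 = disease
y = [5.0, 5.2, 5.5, 5.7, 8.0, 8.2, 8.5, 8.7]

beta_adj, t_adj = condition_effect(y, cond, batch)
beta_raw, t_raw = condition_effect(y, cond, ["all"] * len(y))  # no batch term
print(beta_adj, t_adj, beta_raw, t_raw)
```

In this balanced toy design both fits estimate the same effect of 0.5, but the t-statistic is about 5.6 with the batch term and about 0.4 without it, because the batch shift otherwise ends up in the residual variance.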

@f1c1bda2
Last seen 8 days ago
Vietnam

A GO analysis could be run on the genes responsible for the majority of the PC5 variation, but the truth is that there is barely any difference between diseased and healthy samples.