Hi - is it possible to correlate gene expression patterns with a linear numerical factor in DESeq? Here's the background: I have used DESeq to analyse gene expression in 40 RNAseq samples; three sample groups.
I have noticed on the PCA that gender is a very strong determinant (=PC1) and one of the other components (PC3) also separates the samples relatively well, into 2 rough clusters, but this clustering pattern or sample grouping does not correlate with any of the known factors I have for the samples. I have an image which shows this pattern - not sure how to upload/attach it though; the image button wants a URL but this is a PNG file (not on the web).
I was thinking of either splitting the samples into two groups (low and high) based on an arbitrary cutoff roughly half way along the PC, or perhaps using the principle component factor value (from the pca$x table) to find genes that correlate their expression with this value across the samples, to find the genes responsible for this stratification in PC3.
This is kind of a long shot as my main factor(s) didn't show any significant changes, and I'm wondering whether this factor could show some biological significance in itself (depending on the types of genes changed) or by accounting for it in the design model matrix, and removing it's individual effects which could be muddying the waters for my main factors, e.g. ~gender+PC3_factor+condition.
As always, here's my disclaimer: I'm learning this as I go along, no formal training in maths or stats, just a biologist moving toward bioinformatics as a set of skills essential for my field. I hope I've asked the question in a clear and concise way and not broken too many rules. As I don't have a specific problem or error with a package there is no code or sessionInfo() to include (I don't think).
Cheers for any help or input!