Question

Interpreting results of sample-to-sample PCA/clustering and changing assignment of condition levels

fl ▴ 20

@fl-16173

Germany

Hello,

I have a dataset consisting of 16 pooled libraries sequenced on three lanes (2x125 bp, 350 bp fragment size, 40M reads per library). I isolated RNA from the same type of tissue across different individuals. There is one condition ("behavior") with three levels, and 4-6 biological replicates per level.

I assessed the quality of the data using DESeq2, computing sample-to-sample distances on VST-transformed counts for PCA and hierarchical clustering. I noticed that one of the "level2" replicates clusters with the "level1" replicates, and I was wondering what the best way to proceed might be.

"Level1" individuals become "level2" individuals because they change behavior over their lifespan. Perhaps that "level2" individual had only very recently transitioned from "level1", although we followed the same criteria for collecting all "level2" individuals in the field.

Would it be advisable to treat that "level2" individual as "level1", or to consider "level1 + level2" individuals as a single category and compare it against "level3", since I'm mostly interested in the genes up- and down-regulated in "level3"?

I really appreciate any help.

PCA result: https://ibb.co/gygwAJ

Heatmap of the sample-to-sample distances: https://ibb.co/iRg2jd
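
For reference, the QC steps described above could be sketched roughly as follows. This is a minimal sketch, not the poster's actual code: the dataset is simulated with DESeq2's `makeExampleDESeqDataSet()` as a stand-in, so it carries a two-level `condition` column rather than the three-level "behavior" factor, and all object names are assumptions.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real data (assumption): 2000 genes, 16 samples,
# with a two-level "condition" factor instead of the three-level "behavior".
dds <- makeExampleDESeqDataSet(n = 2000, m = 16)

# Variance-stabilizing transformation, blind to the design, for QC purposes.
vsd <- vst(dds, blind = TRUE)

# Euclidean sample-to-sample distances on the VST values (samples as rows).
sample_dists <- dist(t(assay(vsd)))

# Hierarchical clustering of the samples, drawn as a dendrogram.
hc <- hclust(sample_dists)
plot(hc)

# PCA plot colored by the grouping variable.
plotPCA(vsd, intgroup = "condition")
```

Note that `plotPCA()` by default uses only the most variable genes (`ntop = 500`), whereas the distances here are computed on all genes, so the two views need not agree exactly.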

deseq2 DESeq2 • 720 views

score 2 · Answer 1 · 2018-06-18

Michael Love 41k

@mikelove

United States

"I noticed that one of the "level2" replicates clusters with the "level1" replicates."

I wouldn't worry about the PCA plot, and I wouldn't reclassify the samples.

The PCA plot is only a two-dimensional summary, so a lot of information is necessarily lost. It is the 2D summary that loses the least "information" in terms of total variance, but information is lost nonetheless. You may still find many genes with statistically significant differences between level 1 and level 2.
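
To make these two points concrete, here is a minimal sketch on simulated counts (not the poster's data). It checks how much of the total variance the first two principal components capture, then tests level 2 against level 1 gene by gene. The column name `behavior`, the level names, the simulated 2-fold effect on 100 genes, and the dispersion are all assumptions chosen for illustration.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real data (assumption): 2000 genes, 16 samples,
# three behavior levels with 5, 5 and 6 replicates.
n_genes <- 2000
behavior <- factor(rep(c("level1", "level2", "level3"), times = c(5, 5, 6)))
mu <- matrix(rep(2^runif(n_genes, 4, 10), times = 16), nrow = n_genes)
# Hypothetical effect: the first 100 genes are 2-fold higher in level2.
mu[1:100, behavior == "level2"] <- mu[1:100, behavior == "level2"] * 2
counts <- matrix(rnbinom(n_genes * 16, mu = mu, size = 10), nrow = n_genes)
rownames(counts) <- paste0("gene", seq_len(n_genes))
colnames(counts) <- paste0("sample", 1:16)
coldata <- data.frame(behavior = behavior, row.names = colnames(counts))

dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ behavior)

# Fraction of total variance captured by each principal component
# (PCA on all genes of the VST matrix, samples as rows).
vsd <- vst(dds, blind = TRUE)
pca <- prcomp(t(assay(vsd)))
percent_var <- pca$sdev^2 / sum(pca$sdev^2)
round(100 * percent_var[1:2], 1)  # percent of variance shown in a 2D plot

# Gene-level test of level2 vs level1, keeping the original labels.
dds <- DESeq(dds)
res <- results(dds, contrast = c("behavior", "level2", "level1"))
summary(res)
n_sig <- sum(res$padj < 0.1, na.rm = TRUE)
n_sig  # number of genes called at 10% FDR
```

Whatever the first two components do not capture is invisible in the plot, yet the per-gene test can still pick up differences in that remainder.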