I have a dataset consisting of 16 pooled libraries sequenced on three lanes (2x125bp, 350bp fragment size, 40M reads per library). I isolated RNA from the same type of tissue across different individuals. The design has one factor ("behavior") with three levels, and 4-6 biological replicates per level. I assessed the quality of the data by using DESeq2 to compute sample-to-sample distances on VST-transformed counts, which I then used for PCA and hierarchical clustering. I noticed that one of the "level2" replicates clusters with the "level1" replicates, and I am wondering what the best way to proceed would be. "Level1" individuals become "level2" individuals because they change behavior over their lifespan, so perhaps that "level2" individual had only very recently transitioned from "level1", although we followed the same criteria for collecting all "level2" individuals in the field. Would it be advisable to treat that "level2" individual as "level1"? Or should I instead combine "level1" and "level2" individuals into a single category and compare it against "level3", since I'm mostly interested in the genes up- and down-regulated in "level3"?
PCA result: https://ibb.co/gygwAJ
Heatmap of the sample-to-sample distances: https://ibb.co/iRg2jd
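To make the pattern concrete, here is a minimal Python sketch of the kind of distance check I am describing. It is only an illustration: the numbers are made up (not my real data), the sample names and the `nearest_group` helper are hypothetical, and my actual analysis was done with DESeq2 in R. It computes Euclidean distances on toy transformed expression values and flags any sample that sits closer to another group's centroid than to its own.

```python
import math

# Toy, made-up transformed expression values (one vector of genes per sample).
# These are NOT my real data; they only mimic the pattern in the plots.
samples = {
    "L1_a": ("level1", [5.0, 2.0, 8.0]),
    "L1_b": ("level1", [5.2, 2.1, 7.9]),
    "L2_a": ("level2", [7.0, 4.0, 6.0]),
    "L2_b": ("level2", [7.1, 3.9, 6.2]),
    "L2_c": ("level2", [5.1, 2.2, 8.1]),  # hypothetical "level2" sample that looks like "level1"
    "L3_a": ("level3", [9.0, 6.0, 3.0]),
    "L3_b": ("level3", [9.2, 6.1, 3.1]),
}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def centroid(vectors):
    """Per-gene mean of a list of expression vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest_group(name):
    """Label of the closest group centroid, leaving the sample itself out."""
    _, vec = samples[name]
    best_label, best_dist = None, math.inf
    for label in sorted({lab for lab, _ in samples.values()}):
        others = [v for n, (lab, v) in samples.items()
                  if lab == label and n != name]
        if not others:
            continue  # no other samples left in this group
        dist = euclidean(vec, centroid(others))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Samples whose nearest centroid disagrees with their field-assigned label.
mismatched = [n for n, (lab, _) in samples.items() if nearest_group(n) != lab]
print(mismatched)
```

With these toy values, only the hypothetical `L2_c` sample is flagged, which is the situation I see for one of my "level2" replicates.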