PCA of different bacterial species
Laura ▴ 10
@laura-24858
Last seen 9 months ago
Spain

Hi all! I have RNA-seq data from different bacterial species in conditions A and B (three biological replicates each). I have already performed differential expression analysis comparing condition B vs. A for each species separately, but I would also like to analyze how the different species react to condition B. I was thinking of doing a PCA of the different species (including the 3 replicates) in the condition B set to see whether certain species cluster together, i.e. whether they respond similarly to condition B. For that I have identified ~2500 single-copy orthologues (orthologous groups that have exactly one gene in every species), so there are ~2500 orthologous genes across species. I was planning to filter the table of raw counts (previously obtained with featureCounts) down to these ~2500 orthologs across the species replicates in condition B, normalize them with VST or rlog, and perform PCA. Is this approach valid, at least to get a general idea? Or will I introduce so much bias that any clusters would be unreliable? Is there a better way to do this? Thanks!

DESeq2 • 876 views

Your approach sounds reasonable for getting a general idea of how different bacterial species respond to condition B. Performing PCA (principal component analysis) on the normalized counts of single-copy orthologous genes across species can help you identify patterns and potential clusters among the species in response to the condition.

Normalization: Transforming the raw counts with VST (variance stabilizing transformation) or rlog (regularized logarithm transformation) is good practice, because both account for differences in sequencing depth and library size across samples. Check that the method you choose is appropriate for your data and does not itself introduce bias.

Dimensionality reduction: PCA projects high-dimensional data onto a lower-dimensional space while retaining as much of the variance as possible, which helps visualize relationships and patterns in the data. Consider whether PCA is the best method for this analysis, or whether another technique such as t-SNE (t-distributed stochastic neighbor embedding) or UMAP (uniform manifold approximation and projection) would be more suitable.

Orthologous gene selection: Check that the ~2500 single-copy orthologs you selected are representative of the overall expression profile of each species and condition. Choosing genes that are reliably expressed across all species and conditions helps avoid introducing bias into the analysis.

Replicates: Including the biological replicates lets you assess the reproducibility of your results and increases statistical power. Make sure the replicates are properly accounted for during normalization and PCA.

Interpretation: Interpret the PCA results cautiously and consider other factors that may influence how species cluster, such as phylogenetic relatedness or the specific biological pathways affected by condition B.
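The point about choosing reliably expressed orthologs could be checked with something like the sketch below. It is only an illustration: the helper name `reliably_expressed`, the toy count matrix, and the threshold of 10 reads are assumptions made here, not values from this thread or defaults of any package.

```python
import numpy as np

def reliably_expressed(counts, min_count=10):
    """Return a boolean mask of rows (orthologs) whose count reaches
    min_count in every sample. min_count=10 is an arbitrary, hypothetical
    threshold chosen for illustration."""
    counts = np.asarray(counts)
    return (counts >= min_count).all(axis=1)

# Toy matrix: rows are orthologous groups, columns are samples
# (e.g. two species x three replicates). Values are made up.
counts = np.array([
    [120,  95, 130, 110, 101,  99],  # expressed everywhere
    [  0,   3,   1, 250, 300, 280],  # silent in the first species
    [ 15,  22,  18,  12,  30,  25],  # low but consistently present
    [  9,  40,  35,  50,  45,  60],  # one replicate below the threshold
])

mask = reliably_expressed(counts, min_count=10)
print(mask)
```

Requiring the threshold in every sample is a strict choice; a looser rule (for example, per-species medians) may suit some data sets better.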


I recommend asking at biostars.org as this is not related to DESeq2 in particular.


Your approach of using ~2500 single-copy orthologs and normalizing with VST or rlog for PCA is valid for exploring general clustering trends across species in condition B. Ensure the orthologs are representative of biological processes to avoid bias. Normalization is crucial to minimize technical variation, and PCA can reveal similarities in species' responses.

@mikelove
Last seen 1 day ago
United States

Subset after VST, not before.
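A rough sketch of that ordering, in Python with NumPy rather than DESeq2 itself: the scaling factors are estimated from all genes, the whole matrix is transformed, and only then are the ortholog rows pulled out for PCA. Everything here is an assumption for illustration: the counts are simulated, `log2(x + 1)` on size-factor-scaled counts is a crude stand-in for the real VST, the function names and the choice of ortholog rows are hypothetical, and the sketch uses a single genes-by-samples matrix without addressing how per-species tables would be combined.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors, estimated from ALL genes."""
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample cannot enter the geometric mean.
    expressed = (counts > 0).all(axis=1)
    log_counts = np.log(counts[expressed])
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

def transform_then_subset(counts, keep_rows):
    """Transform the full matrix first, then keep only the chosen rows."""
    sf = size_factors(counts)
    # Stand-in transformation, NOT the actual VST from DESeq2.
    transformed = np.log2(np.asarray(counts, dtype=float) / sf + 1.0)
    return transformed[keep_rows]

def pca_scores(matrix, n_components=2):
    """PCA via SVD. matrix is genes x samples; returns per-sample scores
    and the fraction of variance explained by each kept component."""
    x = matrix.T - matrix.T.mean(axis=0)  # samples x genes, genes centered
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return u[:, :n_components] * s[:n_components], explained[:n_components]

# Simulated counts: 200 genes x 6 samples, last three sequenced twice as deep.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=50, size=(200, 6)).astype(float)
counts[:, 3:] *= 2

ortholog_rows = np.arange(0, 200, 4)  # hypothetical ortholog subset
sf = size_factors(counts)
sub = transform_then_subset(counts, ortholog_rows)
scores, explained = pca_scores(sub)
print(sf.round(2), scores.shape, explained.round(3))
```

The reason for this order is that the quantities the transformation depends on (here the size factors; in the real VST also the dispersion trend) are estimated more stably from the full gene set than from a subset.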
