Question

PCA of different bacterial species

0


Laura ▴ 10

@laura-24858

Last seen 25 days ago

Spain

Hi all! I have RNA-seq data from several bacterial species in conditions A and B (three biological replicates each). I have already run differential expression analysis of condition B vs A for each species separately, but I would also like to analyze how the different species react to condition B. I was thinking of doing a PCA of the different species (including the 3 replicates) in the condition B set to see whether certain species cluster together, i.e. whether they respond similarly to condition B. For that I have identified ~2500 single-copy orthologues (orthologous groups that have exactly one gene in every species), so there are ~2500 orthologous genes across species. I was planning to filter the table of raw counts (previously obtained with featureCounts) down to these ~2500 orthologs across the species replicates in condition B, normalize them with VST or rlog, and perform PCA. Is this approach valid, at least to get a general idea? Or will I introduce so much bias that any clusters wouldn't be reliable? Is there a better way to do this? Thanks!

DESeq2 • 336 views

updated 4 weeks ago by Michael Love 41k • written 4 weeks ago by Laura ▴ 10

1


Your approach sounds reasonable for getting a general idea of how different bacterial species respond to condition B. Performing PCA (Principal Component Analysis) on the normalized counts of single-copy orthologous genes across species can help you identify patterns and potential clusters among the species in response to the condition.

Normalization: Transforming the raw counts with VST (variance stabilizing transformation) or rlog (regularized logarithm transformation) is good practice to account for differences in sequencing depth and library size across samples. Ensure that the method you choose is appropriate for your data and does not introduce bias.

Dimensionality Reduction: PCA is a form of dimensionality reduction that projects high-dimensional data onto a lower-dimensional space while preserving as much of the variance as possible. It can help visualize relationships and patterns in your data. However, consider whether PCA is the best method for your specific analysis or whether other dimensionality reduction techniques such as t-SNE (t-distributed stochastic neighbor embedding) or UMAP (uniform manifold approximation and projection) might be more suitable.

Orthologous Genes Selection: Ensure that the ~2500 single-copy orthologous genes you selected are representative of the overall expression profile of each species and condition. It's essential to choose genes that are reliably expressed across all species and conditions to avoid introducing bias into your analysis.

Replicates: Including biological replicates in your analysis is crucial for assessing the reproducibility of your results and increasing the statistical power of your analysis. Ensure that the replicates are properly accounted for during normalization and PCA.

Interpretation: Interpret the results of the PCA cautiously and consider other factors that may influence the clustering of species, such as phylogenetic relatedness or specific biological pathways affected by condition B.
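To make the PCA step concrete, here is a minimal sketch in Python with NumPy (not the DESeq2 workflow itself). It assumes you already have a samples x orthologs matrix of transformed expression values. The species labels, matrix sizes, and simulated values are made up purely for illustration.

```python
import numpy as np

def pca(matrix, n_components=2):
    """PCA of a samples x genes matrix via SVD of the column-centered data."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # sample coordinates
    explained = s**2 / np.sum(s**2)                   # fraction of variance per PC
    return scores, explained[:n_components]

# Made-up data: 3 hypothetical species x 3 replicates, 50 ortholog groups.
rng = np.random.default_rng(0)
species_means = rng.normal(8.0, 2.0, size=(3, 50))
labels = np.repeat(["sp1", "sp2", "sp3"], 3)
values = np.repeat(species_means, 3, axis=0) + rng.normal(0.0, 0.2, size=(9, 50))

scores, explained = pca(values)
for label, (pc1, pc2) in zip(labels, scores):
    print(f"{label}\tPC1={pc1:.2f}\tPC2={pc2:.2f}")
print("fraction of variance explained:", np.round(explained, 3))
```

In this toy data the replicates of each species sit close together because the species-level baseline dominates the noise. With real data the same effect can make the first components reflect species identity rather than the response to condition B, which is worth keeping in mind when reading the plot.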

4 weeks ago governduffer ▴ 10

0


I recommend asking at biostars.org as this is not related to DESeq2 in particular.

4 weeks ago ATpoint ★ 4.0k

score 1 · Answer 1 · 2024-03-27

1


Michael Love 41k

@mikelove

Last seen 5 hours ago

United States

Subset after VST, not before.
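One way to read this advice, sketched in Python rather than R: estimate the normalization from every gene of a species' count table, and only then keep the ortholog rows. In DESeq2 the transformation would be `vst()`. The sketch below instead uses median-of-ratios size factors followed by log2 as a simplified stand-in, so it is not VST. The function names, gene IDs, and counts are hypothetical.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    expressed = (counts > 0).all(axis=1)            # use genes with no zero counts
    log_counts = np.log(counts[expressed])
    log_geo_means = log_counts.mean(axis=1, keepdims=True)
    return np.exp(np.median(log_counts - log_geo_means, axis=0))

def transform_then_subset(counts, gene_ids, ortholog_ids):
    """Normalize using ALL genes, then keep only the ortholog rows."""
    counts = np.asarray(counts, dtype=float)
    sf = size_factors(counts)                       # estimated from the full table
    log_norm = np.log2(counts / sf + 1.0)           # simplified stand-in for VST
    keep = [gene_ids.index(g) for g in ortholog_ids]
    return log_norm[keep], sf

# Hypothetical species table: 4 genes x 3 samples; sample 2 is sequenced twice as deep.
gene_ids = ["g1", "g2", "g3", "g4"]
counts = [[10, 20, 10],
          [100, 200, 100],
          [50, 100, 50],
          [5, 10, 5]]
orthologs = ["g2", "g4"]                            # assumed single-copy orthologs

subset, sf = transform_then_subset(counts, gene_ids, orthologs)
print("size factors:", np.round(sf, 3))
print(subset)
```

The point of the ordering is that the depth correction (and, in DESeq2, the dispersion trend used by `vst()`) is estimated from the whole transcriptome of each species rather than from the ~2500 orthologs alone.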

4 weeks ago Michael Love 41k