Question about normalizing and removing batch effects from bulk RNA-seq data from NCBI-SRA
I have about 5210 runs from ~500 studies and want to use the resulting expression matrix for downstream work such as expression analysis: specifically, I want to look at the expression of certain genes across conditions I'm interested in. I have the expression matrix and need to normalize it and remove batch effects. From what I've read online, it seems I should run svaseq on the count data to find the batches and then pass those batches to ComBat_seq, which removes the batch effects. Is this the right way to do it? There is a lot of material online and I'm very confused. I would appreciate any feedback! Thank you!

You don't run svaseq before ComBat_seq; they are different tools for different situations.

If you already know the batches and simply want to remove the technical differences between them, then ComBat_seq will do that for you: you give it the count matrix and the batch labels, and it returns adjusted counts. If you only suspect that there are technical differences between samples, a batch effect being one possibility, then you can use svaseq to estimate surrogate variables, which you then include as covariates in your linear model to account for the technical variability. Have you read the sva vignette?
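To make the distinction concrete, here is a toy sketch in plain Python of the two situations. It is not ComBat_seq or svaseq, and it works on simulated log-scale values rather than counts; every name in it (`center_batches`, `residualize`, `top_right_singular_vector`, the simulated data) is made up for illustration. Case 1 assumes the batch labels are known and that batches are balanced across conditions, and simply re-centres each batch. Case 2 assumes the labels are unknown, removes the condition effect, and takes the dominant pattern left in the residuals as a single "surrogate variable".

```python
import random

random.seed(0)
n_genes, n_samples = 200, 12
condition = [0, 1] * 6       # biological variable of interest
batch = [0] * 6 + [1] * 6    # technical batch, balanced across conditions


def mean(xs):
    return sum(xs) / len(xs)


# Simulated log-scale expression: baseline + condition effect + batch shift + noise.
expr = []
for g in range(n_genes):
    base = random.gauss(5, 1)
    shift = random.gauss(0, 2)           # gene-specific batch shift
    effect = 1.0 if g < 20 else 0.0      # only the first 20 genes respond
    expr.append([base + effect * condition[j] + shift * batch[j]
                 + random.gauss(0, 0.3) for j in range(n_samples)])


def group_means(row, groups):
    """Mean of `row` within each level of `groups`."""
    return {lvl: mean([x for x, g in zip(row, groups) if g == lvl])
            for lvl in set(groups)}


# Case 1: batch labels are known. Shift every batch onto the gene's overall mean.
def center_batches(row, batch):
    overall = mean(row)
    bm = group_means(row, batch)
    return [x - bm[b] + overall for x, b in zip(row, batch)]


corrected = [center_batches(row, batch) for row in expr]


# Case 2: batch labels are unknown. Remove the condition effect, then look
# for the strongest remaining pattern across samples.
def residualize(row, groups):
    gm = group_means(row, groups)
    return [x - gm[g] for x, g in zip(row, groups)]


def top_right_singular_vector(mat, n_iter=50):
    """Power iteration on mat^T mat; returns a unit vector over samples."""
    n = len(mat[0])
    v = [random.gauss(0, 1) for _ in range(n)]
    for _ in range(n_iter):
        u = [sum(r[j] * v[j] for j in range(n)) for r in mat]
        v = [sum(mat[i][j] * u[i] for i in range(len(mat))) for j in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


resid = [residualize(row, condition) for row in expr]
surrogate = top_right_singular_vector(resid)

# The surrogate variable was estimated without the batch labels; compare
# it with them afterwards to see how well it recovered the hidden batch.
print("abs. correlation with true batch:", abs(pearson(surrogate, batch)))
```

In case 1 the data themselves are adjusted, which is the role ComBat_seq plays when the batches are known. In case 2 nothing is adjusted: the estimated `surrogate` vector would be added as an extra covariate next to `condition` in the model, which is how surrogate variables from svaseq are meant to be used.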