Hello, I have a basic question about RNA-seq data preprocessing and I am writing to seek clarification. I currently have two different RNA-seq datasets. One uses Ensembl gene IDs, while the other uses Ensembl gene IDs with version suffixes. Both are in raw count format, and I want to merge them into a single dataset and batch-correct it with sva::ComBat_seq without losing any information. Is it possible to merge Ensembl gene IDs with and without versions? Or should I use biomaRt's getBM to find the common ones? Thank you for your help!
The obvious answer is to remove the version using gsub("\\..*", "", x), where x is the vector of gene IDs. However, the fact that the gene IDs differ means that the datasets have been processed differently. That is not good: you should process both datasets identically to avoid introducing batch effects. Even if they were processed the same way in silico, be aware that batch correction makes some assumptions, one being that batch is not nested within (i.e. confounded with) the groups you want to compare. I cannot comment further without more details.
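To make the suggestion concrete, here is a minimal base-R sketch of stripping the versions and merging on the shared genes. The matrices counts_a and counts_b, their values, and the helper strip_version are made up for illustration; the sketch assumes genes are in rows and samples in columns. Note that keeping only the shared genes does drop genes present in just one dataset, and that stripping versions can create duplicate IDs (for example from suffixed IDs), which are summed here as one possible choice.

```r
# Toy raw-count matrices (hypothetical data): genes in rows, samples in columns.
counts_a <- matrix(c(10L, 0L, 5L, 3L, 8L, 1L), nrow = 3,
                   dimnames = list(c("ENSG00000000003", "ENSG00000000005",
                                     "ENSG00000000419"),
                                   c("a1", "a2")))
counts_b <- matrix(c(7L, 2L, 4L, 9L, 6L, 0L), nrow = 3,
                   dimnames = list(c("ENSG00000000003.15", "ENSG00000000419.14",
                                     "ENSG00000000457.14"),
                                   c("b1", "b2")))

# Hypothetical helper: drop everything from the first dot onwards,
# i.e. the same gsub() call suggested in the answer.
strip_version <- function(ids) gsub("\\..*", "", ids)

# Collapse rows that share an ID after stripping by summing their counts.
# rowsum() also sets the row names to the stripped IDs.
counts_a <- rowsum(counts_a, group = strip_version(rownames(counts_a)))
counts_b <- rowsum(counts_b, group = strip_version(rownames(counts_b)))

# Keep only the genes present in both datasets and bind the samples together.
common <- intersect(rownames(counts_a), rownames(counts_b))
merged <- cbind(counts_a[common, , drop = FALSE],
                counts_b[common, , drop = FALSE])

# One batch label per sample, in the same order as the columns of merged.
batch <- rep(c("A", "B"), times = c(ncol(counts_a), ncol(counts_b)))

# With sva installed and real data, the adjustment step could then look like:
# adjusted <- sva::ComBat_seq(merged, batch = batch, group = NULL)
```

If the biological groups are known, they can be passed via the group argument of ComBat_seq, but this only makes sense when each group is not confined to a single batch, as noted above.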