I am using the DeSeq2 package in Rstudio to analyze some RNAseq data for two different strains (COH1 & A909) of group B Streptococcus. The pipeline my lab has established was easy enough to tweak to my usage, but what I would like to do is pool my A909 and COH1 data together for analysis since they are share the majority of their genomes. Unfortunately, the gene description files I have use each strains unique locus tags, so I cannot combine them. I have the raw reads and the genome reference files, but the initial alignment was actually done by the sequencing company and I am new to using R so I am unsure what steps I would need to take to combine the reads of the different strains. The code below is a sample of my current workflow for analysis of a single strain.
#Create matrix and generate normalized read counts
DiabeticStatusMatrix <- DESeqDataSetFromMatrix(countData = CountDF, colData = MetaDF, design = ~Diabetic_status) #creates DESeq data matrix
DESeqDiabeticStatus <- DESeq(DiabeticStatusMatrix) #run DESeq2 on all samples: the program includes library size normalization
normCountsDiabeticStatus <- counts(DESeqDiabeticStatus, normalized = T) #make a list of normalized read counts
View(normCountsDiabeticStatus) #check output
write.csv(normCountsDiabeticStatus, "COH1_NormCounts_DiabeticStatus.csv") #export normalized read count
#Diabetic vs Nondiabetic analysis
DiabeticVsNonDEGs <- results(DESeqDiabeticStatus, contrast = c("Diabetic_status", "Diabetic","Nondiabetic"), alpha = 0.05, lfcThreshold = 1) #test for DEGs
print(DiabeticVsNonDEGs) #check output
summary(DiabeticVsNonDEGs) #summary of number of significantly up or down regulated genes
write.csv(DiabeticVsNonDEGs, "DEGs_DiabeticVsNon.csv") #Pull out all DEGs
SigDiabeticVsNon <- subset(DiabeticVsNonDEGs, padj < 0.05) #Pull out only significant DEGs that have a corrected p-value of < 0.05 (default is 0.1)
View(SigDiabeticVsNon) #check output
SigDvsN <- as.data.frame(SigDiabeticVsNon) #convert to data frame to merge with gene descriptions
View(SigDvsN) #check output
setDT(SigDvsN, keep.rownames = "Locustag") #changes locustags from row names to a data column for merging with gene descriptions
View(SigDvsN) #check that Locustag is now a column
SigDiabeticVsNon_GeneDesc <- merge(SigDvsN, GeneDescDF, by= "Locustag") #adds the gene names and descriptions to the significantly DEGs
write.csv(SigDiabeticVsNon_GeneDesc, "SigDEGs_DiabeticVsNon.csv") #export the list of significant DEGs with the gene descriptions