Combining two closely related bacterial annotated genomes to analyze as one pooled group
0
0
Entering edit mode
brayas13 • 0
@848f7540
Last seen 5 weeks ago
United States

I am using the DeSeq2 package in Rstudio to analyze some RNAseq data for two different strains (COH1 & A909) of group B Streptococcus. The pipeline my lab has established was easy enough to tweak to my usage, but what I would like to do is pool my A909 and COH1 data together for analysis since they are share the majority of their genomes. Unfortunately, the gene description files I have use each strains unique locus tags, so I cannot combine them. I have the raw reads and the genome reference files, but the initial alignment was actually done by the sequencing company and I am new to using R so I am unsure what steps I would need to take to combine the reads of the different strains. The code below is a sample of my current workflow for analysis of a single strain.

#Create matrix and generate normalized read counts
DiabeticStatusMatrix <- DESeqDataSetFromMatrix(countData = CountDF, colData = MetaDF, design = ~Diabetic_status) #creates DESeq data matrix 
DESeqDiabeticStatus <- DESeq(DiabeticStatusMatrix) #run DESeq2 on all samples: the program includes library size normalization
normCountsDiabeticStatus <- counts(DESeqDiabeticStatus, normalized = T) #make a list of normalized read counts
View(normCountsDiabeticStatus) #check output
write.csv(normCountsDiabeticStatus, "COH1_NormCounts_DiabeticStatus.csv") #export normalized read count
#Diabetic vs Nondiabetic analysis
DiabeticVsNonDEGs <- results(DESeqDiabeticStatus, contrast = c("Diabetic_status", "Diabetic","Nondiabetic"), alpha = 0.05, lfcThreshold = 1) #test for DEGs
print(DiabeticVsNonDEGs) #check output
summary(DiabeticVsNonDEGs) #summary of number of significantly up or down regulated genes
write.csv(DiabeticVsNonDEGs, "DEGs_DiabeticVsNon.csv") #Pull out all DEGs
SigDiabeticVsNon <- subset(DiabeticVsNonDEGs, padj < 0.05) #Pull out only significant DEGs that have a corrected p-value of < 0.05 (default is 0.1)
View(SigDiabeticVsNon) #check output
SigDvsN <- as.data.frame(SigDiabeticVsNon) #convert to data frame to merge with gene descriptions
View(SigDvsN) #check output
setDT(SigDvsN, keep.rownames = "Locustag") #changes locustags from row names to a data column for merging with gene descriptions
View(SigDvsN) #check that Locustag is now a column
SigDiabeticVsNon_GeneDesc <- merge(SigDvsN, GeneDescDF, by= "Locustag") #adds the gene names and descriptions to the significantly DEGs 
write.csv(SigDiabeticVsNon_GeneDesc, "SigDEGs_DiabeticVsNon.csv") #export the list of significant DEGs with the gene descriptions
genomes RNASeq • 146 views
ADD COMMENT

Login before adding your answer.

Traffic: 348 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6