I’m analyzing some RNA-Seq data for another investigator. The experimental design is an infection time course with 10 samples (2 biological replicates for each timepoint except 2), 3 timepoints (0h, 12h, 36h) and two strains of mice (WT, and ATG knockout (KO)):
The trimmed fastq files were aligned using bowtie2 and count files were generated using HTSeq. The following R code was used:
Condition<-c("WT.0h","WT.0h","WT.12h","WT.12h","WT.36h","WT.36h","KO.0h","KO.12h","KO.36h","KO.36h") RNA_data = DGEList(counts=Count_Table[,c(2,3,4:7,10:13)], group=Condition,genes = Count_Table$Gene_ID) # filter low expressing genes #Keeping Genes with greater than 1 count per million (cpm) in at least 2 samples # 1 cpm corresponds to a read count of aproximately 6 keep<-rowSums(cpm(RNA_data)>1)>=1 RNA_data_filtered<-RNA_data[keep,,keep.lib.sizes=F] dim(RNA_data_filtered) # normalization RNA_data_filtered<- calcNormFactors(RNA_data_filtered) #setting up design matrix Condition<-factor(Condition) design<-model.matrix(~0+Condition,data=RNA_data_filtered$samples) colnames(design)<-levels(Condition) # estimate dispersion RNA_data_filtered<-estimateGLMCommonDisp(RNA_data_filtered,design,verbose=T) #Disp = 0.03235 , BCV = 0.1799 RNA_data_filtered<-estimateGLMTrendedDisp(RNA_data_filtered,design) RNA_data_filtered<-estimateGLMTagwiseDisp(RNA_data_filtered,design) plotBCV(RNA_data_filtered) plotMDS(RNA_data_filtered)
I have 3 questions/concerns:
- Can I perform contrasts with the treatment groups that have no replicates and still get meaningful results?
- The BCV distribution seems a bit odd (I thought that as the logCPM increases the BCV values should remain the same, not increase as I see here, but I could be wrong).
- The MDS plot shows that the two samples in the WT 0h group are very dissimilar, and the K0 12h infected sample is more similar to the 0h uninfected samples.
Am I right in thinking that the odd BCV distribution is due to a possible mix-up between the WT.0h.b sample and the KO.12h sample?