I have to perform the differential gene expression analysis for 6 RNA-seq samples using DESeq2.
My colData looks like this:
sample tissue horse sample_60.bam ex_vivo 1 sample_65.bam ex_vivo 2 sample_75.bam ex_vivo 3 sample_80.bam in_vivo 4 sample_81.bam in_vivo 5 sample_82.bam in_vivo 6
I have biopsies collected from live animals (in_vivo) and collected from animals slaughtered at an abattoir (ex_vivo), and I want to investigate the differences in gene expression between these two "tissue" groups.
However, there are 6 distinct individuals and I wanted to perform the analysis controlling for individuals. But the matrix is not full rank (when I tried design ~ tissue + horse). But there is no way of adding that extra ".n" to the colData to run a matrix.model because the individuals only appear once (horses 1 to 6).
I ran the analysis as:
library(GenomicAlignments) bam_exp4 <-c("sample_60.bam", "sample_65.bam", "sample_75.bam", "sample_80.bam", "sample_81.bam", "sample_82.bam") library(Rsamtools) bamfiles <- BamFileList(bam_exp4) library(GenomicFeatures) txdb_exp4 <- makeTxDbFromGFF("Equus_caballus.EquCab2.89.gtf", format="gtf") ebg_exp4 <- exonsBy(txdb_exp4, by="gene") se_exp4 <- summarizeOverlaps(features=ebg_exp4, reads=bamfiles, mode="Union", singleEnd=FALSE, ignore.strand=TRUE, fragments=TRUE ) sampleTableExp4 <- read.csv("exp4_metadata.csv") colData(se_exp4) <- DataFrame(sampleTableExp4) library(DESeq2) dds_exp4 <- DESeqDataSet(se_exp4, design = ~ tissue) dds_exp4 <- DESeq(dds_exp4) res_exp4 <- results(dds_exp4, contrast=c("tissue", "ex_vivo", "in_vivo"))
However, I am not sure that using the ~ tissue design is the correct thing to do.