Help creating a paired sample design matrix in edgeR
0
Entering edit mode
sfidemw • 0
@user-24736
Last seen 19 days ago

We are performing RNA-seq analysis in R using Rsubread and edgeR. We would like to be able to create a design matrix that allows our samples to be paired together.

We have been following the edgeR quasilikelihood pipeline (Link: Chen, Lu and Smith, 2020. We modified this pipeline to use glmLRT as opposed to the glmQLRT and then attempted to include multiple parameters in our design matrix, both our treatment condition as well as the pair identifier.

Our experimental design included one phytoplankton strain that was diluted to extinction for single cell isolation (Luria-Delbruck protocol) and from that, each of six clonal isolates were used to seed an individual paired treatment sample (400 ppm/800 ppm CO2).

In doing our analysis, we want to be able to compare between present (400ppm) and future(800ppm) conditions. However, we also want to be able to account for the within clonal variability that we may see within each of the paired treatment samples. Our y object has a group column (see below) where we labeled each sample with its treatment and pair ID. Our pairs should be Present.45--Future.45, etc. We understand that the difference in name may be causing an issue but are not sure how to troubleshoot this.

Additionally, our code is working but the design matrix is not structured in a way to allow us to make the desired comparisons. Looking at the output of our design matrix code, the only comparisons we can make are between the columns (TxFuture, TxPresent, Pair46, etc) We are unable to make a comparison between the two samples that are found in Pair 46, or Pair 47. We also are unsure of how to incorporate the within clonal variability in the overall Present-Future comparison.

As a final note, we are not posting any sessionInfo() information because we aren't technically getting any type of error message.

load("~/Desktop/database/Syn_analysis/Syn_9311.Rdata")

library(edgeR)

#call library--this is an organismal database that we curated ourselves
library(org.Ssp.CC9311.eg.db)

#modifying our DGEList object to include info from organismal database
y$genes$symbol <- mapIds(org.Ssp.CC9311.eg.db, rownames(y),
keytype = "GID", column = "GENENAME")
y$genes$product <- mapIds(org.Ssp.CC9311.eg.db, rownames(y),
keytype = "GID", column = "GENEPRODUCT")

#look at the samples included in our y object.
y$samples Output: group lib.size norm.factors SS0029.S66.L008.R1.001.fastq.gz.subread.BAM Future.45 2723395 0.7895361 SS0030.S67.L008.R1.001.fastq.gz.subread.BAM Future.50 1349356 0.9642628 SS0031.S68.L008.R1.001.fastq.gz.subread.BAM Future.46 1931205 0.8817554 SS0032.S69.L008.R1b.001.fastq.gz.subread.BAM Future.47 2371530 0.9668457 SS0033.S70.L008.R1.001.fastq.gz.subread.BAM Future.48 1662338 1.0713176 SS0034.S71.L008.R1.001.fastq.gz.subread.BAM Present.48 2485131 0.9618421 SS0036.S72.L008.R1.001.fastq.gz.subread.BAM Present.49 2397884 1.1038003 SS0037.S73.L008.R1.001.fastq.gz.subread.BAM Present.45 3102922 1.0451120 SS0038.S74.L008.R1.001.fastq.gz.subread.BAM Present.50 1853192 1.1969441 SS0039.S75.L008.R1.001.fastq.gz.subread.BAM Future.49 689415 1.0852899 SS0040.S20.L002.R1.001.fastq.gz.subread.BAM Present.47 1007293 1.1487667 SS035.S19.L002.R1.001.fastq.gz.subread.BAM Present.46 934291 0.868561 #now we are creating objects of both tx and paired sample ID Tx <- factor(targets$Tx)
Pair <- factor(targets$Pair) #create our design matrix design <- model.matrix(~0+Tx+Pair) rownames(design) <- levels(group) design Output: TxFuture TxPresent Pair46 Pair47 Pair48 Pair49 Pair50 [1,] 1 0 0 0 0 0 0 [2,] 1 0 0 0 0 0 1 [3,] 1 0 1 0 0 0 0 [4,] 1 0 0 1 0 0 0 [5,] 1 0 0 0 1 0 0 [6,] 0 1 0 0 1 0 0 [7,] 0 1 0 0 0 1 0 [8,] 0 1 0 0 0 0 0 [9,] 0 1 0 0 0 0 1 [10,] 1 0 0 0 0 1 0 [11,] 0 1 0 1 0 0 0 [12,] 0 1 1 0 0 0 0 attr(,"assign") [1] 1 1 2 2 2 2 2 attr(,"contrasts") attr(,"contrasts")$Tx
[1] "contr.treatment"

attr(,"contrasts")\$Pair
[1] "contr.treatment"

edgeR pairedsamples designmatrix • 77 views
0
Entering edit mode
Yunshun Chen • 720
@yunshun-chen-5451
Last seen 5 days ago
Australia

We are unable to make a comparison between the two samples that are found in Pair 46, or Pair 47.

You are using an additive model for your design matrix, which assumes the Present-Future differences are the same across all the pairs. You can't compare the two individual samples in each pair separately.

We also are unsure of how to incorporate the within clonal variability in the overall Present-Future comparison.

Do you mean comparing Present with Future in each Pair? To do that, you need to fit an interactive model, or something similar to the F1000Research paper on edgeR quasi-likelihood pipeline. But the problem is your data doesn't have enough replicate samples (it only has 1 Present and 1 Future sample in Pair 46, etc). You need more samples in order to do that.

0
Entering edit mode

Yunshun,

The interactive model will most likely be the way we have to approach the issue. Our entire sample set includes 6 replicates grown at 400 ppm CO2 and 6 replicates grown at 800 ppm CO2. Each of those 6 replicates were started from a single clonal culture isolated from a parent culture. We were under the impression that you could account for within clone variability by comparing the 400 and 800 sample of each clone. A "pairwise" comparison so to speak.

Thank you for the feedback! We appreciate the help.