Normalizing counts for cross-species correlation test
1
0
Entering edit mode
fl ▴ 20
@fl-16173
Last seen 4 months ago
Germany

Hello, I am interested in correlating counts between two insect species. For each species, I imported kallisto results and summarized to the gene-level using tximport, created a DESeqDataSet object, and normalized the counts using the "median ratio method":

counts_sp1 <- DESeq2::counts(dds_sp1, normalized = TRUE)
counts_sp2 <- DESeq2::counts(dds_sp2, normalized = TRUE)

Then, I used biomaRt to retrieve orthology information and created a count matrix including the two species; I replaced the original gene identifiers of each species with unique ortholog identifiers. For example:

| ortho_id | sp1_rep1 | sp1_rep2 | sp1_rep3 | sp1_rep4 | sp2_rep1 | sp2_rep2 | sp2_rep3 | sp2_rep4 |
|----------|----------|----------|----------|----------|----------|----------|----------|----------|
| ortho01  | 1130.25  | 1313.85  | 1662.45  | 1921.63  | 1135.85  | 1211.97  | 927.94   | 1258.67  |
| ortho02  | 680.00   | 1276.19  | 921.22   | 1182.74  | 1357.92  | 1124.72  | 1445.95  | 1072.90  |
| ortho03  | 797.55   | 639.31   | 712.38   | 712.36   | 748.45   | 795.46   | 667.38   | 890.66   |
| ortho04  | 1187.66  | 1522.17  | 1095.07  | 1201.00  | 1337.27  | 1392.64  | 1429.92  | 1404.23  |
| ortho05  | 3472.52  | 3878.38  | 3335.18  | 3553.46  | 806.13   | 796.10   | 969.89   | 858.79   |

Lastly, I used the count matrix including both species to compute size factors again and then divided the count values of each sample by its corresponding size factor:

ortho_size_factors <- DESeq2::estimateSizeFactorsForMatrix(ortho_matrix)
norm_ortho_matrix <- sweep(ortho_matrix, 2, ortho_size_factors, FUN = "/")

Here is the result:

enter image description here

Is there something fundamentally wrong with the last normalization step? I am planning to use the normalized counts of a subset of genes to calculate Spearman correlation coefficients between species. Any help would be greatly appreciated.

deseq2 • 699 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 4 days ago
United States

"created a DESeqDataSet object"

How? Using DESeqDataSetFromTximport?

And what is in ortho_matrix?

ADD COMMENT
0
Entering edit mode

Yes, I used DESeqDataSetFromTximport:

txi_sp1 <- tximport::tximport(files_sp1,
  type = "kallisto",
  tx2gene = tx2gene_sp1,
  txOut = FALSE
)

dds_sp1 <- DESeq2::DESeqDataSetFromTximport(txi_sp1,
  colData = coldata_sp1,
  design = ~1
)

dds_sp1 <- DESeq2::estimateSizeFactors(dds_sp1)
keep <- rowSums(DESeq2::counts(dds_sp1) >= 10) >= 4
dds_sp1 <- dds_sp1[keep, ]

counts_sp1 <- DESeq2::counts(dds_sp1, normalized = TRUE)

ortho_matrix only includes the normalized counts of both species. I used dplyr functions to bind the two data frames of normalized counts (e.g., dplyr::inner_join(ortho_counts_sp1, ortho_counts_sp2, by = "ortho_id").

Thank you for your help.

ADD REPLY
1
Entering edit mode

Ok this seems fine then, at least, what you have in the end is counts, scaled by effective gene length, and then scaled using median ratio method to remove global differences due to sequencing depth. I think that you will get the same output perhaps with just putting counts() into ortho_matrix, not normalized counts, because you run size factor estimation again.

ADD REPLY

Login before adding your answer.

Traffic: 591 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6