Question

Running DESeq2 for different types of data

yair.gatt • 0

@yairgatt-14925

Hello,
I would like to inquire about using the DESeq2 package to compare different types of read data:

1. Would it be OK to use DESeq2 to compare read data between homologous genes in different strains of the same bacterial species?
2. If I would like to analyze only a subset of the genes, classified by a certain independent feature, would applying DESeq2 to only that subset be OK, or would it affect the normalization?
3. We have recently developed in our lab an experimental methodology termed RIL-seq (Melamed et al., Mol. Cell, 2016) to find small RNA-target interactions. With this methodology we ligate RNA fragments and sequence them by paired-end sequencing. In the mapping procedure we identify chimeric fragments, in which one read belongs to one RNA molecule and the second read to another RNA molecule. My question is whether it would be OK to compare the numbers of chimeric fragments of corresponding pairs between different growth conditions using DESeq2 (we have several samples for each condition).

Many thanks in advance,

Yair E. Gatt, MD-PhD student
Prof. Hanah Margalit's lab.
Molecular Genetics and Microbiology Department
Faculty of Medicine
The Hebrew University of Jerusalem

deseq2

updated 6.3 years ago by Michael Love • written 6.3 years ago by yair.gatt

score 1 · Answer 1 · 2018-02-01

I only have limited answers for you:

1) You would have to make sure that read assignment is not the reason you see differences in counts. You can’t just use one species’ gene sequence as the reference, as this would lead to a mapping bias. I don’t have any experience with solving this problem.

2) The normalization needs to see a sufficient number of genes that are not DE (differentially expressed). If all genes are DE, then you can’t normalize. The method is robust because it uses the median, but you can’t have all genes DE.

3) You can use DESeq2 for many setups with vectors of counts, but I can’t say in general when it will work. Going back to (2), this is the key issue: you need sufficient rows where there are no or only small differences, so that the method can come up with a parameter that explains any technical multiplicative factor affecting the columns.
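To make point (2) concrete, here is a rough sketch of the median-of-ratios idea behind the size factors, written in plain Python rather than R so it runs without any packages. It is not the DESeq2 implementation. The function name `size_factors` and the toy counts are made up for illustration: sample B is assumed to be sequenced twice as deep as sample A, and the last two genes are assumed to be truly 4-fold up in B.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch only).

    counts: list of rows (one per gene), each a list of per-sample counts.
    Returns one size factor per sample (column).
    """
    n_samples = len(counts[0])
    # Keep only rows with all-positive counts so the log geometric mean is finite.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has positive counts in every sample")
    # Log of the per-gene geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count to its geometric mean, for sample j.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical toy data: columns are [sample A, sample B].
base = [100, 200, 50, 400, 80, 150, 300, 60, 120, 90]
is_de = [False] * 8 + [True] * 2  # assumed: last two genes are truly 4x up in B
counts = [[b, b * 2 * (4 if de else 1)] for b, de in zip(base, is_de)]

# All genes: most rows are not DE, so the median recovers the depth factor.
full = size_factors(counts)
# Only the DE genes: the median now absorbs the biological change as well.
subset_only = size_factors([row for row, de in zip(counts, is_de) if de])

print("B/A size factor ratio, all genes:", round(full[1] / full[0], 3))
print("B/A size factor ratio, DE-only subset:", round(subset_only[1] / subset_only[0], 3))
```

With all genes, the ratio of size factors matches the assumed 2-fold depth difference. With only the DE genes, the ratio also swallows the 4-fold change, so after normalization those genes would look unchanged. One possible way around this, under the assumption that the full gene set does contain enough non-DE rows, is to estimate the size factors on all genes first and only then restrict the analysis to the subset.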