Horizontal integration of two RNA-seq datasets derived from two different platforms
rjoana (Portugal)

Currently, I'm interested in horizontal integration of two RNA-seq datasets derived from different platforms (S5 and Illumina), one containing normal samples and the other containing abnormal samples. Since DESeq2 takes a single raw count matrix as input, I'm trying to find a "conversion factor" based on the raw read counts of the two datasets, to address the fact that the normal and abnormal samples come from different platforms. The strategy would be to use the housekeeping genes shared by the datasets. After this step, I'd use estimateSizeFactors with controlGenes.

Do you think this would be a reliable strategy? Thanks in advance.
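
For reference, the last step mentioned above (median-of-ratios size factors restricted to control genes, which is what DESeq2's `estimateSizeFactors` with `controlGenes` computes) can be sketched in Python roughly as follows. This is a minimal re-implementation for illustration only, not DESeq2's code; the function name `size_factors` and the genes × samples layout are assumptions.

```python
import numpy as np


def size_factors(counts, control_genes=None):
    """Median-of-ratios size factors for a genes x samples count matrix.

    Illustrative re-implementation; only the rows in control_genes
    (e.g. housekeeping genes) are used to estimate the factors.
    """
    counts = np.asarray(counts, dtype=float)
    ctl = counts if control_genes is None else counts[control_genes]
    with np.errstate(divide="ignore"):
        log_ctl = np.log(ctl)
    # Per-gene log geometric mean across all samples; genes with any zero
    # count get -inf here and are dropped below.
    log_geo_means = log_ctl.mean(axis=1)
    usable = np.isfinite(log_geo_means)
    log_ratios = log_ctl[usable] - log_geo_means[usable, None]
    # Each sample's size factor is its median ratio over the usable genes.
    return np.exp(np.median(log_ratios, axis=0))


# Toy example: sample 2 has exactly twice the depth of sample 1.
counts = np.array([[10, 20], [30, 60], [5, 10], [0, 7]])
sf = size_factors(counts, control_genes=[0, 1, 2])
print(sf)  # approximately [0.7071, 1.4142]
```

Restricting the estimate to housekeeping genes only changes which genes set the scale; the result is still a single multiplicative factor per sample.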

DESeq2 dataIntegration

Just to be clearer: take the two raw count matrices, apply a conversion factor based on the shared housekeeping genes, and lastly combine the two matrices into a single DESeq2 input. Some algorithms in the literature claim to handle this, but on closer reading, the datasets they integrate come from the same platform.
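
The three steps just described could look roughly like this in Python. This is only a sketch of the proposal as stated, under assumptions: both matrices are genes × samples with identical, aligned gene rows, the housekeeping gene indices are known and have no zero counts, and the conversion factor is taken as the median log-ratio of housekeeping gene means between the datasets. The helper name `combine_with_conversion_factor` is hypothetical.

```python
import numpy as np


def combine_with_conversion_factor(counts_a, counts_b, housekeeping):
    """Rescale dataset B onto dataset A's scale using shared housekeeping
    genes, then column-bind the two genes x samples matrices.

    Hypothetical helper: rows of counts_a and counts_b must be the same
    genes in the same order, and housekeeping rows must have no zeros.
    """
    a = np.asarray(counts_a, dtype=float)
    b = np.asarray(counts_b, dtype=float)
    # Mean log expression of each housekeeping gene within each dataset.
    log_a = np.log(a[housekeeping]).mean(axis=1)
    log_b = np.log(b[housekeeping]).mean(axis=1)
    # One global conversion factor: median difference across housekeepers.
    factor = np.exp(np.median(log_a - log_b))
    # Scaled counts are no longer integers; DESeq2 expects integer counts,
    # so they would have to be rounded before building the dataset.
    return np.hstack([a, b * factor]), factor


a = np.array([[100, 120], [50, 60], [10, 0]])
b = a * 0.5  # dataset B reads exactly half as deep as A
combined, factor = combine_with_conversion_factor(a, b, housekeeping=[0, 1])
```

Note that the factor is one number applied to every gene of dataset B, i.e. a purely linear rescaling.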

Linear scaling is no different from the default normalization DESeq2 does anyway. Different technologies measure different sets of genes, have different dynamic ranges, and show different ratios of genes relative to a set of housekeepers, so a linear scaling will not do here. In any case, since the condition here appears to be nested within the technology, you cannot do any integration anyway. It's fully confounded; no stats magic will change that.
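
The first point can be checked numerically. The sketch below uses a compact median-of-ratios re-implementation as a stand-in for DESeq2's default normalization, on simulated Poisson counts (not real data), with a hypothetical split of samples into two platforms. Multiplying one platform's counts by a conversion factor changes the normalized counts only by one global constant, which cancels in every fold change.

```python
import numpy as np


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples matrix."""
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    log_geo_means = log_counts.mean(axis=1)
    usable = np.isfinite(log_geo_means)  # drop genes with any zero count
    log_ratios = log_counts[usable] - log_geo_means[usable, None]
    return np.exp(np.median(log_ratios, axis=0))


rng = np.random.default_rng(0)
counts = rng.poisson(50, size=(500, 6)).astype(float)  # simulated data
platform_b = np.array([False, False, False, True, True, True])

# Apply a "conversion factor" to the samples of one platform only.
scaled = counts.copy()
scaled[:, platform_b] *= 3.7

norm_before = counts / size_factors(counts)
norm_after = scaled / size_factors(scaled)

# Every entry changes by the same factor (3.7 ** (3 / 6) here), so
# log fold changes between any pair of samples are unchanged.
ratio = norm_after / norm_before
print(ratio.min(), ratio.max())
```

In other words, the size factors simply absorb the conversion factor; any gene-specific platform differences are left untouched.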

mikelove (United States)

I think you need more than size-factor-based scaling to deal with the difference in technology.

Do you have any control samples sequenced on both platforms? Without them, it may be impossible to distinguish biological differences from technical ones.
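
The confounding can be made concrete through the design matrix. In the hypothetical layout below (sample counts and platform labels are made up for illustration), every normal sample comes from one platform and every abnormal sample from the other, so the condition and platform columns are identical and the matrix loses a rank: no model can separate the two effects. Adding even a few normal samples sequenced on the second platform, i.e. the kind of control samples asked about above, restores full rank.

```python
import numpy as np


def design_rank(condition, platform):
    """Rank and column count of an intercept + platform + condition design."""
    condition = np.asarray(condition, dtype=float)
    platform = np.asarray(platform, dtype=float)
    X = np.column_stack([np.ones_like(condition), platform, condition])
    return int(np.linalg.matrix_rank(X)), X.shape[1]


# Confounded: normal (0) samples only on platform 0, abnormal (1) only on 1.
confounded = design_rank(condition=[0, 0, 0, 1, 1, 1],
                         platform=[0, 0, 0, 1, 1, 1])

# Bridged: two extra normal samples were also sequenced on platform 1.
bridged = design_rank(condition=[0, 0, 0, 1, 1, 1, 0, 0],
                      platform=[0, 0, 0, 1, 1, 1, 1, 1])

print(confounded)  # (2, 3): rank-deficient, effects not separable
print(bridged)     # (3, 3): full rank
```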

No, I don't. Normal control samples of the tissue I'm studying and sequencing in my lab are rarely included in RNA-seq studies because of their limited availability, and no compatible S5 data is available in public databases. Thanks for all the comments.

Without any samples sequenced on both technologies, I can't think of any way to harmonize the data. If you had such samples, you could use RUV-seq, which has methods that take advantage of them.
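
For the case where such samples do exist, the idea behind RUV-seq's replicate-based approach (`RUVs` in the RUVSeq package) can be sketched in NumPy: differences between replicates of the same biological sample contain only unwanted variation, so their leading singular vectors estimate the technical factor, which can then be projected out. This is a simplified illustration on simulated data, not the RUVSeq implementation; the function `ruvs_sketch`, the log-scale input, and the use of all genes as controls are assumptions.

```python
import numpy as np


def ruvs_sketch(log_expr, replicate_groups, k=1):
    """Simplified replicate-based RUV on a samples x genes log-expression matrix.

    replicate_groups: lists of row indices that are the same biological
    sample (e.g. sequenced on both platforms). Returns the corrected matrix
    and W, the estimated unwanted factor(s) per sample.
    """
    Y = np.asarray(log_expr, dtype=float)
    # Residuals from each replicate group's mean: biology cancels,
    # leaving (mostly) technical variation.
    resid = np.vstack([Y[g] - Y[g].mean(axis=0) for g in replicate_groups])
    # Leading right singular vectors give the gene-wise pattern (alpha).
    _, d, vt = np.linalg.svd(resid, full_matrices=False)
    alpha = d[:k, None] * vt[:k]  # k x genes
    # Per-sample factors W by least squares of Y on alpha (all genes as controls).
    W = np.linalg.lstsq(alpha.T, Y.T, rcond=None)[0].T  # samples x k
    return Y - W @ alpha, W


rng = np.random.default_rng(1)
n_genes = 200
biology = rng.normal(0, 1, size=(3, n_genes))      # 3 biological samples
platform_effect = rng.normal(0, 1, size=n_genes)   # gene-specific shift
# Each biological sample measured on platform A (rows 0-2) and B (rows 3-5).
Y = np.vstack([biology, biology + platform_effect])
Y += rng.normal(0, 0.05, size=Y.shape)
corrected, W = ruvs_sketch(Y, [[0, 3], [1, 4], [2, 5]], k=1)

before = np.linalg.norm(Y[3:] - Y[:3])
after = np.linalg.norm(corrected[3:] - corrected[:3])
print(before, after)  # the platform gap shrinks substantially
```

In the RUVSeq workflow, the estimated factors are typically added as covariates to the model design rather than used to overwrite the counts; in this thread's situation, though, there are no cross-platform replicates to build `replicate_groups` from.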
