rnaseqGene: Different output when generating step by step airway dataset
1
2
Entering edit mode
jorgeplaza ▴ 20
@dec8f401
Last seen 2.4 years ago
Spain

I have been following the workflow 'RNA-seq workflow: gene-level exploratory analysis and differential expression' and I have a problem:

I tried to generate step by step the airway dataset, downloading the fastq files from https://www.ebi.ac.uk/ena/browser/view/PRJNA229998?show=reads Then I download the transcriptome from https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.transcripts.fa.gz Then I used Salmon to perform the alignment with the same parameters as the workflow. Finally, I imported the quants files in R by 'tximeta' but the counts did not match. Moreover, when I execute the DESeq2 function, I only get 4 differential expression genes, instead of the roughly 4000 that I should get.

Is there any prefiltering process or any reason for this issue? I would like to learn these prefiltering and dataset preparation procedures to compare different bioinformatic methods with RNA-Seq.

Thanks in advance and greetings.

tximeta DESeq2 rnaseqGene • 1.5k views
ADD COMMENT
1
Entering edit mode

Unless you show your code, nobody can help you. Just saying what you did is no substitute for showing exactly how you did it.

ADD REPLY
2
Entering edit mode
@mikelove
Last seen 2 hours ago
United States

The counts do not match

Is it recognized as GENCODE v29 when you import?

Did you use the same version of Salmon? (I think we used 0.14.1) If not then the counts may not match exactly.

For the DE genes, my top guesses would be: design isn't correct, or samples aren't labelled correctly. Make plotCounts of the top gene from the workflow but on your dds.

ADD COMMENT
0
Entering edit mode

Yes, it is recognized. I've downloaded the same version (0.14.1) and the results are still 4 differential expression genes. The design is the same as the workflow and the colData(gse) matches too, so I guess the samples should be labelled correctly. I obtained the plotCounts for the top 5 genes from the workflow but on my dds and these are the outputs:

plotCounts_ENSG00000165995.19

plotCounts_ENSG00000162614.18

plotCounts_ENSG00000157214.13

plotCounts_ENSG00000152583.12

plotCounts_ENSG00000120129.5

ADD REPLY
0
Entering edit mode

Try a PCA plot.

Which samples are you comparing? Control and dexamethasone? Are you sure you got the same 8 samples as in the workflow? Are you using the CSV from the workflow or did you make your own?

ADD REPLY
0
Entering edit mode

PCA plot:

Yes, I'm comparing control and dexameyhasona. I download the CSV from ncbi Run Selector: enter image description here and this csv matches with the csv from de workflow:

enter image description here

I downloaded the fastq files from ENA (https://www.ebi.ac.uk/ena/browser) instead of NCBI, I don't know if that could be the problem.

ADD REPLY
0
Entering edit mode

It looks like a sample swap with your N61311 compared to my PCA plot.

I think that's the issue, but don't know the origin of this swap. It's pretty clear from the plotCounts as well that a single treated sample is grouping with the untreated, and the reverse...

ADD REPLY
0
Entering edit mode

I think so, I repeated the DESeq2 method swapping my N61311 and I got 4002 DE genes of which 3432 match.

ADD REPLY
0
Entering edit mode

Compare your counts for the top gene from the workflow. See 6.1 Counts plot.

Just to note: we got pretty much the same DE genes as in the original paper and these are clearly identifiable in the counts plots, so this will help you track down which samples may be mislabeled.

ADD REPLY
0
Entering edit mode

The counts are different, I don´t know the reason but the fastq files from ENA and the fastq files from NCBI are not the same. I repeated all the process (download the data and alignment with Salmon and DE analysis with R) but with de NCBI fastq files and now the results do match. Thank you Michael!

ADD REPLY
1
Entering edit mode

Oh wow. That’s surprising but thanks for reporting back.

ADD REPLY

Login before adding your answer.

Traffic: 501 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6