salmon output file for differential expression analysis
Bob • 0
@bob-12005

Dear All

I want to estimate the expression of predicted lncRNAs from RNA-seq data. I built a database of all the predicted lncRNAs, indexed it with Salmon, and mapped the paired-end RNA-seq libraries against it. Some of the NumReads values in the Salmon output files are non-integers. What is the best way to handle these non-integer values for DE analysis? I don't have GTF or gene ID files for the database sequences; the database contains only the putative long non-coding RNAs, which were retrieved from the genome.

Thanks

Name       Length  EffectiveLength  TPM       NumReads
CUFF.47.1  1011    845.627          21.0942   250.461
CUFF.53.2  734     570.457          45.1108   361.328
CUFF.54.1  760     596.362          87.4186   732
CUFF.57.1  825     661.123          268.776   2495
CUFF.58.2  338     176.503          356.296   883
CUFF.80.1  1348    1182.63          0.594387  9.86994

 

salmon rna seq deseq2 • 1.8k views
@mikelove
United States

Hi,

I wouldn't quantify against just a small subset of the transcriptome. If you are working with human or mouse, for example, I recommend using the Gencode transcriptome files, which contain protein-coding and non-coding transcripts together. If transcripts that are present in the sample are left out of the reference, the quantification will be worse. You can subset to lncRNAs after quantifying.
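A related point, sketched in Python purely as an illustration: Salmon's TPM is each transcript's read rate (NumReads / EffectiveLength) scaled so that all transcripts in the index sum to one million. TPM values are therefore relative to whatever was indexed. The check below uses rows copied from the table in the question and back-calculates the shared denominator from each row; the function name `implied_total_rate` is made up for this example.

```python
# (Name, EffectiveLength, TPM, NumReads), copied from the table in the question.
rows = [
    ("CUFF.47.1", 845.627, 21.0942, 250.461),
    ("CUFF.57.1", 661.123, 268.776, 2495.0),
    ("CUFF.80.1", 1182.63, 0.594387, 9.86994),
]


def implied_total_rate(eff_len, tpm, num_reads):
    """Solve TPM = 1e6 * (num_reads / eff_len) / total for `total`."""
    return 1e6 * (num_reads / eff_len) / tpm


totals = [implied_total_rate(e, t, n) for _, e, t, n in rows]

# Every row should point at the same denominator (up to rounding in the file),
# because that denominator is the summed read rate over the whole index.
for (name, *_), total in zip(rows, totals):
    print(f"{name}\t{total:.1f}")
```

If the rows agree on one denominator, that confirms the values were normalized over the same set of sequences, here the lncRNA-only index. A TPM from this run ranks a lncRNA against the other indexed lncRNAs only, not against the full transcriptome.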

Non-integer values are fine. See the tximport package for importing Salmon output and then running DESeq2 or other inference packages (edgeR, limma).
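For readers wondering where the fractions come from: NumReads is an estimated count, and a read compatible with several transcripts is divided among them, so non-integer values are expected. In R the import is handled by tximport; the Python sketch below is only an illustration of reading the NumReads column, using rows copied from the question (assumed to be tab-separated, as in a quant.sf file). The function name `read_num_reads` is hypothetical.

```python
import csv
import io

# Rows copied from the quant.sf excerpt in the question (tab-separated here).
QUANT_SF = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "CUFF.47.1\t1011\t845.627\t21.0942\t250.461\n"
    "CUFF.53.2\t734\t570.457\t45.1108\t361.328\n"
    "CUFF.54.1\t760\t596.362\t87.4186\t732\n"
    "CUFF.80.1\t1348\t1182.63\t0.594387\t9.86994\n"
)


def read_num_reads(handle):
    """Return {transcript name: estimated read count} from a quant.sf-style table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["NumReads"]) for row in reader}


estimated = read_num_reads(io.StringIO(QUANT_SF))

# Keep the estimates as floats; round only where a tool insists on integers.
rounded = {name: round(value) for name, value in estimated.items()}

for name, value in estimated.items():
    print(f"{name}\t{value}\t{rounded[name]}")
```

Keeping the estimates as floats until the last step avoids discarding information earlier than necessary.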


Thanks for your answer. This data is from a plant, and I don't have Gencode transcriptome files for it. These are just predicted lncRNAs in the plant, and I want to find the expression level of each putative lncRNA in our RNA-seq libraries. In that case, is Salmon's NumReads useful for quantification?

Regarding the tximport package, could you please help me prepare a tx2gene table for my data?

Thanks 


Preparing tx2gene is up to you as the analyst. If you want to combine transcripts to the gene level, you’ll need to provide that mapping.

Again, I’d recommend quantifying against coding and non-coding together.
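As a rough illustration of what a tx2gene mapping could look like for the IDs in this thread, here is a minimal Python sketch. It assumes the transcript names follow the Cufflinks-style pattern CUFF.<locus>.<isoform> and that the CUFF.<locus> prefix is an acceptable gene ID; whether that grouping is right for this data is the analyst's call. The function name `cufflinks_tx2gene` and the column headers are my own choices. tximport expects a two-column table with the transcript ID first and the gene ID second.

```python
import csv
import io


def cufflinks_tx2gene(transcript_ids):
    """Map CUFF.<locus>.<isoform> transcript IDs to CUFF.<locus> gene IDs."""
    pairs = []
    for tx in transcript_ids:
        gene, sep, isoform = tx.rpartition(".")
        # Refuse IDs that do not end in ".<number>" rather than guess a gene.
        if not gene or not sep or not isoform.isdigit():
            raise ValueError(f"unexpected transcript ID: {tx!r}")
        pairs.append((tx, gene))
    return pairs


# Transcript names taken from the table in the question.
transcripts = [
    "CUFF.47.1", "CUFF.53.2", "CUFF.54.1",
    "CUFF.57.1", "CUFF.58.2", "CUFF.80.1",
]
tx2gene = cufflinks_tx2gene(transcripts)

# Write a two-column CSV: transcript ID first, gene ID second.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["TXNAME", "GENEID"])
writer.writerows(tx2gene)
print(out.getvalue(), end="")
```

If transcript-level results are what you want, no mapping is needed at all: tximport can be run with `txOut = TRUE`.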


Given that this is only a small subset of the transcriptome, if I take the TPM values from the Salmon output and simply compare them across the non-coding RNAs to find the most highly expressed one, would that be a robust result?


I don't really follow what's going on here, sorry. If you have a question about using DESeq2 or tximport, feel free to follow up; otherwise, you may get better feedback on a more general bioinformatics forum such as Biostars.
