NGS public data analysis
@jill-pleasance-5512
Last seen 10.3 years ago
Hi,

I am trying to analyse public NGS data from GEO, specifically datasets with only one sample per time point. The data (raw, but somewhat processed) consist of 3 samples at different time points, where 'The read count at exon, splice-junction, transcript and gene levels were summarized and normalized to relative abundance in Fragments Per Kilobase of exon model per Million (FPKM) in order to compare transcription level among samples.'

The authors of the paper then state: 'The differentially expressed transcripts were identified using M-A based random sampling method implemented in DEGseq package in BioConductor (http://bioconductor.org/packages/2.5/bioc/html/DEGseq.html). The transcripts were further filtered at > 2-fold change and a minimum read count of 50 in either condition.'

I have read through some of your posts where Gordon suggested using a simple Excel formula to compute fold changes when you don't have replicates:

    lib.size1 <- sum(y1)
    lib.size2 <- sum(y2)
    logFC <- log2((y1+0.5)/(lib.size1+0.5)/(y2+0.5)*(lib.size2+0.5))

Is this something I could apply to the current analysis? I have 3 files, one per sample, each with gene IDs and counts. If a gene is not listed in a sample file, I assume its count is zero. Would you have any suggestions as to what to do with these zero counts?

I am trying to avoid learning how to write scripts for the moment, just to see whether this analysis works. Obviously, when I come to more complicated public data with replicates, I will have to invest some time in learning Bioconductor!

Many thanks,
JILL
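A minimal R sketch of how the formula quoted above might be applied to two of the three samples. It assumes the per-sample files hold raw read counts rather than FPKM values (the library-size term only makes sense for counts) and that a gene missing from a file had zero reads. The data frames `s1` and `s2`, the gene names, and the counts are hypothetical, and the final filter only mirrors the paper's stated thresholds rather than reproducing DEGseq.

```r
# Hypothetical per-sample tables (gene ID + raw count); values are made up.
s1 <- data.frame(gene = c("geneA", "geneB", "geneC"), count = c(100, 40, 10))
s2 <- data.frame(gene = c("geneA", "geneC", "geneD"), count = c(50, 30, 20))

# Full outer join on gene ID so genes absent from one file are kept.
m <- merge(s1, s2, by = "gene", all = TRUE, suffixes = c(".1", ".2"))

# Assumption: a gene not listed in a file had zero reads in that sample.
m$count.1[is.na(m$count.1)] <- 0
m$count.2[is.na(m$count.2)] <- 0

y1 <- m$count.1
y2 <- m$count.2

# Library sizes and log2 fold change, exactly as in the quoted formula.
# The 0.5 offset keeps the ratio finite when either count is zero.
lib.size1 <- sum(y1)
lib.size2 <- sum(y2)
m$logFC <- log2((y1 + 0.5) / (lib.size1 + 0.5) / (y2 + 0.5) * (lib.size2 + 0.5))

# Filter mirroring the paper's thresholds: more than 2-fold change
# and at least 50 reads in either sample.
keep <- abs(m$logFC) > 1 & pmax(y1, y2) >= 50
candidates <- m[keep, ]
print(candidates)
```

The same steps could be repeated for each pair of time points. Because of the 0.5 offset, zero counts need no special handling beyond filling them in after the merge, although fold changes for genes with very low counts in both samples remain unreliable without replicates.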
Transcription DEGseq • 1.4k views