Question

NGS public data analysis


Jill Pleasance ▴ 10

@jill-pleasance-5512

Last seen 9.6 years ago

Hi,

I am trying to analyse NGS data from a public repository (GEO), specifically datasets with one sample per time point. The deposited data (raw, but somewhat processed) consist of 3 samples taken at different time points. According to the paper, read counts at the exon, splice-junction, transcript and gene levels were summarized and normalized to relative abundance in Fragments Per Kilobase of exon model per Million (FPKM) in order to compare transcription levels among samples. The authors then identified differentially expressed transcripts using the M-A based random sampling method implemented in the DEGseq package in Bioconductor (http://bioconductor.org/packages/2.5/bioc/html/DEGseq.html). The transcripts were further filtered at > 2-fold change and a minimum read count of 50 in either condition.

I have read through some of your posts where Gordon suggested using a simple Excel formula to compute fold changes when you don't have replicates:

    lib.size1 <- sum(y1)
    lib.size2 <- sum(y2)
    logFC <- log2((y1+0.5)/(lib.size1+0.5)/(y2+0.5)*(lib.size2+0.5))

Is this something I could apply to the current analysis? I have 3 files, one per sample, each with gene IDs and counts. If a gene is not listed in a sample's file, I assume its count is zero. Would you have any suggestions as to what to do with these zero counts?

I am trying to avoid learning to write scripts for the moment, until I see whether this analysis works. Obviously, when I move on to more complicated public data with replicates, I will have to invest some time in learning Bioconductor!

Many thanks,
Jill
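To make the question concrete, here is a minimal sketch of how I understand the formula would be applied to one pair of samples, written in Python rather than R or Excel. The function names, the toy counts and the gene names are all hypothetical, and treating unlisted genes as zero counts is my assumption. The filter mirrors the paper's thresholds (more than 2-fold change, at least 50 reads in either condition); it does not reproduce DEGseq's random sampling test.

```python
import math


def log_fold_changes(counts1, counts2, prior=0.5):
    """Per-gene log2 fold change of sample 1 over sample 2.

    counts1, counts2: dicts mapping gene ID -> read count.
    Genes missing from one dict are treated as zero counts (an assumption).
    The 0.5 offset keeps zero counts from producing log2(0) or division by zero.
    """
    genes = sorted(set(counts1) | set(counts2))
    lib_size1 = sum(counts1.values())
    lib_size2 = sum(counts2.values())
    log_fc = {}
    for gene in genes:
        y1 = counts1.get(gene, 0)
        y2 = counts2.get(gene, 0)
        # Same arithmetic as the quoted formula, evaluated left to right.
        ratio = (y1 + prior) / (lib_size1 + prior) / (y2 + prior) * (lib_size2 + prior)
        log_fc[gene] = math.log2(ratio)
    return log_fc


def filter_hits(counts1, counts2, log_fc, min_abs_log_fc=1.0, min_count=50):
    """Keep genes with > 2-fold change and at least min_count reads in either sample."""
    hits = []
    for gene, value in log_fc.items():
        max_count = max(counts1.get(gene, 0), counts2.get(gene, 0))
        if abs(value) > min_abs_log_fc and max_count >= min_count:
            hits.append(gene)
    return hits


# Hypothetical toy data: geneC is absent from the second sample.
sample_a = {"geneA": 100, "geneB": 50, "geneC": 10}
sample_b = {"geneA": 100, "geneB": 200}

log_fc = log_fold_changes(sample_a, sample_b)
for gene, value in log_fc.items():
    print(f"{gene}\t{value:.3f}")
print("passing filter:", filter_hits(sample_a, sample_b, log_fc))
```

With three time points, the same calculation would be repeated for each pair of samples. Note that a gene absent from one sample can get a very large fold change from a handful of reads, which is why the minimum-count filter matters for the zero-count cases.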

Transcription DEGseq • 1.3k views

Posted 11.6 years ago by Jill Pleasance ▴ 10