Quality control on RNA-Seq (level 3) data from TCGA
NS ▴ 60
@ns-7498
Last seen 5.8 years ago
United States

I have downloaded level 3 RNA-Seq files for mRNAs from the TCGA database. These files contain "read_counts".

Now my supervisor insists that I carry out quality control steps, e.g. checking for GC-content bias and length bias, but I do not know anything about these procedures or how to perform them. My major is not biology, and this is the first time I am working with RNA-Seq files.

I would appreciate it if anyone could help and tell me how to do such pre-processing steps.

rna-seq GC-Content qualitycontrol • 2.7k views
@matthew-mccormack-2021
Last seen 10 months ago
United States

If you have the files in .fastq format, you can use fastqc, available here: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Fastqc will work on zipped files, so you do not have to unzip them first. However, I think this format is what TCGA calls level 1 data. Level 3 data seems to me to be files in which expression has already been quantified (your "read_counts"), so in other words much of the analysis has already been done. You would not be able to assess the quality of the sequencing from this type of file. You would need the level 1 data (specifically, .fastq files).

If you download fastqc and also the .fastq files, preferably zipped .fastq files, then you can just open the zipped .fastq files in fastqc. Each file will take a few minutes to load. Fastqc will then display a series of graphics. You can watch this 12-minute video made by the fastqc author on how to interpret the results here: https://www.youtube.com/watch?v=bz93ReOv87Y
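If you have many files, fastqc can also be run from the command line instead of the graphical interface. Here is a minimal sketch in Python, assuming the `fastqc` executable is installed and on your PATH. The helper names (`build_fastqc_command`, `run_fastqc`), the file names, and the output directory are hypothetical and only for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_fastqc_command(fastq_files, out_dir):
    """Build the argument list for a fastqc run (nothing is executed here)."""
    # fastqc reads gzipped .fastq files directly; -o sets the output directory.
    return ["fastqc", "-o", str(out_dir)] + [str(f) for f in fastq_files]


def run_fastqc(fastq_files, out_dir="fastqc_reports"):
    """Run fastqc if it is installed; return True if it was run."""
    if shutil.which("fastqc") is None:
        print("fastqc not found on PATH; install it first")
        return False
    # The output directory has to exist before fastqc is called.
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_fastqc_command(fastq_files, out_dir), check=True)
    return True


# Hypothetical file names, for illustration only.
command = build_fastqc_command(
    ["sample1.fastq.gz", "sample2.fastq.gz"], "fastqc_reports"
)
print(" ".join(command))
```

Calling `run_fastqc([...])` with real file paths should leave one HTML report and one .zip archive per input file in the output directory.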

(The names here can be a little confusing if you are beginning, so remember that .fastq is a sequencing file with results from the sequencing machine, and fastqc is a program you can download to assess the quality of the sequencing using the .fastq files.)
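To make the file format concrete: each read in a .fastq file normally takes four lines (an identifier starting with @, the bases, a + separator, and a quality string). The sketch below reads such records and computes the GC fraction of each read, which is roughly the quantity summarized in fastqc's per-sequence GC content plot. It assumes the common unwrapped four-line layout. The function names and the tiny in-memory example are hypothetical, not part of fastqc.

```python
import gzip
import io


def read_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of every record holds the bases
            yield line.strip()


def gc_fraction(seq):
    """Fraction of G and C bases in a read (0.0 for an empty read)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def open_fastq(path):
    """Open a plain or gzipped .fastq file in text mode."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path)


# Tiny made-up example standing in for a real file.
example = io.StringIO("@read1\nGGCC\n+\nIIII\n@read2\nATGC\n+\nIIII\n")
fractions = [gc_fraction(s) for s in read_sequences(example)]
print(fractions)
```

For a real file, replace the in-memory example with `open_fastq("your_file.fastq.gz")` (a placeholder name) and look at the distribution of the resulting fractions.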

@steve-lianoglou-2771
Last seen 22 months ago
United States

I think you should consider asking your supervisor for some pointers ... it sounds like you're still engaged in some type of training (graduate school, perhaps?) and this is what advisors/supervisors are for.

In any event, googling for the topics you mention along with "RNAseq" will surely provide many places you can get started, as well.
