Question: edgeR filtering: how to remove meta tags and lowly expressed reads (cpm)
Posted 2.8 years ago by lynski0080:

I'm attempting to use edgeR to analyse my RNA-seq data. I have used readDGE to create a DGEList object, which I have called DG. Printing it gives:

An object of class DGEList
$samples
                      files group lib.size norm.factors
counts3D7_1-1 counts3D7_1-1     1 13251744            1
counts3D7_2-1 counts3D7_2-1     1 13809955            1
counts3D7_3-1 counts3D7_3-1     1 12328705            1
counts3D7_1-2 counts3D7_1-2     1 12605616            1
counts3D7_2-2 counts3D7_2-2     1 13392599            1
22 more rows ...

$counts
                     Samples
Tags                  counts3D7_1-1 counts3D7_2-1 counts3D7_3-1 counts3D7_1-2 counts3D7_2-2 counts3D7_3-2 counts3D7_1-3 counts3D7_2-3 counts3D7_3-3 countsBLM_1-1 countsBLM_2-1 countsBLM_3-1 countsBLM_1-2 countsBLM_2-2 countsBLM_3-2 countsBLM_1-3
  rna_PF3D7_0100100-1            49            24            27            14             8             6            15             9            15            11             5             8             4             5             6             6
  rna_PF3D7_0100200-1            17            17            23            11            13            13             3             6             6            31            15            15             9             9             4             2
  rna_PF3D7_0100300-1            15            10             4             2             4             5             2             4             5             5            11             4             1             2             2             1
  rna_PF3D7_0100400-1            44            45            46            28            33            35            38            33            53           116            87            82            65            66            41            88
  rna_PF3D7_0100500-1             0             0             0             0             1             2             1             1             0             0             2             2             1             1             0             5
                    

I now wish to move on to the filtering stage, but I am stuck. There are two things I wish to filter out. First, when I created the DG object a warning said "Meta tags detected: __no_feature, __ambiguous, __too_low_aQual, __not_aligned, __alignment_not_unique", so I wish to remove the rows for these meta tags, since they count reads that were not assigned to any feature rather than counting genes. Second, following edgeR's recommendations, I wish to remove features that do not have at least 1 count per million (CPM) in at least n samples, where n = 3 for my dataset.

Can anyone help me put together the code to do this? I've read through the edgeR manual, but I am new to R and edgeR and am not really sure where to start. Any help is greatly appreciated. Thanks,

Answer: edgeR filtering: how to remove meta tags and lowly expressed reads (cpm)
Posted 2.8 years ago by Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia):

To remove the meta tags:

# Row names starting with "__" are the meta tags reported by readDGE
MetaTags <- grep("^__", rownames(y))
y <- y[-MetaTags, ]

Here I have assumed your DGEList object is called y (in your case it is DG).
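For anyone new to R, here is a minimal base-R sketch (it does not need edgeR) of what those two lines do, run on a small made-up count matrix. The two gene rows reuse IDs and counts from the question, while the "__" rows and their counts are invented to imitate the meta tags; the object names counts, meta_idx, no_meta and counts_mask are illustrative only. It also shows one caveat of negative indexing: if grep() finds no match it returns integer(0), and x[-integer(0), ] keeps no rows at all rather than all of them. A logical mask from grepl() avoids that, so on the DGEList something like y <- y[!grepl("^__", rownames(y)), ] should behave the same way, assuming the same row subsetting as above.

```r
# Hypothetical toy count matrix: two gene rows (IDs and counts taken from
# the question) plus two invented meta-tag rows imitating readDGE's warning
counts <- matrix(
  c( 49,  24,  27,
     17,  17,  23,
    500, 480, 510,
     30,  28,  35),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("rna_PF3D7_0100100-1", "rna_PF3D7_0100200-1",
      "__no_feature", "__ambiguous"),
    c("s1", "s2", "s3")))

# grep() returns the integer positions of rows whose names start with "__"
meta_idx <- grep("^__", rownames(counts))
counts_neg <- counts[-meta_idx, , drop = FALSE]

# Caveat: with no matches grep() gives integer(0), and negating it
# selects zero rows instead of keeping every row
no_meta <- counts[1:2, ]
empty_idx <- grep("^__", rownames(no_meta))
n_left <- nrow(no_meta[-empty_idx, , drop = FALSE])  # 0 rows, not 2

# A logical mask keeps all rows when nothing matches
counts_mask <- counts[!grepl("^__", rownames(counts)), , drop = FALSE]
```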

To filter on expression:

# Keep rows with a CPM above 1 in at least 3 samples
IsExpr <- rowSums(cpm(y) > 1) >= 3
y <- y[IsExpr, ]
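The expression filter can be reproduced in base R in the same way. This is a sketch that assumes every norm.factor equals 1, as in the printed $samples table, so that a CPM is simply the count divided by the library size, times one million. The library sizes are the first three from the question, and the two count rows are the first three samples of rna_PF3D7_0100400-1 and rna_PF3D7_0100500-1; the names lib_size, cpm_mat, is_expr and reads_per_cpm are illustrative. With libraries of roughly 12 to 14 million reads, a CPM of 1 corresponds to roughly 12 to 14 reads in raw counts.

```r
# Library sizes copied from the question's $samples table (first 3 samples)
lib_size <- c(13251744, 13809955, 12328705)

# Counts for two genes in those samples, copied from the question
counts <- rbind(
  "rna_PF3D7_0100400-1" = c(44, 45, 46),
  "rna_PF3D7_0100500-1" = c(0, 0, 0))

# Assumes norm.factors are all 1: divide each column by its library
# size, then scale to counts per million
cpm_mat <- sweep(counts, 2, lib_size, "/") * 1e6

# TRUE for genes with CPM > 1 in at least 3 samples
is_expr <- rowSums(cpm_mat > 1) >= 3
filtered <- counts[is_expr, , drop = FALSE]

# Raw count equivalent to 1 CPM in each library
reads_per_cpm <- lib_size / 1e6
```

Here the first gene has a CPM above 3 in every sample and is kept, while the all-zero gene is dropped.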

For more detail about filtering, see the edgeR workflow paper: http://f1000research.com/articles/5-1438


Reply from lynski0080, 2.8 years ago:

That worked perfectly. Thank you very much, Gordon.