
Return a grandR object with fewer genes than the given grandR object (usually to filter out weakly expressed genes).

Usage

FilterGenes(
  data,
  mode.slot = "count",
  minval = 100,
  mincol = ncol(data)/2,
  min.cond = NULL,
  use = NULL,
  keep = NULL,
  return.genes = FALSE
)

Arguments

data

the grandR object

mode.slot

the mode.slot that is used for filtering (see details)

minval

the minimal value for retaining a gene

mincol

the minimal number of columns (i.e. samples or cells) in which a gene has to have a value >= minval

min.cond

if not NULL, do not compare values per column, but per condition (see details)

use

if not NULL, directly defines the genes that are to be retained (see details)

keep

if not NULL, directly defines genes that should be kept even if they do not meet the filtering criteria (see details)

return.genes

if TRUE, return the gene names instead of a new grandR object

Value

either a new grandR object (if return.genes=FALSE), or a vector containing the gene names that would be retained

Details

By default, genes are retained if they have at least 100 read counts in at least half of the columns (i.e. samples or cells).
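The default criterion can be illustrated on a plain matrix. This is a minimal base R sketch, not the grandR implementation: the count matrix and the names `counts`, `n.pass` and `retained` are made up for illustration, and only `minval` and `mincol` mirror the function's parameters.

```r
# Toy count matrix: 4 genes (rows) x 4 samples (columns); all values are invented.
counts <- matrix(
  c(150, 200,  90, 300,
     10,   0,   5,  20,
    100, 100,  99,  98,
    500,  50,  40,  30),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:4))
)

minval <- 100                # default of FilterGenes
mincol <- ncol(counts) / 2   # default of FilterGenes: half of the columns

# For each gene, count the columns with a value >= minval ...
n.pass <- rowSums(counts >= minval)

# ... and retain the gene if that number reaches mincol.
retained <- rownames(counts)[n.pass >= mincol]
retained
```

Here gene1 reaches the threshold in three columns and gene3 in exactly two, so both are retained; gene2 and gene4 are dropped.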

The use parameter can be used to define genes to be retained directly. The keep parameter, in contrast, defines additional genes to be retained. For both, genes can be referred to by their names, symbols, row numbers in the gene table, or a logical vector referring to the gene table rows.
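The difference between the two parameters can be sketched in base R on a toy gene list. This is an illustration under stated assumptions, not the grandR code: `genes`, `passes.filter`, `use.sel` and `keep.sel` are hypothetical names, and the sketch assumes that `use` takes the place of the expression filter while `keep` is combined with its result.

```r
genes <- paste0("gene", 1:4)

# Hypothetical outcome of the expression filter (minval / mincol) per gene.
passes.filter <- c(TRUE, FALSE, TRUE, FALSE)

# use: the selection itself defines what is retained
# (assumption: the expression filter is not consulted).
use.sel <- c("gene2", "gene4")
retained.use <- genes[genes %in% use.sel]

# keep: genes passing the filter, plus the genes listed explicitly.
keep.sel <- "gene2"
retained.keep <- genes[passes.filter | genes %in% keep.sel]

# A selection can equally be given as row numbers or as a logical vector;
# all three forms below pick the same rows of the toy gene list.
by.name    <- genes[genes %in% c("gene2", "gene4")]
by.index   <- genes[c(2, 4)]
by.logical <- genes[c(FALSE, TRUE, FALSE, TRUE)]
```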

To refer to data slots, the mode.slot syntax can be used: Each name is either a data slot, or one of (new,old,total) followed by a dot followed by a slot. For new or old, the data slot value is multiplied by ntr or 1-ntr, respectively. This can be used e.g. to filter by new counts (mode.slot="new.count").
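The arithmetic behind the new and old modes can be written out in base R. This is a minimal sketch on invented numbers; the vectors `count` and `ntr` and the gene names are hypothetical and only stand in for one column of the corresponding data slots.

```r
# Invented total counts and new-to-total ratios (ntr) for two genes in one sample.
count <- c(geneA = 200, geneB = 200)
ntr   <- c(geneA = 0.8, geneB = 0.1)

new.count <- count * ntr        # what "new.count" refers to: count * ntr
old.count <- count * (1 - ntr)  # what "old.count" refers to: count * (1 - ntr)

# Filtering on new counts instead of total counts changes the outcome:
# both genes have 200 total counts, but only geneA has >= 100 new counts.
minval <- 100
pass.total <- names(count)[count >= minval]
pass.new   <- names(new.count)[new.count >= minval]
```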

If the min.cond parameter is given, all columns belonging to the same Condition are first summed up, and the usual filtering is then performed per condition instead of per column.
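The per-condition variant can be sketched in base R as follows. This is an illustration, not the grandR implementation: the matrix, the `condition` factor and the variable names are invented, and the sketch assumes that min.cond is the minimal number of conditions whose summed value must reach minval.

```r
# Toy count matrix: 3 genes x 4 samples, two samples per condition (values invented).
counts <- matrix(
  c(60, 50,  5,  5,
    40, 30, 20,  5,
    10, 10, 80, 90),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3),
                  c("Mock.A", "Mock.B", "SARS.A", "SARS.B"))
)
condition <- factor(c("Mock", "Mock", "SARS", "SARS"))

# Sum all columns belonging to the same condition.
# rowsum() groups rows, so the matrix is transposed before and after.
per.cond <- t(rowsum(t(counts), group = condition))

minval   <- 100
min.cond <- 1   # assumption: minimal number of conditions reaching minval

# Same filter as before, but applied to conditions instead of columns.
retained <- rownames(per.cond)[rowSums(per.cond >= minval) >= min.cond]
retained
```

In this toy data no single sample of gene1 or gene3 reaches 100, but their per-condition sums (110 for Mock and 170 for SARS, respectively) do, so both pass; gene2 passes in neither condition.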

Examples


sars <- ReadGRAND(system.file("extdata", "sars.tsv.gz", package = "grandR"),
                  design=c("Condition",Design$dur.4sU,Design$Replicate))
#> Warning: Duplicate gene symbols (n=1, e.g. MATR3) present, making unique!

nrow(sars)
#> [1] 1045
# This is already filtered and has 1045 genes
nrow(FilterGenes(sars,minval=1000))
#> [1] 966
# There are 966 genes with at least 1000 read counts in at least half of the samples
nrow(FilterGenes(sars,minval=10000,min.cond=1))
#> [1] 944
# There are 944 genes with at least 10000 read counts in the Mock or SARS condition
nrow(FilterGenes(sars,use=GeneInfo(sars,"Type")!="Cellular"))
#> [1] 11
# These are the 11 viral genes.