Read featureCounts — ReadFeatureCounts • grandR

grandR can also be used to analyze standard RNA-seq data. This function reads such data from a featureCounts output table.

Usage

ReadFeatureCounts(
  file,
  design = c(Design$Condition, Design$Replicate),
  classify.genes = ClassifyGenes(),
  rename.sample = NULL,
  filter.table = NULL,
  num.samples = NULL,
  verbose = FALSE,
  sep = "\t"
)

Arguments

file: a file containing featureCounts output
design: Either a design vector (see details), or a data.frame providing metadata for all columns (samples/cells), or a function that is called with the condition name vector and is supposed to return this data.frame.
classify.genes: A function that is used to add the type column to the gene annotation table, always a call to ClassifyGenes
rename.sample: function that is applied to each sample name before parsing (or NULL)
filter.table: function that is applied to the table directly after reading it (or NULL)
num.samples: number of sample columns containing read counts (can be NULL, see details)
verbose: Print status updates
sep: The column separator used in the file

Value

a grandR object

Details

The table is assumed to have read counts in the last n columns, which must be named according to sample names. If num.samples is NULL, n is determined automatically as the number of columns whose name contains .bam (so either specify num.samples, or make sure that the count columns are named after the BAM files).
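The automatic detection described above can be illustrated with a few lines of base R. This is only a minimal sketch of the idea, not grandR's actual implementation; the column names are hypothetical.

```r
# Hypothetical header of a featureCounts table: annotation columns first,
# then one count column per BAM file.
cols <- c("Geneid", "Chr", "Start", "End", "Strand", "Length",
          "Mock.4h.A.bam", "Mock.4h.B.bam", "Virus.4h.A.bam")

# Count the columns whose name contains ".bam"; fixed = TRUE treats the
# dot literally rather than as a regular-expression wildcard.
n <- sum(grepl(".bam", cols, fixed = TRUE))

# The count columns are then taken to be the last n columns.
count.cols <- utils::tail(cols, n)
```

If the count columns do not contain .bam in their names, this kind of detection finds nothing, which is why num.samples should be given explicitly in that case.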

If these columns are named systematically in a particular way, the design vector provides a powerful and easy way to create the column annotations.

The column names have to contain dots (.) to separate the fields for the column annotation table. E.g. the name Mock.4h.A will be split into the fields Mock, 4h and A. For such names, a design vector of length 3 has to be given, that describes the meaning of each field. A reasonable design vector for the example would be c("Treatment","Time","Replicate"). Some names are predefined in the list Design.
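To make the splitting concrete, the following base R sketch builds a small column annotation table from dot-separated names. It only mimics the described behavior under the assumption that every name has exactly three fields; the sample names are hypothetical and the code is not taken from grandR.

```r
# Hypothetical sample names following a Treatment.Time.Replicate scheme.
sample.names <- c("Mock.4h.A", "Mock.4h.B", "Virus.4h.A")
design <- c("Treatment", "Time", "Replicate")

# Split each name at literal dots and bind the pieces into one row per sample.
fields <- do.call(rbind, strsplit(sample.names, ".", fixed = TRUE))

# One column per design entry, plus the original name for reference.
coldata <- as.data.frame(fields, stringsAsFactors = FALSE)
names(coldata) <- design
coldata$Name <- sample.names
```

With grandR itself, the same design vector would simply be passed as the design parameter of ReadFeatureCounts.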

The names given in the design vector can carry additional semantics. E.g. for the name duration.4sU, the values are interpreted (4h is converted into the number 4, 30min into 0.5, and no4sU into 0). Semantics can be user-defined by calling MakeColdata and using its return value as the design parameter, or by passing a function that calls MakeColdata. It is usually easier to manipulate the Coldata table after loading the data than to use this mechanism; the built-in semantics simply provide a convenient way to reduce this kind of manipulation.
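The kind of interpretation applied to duration.4sU values can be sketched in base R as follows. The helper parse.duration is hypothetical and is not part of grandR; it only reproduces the three conversions mentioned above (hours, minutes, and no4sU) and returns NA for anything else.

```r
# Hypothetical helper: convert labels such as "4h", "30min" or "no4sU"
# into a duration in hours.
parse.duration <- function(x) {
  out <- rep(NA_real_, length(x))

  # No labeling at all corresponds to a duration of 0.
  out[x == "no4sU"] <- 0

  # Values given in hours, e.g. "4h".
  is.h <- grepl("^[0-9.]+h$", x)
  out[is.h] <- as.numeric(sub("h$", "", x[is.h]))

  # Values given in minutes, e.g. "30min", are converted to hours.
  is.min <- grepl("^[0-9.]+min$", x)
  out[is.min] <- as.numeric(sub("min$", "", x[is.min])) / 60

  out
}

hours <- parse.duration(c("4h", "30min", "no4sU"))
```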

Sometimes you might have forgotten to name all samples consistently (or you simply messed something up). In this case, the rename.sample parameter can be handy (e.g. to rename a particular misnamed sample).
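A function suitable for rename.sample might look like the sketch below. The misnamed sample is hypothetical, and the function is written so that it works both on a single name and on a whole vector of names, since the exact calling convention is not spelled out here.

```r
# Hypothetical problem: one sample was named "Mock4h.A" instead of "Mock.4h.A".
fix.name <- function(name) {
  # Replace only the misnamed sample and leave all other names untouched.
  name[name == "Mock4h.A"] <- "Mock.4h.A"
  name
}

raw.names <- c("Mock4h.A", "Mock.4h.B")
fixed.names <- fix.name(raw.names)
```

Such a function would then be passed as rename.sample = fix.name, so that the corrected names are the ones split into design fields.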

Sometimes the table contains more than you want to read. In this case, use the filter.table parameter to preprocess it. This should be a function that receives a data.frame, and returns a data.frame.
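As an illustration, the following base R sketch defines a function that takes a data.frame and returns a data.frame, here dropping genes without any reads. The table, its column names and the helper drop.empty are hypothetical; any function with the same input and output types could be used in its place.

```r
# Hypothetical table as it might look directly after reading.
tab <- data.frame(
  Geneid = c("G1", "G2", "G3"),
  Length = c(1200, 800, 1500),
  A.bam = c(10, 0, 5),
  B.bam = c(3, 0, 7),
  stringsAsFactors = FALSE
)

# Keep only genes with at least one read in any count column.
drop.empty <- function(df) {
  count.cols <- grepl(".bam", names(df), fixed = TRUE)
  df[rowSums(df[, count.cols, drop = FALSE]) > 0, , drop = FALSE]
}

filtered <- drop.empty(tab)
```

Passing filter.table = drop.empty would apply this step right after the file is read, before any sample columns are interpreted.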

If there are no columns named "Geneid", "Gene" or "Symbol", the first column is used!