Returns a function to be used as the classify.genes parameter for ReadGRAND.
Arguments
- ...
additional functions that define types (see Details)
- use.default
if TRUE, use the default type inference (checked after the user-defined types); see Details
- drop.levels
if TRUE, drop unused types from the generated factor
- name.unknown
the type assigned to all genes for which no type was identified
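As a rough illustration of what name.unknown and drop.levels control, the following base-R sketch builds a type factor from hypothetical per-gene matches. The match vector, the list of types and the fallback label "Unknown" are illustrative assumptions, not grandR's internal code or defaults.

```r
# Hypothetical per-gene matches; NA marks genes that no type function matched.
matched <- c("Cellular", NA, "Cellular", "mito")
# Hypothetical set of all known types, in priority order.
all.types <- c("viral", "mito", "ERCC", "Cellular")

# Role of name.unknown: give unmatched genes a fallback label ("Unknown" is illustrative).
name.unknown <- "Unknown"
labels <- ifelse(is.na(matched), name.unknown, matched)

# Build the factor with every known type as a level, even unused ones.
type <- factor(labels, levels = c(all.types, name.unknown))
print(levels(type))

# Role of drop.levels = TRUE: remove levels that no gene was assigned to.
type.dropped <- droplevels(type)
print(levels(type.dropped))
```

Without dropping, a table of the factor would also list "viral" and "ERCC" with zero counts.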
Value
a function that takes the original GeneInfo table and returns the gene types used for its Type column
Details
This function returns a function. Usually, you do not call the returned function yourself; instead, ClassifyGenes is passed as the classify.genes parameter of ReadGRAND, which uses it to build the Type column of the GeneInfo table. See the examples for how to use it directly.
Each ... parameter must be a function that receives the gene info table and returns a logical vector indicating, for each row of the gene info table, whether it matches a specific type. The name of the parameter is used as the type name.
If a gene matches multiple types, the type of the first function returning TRUE for that row is used.
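The first-match rule can be sketched in base R on a toy gene info table. This is an illustration of the rule, not grandR's actual implementation; the table contents, gene identifiers and the three type functions are hypothetical.

```r
# Toy gene info table (hypothetical; real GeneInfo tables come from ReadGRAND).
gene.info <- data.frame(
  Gene   = c("ENSG00000100001", "ORF1ab", "ENSG00000198888"),
  Symbol = c("ACTB", "ORF1ab", "MT-ND1"),
  stringsAsFactors = FALSE
)

# Type functions in priority order; each returns one logical value per row.
type.functions <- list(
  viral    = function(gene.info) gene.info$Symbol %in% c("ORF1ab", "S", "N"),
  mito     = function(gene.info) grepl("^MT-", gene.info$Symbol),
  Cellular = function(gene.info) grepl("^ENS", gene.info$Gene)
)

# Logical matrix: rows are genes, columns are types (in priority order).
hits <- sapply(type.functions, function(f) f(gene.info))

# For each gene, take the first column that is TRUE (NA if nothing matched).
first.match <- apply(hits, 1, function(row) {
  idx <- which(row)
  if (length(idx) == 0) NA_character_ else names(type.functions)[idx[1]]
})
print(first.match)
```

The third gene matches both mito and Cellular; because mito comes first in the list, it is classified as mito.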
By default, this function recognizes mitochondrial genes (MT prefix of the gene symbol), ERCC spike-ins, and Ensembl gene identifiers (which it calls "Cellular"). These three are checked last, so a user-defined type passed via ... takes precedence if it also matches, e.g., an Ensembl gene.
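To illustrate why the defaults are checked last, the sketch below appends hypothetical stand-ins for the three built-in rules after a user-defined type. The regular expressions ("^MT-", "^ERCC-", "^ENS"), the "histone" type and the placeholder gene identifiers are assumptions for illustration; grandR's exact patterns may differ.

```r
# Toy gene info table with made-up identifiers.
gene.info <- data.frame(
  Gene   = c("ENSG00000000001", "ERCC-00002", "ENSG00000000003", "ENSG00000000004"),
  Symbol = c("H4C3", "ERCC-00002", "MT-ND1", "ACTB"),
  stringsAsFactors = FALSE
)

# Illustrative stand-ins for the built-in rules; grandR's exact patterns may differ.
default.types <- list(
  mito     = function(gene.info) grepl("^MT-", gene.info$Symbol, ignore.case = TRUE),
  ERCC     = function(gene.info) grepl("^ERCC-", gene.info$Gene),
  Cellular = function(gene.info) grepl("^ENS", gene.info$Gene)
)

# A hypothetical user-defined type that overlaps with the Ensembl rule.
user.types <- list(
  histone = function(gene.info) grepl("^H4C", gene.info$Symbol)
)

# User-defined types come first, so they take precedence over the defaults.
rules <- c(user.types, default.types)

# Apply rules from last to first; earlier rules overwrite later ones,
# so each gene ends up with the first type whose function returned TRUE.
type <- rep(NA_character_, nrow(gene.info))
for (name in rev(names(rules))) {
  hit <- rules[[name]](gene.info)
  type[hit] <- name
}
print(type)
```

The first gene has an Ensembl identifier but is labeled "histone", because the user-defined rule is checked before the Ensembl rule.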
Examples
viral.genes <- c('ORF3a','E','M','ORF6','ORF7a','ORF7b','ORF8','N','ORF10','ORF1ab','S')
sars <- ReadGRAND(system.file("extdata", "sars.tsv.gz", package = "grandR"),
                  design = c("Cell", Design$dur.4sU, Design$Replicate),
                  classify.genes = ClassifyGenes(`SARS-CoV-2` =
                      function(gene.info) gene.info$Symbol %in% viral.genes),
                  verbose = TRUE)
#> Checking file...
#> Reading files...
#> Warning: Duplicate gene symbols (n=1, e.g. MATR3) present, making unique!
#> Processing...
table(GeneInfo(sars)$Type)
#>
#> SARS-CoV-2 Cellular
#> 11 1034
fun <- ClassifyGenes(viral = function(gene.info) gene.info$Symbol %in% viral.genes)
table(fun(GeneInfo(sars)))
#>
#> viral Cellular
#> 11 1034