# Simulate metabolic labeling - nucleotide conversion RNA-seq data.

Source:`R/readsimulator.R`

`SimulateReadsForSample.Rd`

This function takes a vector of *true* relative abundances and NTRs, and then simulates
(i) read counts per gene and (ii) 4sU incorporation and conversion events. Subsequently, it
uses the same approach as implemented in the GRAND-SLAM 2.0 software (Juerges et al., Bioinformatics 2018)
to estimate the NTR from these simulated data.

## Usage

```
SimulateReadsForSample(
num.reads = 2e+07,
rel.abundance = setNames(rlnorm(10000, meanlog = 4.5, sdlog = 1), paste0("Gene",
1:10000)),
ntr = setNames(rbeta(10000, 1.5, 3), paste0("Gene", 1:10000)),
dispersion = 0.05,
beta.approx = FALSE,
conversion.reads = FALSE,
u.content = 0.25,
u.content.sd = 0.05,
read.length = 75,
p.old = 1e-04,
p.new = 0.04,
p.new.fit = p.new,
seed = NULL
)
```

## Arguments

- num.reads
the total amount of reads for simulation

- rel.abundance
named (according to genes) vector of the true relative abundances. Is divided by its sum.

- ntr
vector of true NTRs

- dispersion
vector of dispersion parameters (should best be estimated by DESeq2)

- beta.approx
should the beta approximation of the NTR posterior be computed?

- conversion.reads
also output the number of reads with conversion

- u.content
the relative frequency of uridines in the reads

- u.content.sd
the standard deviation of the u content

- read.length
the read length for simulation

- p.old
the probability for a conversion in reads originating from old RNA

- p.new
the probability for a conversion in reads originating from new RNA

- p.new.fit
the probability for a conversion in reads originating from new RNA that is used for fitting (to simulate bias in the estimation of p.new)

- seed
seed value for the random number generator (set to make it deterministic!)

## Value

a matrix containing, per column, the simulated counts, the simulated NTRs, (potentially the shape parameters of the beta distribution approximation,) and the true relative frequencies and ntrs

## Details

The simulation proceeds as follows:

Draw for each gene the number of reads from a negative binomial distribution parametrized with the relative abundances x read number and the dispersion parameter

For each gene: Draw for each read the number of uridines according to a beta binomial distribution for the given read length (the beta prior is parametrized to match the u.content and u.content.sd parameters)

For each read: Draw the number of conversions according to the binomial mixture model of GRAND-SLAM (parametrized with p_old, p_new, the gene specific NTR and the read specific number of uridines)

Estimate the NTR by using the GRAND-SLAM approach

## Examples

```
SimulateReadsForSample(num.reads = 10000,rel.abundance = rep(1,5),ntr=0.9)
#> count ntr true_freq true_ntr
#> [1,] 2013 0.9317620 0.2 0.9
#> [2,] 2818 0.8905261 0.2 0.9
#> [3,] 1849 0.9493985 0.2 0.9
#> [4,] 1195 0.8917638 0.2 0.9
#> [5,] 1671 0.8820507 0.2 0.9
SimulateReadsForSample(num.reads = 10000,rel.abundance = rep(1,5),ntr=0.9,seed=1337)
#> count ntr true_freq true_ntr
#> [1,] 1970 0.9088477 0.2 0.9
#> [2,] 1733 0.8445619 0.2 0.9
#> [3,] 3158 0.8908687 0.2 0.9
#> [4,] 2866 0.8973888 0.2 0.9
#> [5,] 2173 0.9158799 0.2 0.9
SimulateReadsForSample(num.reads = 10000,rel.abundance = rep(1,5),ntr=0.9,seed=1337)
#> count ntr true_freq true_ntr
#> [1,] 1970 0.9088477 0.2 0.9
#> [2,] 1733 0.8445619 0.2 0.9
#> [3,] 3158 0.8908687 0.2 0.9
#> [4,] 2866 0.8973888 0.2 0.9
#> [5,] 2173 0.9158799 0.2 0.9
# the second and third matrix should be equal, the first should be distinct
```