Skip to contents

This function takes a vector of true relative abundances and NTRs, and then simulates (i) read counts per gene and (ii) 4sU incorporation and conversion events. Subsequently, it uses the same approach as implemented in the GRAND-SLAM 2.0 software (Juerges et al., Bioinformatics 2018) to estimate the NTR from these simulated data.

Usage

SimulateReadsForSample(
  num.reads = 2e+07,
  rel.abundance = setNames(rlnorm(10000, meanlog = 4.5, sdlog = 1), paste0("Gene",
    1:10000)),
  ntr = setNames(rbeta(10000, 1.5, 3), paste0("Gene", 1:10000)),
  dispersion = 0.05,
  beta.approx = FALSE,
  conversion.reads = FALSE,
  u.content = 0.25,
  u.content.sd = 0.05,
  read.length = 75,
  p.old = 1e-04,
  p.new = 0.04,
  p.new.fit = p.new,
  seed = NULL
)

Arguments

num.reads

the total amount of reads for simulation

rel.abundance

named (according to genes) vector of the true relative abundances. Is divided by its sum.

ntr

vector of true NTRs

dispersion

vector of dispersion parameters (should best be estimated by DESeq2)

beta.approx

should the beta approximation of the NTR posterior be computed?

conversion.reads

also output the number of reads with conversion

u.content

the relative frequency of uridines in the reads

u.content.sd

the standard deviation of the u content

read.length

the read length for simulation

p.old

the probability for a conversion in reads originating from old RNA

p.new

the probability for a conversion in reads originating from new RNA

p.new.fit

the probability for a conversion in reads originating from new RNA that is used for fitting (to simulate bias in the estimation of p.new)

seed

seed value for the random number generator (set to make it deterministic!)

Value

a matrix containing, per column, the simulated counts, the simulated NTRs, (potentially the shape parameters of the beta distribution approximation,) and the true relative frequencies and ntrs

Details

The simulation proceeds as follows:

  1. Draw for each gene the number of reads from a negative binomial distribution parametrized with the relative abundances x read number and the dispersion parameter

  2. For each gene: Draw for each read the number of uridines according to a beta binomial distribution for the given read length (the beta prior is parametrized to match the u.content and u.content.sd parameters)

  3. For each read: Draw the number of conversions according to the binomial mixture model of GRAND-SLAM (parametrized with p_old, p_new, the gene specific NTR and the read specific number of uridines)

  4. Estimate the NTR by using the GRAND-SLAM approach

Examples

SimulateReadsForSample(num.reads = 10000,rel.abundance = rep(1,5),ntr=0.9)
#>      count       ntr true_freq true_ntr
#> [1,]  2013 0.9317620       0.2      0.9
#> [2,]  2818 0.8905261       0.2      0.9
#> [3,]  1849 0.9493985       0.2      0.9
#> [4,]  1195 0.8917638       0.2      0.9
#> [5,]  1671 0.8820507       0.2      0.9
SimulateReadsForSample(num.reads = 10000,rel.abundance = rep(1,5),ntr=0.9,seed=1337)
#>      count       ntr true_freq true_ntr
#> [1,]  1970 0.9088477       0.2      0.9
#> [2,]  1733 0.8445619       0.2      0.9
#> [3,]  3158 0.8908687       0.2      0.9
#> [4,]  2866 0.8973888       0.2      0.9
#> [5,]  2173 0.9158799       0.2      0.9
SimulateReadsForSample(num.reads = 10000,rel.abundance = rep(1,5),ntr=0.9,seed=1337)
#>      count       ntr true_freq true_ntr
#> [1,]  1970 0.9088477       0.2      0.9
#> [2,]  1733 0.8445619       0.2      0.9
#> [3,]  3158 0.8908687       0.2      0.9
#> [4,]  2866 0.8973888       0.2      0.9
#> [5,]  2173 0.9158799       0.2      0.9
# the second and third matrix should be equal, the first should be distinct