Command Line Arguments

scrm is uses a syntax compatible with the popular program ms. There are, however, a few differences to ms:

  • scrm can not simulate
    • gene conversions (-c in ms) and
    • a fix number of segregating sites (-s),
  • the option -L produces a slightly different output and
  • it additionally implements the flags
    • -l (approximation),
    • -sr (changing recombination rate),
    • -st (changing mutation rate),
    • -eI (sampling haplotypes at multiple time points) and
    • -oSFS (generates frequency spectra).
  • We do not support changing the number of populations with -ema. Our version of the command is just -ema <t> <M11> <M12> ... instead of -ema <t> <npop> <M11> <M12> ....

For all other options, you can also refer to ms’ manual to get a detailed description of what the commands are doing. scrm should happily execute any ms command that does not contain -c, -s and -ema. Also, scrm has somewhat stricter requirements regarding the order of arguments if population admixture (-es) is involved.

General Syntax

The arguments for calling scrm are

scrm <nhap> <nrep> [...]

where nhap is the total number of haplotypes (in all populations and at all times) that are simulated at each locus, and nrep is the number of independent loci that will be produced. The [...] is an optional placeholder for an arbitrary number of command line flags described below.

Basic Parameters

Recombination

  • -r <R> <L>: Set the recombination rate to R = 4N0r and the length of all loci to L base pairs. r is expected number of recombinations on the locus per generation.
  • -l <l>: Use approximation rather than simulating the exact ARG. Within a sliding window of length l base pairs all linkage information is considered when building the genealogy. To positions outside of this window, some linkage is ignored. Setting l=0 produces the SMC’ and l=-1 deactivates the approximation. Since v1.6.0, it’s also possible to specify the window’s length in number of recombinations. To do so, use -l <x>r, where x is the number of recombinations (e.g. -l 100r for a window spanning 100 recombinations). Also starting with version 1.6.0 approximation is turned on by default using a conservative window length of 500 recombinations. For most applications, it should be fine to reduce this value to 100 - 250 recombinations if runtime is a critical factor.

Population structure & migration

In all commands, migrations rates M = 4N0m, where m is the fraction of a population that is replaced with migrants from other populations each generation (looking forwards in time).

  • -I <npop> <s1> ... <sn> [<M>]: Use an island model with npop populations, where s1 to sn haplotypes are sampled from population 1 to n, respectively. Optionally assume a symmetric migration rate of M.
  • -M <M>: Assume a symmetric migration rate of M/(npop-1).
  • -m <i> <j> <M>: Set the migration rate from population j to population i to M (looking forward in time) [since v1.3.1].
  • -ma <M11> <M21> ... <M21> ...: Set the migration matrix (Dimension is npop x npop). Diagonals elements are ignored but required (you can use x or 0).

Population size changes

For exponential growth/decline of a population, the parameter a changes the size of a population according to the formula N(s) = N(0)exp(-as), where N(0) is the population’s size at the time of the command (e.g. 0 for -g <a> and -G <a> and t for -eg <t> <a> and -eG <t> <a>) and N(s) is the size of the population s time units in the past. Looking forwards in time, a positive a leads to population growth, while a negative one generates a decline in population sizes.

  • -n <i> <n>: Set the present day size of population i to _n*N0_.
  • -G <a>: Set the exponential growth rate of all populations to a.
  • -g <i> <a>: Set the exponential growth rate of population i to a.

Summary Statistics

  • -t < $\theta$ >: Set the mutation rate to θ = 4N0u, where u is the neutral mutation rate per locus. If this options is given, scrm generates the segregating sites output.
  • -transpose-segsites or --transpose-segsites: If given, the segregating sites are printed with each row representing a mutation and each column representing a haplotype, rather than the other way round. Additionally, the time at which a mutation occurred is reported (in units of 4 * N0 generations) [since v1.7.0].
  • -T: Print the local genealogies in newick format.
  • -O: Print the local genealogies in the oriented forest format as described in Kelleher et al. (2014) [since v1.2].
  • -L: Print the TMRCA and the local tree length for each segment (behaves different to ms). Both values are scaled in coalescent time units, e.g. in 4 * N0 generations.
  • -oSFS: Print the site frequency spectrum. Requires that the mutation rate θ is given with the ‘-t’ option.
  • -SC [ms|rel|abs]: Scaling of sequence positions. Either relative to the locus length between 0 and 1 (rel), absolute in base pairs (abs) or ms’s scaling (default) where the positions in the segregating sites output are relative, and the positions in the trees output are absolute (ms) [since v1.3.0].

Other options

  • -seed <SEED> [<SEED2> <SEED3>]: Specifies a seed for the simulation. You can input up to three non-negative numbers. If no seed is given, scrm generates one using entropy provided by the operating system. To reproduce a previous simulation, use the single number in the second line of the output.
  • -print-model, --print-model: Prints information about the model defined by the command line arguments, including calculated population sizes. Can be useful for debugging or verifying the model [since v1.5.0].
  • -p <digits>: Sets the number of significant digits used in the output [since v1.4.0].
  • -h, --help: Prints a help text.
  • -v, --version: Prints version information.

Time specific parameters

The command this section all have a time t as first parameter. Changes made by the commands affect the time from t further back into the past. All times in units of _4*N0_ generations.

Population structure & migration

  • -eI <t> <s1> ... <sn>: Sample s1 to sn haplotypes are from population 1 to n, respectively, at time t.
  • -eM <t> <M>: Assume a symmetric migration rate of M/(npop-1) at time t.
  • -em <t> <i> <j> <M>: Set the migration rate from population j to population i to M (looking forward in time) at time t [since v1.3.1].
  • -ema <t> <M11> <M12> ... <M21> ...: Set the migration matrix at time t (Dimension is npop x npop). Diagonals elements are ignored but required (use ‘x’ or 0). The rates apply pastwards from time t.

Population size changes

  • -eN <t> <n>: Set the size of all populations to _n*N0_ at time t.
  • -en <t> <i> <n>: Set the size of population i to _n*N0_ at time t.
  • -eg <t> <i> <a>: Set the exponential growth rate of population i to a at time t.
  • -eG <t> <a>: Set the exponential growth rate of all populations to a at time t.

Population Splits & Merges

  • -es <t> <i> <p>: Population admixture. Replaces a fraction 1-p of population i with haplotypes from a population npop + 1. Technically (and looking backwards in time), a new population n+1 with size N0 is created at time t. Migration (to & from) and growth rates for this population are initially 0. Each lines in population i is moved to the new population with probability 1-p. Please sort multiple -es arguments by their time to avoid confusion about the numbering of populations. Please give the arguments that affect the whole population (-M, -N, -G & -ma) before giving the first -es. Also, their timed equivalent’s (-eM, -eN, -eG, -eI & -ema) position on the command line events must also be sorted by time, at least relative to the -es argument. scrm throws an error if any of these conditions is not met. In doubt, just sort all command line arguments by their time.
  • -eps <t> <i> <j> <p>: Partial admixture. Similar to -es but replaces a fraction 1-p of population i with haploids from population j at time t. Different to -es, population j is a normal population that continues to exist at times more recent than t. Viewed backwards in time, this moves a fraction 1-p of the linages in population i to population j. This does not change the number of populations, population sizes, growth or migration rates in any way [since v1.5.0].
  • -ej <t> <j> <i>: Adds a specialization event in population i that creates population j (forwards in time). Technically (and looking backwards in time), it moves all lines from population j into population i at time t. Migration rates into population j are set to 0 for the time further back into the past.

When multiple es, eps or ej arguments are given for the same time t, the migrations are executed in the order in which the commands are given. For example if we have -es 0.08 2 .2 -ej 0.08 3 1, first 80% of pop 2 move to a newly created pop 3 (viewed backwards in time), then everyone that just moved to pop 3 moves on to pop 1. This is equivalent to -eps 0.08 2 1 .2, except that the latter does not create the empty population 3.

Sequence specific parameters

The following commands change the model parameters from at a sequence position s. You should still set the initial rate with -r or -t, respectively, and then use the commands prefixed with s for all changes. Note that -r also takes the total length of the sequence as second argument, while -sr just has the rate as argument.

  • -sr <s> <R>: Set the recombination rate to R starting at position s.
  • -st <s> <$\theta$>: Set the mutation rate to θ starting at position s.