---
title: "Command Line Arguments"
author: "Paul Staab"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{scrm-Arguments}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---

scrm uses a syntax compatible with the popular program [ms][1]. There are, however, a few differences to ms:

* scrm cannot simulate
    * gene conversions (`-c` in ms) and
    * a fixed number of segregating sites (`-s`),
* the option `-L` produces a slightly different output and
* it additionally implements the flags
    * `-l` (approximation),
    * `-sr` (changing recombination rate),
    * `-st` (changing mutation rate),
    * `-eI` (sampling haplotypes at multiple time points) and
    * `-oSFS` (generates frequency spectra).
* We do not support changing the number of populations with `-ema`. Our version of the command is just `-ema <t> <M11> <M12> ...` instead of `-ema <t> <npop> <M11> <M12> ...`.

For all other options, you can also refer to [ms' manual][1] to get a detailed description of what the commands are doing. scrm should happily execute any ms command that does not contain `-c`, `-s` or `-ema`. Also, scrm has somewhat stricter requirements regarding the order of arguments if population admixture (`-es`) is involved.

## General Syntax

The arguments for calling _scrm_ are `scrm <nhap> <nrep> [...]`, where _nhap_ is the total number of haplotypes (in all populations and at all times) that are simulated at each locus, and _nrep_ is the number of independent loci that will be produced. The `[...]` is a placeholder for an arbitrary number of optional command line flags described below.

## Basic Parameters

### Recombination

* `-r <R> <L>`: Set the recombination rate to _R = 4*N0*r_ and the length of all loci to _L_ base pairs. _r_ is the expected number of recombinations on the locus per generation.
* `-l <l>`: Use approximation rather than simulating the exact ARG. Within a sliding window of length _l_ base pairs, all linkage information is considered when building the genealogy.
Some linkage to positions outside of this window is ignored. Setting _l=0_ produces the SMC' and _l=-1_ deactivates the approximation. Since v1.6.0, it is also possible to specify the window's length in number of recombinations. To do so, use `-l <x>r`, where _x_ is the number of recombinations (e.g. `-l 100r` for a window spanning 100 recombinations). Also starting with version 1.6.0, **approximation is turned on by default** using a conservative window length of 500 recombinations. For most applications, it should be fine to reduce this value to 100 - 250 recombinations if runtime is a critical factor.

### Population structure & migration

In all commands, migration rates are scaled as _M = 4*N0*m_, where _m_ is the fraction of a population that is replaced with migrants from other populations each generation (looking forwards in time).

* `-I <npop> <s1> ... <sn> [<M>]`: Use an island model with _npop_ populations, where _s1_ to _sn_ haplotypes are sampled from population 1 to n, respectively. Optionally assume a symmetric migration rate of _M_.
* `-M <M>`: Assume a symmetric migration rate of _M/(npop-1)_.
* `-m <i> <j> <M>`: Set the migration rate from population _j_ to population _i_ to _M_ (looking forward in time) [since v1.3.1].
* `-ma <M11> <M12> ... <M21> ...`: Set the migration matrix (dimension is _npop x npop_). Diagonal elements are ignored but required (you can use `x` or `0`).

### Population size changes

For exponential growth/decline of a population, the parameter _a_ changes the size of a population according to the formula _N(s) = N(0)*exp(-a*s)_, where _N(0)_ is the population's size at the time of the command (e.g. _0_ for `-g` and `-G`, and _t_ for `-eg` and `-eG`) and _N(s)_ is the size of the population _s_ time units in the past. Looking forwards in time, a positive _a_ leads to population growth, while a negative one generates a decline in population sizes.

* `-n <i> <n>`: Set the present day size of population _i_ to _n*N0_.
* `-G <a>`: Set the exponential growth rate of all populations to _a_.
* `-g <i> <a>`: Set the exponential growth rate of population _i_ to _a_.

### Summary Statistics

* `-t <$\theta$>`: Set the mutation rate to $\theta = 4N_0u$, where _u_ is the neutral mutation rate per locus. If this option is given, scrm generates the segregating sites output.
* `-transpose-segsites` or `--transpose-segsites`: If given, the segregating sites are printed with each row representing a mutation and each column representing a haplotype, rather than the other way round. Additionally, the time at which a mutation occurred is reported (in units of _4 * N0_ generations) [since v1.7.0].
* `-T`: Print the local genealogies in Newick format.
* `-O`: Print the local genealogies in the `oriented forest` format as described in [Kelleher _et al._ (2014)][2] [since v1.2].
* `-L`: Print the TMRCA and the local tree length for each segment (behaves differently from ms). Both values are scaled in coalescent time units, i.e. in _4 * N0_ generations.
* `-oSFS`: Print the site frequency spectrum. Requires that the mutation rate $\theta$ is given with the `-t` option.
* `-SC [ms|rel|abs]`: Scaling of sequence positions: either relative to the locus length between 0 and 1 (`rel`), absolute in base pairs (`abs`), or `ms`'s scaling (`ms`, the default), where the positions in the _segregating sites_ output are relative and the positions in the trees output are absolute [since v1.3.0].

### Other options

* `-seed <SEED> [<SEED2> <SEED3>]`: Specifies a seed for the simulation. You can input up to three non-negative numbers. If no seed is given, scrm generates one using entropy provided by the operating system. To reproduce a previous simulation, use the single number in the second line of the output.
* `-print-model`, `--print-model`: Prints information about the model defined by the command line arguments, including calculated population sizes. Can be useful for debugging or verifying the model [since v1.5.0].
* `-p <digits>`: Sets the number of significant digits used in the output [since v1.4.0].
* `-h`, `--help`: Prints a help text.
* `-v`, `--version`: Prints version information.

## Time specific parameters

The commands in this section all take a time _t_ as their first parameter. Changes made by the commands affect the time from _t_ further back into the past. All times are in units of _4*N0_ generations.

### Population structure & migration

* `-eI <t> <s1> ... <sn>`: Sample _s1_ to _sn_ haplotypes from population _1_ to _n_, respectively, at time _t_.
* `-eM <t> <M>`: Assume a symmetric migration rate of _M/(npop-1)_ at time _t_.
* `-em <t> <i> <j> <M>`: Set the migration rate from population _j_ to population _i_ to _M_ (looking forward in time) at time _t_ [since v1.3.1].
* `-ema <t> <M11> <M12> ... <M21> ...`: Set the migration matrix at time _t_ (dimension is _npop x npop_). Diagonal elements are ignored but required (use `x` or `0`). The rates apply from time _t_ further back into the past.

### Population size changes

* `-eN <t> <n>`: Set the size of all populations to _n*N0_ at time _t_.
* `-en <t> <i> <n>`: Set the size of population _i_ to _n*N0_ at time _t_.
* `-eg <t> <i> <a>`: Set the exponential growth rate of population _i_ to _a_ at time _t_.
* `-eG <t> <a>`: Set the exponential growth rate of all populations to _a_ at time _t_.

### Population Splits & Merges

* `-es <t> <i> <p>`: Population admixture. Replaces a fraction _1-p_ of population _i_ with haplotypes from a population _npop+1_. Technically (and looking backwards in time), a new population _npop+1_ with size _N0_ is created at time _t_. Migration rates (to and from) and the growth rate for this population are initially 0. Each line in population _i_ is moved to the new population with probability _1-p_.

  Please sort multiple `-es` arguments by their time to avoid confusion about the numbering of populations. Please give the arguments that affect the whole population (`-M`, `-N`, `-G` & `-ma`) before giving the first `-es`. Their timed equivalents (`-eM`, `-eN`, `-eG`, `-eI` & `-ema`) must also be sorted by time on the command line, at least relative to the `-es` argument. `scrm` throws an error if any of these conditions is not met. If in doubt, just sort all command line arguments by their time.
* `-eps <t> <i> <j> <p>`: Partial admixture. Similar to `-es`, but replaces a fraction _1-p_ of population _i_ with haplotypes from population _j_ at time _t_. Unlike with `-es`, population _j_ is a normal population that continues to exist at times more recent than _t_. Viewed backwards in time, this moves a fraction _1-p_ of the lineages in population _i_ to population _j_. This does not change the number of populations, population sizes, growth or migration rates in any way [since v1.5.0].
* `-ej <t> <j> <i>`: Adds a speciation event in population _i_ that creates population _j_ (forwards in time). Technically (and looking backwards in time), it moves all lines from population _j_ into population _i_ at time _t_. Migration rates into population _j_ are set to 0 for the time further back into the past.

When multiple `-es`, `-eps` or `-ej` arguments are given for the same time _t_, the migrations are executed in the order in which the commands are given. For example, if we have `-es 0.08 2 .2 -ej 0.08 3 1`, first 80% of pop 2 move to a newly created pop 3 (viewed backwards in time), then everyone that just moved to pop 3 moves on to pop 1. This is equivalent to `-eps 0.08 2 1 .2`, except that the latter does not create the empty population 3.

## Sequence specific parameters

The following commands change the model parameters starting at a sequence position _s_. You should still set the initial rate with `-r` or `-t`, respectively, and then use the commands prefixed with `s` for all changes. Note that `-r` also takes the total length of the sequence as its second argument, while `-sr` takes only the position and the rate.

* `-sr <s> <R>`: Set the recombination rate to _R_ starting at position _s_.
* `-st <s> <$\theta$>`: Set the mutation rate to $\theta$ starting at position _s_.

[1]: http://home.uchicago.edu/~rhudson1/source/mksamples.html
[2]: https://www.sciencedirect.com/science/article/pii/S0040580914000355
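
The scaling conventions used on this page (mutation rate as 4*N0*u, recombination rate as 4*N0*r, times in units of 4*N0 generations) can be illustrated with a small Python sketch. It is not part of scrm and does not run it; it only converts per-generation quantities into scaled values and assembles an argument string for two populations joined by `-ej`. The helper names `scaled_args` and `build_command`, as well as all numeric values (N0 = 10000, per-locus rates of 2.5e-4, a split 4000 generations ago), are assumptions chosen purely for illustration.

```python
# Illustrative helpers (hypothetical names, not provided by scrm).

def scaled_args(n0, mu_per_locus, rec_per_locus, split_generations):
    """Convert per-generation quantities into coalescent-scaled values."""
    theta = 4 * n0 * mu_per_locus            # value for -t: 4*N0*u
    rho = 4 * n0 * rec_per_locus             # value for -r: 4*N0*r
    t_split = split_generations / (4 * n0)   # time in units of 4*N0 generations
    return theta, rho, t_split


def build_command(nhap_per_pop, nrep, theta, rho, locus_length, t_split):
    """Assemble an argument string for two populations that merge at t_split."""
    nhap = sum(nhap_per_pop)  # nhap counts haplotypes over all populations
    parts = [
        "scrm", str(nhap), str(nrep),
        "-t", f"{theta:g}",
        "-r", f"{rho:g}", str(locus_length),   # -r takes the rate, then the length
        "-I", str(len(nhap_per_pop)), *map(str, nhap_per_pop),
        # Backwards in time, move all lines from population 2 into population 1.
        "-ej", f"{t_split:g}", "2", "1",
    ]
    return " ".join(parts)


# Assumed example values, chosen only to keep the arithmetic easy to follow.
theta, rho, t_split = scaled_args(10000, 2.5e-4, 2.5e-4, 4000)
print(build_command([5, 5], 1, theta, rho, 25000, t_split))
```

With these assumed inputs, the mutation and recombination rates both scale to 10 and the split time to 0.1, so the printed string follows the `scrm <nhap> <nrep> [...]` pattern described under General Syntax.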