Inference
To run inference on your data with sweepLink, specify the infer task.
Example run command:
$ ./sweepLink infer --meta sweepLink_meta.txt --counts sweepLink_alleleCounts.txt --nGridPoints 50 --numThreads 8 --h.update false --numStatesAbsS 18 --grid_abs_s 0.008,0.011,0.015,0.019,0.024,0.031,0.036,0.044,0.0520,0.061,0.072,0.084,0.098,0.113,0.131,0.151,0.174,0.2 --forwardPrior betaSeparate
Required Input Data
sweepLink requires three inputs:
Meta file (--meta option)
This file contains meta information about the populations and time points of the genetic data, with one row per (time point, population) sample. It consists of three tab-separated columns: label name, sampling time, and population.
The labels are used as column names in the allele counts file. Note that time runs forward, so 0 is the earliest time point (the most distant past).
Example of a meta file:
time_0_pop_0	0	pop0
time_10_pop_0	10	pop0
time_20_pop_0	20	pop0
time_30_pop_0	30	pop0
time_40_pop_0	40	pop0
time_50_pop_0	50	pop0
time_60_pop_0	60	pop0
time_70_pop_0	70	pop0
time_80_pop_0	80	pop0
time_90_pop_0	90	pop0
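The example meta file above can be produced with a short script. This is a minimal sketch: it writes one row per sampling time for a single population (pop0), separating the columns with real tab characters, and then checks that every row has exactly three columns. The output file name is only an example.

```shell
# Sketch: write a meta file for one population (pop0) sampled every 10
# generations from time 0 (the past) to time 90, with tab-separated columns.
meta=sweepLink_meta.txt
: > "$meta"                                  # start from an empty file
for t in 0 10 20 30 40 50 60 70 80 90; do
    # label, sampling time, population
    printf 'time_%s_pop_0\t%s\tpop0\n' "$t" "$t" >> "$meta"
done

# Check that every row has exactly three tab-separated columns.
awk -F '\t' 'NF != 3 { print "bad row " NR ": " $0; bad = 1 } END { exit bad }' "$meta" \
    && echo "meta file OK: $(wc -l < "$meta") rows"
```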
Allele counts file (--counts option)
This file contains allele counts for each population and each time point specified in the meta file. Its columns are: chromosome, position, and one allele count for each (time point, population) label. Each allele count has the form C/N, where C is the derived allele count and N is the total number of haploid samples. Missing data can be marked as 0/0.
Example of an allele counts file:
-	-	time_90_pop_0	time_80_pop_0	time_70_pop_0	time_60_pop_0	time_50_pop_0	time_40_pop_0	time_30_pop_0	time_20_pop_0	time_10_pop_0	time_0_pop_0
1	3482	0/200	0/200	3/200	2/200	3/200	1/200	2/200	8/200	7/200	17/200
1	4576	1/200	1/200	3/200	0/200	0/200	0/200	0/200	0/200	0/200	0/200
1	6981	2/200	1/200	0/200	0/200	0/200	0/200	0/200	0/200	0/200	0/200
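To make the C/N format concrete, the sketch below writes the three example loci to a counts file and prints the derived allele frequency C/N for every (locus, label) pair. It reports NA for missing data (0/0) and warns if C exceeds N. Two things here are assumptions rather than documented behaviour: the columns are tab-separated, as in the meta file, and the output file allele_freqs.txt is a hypothetical helper file, not something sweepLink reads.

```shell
# Sketch: write the example loci, then print derived allele frequencies.
# Assumption: columns are tab-separated, as in the meta file.
counts=sweepLink_alleleCounts.txt
# The heredoc uses single spaces for readability; tr turns them into tabs.
tr ' ' '\t' > "$counts" <<'EOF'
- - time_90_pop_0 time_80_pop_0 time_70_pop_0 time_60_pop_0 time_50_pop_0 time_40_pop_0 time_30_pop_0 time_20_pop_0 time_10_pop_0 time_0_pop_0
1 3482 0/200 0/200 3/200 2/200 3/200 1/200 2/200 8/200 7/200 17/200
1 4576 1/200 1/200 3/200 0/200 0/200 0/200 0/200 0/200 0/200 0/200
1 6981 2/200 1/200 0/200 0/200 0/200 0/200 0/200 0/200 0/200 0/200
EOF

awk -F '\t' '
    NR == 1 { for (i = 3; i <= NF; i++) label[i] = $i; next }   # header row
    {
        for (i = 3; i <= NF; i++) {
            split($i, cn, "/")                   # cn[1] = C (derived), cn[2] = N (total)
            c = cn[1] + 0; n = cn[2] + 0
            if (c > n) { print "C > N at " $1 ":" $2 " " label[i] > "/dev/stderr"; continue }
            f = (n == 0) ? "NA" : sprintf("%.3f", c / n)   # 0/0 marks missing data
            print $1, $2, label[i], f
        }
    }' "$counts" | tee allele_freqs.txt
```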
Mutation rates (--mu_a_A and --mu_A_a options)
Two mutation rates must be specified: the rate of mutation from the ancestral to the derived allele (--mu_a_A) and the rate of mutation in the opposite direction (--mu_A_a).
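For orientation, here is how two-way mutation enters the standard Wright-Fisher diffusion, where x is the derived allele frequency. This is an assumption for illustration only: this page does not state how sweepLink scales the rates (for example, per generation). The symbol \mu_{a \to A} stands for --mu_a_A and \mu_{A \to a} for --mu_A_a. Mutation contributes the drift term M(x). Setting it to zero gives the frequency at which the two directions balance when selection is absent:

```latex
M(x) = \mu_{a \to A}\,(1 - x) - \mu_{A \to a}\,x,
\qquad
M(x^{*}) = 0 \;\Longrightarrow\; x^{*} = \frac{\mu_{a \to A}}{\mu_{a \to A} + \mu_{A \to a}}
```

Swapping the two options therefore moves x^{*} from the derived-allele side to the ancestral side, which is why the direction of each rate matters.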
Additional Input Options
Several additional options can be specified for an inference run.
The grid for the Wright-Fisher diffusion equation (--nGridPoints option and flags for grid type)
The number of grid points used by the numerical method is set with the --nGridPoints option.
By default, the grid is quadratic, but a uniform grid (--uniformGrid) or a logarithmic grid (--logisticGrid) can be selected instead. (TODO: describe grid types)
Partial Differential Equation solver (--PDEmethod option and --backward flag)
Two methods are available for the PDE solver, the numerical method that solves the Wright-Fisher diffusion equation:
--PDEmethod CC - use the Chang-Cooper numerical scheme (default)
--PDEmethod CN - use the Crank-Nicolson numerical scheme; in our experiments it was less unstable
The second option is the --backward flag, which makes sweepLink use a backward pass to evaluate the likelihood of the data at one locus for given demographic parameters. By default, sweepLink uses a forward pass with a prior (see the prior options below).
Prior for forward pass (--forwardPrior option)
Three priors are available for the forward pass used in sweepLink's likelihood evaluations:
--forwardPrior uniform specifies a uniform prior distribution
--forwardPrior beta specifies a beta prior distribution for each population; the parameters of these distributions (alpha and beta for each population) are estimated along with the demographic parameters
--forwardPrior betaSeparate specifies beta prior distributions that differ between loci that are segregating at the first time point and loci that start to segregate at some intermediate time point (default)
The idea behind the last option is that the prior should differ between loci whose derived allele was just introduced and loci whose derived allele was introduced in the past and has not been lost. (TODO: better explanation)
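To illustrate the distinction that betaSeparate draws, the sketch below classifies loci by the earliest time at which they segregate (0 < C < N). It uses the meta file times, not the column order, to find the first time point. This is an illustrative classification only, not sweepLink's internal code. The file names and toy data are hypothetical, tab-separated columns are assumed, and this page does not document how sweepLink treats loci with missing data (0/0) at the first time point.

```shell
# Sketch: classify loci as segregating at the first time point or later.
# Hypothetical toy data; tr turns the single spaces into tabs.
printf 'time_0_pop_0 0 pop0\ntime_10_pop_0 10 pop0\ntime_20_pop_0 20 pop0\n' \
    | tr ' ' '\t' > prior_meta.txt
tr ' ' '\t' > prior_counts.txt <<'EOF'
- - time_20_pop_0 time_10_pop_0 time_0_pop_0
1 100 5/50 3/50 2/50
1 200 4/50 1/50 0/50
1 300 0/50 0/50 0/50
EOF

awk -F '\t' '
    FNR == NR { when[$1] = $2 + 0; next }       # meta file: label -> time
    FNR == 1 {                                  # counts header: column -> time
        first = ""
        for (i = 3; i <= NF; i++) {
            t[i] = when[$i]
            if (first == "" || t[i] < first) first = t[i]   # earliest time
        }
        next
    }
    {
        start = ""                              # earliest time with 0 < C < N
        for (i = 3; i <= NF; i++) {
            split($i, cn, "/"); c = cn[1] + 0; n = cn[2] + 0
            if (n > 0 && c > 0 && c < n && (start == "" || t[i] < start)) start = t[i]
        }
        if (start == "")         cls = "not segregating"
        else if (start == first) cls = "segregating at first time point"
        else                     cls = "starts segregating at time " start
        print $1, $2, cls
    }' prior_meta.txt prior_counts.txt | tee prior_classes.txt
```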
Selection estimation (--numStatesAbsS, --grid_abs_s, --abs_s_max options)
A specific grid of selection coefficients can be set. The option --numStatesAbsS N sets the number of positive selection coefficients on the grid, and --abs_s_max X sets the maximum selection coefficient, where X lies in the interval (0, 1]. If both --numStatesAbsS and --abs_s_max are set, sweepLink uses a uniform grid with the requested number of points. A non-uniform grid can be specified with the --grid_abs_s X1,X2,X3,...,Xn option, where X1,X2,X3,...,Xn are the positive grid points.
Selection inference can be disabled with the --s.update false option. By default, this treats all loci as neutral. To fix the selection coefficient of each locus to a specific value, use the option --s <file_with_values>, where <file_with_values> is a file that contains, on each line, the index of a selection value in the grid.
Dominance estimation (use --h.update false to disable estimation of dominance coefficients)
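When a non-uniform grid is wanted for --grid_abs_s, typing the values by hand is error-prone. The run command at the top of this page passes 18 hand-picked values together with --numStatesAbsS 18. As a minimal sketch, the script below generates a log-spaced (geometric) grid instead. The choice of geometric spacing and the endpoint values are assumptions for illustration, not sweepLink defaults. The script only prints the options; it does not run sweepLink.

```shell
# Sketch: build a log-spaced list of positive selection coefficients
# for --grid_abs_s. Endpoints and point count are example values only.
s_min=0.008
s_max=0.2
n=18
grid=$(awk -v lo="$s_min" -v hi="$s_max" -v n="$n" 'BEGIN {
    ratio = exp(log(hi / lo) / (n - 1))          # constant ratio between neighbours
    for (i = 0; i < n; i++) {
        s = (i == n - 1) ? hi : lo * ratio ^ i   # land exactly on s_max at the end
        printf "%s%.4g", (i ? "," : ""), s       # comma-separated, 4 significant digits
    }
}')
printf '%s\n' "--numStatesAbsS $n --grid_abs_s $grid"
```

Geometric spacing places more grid points at small selection coefficients, where neighbouring values are harder to tell apart. With --numStatesAbsS and --abs_s_max alone, sweepLink builds a uniform grid instead.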