Imputed Stats - imputed-stats

Performs statistical analysis on imputed data (either SKAT analysis, or linear, logistic or survival regression).

Available statistical models

Name

Description

cox

Cox’s proportional hazards model (survival regression).

linear

Linear regression (ordinary least squares).

logistic

Logistic regression (GLM with binomial distribution).

mixedlm

Linear mixed-effects model (random intercept).

skat

SKAT analysis.

Common options

General options

Option

Description

-h, --help

Show this help message and exit.

-v, --version

Show program’s version number and exit.

--debug

Set the logging level to debug.

Input files

Option

Description

--impute2 FILE

The output from IMPUTE2.

--sample FILE

The sample file (the order should be the same as in the IMPUTE2 files).

--pheno FILE

The file containing phenotypes and covariates.

--extract-sites FILE

A list of sites to extract for analysis (optional).

Output options

Option

Description

--out FILE

The prefix for the output files. [imputed_stats]

General options

Option

Description

--nb-process INT

The number of processes to use. [1]

--nb-lines INT

The number of lines to read at a time. [1000]

--chrx

Perform the analysis for the non-pseudo-autosomal region of chromosome X. Male dosage is divided by 2 to give values in [0, 0.5] instead of [0, 1]. Males must be coded as 1, and the ‘--gender-column’ option should be used.

--gender-column NAME

The name of the gender column, used to exclude samples with unknown gender (i.e. neither 1, male, nor 2, female). If gender is not available, use ‘None’. [Gender]
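The chromosome X handling described by ‘--chrx’ and ‘--gender-column’ can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the tool's implementation: the function name and the dictionary-based inputs are hypothetical.

```python
def adjust_chrx_dosages(dosages, genders):
    """Halve male dosages and drop samples with unknown gender.

    dosages: dict mapping sample ID -> dosage (hypothetical input layout)
    genders: dict mapping sample ID -> gender code (1 = male, 2 = female)
    """
    adjusted = {}
    for sample, dosage in dosages.items():
        gender = genders.get(sample)
        if gender not in (1, 2):
            # Unknown gender: the sample is excluded from the analysis.
            continue
        # Male dosage is divided by 2; female dosage is left unchanged.
        adjusted[sample] = dosage / 2 if gender == 1 else dosage
    return adjusted


example = adjust_chrx_dosages(
    {"s1": 1.0, "s2": 1.0, "s3": 0.4},
    {"s1": 1, "s2": 2, "s3": 0},
)
```

In this example, `s1` (male) is halved, `s2` (female) is kept as is, and `s3` is dropped because its gender code is neither 1 nor 2.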

Dosage options

Option

Description

--scale INT

Scale the dosage so that values are in [0, n], where n is either 1 (no scaling) or 2. [2]

--prob FLOAT

The minimum probability required for a genotype to be considered. [>=0.9]

--maf FLOAT

Minor allele frequency threshold below which a marker is skipped. [<0.01]
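To make the dosage options more concrete, here is a minimal Python sketch of how a dosage could be derived from the three genotype probabilities that IMPUTE2 reports per sample. Several points are assumptions made for illustration and are not confirmed by this page: that a genotype is kept when its highest probability reaches the ‘--prob’ value, that the dosage counts the second allele (B), and that the allele frequency is estimated from the kept dosages. The function names are hypothetical.

```python
def dosage_from_probs(probs, scale=2, prob_threshold=0.9):
    """Return the dosage of the B allele, or None for a low-confidence call.

    probs is a (P(AA), P(AB), P(BB)) triplet for one sample.
    """
    if scale not in (1, 2):
        raise ValueError("scale must be 1 or 2")
    if max(probs) < prob_threshold:
        # Assumed rule: the best genotype must reach the threshold.
        return None
    _, p_ab, p_bb = probs
    dosage = p_ab + 2 * p_bb  # expected number of B alleles, in [0, 2]
    return dosage if scale == 2 else dosage / 2  # scale 1 gives [0, 1]


def minor_allele_frequency(dosages):
    """Estimate the MAF from dosages on the [0, 2] scale, ignoring None."""
    kept = [d for d in dosages if d is not None]
    if not kept:
        return None
    freq_b = sum(kept) / (2 * len(kept))
    return min(freq_b, 1 - freq_b)


probabilities = [(0.95, 0.05, 0.0), (0.0, 0.1, 0.9), (0.5, 0.5, 0.0)]
dosages = [dosage_from_probs(p) for p in probabilities]
maf = minor_allele_frequency(dosages)
skip_marker = maf is not None and maf < 0.01  # mirrors the ‘--maf’ default
```

The third sample has no genotype with a probability of at least 0.9, so it contributes no dosage under the assumed rule.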

Phenotype options

Option

Description

--covar NAME

The covariate names (in the phenotype file), separated by commas.

--categorical NAME

The names of the categorical variables, separated by commas (note that gender is always categorical).

--missing-value NAME

The value that denotes missing data in the phenotype file.

--sample-column NAME

The name of the sample ID column (in the phenotype file). [sample_id]

--interaction NAME

Add an interaction between the genotype and this variable.
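When the tool is driven from a script, the phenotype options above can be assembled programmatically. The following Python sketch only builds the argument list; the helper name is hypothetical, the variable names (`Age`, `Var1`) are placeholders, and the only flags used are the ones listed in this section.

```python
def phenotype_args(covar=(), categorical=(), missing_value=None,
                   sample_column="sample_id", interaction=None):
    """Build the phenotype-related command-line arguments as a list."""
    args = ["--sample-column", sample_column]
    if covar:
        # Several covariates are passed as one comma-separated value.
        args += ["--covar", ",".join(covar)]
    if categorical:
        args += ["--categorical", ",".join(categorical)]
    if missing_value is not None:
        args += ["--missing-value", missing_value]
    if interaction is not None:
        args += ["--interaction", interaction]
    return args


args = phenotype_args(covar=["Age", "Var1"], categorical=["Var1"],
                      missing_value="NA")
```

Returning a list rather than a single string avoids quoting problems if the result is later passed to something like `subprocess.run`.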

Cox’s proportional hazards model options

Option

Description

--time-to-event NAME

The time-to-event variable (in the phenotype file).

--event NAME

The event variable (1 if observed, 0 if not observed).

Linear regression options

Option

Description

--pheno-name NAME

The phenotype.

Logistic regression options

Option

Description

--pheno-name NAME

The phenotype.

Linear mixed effects options

Option

Description

--pheno-name NAME

The phenotype.

--use-ml

Fit the standard likelihood using maximum likelihood (ML) estimation instead of REML (default is REML).

--p-threshold FLOAT

The p-value threshold below which the real MixedLM analysis is performed. [<0.0001]

SKAT options

Option

Description

--snp-sets FILE

A file indicating a snp_set and an optional weight for every variant.

--outcome-type {continuous,discrete}

The variable type for the outcome. This will be passed to SKAT.

--skat-o

By default, the regular SKAT is used. Setting this flag will use the SKAT-O algorithm instead.

--pheno-name NAME

The phenotype.