genipe.formats package

Module contents

Submodules

genipe.formats.impute2 module

genipe.formats.impute2.matrix_from_line(impute2_line)

Generates the probability matrix from an IMPUTE2 line.

Parameters:
  • impute2_line (list) – a single line from IMPUTE2’s results, split on spaces
Returns:

a tuple containing the marker’s information (the first five values of the line) and the probability matrix (numpy array, float)

Return type:

tuple

The shape of the matrix is n x 3 where n is the number of samples. The columns represent the probability for AA, AB and BB.

Note

The impute2_line variable is a list of str corresponding to one line of IMPUTE2’s results, split on spaces.
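The layout described above can be illustrated with a minimal sketch. The function name sketch_matrix_from_line and the example line are hypothetical, and this is not genipe's actual implementation; it only assumes that the five marker fields are followed by three probabilities per sample.

```python
import numpy as np


def sketch_matrix_from_line(impute2_line):
    """Hypothetical sketch: split an IMPUTE2 line into marker info and an n x 3 matrix."""
    # The first five values describe the marker.
    marker_info = impute2_line[:5]

    # The remaining values are probabilities, three per sample (AA, AB, BB).
    probs = np.array(impute2_line[5:], dtype=float).reshape(-1, 3)

    return marker_info, probs


# A made-up line with two samples, split on spaces as the documentation requires.
line = "1 rs123 1000 A G 0.9 0.1 0 0 0.2 0.8".split(" ")
info, matrix = sketch_matrix_from_line(line)
```

Here matrix has one row per sample and info keeps the marker fields as strings.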

genipe.formats.impute2.get_good_probs(prob_matrix, min_prob=0.9)

Gathers good imputed genotypes (>= probability threshold).

Parameters:
  • prob_matrix (numpy.array) – the probability matrix
  • min_prob (float) – the probability threshold
Returns:

a mask array containing the positions where the probabilities are greater than or equal to the threshold

Return type:

numpy.array
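A minimal sketch of such a mask, assuming that a genotype counts as "good" when the highest of a sample's three probabilities reaches the threshold. The name sketch_good_probs is hypothetical and the exact rule used by genipe is not stated here.

```python
import numpy as np


def sketch_good_probs(prob_matrix, min_prob=0.9):
    """Hypothetical sketch: True where the best genotype probability >= min_prob."""
    # Assumption: the per-sample maximum is what gets compared to the threshold.
    return np.max(prob_matrix, axis=1) >= min_prob


probs = np.array([
    [0.95, 0.05, 0.0],  # confident AA
    [0.5, 0.4, 0.1],    # ambiguous
    [0.0, 0.1, 0.9],    # exactly at the threshold
])
mask = sketch_good_probs(probs)
```

The resulting boolean array can be used to index the rows of the probability matrix directly, for example probs[mask].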

genipe.formats.impute2.maf_from_probs(prob_matrix, a1, a2, gender=None, site_name=None)

Computes MAF from a probability matrix (and gender if chromosome X).

Parameters:
  • prob_matrix (numpy.array) – the probability matrix
  • a1 (str) – the first allele
  • a2 (str) – the second allele
  • gender (numpy.array) – the gender of the samples
  • site_name (str) – the name for this site
Returns:

a tuple containing three values: the minor allele frequency, the minor allele and the major allele.

Return type:

tuple

When gender is not None, the MAF is assumed to be required for chromosome X: males count as one allele and females as two. An exception is raised if any male is heterozygous.
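The allele counting described above can be sketched as follows. Several details are assumptions rather than documented behaviour: genotypes are taken as the most probable column, gender is coded 1 for males and 2 for females, and a ValueError stands in for whatever exception genipe raises. The name sketch_maf_from_probs is hypothetical.

```python
import numpy as np


def sketch_maf_from_probs(prob_matrix, a1, a2, gender=None):
    """Hypothetical sketch: returns (maf, minor allele, major allele)."""
    # Best-guess genotype per sample: 0 = a1/a1, 1 = a1/a2, 2 = a2/a2.
    geno = np.argmax(prob_matrix, axis=1)

    if gender is None:
        # Autosomal case: every sample carries two alleles.
        n_a2 = np.sum(geno)
        n_total = 2 * len(geno)
    else:
        # Chromosome X (assumed coding: 1 = male, 2 = female).
        males = gender == 1
        females = gender == 2
        if np.any(geno[males] == 1):
            raise ValueError("heterozygous male on chromosome X")
        # Females contribute 0, 1 or 2 copies of a2; males contribute 0 or 1.
        n_a2 = np.sum(geno[females]) + np.sum(geno[males] == 2)
        n_total = 2 * np.sum(females) + np.sum(males)

    freq_a2 = n_a2 / n_total
    if freq_a2 > 0.5:
        # a2 is actually the major allele, so flip.
        return 1 - freq_a2, a1, a2
    return freq_a2, a2, a1


probs = np.array([[1.0, 0, 0], [0.9, 0.1, 0], [0.2, 0.7, 0.1], [0, 0.1, 0.9]])
maf, minor, major = sketch_maf_from_probs(probs, "A", "G")

x_probs = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0], [1.0, 0, 0]])
x_gender = np.array([1, 2, 1, 2])
x_maf, x_minor, x_major = sketch_maf_from_probs(x_probs, "A", "G", gender=x_gender)
```

In the chromosome X example, the two males contribute one allele each and the two females two each, so the frequency is computed over six alleles instead of eight.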

genipe.formats.impute2.dosage_from_probs(homo_probs, hetero_probs, scale=2)

Computes dosage from probability matrix (for the minor allele).

Parameters:
  • homo_probs (numpy.array) – the probabilities for the homozygous genotype
  • hetero_probs (numpy.array) – the probabilities for the heterozygous genotype
  • scale (int) – the scale value
Returns:

the dosage computed from the probabilities

Return type:

numpy.array
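One common way to define a dosage is the expected number of copies of the minor allele, rescaled to the range 0 to scale. The sketch below assumes that definition; the name sketch_dosage is hypothetical and the formula is not confirmed by the text above.

```python
import numpy as np


def sketch_dosage(homo_probs, hetero_probs, scale=2):
    """Hypothetical sketch: expected minor allele dosage on a 0..scale range."""
    # A homozygous minor genotype counts fully, a heterozygous one counts half.
    return (homo_probs + hetero_probs / 2) * scale


homo = np.array([0.0, 1.0, 0.2])    # P(homozygous for the minor allele)
hetero = np.array([0.0, 0.0, 0.6])  # P(heterozygous)
dosage = sketch_dosage(homo, hetero)
```

With scale=2 the result reads as an expected allele count; with scale=1 it would read as an expected allele fraction.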

genipe.formats.impute2.hard_calls_from_probs(a1, a2, probs)

Computes hard calls from probability matrix.

Parameters:
  • a1 (str) – the first allele
  • a2 (str) – the second allele
  • probs (numpy.array) – the probability matrix
Returns:

the hard calls computed from the probabilities

Return type:

numpy.array
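A hard call is usually the single most probable genotype for each sample. The sketch below assumes that rule and assumes genotypes are written as two alleles separated by a space; both the output format and the name sketch_hard_calls are hypothetical.

```python
import numpy as np


def sketch_hard_calls(a1, a2, probs):
    """Hypothetical sketch: pick the most probable genotype for each sample."""
    # The three possible genotypes, in the same order as the matrix columns.
    genotypes = np.array([f"{a1} {a1}", f"{a1} {a2}", f"{a2} {a2}"])

    # Index of the largest probability in each row selects the genotype.
    return genotypes[np.argmax(probs, axis=1)]


probs = np.array([[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]])
calls = sketch_hard_calls("A", "G", probs)
```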

genipe.formats.impute2.maf_dosage_from_probs(prob_matrix, a1, a2, scale=2, gender=None, site_name=None)

Computes MAF and dosage vector from probs matrix.

Parameters:
  • prob_matrix (numpy.array) – the probability matrix
  • a1 (str) – the first allele
  • a2 (str) – the second allele
  • scale (int) – the scale value
  • gender (numpy.array) – the gender of the samples
  • site_name (str) – the name for this site
Returns:

a tuple containing four values: the dosage vector, the minor allele frequency, the minor allele and the major allele.

Return type:

tuple

When gender is not None, the MAF is assumed to be required for chromosome X: males count as one allele and females as two. An exception is raised if any male is heterozygous.
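The point of computing both values together is that the dosage must refer to whichever allele turns out to be the minor one. The sketch below shows this for the autosomal case only (no gender handling), assuming best-guess genotypes for the MAF and an expected-count dosage; the name sketch_maf_dosage is hypothetical.

```python
import numpy as np


def sketch_maf_dosage(prob_matrix, a1, a2, scale=2):
    """Hypothetical sketch (autosomes only): returns (dosage, maf, minor, major)."""
    # Frequency of a2 from best-guess genotypes (0, 1 or 2 copies of a2).
    geno = np.argmax(prob_matrix, axis=1)
    freq_a2 = np.sum(geno) / (2 * len(geno))

    if freq_a2 > 0.5:
        # a1 is the minor allele, so its homozygous column is the first one.
        minor, major, maf, homo_col = a1, a2, 1 - freq_a2, 0
    else:
        # a2 is the minor allele, so its homozygous column is the last one.
        minor, major, maf, homo_col = a2, a1, freq_a2, 2

    # Dosage of the minor allele, rescaled to the range 0..scale.
    dosage = (prob_matrix[:, homo_col] + prob_matrix[:, 1] / 2) * scale
    return dosage, maf, minor, major


probs = np.array([[1.0, 0, 0], [1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0]])
dosage, maf, minor, major = sketch_maf_dosage(probs, "A", "G")
```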

genipe.formats.impute2.additive_from_probs(a1, a2, probs)

Computes additive format from probability matrix.

Parameters:
  • a1 (str) – the first allele
  • a2 (str) – the second allele
  • probs (numpy.array) – the probability matrix
Returns:

the additive format computed from the probabilities, the minor allele and the major allele.

Return type:

tuple

The encoding is as follows: 0 when homozygous for the major allele, 1 when heterozygous and 2 when homozygous for the minor allele.

The minor and major alleles are inferred from the MAF. By default, a2 is assumed to be the minor allele; the alleles are flipped if the MAF shows otherwise.
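The encoding and the flip can be sketched as follows, assuming that the genotype of each sample is the most probable column of the matrix. The name sketch_additive is hypothetical and this is not genipe's actual code.

```python
import numpy as np


def sketch_additive(a1, a2, probs):
    """Hypothetical sketch: returns (additive coding, minor allele, major allele)."""
    # Best-guess genotype, which is also the number of a2 copies (0, 1 or 2).
    calls = np.argmax(probs, axis=1)

    # Start by assuming a2 is the minor allele.
    freq_a2 = np.sum(calls) / (2 * len(calls))
    if freq_a2 > 0.5:
        # a2 is the major allele: count copies of a1 instead.
        return 2 - calls, a1, a2
    return calls, a2, a1


# Here a2 ("G") is the more frequent allele, so the coding is flipped.
probs = np.array([[0.0, 0.1, 0.9], [0.0, 0.0, 1.0], [0.1, 0.8, 0.1], [0.9, 0.1, 0.0]])
additive, minor, major = sketch_additive("A", "G", probs)
```

After the flip, a sample that is homozygous for "G" is coded 0 and a sample that is homozygous for "A" is coded 2.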

genipe.formats.index module

genipe.formats.index.get_index(fn, cols, names, sep)

Restores the index for a given file.

Parameters:
  • fn (str) – the name of the file
  • cols (list) – a list containing the columns to keep (as int)
  • names (list) – the names corresponding to the columns to keep (as str)
  • sep (str) – the field separator
Returns:

the index

Return type:

pandas.DataFrame

If the index doesn’t exist for the file, it is first created.
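The "restore, or create on first use" behaviour can be sketched as below. Everything about the storage is an assumption made for illustration: the ".idx" suffix, the CSV format and the extra "seek" column holding each line's byte offset are hypothetical, as is the name sketch_get_index.

```python
import os
import tempfile

import pandas as pd


def sketch_get_index(fn, cols, names, sep, idx_suffix=".idx"):
    """Hypothetical sketch: load the index of fn, building it first if needed."""
    idx_fn = fn + idx_suffix

    # Restore the index if it was already created.
    if os.path.isfile(idx_fn):
        return pd.read_csv(idx_fn)

    # Otherwise scan the file once, keeping the requested columns and the
    # byte offset at which each line starts.
    rows = []
    with open(fn, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            fields = line.decode().rstrip("\r\n").split(sep)
            rows.append([fields[c] for c in cols] + [offset])

    index = pd.DataFrame(rows, columns=list(names) + ["seek"])
    index.to_csv(idx_fn, index=False)
    return index


# A made-up two-line file to index.
tmp_dir = tempfile.mkdtemp()
example_fn = os.path.join(tmp_dir, "example.impute2")
with open(example_fn, "w", newline="") as f:
    f.write("1 rs1 100 A G 1 0 0\n1 rs2 200 C T 0 1 0\n")

built = sketch_get_index(example_fn, cols=[1], names=["name"], sep=" ")     # creates the index
restored = sketch_get_index(example_fn, cols=[1], names=["name"], sep=" ")  # reads it back
```

Storing byte offsets lets a caller seek straight to a marker's line instead of scanning the whole file again.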

genipe.formats.index.get_open_func(fn, return_fmt=False)

Gets the opening function.

Parameters:
  • fn (str) – the name of the file
  • return_fmt (bool) – whether the file format should also be returned
Returns:

either the opening function alone or, when return_fmt is true, a tuple containing two elements: a boolean telling whether the format is bgzip, and the opening function.

Return type:

tuple
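Choosing an opening function from the file's format can be sketched as below. This is an illustration under assumptions: the bgzip check looks for the gzip magic bytes plus the "BC" extra subfield of a BGZF header, the standard library's gzip.open stands in for a BGZF-aware reader (it can decompress the data but offers no block-level random access), and returning the bare function when return_fmt is false is assumed. The name sketch_get_open_func is hypothetical.

```python
import gzip
import os
import tempfile


def sketch_get_open_func(fn, return_fmt=False):
    """Hypothetical sketch: pick an opening function from the file header."""
    with open(fn, "rb") as f:
        header = f.read(14)

    # BGZF blocks are gzip members with the FEXTRA flag set and a "BC" subfield.
    is_bgzip = (
        len(header) == 14
        and header[:4] == b"\x1f\x8b\x08\x04"
        and header[12:14] == b"BC"
    )

    # Stand-in reader: gzip.open can decompress BGZF data sequentially.
    open_func = gzip.open if is_bgzip else open

    if return_fmt:
        return is_bgzip, open_func
    return open_func


# The 28-byte empty BGZF block that terminates bgzip files, used as a tiny example.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 4243 0200 1b00 0300 00000000 00000000"
)

tmp_dir = tempfile.mkdtemp()
plain_fn = os.path.join(tmp_dir, "plain.txt")
with open(plain_fn, "w") as f:
    f.write("1 rs1 100 A G 1 0 0\n")

bgzf_fn = os.path.join(tmp_dir, "empty.bgz")
with open(bgzf_fn, "wb") as f:
    f.write(BGZF_EOF)

plain_opener = sketch_get_open_func(plain_fn)
is_bgzip, bgzf_opener = sketch_get_open_func(bgzf_fn, return_fmt=True)
```

The caller can then use the returned function in place of open without caring whether the file is compressed.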