Result Class

class pyqms.Results(lookup=None, params=None, fixed_labels=None, metabolic_labels=None, aa_compositions=None, isotopic_distributions=None, charges=None, verbose=False)

pyQms results class.

Holds all matching information and lookups. Can be accessed as a dictionary. Several lookups allow the mapping of molecular formulas to molecules (e.g. peptides) and/or trivial names (e.g. protein names).

Structure

key (named tuple)

file_name

formula

charge

label_percentiles

value (named tuple)

spec_id

rt

score

scaling_factor

peaks
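
The key/value layout above can be illustrated with plain named tuples. This is only a sketch: the tuple class names `MKey` and `MValue`, the sample values, and the idea of storing a list of values per key are assumptions made for illustration, while the field names are taken from the structure listed above.

```python
from collections import namedtuple

# Hypothetical stand-ins for the key/value named tuples; the class names are
# assumptions, the field names come from the documented structure.
MKey = namedtuple("MKey", ["file_name", "formula", "charge", "label_percentiles"])
MValue = namedtuple("MValue", ["spec_id", "rt", "score", "scaling_factor", "peaks"])

key = MKey(
    file_name="BSA1.mzML",
    formula="C(33)H(59)14N(1)N(8)O(9)S(1)",
    charge=2,
    label_percentiles=(("N", 0.0),),  # ((element, enrichment in %),)
)
# Illustrative values only; peaks is left empty as a placeholder.
value = MValue(spec_id=1337, rt=29.5, score=0.92, scaling_factor=1.0e5, peaks=())

# A plain dict keyed by the named tuple mimics dictionary-style access.
results = {key: [value]}
best = max(results[key], key=lambda v: v.score)
```

Because named tuples are hashable, the full combination of file name, formula, charge and label percentiles can serve directly as a dictionary key.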

add(key, value)

Adds match to the result container.

Parameters:

key (named tuple) – (file_name, formula, charge, label_percentiles)
value (named tuple) – (spec_id, rt, score, scaling_factor, peaks)

Returns: formatted_key (named tuple) : ( file_name, formula, charge, label_percentiles )

Structure

key (named tuple)

file_name

formula

charge

label_percentiles

value (named tuple)

spec_id

rt

score

scaling_factor

peaks

calc_amounts_from_rt_info_file(rt_info_file=None, rt_border_tolerance=None, calc_amount_function=None, evidence_score_field='PEP', buffer_only=False, buffered_csv_dicts=None)

Function to calculate molecule/peptide amounts based on the quant summary/rt info file generated by write_rt_info_file(). See, for example, the example script generate_quant_summary_file.py. A function to calculate the final molecule amounts can be supplied; otherwise the default maximum intensity function is used.

Parameters:

rt_info_file (str) – output file name of the quant summary/rt info csv file, must be a complete or relative path
rt_border_tolerance (int) – retention time border tolerance in minutes
calc_amount_function (obj) – python function to calculate final amounts based on a simple dictionary structure

The function that calculates the amount of the molecules (calc_amount_function) must be able to process the dictionary structure shown below (obj_for_calc_amount). The default function returns the maximum amount in the retention time window or in the complete profile. The function should return the determined amount, the (approximate) retention time and the score. If a function determines the amount over more than one spectrum, the retention time reported for this amount should be, for example, the one at the maximum intensity of the profile. Scores could be averaged, or the score at the maximum amount could be used.

Example for obj_for_calc_amount:

{
    'rt'       : [rt1,rt2,...],
    'i'        : [in1,in2,...],
    'scores'   : [sc1,sc2,...],
    'spec_ids' : [id1,id2,...],
}

Example key names (default):

‘max I in window’

‘max I in window (rt)’

‘max I in window (score)’

‘auc in window’ (area under curve)

‘sum I in window’ (summed up intensities)
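
A custom calc_amount_function could look like the following minimal sketch, which integrates the elution profile with the trapezoidal rule. The function name `calc_auc_amount`, the returned key names with the `(rt)` and `(score)` suffixes, and the choice to report retention time and score at the intensity apex are assumptions for illustration, not the library's own implementation. It assumes a non-empty profile.

```python
def calc_auc_amount(obj_for_calc_amount):
    """Trapezoidal area under the intensity curve of one elution profile."""
    rts = obj_for_calc_amount["rt"]
    intensities = obj_for_calc_amount["i"]
    scores = obj_for_calc_amount["scores"]

    # Sum the trapezoid between each pair of neighbouring spectra.
    auc = 0.0
    for k in range(1, len(rts)):
        auc += 0.5 * (intensities[k] + intensities[k - 1]) * (rts[k] - rts[k - 1])

    # Report rt and score at the most intense spectrum (an assumption).
    apex = max(range(len(intensities)), key=lambda k: intensities[k])
    return {
        "auc in window": auc,
        "auc in window (rt)": rts[apex],       # hypothetical key name
        "auc in window (score)": scores[apex],  # hypothetical key name
    }


# Made-up profile in the documented obj_for_calc_amount layout.
profile = {
    "rt": [10.0, 10.5, 11.0],
    "i": [100.0, 300.0, 200.0],
    "scores": [0.7, 0.9, 0.8],
    "spec_ids": [1, 2, 3],
}
result = calc_auc_amount(profile)
```

Such a function would then be passed as calc_amount_function; how the returned keys are mapped to csv columns should be checked against the library source.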

curate_rt_windows(evidence_dict, rt_tolerance)

Internal function to curate RT windows.

determine_max_itensity(obj_for_calc_amount)

Function to determine the maximum intensity in a given elution profile. The structure of the object passed to the function is shown below. This function can be used as a template for writing your own function to determine the amount of a molecule, e.g. area under curve or summed up intensities. All self-written functions must return the amount, the retention time at or around the amount and the mScore at or around the amount in a dictionary with appropriate key names.

Example key names (default):

‘max I in window’

‘max I in window (rt)’

‘max I in window (score)’

Note

This is the default function to determine the peptide amount when write_amount_csv() is called.

Examples:

{
   'rt'       : [rt1,rt2,...],
   'i'        : [in1,in2,...],
   'scores'   : [sc1,sc2,...],
   'spec_ids' : [id1,id2,...],
}

Returns:	keys are shown above
Return type:	dict
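
The behaviour described for the default function can be approximated as below. This is a sketch under stated assumptions rather than the library's code: the name `max_intensity_sketch` is hypothetical, a non-empty profile is assumed, and ties are resolved in favour of the earliest spectrum.

```python
def max_intensity_sketch(obj_for_calc_amount):
    """Return amount, rt and score at the most intense point of a profile."""
    intensities = obj_for_calc_amount["i"]
    # Index of the highest intensity; max() keeps the first one on ties.
    apex = max(range(len(intensities)), key=lambda k: intensities[k])
    return {
        "max I in window": intensities[apex],
        "max I in window (rt)": obj_for_calc_amount["rt"][apex],
        "max I in window (score)": obj_for_calc_amount["scores"][apex],
    }


# Made-up elution profile for illustration.
profile = {
    "rt": [20.1, 20.3, 20.5, 20.7],
    "i": [5.0e4, 2.0e5, 1.5e5, 3.0e4],
    "scores": [0.71, 0.95, 0.90, 0.65],
    "spec_ids": [101, 102, 103, 104],
}
amount_info = max_intensity_sketch(profile)
```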

extract_results(molecules=None, charges=None, file_names=None, label_percentiles=None, formulas=None, score_threshold=None)

Extract selected results.

Extracts all matches from the results instance that meet given filter criteria.

Parameters:

molecules (list of str, optional) – considered molecules. Those will be translated using self._translate_molecules_to_formulas()
charges (list of int, optional) – considered charge states.
file_names (list of str, optional) – list of file names to be considered.
label_percentiles (list of tuple, optional) – list of label percentile tuples to be considered.
formulas (list of str) – list of chemical formulas

Yields:

key, i, entry (tuple) – result class key, index of entry and entry

Structure

key (named tuple)

file_name

formula

charge

label_percentiles

value (named tuple)

spec_id

rt

score

scaling_factor

peaks
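
The filtering and the yielded `(key, i, entry)` triples can be pictured with a small generator over a plain dictionary. Everything here is a stand-in: `MKey`, `MValue`, `extract_sketch` and the sample data are hypothetical, and treating score_threshold as a lower bound on the entry score is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-ins; field names follow the structure listed above.
MKey = namedtuple("MKey", ["file_name", "formula", "charge", "label_percentiles"])
MValue = namedtuple("MValue", ["spec_id", "rt", "score", "scaling_factor", "peaks"])


def extract_sketch(results, charges=None, file_names=None, formulas=None,
                   score_threshold=None):
    """Yield (key, index, entry) for every entry passing all given filters."""
    for key, entries in results.items():
        # A filter set to None is treated as "accept everything".
        if charges is not None and key.charge not in charges:
            continue
        if file_names is not None and key.file_name not in file_names:
            continue
        if formulas is not None and key.formula not in formulas:
            continue
        for i, entry in enumerate(entries):
            # Assumption: entries scoring below the threshold are skipped.
            if score_threshold is not None and entry.score < score_threshold:
                continue
            yield key, i, entry


key = MKey("BSA1.mzML", "C(33)H(59)14N(1)N(8)O(9)S(1)", 2, (("N", 0.0),))
results = {
    key: [
        MValue(1, 29.4, 0.55, 1.0e4, ()),
        MValue(2, 29.5, 0.92, 1.0e5, ()),
    ]
}
hits = list(extract_sketch(results, charges=[2], score_threshold=0.7))
```

Using a generator keeps memory use low when a result set holds many spectra, since matches are produced one at a time.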

max_score(molecules=None, charges=None, file_names=None, label_percentiles=None, formulas=None)

Find max score for a given set of parameters.

Parameters:

molecules (list of str, optional) – considered molecules. Those will be translated using self._translate_molecules_to_formulas()
charges (list of int, optional) – considered charge states.
file_names (list of str, optional) – list of file names to be considered.
label_percentiles (list of tuple, optional) – list of label percentile tuples to be considered.
formulas (list of str) – list of chemical formulas

Returns:	best_score, key, index, where key is the corresponding key in the results dictionary
Return type:	tuple

plot_MIC_3D(key, file_name=None, rt_window=None, i_transform=None)

Plot MIC from results using rpy2 in 3D.

plot_MICs_2D(key_list, file_name=None, rt_window=None, i_transform=None, xlimits=None, additional_legends=None, title=None, zlimits=None, ablines=None, graphics=None)

Parameters:	additional_legends (dict) – keys point to lists of strings that are plotted as well.

write_result_csv(output_file_name=None)

Write raw results into a .csv file

Parameters:	output_file_name (str) – output file name of the csv containing all raw results, should be a complete path

Warning

Depending on data size, the resulting csv can become very large. Some csv viewers cannot handle files with a large number of lines.

Keys in csv:

Formula : molecular formula of the molecule (str)

Molecule : molecule or trivial name (str)

Charge : charge of the molecule (int)

ScanID : ScanID of the quantified spectrum (int)

Label Percentiles : Labeling percentile ( (element, enrichment in %), )

Amount : the determined amount of the molecule

Retention Time : retention time of the ScanID

mScore : score of the isotopologue match

Filename : filename of spectrum input files
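
The column layout listed above can be reproduced with the standard library `csv` module, for instance to test downstream parsing without a full run. The column order and all row values below are made up for illustration, and writing to an in-memory buffer instead of a file is a choice made to keep the sketch self-contained.

```python
import csv
import io

# Column names as listed above; their order here is an assumption.
fieldnames = [
    "Formula", "Molecule", "Charge", "ScanID", "Label Percentiles",
    "Amount", "Retention Time", "mScore", "Filename",
]

# Illustrative rows only, not real measurements.
rows = [
    {"Formula": "C(33)H(59)14N(1)N(8)O(9)S(1)", "Molecule": "PEPTIDE",
     "Charge": 2, "ScanID": 1, "Label Percentiles": "(('N', 0.0),)",
     "Amount": 1.0e4, "Retention Time": 29.4, "mScore": 0.55,
     "Filename": "BSA1.mzML"},
    {"Formula": "C(33)H(59)14N(1)N(8)O(9)S(1)", "Molecule": "PEPTIDE",
     "Charge": 2, "ScanID": 2, "Label Percentiles": "(('N', 0.0),)",
     "Amount": 1.0e5, "Retention Time": 29.5, "mScore": 0.92,
     "Filename": "BSA1.mzML"},
]

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Read the rows back and keep only matches with an mScore of at least 0.7.
buffer.seek(0)
good = [row for row in csv.DictReader(buffer) if float(row["mScore"]) >= 0.7]
```

Note that `csv.DictReader` returns every field as a string, so numeric columns such as mScore and ScanID need explicit conversion.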

write_result_mztab(output_file_name=None, rt_border_tolerance=None)

Write minimal peptide quantification results into a .mztab file. It is necessary to specify the ‘formula to evidences’ dict in the lookup of the results class to write results!

Note:

This basic mzTab writer is still in beta stage. Use and evaluate with care.

The PRIDE CV based quantification unit and value are fixed to:

PRIDE:0000393, Relative quantification unit

PRIDE:0000425, MS1 intensity based label-free quantification method

Args:

output_file_name (str): output file name of the mztab containing all raw results, should be a complete path

Note:

Additional information has to be passed to the result class for a more complete mztab output.

Keys in mztab:

sequence

accession

unique

database

database_version

search_engine

best_search_engine_score[1-n]

modifications

retention_time

retention_time_window

charge

mass_to_charge

peptide_abundance_study_variable[1-n]

peptide_abundance_stdev_study_variable[1-n]

peptide_abundance_std_error_study_variable[1-n]

search_engine_score[1-n]_ms_run[1-n]

peptide_abundance_assay[1-n]

spectra_ref

opt_{identifier}_*

reliability

uri

Additional information can be added to the mzTab file by adding a dict like the one shown below to the results.lookup dict under the key ‘mztab_meta_info’:

mztab_meta_info = {
    'protein_search_engine_score'   : [],
    'psm_search_engine_score'       : ['[MS,MS:1001475,OMSSA:evalue, ]'],
    'fixed_mod'                     : ['[UNIMOD, UNIMOD:4, Carbamidomethyl, ]'],
    'variable_mod'                  : ['[UNIMOD, UNIMOD:35, Oxidation, ]'],
    'study_variable-description'    : ['Standard BSA measurement'],
    'ms_run-location'               : ['BSA1.mzML'],
}

write_rt_info_file(output_file=None, list_of_csvdicts=None, trivial_name_lookup=None, rt_border_tolerance=None, update=True, buffer_only=False)

Function to write a default quant summary/rt info file. See e.g. example script generate_quant_summary_file.py.

Parameters:

output_file (str) – output file name of the csv, should be a complete path
list_of_csvdicts (list) – list of dictionaries passed to the DictWriter class, default fieldnames can be found below
trivial_name_lookup (dict) – self defined trivial_name_lookup, see format below.
rt_border_tolerance (int) – retention time border tolerance in minutes
update (bool) – if True read in or passed dictionaries in list_of_csvdicts will be updated with default evidence and trivial name information

The quant summary file can be updated manually (e.g. the start and stop RT information). If an evidence lookup is present in the result class (it can be passed to the isotopologue library or set later in the result class), this information is used to define the retention time borders (e.g. peptide identification information from peptide spectrum matches).

Default fieldnames:

file_name : filename of spectrum input file

formula : molecular formula of the molecule

molecule : molecule or trivial name

trivial_name(s) : protein or trivial names

label_percentiles : labeling percentile ( (element, enrichment in %), )

charge : charge of the molecule

start (min) : start of retention time window

stop (min) : stop of retention time window

max I in window : maximum intensity in retention time window

max I in window (rt) : retention time @ maximum intensity in retention time window

max I in window (score) : score @ maximum intensity in retention time window

auc in window : area under curve in retention time window

sum I in window : summed up intensities in retention time window

evidences (min) : all evidences/identifications (score@rt;…)

Trivial name lookup example:

{
    'C(33)H(59)14N(1)N(8)O(9)S(1)' : ['BSA','Bovine serum albumine']
}
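
A trivial name lookup in this format maps each chemical formula to a list of names. The helper below sketches how such a lookup might be turned into a single value for a csv column; the function name `trivial_names_for`, the `;` separator and the second, unknown formula are assumptions for illustration only.

```python
trivial_name_lookup = {
    "C(33)H(59)14N(1)N(8)O(9)S(1)": ["BSA", "Bovine serum albumine"],
}


def trivial_names_for(formula, lookup, separator=";"):
    """Join all trivial names known for a formula; empty string if none."""
    return separator.join(lookup.get(formula, []))


names = trivial_names_for("C(33)H(59)14N(1)N(8)O(9)S(1)", trivial_name_lookup)
# A formula missing from the lookup yields an empty string instead of an error.
unknown = trivial_names_for("C(2)H(6)O(1)", trivial_name_lookup)
```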