
Result Summary and Utility Functions

from pyXLMS import __version__
print(f"Installed pyXLMS version: {__version__}")
Installed pyXLMS version: 1.5.1
from pyXLMS import parser
from pyXLMS import transform

All data transformation functionality is available via the transform submodule. We also import the parser submodule here for reading result files.

parser_result = parser.read(
    "../../data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1.pdResult",
    engine="MS Annika",
    crosslinker="DSS",
)
Reading MS Annika CSMs...: 100%|██████████| 826/826 [00:00<00:00, 8136.07it/s]
Reading MS Annika crosslinks...: 100%|██████████| 300/300 [00:00<00:00, 13044.83it/s]

We read crosslink-spectrum-matches and crosslinks using the generic parser from a single .pdResult file.

csms = parser_result["crosslink-spectrum-matches"]
xls = parser_result["crosslinks"]

For easier access we assign our crosslink-spectrum-matches to the variable csms and our crosslinks to the variable xls.

Result Summary

csms_summary = transform.summary(csms)
Number of CSMs: 826.0
Number of unique CSMs: 826.0
Number of intra CSMs: 803.0
Number of inter CSMs: 23.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 39.0
Number of decoy-decoy CSMs: 1.0
Minimum CSM score: 1.1132827525593785
Maximum CSM score: 452.9861536355926

We can get a summary of our crosslink-spectrum-matches by calling transform.summary() and passing the crosslink-spectrum-matches as the first argument. The function returns a dictionary containing summary statistics. You can read more about the summary() function and all its parameters here: docs.

csms_summary
{'Number of CSMs': 826.0, 'Number of unique CSMs': 826.0, 'Number of intra CSMs': 803.0, 'Number of inter CSMs': 23.0, 'Number of target-target CSMs': 786.0, 'Number of target-decoy CSMs': 39.0, 'Number of decoy-decoy CSMs': 1.0, 'Minimum CSM score': 1.1132827525593785, 'Maximum CSM score': 452.9861536355926}

The dictionary returned by summary(). By default all summary statistics are also printed to stdout.
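
Because summary() returns a plain dictionary, its values can be reused in further calculations. The following is a minimal sketch that copies some of the values shown above into a literal dictionary (so it runs without pyXLMS) and derives two numbers from them: the share of inter CSMs and a decoy-based error estimate. The helper names inter_share() and estimated_fdr() are hypothetical, and the (TD - DD) / TT formula is an assumption: it is a commonly used estimate in crosslinking mass spectrometry, not necessarily what pyXLMS computes internally.

```python
# Values copied from the summary shown above; with pyXLMS installed this
# dictionary would come from transform.summary(csms).
csms_summary = {
    "Number of CSMs": 826.0,
    "Number of inter CSMs": 23.0,
    "Number of target-target CSMs": 786.0,
    "Number of target-decoy CSMs": 39.0,
    "Number of decoy-decoy CSMs": 1.0,
}


def inter_share(summary: dict) -> float:
    """Fraction of all CSMs that are inter CSMs (hypothetical helper)."""
    return summary["Number of inter CSMs"] / summary["Number of CSMs"]


def estimated_fdr(summary: dict) -> float:
    """Decoy-based estimate (TD - DD) / TT (hypothetical helper, assumed formula)."""
    tt = summary["Number of target-target CSMs"]
    td = summary["Number of target-decoy CSMs"]
    dd = summary["Number of decoy-decoy CSMs"]
    return (td - dd) / tt


print(f"Inter share: {inter_share(csms_summary):.3f}")
print(f"Estimated FDR: {estimated_fdr(csms_summary):.3f}")
```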

validated_csms = transform.validate(csms)
_ = transform.summary(validated_csms)
Iterating over scores for FDR calculation...:  15%|█▌        | 121/826 [00:00<00:00, 30246.78it/s]
Number of CSMs: 705.0
Number of unique CSMs: 705.0
Number of intra CSMs: 701.0
Number of inter CSMs: 4.0
Number of target-target CSMs: 699.0
Number of target-decoy CSMs: 6.0
Number of decoy-decoy CSMs: 0.0
Minimum CSM score: 34.188549584398956
Maximum CSM score: 452.9861536355926

You can call summary() again after any data transformation step to see how your results change. The example above shows this after validating the crosslink-spectrum-matches.
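
To make such a before/after comparison explicit, the two summary dictionaries can be subtracted key by key. The sketch below uses values copied from the two summaries printed above rather than calling pyXLMS, and summary_delta() is a hypothetical helper introduced only for illustration.

```python
# Values copied from the summaries before and after validation shown above.
before = {
    "Number of CSMs": 826.0,
    "Number of intra CSMs": 803.0,
    "Number of inter CSMs": 23.0,
    "Number of target-target CSMs": 786.0,
    "Number of target-decoy CSMs": 39.0,
    "Number of decoy-decoy CSMs": 1.0,
}
after = {
    "Number of CSMs": 705.0,
    "Number of intra CSMs": 701.0,
    "Number of inter CSMs": 4.0,
    "Number of target-target CSMs": 699.0,
    "Number of target-decoy CSMs": 6.0,
    "Number of decoy-decoy CSMs": 0.0,
}


def summary_delta(before: dict, after: dict) -> dict:
    """Change per statistic for keys present in both summaries (hypothetical helper)."""
    return {key: after[key] - before[key] for key in before if key in after}


delta = summary_delta(before, after)
for key, change in delta.items():
    # The '+' format flag prints an explicit sign for gains and losses.
    print(f"{key}: {change:+}")
```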

_ = transform.summary(xls)
Number of crosslinks: 300.0
Number of unique crosslinks by peptide: 300.0
Number of unique crosslinks by protein: 298.0
Number of intra crosslinks: 279.0
Number of inter crosslinks: 21.0
Number of target-target crosslinks: 265.0
Number of target-decoy crosslinks: 0.0
Number of decoy-decoy crosslinks: 35.0
Minimum crosslink score: 1.1132827525593785
Maximum crosslink score: 452.9861536355926

Similarly, we can get a summary of our crosslinks by calling transform.summary() and passing the crosslinks as the first argument.

_ = transform.summary(parser_result)
Number of CSMs: 826.0
Number of unique CSMs: 826.0
Number of intra CSMs: 803.0
Number of inter CSMs: 23.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 39.0
Number of decoy-decoy CSMs: 1.0
Minimum CSM score: 1.1132827525593785
Maximum CSM score: 452.9861536355926
Number of crosslinks: 300.0
Number of unique crosslinks by peptide: 300.0
Number of unique crosslinks by protein: 298.0
Number of intra crosslinks: 279.0
Number of inter crosslinks: 21.0
Number of target-target crosslinks: 265.0
Number of target-decoy crosslinks: 0.0
Number of decoy-decoy crosslinks: 35.0
Minimum crosslink score: 1.1132827525593785
Maximum crosslink score: 452.9861536355926

Lastly, we can also get a summary of our parser_result by calling transform.summary() and passing the parser_result as the first argument.

You might want to convert your crosslink-spectrum-matches and crosslinks to a pandas.DataFrame to work with pandas or simply to save them to a file, for example to process them somewhere else. For this purpose the transform.to_dataframe() function exists: it takes a list of crosslink-spectrum-matches or crosslinks as input and returns them as a pandas.DataFrame. You can read more about the to_dataframe() function and its parameters here: docs.

df = transform.to_dataframe(csms)
df.head(5)

Crosslink-spectrum-matches converted to a pandas.DataFrame.

loaded_csms = transform.from_dataframe(df)
_ = transform.summary(loaded_csms)
Reading CSMs...: 100%|██████████| 826/826 [00:00<00:00, 2650.47it/s]
Number of CSMs: 826.0
Number of unique CSMs: 826.0
Number of intra CSMs: 803.0
Number of inter CSMs: 23.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 39.0
Number of decoy-decoy CSMs: 1.0
Minimum CSM score: 1.1132827525593785
Maximum CSM score: 452.9861536355926

We can also load our crosslink-spectrum-matches back from the pandas.DataFrame using transform.from_dataframe(). You can read more about the from_dataframe() function and its parameters here: docs.

df = transform.to_dataframe(xls)
df.head(5)

Similarly, we can convert our crosslinks to a pandas.DataFrame with transform.to_dataframe()…

loaded_xls = transform.from_dataframe(df)
_ = transform.summary(loaded_xls)
Reading crosslinks...: 100%|██████████| 300/300 [00:00<00:00, 7895.46it/s]
Number of crosslinks: 300.0
Number of unique crosslinks by peptide: 300.0
Number of unique crosslinks by protein: 298.0
Number of intra crosslinks: 279.0
Number of inter crosslinks: 21.0
Number of target-target crosslinks: 265.0
Number of target-decoy crosslinks: 0.0
Number of decoy-decoy crosslinks: 35.0
Minimum crosslink score: 1.1132827525593785
Maximum crosslink score: 452.9861536355926

…and load them back with transform.from_dataframe().

If you want to export your crosslink-spectrum-matches or crosslinks to ProForma notation, you can do that with transform.to_proforma(), which you can read more about here: docs.

Let’s go through a few examples:

csm = csms[0]
csm
{'data_type': 'crosslink-spectrum-match', 'completeness': 'full', 'alpha_peptide': 'GQKNSR', 'alpha_modifications': {3: ('DSS', 138.06808)}, 'alpha_peptide_crosslink_position': 3, 'alpha_proteins': ['Cas9'], 'alpha_proteins_crosslink_positions': [779], 'alpha_proteins_peptide_positions': [777], 'alpha_score': 119.82548987540834, 'alpha_decoy': False, 'beta_peptide': 'GQKNSR', 'beta_modifications': {3: ('DSS', 138.06808)}, 'beta_peptide_crosslink_position': 3, 'beta_proteins': ['Cas9'], 'beta_proteins_crosslink_positions': [779], 'beta_proteins_peptide_positions': [777], 'beta_score': 119.82547820493929, 'beta_decoy': False, 'crosslink_type': 'intra', 'score': 119.82547820493929, 'spectrum_file': 'XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw', 'scan_nr': 2257, 'charge': 3, 'retention_time': 733.1895599999999, 'ion_mobility': 0.0, 'additional_information': None}

Let’s consider the above crosslink-spectrum-match of a homeotypic crosslink where both peptides are GQKNSR, crosslinked with DSS at K3.

csm_proforma = transform.to_proforma(csm)

Calling to_proforma() on a crosslink-spectrum-match will give us the crosslink-spectrum-match in ProForma notation (as a str).

csm_proforma
'GQK[+138.06808]NSR//GQK[+138.06808]NSR/3'

The crosslink modifications are automatically denoted with their corresponding monoisotopic delta masses, for DSS that is +138.06808 Da.
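
To illustrate how such a string is composed from the fields of a crosslink-spectrum-match, here is a minimal sketch that rebuilds the notation by hand. It is not the pyXLMS implementation: peptide_to_proforma() and csm_to_proforma() are hypothetical helpers, the dictionary holds only the fields needed from the crosslink-spectrum-match shown above, and the sketch assumes that modification positions are 1-based residue indices (terminal modifications are not handled).

```python
def peptide_to_proforma(peptide: str, modifications: dict) -> str:
    """Insert '[+mass]' after each modified residue (1-based positions)."""
    parts = []
    for position, residue in enumerate(peptide, start=1):
        parts.append(residue)
        if position in modifications:
            _name, mass = modifications[position]
            # The '+' format flag writes the delta mass with an explicit sign.
            parts.append(f"[{mass:+}]")
    return "".join(parts)


def csm_to_proforma(csm: dict) -> str:
    """Join both peptides with '//' and append the charge after '/'."""
    alpha = peptide_to_proforma(csm["alpha_peptide"], csm["alpha_modifications"])
    beta = peptide_to_proforma(csm["beta_peptide"], csm["beta_modifications"])
    return f"{alpha}//{beta}/{csm['charge']}"


# Subset of the fields of the crosslink-spectrum-match shown above.
example_csm = {
    "alpha_peptide": "GQKNSR",
    "alpha_modifications": {3: ("DSS", 138.06808)},
    "beta_peptide": "GQKNSR",
    "beta_modifications": {3: ("DSS", 138.06808)},
    "charge": 3,
}

print(csm_to_proforma(example_csm))
```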

csms_proforma = transform.to_proforma(csms)

We can also pass a list of crosslink-spectrum-matches to to_proforma()…

csms_proforma[:5]
['GQK[+138.06808]NSR//GQK[+138.06808]NSR/3', 'GQK[+138.06808]NSR//GSQK[+138.06808]DR/3', 'SDK[+138.06808]NR//SDK[+138.06808]NR/3', 'DK[+138.06808]QSGK//DK[+138.06808]QSGK/3', 'DK[+138.06808]QSGK//HSIK[+138.06808]K/3']

…which will return a list of str instead, containing the corresponding crosslink-spectrum-matches in ProForma notation.
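
If the exported strings need to be taken apart again, for example to recover the unmodified sequences, a few lines of standard-library Python are enough for the simple strings shown here. The sketch below is an assumption-laden illustration and not a ProForma parser: split_crosslink_proforma() is a hypothetical helper that only understands the `alpha//beta` and `alpha//beta/charge` shapes produced in this tutorial.

```python
import re


def split_crosslink_proforma(proforma: str):
    """Return (alpha_sequence, beta_sequence, charge or None) for simple strings."""
    # '//' separates the two crosslinked peptides.
    alpha, _, rest = proforma.partition("//")
    # An optional trailing '/<charge>' follows the second peptide.
    beta, separator, charge_text = rest.partition("/")
    charge = int(charge_text) if separator else None

    def strip_modifications(peptide: str) -> str:
        # Remove every bracketed modification tag such as '[+138.06808]'.
        return re.sub(r"\[[^\]]*\]", "", peptide)

    return strip_modifications(alpha), strip_modifications(beta), charge


print(split_crosslink_proforma("GQK[+138.06808]NSR//GSQK[+138.06808]DR/3"))
print(split_crosslink_proforma("GQKNSR//GQKNSR"))
```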

xl = xls[0]
xl
{'data_type': 'crosslink', 'completeness': 'full', 'alpha_peptide': 'GQKNSR', 'alpha_peptide_crosslink_position': 3, 'alpha_proteins': ['Cas9'], 'alpha_proteins_crosslink_positions': [779], 'alpha_decoy': False, 'beta_peptide': 'GQKNSR', 'beta_peptide_crosslink_position': 3, 'beta_proteins': ['Cas9'], 'beta_proteins_crosslink_positions': [779], 'beta_decoy': False, 'crosslink_type': 'intra', 'score': 119.82547820493929, 'additional_information': None}

Alternatively, consider the above crosslink which we get from the previous crosslink-spectrum-match: again both peptides are GQKNSR and are crosslinked at the K3 residue.

xl_proforma = transform.to_proforma(xl)

We can get the ProForma str of the crosslink by passing the crosslink as the first argument to to_proforma().

xl_proforma
'GQKNSR//GQKNSR'

Because a crosslink is an aggregation of crosslink-spectrum-matches, it does not itself contain any information about post-translational modifications, so the returned ProForma string does not include the crosslinker.

xl_proforma = transform.to_proforma(xl, crosslinker="Xlink:DSS")
xl_proforma
'GQK[Xlink:DSS]NSR//GQK[Xlink:DSS]NSR'

We can explicitly pass the name or mass of the crosslinker via the crosslinker parameter to include it in the ProForma string.
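
The effect of the crosslinker parameter on a crosslink can be pictured as placing a bracketed tag after the crosslinked residue of each peptide. The following sketch mimics that behaviour for a string label under stated assumptions: crosslink_to_proforma() is a hypothetical helper, not the pyXLMS function, and it treats the crosslink positions as 1-based indices into the peptide.

```python
from typing import Optional


def crosslink_to_proforma(xl: dict, crosslinker: Optional[str] = None) -> str:
    """Build 'alpha//beta', tagging the crosslinked residues if a label is given."""

    def annotate(peptide: str, position: int) -> str:
        if crosslinker is None:
            return peptide
        # With 1-based positions, slicing at `position` puts the tag
        # directly after the crosslinked residue.
        return f"{peptide[:position]}[{crosslinker}]{peptide[position:]}"

    alpha = annotate(xl["alpha_peptide"], xl["alpha_peptide_crosslink_position"])
    beta = annotate(xl["beta_peptide"], xl["beta_peptide_crosslink_position"])
    return f"{alpha}//{beta}"


# Subset of the fields of the crosslink shown above.
example_xl = {
    "alpha_peptide": "GQKNSR",
    "alpha_peptide_crosslink_position": 3,
    "beta_peptide": "GQKNSR",
    "beta_peptide_crosslink_position": 3,
}

print(crosslink_to_proforma(example_xl))
print(crosslink_to_proforma(example_xl, crosslinker="Xlink:DSS"))
```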

csm_proforma = transform.to_proforma(csm, crosslinker="Xlink:DSS")
csm_proforma
'GQK[+138.06808]NSR//GQK[+138.06808]NSR/3'

Passing a crosslinker via the parameter crosslinker when there is already a crosslinker specified, as for example in our previous crosslink-spectrum-match, will have no effect.

xls_proforma = transform.to_proforma(xls)

We can also pass a list of crosslinks to to_proforma()…

xls_proforma[:5]
['GQKNSR//GQKNSR', 'GQKNSR//GSQKDR', 'SDKNR//SDKNR', 'DKQSGK//DKQSGK', 'DKQSGK//HSIKK']

…which will return a list of str instead, containing the corresponding crosslinks in ProForma notation.
