Result Summary and Utility Functions
from pyXLMS import __version__
print(f"Installed pyXLMS version: {__version__}")
Installed pyXLMS version: 1.5.1
from pyXLMS import parser
from pyXLMS import transform
All data transformation functionality is available via the transform submodule. We also import the parser submodule here for reading result files.
parser_result = parser.read(
"../../data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1.pdResult",
engine="MS Annika",
crosslinker="DSS",
)
Reading MS Annika CSMs...: 100%|██████████| 826/826 [00:00<00:00, 8136.07it/s]
Reading MS Annika crosslinks...: 100%|██████████| 300/300 [00:00<00:00, 13044.83it/s]
We read crosslink-spectrum-matches and crosslinks from a single .pdResult file using the generic parser.
csms = parser_result["crosslink-spectrum-matches"]
xls = parser_result["crosslinks"]
For easier access, we assign our crosslink-spectrum-matches to the variable csms and our crosslinks to the variable xls.
Result Summary
csms_summary = transform.summary(csms)
Number of CSMs: 826.0
Number of unique CSMs: 826.0
Number of intra CSMs: 803.0
Number of inter CSMs: 23.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 39.0
Number of decoy-decoy CSMs: 1.0
Minimum CSM score: 1.1132827525593785
Maximum CSM score: 452.9861536355926
We can get a summary of our crosslink-spectrum-matches by calling transform.summary() and passing them as the first argument. The function returns a dictionary containing the summary statistics. You can read more about the summary() function and all its parameters in the docs.
csms_summary
{'Number of CSMs': 826.0,
'Number of unique CSMs': 826.0,
'Number of intra CSMs': 803.0,
'Number of inter CSMs': 23.0,
'Number of target-target CSMs': 786.0,
'Number of target-decoy CSMs': 39.0,
'Number of decoy-decoy CSMs': 1.0,
'Minimum CSM score': 1.1132827525593785,
'Maximum CSM score': 452.9861536355926}
This is the dictionary returned by summary(). By default, all summary statistics are also printed to stdout.
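Because summary() returns a plain dictionary, its values can be reused in further calculations. The following is a minimal sketch that copies the count statistics printed above into a literal dictionary, so that it runs without the result file, and derives two simple ratios. The helper name share is hypothetical and not part of pyXLMS.

```python
# Count statistics copied from the summary output above; in practice this
# would be the dictionary returned by transform.summary(csms).
csms_summary = {
    "Number of CSMs": 826.0,
    "Number of unique CSMs": 826.0,
    "Number of intra CSMs": 803.0,
    "Number of inter CSMs": 23.0,
    "Number of target-target CSMs": 786.0,
    "Number of target-decoy CSMs": 39.0,
    "Number of decoy-decoy CSMs": 1.0,
}


def share(summary, key, total_key="Number of CSMs"):
    """Hypothetical helper: return summary[key] as a fraction of the total."""
    return summary[key] / summary[total_key]


intra_share = share(csms_summary, "Number of intra CSMs")

# CSMs in which at least one of the two peptides is a decoy.
decoy_share = (
    csms_summary["Number of target-decoy CSMs"]
    + csms_summary["Number of decoy-decoy CSMs"]
) / csms_summary["Number of CSMs"]

print(f"Intra CSMs: {intra_share:.1%}")
print(f"CSMs with at least one decoy peptide: {decoy_share:.1%}")
```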
validated_csms = transform.validate(csms)
_ = transform.summary(validated_csms)
Iterating over scores for FDR calculation...:  15%|█▌        | 121/826 [00:00<00:00, 30246.78it/s]
Number of CSMs: 705.0
Number of unique CSMs: 705.0
Number of intra CSMs: 701.0
Number of inter CSMs: 4.0
Number of target-target CSMs: 699.0
Number of target-decoy CSMs: 6.0
Number of decoy-decoy CSMs: 0.0
Minimum CSM score: 34.188549584398956
Maximum CSM score: 452.9861536355926
You can also call summary() repeatedly after data transformation steps to see how your results change, shown here after validating the crosslink-spectrum-matches.
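To make such a before/after comparison explicit, the two summary dictionaries can be subtracted key by key. This is only a sketch: the count statistics are copied from the two outputs above, and summary_delta is a hypothetical helper, not a pyXLMS function.

```python
# Count statistics before validation (copied from the first summary above).
before = {
    "Number of CSMs": 826.0,
    "Number of intra CSMs": 803.0,
    "Number of inter CSMs": 23.0,
    "Number of target-target CSMs": 786.0,
    "Number of target-decoy CSMs": 39.0,
    "Number of decoy-decoy CSMs": 1.0,
}

# Count statistics after transform.validate() (copied from the output above).
after = {
    "Number of CSMs": 705.0,
    "Number of intra CSMs": 701.0,
    "Number of inter CSMs": 4.0,
    "Number of target-target CSMs": 699.0,
    "Number of target-decoy CSMs": 6.0,
    "Number of decoy-decoy CSMs": 0.0,
}


def summary_delta(before, after):
    """Hypothetical helper: after - before for every key present in both."""
    return {key: after[key] - before[key] for key in before if key in after}


delta = summary_delta(before, after)
for key, change in delta.items():
    print(f"{key}: {change:+g}")
```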
_ = transform.summary(xls)
Number of crosslinks: 300.0
Number of unique crosslinks by peptide: 300.0
Number of unique crosslinks by protein: 298.0
Number of intra crosslinks: 279.0
Number of inter crosslinks: 21.0
Number of target-target crosslinks: 265.0
Number of target-decoy crosslinks: 0.0
Number of decoy-decoy crosslinks: 35.0
Minimum crosslink score: 1.1132827525593785
Maximum crosslink score: 452.9861536355926
Similarly, we can get a summary of our crosslinks by calling transform.summary() and passing the crosslinks as the first argument.
_ = transform.summary(parser_result)
Number of CSMs: 826.0
Number of unique CSMs: 826.0
Number of intra CSMs: 803.0
Number of inter CSMs: 23.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 39.0
Number of decoy-decoy CSMs: 1.0
Minimum CSM score: 1.1132827525593785
Maximum CSM score: 452.9861536355926
Number of crosslinks: 300.0
Number of unique crosslinks by peptide: 300.0
Number of unique crosslinks by protein: 298.0
Number of intra crosslinks: 279.0
Number of inter crosslinks: 21.0
Number of target-target crosslinks: 265.0
Number of target-decoy crosslinks: 0.0
Number of decoy-decoy crosslinks: 35.0
Minimum crosslink score: 1.1132827525593785
Maximum crosslink score: 452.9861536355926
Lastly, we can also get a summary of our parser_result by calling transform.summary() and passing the parser_result as the first argument.
Crosslink-Spectrum-Matches and Crosslinks as pandas DataFrames
You might want to convert your crosslink-spectrum-matches and crosslinks to a pandas.DataFrame to work with pandas or simply to save them to a file, for example to process them somewhere else. For this purpose the transform.to_dataframe() function exists: it takes a list of crosslink-spectrum-matches or crosslinks as input and returns them as a pandas.DataFrame. You can read more about the to_dataframe() function and its parameters in the docs.
df = transform.to_dataframe(csms)
df.head(5)
Crosslink-spectrum-matches converted to a pandas.DataFrame.
loaded_csms = transform.from_dataframe(df)
_ = transform.summary(loaded_csms)
Reading CSMs...: 100%|██████████| 826/826 [00:00<00:00, 2650.47it/s]
Number of CSMs: 826.0
Number of unique CSMs: 826.0
Number of intra CSMs: 803.0
Number of inter CSMs: 23.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 39.0
Number of decoy-decoy CSMs: 1.0
Minimum CSM score: 1.1132827525593785
Maximum CSM score: 452.9861536355926
We can also load our crosslink-spectrum-matches back from the pandas.DataFrame using transform.from_dataframe(). You can read more about the from_dataframe() function and its parameters in the docs.
df = transform.to_dataframe(xls)
df.head(5)
Similarly, we can convert our crosslinks to a pandas.DataFrame with transform.to_dataframe()…
loaded_xls = transform.from_dataframe(df)
_ = transform.summary(loaded_xls)
Reading crosslinks...: 100%|██████████| 300/300 [00:00<00:00, 7895.46it/s]
Number of crosslinks: 300.0
Number of unique crosslinks by peptide: 300.0
Number of unique crosslinks by protein: 298.0
Number of intra crosslinks: 279.0
Number of inter crosslinks: 21.0
Number of target-target crosslinks: 265.0
Number of target-decoy crosslinks: 0.0
Number of decoy-decoy crosslinks: 35.0
Minimum crosslink score: 1.1132827525593785
Maximum crosslink score: 452.9861536355926
…and load them back with transform.from_dataframe().
Crosslink-Spectrum-Matches and Crosslinks in ProForma Notation
If you want to export your crosslink-spectrum-matches or crosslinks to ProForma notation, you can do that with transform.to_proforma(), which you can read more about in the docs.
Let's go through a few examples:
csm = csms[0]
csm
{'data_type': 'crosslink-spectrum-match',
'completeness': 'full',
'alpha_peptide': 'GQKNSR',
'alpha_modifications': {3: ('DSS', 138.06808)},
'alpha_peptide_crosslink_position': 3,
'alpha_proteins': ['Cas9'],
'alpha_proteins_crosslink_positions': [779],
'alpha_proteins_peptide_positions': [777],
'alpha_score': 119.82548987540834,
'alpha_decoy': False,
'beta_peptide': 'GQKNSR',
'beta_modifications': {3: ('DSS', 138.06808)},
'beta_peptide_crosslink_position': 3,
'beta_proteins': ['Cas9'],
'beta_proteins_crosslink_positions': [779],
'beta_proteins_peptide_positions': [777],
'beta_score': 119.82547820493929,
'beta_decoy': False,
'crosslink_type': 'intra',
'score': 119.82547820493929,
'spectrum_file': 'XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw',
'scan_nr': 2257,
'charge': 3,
'retention_time': 733.1895599999999,
'ion_mobility': 0.0,
'additional_information': None}
Let's consider the above crosslink-spectrum-match of a homeotypic crosslink, where both peptides are GQKNSR, crosslinked with DSS at K3.
csm_proforma = transform.to_proforma(csm)
Calling to_proforma() on a crosslink-spectrum-match will give us the crosslink-spectrum-match in ProForma notation (as a str).
csm_proforma
'GQK[+138.06808]NSR//GQK[+138.06808]NSR/3'
The crosslink modifications are automatically denoted with their corresponding monoisotopic delta masses; for DSS that is +138.06808 Da.
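To illustrate how the fields of the crosslink-spectrum-match map onto this string, here is a minimal sketch that rebuilds it by hand from a reduced copy of the dictionary shown above. It is not how to_proforma() is implemented; the functions annotate_peptide and sketch_csm_proforma are hypothetical, and the sketch only handles modifications on residues (1-based positions), not terminal modifications.

```python
def annotate_peptide(sequence, modifications):
    """Insert '[+mass]' after every modified residue (1-based positions)."""
    parts = []
    for position, residue in enumerate(sequence, start=1):
        parts.append(residue)
        if position in modifications:
            _name, delta_mass = modifications[position]
            # The '+' format flag forces an explicit sign on the delta mass.
            parts.append(f"[{delta_mass:+}]")
    return "".join(parts)


def sketch_csm_proforma(csm):
    """Join both annotated peptides with '//' and append the charge state."""
    alpha = annotate_peptide(csm["alpha_peptide"], csm["alpha_modifications"])
    beta = annotate_peptide(csm["beta_peptide"], csm["beta_modifications"])
    return f"{alpha}//{beta}/{csm['charge']}"


# Reduced copy of the crosslink-spectrum-match shown above.
csm = {
    "alpha_peptide": "GQKNSR",
    "alpha_modifications": {3: ("DSS", 138.06808)},
    "beta_peptide": "GQKNSR",
    "beta_modifications": {3: ("DSS", 138.06808)},
    "charge": 3,
}

print(sketch_csm_proforma(csm))
```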
csms_proforma = transform.to_proforma(csms)
We can also pass a list of crosslink-spectrum-matches to to_proforma()…
csms_proforma[:5]
['GQK[+138.06808]NSR//GQK[+138.06808]NSR/3',
'GQK[+138.06808]NSR//GSQK[+138.06808]DR/3',
'SDK[+138.06808]NR//SDK[+138.06808]NR/3',
'DK[+138.06808]QSGK//DK[+138.06808]QSGK/3',
'DK[+138.06808]QSGK//HSIK[+138.06808]K/3']
…which will return a list of str instead, containing the corresponding crosslink-spectrum-matches in ProForma notation.
xl = xls[0]
xl
{'data_type': 'crosslink',
'completeness': 'full',
'alpha_peptide': 'GQKNSR',
'alpha_peptide_crosslink_position': 3,
'alpha_proteins': ['Cas9'],
'alpha_proteins_crosslink_positions': [779],
'alpha_decoy': False,
'beta_peptide': 'GQKNSR',
'beta_peptide_crosslink_position': 3,
'beta_proteins': ['Cas9'],
'beta_proteins_crosslink_positions': [779],
'beta_decoy': False,
'crosslink_type': 'intra',
'score': 119.82547820493929,
'additional_information': None}
Alternatively, consider the above crosslink, which we get from the previous crosslink-spectrum-match: again, both peptides are GQKNSR and are crosslinked at the K3 residue.
xl_proforma = transform.to_proforma(xl)
We can get the ProForma str of the crosslink by passing the crosslink as the first argument to to_proforma().
xl_proforma
'GQKNSR//GQKNSR'
Because a crosslink is an aggregation of crosslink-spectrum-matches, it does not itself contain any information about post-translational modifications, so the returned ProForma string does not contain the crosslinker.
xl_proforma = transform.to_proforma(xl, crosslinker="Xlink:DSS")
xl_proforma
'GQK[Xlink:DSS]NSR//GQK[Xlink:DSS]NSR'
We can explicitly pass the name or mass of the crosslinker via the crosslinker parameter to include it in the ProForma string.
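The effect of the crosslinker parameter on a crosslink can be pictured as inserting a bracketed label right after the crosslinked residue. The sketch below mimics the two outputs shown above from a reduced copy of the crosslink dictionary. The function sketch_xl_proforma is hypothetical and not part of pyXLMS, and it only covers a crosslinker given as a name string, not as a mass.

```python
def sketch_xl_proforma(xl, crosslinker=None):
    """Build 'ALPHA//BETA', optionally labelling the crosslinked residues."""

    def annotate(sequence, position):
        # Without a crosslinker the bare sequence is returned unchanged.
        if crosslinker is None:
            return sequence
        # 'position' is 1-based, so sequence[:position] ends with the
        # crosslinked residue and the label is placed directly after it.
        return f"{sequence[:position]}[{crosslinker}]{sequence[position:]}"

    alpha = annotate(xl["alpha_peptide"], xl["alpha_peptide_crosslink_position"])
    beta = annotate(xl["beta_peptide"], xl["beta_peptide_crosslink_position"])
    return f"{alpha}//{beta}"


# Reduced copy of the crosslink shown above.
xl = {
    "alpha_peptide": "GQKNSR",
    "alpha_peptide_crosslink_position": 3,
    "beta_peptide": "GQKNSR",
    "beta_peptide_crosslink_position": 3,
}

print(sketch_xl_proforma(xl))
print(sketch_xl_proforma(xl, crosslinker="Xlink:DSS"))
```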
csm_proforma = transform.to_proforma(csm, crosslinker="Xlink:DSS")
csm_proforma
'GQK[+138.06808]NSR//GQK[+138.06808]NSR/3'
Passing a crosslinker via the crosslinker parameter has no effect when a crosslinker is already specified, as in our previous crosslink-spectrum-match.
xls_proforma = transform.to_proforma(xls)
We can also pass a list of crosslinks to to_proforma()…
xls_proforma[:5]
['GQKNSR//GQKNSR',
'GQKNSR//GSQKDR',
'SDKNR//SDKNR',
'DKQSGK//DKQSGK',
'DKQSGK//HSIKK']
…which will return a list of str instead, containing the corresponding crosslinks in ProForma notation.
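Since the result is a plain list of strings, it can be processed with ordinary Python. As a small sketch, the five crosslink strings listed above are split at the "//" separator to find those in which both peptides have the same sequence. The helper split_peptides is hypothetical, and this simple split assumes unmodified strings without a trailing charge state, as in the crosslink output above.

```python
# The first five crosslinks in ProForma notation, copied from the output above.
xls_proforma = [
    "GQKNSR//GQKNSR",
    "GQKNSR//GSQKDR",
    "SDKNR//SDKNR",
    "DKQSGK//DKQSGK",
    "DKQSGK//HSIKK",
]


def split_peptides(proforma):
    """Hypothetical helper: split 'ALPHA//BETA' into its two peptides."""
    alpha, beta = proforma.split("//")
    return alpha, beta


# Keep only the crosslinks whose alpha and beta peptide sequences are equal.
same_sequence = [
    proforma
    for proforma in xls_proforma
    if len(set(split_peptides(proforma))) == 1
]

print(same_sequence)
```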