Exporting Crosslink-Spectrum-Matches and Crosslinks to IMP-X-FDR
from pyXLMS import __version__
print(f"Installed pyXLMS version: {__version__}")

Installed pyXLMS version: 1.5.3

from pyXLMS import parser
from pyXLMS import exporter

All exporting functionality is available via the exporter submodule. We also import the parser submodule to read crosslink-spectrum-matches and crosslinks.

parser_result = parser.read(
"../../data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1.pdResult",
engine="MS Annika",
crosslinker="DSS",
)
csms = parser_result["crosslink-spectrum-matches"]
xls = parser_result["crosslinks"]

Reading MS Annika CSMs...: 100%|██████████| 826/826 [00:00<00:00, 10315.30it/s]
Reading MS Annika crosslinks...: 100%|██████████| 300/300 [00:00<00:00, 15041.25it/s]

We read crosslink-spectrum-matches and crosslinks from a single .pdResult file using the generic parser. For easier access we also assign our crosslink-spectrum-matches to the variable csms and our crosslinks to the variable xls.

from pyXLMS.transform import validate, targets_only
csms = targets_only(validate(csms))
xls = targets_only(validate(xls))

Iterating over scores for FDR calculation...: 15%|█▌        | 121/826 [00:00<?, ?it/s]
Iterating over scores for FDR calculation...: 25%|██▌       | 74/300 [00:00<?, ?it/s]

For benchmarking purposes you would usually want to validate your crosslink-spectrum-matches and crosslinks, either externally or, as in this case, with the built-in pyXLMS validate() function [docs], so that you can compare your estimated FDR to the "real" experimental FDR. You should also filter out any non-target matches, e.g. via the targets_only() function [docs].

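To make the idea of an estimated FDR more concrete, the following is a minimal, generic sketch of target-decoy filtering on plain (score, is_decoy) tuples. It is not the pyXLMS implementation of validate() or targets_only(); the function name fdr_cutoff, the toy scores and the simple decoys/targets formula are assumptions chosen purely for illustration.

```python
def fdr_cutoff(matches, fdr=0.01):
    """Return the lowest score at which decoys / targets <= fdr still holds.

    `matches` is a list of (score, is_decoy) tuples, higher scores are better.
    Hypothetical helper for illustration only, not part of pyXLMS.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR of everything scoring at least `score`
        if targets > 0 and decoys / targets <= fdr:
            cutoff = score
    return cutoff


# Toy data: four target matches and one decoy match
matches = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]

cutoff = fdr_cutoff(matches, fdr=0.25)

# Keep matches that pass the cutoff, then drop the decoys (targets only)
accepted_targets = [
    m for m in matches if cutoff is not None and m[0] >= cutoff and not m[1]
]
```

In this toy example the estimate is based only on decoy counts. A benchmarking tool such as IMP-X-FDR instead compares the accepted matches against known ground truth, which is why decoys are removed before exporting.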
_df = exporter.to_impxfdr(xls, filename=None)

The function exporter.to_impxfdr() exports a list of crosslinks or crosslink-spectrum-matches to IMP-X-FDR format for benchmarking purposes. The tool IMP-X-FDR is available from github.com/vbc-proteomics-org/imp-x-fdr. We recommend using version 1.1.0 and selecting "MS Annika" as the input file format for the file exported here. A slightly modified version is available from github.com/hgb-bin-proteomics/MSAnnika_NC_Results. This version contains a few bug fixes and was used for the MS Annika 2.0 and MS Annika 3.0 publications.

The export requires that the alpha_proteins, beta_proteins, alpha_proteins_crosslink_positions and beta_proteins_crosslink_positions fields are set for crosslinks and crosslink-spectrum-matches. You can read more about the to_impxfdr() function and all its parameters in the docs.
Specifying filename=None will only return the pandas DataFrame and not write it to disk!
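Since the export depends on the four protein-related fields named above, it can be useful to check for missing values before exporting. The sketch below is a hypothetical helper (missing_impxfdr_fields is not part of pyXLMS) and it uses plain dictionaries with made-up values as stand-ins, assuming that crosslinks and crosslink-spectrum-matches can be accessed like dictionaries keyed by these field names.

```python
# Field names as listed in the requirement above
REQUIRED_FIELDS = (
    "alpha_proteins",
    "beta_proteins",
    "alpha_proteins_crosslink_positions",
    "beta_proteins_crosslink_positions",
)


def missing_impxfdr_fields(records):
    """Map record index -> list of required fields that are absent or None.

    Hypothetical helper for illustration only, not part of pyXLMS.
    """
    problems = {}
    for index, record in enumerate(records):
        missing = [field for field in REQUIRED_FIELDS if record.get(field) is None]
        if missing:
            problems[index] = missing
    return problems


# Made-up stand-in records: the second one lacks beta protein information
example_records = [
    {
        "alpha_proteins": ["Cas9"],
        "beta_proteins": ["Cas9"],
        "alpha_proteins_crosslink_positions": [779],
        "beta_proteins_crosslink_positions": [793],
    },
    {
        "alpha_proteins": ["Cas9"],
        "beta_proteins": None,
        "alpha_proteins_crosslink_positions": [100],
        "beta_proteins_crosslink_positions": None,
    },
]

problems = missing_impxfdr_fields(example_records)
print(problems)
```

Under the same assumption, the helper could be called on xls or csms before the export, and an empty result would indicate that all four fields are set for every entry.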
_df = exporter.to_impxfdr(csms, filename=None)

Exporting works for both crosslinks and crosslink-spectrum-matches.