
Description of the pyXLMS file format

Reading files with parser.read(*, engine="Custom") requires the following data format for crosslinks and crosslink-spectrum-matches. While the column names can be adjusted via the parameter column_mapping, the format of the columns needs to stay the same for successful parsing. Any column that is not required can be safely omitted. This format is also output by transform.to_dataframe().

For an extended description including additional examples, please refer to here.

Data required for parsing crosslink-spectrum-matches:

| Column Name | Required | Data Type | Example | Description |
| --- | --- | --- | --- | --- |
| Alpha Peptide | ✅ | str | PEPKTIDE | Unmodified amino acid sequence of the alpha peptide in uppercase letters |
| Alpha Peptide Modifications | ❌ | str | (4:[DSS\|138.06808]) | Modifications of the alpha peptide, see Modification Encoding |
| Alpha Peptide Crosslink Position | ✅ | int | 4 | Position of the crosslinker in the alpha peptide (1-based) |
| Alpha Proteins | ❌ | str | G3ECR1 | Accession of the associated protein(s) of the alpha peptide, if multiple proteins are given they should be delimited by a semicolon |
| Alpha Proteins Crosslink Positions | ❌ | int, str | 13 | Position of the crosslinker in the associated alpha protein(s), positions in multiple proteins should be delimited by a semicolon (1-based) |
| Alpha Proteins Peptide Positions | ❌ | int, str | 10 | Position of the alpha peptide in the associated alpha protein(s), positions in multiple proteins should be delimited by a semicolon (1-based) |
| Alpha Score | ❌ | float | 0.837 | Score of the alpha peptide |
| Alpha Decoy | ❌ | bool, str | False | Whether the alpha peptide is from the target (False) or decoy (True) database |
| Beta Peptide | ✅ | str | PEPKTIDE | Unmodified amino acid sequence of the beta peptide in uppercase letters |
| Beta Peptide Modifications | ❌ | str | (4:[DSS\|138.06808]) | Modifications of the beta peptide, see Modification Encoding |
| Beta Peptide Crosslink Position | ✅ | int | 4 | Position of the crosslinker in the beta peptide (1-based) |
| Beta Proteins | ❌ | str | G3ECR1 | Accession of the associated protein(s) of the beta peptide, if multiple proteins are given they should be delimited by a semicolon |
| Beta Proteins Crosslink Positions | ❌ | int, str | 13 | Position of the crosslinker in the associated beta protein(s), positions in multiple proteins should be delimited by a semicolon (1-based) |
| Beta Proteins Peptide Positions | ❌ | int, str | 10 | Position of the beta peptide in the associated beta protein(s), positions in multiple proteins should be delimited by a semicolon (1-based) |
| Beta Score | ❌ | float | 0.837 | Score of the beta peptide |
| Beta Decoy | ❌ | bool, str | False | Whether the beta peptide is from the target (False) or decoy (True) database |
| CSM Score | ❌ | float | 0.99513 | Score of the crosslink-spectrum-match |
| Spectrum File | ✅ | str | 2025_03_17_EXP1_RUN3_R1.raw | File name of the spectrum file |
| Scan Nr | ✅ | int | 1703 | The scan number of the spectrum the match was identified in |
| Precursor Charge | ❌ | int | 3 | Precursor charge of the crosslink spectrum |
| Retention Time | ❌ | float | 530.17 | Retention time of the crosslink spectrum in seconds |
| Ion Mobility | ❌ | float | 170.41 | Ion mobility, CCS, or compensation voltage of the crosslink spectrum |
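
As a rough illustration of the column layout above, the following sketch writes a single crosslink-spectrum-match to CSV text using only the Python standard library. The helper `write_csms` and the constant `REQUIRED_CSM_COLUMNS` are hypothetical names introduced here and are not part of pyXLMS. Whether a comma-separated file is the container expected by the parser is also an assumption, since the description above only specifies the columns.

```python
import csv
import io

# Columns marked as required for crosslink-spectrum-matches in the table above.
REQUIRED_CSM_COLUMNS = [
    "Alpha Peptide",
    "Alpha Peptide Crosslink Position",
    "Beta Peptide",
    "Beta Peptide Crosslink Position",
    "Spectrum File",
    "Scan Nr",
]


def write_csms(rows, handle, optional_columns=()):
    """Write CSM rows (dicts) as CSV (hypothetical helper, not a pyXLMS function).

    Raises ValueError if a row lacks a value for any required column.
    """
    fieldnames = REQUIRED_CSM_COLUMNS + list(optional_columns)
    writer = csv.DictWriter(handle, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        missing = [c for c in REQUIRED_CSM_COLUMNS if row.get(c) in (None, "")]
        if missing:
            raise ValueError(f"missing required columns: {missing}")
        # Optional columns without a value are written as empty cells.
        writer.writerow({name: row.get(name, "") for name in fieldnames})


# Example values are taken from the "Example" column of the table.
example_row = {
    "Alpha Peptide": "PEPKTIDE",
    "Alpha Peptide Crosslink Position": 4,
    "Beta Peptide": "PEPKTIDE",
    "Beta Peptide Crosslink Position": 4,
    "Spectrum File": "2025_03_17_EXP1_RUN3_R1.raw",
    "Scan Nr": 1703,
    "CSM Score": 0.99513,  # optional column
}

buffer = io.StringIO()
write_csms([example_row], buffer, optional_columns=["CSM Score"])
csv_text = buffer.getvalue()
```

Only the six required columns plus `CSM Score` are written here; any of the other optional columns could be added through `optional_columns` in the same way.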

Additional resources:

Modification Encoding

Modifications are encoded with the following values:

  • position: The 1-based position of the modification in the peptide sequence
    • should be parse-able as int data type
  • name: The name of the modification, for example Oxidation
    • should be parse-able as str data type
  • mass: The monoisotopic delta mass of the modification, for example 15.994915
    • should be parse-able as float data type

Each modification is then encoded as (position:[name|mass]). Multiple modifications should be delimited by a semicolon (;). In the rare case that there is more than one modification at the same position, their names should be delimited by a comma (,). See the examples below:

  • (4:[DSS|138.06808])
  • (1:[DSS|138.06808]);(5:[Oxidation|15.994915])
  • (5:[Substitution, Oxidation|13.541798])
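
The encoding described above can be sketched as a small pair of helper functions. This is a minimal illustration and not the parser pyXLMS uses internally; `decode_modifications` and `encode_modifications` are hypothetical names, and the choice to write a comma followed by a space between names simply mirrors the third example.

```python
import re

# One encoded modification has the shape (position:[name|mass]).
_MOD_PATTERN = re.compile(r"\((\d+):\[([^|\]]+)\|([^\]]+)\]\)")


def decode_modifications(encoded):
    """Parse an encoded string into a list of (position, names, mass) tuples."""
    mods = []
    for part in encoded.split(";"):
        part = part.strip()
        if not part:
            continue
        match = _MOD_PATTERN.fullmatch(part)
        if match is None:
            raise ValueError(f"not a valid modification: {part!r}")
        position, names, mass = match.groups()
        # Several names on the same position are delimited by a comma.
        name_list = [name.strip() for name in names.split(",")]
        mods.append((int(position), name_list, float(mass)))
    return mods


def encode_modifications(mods):
    """Build the encoded string from (position, names, mass) tuples."""
    parts = []
    for position, names, mass in mods:
        # Assumption: names are joined with ", " as in the third example above.
        parts.append(f"({position}:[{', '.join(names)}|{mass}])")
    return ";".join(parts)


decoded = decode_modifications("(1:[DSS|138.06808]);(5:[Oxidation|15.994915])")
```

Here `decoded` holds one tuple per modification, with the position as `int`, the names as a list of `str`, and the mass as `float`, matching the data types listed above.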

Data required for parsing crosslinks:

| Column Name | Required | Data Type | Example | Description |
| --- | --- | --- | --- | --- |
| Alpha Peptide | ✅ | str | PEPKTIDE | Unmodified amino acid sequence of the alpha peptide in uppercase letters |
| Alpha Peptide Crosslink Position | ✅ | int | 4 | Position of the crosslinker in the alpha peptide (1-based) |
| Alpha Proteins | ❌ | str | G3ECR1 | Accession of the associated protein(s) of the alpha peptide, if multiple proteins are given they should be delimited by a semicolon |
| Alpha Proteins Crosslink Positions | ❌ | int, str | 13 | Position of the crosslinker in the associated alpha protein(s), positions in multiple proteins should be delimited by a semicolon (1-based) |
| Alpha Decoy | ❌ | bool, str | False | Whether the alpha peptide is from the target (False) or decoy (True) database |
| Beta Peptide | ✅ | str | PEPKTIDE | Unmodified amino acid sequence of the beta peptide in uppercase letters |
| Beta Peptide Crosslink Position | ✅ | int | 4 | Position of the crosslinker in the beta peptide (1-based) |
| Beta Proteins | ❌ | str | G3ECR1 | Accession of the associated protein(s) of the beta peptide, if multiple proteins are given they should be delimited by a semicolon |
| Beta Proteins Crosslink Positions | ❌ | int, str | 13 | Position of the crosslinker in the associated beta protein(s), positions in multiple proteins should be delimited by a semicolon (1-based) |
| Beta Decoy | ❌ | bool, str | False | Whether the beta peptide is from the target (False) or decoy (True) database |
| Crosslink Score | ❌ | float | 0.99513 | Score of the crosslink |
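
Before handing a crosslink table to the parser, it can help to check a row against the constraints listed above. The sketch below is one possible pre-check and not pyXLMS behaviour. All function names are hypothetical, the accession `P12345` is a made-up placeholder, and the requirement that proteins and their positions come in equal numbers is an assumption drawn from the semicolon-delimited descriptions.

```python
# Columns marked as required for crosslinks in the table above.
REQUIRED_CROSSLINK_COLUMNS = (
    "Alpha Peptide",
    "Alpha Peptide Crosslink Position",
    "Beta Peptide",
    "Beta Peptide Crosslink Position",
)


def split_cell(value, convert=str):
    """Split a semicolon-delimited cell into a list, converting each entry."""
    if value is None:
        return []
    return [convert(item.strip()) for item in str(value).split(";") if item.strip()]


def parse_decoy(value):
    """Interpret a 'bool, str' decoy cell given as True/False or 'True'/'False'."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in ("true", "false"):
        return text == "true"
    raise ValueError(f"cannot interpret decoy value: {value!r}")


def check_crosslink_row(row):
    """Validate one row and return (protein, position) pairs for each side."""
    missing = [c for c in REQUIRED_CROSSLINK_COLUMNS if row.get(c) in (None, "")]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    sites = {}
    for side in ("Alpha", "Beta"):
        peptide = str(row[f"{side} Peptide"])
        position = int(row[f"{side} Peptide Crosslink Position"])
        # Peptide positions are 1-based, so valid values are 1..len(peptide).
        if not 1 <= position <= len(peptide):
            raise ValueError(f"{side} crosslink position {position} is outside the peptide")
        proteins = split_cell(row.get(f"{side} Proteins"))
        positions = split_cell(row.get(f"{side} Proteins Crosslink Positions"), int)
        # Assumption: one crosslink position per listed protein.
        if proteins and positions and len(proteins) != len(positions):
            raise ValueError(f"{side} proteins and positions differ in length")
        sites[side] = list(zip(proteins, positions))
    return sites


row = {
    "Alpha Peptide": "PEPKTIDE",
    "Alpha Peptide Crosslink Position": "4",
    "Alpha Proteins": "G3ECR1;P12345",  # P12345 is a placeholder accession
    "Alpha Proteins Crosslink Positions": "13;27",
    "Alpha Decoy": "False",
    "Beta Peptide": "PEPKTIDE",
    "Beta Peptide Crosslink Position": 4,
    "Beta Proteins": "G3ECR1",
    "Beta Proteins Crosslink Positions": 13,
}
sites = check_crosslink_row(row)
alpha_is_decoy = parse_decoy(row["Alpha Decoy"])
```

The row mixes string and integer cells on purpose, since the protein position columns are listed with data type `int, str` and the decoy columns with `bool, str`.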

