| Title: | Internal Fragment Identification from Top-Down Mass Spectrometry |
|---|---|
| Description: | Top-Down mass spectrometry aims to identify entire proteins as well as their (post-translational) modifications or ions bound (eg Chen et al (2018) <doi:10.1021/acs.analchem.7b04747>). The pattern of internal fragments (Haverland et al (2017) <doi:10.1007/s13361-017-1635-x>) may reveal important information about the original structure of the proteins studied (Skinner et al (2018) <doi:10.1038/nchembio.2515> and Li et al (2018) <doi:10.1038/nchem.2908>). However, the number of possible internal fragments gets huge with longer proteins and subsequent identification of internal fragments remains challenging, in particular since the the accuracy of measurements with current mass spectrometers represents a limiting factor. This package attempts to deal with the complexity of internal fragments and allows identification of terminal and internal fragments from deconvoluted mass-spectrometry data. |
| Authors: | Wolfgang Raffelsberger [aut, cre] |
| Maintainer: | Wolfgang Raffelsberger <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.4 |
| Built: | 2026-05-22 09:35:32 UTC |
| Source: | https://github.com/cran/wrTopDownFrag |
Return a matrix with charge-catching amino-acids and their assumed strength. So far, the strength shown/used is set rather empirically.
.chargeCatchingAA( chargeMode = "pos", silent = FALSE, debug = FALSE, callFrom = NULL ).chargeCatchingAA( chargeMode = "pos", silent = FALSE, debug = FALSE, callFrom = NULL )
chargeMode |
(character) this value may be 'pos' (default) for the positively charged amino-acids K,R and H or, if this argument has any other value, than all charged amino-acids (K,R,H, S,T,N,Q, D,E, W and Y) will be considered. |
silent |
(logical) suppress messages |
debug |
(logical) additional messages and objects exportet to current session for debugging |
callFrom |
(character) allow easier tracking of messages produced |
This function returns a matrix with charge-catching amino-acids and their assumed strength
.chargeCatchingAA().chargeCatchingAA()
Check matrix or data.frame for containing columns with specified name and optionally remove other columns (by their names)
.chColNa(x, colNa = "index", rmCol = NULL, callFrom = NULL).chColNa(x, colNa = "index", rmCol = NULL, callFrom = NULL)
x |
(matrix or data.frame) |
colNa |
(character) |
rmCol |
(character) |
callFrom |
(character) allow easier tracking of message(s) produced |
This function returns a matrix or data.frame with adjusted columns
ma1 <- matrix(1:6, nrow=2, dimnames=list(NULL, c("index","aa","bb"))) .chColNa(ma1) .chColNa(ma1, colNa="zz", rmCol="aa")ma1 <- matrix(1:6, nrow=2, dimnames=list(NULL, c("index","aa","bb"))) .chColNa(ma1) .chColNa(ma1, colNa="zz", rmCol="aa")
Check Modification Type
.checkModTy( modTy, knownMods, phoDePho = c("p", "q"), modTyGr = c("basMod", "varMod"), silent = FALSE, debug = FALSE, callFrom = NULL ).checkModTy( modTy, knownMods, phoDePho = c("p", "q"), modTyGr = c("basMod", "varMod"), silent = FALSE, debug = FALSE, callFrom = NULL )
modTy |
(character) list of modification types to be considered |
knownMods |
(character) optonal custom list of known modifications, default from |
phoDePho |
(character) names of modifications that may be de-phosphorylated |
modTyGr |
(character) groups of modifications to consider (defaults used both 'basMod' and 'varMod') |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allows easier tracking of messages produced |
This function returns the corrected list of mixed of variable and fixed modifications ($basMod, $varMod and $varMo2)
modTy1 <- list(basMod=c("b","y","h"),varMod=c("p","o","q")) .checkModTy(modTy1, knownMods=c("a","b","h","o","p","q","y"))modTy1 <- list(basMod=c("b","y","h"),varMod=c("p","o","q")) .checkModTy(modTy1, knownMods=c("a","b","h","o","p","q","y"))
Count how many time a given letter occurs in each element of 'seq'
.countLET(sequ, countCh = "K", silent = FALSE, debug = FALSE, callFrom = NULL).countLET(sequ, countCh = "K", silent = FALSE, debug = FALSE, callFrom = NULL)
sequ |
(character) eg peptide sequence(s) |
countCh |
(charcter) |
silent |
(logical) suppress messages |
debug |
(logical) for bug-tracking: more/enhanced messages and intermediate objects written in global name-space |
callFrom |
(character) allow easier tracking of message(s) produced |
This function returns a numeric vector of counts for 'countCh' (single element !) in each element of 'seq'
protP2 <- c(mesp="MESPEPTIDES", pepe="PEPEPEP") .countLET(protP2,"P")protP2 <- c(mesp="MESPEPTIDES", pepe="PEPEPEP") .countLET(protP2,"P")
Count for all protein 'sequ' the occurance of modification types defined in list 'modTyp' (only if in names(specAAMod)).
.countModif( sequ, modTyp, specAAMod, knownMods, detailedCount = FALSE, silent = FALSE, debug = FALSE, callFrom = NULL ).countModif( sequ, modTyp, specAAMod, knownMods, detailedCount = FALSE, silent = FALSE, debug = FALSE, callFrom = NULL )
sequ |
(character) peptide sequence(s) |
modTyp |
(list) modifications : $basMod for character vector of fixed modifications and $varMod for variable modifications. For one letter-code see AAfragSettings("modChem") |
specAAMod |
(list) optional custom list showing which AA to be considered with which (one-letter) modification code (default |
knownMods |
(list) optional custom list showing which modification appears at what type of location, eg N-terminal, internal ... (default |
detailedCount |
(logical) |
silent |
(logical) suppress messages |
debug |
(logical) for bug-tracking: more/enhanced messages and intermediate objects written in global name-space |
callFrom |
(character) allow easier tracking of message(s) produced |
This function returns a list of matrixes $cou and $combTerm, with number of modifications per peptides (line in 'pepTab') for basMod, varMod & varMo2
protP2 <- c(mesp="MESPEPTIDES", pepe="PEPEPEP") pepTab1 <- makeFragments(protTab=protP2, minFra=6, internFr=TRUE, massTy="mono") modTy2 <- list(basMod=c("b","y","h"), varMod=c("x","p","o","q","e","j")) .countModif(pepTab1[,2], modTyp=c("b","y"), specAAMod=AAfragSettings(outTy="all")$specAAMod, knownMods=AAfragSettings(outTy="all")$knownMods)protP2 <- c(mesp="MESPEPTIDES", pepe="PEPEPEP") pepTab1 <- makeFragments(protTab=protP2, minFra=6, internFr=TRUE, massTy="mono") modTy2 <- list(basMod=c("b","y","h"), varMod=c("x","p","o","q","e","j")) .countModif(pepTab1[,2], modTyp=c("b","y"), specAAMod=AAfragSettings(outTy="all")$specAAMod, knownMods=AAfragSettings(outTy="all")$knownMods)
Make named character vector of sequential C-terminal fragments.
.CtermPepCut( pe, mi, se1 = ".", se2 = "-", mainName = NULL, indexOffs = NULL, silent = FALSE, debug = FALSE, callFrom = NULL ).CtermPepCut( pe, mi, se1 = ".", se2 = "-", mainName = NULL, indexOffs = NULL, silent = FALSE, debug = FALSE, callFrom = NULL )
pe |
(character, length=1) sequence to be cut in sequential way |
mi |
(integer) min number of AA residues for considering peptide fragments; should be <= length(pe) (otherwise the full length of 'pe' ALWAYS returned !) |
se1 |
(character, length=1) separators for adding numbers to specify partial/fragment locations |
se2 |
(character, length=1) separators for adding numbers to specify partial/fragment locations |
mainName |
(character, length=1) |
indexOffs |
(logical) offset to add for custom numbering in names (numeric, length=1), ie '1' will already increase by +1 |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allow easier tracking of messages produced |
This function returns a numeric vector with mass(es) and sequence in name(s)
more flexible/sophisticated see .termPepCut; makeFragments; convAASeq2mass
## Ubiquitin example P0CG48 <- "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG" cut1 <- .CtermPepCut(P0CG48, mi=3, mainName="P0CG48") head(cut1); tail(cut1)## Ubiquitin example P0CG48 <- "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG" cut1 <- .CtermPepCut(P0CG48, mi=3, mainName="P0CG48") head(cut1); tail(cut1)
Evaluate selected lines of pepTab of SAME AA-length AND iso-mass for preferential cutting sites.
.evalIsoFra( x, prefFragPat = NULL, seqCol = "seq", silent = FALSE, debug = FALSE, callFrom = NULL ).evalIsoFra( x, prefFragPat = NULL, seqCol = "seq", silent = FALSE, debug = FALSE, callFrom = NULL )
x |
(matrix) main input, must contain cols specified as seqCol and "no","tailAA","precAA" |
prefFragPat |
(matrix) specifies preferential fragmentation (which combination of AA to consider cols cTer,nTer,score), default made by |
seqCol |
(character) column names for the column containing the sequence to search for preferential cutting sites |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allow easier tracking of messages produced |
This function returns line ID-numbers (pepTab[,"no"]) for those below median score (ie to remove from pepTab)
peTab <- matrix(c("9","13","14","15", "LPVIAGHEAAG","PVIAGHEAAGI","EKKPFSI","KKPFSIE", "P","L","E","E", "I","V","E","E"),nr=4,dimnames=list(NULL,c("no","seq","precAA","tailAA"))) .evalIsoFra(peTab)peTab <- matrix(c("9","13","14","15", "LPVIAGHEAAG","PVIAGHEAAGI","EKKPFSI","KKPFSIE", "P","L","E","E", "I","V","E","E"),nr=4,dimnames=list(NULL,c("no","seq","precAA","tailAA"))) .evalIsoFra(peTab)
This function allows reorganiziong a list (of lists) of peotide fragments into matrix
.exNamesTyDeList( x, subLiNames = c("full", "Nter", "Cter", "inter"), inclNo = TRUE, fullSeq = NULL, outCol = c("seq", "orig", "origNa", "ty", "seqNa", "beg", "end", "precAA", "tailAA", "ambig", "mass"), silent = FALSE, callFrom = NULL, debug = FALSE ).exNamesTyDeList( x, subLiNames = c("full", "Nter", "Cter", "inter"), inclNo = TRUE, fullSeq = NULL, outCol = c("seq", "orig", "origNa", "ty", "seqNa", "beg", "end", "precAA", "tailAA", "ambig", "mass"), silent = FALSE, callFrom = NULL, debug = FALSE )
x |
(list) list of lists with charcter vectors of sequences with names that can be parsed eg 'x.1-7' to extract 'beg'&'end' otherwise ALL output will be NA (+message form extractLast2numericParts()) |
subLiNames |
(character) |
inclNo |
(logical) add 1st col with number |
fullSeq |
(character) to reinject full sequence which may not be used in names of 'x' and not be in x[[1]][["full"]] |
outCol |
(character) columns to create in output |
silent |
(logical) suppress messages |
callFrom |
(character) allow easier tracking of message(s) produced |
debug |
(logical) for bug-tracking: more/enhanced messages |
This function returns matrix with fragment sequence, mass, start- and end-position, heading and tailing AA (or NA if terminal fragment)
makeFragments; evalIsoFragm, from package wrProteo convAASeq2mass, AAmass, massDeFormula
prot1 <- c(protP="KEPTIDE", pro2="MPRATE") ## fragment all target proteins pep3 <- lapply(prot1, fragmentSeq, minSize=3, maxSize=5, internFragments=FALSE, separTerm=TRUE, keepRedSeqs=TRUE) pepTab <- .exNamesTyDeList(pep3, fullSeq=prot1)prot1 <- c(protP="KEPTIDE", pro2="MPRATE") ## fragment all target proteins pep3 <- lapply(prot1, fragmentSeq, minSize=3, maxSize=5, internFragments=FALSE, separTerm=TRUE, keepRedSeqs=TRUE) pepTab <- .exNamesTyDeList(pep3, fullSeq=prot1)
This function allows multiplying values of 'mat' by its colnames and (optionally) summing along rows.
.multMatByColNa( mat, sumByRow = TRUE, silent = FALSE, debug = FALSE, callFrom = NULL ).multMatByColNa( mat, sumByRow = TRUE, silent = FALSE, debug = FALSE, callFrom = NULL )
mat |
(matrix) main input |
sumByRow |
(logical) |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allows easier tracking of messages produced |
This functions returns a numeric vector or a matrix if sumByRow=FALSE
mat1 <- 3 + matrix(1:4, ncol=2, dimnames=list(letters[1:2], c("3","2"))) .multMatByColNa(mat1) .multMatByColNa(mat1, sumByRow=FALSE)mat1 <- 3 + matrix(1:4, ncol=2, dimnames=list(letters[1:2], c("3","2"))) .multMatByColNa(mat1) .multMatByColNa(mat1, sumByRow=FALSE)
Make named character vector of sequential C-terminal fragments.
.NtermPepCut( pe, mi, se1 = ".", se2 = "-", mainName = NULL, sepNC = FALSE, indexOffs = NULL, silent = FALSE, debug = FALSE, callFrom = NULL ).NtermPepCut( pe, mi, se1 = ".", se2 = "-", mainName = NULL, sepNC = FALSE, indexOffs = NULL, silent = FALSE, debug = FALSE, callFrom = NULL )
pe |
(character, length=1) sequence to be cut in sequential way |
mi |
(integer) min number of AA residues for considering peptide fragments; should be <= length(pe) (otherwise the full length of 'pe' ALWAYS returned !) |
se1 |
(character, length=1) separators for adding numbers to specify partial/fragment locations |
se2 |
(character, length=1) separators for adding numbers to specify partial/fragment locations |
mainName |
(character, length=1) |
sepNC |
(logical) if TRUE, separate fragments from both ends as $Nter & $Cter in list |
indexOffs |
(logical) offset to add for custom numbering in names (numeric, length=1), ie '1' will already increase by +1 |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allow easier tracking of messages produced |
This function returns a numeric vector with mass(es) and sequence in name(s)
more flexible/sophisticated see .termPepCut; makeFragments; convAASeq2mass
## Ubiquitin example P0CG48 <- "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG" cut1 <- .CtermPepCut(P0CG48, mi=3, mainName="P0CG48") head(cut1); tail(cut1) #' @export## Ubiquitin example P0CG48 <- "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG" cut1 <- .CtermPepCut(P0CG48, mi=3, mainName="P0CG48") head(cut1); tail(cut1) #' @export
This function combines all variants and sums them
.parCombinateAllAndSum( uniqCo, massModV, nProc = NULL, firstOfRepeated = NULL, parRegDefault = TRUE, silent = FALSE, debug = FALSE, callFrom = NULL ).parCombinateAllAndSum( uniqCo, massModV, nProc = NULL, firstOfRepeated = NULL, parRegDefault = TRUE, silent = FALSE, debug = FALSE, callFrom = NULL )
uniqCo |
(matrix) number of modifications to be considered for each peptide |
massModV |
(named numeric) mass modification values (names must match colnames of |
nProc |
(integer) number of processors to be used |
firstOfRepeated |
(character) |
parRegDefault |
(logical) - argument currently not in use |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allows easier tracking of messages produced |
This function requires the packages 'parallel' and 'BiocParallel' (from Bioconductor) Note : The function may work only on some Windows systems or may give warnings on Windows
This functions returns a list with single and combined mass-modifications (PTMs) for each peptide
uniqCo <- matrix(c(1,1,1,0,1,1), nrow=2, dimnames=list(c("PTI","KPE"),c("d","p","h")) ) massModV <- c(d=-18.01056, p=79.96633, h=-18.01056) chPa <- c(requireNamespace("parallel", quietly=TRUE), requireNamespace("BiocParallel", quietly=TRUE), "windows" %in% .Platform$OS.type) ## Note : the function may work only on some windows systems if(all(chPa)) if(parallel::detectCores() >1) { .parCombinateAllAndSum(uniqCo, massModV, nProc=2)}uniqCo <- matrix(c(1,1,1,0,1,1), nrow=2, dimnames=list(c("PTI","KPE"),c("d","p","h")) ) massModV <- c(d=-18.01056, p=79.96633, h=-18.01056) chPa <- c(requireNamespace("parallel", quietly=TRUE), requireNamespace("BiocParallel", quietly=TRUE), "windows" %in% .Platform$OS.type) ## Note : the function may work only on some windows systems if(all(chPa)) if(parallel::detectCores() >1) { .parCombinateAllAndSum(uniqCo, massModV, nProc=2)}
Return data.frame with pattern of perferential fragmentation sites Here a simplified version (elaborate see Kelleher group: Haverland 2017, J Am Soc Mass Spectrom)
.prefFragPattern(silent = FALSE, debug = FALSE, callFrom = NULL).prefFragPattern(silent = FALSE, debug = FALSE, callFrom = NULL)
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allow easier tracking of messages produced |
This function returns a data.frame with pattern of perferential fragmentation sites
peTab <- matrix(c("9","13","14","15", "LPVIAGHEAAG","PVIAGHEAAGI","EKKPFSI","KKPFSIE", "P","L","E","E", "I","V","E","E"),nr=4,dimnames=list(NULL,c("no","seq","precAA","tailAA"))) head(.prefFragPattern())peTab <- matrix(c("9","13","14","15", "LPVIAGHEAAG","PVIAGHEAAGI","EKKPFSI","KKPFSIE", "P","L","E","E", "I","V","E","E"),nr=4,dimnames=list(NULL,c("no","seq","precAA","tailAA"))) head(.prefFragPattern())
Add single specific modification to peptide/protein fragments .
.singleSpecModif( pepTab, specModif, nMaxMod = 1, massTy = "mono", callFrom = NULL, silent = FALSE, debug = FALSE ).singleSpecModif( pepTab, specModif, nMaxMod = 1, massTy = "mono", callFrom = NULL, silent = FALSE, debug = FALSE )
pepTab |
(matrix) matrix of fragments (cols 'no','seq','orig','ty','seqNa','beg','end','precAA','tailAA','ambig','mass') |
specModif |
(list) with elements 'modOrigin' (sequence), 'modPos' (position within sequence), 'modMass' (digits, ie mass to add), 'modName' (name of modif), 'modFixed' (fixed or , logical) |
nMaxMod |
(numeric) max number a given modification may occur |
massTy |
(character) 'mono' or 'average' |
callFrom |
(character) allow easier tracking of message(s) produced |
silent |
(logical) suppress messages |
debug |
(logical) additional messages and objects exportet to current session for debugging |
This function returns a list with $massMatch (list of exerimental peptides matching to one or more predicted), $preMa (predicted ions, including fixed modif), $pepTab (predicted neutral peptides, wo modifications), $expMa (experimental mass from input), $recalibFact (recalibration factor as from input), $docTi (time for calculations)
makeFragments, identifVarModif, identifyPepFragments
pep1 <- c(pe1="KPEPTI") # The table of possible terminal fragments (for simplicity terminal only) pepTab1 <- makeFragments(pep1, min=3, max=7, internFra=FALSE) specModif1 <- list(modOrigin=pep1, modPos=1, modMass=579.9663, modName="p", modFixed=FALSE) .singleSpecModif(pepTab1, specModif1 ) protP <- c(protP="PEPTIDEKR") pep1 <- c("PTI","KPE","EPTI") papTab1 <- cbind(no=c(7,2,6),seq=pep1, orig=rep("KPEPTI",3), origNa=rep("pe1",3), ty=paste0(c("C","N","C"),"ter"), beg=c(4,1,3), end=c(6,3,4), mass= wrProteo::convAASeq2mass(pep1, massTy="mono"), modSpec="")pep1 <- c(pe1="KPEPTI") # The table of possible terminal fragments (for simplicity terminal only) pepTab1 <- makeFragments(pep1, min=3, max=7, internFra=FALSE) specModif1 <- list(modOrigin=pep1, modPos=1, modMass=579.9663, modName="p", modFixed=FALSE) .singleSpecModif(pepTab1, specModif1 ) protP <- c(protP="PEPTIDEKR") pep1 <- c("PTI","KPE","EPTI") papTab1 <- cbind(no=c(7,2,6),seq=pep1, orig=rep("KPEPTI",3), origNa=rep("pe1",3), ty=paste0(c("C","N","C"),"ter"), beg=c(4,1,3), end=c(6,3,4), mass= wrProteo::convAASeq2mass(pep1, massTy="mono"), modSpec="")
Make named character vector of sequential terminal fragments
.termPepCut( pe, mi, ma = 1000, se1 = ".", se2 = "-", mainName = NULL, sepNC = FALSE, indexOffs = NULL, silent = FALSE, debug = FALSE, callFrom = NULL ).termPepCut( pe, mi, ma = 1000, se1 = ".", se2 = "-", mainName = NULL, sepNC = FALSE, indexOffs = NULL, silent = FALSE, debug = FALSE, callFrom = NULL )
pe |
(character, length=1) s |
mi |
(integer) min number of AA residues for considering peptide fragments; should be <= length(pe) (otherwise the full length of 'pe' ALWAYS returned !) |
ma |
(integer) max number of AA residues for considering peptide fragments |
se1 |
(character, length=1) separators for adding numbers to specify partial/fragment locations |
se2 |
(character, length=1) separators for adding numbers to specify partial/fragment locations |
mainName |
(character, length=1) |
sepNC |
(logical) if 'TRUE', separate N-terminal, C-terminal and internal fragments in list |
indexOffs |
(logical) offset to add for custom numbering in names (numeric, length=1), ie '1' will already increase by +1 |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allow easier tracking of messages produced |
This function returns a numeric vector with mass(es) and sequence in name(s)
## Ubiquitin example P0CG48 <- "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG" .termPepCut(P0CG48, mi=3, ma=12 ,sepNC=TRUE, mainName="P0CG48")## Ubiquitin example P0CG48 <- "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG" .termPepCut(P0CG48, mi=3, ma=12 ,sepNC=TRUE, mainName="P0CG48")
This function provides basic settings for what types of fragments may accomodate which type of modifications : $knownMods: information about which modifications may be considered, $specAAMod: specifc AA sites (if applicable), $specAAMod: specifc AA sites (if applicable). For example, here 'p' codes for gain of mass for HPO3 only at S, T and Y residues. Note: $knownMods$Nterm and $knownMods$Cterm are treated as mutually exclusive
AAfragSettings(outTy = "all", silent = FALSE, debug = FALSE, callFrom = NULL)AAfragSettings(outTy = "all", silent = FALSE, debug = FALSE, callFrom = NULL)
outTy |
(character) default "all" or any of the list-elements |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allows easier tracking of messages produced |
This function returns a list ($knownMods, $knspecAAMods, $modChem, $neutralLossOrGain)
makeFragments, fragmentSeq, massDeFormula
AAfragSettings()AAfragSettings()
Adjust/add mass for modifications from 'modTy' to all peptides in 'pepTab' based on count 'cou' of occurances of modifications :
Either fixed or variable modifications will be added to the mass of initial peptides from argument papTab.
addMassModif( cou, pepTab, combTerm, modTy, lastIndex = NULL, modChem = NULL, basVarMod = "basMod", massTy = "mono", knownMods = NULL, nProc = 1, parallDefault = TRUE, silent = FALSE, debug = FALSE, callFrom = NULL )addMassModif( cou, pepTab, combTerm, modTy, lastIndex = NULL, modChem = NULL, basVarMod = "basMod", massTy = "mono", knownMods = NULL, nProc = 1, parallDefault = TRUE, silent = FALSE, debug = FALSE, callFrom = NULL )
cou |
(list) list of matrixes with counts for number of modifications per peptide |
pepTab |
(matrix) table with peptide properties |
combTerm |
(matrix) table with separate rows for $basMod that are exclusive (ie can't be accumulated, eg x & y ions) |
modTy |
(character) list of modification types to be considered |
lastIndex |
(integer) index-1 (ie last index from prev matrix) from which new peptide-variants should start from |
modChem |
(character) optional modifications |
basVarMod |
(character) toggle if fixed ('basMod') or variable ('varMod') modificatons should be calculated |
massTy |
(character) 'mono' or 'average' |
knownMods |
(list) optional custom definition whoch modification is N-term, etc (see |
nProc |
(integer) number of processors in case of multi-processor use (requires Bioconductor package |
parallDefault |
(logical) for use of other/previously set |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allows easier tracking of messages produced |
Terminal ionization (like 'b' or 'y' -fragments) is treated as fixed modification and the resulting masses will correspond to standard mono-protonated ions. Since variable and fixed modification types can't be run in a single instance, the function has to get calles twice, it is recommended to always start with the fixed modfications, In the case of fixed modifications (like defining 'b' or 'y' fragments) neutral peptide masses should be given to add the corresponding mass-shift (and to obtain mono-protonated ions). In case of variable modifications (like 'd' or 'p'), the corresponding ions from the fixed modifications should get furnished to add the corresponding mass-shift, the masses resulting from the initial fixed modifications run can be used. Note, that transforming a neutral precursor M into MH+ is also considered a modification. The results are also correct with obligatory fragments that can't occur the same time (eg x & y ions can't be same time, need to make add'l lines...). This function has a multiprocessor mode, with small data-sets (like the toy example below) there is typcally no gain in performance.
This functions returns a list containing $pepTab (table of peptide as single charge positive ions), $abc ('representative' list of all combinations to add). Main result in $pepTab
pep1 <- c(pe1="KPEPTI") # The table of possible terminal fragments (for simplicity terminal only) pepTab1 <- makeFragments(pep1, min=3, max=7, internFra=FALSE) # Which fragment may be subject to how many modification (including ionization by H+) cou1 <- countPotModifAAs(pepTab=pepTab1, modTy=list(basMod=c("b","y"))) # Add modifications (here: ionize all pepptides by H+) preMa1 <- addMassModif(cou=cou1$cou, pepTab=pepTab1, combTerm=cou1$combTerm, modTy=list(basMod=c("b","y")), basVarMod="basMod") preMa1 ## Example including variable modifications modT3 <- list(basMod=c("b","y"),varMod=c("p","h","d")) cou3 <- countPotModifAAs(pepTab=pepTab1, modTy=modT3) ## Now we re-use/inject the results for the fixed modificatons preMa3 <- addMassModif(cou=cou3$cou, pepTab=preMa1$pepTab, combTerm=cou1$combTerm, modTy=modT3, basVarMod="varMod") head(preMa3$pepTab,12)pep1 <- c(pe1="KPEPTI") # The table of possible terminal fragments (for simplicity terminal only) pepTab1 <- makeFragments(pep1, min=3, max=7, internFra=FALSE) # Which fragment may be subject to how many modification (including ionization by H+) cou1 <- countPotModifAAs(pepTab=pepTab1, modTy=list(basMod=c("b","y"))) # Add modifications (here: ionize all pepptides by H+) preMa1 <- addMassModif(cou=cou1$cou, pepTab=pepTab1, combTerm=cou1$combTerm, modTy=list(basMod=c("b","y")), basVarMod="basMod") preMa1 ## Example including variable modifications modT3 <- list(basMod=c("b","y"),varMod=c("p","h","d")) cou3 <- countPotModifAAs(pepTab=pepTab1, modTy=modT3) ## Now we re-use/inject the results for the fixed modificatons preMa3 <- addMassModif(cou=cou3$cou, pepTab=preMa1$pepTab, combTerm=cou1$combTerm, modTy=modT3, basVarMod="varMod") head(preMa3$pepTab,12)
Check & complete settings for mixed of variable and fixed modifications. The final format is a list with $basMod, $varMod and $varMo2
checkModTy( modTy, knownMods = NULL, silent = TRUE, debug = FALSE, callFrom = NULL )checkModTy( modTy, knownMods = NULL, silent = TRUE, debug = FALSE, callFrom = NULL )
modTy |
(character) list of modification types to be considered |
knownMods |
(character) optonal custom list of known modifications, default from |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allows easier tracking of messages produced |
This function returns the corrected list of mixed of variable and fixed modifications ($basMod, $varMod and $varMo2)
modTy1 <- list(basMod=c("b","y","h"),varMod=c("p","o","q")) checkModTy(modTy1)modTy1 <- list(basMod=c("b","y","h"),varMod=c("p","o","q")) checkModTy(modTy1)
Use this function for preparing all combinations of non-compulsatory, ie variable, mass modifications. Variable modifications may or may not be present. Thus, for a given amino-acid with a variable modification two versions of the molecular weight need to be considered.
combinateAllAndSum( nMax, modVal, notSingle = NULL, silent = TRUE, debug = FALSE, callFrom = NULL )combinateAllAndSum( nMax, modVal, notSingle = NULL, silent = TRUE, debug = FALSE, callFrom = NULL )
nMax |
(integer or data.frame with 1 line) maximum number of modifications |
modVal |
(numeric, has to have names !) the change of molecular mass introduced by given modifications (as specified by the name of the value) |
notSingle |
(character) names of 'modVal' where 1st element of 'notSingle' cannot happen/appear if 2nd element not present (eg de-phospho/phosphorylation) |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allows easier tracking of messages produced |
Most (variable) modifications are linked to a type of amino acid, like serine- or thyrosine residues for phosphorlylation.
Thus in this case, each instance of the amino acids S or T may or may not be modified.
So, for example if there are 2 serines on a given peptide/protein, 0, 1 or 2 phosphorylation modifications may be present.
For this reason there is an argument called nMax to allow staying within biologically relevant ranges (external knowledge) and allowing to reduce complexity significantly.
In the case of phosporylations, the total number of actually phosphoylated amino-acids is typically way below the number of S and T residues in pthe initial sequence.
Some modifications are exclusive to others, argument notSingle : An (artificially occuring) de-phosphorylation event during fragmentation can only happen if the amino acid was already phosphorylated in the first place.
This functions returns a named (concatenated names of modVal) numeric vector
uniqCo <- matrix(c(1,1,1,0,1,1), nrow=2, dimnames=list(c("PTI","KPE"),c("d","p","h")) ) massModV <- c(d= -18.01056, p= 79.96633, h= -18.01056) ## for 1st peptide combinateAllAndSum(uniqCo[1,], massModV, notSingle=c("q","p")) ## for all peptides apply(uniqCo, 1, combinateAllAndSum, massModV, notSingle=c("q","p"))uniqCo <- matrix(c(1,1,1,0,1,1), nrow=2, dimnames=list(c("PTI","KPE"),c("d","p","h")) ) massModV <- c(d= -18.01056, p= 79.96633, h= -18.01056) ## for 1st peptide combinateAllAndSum(uniqCo[1,], massModV, notSingle=c("q","p")) ## for all peptides apply(uniqCo, 1, combinateAllAndSum, massModV, notSingle=c("q","p"))
Provide all combinations for each of n elements of vector 'nMax' (positive integer, eg number of max multiplicative value). Results allow to see possible total compositons and must be read vertically.
combinatIntTable( nMax, include0 = TRUE, asList = FALSE, callFrom = NULL, debug = FALSE, silent = TRUE )combinatIntTable( nMax, include0 = TRUE, asList = FALSE, callFrom = NULL, debug = FALSE, silent = TRUE )
nMax |
(positiveinteger) atomic compostion; could be max number of voting participants form different cities, eg Paris max 2 persons, Lyon max 1 person ... |
include0 |
(logical) include 0 occurances, ie provide al combinations starting from 0 or from 1 up to nMax |
asList |
(logical) return result a |
callFrom |
(character) allows easier tracking of messages produced |
debug |
(logical) additional messages for debugging |
silent |
(logical) suppress messages |
This functions returns a list or array (as 2- or 3 dim) with possible number of occurances for each of the 3 elements in nMax. Read results vertical : out[[1]] or out[,,1] .. (multiplicative) table for 1st element of nMax; out[,,2] .. for 2nd
combinatIntTable(c(1,1,1,2), include0=TRUE, asList=FALSE, silent=TRUE) nMa <- c(Paris=2,Lyon=1,Strasbourg=1) combinatIntTable(nMa, include0=TRUE, asList=TRUE, silent=TRUE)combinatIntTable(c(1,1,1,2), include0=TRUE, asList=FALSE, silent=TRUE) nMa <- c(Paris=2,Lyon=1,Strasbourg=1) combinatIntTable(nMa, include0=TRUE, asList=TRUE, silent=TRUE)
The protein sequence and composition of most proteomes cannot be well mimicked by pure random sequences. Thus, random sequences (for comparing the quality of fitting) shhould be corrected accordingly, this vector provides help to do so. This function loads corMutShift- or corInDelShift-values from RData. The vector contains possible mass alterations for random drawing either by mutating a given aminoacid (corMutShift), or by making or in/del changes (corInDelShift). These values are based on simulations in the human proteome (from UniProt).
corInDelShift(fi = NULL)corInDelShift(fi = NULL)
fi |
(character) file (and path) to RData to read as corMutShift or corInDelShift. The file when opend must contain a numeric vector either called 'corMutShift' or 'corInDelShift'. |
This functions returns a numeric vector (1907 possible mass alterations for random drawing)
corMutShift, randMassByStochastic
corMutShift <- corMutShift() str(corMutShift) corInDelShift <- corInDelShift() str(corInDelShift)corMutShift <- corMutShift() str(corMutShift) corInDelShift <- corInDelShift() str(corInDelShift)
The protein sequence and composition of most proteomes cannot be well mimicked by pure random sequences. Thus, random sequences (for comparing the quality of fitting) shhould be corrected accordingly, this vector provides help to do so. This function loads corMutShift- or corInDelShift-values from RData. The vector contains possible mass alterations for random drawing either by mutating a given aminoacid (corMutShift), or by making or in/del changes (corInDelShift). These values are based on simulations in the human proteome (from UniProt).
corMutShift(fi = NULL)corMutShift(fi = NULL)
fi |
(character) file (and path) to RData to read as corMutShift or corInDelShift. The file when opend must contain a numeric vector either called 'corMutShift' or 'corInDelShift'. |
This functions returns a numeric vector (possible mass alterations for random drawing)
corInDelShift, randMassByStochastic
corMutShift <- corMutShift() str(corMutShift) corInDelShift <- corInDelShift() str(corInDelShift)corMutShift <- corMutShift() str(corMutShift) corInDelShift <- corInDelShift() str(corInDelShift)
This functions helps identifying fragments ('parent') characterized by a start- and end-position, that got split into 2 'children' fragments.
So, each one of the new 'children' conserves either the start- or end-site of the parent and the the remaining ends are on consecutive positions.
For example if the sequence 'BCDEFG' (parent) gets split into 'BCD' (positions 1-3) and 'EFG' (positions 4-6),
this will be identified as a children/parent 'family' which could be represented as 'a+b=c' case.
Note : At this point only settings with 2 children are considered, for more complex scenarions one may build trees using buildTree (however, this function does not identify 'parents').
In proteomics-applications some start- and end-sites may occur multiple times, representing eg unmodified and modified versions of the same basal peptide-sequence.
Such duplicated start- and end-cases are handeled as allowed, a 'child' (characterized by its start- and end-position) may occur multiple times, and the
corresponding redundant rownames (eg peptide sequence like 'BCD') will be conserved. However, information reflecting eg different peptide modifications must be stored separately.
If redudant start- and end-sites accur with different row-names, repeated start- and end-sites will display NA.
countChildrenParent( fragments, output = "count", silent = FALSE, debug = FALSE, callFrom = NULL )countChildrenParent( fragments, output = "count", silent = FALSE, debug = FALSE, callFrom = NULL )
fragments |
(matrix or data.frame) integer values in 1st column, for start site of fragment, and in 2nd column as end-sites of fragments, rownames as IDs |
output |
(character) choose simply returning results as counts or as list with |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allows easier tracking of messages produced |
This functions returns either numeric vector with cumulated counts (corresponding to rows of fragments) or list with $count and $detailIndex (list with indexes refering to non-redundant entries of all a+b=c settings identified)
simpleFragFig ; for building longer consecutive trees (without identification of 'parent') buildTree
frag3 <- cbind(beg=c(4,2,3,7,13,13,15, 2,9,2,9), end=c(14,6,12,8,18,20,20, 8,12,12,18)) rownames(frag3) <- c("K","A","E","B","C","D","F", "H","G","I","J") countChildrenParent(frag3) ## example with duplicate start- and end-position positions frag3c <- cbind(beg=c(4,2,3,7, 7,13, 13,13,15, 2,9,2,9,9), end=c(14,6,12,8, 8,18, 18,20,20, 8,12,12,12,18)) rownames(frag3c) <- c("K","A","E", "B","B", "C","C","D","F", "H","G","I","G","J") countChildrenParent(frag3c, out="det")frag3 <- cbind(beg=c(4,2,3,7,13,13,15, 2,9,2,9), end=c(14,6,12,8,18,20,20, 8,12,12,18)) rownames(frag3) <- c("K","A","E","B","C","D","F", "H","G","I","J") countChildrenParent(frag3) ## example with duplicate start- and end-position positions frag3c <- cbind(beg=c(4,2,3,7, 7,13, 13,13,15, 2,9,2,9,9), end=c(14,6,12,8, 8,18, 18,20,20, 8,12,12,12,18)) rownames(frag3c) <- c("K","A","E", "B","B", "C","C","D","F", "H","G","I","G","J") countChildrenParent(frag3c, out="det")
Makes table 'cou' with counts of (potential) modification sites based on column 'seq' in matrix 'pepTab'. Note: if multiple N-or C-term modifs, then only the first is shown in resulting table 'cou'.
countPotModifAAs( pepTab, modTy, maxMod = c(p = 3, h = 1, k = 1, o = 1, m = 1, n = 1, u = 1, r = 1, s = 1), specAAMod = NULL, knownMods = NULL, silent = FALSE, callFrom = NULL, debug = FALSE )countPotModifAAs( pepTab, modTy, maxMod = c(p = 3, h = 1, k = 1, o = 1, m = 1, n = 1, u = 1, r = 1, s = 1), specAAMod = NULL, knownMods = NULL, silent = FALSE, callFrom = NULL, debug = FALSE )
pepTab |
(matrix) peptide sequences, start and end sites, typically result from |
modTy |
(list) modifications : $basMod for character vector of fixed modifications and $varMod for variable modifications. For one letter-code see AAfragSettings("modChem") |
maxMod |
(integer) maximal number variable modifications will be considered in given fragment (may increase complexity and RAM consumption) |
specAAMod |
(list) optional custom list showing which AA to be considered with which (one-letter) modification code (default |
knownMods |
(list) optional custom list showing which modification appears at what type of location, eg N-terminal, internal ... (default |
silent |
(logical) suppress messages |
callFrom |
(character) allow easier tracking of message(s) produced |
debug |
(logical) for bug-tracking: more/enhanced messages and intermediate objects written in global name-space |
list of matrixes $cou and $combTerm, with number of modifications per peptides (line in 'pepTab') for basMod, varMod & varMo2
protP2 <- c(mesp="MESPEPTIDES", pepe="PEPEPEP") pepTab1 <- makeFragments(protTab=protP2, minFra=6, internFr=TRUE, massTy="mono") cou1 <- countPotModifAAs(pepTab=pepTab1, modTy=list(basMod=c("b","y"), varMod=c("p","h"))) modTy2 <- list(basMod=c("b","y","h"), varMod=c("x","p","o","q","e","j")) cou2 <- countPotModifAAs(pepTab=pepTab1, modTy=modTy2)protP2 <- c(mesp="MESPEPTIDES", pepe="PEPEPEP") pepTab1 <- makeFragments(protTab=protP2, minFra=6, internFr=TRUE, massTy="mono") cou1 <- countPotModifAAs(pepTab=pepTab1, modTy=list(basMod=c("b","y"), varMod=c("p","h"))) modTy2 <- list(basMod=c("b","y","h"), varMod=c("x","p","o","q","e","j")) cou2 <- countPotModifAAs(pepTab=pepTab1, modTy=modTy2)
Evaluate selected lines of pepTab (iso-mass) for preferential cutting sites. Such sites are taken by default from .prefFragPattern() simplified from a publication
by the Kelleher group (Haverland 2017, J Am Soc Mass Spectrom) or can be furnished by the user.
evalIsoFragm( z, prefFragPat = NULL, seqCol = "seq", silent = FALSE, debug = FALSE, callFrom = NULL )evalIsoFragm( z, prefFragPat = NULL, seqCol = "seq", silent = FALSE, debug = FALSE, callFrom = NULL )
z |
(matrix) main input, must contain cols specified as seqCol and "no","tailAA","precAA" |
prefFragPat |
(matrix) specifies preferential fragmentation (which combination of AA to consider cols cTer,nTer,score), default made by |
seqCol |
(character) column names for the column containing the sequence to search for preferential cutting sites |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allow easier tracking of messages produced |
This function returns line ID-numbers (pepTab[,"no"]) for those below median score (ie to remove from pepTab) or NULL if nothing to remove due to preferential fragmentation
peTab <- matrix(c("9","13","14","15", "LPVIAGHEAAG","PVIAGHEAAGI","EKKPFSI","KKPFSIE", "P","L","E","E", "I","V","E","E"),nr=4,dimnames=list(NULL,c("no","seq","precAA","tailAA"))) evalIsoFragm(peTab)peTab <- matrix(c("9","13","14","15", "LPVIAGHEAAG","PVIAGHEAAGI","EKKPFSI","KKPFSIE", "P","L","E","E", "I","V","E","E"),nr=4,dimnames=list(NULL,c("no","seq","precAA","tailAA"))) evalIsoFragm(peTab)
Makes internal/terminal fragments of a SINGLE peptide/protein input (as single letter amino-acid code) and returns list of all possible sequences ($full, $Nter, $Cter, $inter).
fragmentSeq( sequ, minSize = 3, maxSize = 300, internFragments = TRUE, separTerm = FALSE, keepRedSeqs = TRUE, prefName = NULL, silent = FALSE, debug = FALSE, callFrom = NULL )fragmentSeq( sequ, minSize = 3, maxSize = 300, internFragments = TRUE, separTerm = FALSE, keepRedSeqs = TRUE, prefName = NULL, silent = FALSE, debug = FALSE, callFrom = NULL )
sequ |
(character, length=1) sequence used for fragmenting, as as mono-aminoacid letter code (so that cuting will be perfomed between all the letters/characters) |
minSize |
(integer) min number of AA residues for considering peptide fragments |
maxSize |
(integer) max number of AA residues for considering peptide fragments |
internFragments |
(logical) logical (return only terminal fragments if 'FALSE') |
separTerm |
(logical) if 'TRUE', separate N-terminal, C-terminal and internal fragments in list |
keepRedSeqs |
(logical) if 'FALSE' remove fragments with redundant content (but my be from different origin in 'sequ'); remove redundant so far only when no separation of Nterm/Cterm/intern as list |
prefName |
(logical) alternative name for all fragments (default the sequence itself), avoid separators '.' and '-' |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allow easier tracking of messages produced |
Note: Thus function can handle only 1 sequence at each run ! Note: The mass values returned correspond to neutral peptides. If you are looking for ions, you need to adjust masses to the repsective ion-charcteriostics (adding H+, etc).
This function returns a numeric vector with the (neutral) mass of fragmented peptides
fragmentSeq("ABCDE") fragmentSeq("ABCDE", minSize=3, internFragments=FALSE) fragmentSeq("ABCDE", minSize=3, internFragments=TRUE) ## Run multiple peptides/proteins twoPep <- cbind(c("a","ABCABCA"), c("e","EFGEFGEF")) apply(twoPep, 2, function(x) fragmentSeq(x[2], mi=3, kee=FALSE, sep=TRUE, pre=x[1])) ## Ubiquitin example P0CG48 <- "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG" system.time( fra1 <- (fragmentSeq(P0CG48, mi=5, kee=FALSE))) # < 0.5 secfragmentSeq("ABCDE") fragmentSeq("ABCDE", minSize=3, internFragments=FALSE) fragmentSeq("ABCDE", minSize=3, internFragments=TRUE) ## Run multiple peptides/proteins twoPep <- cbind(c("a","ABCABCA"), c("e","EFGEFGEF")) apply(twoPep, 2, function(x) fragmentSeq(x[2], mi=3, kee=FALSE, sep=TRUE, pre=x[1])) ## Ubiquitin example P0CG48 <- "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG" system.time( fra1 <- (fragmentSeq(P0CG48, mi=5, kee=FALSE))) # < 0.5 sec
Identify peptide/protein fragments based on experimental m/z values 'expMass' for given range of aa-length.
Internally all possible fragments will be predicted and their mass compared to the experimental values (argument expMass).
identifFixedModif( prot, expMass, minFragSize = 5, maxFragSize = 60, indexStart = 1, suplPepTab = NULL, internFra = TRUE, chargeCatchFilter = TRUE, maxMod = c(p = 3, h = 1, k = 1, o = 1, m = 1, n = 1, u = 1, r = 1, s = 1), modTy = NULL, specModif = NULL, knownMods = NULL, identMeas = "ppm", limitIdent = 5, filtAmbiguous = FALSE, recalibrate = FALSE, massTy = "mono", prefFragPat = NULL, silent = FALSE, debug = FALSE, callFrom = NULL )identifFixedModif( prot, expMass, minFragSize = 5, maxFragSize = 60, indexStart = 1, suplPepTab = NULL, internFra = TRUE, chargeCatchFilter = TRUE, maxMod = c(p = 3, h = 1, k = 1, o = 1, m = 1, n = 1, u = 1, r = 1, s = 1), modTy = NULL, specModif = NULL, knownMods = NULL, identMeas = "ppm", limitIdent = 5, filtAmbiguous = FALSE, recalibrate = FALSE, massTy = "mono", prefFragPat = NULL, silent = FALSE, debug = FALSE, callFrom = NULL )
prot |
(character) amino-acid sequene of peptide or protein |
expMass |
(numeric) experimental masses to identify peptides from |
minFragSize |
(integer) min number of AA residues for considering peptide fragments |
maxFragSize |
(integer) max number of AA residues for considering peptide fragments |
indexStart |
(integer) for starting at correct index (if not 1) |
suplPepTab |
(matrix) additional peptides to be add to theoretical peptides |
internFra |
(logical) decide whether internal fragments should be consiered |
chargeCatchFilter |
(logical) by default remove all peptides not containing charge-catching (polar) AAs (K, R, H, defined via |
maxMod |
(integer) maximum number of residue modifications to be consiered in fragments (values >1 will increase complexity and RAM consumption) |
modTy |
(character) type of fixed and variable modifications |
specModif |
(list) supplemental custom fixed or variable modifications (eg Zn++ at given residue) |
knownMods |
(character) optional custom alternative to |
identMeas |
(character) default 'ppm' |
limitIdent |
(character) thershold for identification in 'identMeas' units |
filtAmbiguous |
(logical) allows filtering/removing ambiguous results (ie same mass peptides) |
recalibrate |
(logical or numeric) may be direct recalibration-factor (numeric,length=1), if 'TRUE' fresh determination of 'recalibFact' or 'FALSE' (no action); final recalibration-factor used exported in result as $recalibFact |
massTy |
(character) 'mono' or 'average' |
prefFragPat |
(numeric) pattern for preferential fragmentation (see also Haverland 2017), if |
silent |
(logical) suppress messages |
debug |
(logical) additional messages and objects exportet to current session for debugging |
callFrom |
(character) allow easier tracking of message(s) produced |
The main matching results are in output$massMatch : This list has one entry for each predicted mass where some matches were found.
Thus, the names of the list-elements design the index from argument expMass.
Each list-element contains a numeric vector giving the difference observed to predicted, the names design the unique predicted peptide index/number from output$preMa[,"no"]
The main element of the output is the $massMatch -list, which is in the format of findCloseMatch.
Thus, the list-elements names represent the line-number of mass-predictions and the values the delta-mass and their names the position of the initial query.
This function returns a list with $massMatch (list of exerimental peptides matching to one or more predicted), $preMa (predicted ions, including fixed modif), $pepTab (predicted neutral peptides, wo modifications), $expMa (experimental mass from input), $recalibFact (recalibration factor as from input), $docTi (time for calculations)
makeFragments, identifVarModif, identifyPepFragments, findCloseMatch
pro3 <- "HLVDEPQNLIK" exp3 <- c( b4=465.2451, b5=594.2877, b6=691.3404, y7=841.4772, y6=712.4347, y5=615.3819) ident3 <- identifFixedModif(prot=pro3, expMass=exp3, minFragSize=4, maxFragSize=60, modTy=list(basMod=c("b","y"))) ident3$massMatch ## as human readable table: ident3$preMa[ ident3$preMa[,"no"] %in% (names(ident3$massMatch)),]pro3 <- "HLVDEPQNLIK" exp3 <- c( b4=465.2451, b5=594.2877, b6=691.3404, y7=841.4772, y6=712.4347, y5=615.3819) ident3 <- identifFixedModif(prot=pro3, expMass=exp3, minFragSize=4, maxFragSize=60, modTy=list(basMod=c("b","y"))) ident3$massMatch ## as human readable table: ident3$preMa[ ident3$preMa[,"no"] %in% (names(ident3$massMatch)),]
Take result from identifFixedModif and search for variable modifications (only on identified fixed modif), ie 2nd step for identif of var modifs.
To reduce the complexity of the search space, only peptide fragments identified with fixed identifiactions will be considered for possible variable modifications.
identifVarModif( zz, modTy, expMa, maxMod, identMeas = "ppm", knownMods = NULL, limitIdent = 5, filtAmbiguous = FALSE, indexStart = 1, recalibFact = NULL, suplPepTab = NULL, massTy = "mono", silent = FALSE, callFrom = NULL, debug = TRUE )identifVarModif( zz, modTy, expMa, maxMod, identMeas = "ppm", knownMods = NULL, limitIdent = 5, filtAmbiguous = FALSE, indexStart = 1, recalibFact = NULL, suplPepTab = NULL, massTy = "mono", silent = FALSE, callFrom = NULL, debug = TRUE )
zz |
(list) min input, result from |
modTy |
(character) type of fixed and variable modifications |
expMa |
(matrix) experimental m/z values |
maxMod |
(integer) maximum number of residue modifications to be consiered in fragments (values >1 will increase complexity and RAM consumption) |
identMeas |
(character) comparison type (used in findCloseMatch(), default ="ppm"), used with limit 'limitIdent' |
knownMods |
(character) optional custom alternative to |
limitIdent |
(integer) limit applied to 'identMeas' |
filtAmbiguous |
(logical) toggle to remove all ambiguous identifications |
indexStart |
(integer) for keeping correct index at iterative use |
recalibFact |
(numeric, length=1) |
suplPepTab |
(matrix) predicted fragments (incl fixed and var modifs) to include to search (allowong to ensure overlap to include hits close to prev search) |
massTy |
(character) 'mono' or 'average' |
silent |
(logical) suppress messages |
callFrom |
(character) allow easier tracking of messages produced |
debug |
(logical) additional messages for debugging |
The main matching results are in output$massMatch : This list has one entry for each predicted mass where some matches were found.
Thus, the names of the list-elements design the index from argument expMass.
Each list-element contains a numeric vector giving the difference observed to predicted, the names design the unique predicted peptide index/number from output$preMa[,"no"]
list with $massMatch (list of exerimental peptides matching to one or more predicted), $preMa (predicted ions, including fixed and variable modif), $pepTab (predicted neutral peptides, wo modifications), $expMa (experimental mass from input), $recalibFact (recalibration factor as from input), $docTi (time for calculations)
makeFragments, identifFixedModif, identifyPepFragments
protP <- c(protP="PEPTIDE") obsMassX <- cbind(a=c(199.1077,296.1605,397.2082,510.2922,625.3192), b=c(227.1026,324.1554,425.2031,538.2871,653.3141), x=c(729.2937,600.2511,503.1984,402.1507,289.0666), y=c(703.3145,574.2719,477.2191,376.1714,263.0874)) rownames(obsMassX) <- c("E","P","T","I","D") # all 1 & 7 ions not included identP10 <- identifFixedModif(prot=protP,expMass=as.numeric(obsMassX),minFragSize=2, maxFragSize=7,modTy=list(basMod=c("b","y"))) # looks ok identP10v <- identifVarModif(identP10,list(varMod="h"), as.numeric(obsMassX),2) identP10v$massMatch # list of matchesprotP <- c(protP="PEPTIDE") obsMassX <- cbind(a=c(199.1077,296.1605,397.2082,510.2922,625.3192), b=c(227.1026,324.1554,425.2031,538.2871,653.3141), x=c(729.2937,600.2511,503.1984,402.1507,289.0666), y=c(703.3145,574.2719,477.2191,376.1714,263.0874)) rownames(obsMassX) <- c("E","P","T","I","D") # all 1 & 7 ions not included identP10 <- identifFixedModif(prot=protP,expMass=as.numeric(obsMassX),minFragSize=2, maxFragSize=7,modTy=list(basMod=c("b","y"))) # looks ok identP10v <- identifVarModif(identP10,list(varMod="h"), as.numeric(obsMassX),2) identP10v$massMatch # list of matches
Function for predicting internal and terminal peptide-fragments and compare them with experimental monoisotopic masses. The accuracy of results is given in ppm and a false discovery rate (FDR) for the identification is estimated. The identifed fragments are also checked for preferential break sites, a score including this and other parameters is given with the results.
identifyPepFragments( expMass, pep, modTy = NULL, minFragSize = 6, maxFragSize = 75, identMeas = "ppm", limitIdent = 5, internFra = TRUE, specModif = NULL, massTy = "mono", chargeCatchFilter = TRUE, corMutShift = NULL, nProc = 1, parallDefault = TRUE, multParam = NULL, maxMod = c(p = 3, h = 1, k = 1, o = 1, m = 1, n = 1, u = 1, r = 1, s = 1), recalibrate = TRUE, filtAmbiguous = FALSE, prefFragmPat = NULL, sortOutputByMass = FALSE, silent = FALSE, callFrom = NULL, debug = FALSE )identifyPepFragments( expMass, pep, modTy = NULL, minFragSize = 6, maxFragSize = 75, identMeas = "ppm", limitIdent = 5, internFra = TRUE, specModif = NULL, massTy = "mono", chargeCatchFilter = TRUE, corMutShift = NULL, nProc = 1, parallDefault = TRUE, multParam = NULL, maxMod = c(p = 3, h = 1, k = 1, o = 1, m = 1, n = 1, u = 1, r = 1, s = 1), recalibrate = TRUE, filtAmbiguous = FALSE, prefFragmPat = NULL, sortOutputByMass = FALSE, silent = FALSE, callFrom = NULL, debug = FALSE )
expMass |
(matrix or data.frame) |
pep |
(character) protein/peptide sequences to be used for fragmentation |
modTy |
(list) defining fixed and variable modifications |
minFragSize |
(integer) min length in AA of peptides to be considered (please see you spectrometers characteristics) |
maxFragSize |
(integer) max length in AA of peptides to be considered (please see you spectrometers characteristics) |
identMeas |
(character) comparison type (used in findCloseMatch(), default ="ppm"), used with limit 'limitIdent' |
limitIdent |
(integer) limit applied to 'identMeas' |
internFra |
(logical) switch from including all internal fragments to terminal fragments only (if F) |
specModif |
(list) optional custom single-site modifications (eg ions bound), will be processed using |
massTy |
(list) list of modifications/fragmentation-type(s) to consider, organipredMae as 'basMod' (any occurance) and 'varMod' (optional aoccurance), 'modPos' (position of modif, integer), 'modMass' (mass to be added), 'modName' (name), 'modFixed' (fixed or variable modif, logical) |
chargeCatchFilter |
(logical) filter (upfront) to consider only peptides containing AAs capable of catching extra charges (K, R, H, defined via |
corMutShift |
(numeric) (numeric) vector of decoy-type possible mass shifts (eg from load("C:/E/projects/MassSpec/fragmIdentif/corMutShift.RData")) |
nProc |
(integer) number of preocessors to use |
parallDefault |
(logical) if 'parallDefault'=F no multiprocessor parameters set for BiocParallel |
multParam |
(list) |
maxMod |
(integer) maximum number of residue modifications to be consiered in fragments (values >1 will increase complexity and RAM consumption) |
recalibrate |
(logical) recalibrate based on region with highest density of experim values |
filtAmbiguous |
(logical) |
prefFragmPat |
(matrix) optional custum preferential fragmentation pattern (otherwise |
sortOutputByMass |
(logical) |
silent |
(logical) suppress messages |
callFrom |
(character) allow easier tracking of messages produced |
debug |
(logical) additional messages for debugging |
matrix of idenitfied ions
makeFragments, identifVarModif, identifFixedModif, findCloseMatch, scoreProteinFragments
protP <- c(protP="PEPTIDE") obsMassX <- cbind(a=c(199.1077,296.1605,397.2082,510.2922,625.3192), b=c(227.1026,324.1554,425.2031,538.2871,653.3141), x=c(729.2937,600.2511,503.1984,402.1507,289.0666), y=c(703.3145,574.2719,477.2191,376.1714,263.0874)) rownames(obsMassX) <- c("E","P","T","I","D") # all 1 & 7 ions not included modTy1 <- list(basMod=c("b","y"), varMod=c("p","o","q")) frag1 <- identifyPepFragments(ex=as.numeric(obsMassX), pe=protP, modTy=modTy1, minFragSize=2, chargeCatchFilter=FALSE) (frag1b <- if(length(unlist(frag1$identif)) >0) modifFragmTabOutput(frag1))protP <- c(protP="PEPTIDE") obsMassX <- cbind(a=c(199.1077,296.1605,397.2082,510.2922,625.3192), b=c(227.1026,324.1554,425.2031,538.2871,653.3141), x=c(729.2937,600.2511,503.1984,402.1507,289.0666), y=c(703.3145,574.2719,477.2191,376.1714,263.0874)) rownames(obsMassX) <- c("E","P","T","I","D") # all 1 & 7 ions not included modTy1 <- list(basMod=c("b","y"), varMod=c("p","o","q")) frag1 <- identifyPepFragments(ex=as.numeric(obsMassX), pe=protP, modTy=modTy1, minFragSize=2, chargeCatchFilter=FALSE) (frag1b <- if(length(unlist(frag1$identif)) >0) modifFragmTabOutput(frag1))
Makes terminal and internal fragments based on protein-sequence and present as matrix including heading and/or tailing amino-acid or theoretical molecular mass of all fragments.
As the number of theoretically possible fragments increases with the size of the peptide/protein treated it is recommended to adopt arguments like masFragSize to
realistic values for the type of mass spectrometer used, since efficient filtering will reduce considerably the amount of memory (RAM) needed and will improve overal performance.
makeFragments( protTab, minFragSize = 6, maxFragSize = 300, internFra = TRUE, knownMods = NULL, redRedundSeq = FALSE, prefFragPat = NULL, remNonConfPrefFragm = TRUE, ambigLab = c(duplSequence = "duplSequence", isoMass = "isoMass"), massTy = "mono", specModif = NULL, silent = FALSE, debug = FALSE, callFrom = NULL )makeFragments( protTab, minFragSize = 6, maxFragSize = 300, internFra = TRUE, knownMods = NULL, redRedundSeq = FALSE, prefFragPat = NULL, remNonConfPrefFragm = TRUE, ambigLab = c(duplSequence = "duplSequence", isoMass = "isoMass"), massTy = "mono", specModif = NULL, silent = FALSE, debug = FALSE, callFrom = NULL )
protTab |
(character or matrix) named vector of protein-seqences to fragment or or, matrix with 1st column 'ma' with mass and 2nd column 'se' with protein-/peptide-sequence (optionally sequence(s) also as rownames or 3rd column with 'trivial' names) |
minFragSize |
(integer) minimum number of amino-acids for being considered |
maxFragSize |
(integer) maximum number of amino-acids for being considered |
internFra |
(logical) toggle if internal framents will be produced or not |
knownMods |
(character) optional custom alternative to |
redRedundSeq |
(logical) reduce redundant sequences to 1st appearance in all further treatments |
prefFragPat |
(matrix) for preferential fragmentation rules (see also |
remNonConfPrefFragm |
(logical) allows to remove (peptide-)fragments non conform with preferential fragmentation rules (using |
ambigLab |
(character) text-labels for ambiguities (first for duplicated sequences second for iso-mass) |
massTy |
(character) default 'mono' for mono-isotopic masses (alterative 'average') |
specModif |
(list) supplemental custom fixed or variable modifications (eg Zn++ at given residue) |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allow easier tracking of messages produced |
matrix with fragment sequence, mass, start- and end-position, heading and tailing AA (or NA if terminal fragment)
makeFragments; evalIsoFragm, from package wrProteo convAASeq2mass, AAmass, massDeFormula
protP <- c(protP="PEPTIDE") pepT1 <- makeFragments(protTab=protP, minFragSize=2, maxFragSize=9, internFra=TRUE) tail(pepT1) protP2 <- cbind(se="PEPTIDE", ma="1304.7088503626") pepT2 <- makeFragments(protTab=protP2, minFragSize=2, maxFragSize=9, internFra=TRUE) tail(pepT1)protP <- c(protP="PEPTIDE") pepT1 <- makeFragments(protTab=protP, minFragSize=2, maxFragSize=9, internFra=TRUE) tail(pepT1) protP2 <- cbind(se="PEPTIDE", ma="1304.7088503626") pepT2 <- makeFragments(protTab=protP2, minFragSize=2, maxFragSize=9, internFra=TRUE) tail(pepT1)
Change fragment identification output to format better adopted for biologists
modifFragmTabOutput( datafr, addData = NULL, fuseC = c("precAA", "seq", "tailAA"), sep = ".", modifCol = "mod", replMod = cbind(old = "by", new = "i"), finCols = c("fraNa", "origNa", "beg", "end", "seq", "ty", "mod", "modSpec", "obsMass", "mass", "ppmToPred", "ambig", "runNo", "FDR", "sco4", "sc.prefFrag", "sc.chargeCatch", "sc.complemFra", "sc.sameSite", "logInt"), supFinCols = NULL, sortTable = "end", silent = FALSE, debug = FALSE, callFrom = NULL )modifFragmTabOutput( datafr, addData = NULL, fuseC = c("precAA", "seq", "tailAA"), sep = ".", modifCol = "mod", replMod = cbind(old = "by", new = "i"), finCols = c("fraNa", "origNa", "beg", "end", "seq", "ty", "mod", "modSpec", "obsMass", "mass", "ppmToPred", "ambig", "runNo", "FDR", "sco4", "sc.prefFrag", "sc.chargeCatch", "sc.complemFra", "sc.sameSite", "logInt"), supFinCols = NULL, sortTable = "end", silent = FALSE, debug = FALSE, callFrom = NULL )
datafr |
(data.frame) initial output from identifyPepFragments() |
addData |
(matrix or data.frame) suppelemental data |
fuseC |
(character) columns to exract preceeding and tailing AA to fuse with separator 'sep' to main sequence |
sep |
(character) separator for concatenation |
modifCol |
(character) default 'modif' |
replMod |
(matrix) if names of modifications shoule be renamed : the columns 'old' and 'new' indicata how modifcations should be renamed |
finCols |
(character) columns to retain for final output |
supFinCols |
(character) |
sortTable |
(character) sort output 1st by name, then by 'beg' or 'end' |
silent |
(logical) suppress messages |
debug |
(logical) addtional diagnostic messages |
callFrom |
(character) allow easier tracking of message produced |
data.frame of reorganized identification results
protP <- c(protP="PEPTIDE") obsMassX <- cbind(a=c(199.1077,296.1605,397.2082,510.2922,625.3192), b=c(227.1026,324.1554,425.2031,538.2871,653.3141), x=c(729.2937,600.2511,503.1984,402.1507,289.0666), y=c(703.3145,574.2719,477.2191,376.1714,263.0874)) rownames(obsMassX) <- c("E","P","T","I","D") # all 1 & 7 ions not included modTy1 <- list(basMod=c("b","y"), varMod=c("p","o","q")) frag1 <- identifyPepFragments(ex=as.numeric(obsMassX), pe=protP, modTy=modTy1, minFragSize=2, chargeCatchFilter=FALSE) (frag1b <- if(length(unlist(frag1$identif)) >0) modifFragmTabOutput(frag1))protP <- c(protP="PEPTIDE") obsMassX <- cbind(a=c(199.1077,296.1605,397.2082,510.2922,625.3192), b=c(227.1026,324.1554,425.2031,538.2871,653.3141), x=c(729.2937,600.2511,503.1984,402.1507,289.0666), y=c(703.3145,574.2719,477.2191,376.1714,263.0874)) rownames(obsMassX) <- c("E","P","T","I","D") # all 1 & 7 ions not included modTy1 <- list(basMod=c("b","y"), varMod=c("p","o","q")) frag1 <- identifyPepFragments(ex=as.numeric(obsMassX), pe=protP, modTy=modTy1, minFragSize=2, chargeCatchFilter=FALSE) (frag1b <- if(length(unlist(frag1$identif)) >0) modifFragmTabOutput(frag1))
Take result from fragIonMass and display identified fragments by their location, an additional parameter (default logIntensity) is used for coloring
This function illustrates the distribution of identfied peptidesa and thus common break-points.
plotFragmLoc( fraL, extrCol = NULL, useLog = FALSE, useCol = NULL, specLayout = NULL, useTi = NULL, subTit = NULL, footer = NULL, batchFig = FALSE, legCex = 0.5, legOffS = NULL, legBorder = NULL, silent = FALSE, callFrom = NULL, debug = FALSE )plotFragmLoc( fraL, extrCol = NULL, useLog = FALSE, useCol = NULL, specLayout = NULL, useTi = NULL, subTit = NULL, footer = NULL, batchFig = FALSE, legCex = 0.5, legOffS = NULL, legBorder = NULL, silent = FALSE, callFrom = NULL, debug = FALSE )
fraL |
(list) result from |
extrCol |
(character) 1st should be aa seq of initial proteins (used for dimensionong graph and separating multiple input proteins), 2nd & 3rd start- and ed-site for drawing;, 4th the column to use for coloring), 5th for protein name in title of figure |
useLog |
(logical) take values for coloring (4th element of 'extrCol') as log10 |
useCol |
(character) custom colors |
specLayout |
(character) custom layout |
useTi |
(character) custom title |
subTit |
(character) custom sub-title |
footer |
(character) custom footer |
batchFig |
(logical) reduce text content for multiple figues on page |
legCex |
(numeric, length=1) expansion factor |
legOffS |
(numeric) legend-offset (passed to |
legBorder |
(logical) legend-border (passed to |
silent |
(logical) suppress messages |
callFrom |
(character) allow easier tracking of message(s) produced |
debug |
(logical) additional messages for debugging |
This function returns a figure
protP <- c(protP="PEPTIDE") obsMassX <- cbind(a=c(199.1077, 296.1605, 397.2082, 510.2922,625.3192), b=c(227.1026, 324.1554, 425.2031, 538.2871, 653.3141), x=c(729.2937, 600.2511, 503.1984, 402.1507, 289.0666), y=c(703.3145, 574.2719, 477.2191, 376.1714, 263.0874)) rownames(obsMassX) <- c("E","P","T","I","D") # all 1 & 7 ions not included identP1 <- identifFixedModif(prot=protP,expMass=as.numeric(obsMassX), minFragSize=2, maxFragSize=7, modTy=list(basMod=c("b","y"))) #protP <- c(protP="PEPTIDE") obsMassX <- cbind(a=c(199.1077, 296.1605, 397.2082, 510.2922,625.3192), b=c(227.1026, 324.1554, 425.2031, 538.2871, 653.3141), x=c(729.2937, 600.2511, 503.1984, 402.1507, 289.0666), y=c(703.3145, 574.2719, 477.2191, 376.1714, 263.0874)) rownames(obsMassX) <- c("E","P","T","I","D") # all 1 & 7 ions not included identP1 <- identifFixedModif(prot=protP,expMass=as.numeric(obsMassX), minFragSize=2, maxFragSize=7, modTy=list(basMod=c("b","y"))) #
Draw simplified (deconvoluted) spectrum of mgf type and highlight matches found
plotMgfLike( lst, basInp = NULL, backgrCol = NULL, replNames = c("by", "i"), lwd = 1, col = NULL, tit = NULL, xLim = NULL, yLab = NULL, linPlot = FALSE, listNa = c("identif", "overview", "obsMass"), useColN = c("obsMass", "logInt", "origNa", "mod", "modSpec", "fraNa", "orig", "nIdentif", "minMassDec", "maxMass"), cex = 1, silent = FALSE, debug = FALSE, callFrom = NULL )plotMgfLike( lst, basInp = NULL, backgrCol = NULL, replNames = c("by", "i"), lwd = 1, col = NULL, tit = NULL, xLim = NULL, yLab = NULL, linPlot = FALSE, listNa = c("identif", "overview", "obsMass"), useColN = c("obsMass", "logInt", "origNa", "mod", "modSpec", "fraNa", "orig", "nIdentif", "minMassDec", "maxMass"), cex = 1, silent = FALSE, debug = FALSE, callFrom = NULL )
lst |
(list) result of identificationi with element |
basInp |
(numeric) alternative/custom entry of observaed masses |
backgrCol |
(character) color of background |
replNames |
(character) |
lwd |
(numeric) line width |
col |
(character) custom colors for different types of ions/modifications |
tit |
(character) custom title |
xLim |
(numeric length=2) custom x axis margins |
yLab |
(character) custom y axis label |
linPlot |
(logical) re-transform y-axis from log2 to linear scale |
listNa |
(character) list-elements of 'lst' to use/extract |
useColN |
(character) columns names tu use from input ie lst$identif & lst$overview |
cex |
(numeric) expansion factor for x- and y-label |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allows easier tracking of messages produced |
This function returns a mgf-like figure
makeFragments, identifVarModif, identifFixedModif, identifyPepFragments
set.seed(2025)set.seed(2025)
This simple function allows plotting the expected number of theoretical fragments from random fragmentation of peptides/proteins (in mass spectrometry). Here, only the pure fragmentation without any variable fragmentation is considered, all fragment-sizes are included (ie, no gating). For simplicity, possible (variable) modifications like loss of neutrals, etc, are not considered.
plotNTheor( x, tit = "Number of term and intern fragm", xlab = "Number of aa", ylab = "", col = 2:3, log = "", mark = NULL, cexMark = 0.75 )plotNTheor( x, tit = "Number of term and intern fragm", xlab = "Number of aa", ylab = "", col = 2:3, log = "", mark = NULL, cexMark = 0.75 )
x |
(integer) length (in amino-acids) of input peptides/proteins to be considered |
tit |
(character) custom title |
xlab |
(character) custom x-axis label |
ylab |
(character) custom y-axis label |
col |
(character or integer) cutsom colors |
log |
(character) define which axis should be log (use "xy" for drawing both x- and y-axis as log-scale) |
mark |
(matrix) first column for text and second column for where it should be stated along the top border of the figure (x-coordinate) |
cexMark |
(numeric) cex expansion-factor for text from argument |
figure only
marks <- data.frame(name=c("Ubiquitin\n76aa", "Glutamate dehydrogenase 1\n501aa"), length=c(76,501)) plotNTheor(x=20:750, log="", mark=marks)marks <- data.frame(name=c("Ubiquitin\n76aa", "Glutamate dehydrogenase 1\n501aa"), length=c(76,501)) plotNTheor(x=20:750, log="", mark=marks)
plot preferential fragmenation pattern
Plot preferential fragmenation pattern equivalent to Fig 1b of Haverland et al 2017 (J Am Soc Mass Spectrom)
plotPrefFragPat( prefPat, namesCex = 0.8, silent = FALSE, debug = FALSE, callFrom = NULL )plotPrefFragPat( prefPat, namesCex = 0.8, silent = FALSE, debug = FALSE, callFrom = NULL )
prefPat |
(matix) |
namesCex |
(numeric) expansion factor cex for display of AA-names |
silent |
(logical) suppress messages |
debug |
(logical) additional messages and objects exportet to current session for debugging |
callFrom |
(character) allow easier tracking of messages produced |
This function returns a figure
plotPrefFragPat(.prefFragPattern())plotPrefFragPat(.prefFragPattern())
Make full random decoy mass vector (mimick pepTab)
randMassByMut( pepTab, randCha, useCol = "mass", negAvoid = TRUE, sepCol = FALSE, inDel = FALSE, nAlter = 1, setSeed = NULL, silent = FALSE, debug = FALSE, callFrom = NULL )randMassByMut( pepTab, randCha, useCol = "mass", negAvoid = TRUE, sepCol = FALSE, inDel = FALSE, nAlter = 1, setSeed = NULL, silent = FALSE, debug = FALSE, callFrom = NULL )
pepTab |
(matrix) typically table of petides, one column should match 'useCol' for numeric mass values |
randCha |
(numeric) vector of possible mass alterations for random drawing |
useCol |
(character) column from 'pepTab' with mass values to make decoys |
negAvoid |
(logical) if TRUE try avoiding 0 or negative random mass in result |
sepCol |
(character) optional column from 'pepTab' |
inDel |
(logical) switch to make random mass by insertion/deletion of 1 AA |
nAlter |
(integer) number of alterations per peptide |
setSeed |
(character) seed for random number generation set.seed() |
silent |
(logical) suppress messages |
debug |
(logical) additional messages and objects exportet to current session for debugging |
callFrom |
(character) allow easier tracking of messages produced |
This function returns a matrix with additional column 'decoyMass', or if 'sepCol'=FALSE as additional lines
pepTab1 <- cbind(no=11:12, seq=c("YVVDTS","YVVDTSK"), origNa="test.P000", ty=c("inter","Cter"),mass=c(681.308997360991,809.403960378691)) randMassByMut(pepTab1, corMutShift()) randMassByMut(pepTab1, corInDelShift(), inDel=TRUE)pepTab1 <- cbind(no=11:12, seq=c("YVVDTS","YVVDTSK"), origNa="test.P000", ty=c("inter","Cter"),mass=c(681.308997360991,809.403960378691)) randMassByMut(pepTab1, corMutShift()) randMassByMut(pepTab1, corInDelShift(), inDel=TRUE)
Make full random decoy mass vector (mimick pepTab)
randMassByStochastic( xChar, nRepeat = 5, negAvoid = TRUE, setSeed = NULL, silent = FALSE, debug = TRUE, callFrom = NULL )randMassByStochastic( xChar, nRepeat = 5, negAvoid = TRUE, setSeed = NULL, silent = FALSE, debug = TRUE, callFrom = NULL )
xChar |
(numeric vector) characterize main data, must conatain elements 'n','minV','maxV' |
nRepeat |
(integer) number of time whole randomization process should be repeated |
negAvoid |
(logical) if TRUE try avoiding 0 or negative random mass in result |
setSeed |
(integer) seed for random number generation set.seed() |
silent |
(logical) suppress messages |
debug |
(logical) additional messages and objects exportet to current session for debugging |
callFrom |
(character) allow easier tracking of messages produced |
This function returns a matrix with additional column 'decoyMass', or if 'sepCol'=FALSE as additional lines
Uniform, randMassByMut, convToNum
rand <- randMassByStochastic(c(n=10, minV=2, maxV=7)) summary(rand)rand <- randMassByStochastic(c(n=10, minV=2, maxV=7)) summary(rand)
Make score based on cumulative search for AA with given potential to catch charge (H+, or optionally any charge). Note : at current cumulative scoring large peptides may get priviliged.
scoreChargeCatch( resTab, pepCol = "seq", scale01 = TRUE, chargeMode = "pos", silent = FALSE, debug = FALSE, callFrom = NULL )scoreChargeCatch( resTab, pepCol = "seq", scale01 = TRUE, chargeMode = "pos", silent = FALSE, debug = FALSE, callFrom = NULL )
resTab |
(matrix or data.frame) matrix or data.frame of results for SINGLE protein (here only the column specified with argument 'pepCol' will be used) |
pepCol |
(character) column name of 'resTab' containing the peptide sequence to be scored |
scale01 |
(logical) linear rescale output to maximum 1.0 |
chargeMode |
(character) this value may be 'pos' (default) for the positively charged amino-acids K,R and H or, if this argument has any other value, than all charged amino-acids (K,R,H, S,T,N,Q, D,E, W and Y) will be considered. |
silent |
(logical) suppress messages |
debug |
(logical) additional messages and objects exportet to current session for debugging |
callFrom |
(character) allow easier tracking of messages produced |
This function returns a numeric vector with score for each peptide of resTab (even if scale01=TRUE minimum may be >0 if all peptides do contain charge-catching AAs)
resTa <- matrix(c(1:4, "PEPTID","PEPTIK","PEPTRK","AGV"), ncol=2, dimnames=list(NULL,c("predInd","seq"))) scoreChargeCatch(resTa)resTa <- matrix(c(1:4, "PEPTID","PEPTIK","PEPTRK","AGV"), ncol=2, dimnames=list(NULL,c("predInd","seq"))) scoreChargeCatch(resTa)
Make scoring for single protein : individual components :sameSite,contiguous,prefFragSite,logPeakHeight + combined (sum of scales 0->1)
scoreFragments( resTab, fragmInp, suplTakeLog = TRUE, j = NULL, useResCol = c("orig", "seq", "precAA", "tailAA", "beg", "end", "ppmToPred", "obsInd", "predInd", "Abundance"), prefFragPat = NULL, contigTermFragWe = 0.5, figDraw = TRUE, silent = FALSE, debug = FALSE, callFrom = NULL )scoreFragments( resTab, fragmInp, suplTakeLog = TRUE, j = NULL, useResCol = c("orig", "seq", "precAA", "tailAA", "beg", "end", "ppmToPred", "obsInd", "predInd", "Abundance"), prefFragPat = NULL, contigTermFragWe = 0.5, figDraw = TRUE, silent = FALSE, debug = FALSE, callFrom = NULL )
resTab |
(matrix or data.frame) matrix or data.frame of results for SINGLE protein (will use columns 'beg','end','orig','obsInd') |
fragmInp |
(matrix) experimental m/z values including suppl col(s) to be considered for score, its intensity column/value will be used for 'logInt' in output |
suplTakeLog |
(logical) if suppl info should be used as log2: if T all supplemental data columns ('fragmInp') will be taken as log2 |
j |
(integer) which column of fragmInp has m/z values, the following column is assumed as peak-intensity |
useResCol |
(character) column-names from resTab to be used |
prefFragPat |
(matrix) for preferential fragmentation rules (see |
contigTermFragWe |
(numeric, length=1) weight to add for terminal fragments at 'sc.complemFra' (since they cannot match other fragments beyond the protein limits) |
figDraw |
(logical) make additional figure |
silent |
(logical) suppress messages |
debug |
(logical) additional messages and objects exportet to current session for debugging |
callFrom |
(character) allow easier tracking of messages produced |
This function returns a list with matrix $scaled (combined and indiv rescaled scores) and $raw (matching lines of 'resTab'; 'index' refers to predictedIndex)
identifyPepFragments, scoreProteinFragments
tab2 <- matrix(c("20","2","13","11","3","10","4", "PT","PE","EP","DE","PEP","IDE","PEPT", rep(c("PEPTIDE","protP"),each=7), c("inter","Nter","Cter")[c(1,2,1,3,2,3,2)], c(3,1,2,6,1,5,1, 4,2,3,7,3,7,4), "E",NA,"P","I",NA,"T",NA, "I","P","T",NA,"T",NA,"I", c(1,6,6,20,7,19,8), c(-0.094312,-0.14707,-0.14707,0.08641,0.0084762,-0.10965,0.057087), rep(2,7)), nrow=7, dimnames=list(NULL,c("predInd","seq","orig","origNa","ty","beg","end", "precAA","tailAA","obsInd","ppmToPred","mass"))) tab2 <- cbind(tab2, seqNa=paste0(tab2[,"origNa"],".",tab2[,"beg"],"-",tab2[,"end"]),Abundance=1) rownames(tab2) <- paste0(tab2[,"origNa"],".", tab2[,"beg"],"", tab2[,"end"]) obsMassX <- cbind(a=c(199.1077,296.1605,397.2082,510.2922,625.3192), b=c(227.1026,324.1554,425.2031,538.2871,653.3141), x=c(729.2937,600.2511,503.1984,402.1507,289.0666), y=c(703.3145,574.2719,477.2191,376.1714,263.0874)) (outF <- scoreFragments(tab2, fragmInp=cbind(as.numeric(obsMassX), Abundance=1)))tab2 <- matrix(c("20","2","13","11","3","10","4", "PT","PE","EP","DE","PEP","IDE","PEPT", rep(c("PEPTIDE","protP"),each=7), c("inter","Nter","Cter")[c(1,2,1,3,2,3,2)], c(3,1,2,6,1,5,1, 4,2,3,7,3,7,4), "E",NA,"P","I",NA,"T",NA, "I","P","T",NA,"T",NA,"I", c(1,6,6,20,7,19,8), c(-0.094312,-0.14707,-0.14707,0.08641,0.0084762,-0.10965,0.057087), rep(2,7)), nrow=7, dimnames=list(NULL,c("predInd","seq","orig","origNa","ty","beg","end", "precAA","tailAA","obsInd","ppmToPred","mass"))) tab2 <- cbind(tab2, seqNa=paste0(tab2[,"origNa"],".",tab2[,"beg"],"-",tab2[,"end"]),Abundance=1) rownames(tab2) <- paste0(tab2[,"origNa"],".", tab2[,"beg"],"", tab2[,"end"]) obsMassX <- cbind(a=c(199.1077,296.1605,397.2082,510.2922,625.3192), b=c(227.1026,324.1554,425.2031,538.2871,653.3141), x=c(729.2937,600.2511,503.1984,402.1507,289.0666), y=c(703.3145,574.2719,477.2191,376.1714,263.0874)) (outF <- scoreFragments(tab2, fragmInp=cbind(as.numeric(obsMassX), Abundance=1)))
Search for preferential fragmentation sites from 'pepTab' among the 3 colums specified via 'useCol' (for full AA sequence, preceeding AA, tailing AA) and return sum of scores (from 3rd column of prefFragPat) for both ends. Note : proteins must be witten as single lettre code.
scorePrefFrag( pepTab, useCol = c("seq", "precAA", "tailAA"), prefFragPat = NULL, silent = FALSE, debug = FALSE, callFrom = NULL )scorePrefFrag( pepTab, useCol = c("seq", "precAA", "tailAA"), prefFragPat = NULL, silent = FALSE, debug = FALSE, callFrom = NULL )
pepTab |
(matrix) peptide-fragments with lines for peptides, cols as sequence/preceedingAA/tailingAA |
useCol |
(character) column names for peptide-sequence,preceeding and tailing AA |
prefFragPat |
(matrix) for preferential fragmentation rules (see |
silent |
(logical) suppress messages |
debug |
(logical) additional messages and objects exportet to current session for debugging |
callFrom |
(character) allow easier tracking of messages produced |
This function returns a matrix with fragment sequence, mass, start- and end-position, heading and tailing AA (or NA if terminal fragment)
pepT <- cbind(precAA=c("A","D","D","A","D"),seq=c("AKA","PKA","AKA","PKD","PKD"), tailAA=c("A","A","D","P","P")) scorePrefFrag(pepT)pepT <- cbind(precAA=c("A","D","D","A","D"),seq=c("AKA","PKA","AKA","PKD","PKD"), tailAA=c("A","A","D","P","P")) scorePrefFrag(pepT)
Make scoring for multiple protein queries: individual components :sameSite,contiguous,prefFragSite,logPeakHeight + combined (sum of scales 0->1)
scoreProteinFragments( resTab, fragmInp = NULL, j = 2, useCol = c("orig", "precAA", "tailAA", "beg", "end", "ppmToPred", "obsInd", "predInd"), prefFragPat = NULL, contigTermFragWe = 0.5, returnCombined = TRUE, figDraw = TRUE, silent = FALSE, callFrom = NULL, debug = FALSE )scoreProteinFragments( resTab, fragmInp = NULL, j = 2, useCol = c("orig", "precAA", "tailAA", "beg", "end", "ppmToPred", "obsInd", "predInd"), prefFragPat = NULL, contigTermFragWe = 0.5, returnCombined = TRUE, figDraw = TRUE, silent = FALSE, callFrom = NULL, debug = FALSE )
resTab |
(matrix or data.frame) identification results (will use columns 'beg','end','orig','obsInd') |
fragmInp |
(numeric vector or matrix) experimental m/z values, may include suppl col(s) to be considered for score (se argument 'j') |
j |
(integer) which column of fragmInp has m/z values, the following column is assumed as peak-intensity |
useCol |
(character) colnames from resTab to be used, 1st posiion must be present as column of 'resTab' and must represent name of original, used for splitting by proteins (eg original input protein sequence), will be passed to |
prefFragPat |
(matrix) for preferential fragmentation rules (see |
contigTermFragWe |
(numeric, length=1) weight to add for terminal fragments (since they cannot match other fragments beyond the protein limits) |
returnCombined |
(logical) |
figDraw |
(logical) make additional figure |
silent |
(logical) suppress messages |
callFrom |
(character) allow easier tracking of message(s) produced |
debug |
(logical) for bug-tracking: more/enhanced messages and intermediate objects written in global name-space |
This function returns a list with matrix $scaled (combined and indiv rescaled scores) and $raw (matching lines of 'resTab')
scoreFragments, identifyPepFragments
tab2 <- matrix(c("20","2","13","11","3","10","4", "PT","PE","EP","DE","PEP","IDE","PEPT", rep(c("PEPTIDE","protP"),each=7), c("inter","Nter","Cter")[c(1,2,1,3,2,3,2)], c(3,1,2,6,1,5,1, 4,2,3,7,3,7,4), "E",NA,"P","I",NA,"T",NA, "I","P","T",NA,"T",NA,"I", c(1,6,6,20,7,19,8), c(-0.094312,-0.14707,-0.14707,0.08641,0.0084762,-0.10965,0.057087), rep(2,7)), nrow=7, dimnames=list(NULL, c("predInd","seq","orig","origNa","ty","beg","end", "precAA","tailAA","obsInd","ppmToPred","mass"))) tab2 <- cbind(tab2, seqNa=paste0(tab2[,"origNa"],".",tab2[,"beg"],"-",tab2[,"end"]),Abundance=1) rownames(tab2) <- paste0(tab2[,"origNa"],".", tab2[,"beg"],"", tab2[,"end"]) obsMassX <- cbind(a=c(199.1077,296.1605,397.2082,510.2922,625.3192), b=c(227.1026,324.1554,425.2031,538.2871,653.3141), x=c(729.2937,600.2511,503.1984,402.1507,289.0666), y=c(703.3145,574.2719,477.2191,376.1714,263.0874)) (outF <- scoreFragments(tab2, fragmInp=cbind(as.numeric(obsMassX), Abundance=1))) (out <- scoreProteinFragments(tab2, fragmInp=cbind(as.numeric(obsMassX), Abundance=1)))tab2 <- matrix(c("20","2","13","11","3","10","4", "PT","PE","EP","DE","PEP","IDE","PEPT", rep(c("PEPTIDE","protP"),each=7), c("inter","Nter","Cter")[c(1,2,1,3,2,3,2)], c(3,1,2,6,1,5,1, 4,2,3,7,3,7,4), "E",NA,"P","I",NA,"T",NA, "I","P","T",NA,"T",NA,"I", c(1,6,6,20,7,19,8), c(-0.094312,-0.14707,-0.14707,0.08641,0.0084762,-0.10965,0.057087), rep(2,7)), nrow=7, dimnames=list(NULL, c("predInd","seq","orig","origNa","ty","beg","end", "precAA","tailAA","obsInd","ppmToPred","mass"))) tab2 <- cbind(tab2, seqNa=paste0(tab2[,"origNa"],".",tab2[,"beg"],"-",tab2[,"end"]),Abundance=1) rownames(tab2) <- paste0(tab2[,"origNa"],".", tab2[,"beg"],"", tab2[,"end"]) obsMassX <- cbind(a=c(199.1077,296.1605,397.2082,510.2922,625.3192), b=c(227.1026,324.1554,425.2031,538.2871,653.3141), x=c(729.2937,600.2511,503.1984,402.1507,289.0666), y=c(703.3145,574.2719,477.2191,376.1714,263.0874)) (outF <- scoreFragments(tab2, fragmInp=cbind(as.numeric(obsMassX), Abundance=1))) (out <- scoreProteinFragments(tab2, fragmInp=cbind(as.numeric(obsMassX), Abundance=1)))