R packages by wraff

wrMisc - Analyze Experimental High-Throughput (Omics) Data

The efficient treatment and convenient analysis of experimental high-throughput (omics) data gets facilitated through this collection of diverse functions. Several functions address advanced object-conversions, like manipulating lists of lists or lists of arrays, reorganizing lists to arrays or into separate vectors, merging of multiple entries, etc. Another set of functions provides speed-optimized calculation of standard deviation (sd), coefficient of variance (CV) or standard error of the mean (SEM) for data in matrixes or means per line with respect to additional grouping (eg n groups of replicates). A group of functions facilitate dealing with non-redundant information, by indexing unique, adding counters to redundant or eliminating lines with respect redundancy in a given reference-column, etc. Help is provided to identify very closely matching numeric values to generate (partial) distance matrixes for very big data in a memory efficient manner or to reduce the complexity of large data-sets by combining very close values. Other functions help aligning a matrix or data.frame to a reference using partial matching or to mine an experimental setup to extract patterns of replicate samples. Many times large experimental datasets need some additional filtering, adequate functions are provided. Convenient data normalization is supported in various different modes, parameter estimation via permutations or boot-strap as well as flexible testing of multiple pair-wise combinations using the framework of 'limma' is provided, too. Batch reading (or writing) of sets of files and combining data to arrays is supported, too.

Last updated 17 hours ago

4.36 score 4 dependents 33 scripts 1.4k downloads

wrProteo - Proteomics Data Analysis Functions

Data analysis of proteomics experiments by mass spectrometry is supported by this collection of functions mostly dedicated to the analysis of (bottom-up) quantitative (XIC) data. Fasta-formatted proteomes (eg from UniProt Consortium <doi:10.1093/nar/gky1049>) can be read with automatic parsing and multiple annotation types (like species origin, abbreviated gene names, etc) extracted. Initial results from multiple software for protein (and peptide) quantitation can be imported (to a common format): MaxQuant (Tyanova et al 2016 <doi:10.1038/nprot.2016.136>), Dia-NN (Demichev et al 2020 <doi:10.1038/s41592-019-0638-x>), Fragpipe (da Veiga et al 2020 <doi:10.1038/s41592-020-0912-y>), ionbot (Degroeve et al 2021 <doi:10.1101/2021.07.02.450686>), MassChroq (Valot et al 2011 <doi:10.1002/pmic.201100120>), OpenMS (Strauss et al 2021 <doi:10.1038/nmeth.3959>), ProteomeDiscoverer (Orsburn 2021 <doi:10.3390/proteomes9010015>), Proline (Bouyssie et al 2020 <doi:10.1093/bioinformatics/btaa118>), AlphaPept (preprint Strauss et al <doi:10.1101/2021.07.23.453379>) and Wombat-P (Bouyssie et al 2023 <doi:10.1021/acs.jproteome.3c00636>. Meta-data provided by initial analysis software and/or in sdrf format can be integrated to the analysis. Quantitative proteomics measurements frequently contain multiple NA values, due to physical absence of given peptides in some samples, limitations in sensitivity or other reasons. Help is provided to inspect the data graphically to investigate the nature of NA-values via their respective replicate measurements and to help/confirm the choice of NA-replacement algorithms. Meta-data in sdrf-format (Perez-Riverol et al 2020 <doi:10.1021/acs.jproteome.0c00376>) or similar tabular formats can be imported and included. Missing values can be inspected and imputed based on the concept of NA-neighbours or other methods. Dedicated filtering and statistical testing using the framework of package 'limma' <doi:10.18129/B9.bioc.limma> can be run, enhanced by multiple rounds of NA-replacements to provide robustness towards rare stochastic events. Multi-species samples, as frequently used in benchmark-tests (eg Navarro et al 2016 <doi:10.1038/nbt.3685>, Ramus et al 2016 <doi:10.1016/j.jprot.2015.11.011>), can be run with special options considering such sub-groups during normalization and testing. Subsequently, ROC curves (Hand and Till 2001 <doi:10.1023/A:1010920819831>) can be constructed to compare multiple analysis approaches. As detailed example the data-set from Ramus et al 2016 <doi:10.1016/j.jprot.2015.11.011>) quantified by MaxQuant, ProteomeDiscoverer, and Proline is provided with a detailed analysis of heterologous spike-in proteins.

Last updated 9 hours ago

3.61 score 1 dependents 17 scripts 1.3k downloads

wrGraph - Graphics in the Context of Analyzing High-Throughput Data

Additional options for making graphics in the context of analyzing high-throughput data are available here. This includes automatic segmenting of the current device (eg window) to accommodate multiple new plots, automatic checking for optimal location of legends in plots, small histograms to insert as legends, histograms re-transforming axis labels to linear when plotting log2-transformed data, a violin-plot <doi:10.1080/00031305.1998.10480559> function for a wide variety of input-formats, principal components analysis (PCA) <doi:10.1080/14786440109462720> with bag-plots <doi:10.1080/00031305.1999.10474494> to highlight and compare the center areas for groups of samples, generic MA-plots (differential- versus average-value plots) <doi:10.1093/nar/30.4.e15>, staggered count plots and generation of mouse-over interactive html pages.

Last updated 9 hours ago

2.30 score 6 scripts 837 downloads

wrTopDownFrag - Internal Fragment Identification from Top-Down Mass Spectrometry

Top-Down mass spectrometry aims to identify entire proteins as well as their (post-translational) modifications or ions bound (eg Chen et al (2018) <doi:10.1021/acs.analchem.7b04747>). The pattern of internal fragments (Haverland et al (2017) <doi:10.1007/s13361-017-1635-x>) may reveal important information about the original structure of the proteins studied (Skinner et al (2018) <doi:10.1038/nchembio.2515> and Li et al (2018) <doi:10.1038/nchem.2908>). However, the number of possible internal fragments gets huge with longer proteins and subsequent identification of internal fragments remains challenging, in particular since the the accuracy of measurements with current mass spectrometers represents a limiting factor. This package attempts to deal with the complexity of internal fragments and allows identification of terminal and internal fragments from deconvoluted mass-spectrometry data.

Last updated 5 years ago

2.00 score 2 scripts 119 downloads