Title: | parser for netCDF, mzXML and mzML and mzIdentML files (mass spectrometry data) |
---|---|
Description: | mzR provides a unified API to the common file formats and parsers available for mass spectrometry data. It comes with a subset of the proteowizard library for mzXML, mzML and mzIdentML. The netCDF reading code has previously been used in XCMS. |
Authors: | Bernd Fischer, Steffen Neumann, Laurent Gatto, Qiang Kou, Johannes Rainer |
Maintainer: | Steffen Neumann <[email protected]> |
License: | Artistic-2.0 |
Version: | 2.41.1 |
Built: | 2024-11-20 03:44:14 UTC |
Source: | https://github.com/bioc/mzR |
Copy general information from the originating MS file and write this,
along with the provided spectra data, to a new file. The expected
workflow is the following: data is first loaded from an MS file,
e.g. using peaks
and header
methods,
processed in R and then saved again to an MS file providing the
(eventually) manipulated spectra and header data with arguments
header
and data
.
copyWriteMSData(object, file, original_file, header, backend = "pwiz", outformat = "mzml", rtime_seconds = TRUE, software_processing)
copyWriteMSData(object, file, original_file, header, backend = "pwiz", outformat = "mzml", rtime_seconds = TRUE, software_processing)
object |
|
file |
|
original_file |
|
header |
|
backend |
|
outformat |
|
rtime_seconds |
|
software_processing |
|
copyWriteMSData
supports at present copying data from
mzXML
and mzML
and exporting to mzML
. Copying and
exporting to mzXML
can fail for some input files.
The intention of this function is to copy data from an existing file
and save it along with eventually modified data to a new file. To
write new MS data files use the writeMSData
function
instead.
Johannes Rainer
writeMSData
for a function to save MS data to a new mzML
or mzXML file.
## Open a MS file and read the spectrum and header information library(msdata) fl <- system.file("threonine", "threonine_i2_e35_pH_tree.mzXML", package = "msdata") ms_fl <- openMSfile(fl, backend = "pwiz") ## Get the spectra pks <- spectra(ms_fl) ## Get the header hdr <- header(ms_fl) ## Modify the spectrum data adding 100 to each intensity. pks <- lapply(pks, function(z) { z[, 2] <- z[, 2] + 100 z }) ## Copy metadata and additional information from the originating file ## and save it, along with the modified data, to a new mzML file. out_file <- tempfile() copyWriteMSData(pks, file = out_file, original_file = fl, header = hdr)
## Open a MS file and read the spectrum and header information library(msdata) fl <- system.file("threonine", "threonine_i2_e35_pH_tree.mzXML", package = "msdata") ms_fl <- openMSfile(fl, backend = "pwiz") ## Get the spectra pks <- spectra(ms_fl) ## Get the header hdr <- header(ms_fl) ## Modify the spectrum data adding 100 to each intensity. pks <- lapply(pks, function(z) { z[, 2] <- z[, 2] + 100 z }) ## Copy metadata and additional information from the originating file ## and save it, along with the modified data, to a new mzML file. out_file <- tempfile() copyWriteMSData(pks, file = out_file, original_file = fl, header = hdr)
The methods return matrices of lower (column low
) and upper
(column high
) isolation window offsets. Matrices are returned
as a list of length equal to the number of input files (provided as
file names of raw mass spectrometry data objects, see below). By
default (i.e when unique. = TRUE
), only unique offsets are
returned, as they are expected to identical for all spectra per
acquisition. If this is not the case, a message is displayed.
signature(object = "character", unique. = "logical",
simplify = "logical")
Returns the isolation window for the
file object
. By default, only unique isolation windows
are returned per file (unique = TRUE
); if set to
FALSE
, a matrix
with as many rows as there are MS2
spectra. If only one file passed an input and simplify
is
set to TRUE
(default), the resulting list
of
length 1 is simplified to a matrix
.
signature(object = "mzRpwiz", unique. = "logical",
simplify = "logical")
As above for mzRpwiz
objects.
Laurent Gatto <[email protected]> based on the functionality from the
msPurity:::get_isolation_offsets
function.
library("msdata") f <- msdata::proteomics(full.names = TRUE, pattern = "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML.gz") isolationWindow(f) rw <- openMSfile(f) isolationWindow(rw) str(isolationWindow(rw, unique = FALSE))
library("msdata") f <- msdata::proteomics(full.names = TRUE, pattern = "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML.gz") isolationWindow(f) rw <- openMSfile(f) isolationWindow(rw) str(isolationWindow(rw, unique = FALSE))
mzR
object.
Accessors to the analytical setup metadata of a run.
runInfo
will show a summary of the experiment as a named list,
including scanCount
, lowMZ
, highMZ
,
dStartTime
, dEndTime
and startTimeStamp
. Note
that startTimeStamp
can only be extracted from mzML
files using the pwiz backend or from CDF files. A
NA
is reported if its value is not available.
The instrumentInfo
method returns a named list
including
instrument manufacturer, model, ionisation technique, analyzer and
detector. mzRpwiz
will give more additional information including
information on sample, software using and original source file.
These individual pieces of information can also be directly
accessed by the specific methods.
mzidInfo
is used for the mzR
object created from a mzid file.
It returns basic information on this mzid file including file provider,
creation date, software, database, enzymes and spectra data format.
The mzidInfo
will return the scoring results in identification.
It will return different results for different searching software used.
runInfo(object) chromatogramsInfo(object) analyzer(object) detector(object) instrumentInfo(object) ionisation(object) softwareInfo(object) sampleInfo(object) sourceInfo(object) model(object) mzidInfo(object) modifications(object, ...) psms(object, ...) substitutions(object) database(object, ...) enzymes(object) tolerance(object, ...) score(x, ...) para(object) specParams(object)
runInfo(object) chromatogramsInfo(object) analyzer(object) detector(object) instrumentInfo(object) ionisation(object) softwareInfo(object) sampleInfo(object) sourceInfo(object) model(object) mzidInfo(object) modifications(object, ...) psms(object, ...) substitutions(object) database(object, ...) enzymes(object) tolerance(object, ...) score(x, ...) para(object) specParams(object)
object |
An instantiated |
x |
An instantiated |
... |
Additional arguments, currently ignored. |
Steffen Neumann, Laurent Gatto and Qiang Kou
See for example peaks
to access the data for the spectra
in a "mzR"
class.
library(msdata) file <- system.file("microtofq/MM8.mzML", package = "msdata") mz <- openMSfile(file) fileName(mz) instrumentInfo(mz) runInfo(mz) close(mz) file <- system.file("cdf/ko15.CDF", package = "msdata") mz <- openMSfile(file, backend = "netCDF") fileName(mz) instrumentInfo(mz) runInfo(mz) close(mz) file <- system.file("mzid", "Tandem.mzid.gz", package="msdata") mzid <- openIDfile(file) softwareInfo(mzid) enzymes(mzid)
library(msdata) file <- system.file("microtofq/MM8.mzML", package = "msdata") mz <- openMSfile(file) fileName(mz) instrumentInfo(mz) runInfo(mz) close(mz) file <- system.file("cdf/ko15.CDF", package = "msdata") mz <- openMSfile(file, backend = "netCDF") fileName(mz) instrumentInfo(mz) runInfo(mz) close(mz) file <- system.file("mzid", "Tandem.mzid.gz", package="msdata") mzid <- openIDfile(file) softwareInfo(mzid) enzymes(mzid)
mzR
and sub-classesThe class mzR
is the main class for the common mass spectrometry
formats. It is a virtual class and thus not supposed to be instanciated
directly.
The sub-classes implement specific APIs to access the underlying data
and metadata in the files. Currently mzRpwiz
is available.
mzRpwiz
uses Proteowizard to access the relevant information in
mzXML
and mzML
files. mzRident
is used as an
interface to mzIdentML
files.
IMPORTANT: New developers that need to access and manipulate raw mass
spectrometry data are advised against using this infrastucture
directly. They are invited to use the corresponding MSnExp
(with on disk mode) from the MSnbase package instead. The
latter supports reading multiple files at once and offers access to
the spectra data (m/z and intensity) as well as all the spectra
metadata using a coherent interface. The MSnbase infrastructure itself
used the low level classes in mzR, thus offering fast and efficient
access.
mzR
is a virtual class, so instances cannot be created.
Objects can be created by calls of the form new("mzRpwiz",
...)
, but more often they will be created with
openMSfile
.
After creating an mzR
object, one can write it into a new file.
mzXML, mzML, mgf formats are supported.
fileName
:Object of class character
storing the
original filename used when the instance was created.
backend
: One of the implemented backens or
NULL
.
.__classVersion__
:Object of class "Versioned"
,
from Biobase.
Class "Versioned"
, directly.
For methods to access raw data (spectra and chromatograms), see
peaks
.
Methods currently implemented for mzR
signature(object = "mzR")
: ...
Methods currently implemented for mzRpwiz
signature(object = "mzRpwiz")
: ...
signature(object = "mzRpwiz")
: ...
signature(object = "mzRpwiz")
: ...
signature(object = "mzRpwiz")
: ...
signature(x = "mzRpwiz")
: ...
signature(object = "mzRpwiz")
: ...
signature(object = "mzRpwiz")
: ...
signature(object = "mzRpwiz")
: ...
signature(object = "mzRpwiz")
: ...
Methods currently implemented for mzRident
signature(object = "mzRident")
: ...
signature(object = "mzRident")
: ...
signature(object = "mzRident")
: ...
signature(object = "mzRident")
: ...
signature(x = "mzRident")
: ...
signature(object = "mzRident")
: ...
signature(object = "mzRident")
: ...
signature(object = "mzRident")
: ...
signature(object = "mzRident")
: ...
signature(object = "mzRident")
: ...
signature(object = "mzRident")
: ...
Steffen Neumann, Laurent Gatto, Qiang Kou
Proteowizard: http://proteowizard.sourceforge.net/
library(msdata) filepath <- system.file("microtofq", package = "msdata") file <- list.files(filepath, pattern="MM14.mzML", full.names=TRUE, recursive = TRUE) mzml <- openMSfile(file) close(mzml) ## using the pwiz backend mzml <- openMSfile(file, backend = "pwiz")
library(msdata) filepath <- system.file("microtofq", package = "msdata") file <- list.files(filepath, pattern="MM14.mzML", full.names=TRUE, recursive = TRUE) mzml <- openMSfile(file) close(mzml) ## using the pwiz backend mzml <- openMSfile(file, backend = "pwiz")
The openMSfile
constructor will create a new format-specifc
mzR
object by loading the 'filename' file. All header information is
loaded as a Rcpp module and made accessible through the pwiz
slot of the resulting object.
The openIDfile
constructor will create a new format-specifc
mzR
object by loading the 'filename' file. All information is
loaded as a Rcpp module. The mzid format is supported throught
pwiz
backend. Only mzIdentML 1.1 is supported.
openMSfile(filename, backend = NULL, verbose = FALSE) isInitialized(object) fileName(object, ...) openIDfile(filename, verbose = FALSE)
openMSfile(filename, backend = NULL, verbose = FALSE) isInitialized(object) fileName(object, ...) openIDfile(filename, verbose = FALSE)
filename |
Path name of the netCDF, mzXML or mzML file to read/write. |
backend |
A |
object |
An instantiated mzR object. |
verbose |
Enable verbose output. |
... |
Additional arguments, currently ignored. |
Steffen Neumann, Laurent Gatto, Qiang Kou
library(msdata) filepath <- system.file("microtofq", package = "msdata") file <- list.files(filepath, pattern="MM14.mzML", full.names=TRUE, recursive = TRUE) mz <- openMSfile(file) fileName(mz) runInfo(mz) close(mz) ## Not run: ## to use another backend mz <- openMSfile(file, backend = "pwiz") mz ## End(Not run) file <- system.file("mzid", "Tandem.mzid.gz", package="msdata") mzid <- openIDfile(file) softwareInfo(mzid) enzymes(mzid)
library(msdata) filepath <- system.file("microtofq", package = "msdata") file <- list.files(filepath, pattern="MM14.mzML", full.names=TRUE, recursive = TRUE) mz <- openMSfile(file) fileName(mz) runInfo(mz) close(mz) ## Not run: ## to use another backend mz <- openMSfile(file, backend = "pwiz") mz ## End(Not run) file <- system.file("mzid", "Tandem.mzid.gz", package="msdata") mzid <- openIDfile(file) softwareInfo(mzid) enzymes(mzid)
mzR
object.
Access the MS raw data. The peaks
, spectra
(can be used
interchangeably) and peaksCount
functions return the (m/z,
intensity) pairs and the number peaks in the
spectrum/spectra. peaks
and spectra
return a single
matrix if scans
is a numeric
of length 1 and a list of
matrices if several scans are asked for or no scans
argument is
provided (i.e all spectra in the oject are retured). peaksCount
will return a numeric of length n
.
The header
function returns a data.frame
containing
seqNum
, acquisitionNum
, msLevel
,
peaksCount
, totIonCurrent
, retentionTime
(in
seconds), basePeakMZ
, basePeakIntensity
,
collisionEnergy
, ionisationEnergy
, lowM
,
highMZ
, precursorScanNum
, precursorMZ
,
precursorCharge
, precursorIntensity
,
mergedScan
, mergedResultScanNum
,
mergedResultStartScanNum
, mergedResultEndScanNum
,
filterString
, spectrumId
, centroided
(logical
whether the data of the spectrum is in centroid mode
or profile mode; only for pwiz backend), injectionTime
(ion
injection time, in milliseconds), ionMobilityDriftTime
(in
milliseconds), isolationWindowTargetMZ
,
isolationWindowLowerOffset
,
isolationWindowUpperOffset
, scanWindowLowerLimit
and
scanWindowUpperLimit
. If multiple scans are queried, a
data.frame
is returned with the scans reported along the
rows. For missing or not defined spectrum variables NA
is reported.
The get3Dmap
function performs a simple resampling between
lowMz
and highMz
with reMz
resolution. A matrix
of dimensions length(scans)
times
seq(lowMz,highMz,resMz)
is returned.
The chromatogram
(chromatograms
) accessors return
chromatograms for the MS file handle. If a single index is provided,
as data.frame
containing the retention time (1st columns) and
intensities (2nd column) is returned. The name of the former is always
time
, while the latter will depend on the run parameters.
If more than 1 or no chromatogram indices are provided, then a list of
chromatograms is returned; either those passed as argument or all of
them. By default, the first (and possibly only) chromatogram is the
total ion count, which can also be accessed with the tic
method.
The nChrom
function returns the number of chromatograms,
including the total ion chromatogram.
The chromatogramHeader
returns (similar to the header
function for spectra) a data.frame
with metadata information
for the individual chromatograms. The data.frame
has the
columns: "chromatogramId"
(the ID of the chromatogram as
specified in the file), "chromatogramIndex"
(the index of the
chromatogram within the file), "polarity"
(the polarity for the
chromatogram, 0 for negative, +1 for positive and -1 for not set),
"precursorIsolationWindowTargetMZ"
(the isolation window m/z of
the precursor), "precursorIsolationWindowLowerOffset"
,
"precursorIsolationWindowUpperOffset"
(lower and upper offset
for the isolation window m/z), "precursorCollisionEnergy"
(collision energy),
"productIsolationWindowTargetMZ"
,
"productIsolationWindowLowerOffset"
and
"productIsolationWindowUpperOffset"
(definition of the m/z
isolation window for the product).
Note that access to chromatograms is only supported in the pwiz
backend.
header(object, scans, ...) peaksCount(object, scans, ...) ## S4 method for signature 'mzRpwiz' peaks(object, scans) ## S4 method for signature 'mzRnetCDF' peaks(object, scans) ## S4 method for signature 'mzRpwiz' spectra(object, scans) ## same as peaks ## S4 method for signature 'mzRnetCDF' spectra(object, scans) get3Dmap(object, scans, lowMz, highMz, resMz, ...) ## S4 method for signature 'mzRpwiz' chromatogram(object, chrom) ## S4 method for signature 'mzRpwiz' chromatograms(object, chrom) ## same as chromatogram ## S4 method for signature 'mzRpwiz' chromatogramHeader(object, chrom) tic(object, ...) nChrom(object)
header(object, scans, ...) peaksCount(object, scans, ...) ## S4 method for signature 'mzRpwiz' peaks(object, scans) ## S4 method for signature 'mzRnetCDF' peaks(object, scans) ## S4 method for signature 'mzRpwiz' spectra(object, scans) ## same as peaks ## S4 method for signature 'mzRnetCDF' spectra(object, scans) get3Dmap(object, scans, lowMz, highMz, resMz, ...) ## S4 method for signature 'mzRpwiz' chromatogram(object, chrom) ## S4 method for signature 'mzRpwiz' chromatograms(object, chrom) ## same as chromatogram ## S4 method for signature 'mzRpwiz' chromatogramHeader(object, chrom) tic(object, ...) nChrom(object)
object |
An instantiated |
scans |
A |
lowMz , highMz
|
|
resMz |
a |
chrom |
For |
... |
Other arguments. A |
The column acquisitionNum
in the data.frame
returned by
the header
method contains the index during the scan in which
the signal from the spectrum was measured. The pwiz
backend
extracts this number from the spectrum's ID provided in the mzML
file. In contrast, column seqNum
contains the index of each
spectrum within the file and is thus consecutively numbered. Spectra
from files with multiple MS levels are linked to each other via
their acquisitionNum
: column precursorScanNum
of an e.g. MS
level 2 spectrum contains the acquisitionNum
of the related MS
level 1 spectrum.
Spectrum identifiers are only specified in mzML files, thus,
for all other file types the column "spectrumId"
of the result
data.frame
returned by header
contains "scan="
followed by the acquisition number of the spectrum. Also, only the
pwiz
backend supports extraction of the spectras' IDs from
mzML files. Thus, only mzML files read with
backend = "pwiz"
provide the spectrum IDs defined in the file.
The content of the spectrum identifier depends on the vendor and the
instrument acquisition settings and is reported here as a character,
in its raw form, without further parsing.
Steffen Neumann and Laurent Gatto
instrumentInfo
for metadata access and the
"mzR"
class.
writeMSData
and copyWriteMSData
for
functions to write MS data in mzML or mzXML
format.
library(msdata) filepath <- system.file("microtofq", package = "msdata") file <- list.files(filepath, pattern="MM14.mzML", full.names=TRUE, recursive = TRUE) mz <- openMSfile(file) runInfo(mz) colnames(header(mz)) close(mz) ## A shotgun LCMSMS experiment f <- proteomics(full.names = TRUE, pattern = "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML.gz") x <- openMSfile(f, backend = "pwiz") x nChrom(x) head(tic(x)) head(chromatogram(x, 1L)) ## same as tic(x) str(chromatogram(x)) ## as a list p <- peaks(x) ## extract all peak information head(peaks(x, scan=4)) ## extract just peaks from the 4th scan ## An MRM experiment f <- proteomics(full.names = TRUE, pattern = "MRM") x <- openMSfile(f, backend = "pwiz") x nChrom(x) head(tic(x)) head(chromatogram(x, 1L)) ## same as tic(x) str(chromatogram(x, 10:12)) ## get the header information for the chromatograms ch <- chromatogramHeader(x) head(ch)
library(msdata) filepath <- system.file("microtofq", package = "msdata") file <- list.files(filepath, pattern="MM14.mzML", full.names=TRUE, recursive = TRUE) mz <- openMSfile(file) runInfo(mz) colnames(header(mz)) close(mz) ## A shotgun LCMSMS experiment f <- proteomics(full.names = TRUE, pattern = "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML.gz") x <- openMSfile(f, backend = "pwiz") x nChrom(x) head(tic(x)) head(chromatogram(x, 1L)) ## same as tic(x) str(chromatogram(x)) ## as a list p <- peaks(x) ## extract all peak information head(peaks(x, scan=4)) ## extract just peaks from the 4th scan ## An MRM experiment f <- proteomics(full.names = TRUE, pattern = "MRM") x <- openMSfile(f, backend = "pwiz") x nChrom(x) head(tic(x)) head(chromatogram(x, 1L)) ## same as tic(x) str(chromatogram(x, 10:12)) ## get the header information for the chromatograms ch <- chromatogramHeader(x) head(ch)
Get the version number of pwiz backend.
pwiz.version()
pwiz.version()
writeMSData
exports the MS spectrum data provided with
parameters header
and data
to an MS file in mzML or
mzXML format.
## S4 method for signature 'list,character' writeMSData(object, file, header, backend = "pwiz", outformat = "mzml", rtime_seconds = TRUE, software_processing)
## S4 method for signature 'list,character' writeMSData(object, file, header, backend = "pwiz", outformat = "mzml", rtime_seconds = TRUE, software_processing)
object |
|
file |
|
header |
|
backend |
|
outformat |
|
rtime_seconds |
|
software_processing |
|
Johannes Rainer
copyWriteMSData
for a function to copy general
information from a MS data file and writing eventually modified MS
data from that originating file.
## Open a MS file and read the spectrum and header information library(msdata) fl <- system.file("threonine", "threonine_i2_e35_pH_tree.mzXML", package = "msdata") ms_fl <- openMSfile(fl, backend = "pwiz") ## Get the spectra pks <- spectra(ms_fl) ## Get the header hdr <- header(ms_fl) ## Modify the spectrum data adding 100 to each intensity. pks <- lapply(pks, function(z) { z[, 2] <- z[, 2] + 100 z }) ## Write the data to a mzML file. out_file <- tempfile() writeMSData(object = pks, file = out_file, header = hdr)
## Open a MS file and read the spectrum and header information library(msdata) fl <- system.file("threonine", "threonine_i2_e35_pH_tree.mzXML", package = "msdata") ms_fl <- openMSfile(fl, backend = "pwiz") ## Get the spectra pks <- spectra(ms_fl) ## Get the header hdr <- header(ms_fl) ## Modify the spectrum data adding 100 to each intensity. pks <- lapply(pks, function(z) { z[, 2] <- z[, 2] + 100 z }) ## Write the data to a mzML file. out_file <- tempfile() writeMSData(object = pks, file = out_file, header = hdr)