Pipeline Guide

The Pipeline class chains spectral processing steps into a reusable, reproducible workflow. Steps are executed in order, and each step's output feeds into the next.

Basic Usage

from spectrakit import baseline_als, normalize_snv, smooth_savgol
from spectrakit.pipeline import Pipeline

pipe = Pipeline()
pipe.add("smooth", smooth_savgol, window_length=11, polyorder=3)
pipe.add("baseline", baseline_als, lam=1e6, p=0.01)
pipe.add("normalize", normalize_snv)

# Run every step in order; raw_spectra is an array of intensities loaded elsewhere
processed = pipe.transform(raw_spectra)

Each add() call takes:

name — A descriptive label for logging and display
fn — Any function with signature fn(intensities, **kwargs) -> np.ndarray
**kwargs — Arguments forwarded to the function
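Conceptually, running the pipeline amounts to calling each stored function on the previous step's output with its saved keyword arguments. The following is a minimal sketch of that idea, not SpectraKit's implementation: `apply_steps`, `scale`, and `offset` are hypothetical stand-ins that only follow the `fn(intensities, **kwargs) -> np.ndarray` convention described above.

```python
import numpy as np

# Hypothetical stand-in steps; they are not SpectraKit functions.
def scale(intensities, factor=1.0):
    return intensities * factor

def offset(intensities, shift=0.0):
    return intensities + shift

def apply_steps(intensities, steps):
    """Apply (name, fn, kwargs) steps in order, feeding each output to the next."""
    result = np.asarray(intensities, dtype=float)
    for _name, fn, kwargs in steps:
        result = fn(result, **kwargs)
    return result

steps = [
    ("scale", scale, {"factor": 2.0}),
    ("offset", offset, {"shift": -1.0}),
]
data = np.array([1.0, 2.0, 3.0])

out = apply_steps(data, steps)
# Equivalent to nesting the calls by hand, innermost first.
manual = offset(scale(data, factor=2.0), shift=-1.0)
```

Because each step only sees the previous output, reordering the steps generally changes the result.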

Method Chaining

add() returns self, so you can chain calls:

pipe = Pipeline()
pipe.add("smooth", smooth_savgol, window_length=11).add(
    "baseline", baseline_als, lam=1e6
).add("normalize", normalize_snv)
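Chaining works because each call hands back the same object it was called on. The toy class below sketches that pattern under stated assumptions: `ChainableSteps` is a hypothetical illustration, not the real Pipeline class.

```python
class ChainableSteps:
    """Toy illustration of a fluent add(); not SpectraKit's Pipeline."""

    def __init__(self):
        self.steps = []

    def add(self, name, fn, **kwargs):
        self.steps.append((name, fn, kwargs))
        # Returning the instance itself is what makes .add(...).add(...) possible.
        return self

builder = ChainableSteps()
same = builder.add("double", lambda x: x * 2).add("negate", lambda x: -x)
names = [name for name, _, _ in builder.steps]
```

Both calls mutate and return the one `builder` object, so the chained form and the one-call-per-line form build the same list of steps.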

Working with Spectrum Objects

Use transform_spectrum() to process a Spectrum container directly. It returns a new Spectrum with processed intensities and updated metadata:

from spectrakit.spectrum import Spectrum
from spectrakit.io import read_jcamp

spectrum = read_jcamp("sample.jdx")
processed = pipe.transform_spectrum(spectrum)

# Metadata records which pipeline steps were applied
print(processed.metadata["pipeline_steps"])
# ['smooth', 'baseline', 'normalize']
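The "new object plus updated metadata" behavior can be sketched with a small dataclass. Everything here is an assumption made for illustration: `ToySpectrum` and `transform_toy_spectrum` are hypothetical, and the real Spectrum class may have different fields.

```python
from dataclasses import dataclass, field, replace

import numpy as np

@dataclass
class ToySpectrum:
    """Hypothetical container; the real Spectrum class may differ."""
    intensities: np.ndarray
    metadata: dict = field(default_factory=dict)

def clip_negative(intensities):
    return np.clip(intensities, 0, None)

def transform_toy_spectrum(spectrum, steps):
    """Return a new ToySpectrum; the input object is left untouched."""
    result = spectrum.intensities
    for _name, fn, kwargs in steps:
        result = fn(result, **kwargs)
    # Copy the metadata so the original dict is not modified.
    new_metadata = {
        **spectrum.metadata,
        "pipeline_steps": [name for name, _, _ in steps],
    }
    return replace(spectrum, intensities=result, metadata=new_metadata)

original = ToySpectrum(np.array([-1.0, 0.5, 2.0]), {"source": "sample.jdx"})
processed = transform_toy_spectrum(original, [("clip", clip_negative, {})])
```

Returning a fresh object keeps the raw data available for comparison or for running a different pipeline.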

Custom Functions

Any callable matching the expected signature works as a pipeline step:

import numpy as np

def clip_negative(intensities: np.ndarray) -> np.ndarray:
    """Replace negative values with zero."""
    return np.clip(intensities, 0, None)

pipe = Pipeline()
pipe.add("clip", clip_negative)
pipe.add("normalize", normalize_snv)
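Since add() forwards keyword arguments to the step function, a custom step can also expose parameters. The sketch below uses a hypothetical `clip_range` function and calls it directly, so it runs without SpectraKit installed; the commented `pipe.add(...)` line shows how it would presumably be registered.

```python
from typing import Optional

import numpy as np

def clip_range(
    intensities: np.ndarray,
    lower: float = 0.0,
    upper: Optional[float] = None,
) -> np.ndarray:
    """Limit values to [lower, upper]; upper=None leaves the top unbounded."""
    return np.clip(intensities, lower, upper)

spectrum = np.array([-0.2, 0.4, 1.7])
clipped = clip_range(spectrum, lower=0.0, upper=1.0)

# In a pipeline, the same keyword arguments would be passed through add():
# pipe.add("clip", clip_range, lower=0.0, upper=1.0)
```

Giving every parameter a default keeps the function usable both with and without extra arguments in add().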

Logging

Pipeline logs each step at the DEBUG level. Set the logging level to DEBUG to see each step as it runs:

import logging
logging.basicConfig(level=logging.DEBUG)

pipe.transform(raw_spectra)
# DEBUG:spectrakit.pipeline:Pipeline step: smooth
# DEBUG:spectrakit.pipeline:Pipeline step: baseline
# DEBUG:spectrakit.pipeline:Pipeline step: normalize
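Calling `logging.basicConfig(level=logging.DEBUG)` turns on debug output for every library in the process. A narrower alternative is to raise the level only on the `spectrakit.pipeline` logger, whose name appears in the output above. This is a sketch using the standard library only; `enable_pipeline_debug` is a hypothetical helper, not part of SpectraKit.

```python
import logging

def enable_pipeline_debug(logger_name: str = "spectrakit.pipeline") -> logging.Logger:
    """Show DEBUG messages from one logger without changing the root logger."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
        logger.addHandler(handler)
    # Do not pass records on to root handlers, which would print them twice.
    logger.propagate = False
    return logger

pipeline_logger = enable_pipeline_debug()
```

Other libraries keep their default verbosity, which helps when the pipeline runs inside a larger application.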

Inspecting the Pipeline

print(pipe)
# Pipeline(steps=['smooth', 'baseline', 'normalize'])

# Each entry in pipe.steps is a (name, fn, kwargs) tuple
for name, fn, kwargs in pipe.steps:
    print(f"{name}: {fn.__name__}({kwargs})")

Comparison with scikit-learn Pipeline

SpectraKit's Pipeline is lightweight and designed for spectral preprocessing. For ML integration, use the sklearn bridge instead. The two compare as follows:

| Feature | `spectrakit.Pipeline` | `sklearn.Pipeline` |
|---|---|---|
| Dependencies | None (built-in) | Requires scikit-learn |
| Interface | `add()` / `transform()` | `fit()` / `transform()` |
| Use case | Spectral preprocessing | ML model pipelines |
| Stateful | No (pure functions) | Yes (fit stores state) |

Both can be used together: preprocess with SpectraKit's Pipeline, then feed the results into an sklearn Pipeline for modeling.
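The stateless versus stateful distinction can be made concrete with a small sketch. It assumes a textbook standard normal variate (SNV) using the population standard deviation, which may differ from what `normalize_snv` does internally. `ToyMeanCenterer` is a hypothetical stand-in for a fitted sklearn-style transformer.

```python
import numpy as np

def snv(intensities):
    """Stateless: each spectrum is centered and scaled by its own statistics."""
    x = np.asarray(intensities, dtype=float)
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / std

class ToyMeanCenterer:
    """Stateful: fit() learns per-channel means from a training set."""

    def fit(self, X):
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) - self.mean_

train = np.array([[1.0, 2.0, 3.0], [3.0, 4.0, 8.0]])
new = np.array([[2.0, 4.0, 6.0]])

stateless = snv(new)                                # depends only on `new`
stateful = ToyMeanCenterer().fit(train).transform(new)  # depends on `train` too
```

A stateless step can run on any spectrum at any time, while a stateful one must be fitted on training data first and then reused unchanged on new data.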

Next Steps

Processing Workflow — recommended step order
scikit-learn Integration — ML pipeline bridge
Pipeline API Reference — full documentation