meganorm.utils package

Submodules

meganorm.utils.EEGlab module

class meganorm.utils.EEGlab.RawEEGLAB(input_fname, eog=(), preload=False, *, uint16_codec=None, montage_units='auto', verbose=None)[source]

Bases: BaseRaw

Raw object from EEGLAB .set file.

Parameters:

input_fname (path-like) – Path to the .set file. If the data is stored in a separate .fdt file, it is expected to be in the same folder as the .set file.
eog (list | tuple | 'auto') – Names or indices of channels that should be designated EOG channels. If ‘auto’, the channel names containing EOG or EYE are used. Defaults to empty tuple.
preload (bool or str (default False)) – Preload data into memory for data manipulation and faster indexing. If True, the data will be preloaded into memory (fast, requires large amount of memory). If preload is a string, preload is the file name of a memory-mapped file which is used to store the data on the hard drive (slower, requires less memory). Note that preload=False will be effective only if the data is stored in a separate binary file.
uint16_codec (str | None) – If your .set file contains non-ASCII characters, reading it may sometimes fail with an error message stating that “buffer is too small”. uint16_codec lets you specify which codec (for example ‘latin1’ or ‘utf-8’) should be used when reading character arrays, which can help solve this problem.
montage_units (str) – Units that channel positions are represented in. Defaults to “mm” (millimeters), but can be any prefix + “m” combination (including just “m” for meters). Added in version 1.3.
verbose (bool | str | int | None) – Control verbosity of the logging output. If None, use the default verbosity level. See the logging documentation and mne.verbose() for details. Should only be passed as a keyword argument.

meganorm.utils.IO module

class meganorm.utils.IO.BandRatio(*, numerator, denominator)[source]

Bases: BaseModel

Attributes:

model_extra: Get extra fields set during validation.
model_fields_set: Returns the set of fields that have been explicitly set on this model instance.

Parameters:

numerator (Literal['Delta', 'Theta', 'Alpha', 'Beta', 'Gamma'])
denominator (Literal['Delta', 'Theta', 'Alpha', 'Beta', 'Gamma'])

Methods

`copy`(*[, include, exclude, update, deep])	Returns a copy of the model.
`model_construct`([_fields_set])	Creates a new instance of the Model class with validated data.
`model_copy`(*[, update, deep])	Returns a copy of the model.
`model_dump`(*[, mode, include, exclude, ...])	Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
`model_dump_json`(*[, indent, ensure_ascii, ...])	Generates a JSON representation of the model using Pydantic's to_json method.
`model_json_schema`([by_alias, ref_template, ...])	Generates a JSON schema for a model class.
`model_parametrized_name`(params)	Compute the class name for parametrizations of generic classes.
`model_post_init`(context, /)	Override this method to perform additional initialization after __init__ and model_construct.
`model_rebuild`(*[, force, raise_errors, ...])	Try to rebuild the pydantic-core schema for the model.
`model_validate`(obj, *[, strict, extra, ...])	Validate a pydantic model instance.
`model_validate_json`(json_data, *[, strict, ...])	Validate the given JSON data against the Pydantic model.
`model_validate_strings`(obj, *[, strict, ...])	Validate the given object with string data against the Pydantic model.

construct
dict
from_orm
json
parse_file
parse_obj
parse_raw
schema
schema_json
update_forward_refs
validate

denominator: Literal['Delta', 'Theta', 'Alpha', 'Beta', 'Gamma']

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

numerator: Literal['Delta', 'Theta', 'Alpha', 'Beta', 'Gamma']
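
As an illustration of how a numerator/denominator pair might be used, the following is a minimal sketch, not the package's implementation. `BandRatioSketch` is a plain-dataclass stand-in for the pydantic model, and `band_power_ratio` and the `powers` values are hypothetical names and data introduced only for this example.

```python
from dataclasses import dataclass

# Band names accepted by the numerator/denominator fields documented above.
BANDS = ("Delta", "Theta", "Alpha", "Beta", "Gamma")


@dataclass(frozen=True)
class BandRatioSketch:
    """Plain-dataclass stand-in for BandRatio (the real class is a pydantic model)."""

    numerator: str
    denominator: str

    def __post_init__(self):
        # Mimic the Literal[...] restriction by rejecting unknown band names.
        for name in (self.numerator, self.denominator):
            if name not in BANDS:
                raise ValueError(f"unknown band: {name!r}")


def band_power_ratio(ratio, band_powers):
    """Hypothetical helper: numerator band power divided by denominator band power."""
    return band_powers[ratio.numerator] / band_powers[ratio.denominator]


# Made-up band powers, purely for illustration.
powers = {"Theta": 4.0, "Beta": 2.0}
theta_beta = band_power_ratio(BandRatioSketch("Theta", "Beta"), powers)
```

Restricting the fields to a fixed set of names means a misspelled band is rejected when the ratio is defined rather than when features are computed.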

class meganorm.utils.IO.Config(*, which_meg_session=0, which_layout='all', which_sensor='meg', drop_noisy_flat_channel=True, apply_ica_elbow_detection=False, ica_n_component=None, ica_max_iter=800, ica_method='fastica', cutoffFreqLow=1.0, cutoffFreqHigh=80, resampling_rate=1000, digital_filter=True, notch_filter=True, apply_oversampled_temporal_projection=True, apply_Head_movement_correction=True, Head_movement_limit_from_mean=0.0015, apply_chpi_filter=False, apply_gedai=True, gedai_method='both', sensai_method='optimize', gedai_duration=12, gedai_overlap=0.5, gedai_preliminary_broadband_noise_multiplier=6.0, gedai_noise_multiplier=3.0, gedai_wavelet_type='haar', gedai_wavelet_level='auto', gedai_wavelet_low_cutoff=None, gedai_epoch_size_in_cycles=12, gedai_highpass_cutoff=0.1, muscle_activity_thr=4, muscle_activity_min_length_good=0.1, muscle_activity_filter_freq=(110, 140), apply_environmental_noise_correction=True, same_environmental_noise_removal=False, ctf_gradient_comp_level=3, apply_environmental_noise_ssp_with_eroom=False, apply_environmental_noise_ica_with_ref_meg=True, environmental_noise_ica_with_ref_meg_thr=2.5, ica_if_reject_by_annotation=True, environmental_noise_ica_with_ref_meg_method='separate', environmental_noise_ica_with_ref_meg_measure='zscore', apply_ica=True, auto_ica_corr_thr=0.5, rereference_method='average', bad_segment_removal_method='autoreject', mag_var_threshold=5e-12, grad_var_threshold=5e-10, eeg_var_threshold=4e-05, mag_flat_threshold=1e-14, grad_flat_threshold=1e-12, eeg_flat_threshold=4e-05, zscore_std_thresh=15, segments_tmin=20, segments_tmax=-20, segments_length=10, segments_overlap=2, autoreject_n_interpolates=[1, 4, 8, 16, 32], autoreject_consensus_percs=[np.float64(0.0), np.float64(0.1), np.float64(0.2), np.float64(0.30000000000000004), np.float64(0.4), np.float64(0.5), np.float64(0.6000000000000001), np.float64(0.7000000000000001), np.float64(0.8), np.float64(0.9), np.float64(1.0)], autoreject_cv='auto', 
autoreject_thresh_method='bayesian_optimization', apply_source_localization=False, apply_empty_room_recording=True, apply_mri_QC=False, apply_mri_template=False, freesurfer_template_path=None, freesurfer_home=None, freesurfer_license=None, coregisteration_scale_mode=None, make_new_watershed_bem=False, gcaatlas=True, SL_source_space='volumetric', SL_conductivity=(0.3,), SL_inverse_operator='lcmv', source_space_spacing='ico4', source_space_spacing_number=4, coregisteration_final_n_iterations=20, coregisteration_final_nasion_weight=10.0, covariance_method='empirical', beamformer_pick_ori='max-power', beamformer_weight_norm='unit-noise-gain', beamforme_depth=0.08, inverse_regularization_value=0.05, apply_morphing=False, parcellation_parc='aparc.a2009s', parcellation_annot_fname=None, psd_method='welch', psd_n_overlap=1, psd_n_fft=2, psd_n_per_seg=2, parametrization_method='irasa', irasa_hset=(1.05, 2.0, 0.05), fooof_freq_range_low=3, fooof_freq_range_high=40, aperiodic_mode='knee', fooof_peak_width_limits=[1.0, 12.0], fooof_min_peak_height=0, fooof_peak_threshold=2, save_source_localized_epochs=False, save_psds=False, freq_bands={'Alpha': (8, 13), 'Beta': (13, 30), 'Gamma': (30, 40), 'Theta': (3, 8)}, individualized_band_ranges={'Alpha': (-2, 3), 'Beta': (-8, 9), 'Gamma': (-5, 5), 'Theta': (-2, 3)}, power_band_ratios_list=[BandRatio(numerator='Theta', denominator='Beta'), BandRatio(numerator='Theta', denominator='Alpha'), BandRatio(numerator='Alpha', denominator='Beta'), BandRatio(numerator='Delta', denominator='Beta'), BandRatio(numerator='Delta', denominator='Alpha'), BandRatio(numerator='Delta', denominator='Theta'), BandRatio(numerator='Beta', denominator='Gamma'), BandRatio(numerator='Alpha', denominator='Gamma'), BandRatio(numerator='Theta', denominator='Gamma'), BandRatio(numerator='Delta', denominator='Gamma')], min_r_squared=0.9, feature_categories={'Adjusted_Band_Ratio': True, 'Adjusted_Canonical_Absolute_Power': True, 'Adjusted_Canonical_Relative_Power': 
True, 'Adjusted_Individualized_Absolute_Power': False, 'Adjusted_Individualized_Relative_Power': False, 'Exponent': True, 'Hemispheric_Asymmetry_index': True, 'Offset': True, 'OriginalPSD_Band_Ratio': True, 'OriginalPSD_Canonical_Absolute_Power': True, 'OriginalPSD_Canonical_Relative_Power': True, 'OriginalPSD_Individualized_Absolute_Power': False, 'OriginalPSD_Individualized_Relative_Power': False, 'Peak_Center': False, 'Peak_Power': False, 'Peak_Width': False}, fooof_res_save_path=None, random_state=42)[source]

Bases: BaseModel

Configuration for preprocessing, artifact correction, source localization, and spectral feature extraction of neurophysiological signals (MEG/EEG/OPM).

Parameters:

which_meg_session (int, default=0) – Index of the MEG session to process.
which_layout ({"all", "lobe", None}, default="all") – Sensor layout grouping used for reporting/analysis.
which_sensor ({"mag", "grad", "meg", "eeg", "opm"}, default="meg") – Sensor type to process.
drop_noisy_flat_channel (bool, default=True) – Drop channels flagged as noisy or flat before further processing.
Filtering & Resampling
----------------------
cutoffFreqLow (float, default=1.0) – Lower cutoff frequency of the bandpass filter (Hz).
cutoffFreqHigh (PositiveInt, default=80) – Upper cutoff frequency of the bandpass filter (Hz).
resampling_rate (PositiveInt, default=1000) – Target sampling rate (Hz).
digital_filter (bool, default=True) – Apply bandpass filtering.
notch_filter (bool, default=True) – Apply line-noise notch filtering.
apply_oversampled_temporal_projection (bool, default=True) – Apply oversampled temporal projection (OTP) denoising.
apply_Head_movement_correction (bool, default=True) – Correct for head movement during recording.
Head_movement_limit_from_mean (float, default=0.0015) – Maximum allowed deviation from mean head position (m).
apply_chpi_filter (bool, default=False) – Filter out cHPI coil signals.
ICA
---
apply_ica (bool, default=True) – Apply ICA-based artifact correction.
apply_ica_elbow_detection (bool, default=False) – Automatically select the number of ICA components via elbow detection.
ica_n_component (PositiveInt or None, default=None) – Number of ICA components to compute.
ica_max_iter (PositiveInt, default=800) – Maximum ICA iterations.
ica_method ({"fastica", "infomax", "picard"}, default="fastica") – ICA algorithm.
ica_if_reject_by_annotation (bool, default=True) – Exclude annotated bad segments when fitting ICA.
auto_ica_corr_thr (float, default=0.5) – Correlation threshold (0-1) for automatic ICA component rejection.
GEDAI Artifact Removal
----------------------
apply_gedai (bool, default=True) – Apply GEDAI-based artifact removal.
gedai_method ({"both", "spectral", "broadband"}, default="both") – GEDAI denoising strategy.
sensai_method ({"optimize", "gridsearch"}, default="optimize") – Parameter search strategy for SensAI.
gedai_duration (float or int, default=12) – Window duration (s) for GEDAI.
gedai_overlap (float or int, default=0.5) – Window overlap fraction for GEDAI.
gedai_preliminary_broadband_noise_multiplier (float, default=6.0) – Noise multiplier for preliminary broadband detection.
gedai_noise_multiplier (float, default=3.0) – Noise multiplier used in GEDAI thresholding.
gedai_wavelet_type (str, default="haar") – Wavelet family used for spectral GEDAI.
gedai_wavelet_level ("auto", PositiveInt, or 0, default="auto") – Wavelet decomposition level.
gedai_wavelet_low_cutoff (float or None, default=None) – Low-frequency cutoff for wavelet-based denoising.
gedai_epoch_size_in_cycles (PositiveInt, default=12) – Epoch size expressed in number of cycles.
gedai_highpass_cutoff (float, default=0.1) – High-pass cutoff applied before GEDAI (Hz).
Muscle Artifact Detection
-------------------------
muscle_activity_thr (int, default=4) – Detection threshold for muscle artifacts.
muscle_activity_min_length_good (float, default=0.1) – Minimum length of a clean segment retained after removal (s).
muscle_activity_filter_freq (tuple[int, int], default=(110, 140)) – Frequency band used for muscle artifact detection (Hz).
Environmental Noise Correction
------------------------------
apply_environmental_noise_correction (bool, default=True) – Apply environmental/reference-based noise correction.
ctf_gradient_comp_level (PositiveInt, default=3) – CTF gradient compensation level.
apply_environmental_noise_ssp_with_eroom (bool, default=False) – Use empty-room SSP projectors for noise correction.
apply_environmental_noise_ica_with_ref_meg (bool, default=True) – Use reference-MEG-guided ICA for environmental noise removal.
environmental_noise_ica_with_ref_meg_thr (float, default=2.5) – Threshold for ref-MEG-guided ICA component rejection.
environmental_noise_ica_with_ref_meg_method ({"together", "separate"}, default="separate") – Whether to process reference channels jointly or separately.
environmental_noise_ica_with_ref_meg_measure ({"zscore", "correlation"}, default="zscore") – Metric used to score ICA components against reference channels.
EEG Reference & Bad-Segment Rejection
-------------------------------------
rereference_method ({"average", "REST", "None"}, default="average") – EEG re-referencing scheme. The field type lists the string "None" rather than the None singleton.
bad_segment_removal_method ({"autoreject", "fixed_thr", None}, default="autoreject") – Method for rejecting bad data segments.
mag_var_threshold (float, default=5e-12) – Variance-based rejection threshold for magnetometers.
grad_var_threshold (float, default=5e-10) – Variance-based rejection threshold for gradiometers.
eeg_var_threshold (float, default=4e-05) – Variance-based rejection threshold for EEG channels.
mag_flat_threshold (float, default=1e-14) – Flatline-detection threshold for magnetometers.
grad_flat_threshold (float, default=1e-12) – Flatline-detection threshold for gradiometers.
eeg_flat_threshold (float, default=4e-05) – Flatline-detection threshold for EEG channels.
zscore_std_thresh (PositiveInt, default=15) – Z-score threshold for outlier rejection.
autoreject_n_interpolates (list[int], default=[1, 4, 8, 16, 32]) – Candidate interpolation counts for Autoreject.
autoreject_consensus_percs (list[float]) – Candidate consensus percentages for Autoreject (11 values, 0-1).
autoreject_cv (int or "auto", default="auto") – Cross-validation folds for Autoreject.
autoreject_thresh_method ({"bayesian_optimization", "random_search"}, default="bayesian_optimization") – Threshold search strategy for Autoreject.
Segmentation
------------
segments_tmin (PositiveInt, default=20) – Segment start time relative to event (s).
segments_tmax (NegativeInt, default=-20) – Segment end time relative to event (s).
segments_length (PositiveInt, default=10) – Segment length (s).
segments_overlap (int, default=2) – Segment overlap (s).
Source Localization
-------------------
apply_source_localization (bool, default=False) – Whether to perform source localization.
apply_empty_room_recording (bool, default=True) – Use empty-room recordings for noise covariance estimation.
apply_mri_QC (bool, default=False) – Run quality control on MRI/FreeSurfer output.
apply_mri_template (bool, default=False) – Use a template MRI instead of subject-specific anatomy.
freesurfer_template_path (str or None, default=None) – Path to the FreeSurfer template derivatives.
freesurfer_home (str or None, default=None) – Path to the FreeSurfer installation.
freesurfer_license (str or None, default=None) – Path to the FreeSurfer license file.
make_new_watershed_bem (bool, default=False) – Recompute the watershed BEM surfaces.
gcaatlas (bool, default=True) – Use the GCA atlas for subcortical segmentation.
SL_source_space ({"surface", "volumetric"}, default="volumetric") – Type of source space.
SL_conductivity (tuple[float, ...], default=(0.3,)) – Head-model layer conductivities (three values required for EEG).
SL_inverse_operator ({"lcmv"}, default="lcmv") – Inverse operator method.
source_space_spacing ({"ico3"..."ico6", "oct5", "oct6"}, default="ico4") – Source space resolution.
source_space_spacing_number ({3, 4, 5, 6}, default=4) – Numeric resolution; must match source_space_spacing.
coregisteration_final_n_iterations (int, default=20) – Iterations for the final coregistration refinement.
coregisteration_final_nasion_weight (float, default=10.0) – Weight applied to the nasion fiducial during coregistration.
covariance_method (str, default="empirical") – Method for noise covariance estimation.
Beamformer & Parcellation
-------------------------
beamformer_pick_ori ({None, "normal", "max-power", "vector"}, default="max-power") – Source orientation constraint.
beamformer_weight_norm ({None, "unit-noise-gain", "nai", "unit-noise-gain-invariant"}, default="unit-noise-gain") – Beamformer weight normalization.
beamforme_depth (float, default=0.08) – Depth-weighting factor correcting for center-of-head bias.
inverse_regularization_value (float, default=0.05) – Regularization applied to the data covariance matrix.
apply_morphing (bool, default=False) – Morph source estimates to a common template brain.
parcellation_parc ({None, "aparc.a2009s", "parac"}, default="aparc.a2009s") – Predefined cortical parcellation.
parcellation_annot_fname (Path or None) – Custom parcellation (.annot) file, used if parcellation_parc is None.
PSD & Spectral Parametrization
------------------------------
psd_method ({"multitaper", "welch"}, default="welch") – PSD estimation method.
psd_n_overlap (PositiveInt, default=1) – Welch/multitaper PSD parameter: overlap between segments.
psd_n_fft (PositiveInt, default=2) – Welch/multitaper PSD parameter: FFT length.
psd_n_per_seg (PositiveInt, default=2) – Welch/multitaper PSD parameter: length of each segment.
parametrization_method ({"fooof", "irasa"}, default="irasa") – Method for separating aperiodic and periodic spectral components.
irasa_hset (tuple[float, float, float], default=(1.05, 2.0, 0.05)) – Resampling factor range/step for IRASA.
fooof_freq_range_low (PositiveInt, default=3) – Lower bound of the frequency range for FOOOF fitting (Hz).
fooof_freq_range_high (PositiveInt, default=40) – Upper bound of the frequency range for FOOOF fitting (Hz).
aperiodic_mode ({"knee", "fixed"}, default="knee") – Aperiodic component model.
fooof_peak_width_limits (list[float], default=[1.0, 12.0]) – Allowed peak width range (Hz).
fooof_min_peak_height (int, default=0) – Minimum peak height for detection.
fooof_peak_threshold (PositiveInt, default=2) – Peak detection threshold (in SD of the flattened spectrum).
fooof_res_save_path (str or None) – Path to save FOOOF results.
save_source_localized_epochs (bool, default=False) – Persist intermediate source-localized epochs to disk.
save_psds (bool, default=False) – Persist intermediate PSDs to disk.
Feature Extraction
------------------
freq_bands (dict[str, tuple[int, int]]) – Canonical frequency band definitions (Theta, Alpha, Beta, Gamma).
individualized_band_ranges (dict[str, tuple[int, int]]) – Per-band offsets (Hz) used to individualize canonical bands.
power_band_ratios_list (list[BandRatio]) – Band-power ratios to compute (e.g. Theta/Beta).
min_r_squared (float, default=0.9) – Minimum R² required to accept a spectral model fit.
feature_categories (dict[str, bool]) – Flags selecting which feature families to extract (offset, exponent, peak parameters, canonical/individualized band power, band ratios, hemispheric asymmetry).
Miscellaneous
-------------
random_state (int, default=42) – Random seed for reproducibility.
same_environmental_noise_removal (bool)
coregisteration_scale_mode (Literal['uniform', '3-axis', None])

save(save_path, overwrite=False)[source]

Serialize the configuration to a JSON file.

Parameters: save_path (str)

load(path)[source]

Load a configuration from a JSON file.

Parameters: path (str)
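
The save/load round trip described above can be sketched as follows. This is an illustrative stand-in built on a plain dataclass, not the package's pydantic-based code: `ConfigSketch` carries only three of the documented fields, and the way `overwrite=False` is handled (raising `FileExistsError`) is an assumption, since the source does not state what happens when the target file already exists.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ConfigSketch:
    """Tiny stand-in carrying three of the documented Config fields."""

    which_sensor: str = "meg"
    resampling_rate: int = 1000
    random_state: int = 42

    def save(self, save_path, overwrite=False):
        path = Path(save_path)
        # Assumed behaviour: refuse to replace an existing file unless asked to.
        if path.exists() and not overwrite:
            raise FileExistsError(f"{path} exists; pass overwrite=True to replace it")
        path.write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def load(cls, path):
        # Rebuild the object from the JSON key/value pairs written by save().
        return cls(**json.loads(Path(path).read_text()))
```

With this sketch, `ConfigSketch.load(p)` compares equal to the object that was saved to `p`, which is the property one would usually want from a configuration file.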

Notes

Model validators enforce cross-field consistency, e.g.: a three-layer SL_conductivity is required for EEG source localization; beamformer_pick_ori == “vector” requires beamformer_weight_norm == “unit-noise-gain-invariant”; source_space_spacing must match source_space_spacing_number; GEDAI parameters must be consistent with the chosen gedai_method; and MRI template use is mutually exclusive with MRI QC.
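
A few of these cross-field rules can be sketched with a plain dataclass. The class name `SourceConfigSketch`, the error messages, and the exact conditions are assumptions made for illustration; the real checks are pydantic model validators and may differ in detail. In particular, "must match" is read here as "the digit at the end of source_space_spacing equals source_space_spacing_number".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SourceConfigSketch:
    """Stand-in holding only the fields needed for the checks below."""

    which_sensor: str = "meg"
    apply_source_localization: bool = False
    SL_conductivity: Tuple[float, ...] = (0.3,)
    beamformer_pick_ori: Optional[str] = "max-power"
    beamformer_weight_norm: Optional[str] = "unit-noise-gain"
    source_space_spacing: str = "ico4"
    source_space_spacing_number: int = 4
    apply_mri_template: bool = False
    apply_mri_QC: bool = False

    def __post_init__(self):
        # EEG source localization needs a three-layer head model.
        if (
            self.apply_source_localization
            and self.which_sensor == "eeg"
            and len(self.SL_conductivity) != 3
        ):
            raise ValueError("EEG source localization needs three conductivities")
        # Vector orientation is only allowed with one weight normalization.
        if (
            self.beamformer_pick_ori == "vector"
            and self.beamformer_weight_norm != "unit-noise-gain-invariant"
        ):
            raise ValueError("pick_ori='vector' needs 'unit-noise-gain-invariant'")
        # Assumed meaning of "must match": trailing digit equals the number.
        if int(self.source_space_spacing[-1]) != self.source_space_spacing_number:
            raise ValueError("source_space_spacing and its number disagree")
        # Template MRI and MRI QC are mutually exclusive.
        if self.apply_mri_template and self.apply_mri_QC:
            raise ValueError("choose either the MRI template or MRI QC")
```

Checking at construction time means an inconsistent configuration fails before any data is processed.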

Attributes:

model_extra: Get extra fields set during validation.
model_fields_set: Returns the set of fields that have been explicitly set on this model instance.

Parameters:

which_meg_session (int)
which_layout (Literal['all', 'lobe', None])
which_sensor (Literal['mag', 'grad', 'meg', 'eeg', 'opm'])
drop_noisy_flat_channel (bool)
apply_ica_elbow_detection (bool)
ica_n_component (Annotated[int, Gt(gt=0)] | None)
ica_max_iter (Annotated[int, Gt(gt=0)])
ica_method (Literal['fastica', 'infomax', 'picard'])
cutoffFreqLow (float)
cutoffFreqHigh (Annotated[int, Gt(gt=0)])
resampling_rate (Annotated[int, Gt(gt=0)])
digital_filter (bool)
notch_filter (bool)
apply_oversampled_temporal_projection (bool)
apply_Head_movement_correction (bool)
Head_movement_limit_from_mean (float)
apply_chpi_filter (bool)
apply_gedai (bool)
gedai_method (Literal['both', 'spectral', 'broadband'])
sensai_method (Literal['optimize', 'gridsearch'])
gedai_duration (float | int)
gedai_overlap (float | int)
gedai_preliminary_broadband_noise_multiplier (float)
gedai_noise_multiplier (float)
gedai_wavelet_type (str)
gedai_wavelet_level (Literal['auto'] | Annotated[int, Gt(gt=0)] | Literal[0])
gedai_wavelet_low_cutoff (None | float)
gedai_epoch_size_in_cycles (Annotated[int, Gt(gt=0)])
gedai_highpass_cutoff (float)
muscle_activity_thr (int)
muscle_activity_min_length_good (float)
muscle_activity_filter_freq (Tuple[int, int])
apply_environmental_noise_correction (bool)
same_environmental_noise_removal (bool)
ctf_gradient_comp_level (Annotated[int, Gt(gt=0)])
apply_environmental_noise_ssp_with_eroom (bool)
apply_environmental_noise_ica_with_ref_meg (bool)
environmental_noise_ica_with_ref_meg_thr (float)
ica_if_reject_by_annotation (bool)
environmental_noise_ica_with_ref_meg_method (Literal['together', 'separate'])
environmental_noise_ica_with_ref_meg_measure (Literal['zscore', 'correlation'])
apply_ica (bool)
auto_ica_corr_thr (Annotated[float, None, Interval(gt=None, ge=0, lt=None, le=1), None, None])
rereference_method (Literal['average', 'REST', 'None'])
bad_segment_removal_method (Literal['autoreject', 'fixed_thr', None])
mag_var_threshold (float)
grad_var_threshold (float)
eeg_var_threshold (float)
mag_flat_threshold (float)
grad_flat_threshold (float)
eeg_flat_threshold (float)
zscore_std_thresh (Annotated[int, Gt(gt=0)])
segments_tmin (Annotated[int, Gt(gt=0)])
segments_tmax (Annotated[int, Lt(lt=0)])
segments_length (Annotated[int, Gt(gt=0)])
segments_overlap (int)
autoreject_n_interpolates (List[int])
autoreject_consensus_percs (List[float])
autoreject_cv (int | Literal['auto'])
autoreject_thresh_method (Literal['bayesian_optimization', 'random_search'])
apply_source_localization (bool)
apply_empty_room_recording (bool)
apply_mri_QC (bool)
apply_mri_template (bool)
freesurfer_template_path (str | None)
freesurfer_home (str | None)
freesurfer_license (str | None)
coregisteration_scale_mode (Literal['uniform', '3-axis', None])
make_new_watershed_bem (bool)
gcaatlas (bool)
SL_source_space (Literal['surface', 'volumetric'])
SL_conductivity (Tuple[float, ...])
SL_inverse_operator (Literal['lcmv'])
source_space_spacing (Literal['ico3', 'ico4', 'ico5', 'ico6', 'oct5', 'oct6'])
source_space_spacing_number (Literal[3, 4, 5, 6])
coregisteration_final_n_iterations (int)
coregisteration_final_nasion_weight (float)
covariance_method (str)
beamformer_pick_ori (Literal[None, 'normal', 'max-power', 'vector'])
beamformer_weight_norm (Literal[None, 'unit-noise-gain', 'nai', 'unit-noise-gain-invariant'])
beamforme_depth (Annotated[float, None, Interval(gt=None, ge=0, lt=None, le=1), None, None])
inverse_regularization_value (Annotated[float, None, Interval(gt=None, ge=0, lt=None, le=1), None, None])
apply_morphing (bool)
parcellation_parc (Literal[None, 'aparc.a2009s', 'parac'])
parcellation_annot_fname (Path | None)
psd_method (Literal['multitaper', 'welch'])
psd_n_overlap (Annotated[int, Gt(gt=0)])
psd_n_fft (Annotated[int, Gt(gt=0)])
psd_n_per_seg (Annotated[int, Gt(gt=0)])
parametrization_method (Literal['fooof', 'irasa'])
irasa_hset (Tuple[float, float, float])
fooof_freq_range_low (Annotated[int, Gt(gt=0)])
fooof_freq_range_high (Annotated[int, Gt(gt=0)])
aperiodic_mode (Literal['knee', 'fixed'])
fooof_peak_width_limits (List[float])
fooof_min_peak_height (int)
fooof_peak_threshold (Annotated[int, Gt(gt=0)])
save_source_localized_epochs (bool)
save_psds (bool)
freq_bands (Dict[str, Tuple[int, int]])
individualized_band_ranges (Dict[str, Tuple[int, int]])
power_band_ratios_list (List[BandRatio])
min_r_squared (Annotated[float, None, Interval(gt=None, ge=0, lt=None, le=1), None, None])
feature_categories (Dict[str, bool])
fooof_res_save_path (str | None)
random_state (int)

Methods

`copy`(*[, include, exclude, update, deep])	Returns a copy of the model.
`model_construct`([_fields_set])	Creates a new instance of the Model class with validated data.
`model_copy`(*[, update, deep])	Returns a copy of the model.
`model_dump`(*[, mode, include, exclude, ...])	Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
`model_dump_json`(*[, indent, ensure_ascii, ...])	Generates a JSON representation of the model using Pydantic's to_json method.
`model_json_schema`([by_alias, ref_template, ...])	Generates a JSON schema for a model class.
`model_parametrized_name`(params)	Compute the class name for parametrizations of generic classes.
`model_post_init`(context, /)	Override this method to perform additional initialization after __init__ and model_construct.
`model_rebuild`(*[, force, raise_errors, ...])	Try to rebuild the pydantic-core schema for the model.
`model_validate`(obj, *[, strict, extra, ...])	Validate a pydantic model instance.
`model_validate_json`(json_data, *[, strict, ...])	Validate the given JSON data against the Pydantic model.
`model_validate_strings`(obj, *[, strict, ...])	Validate the given object with string data against the Pydantic model.
`save`(save_path[, overwrite])	Save the configuration to a JSON file.

SL_conductivity_mv
beamformer_arg_check
center_head_bias_scale_check
construct
dict
either_template_mri_or_mri_qc
env_noise_removal_same
from_orm
gedai_params_check
ica_e_noise_removal
json
load
mri_template_check
muscle_activity_filter_freq_fv
muscle_activity_thr_fv
pacellation_checker
parse_file
parse_obj
parse_raw
schema
schema_json
source_space_res
update_forward_refs
validate

Head_movement_limit_from_mean: float

SL_conductivity: Tuple[float, ...]

SL_conductivity_mv()[source]

SL_inverse_operator: Literal['lcmv']

SL_source_space: Literal['surface', 'volumetric']

aperiodic_mode: Literal['knee', 'fixed']

apply_Head_movement_correction: bool

apply_chpi_filter: bool

apply_empty_room_recording: bool

apply_environmental_noise_correction: bool

apply_environmental_noise_ica_with_ref_meg: bool

apply_environmental_noise_ssp_with_eroom: bool

apply_gedai: bool

apply_ica: bool

apply_ica_elbow_detection: bool

apply_morphing: bool

apply_mri_QC: bool

apply_mri_template: bool

apply_oversampled_temporal_projection: bool

apply_source_localization: bool

auto_ica_corr_thr: Annotated[float, None, Interval(gt=None, ge=0, lt=None, le=1), None, None]

autoreject_consensus_percs: List[float]

autoreject_cv: int | Literal['auto']

autoreject_n_interpolates: List[int]

autoreject_thresh_method: Literal['bayesian_optimization', 'random_search']

bad_segment_removal_method: Literal['autoreject', 'fixed_thr', None]

beamforme_depth: Annotated[float, None, Interval(gt=None, ge=0, lt=None, le=1), None, None]

beamformer_arg_check()[source]

beamformer_pick_ori: Literal[None, 'normal', 'max-power', 'vector']

beamformer_weight_norm: Literal[None, 'unit-noise-gain', 'nai', 'unit-noise-gain-invariant']

center_head_bias_scale_check()[source]

coregisteration_final_n_iterations: int

coregisteration_final_nasion_weight: float

coregisteration_scale_mode: Literal['uniform', '3-axis', None]

covariance_method: str

ctf_gradient_comp_level: Annotated[int, Gt(gt=0)]

cutoffFreqHigh: Annotated[int, Gt(gt=0)]

cutoffFreqLow: float

digital_filter: bool

drop_noisy_flat_channel: bool

eeg_flat_threshold: float

eeg_var_threshold: float

either_template_mri_or_mri_qc()[source]

env_noise_removal_same()[source]

environmental_noise_ica_with_ref_meg_measure: Literal['zscore', 'correlation']

environmental_noise_ica_with_ref_meg_method: Literal['together', 'separate']

environmental_noise_ica_with_ref_meg_thr: float

feature_categories: Dict[str, bool]

fooof_freq_range_high: Annotated[int, Gt(gt=0)]

fooof_freq_range_low: Annotated[int, Gt(gt=0)]

fooof_min_peak_height: int

fooof_peak_threshold: Annotated[int, Gt(gt=0)]

fooof_peak_width_limits: List[float]

fooof_res_save_path: str | None

freesurfer_home: str | None

freesurfer_license: str | None

freesurfer_template_path: str | None

freq_bands: Dict[str, Tuple[int, int]]

gcaatlas: bool

gedai_duration: float | int

gedai_epoch_size_in_cycles: Annotated[int, Gt(gt=0)]

gedai_highpass_cutoff: float

gedai_method: Literal['both', 'spectral', 'broadband']

gedai_noise_multiplier: float

gedai_overlap: float | int

gedai_params_check()[source]

gedai_preliminary_broadband_noise_multiplier: float

gedai_wavelet_level: Literal['auto'] | Annotated[int, Gt(gt=0)] | Literal[0]

gedai_wavelet_low_cutoff: None | float

gedai_wavelet_type: str

grad_flat_threshold: float

grad_var_threshold: float

ica_e_noise_removal()[source]

ica_if_reject_by_annotation: bool

ica_max_iter: Annotated[int, Gt(gt=0)]

ica_method: Literal['fastica', 'infomax', 'picard']

ica_n_component: Annotated[int, Gt(gt=0)] | None

individualized_band_ranges: Dict[str, Tuple[int, int]]

inverse_regularization_value: Annotated[float, None, Interval(gt=None, ge=0, lt=None, le=1), None, None]

irasa_hset: Tuple[float, float, float]

classmethod load(path)[source]

Parameters: path (str)

mag_flat_threshold: float

mag_var_threshold: float

make_new_watershed_bem: bool

min_r_squared: Annotated[float, None, Interval(gt=None, ge=0, lt=None, le=1), None, None]

model_config = {'extra': 'forbid'}: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

mri_template_check()[source]

muscle_activity_filter_freq: Tuple[int, int]

classmethod muscle_activity_filter_freq_fv(v)[source]

muscle_activity_min_length_good: float

muscle_activity_thr: int

classmethod muscle_activity_thr_fv(v)[source]

notch_filter: bool

pacellation_checker()[source]

parametrization_method: Literal['fooof', 'irasa']

parcellation_annot_fname: Path | None

parcellation_parc: Literal[None, 'aparc.a2009s', 'parac']

power_band_ratios_list: List[BandRatio]

psd_method: Literal['multitaper', 'welch']

psd_n_fft: Annotated[int, Gt(gt=0)]

psd_n_overlap: Annotated[int, Gt(gt=0)]

psd_n_per_seg: Annotated[int, Gt(gt=0)]

random_state: int

rereference_method: Literal['average', 'REST', 'None']

resampling_rate: Annotated[int, Gt(gt=0)]

same_environmental_noise_removal: bool

save(save_path, overwrite=False)[source]

Save the configuration to a JSON file.

Parameters: save_path (str)

save_psds: bool

save_source_localized_epochs: bool

segments_length: Annotated[int, Gt(gt=0)]

segments_overlap: int

segments_tmax: Annotated[int, Lt(lt=0)]

segments_tmin: Annotated[int, Gt(gt=0)]

sensai_method: Literal['optimize', 'gridsearch']

source_space_res()[source]

source_space_spacing: Literal['ico3', 'ico4', 'ico5', 'ico6', 'oct5', 'oct6']

source_space_spacing_number: Literal[3, 4, 5, 6]

which_layout: Literal['all', 'lobe', None]

which_meg_session: int

which_sensor: Literal['mag', 'grad', 'meg', 'eeg', 'opm']

zscore_std_thresh: Annotated[int, Gt(gt=0)]

meganorm.utils.IO.clean_nan_columns(df, nan_threshold)[source]

Remove columns with more NaNs than nan_threshold, otherwise impute NaNs with the column median.

Parameters:

df (pd.DataFrame) – Input dataframe
nan_threshold (int) – Max allowed NaNs per column

Returns:

Cleaned dataframe

Return type:

pd.DataFrame
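
The rule described above can be sketched in a few lines of pandas. This is an illustrative re-implementation under the documented behaviour (drop columns with more than nan_threshold NaNs, median-impute the rest), not the meganorm source; the name clean_nan_columns_sketch and the demo columns are hypothetical.

```python
import numpy as np
import pandas as pd


def clean_nan_columns_sketch(df: pd.DataFrame, nan_threshold: int) -> pd.DataFrame:
    """Illustrative stand-in for clean_nan_columns (hypothetical name)."""
    nan_counts = df.isna().sum()
    # Keep only columns whose NaN count does not exceed the threshold.
    kept = df.loc[:, nan_counts <= nan_threshold].copy()
    # Impute the remaining NaNs with each column's median.
    return kept.fillna(kept.median(numeric_only=True))


demo = pd.DataFrame(
    {
        "alpha": [1.0, np.nan, 3.0, 5.0],       # 1 NaN: kept, imputed with median 3.0
        "beta": [np.nan, np.nan, np.nan, 2.0],  # 3 NaNs: dropped
    }
)
cleaned = clean_nan_columns_sketch(demo, nan_threshold=1)
```

Here a column holding exactly nan_threshold NaNs is kept, which follows the "more NaNs than nan_threshold" wording.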

meganorm.utils.IO.find_failed_meg_subjects(log_path)[source]

Identify subjects whose MEG processing failed, based on log files.

Scans the given directory for error log files (files with “err” in the name) and flags any subject whose log contains the string “error”.

Parameters:: log_path (str or Path) – Directory containing per-subject log files.
Returns:: Unique subject IDs (parsed from the log filename, before the first underscore) whose logs indicate a processing error.
Return type:: set of str
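
A minimal sketch of the scan described above, using only the standard library. It assumes a case-sensitive match on "error" and a flat log directory; both are assumptions, and find_failed_subjects_sketch is a hypothetical name rather than the library function.

```python
from pathlib import Path
import tempfile


def find_failed_subjects_sketch(log_path):
    """Illustrative stand-in for find_failed_meg_subjects (hypothetical name)."""
    failed = set()
    for log_file in Path(log_path).iterdir():
        # Only error logs are inspected: files with "err" in the name.
        if not log_file.is_file() or "err" not in log_file.name:
            continue
        if "error" in log_file.read_text(errors="ignore"):
            # Subject ID is the part of the filename before the first underscore.
            failed.add(log_file.name.split("_")[0])
    return failed


with tempfile.TemporaryDirectory() as log_dir:
    logs = Path(log_dir)
    (logs / "sub-01_err.log").write_text("error: could not read raw file\n")
    (logs / "sub-02_err.log").write_text("")         # empty error log: no failure
    (logs / "sub-03_out.log").write_text("error\n")  # not an error log: ignored
    failed_demo = find_failed_subjects_sketch(log_dir)
```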

meganorm.utils.IO.find_other_meg_session(base_meg_path, missing_meg_subjects, str_meg_ending, task_name, which_session)[source]

Find an alternative MEG recording for each subject at a given session index.

Searches recursively under each subject’s directory for files matching the given task name and filename ending, then selects the file at the requested session index. Intended for locating a fallback MEG session for subjects whose primary recording is missing or failed processing.

Parameters:

base_meg_path (str or Path) – Root directory containing per-subject MEG folders.
missing_meg_subjects (iterable of str) – Subject IDs to search for.
str_meg_ending (str) – Filename suffix/pattern used to match MEG files (e.g. “raw.fif”).
task_name (str) – Task identifier expected to appear in the filename (e.g. “rest”).
which_session (int) – 1-based index of the session to select from each subject’s matched file list.

Returns:

Mapping of subject ID to the matched MEG file path. Subjects with fewer than which_session matching files are omitted.

Return type:

dict[str, str]
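
The selection logic above (recursive search, then a 1-based session index) might look like the following sketch. The lexicographic sort order and the name find_other_session_sketch are assumptions; the library may order matched files differently.

```python
from pathlib import Path
import tempfile


def find_other_session_sketch(base_path, subjects, ending, task_name, which_session):
    """Illustrative stand-in for find_other_meg_session (hypothetical name)."""
    matches = {}
    for subject in subjects:
        subject_dir = Path(base_path) / subject
        if not subject_dir.is_dir():
            continue
        # Recursive search for files with the given ending and task name.
        files = sorted(
            str(p) for p in subject_dir.rglob(f"*{ending}") if task_name in p.name
        )
        # which_session is 1-based; subjects with too few files are omitted.
        if len(files) >= which_session:
            matches[subject] = files[which_session - 1]
    return matches


with tempfile.TemporaryDirectory() as root:
    layout = {"sub-01": ["ses-01", "ses-02"], "sub-02": ["ses-01"]}
    for subject, sessions in layout.items():
        for session in sessions:
            meg_dir = Path(root) / subject / session / "meg"
            meg_dir.mkdir(parents=True)
            (meg_dir / f"{subject}_{session}_task-rest_raw.fif").touch()
    fallback_demo = find_other_session_sketch(
        root, ["sub-01", "sub-02", "sub-03"], "raw.fif", "rest", which_session=2
    )
```

Only sub-01 has a second matching file, so it is the only key in the result.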

meganorm.utils.IO.find_other_mri_session(base_mri_path, missing_mri_subjects, str_mri_ending, which_session)[source]

Find an alternative MRI file for each subject at a given session index.

Searches recursively under each subject’s directory for files matching the given filename ending, then selects the file at the requested session index. Intended for locating a fallback MRI session (e.g. a second scan) for subjects whose primary MRI is missing or unusable.

Parameters:

base_mri_path (str or Path) – Root directory containing per-subject MRI folders.
missing_mri_subjects (iterable of str) – Subject IDs to search for.
str_mri_ending (str) – Filename suffix/pattern used to match MRI files (e.g. “T1w.nii.gz”).
which_session (int) – 1-based index of the session to select from each subject’s matched file list.

Returns:

Mapping of subject ID to the matched MRI file path. Subjects with fewer than which_session matching files are omitted.

Return type:

dict[str, str]

meganorm.utils.IO.make_demo_file_bids(file_dir, save_dir, id_col, age_col, *columns)[source]

Convert demographic data from different source formats into a single format that can be used in later stages.

Parameters:

file_dir (str) – Path to the input demographic file (supports CSV, TSV, or XLSX).
save_dir (str) – Path where the BIDS-formatted demographic file will be saved (as TSV).
id_col (int) – Column index containing the participant ID.
age_col (int) – Column index containing participant age.
*columns (dict) –
Additional column definitions. Age and participant ID are given as positional arguments; any other column (e.g., sex or eyes condition) can be renamed and recoded into a single format across datasets by passing one dict per column. Each dict can contain:
- ’col_name’: str, required name of the output column. This does not
  necessarily match the column name in the input file.
- ’col_id’: int, index of the column that the revision should be applied to.
- ’single_value’: value to assign to all rows if no col_id and mapping are given.
  This can be helpful when all subjects in a dataset share the same property, e.g., the eyes-open condition.
- ’mapping’: dict, used when single_value is not defined, to map
  the initial values to the target values.

Return type:

None
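
To make the column dictionaries concrete, the sketch below applies the documented keys (col_name, col_id, single_value, mapping) to an in-memory table. It does not read or write files, and the output names participant_id and age, the sex coding, and the function name apply_column_specs_sketch are all assumptions for illustration.

```python
import pandas as pd


def apply_column_specs_sketch(raw, id_col, age_col, *columns):
    """Illustrative handling of the column dicts (hypothetical name)."""
    out = pd.DataFrame(
        {
            "participant_id": raw.iloc[:, id_col].astype(str),
            "age": raw.iloc[:, age_col],
        }
    )
    for spec in columns:
        name = spec["col_name"]
        if "single_value" in spec:
            # Same value for every participant, e.g. an eyes-open dataset.
            out[name] = spec["single_value"]
        else:
            values = raw.iloc[:, spec["col_id"]]
            mapping = spec.get("mapping")
            # Recode the raw values when a mapping is given.
            out[name] = values.map(mapping) if mapping else values
    return out


raw_demo = pd.DataFrame({"ID": ["s1", "s2"], "Age": [23, 31], "Gender": ["M", "F"]})
bids_demo = apply_column_specs_sketch(
    raw_demo,
    0,
    1,
    {"col_name": "sex", "col_id": 2, "mapping": {"M": 0, "F": 1}},
    {"col_name": "eyes", "single_value": "open"},
)
```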

meganorm.utils.IO.merge_datasets_with_glob(datasets)[source]

Merges file paths across multiple datasets using glob pattern matching.

This function walks through the provided datasets’ base directories to find subject folders and file paths matching a specified task and file ending. It creates a dictionary mapping each subject to a glob pattern that can be used to aggregate files across multiple runs or sessions.

Parameters:

datasets (dict) –

Dictionary where each key is a dataset name, and each value is a dictionary with the following keys:

”base_dir” (str): Base directory containing subject subdirectories.

”task” (str): Task keyword to search for in filenames.

”ending” (str): File ending (e.g., ‘.nii.gz’) to filter relevant files.

Returns:

A dictionary mapping subject IDs to a glob-style path string that aggregates all matching files for that subject. Only subjects with at least one matched file are included.

Return type:

dict

Notes

This function is designed to assist in scenarios where each subject may have multiple files (e.g., different runs or sessions), and the goal is to create a single pattern that can be used to load all related files for a subject.

meganorm.utils.IO.merge_fidp_demo(datasets_paths, features_dir, dataset_names, drop_columns=['eyes'])[source]

Merge demographic metadata and extracted features into a single DataFrame.

This function loads demographic data and feature data, assigns a site label to each participant if missing, removes unnecessary columns, and merges demographic information with corresponding extracted features.

Parameters:

datasets_paths (list) – List of paths to the dataset directories containing demographic files (‘participants_bids.tsv’).
features_dir (str) – Path to the directory containing the extracted features (‘all_features.csv’).
dataset_names (list of str) – List of dataset names corresponding to each dataset path. Used to populate missing ‘site’ information if necessary.
drop_columns (list of str, optional) – Columns to drop from the demographic data before merging. Default is [“eyes”].

Returns:

data (pandas.DataFrame) – Merged DataFrame containing both demographic information and feature data, with participants indexed as strings.

Raises:

FileNotFoundError – If the ‘participants_bids.tsv’ file is missing in any of the dataset paths or the ‘all_features.csv’ file is missing in the provided features directory.

meganorm.utils.IO.separate_eyes_open_close_eeglab(input_base_path, output_base_path, annotation_description_open, annotation_description_close, trim_before=5, trim_after=5)[source]

Split resting-state EEGLAB recordings into separate eyes-open and eyes-closed files based on annotations.

Scans input_base_path for BIDS-style resting-state .set files, extracts and trims annotated eyes-open and eyes-closed segments, concatenates each condition’s segments, and writes them out as new EEGLAB .set files under a subject-specific folder in output_base_path.

Parameters:

input_base_path (str) – Root directory containing subject subfolders with resting-state EEGLAB recordings, matched via the pattern */eeg/*_task-rest_eeg.set.
output_base_path (str) – Root directory where the separated eyes-open and eyes-closed files will be saved, created if it does not already exist.
annotation_description_open (str) – Annotation description label marking eyes-open segments.
annotation_description_close (str) – Annotation description label marking eyes-closed segments.
trim_before (float, optional) – Duration in seconds to trim from the start of each annotated segment. Default is 5.
trim_after (float, optional) – Duration in seconds to trim from the end of each annotated segment. Default is 5.

Return type:

None

meganorm.utils.IO.set_path(project_dir)[source]

Create and initialize directory structure for a given project.

This function generates a set of predefined directories for feature extraction and normative modeling workflows within the specified project directory. If any of these directories do not exist, they will be created. The function returns the path to the features log directory.

Parameters:: project_dir (str) – Path to the root project directory where the folder structure will be created.
Returns:: Absolute path to the ‘log’ directory inside the ‘Features’ folder.
Return type:: str

Notes

The function creates the following directory structure:

Features/
- log/ (for saving logs of feature extraction)
- temp/ (for temporarily storing extracted features)
- figures/ (for saving generated figures)
Normative modeling/
- Runs/ (for saving model run outputs)
- Figures/ (for visual outputs related to modeling)
- Models summary/ (for summaries of model results)
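
The directory layout listed in the notes can be reproduced with the standard library as follows. This is a sketch of the documented structure, not the library code; set_path_sketch is a hypothetical name.

```python
import os
import tempfile


def set_path_sketch(project_dir):
    """Illustrative stand-in for set_path (hypothetical name)."""
    layout = {
        "Features": ["log", "temp", "figures"],
        "Normative modeling": ["Runs", "Figures", "Models summary"],
    }
    for parent, children in layout.items():
        for child in children:
            # exist_ok=True leaves directories that already exist untouched.
            os.makedirs(os.path.join(project_dir, parent, child), exist_ok=True)
    return os.path.abspath(os.path.join(project_dir, "Features", "log"))


with tempfile.TemporaryDirectory() as project:
    log_dir_demo = set_path_sketch(project)
    features_demo = sorted(os.listdir(os.path.join(project, "Features")))
```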

meganorm.utils.freesurfer module

meganorm.utils.freesurfer.check_log_for_success(results_directory, subject_ids=None, *, processing_directory=None, write_manifests=True, success_token='finished without error', consider_running_as_failure=False, tail_lines_to_scan=200, fresh_minutes=30, stalled_hours=24, return_details=True)[source]

Scan SUBJECTS_DIR for recon-all outcomes, print a summary, and (optionally) write failed/stalled/missing manifests to processing_directory.

Returns:

dict[str, dict] of FAILED/MISSING/STALLED subjects (default), or list[str] of subject IDs if return_details=False.

Parameters:

results_directory (str)
subject_ids (list[str] | None)
processing_directory (str | None)
write_manifests (bool)
success_token (str)
consider_running_as_failure (bool)
tail_lines_to_scan (int)
fresh_minutes (int)
stalled_hours (int)
return_details (bool)

meganorm.utils.freesurfer.classify_subject_status(results_directory, subject_id, *, success_token='finished without error', tail_lines_to_scan=200, fresh_minutes=30, stalled_hours=24)[source]

Classify a single subject by inspecting recon-all logs and IsRunning* locks.

Returns:

(status, info_dict), where status ∈ {“success”, “running”, “stalled”, “failed”, “missing”}

Parameters:

results_directory (str)
subject_id (str)
success_token (str)
tail_lines_to_scan (int)
fresh_minutes (int)
stalled_hours (int)

Return type:

tuple[str, dict]

meganorm.utils.freesurfer.create_slurm_script(t1_path, job_label, results_dir, processing_directory, freesurfer_path, nodes=1, ntasks=1, cpus_per_task=1, mem='16G', time='48:00:00', i_option=True)[source]

Create a Slurm batch script for running recon-all with the given parameters. The script is BIDS-aware, and the FreeSurfer path is taken from freesurfer_path or $FREESURFER_HOME.

meganorm.utils.freesurfer.discover_subjects(results_directory, exclude_subjects)[source]

Parameters:

results_directory (str)
exclude_subjects (set[str])

Return type:

list[str]

meganorm.utils.freesurfer.find_bids_t1w_files(subjects_directory, subject_id)[source]

Return a list of dicts for all T1w files for a BIDS subject. Handles no-session, multi-session, and multi-run cases.

Parameters:

subjects_directory (str)
subject_id (str)

meganorm.utils.freesurfer.freesurfer_QC(results_directory, method='MAD', threshold=3.0, verbose=False)[source]

Perform Euler-number–based quality control (QC) on FreeSurfer outputs.

This function retrieves Euler numbers for all subjects in a FreeSurfer results directory and performs outlier detection based on the distribution of Euler numbers across subjects. The worst hemisphere (i.e., most negative Euler number) is used per subject as the QC metric.

Two QC strategies are supported:

- MAD-based robust z-score (default, recommended): Computes a robust z-score using the median and median absolute deviation (MAD). Subjects with |z| > threshold are flagged as QC failures. This approach is robust and recommended for large multi-site datasets.
- Absolute deviation from median: Computes the absolute deviation from the median Euler number and excludes subjects whose deviation exceeds the specified threshold (in Euler units).

Parameters:

results_directory (str) – Path to the FreeSurfer results directory containing subject folders.
method ({'MAD', 'ABS'}, optional) – QC method to use. The default is ‘MAD’.
- ‘MAD’: Median absolute deviation–based robust z-score (default).
- ‘ABS’: Absolute deviation from median Euler number.
threshold (float, optional) – Threshold for exclusion. The default is 3.0.
- If method=‘MAD’: threshold is the robust z-score cutoff (default=3).
- If method=‘ABS’: threshold is in Euler-number units.
verbose (bool, optional) – If True, prints progress and diagnostic information during execution. The default is False.

Returns:

qc_passed_samples (list of str) – List of subject IDs that passed QC.
qc_failed_samples (list of str) – List of subject IDs that failed QC.
missing_samples (list of str) – List of subjects for which Euler numbers could not be retrieved or computed.

Notes

Euler numbers are extracted from FreeSurfer outputs using retrieve_freesurfer_eulernum.
The worst hemisphere (most negative Euler number) is used per subject, as more negative values typically indicate poorer surface reconstruction.
For MAD-based QC, the robust z-score is computed as:

z = (x - median) / (1.4826 * MAD)

where MAD is the median absolute deviation.
If MAD is zero (rare but possible), an IQR-based fallback is used to estimate scale.
This function assumes all subjects in results_directory belong to a single site. For multi-site studies, QC should ideally be performed separately per site.

Examples

>>> passed, failed, missing = freesurfer_QC("/path/to/freesurfer_dir")

>>> passed, failed, missing = freesurfer_QC(
...     "/path/to/freesurfer_dir",
...     method="MAD",
...     threshold=2.5,
...     verbose=True
... )
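
The robust z-score from the notes can be worked through with the standard library. This sketch uses the documented formula z = (x - median) / (1.4826 * MAD); the IQR-based fallback shown here (IQR / 1.349) is one common choice and an assumption about the library, and the Euler numbers are made up for illustration.

```python
import statistics


def robust_z_scores(values):
    """Robust z-scores via median and MAD (illustrative helper, hypothetical name)."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    scale = 1.4826 * mad
    if scale == 0:
        # Assumed fallback: estimate the scale from the interquartile range.
        q1, _, q3 = statistics.quantiles(values, n=4)
        scale = (q3 - q1) / 1.349
    if scale == 0:
        return [0.0 for _ in values]
    return [(v - med) / scale for v in values]


# Hypothetical worst-hemisphere Euler numbers per subject.
euler_demo = {"sub-01": -20, "sub-02": -24, "sub-03": -18, "sub-04": -22, "sub-05": -160}
z_demo = dict(zip(euler_demo, robust_z_scores(list(euler_demo.values()))))
threshold = 3.0
failed_demo = [s for s, z in z_demo.items() if abs(z) > threshold]
passed_demo = [s for s, z in z_demo.items() if abs(z) <= threshold]
```

With these values the median is -22 and the MAD is 2, so sub-05 lies far outside the cutoff while the other four subjects pass.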

meganorm.utils.freesurfer.get_freesurfer_home(freesurfer_path)[source]

Resolve the FreeSurfer installation path:
- If freesurfer_path is provided, use it.
- Else read $FREESURFER_HOME from the environment.

Parameters:: freesurfer_path (str | None)
Return type:: str

meganorm.utils.freesurfer.is_success(results_directory, subject_id, token='finished without error', tail_lines_to_scan=200, fresh_minutes=30, stalled_hours=24)[source]

Parameters:

results_directory (str)
subject_id (str)
token (str)
tail_lines_to_scan (int)
fresh_minutes (int)
stalled_hours (int)

Return type:

bool

meganorm.utils.freesurfer.log_tail_lines(path, n=200)[source]

Return the last n lines of a log file efficiently.

Parameters:

path (Path)
n (int)

Return type:

list[str]
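
One simple way to obtain the last n lines is a bounded deque, sketched below. It reads the file once while holding at most n lines in memory; the library may instead seek from the end of the file. Stripping the trailing newline and the name tail_lines_sketch are assumptions.

```python
from collections import deque
from pathlib import Path
import tempfile


def tail_lines_sketch(path, n=200):
    """Illustrative stand-in for log_tail_lines (hypothetical name)."""
    with open(path, "r", errors="replace") as handle:
        # deque(maxlen=n) discards older lines as newer ones arrive.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "recon-all.log"
    log_file.write_text("step 1\nstep 2\nrecon-all -s sub-01 finished without error\n")
    tail_demo = tail_lines_sketch(log_file, n=2)
    # The same tail can then be searched for a success token.
    success_demo = any("finished without error" in line for line in tail_demo)
```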

meganorm.utils.freesurfer.prepare_mri_data(mri_directory)[source]

Prepare the BTNRH MRI data for recon-all processing.

Parameters:: mri_directory (str) – Directory containing the MRI data.

meganorm.utils.freesurfer.retrieve_freesurfer_eulernum(freesurfer_dir, subjects=None, save_path=None, verbose=False)[source]

This function receives the FreeSurfer directory (containing processed data for several subjects) and retrieves the Euler numbers (ENs) from the log files. If a log file does not exist, the function uses ‘mris_euler_number’ to recompute the ENs. It returns the ENs in a dataframe together with the list of missing subjects (those for which computing the EN failed). If ‘save_path’ is specified, the results are saved in a pickle file.

Basic usage:

ENs, missing_subjects = retrieve_freesurfer_eulernum(freesurfer_dir)

where the arguments are defined below.

Parameters:

freesurfer_dir – Absolute path to the FreeSurfer directory.
subjects – List of subjects to retrieve the ENs for. If ‘None’ (the default), the list of subjects is automatically retrieved from the existing directories in ‘freesurfer_dir’ (i.e., the ENs for all subjects are retrieved).
save_path – The path to save the results. If ‘None’ (default), the results are not saved to disk.

Outputs:

ENs - A dataframe of retrieved ENs.
missing_subjects - The list of missing subjects.

Developed by S.M. Kia

meganorm.utils.freesurfer.run_parallel_reconall(subjects_directory, results_directory=None, processing_directory='.', freesurfer_path=None, file_postfix='.nii', skip_completed=True, skip_running=True, resubmit_statuses=('failed', 'missing', 'stalled'), success_token='finished without error', tail_lines_to_scan=200, fresh_minutes=30, stalled_hours=24, selected_subjects=None, selected_sessions=None)[source]

Submit recon-all for BIDS subjects, with optional filtering by specific subjects and sessions.

Parameters:

skip_completed (bool)
skip_running (bool)
resubmit_statuses (tuple[str, ...])
success_token (str)
tail_lines_to_scan (int)
fresh_minutes (int)
stalled_hours (int)
selected_subjects (list[str] | str | None)
selected_sessions (list[str] | str | None)

meganorm.utils.nm module

meganorm.utils.nm.cal_stats_for_INOCs(q_path, features, site_id, sex_id, age, num_of_datasets, num_points=100)[source]

Calculates population statistics (centiles of variation) given a subject’s age, sex, and site.

Parameters:

q_path (str) – Path to the pickled file containing ‘quantiles’, ‘synthetic_X’, and ‘batch_effects’. This is the output of ‘estimate_centiles()’ function.
features (list of str) – List of biomarker feature names.
site_id (int) – Index representing the participant’s site. If None, averages across all sites.
sex_id (int) – Index representing the participant’s sex.
age (float) – Age of the participant.
num_of_datasets (int) – Number of datasets used to generate quantiles.
num_points (int, optional) – Number of points for synthetic X axis (default is 100).

Returns:

Dictionary mapping each feature to a list of statistics across quantiles at the given age.

Return type:

dict

meganorm.utils.nm.calculate_PNOCs(quantiles_path, gender_ids, frequency_band_model_ids, quantile_id=2, site_id=None, point_num=100, sex_batch_ind=0, site_batch_ind=1, num_of_sexs=2, num_of_datasets=None, age_slices=None)[source]

Prepares the data required for the plot_PNOCs function.

This function slices the covariate into multiple bins and calculates the mean and standard deviation of each frequency band across the population for both sexes.

Parameters:

quantiles_path (str) – Path to a pickle file containing the keys: ‘quantiles’, ‘synthetic_X’, and ‘batch_effects’.
gender_ids (dict) – Dictionary mapping gender labels (e.g., {“male”: 0, “female”: 1}) to their batch indices.
frequency_band_model_ids (dict) – Dictionary mapping frequency band names (e.g., {“alpha”: 0, “beta”: 1}) to model indices.
quantile_id (int, optional) – Index of the quantile to use from the loaded quantiles array (default is 2). This number corresponds to the ith element of the computed percentiles. If the computed percentiles were [0.05, 0.25, 0.5, 0.75, 0.95], then ‘quantile_id=2’ corresponds to 0.5.
site_id (int, optional) – Site ID to condition the PNOCs on. If None, PNOCs from all sites are averaged (default is None).
point_num (int, optional) – Number of synthetic data points used in deriving quantiles (default is 100).
sex_batch_ind (int, optional) – Index in the batch array corresponding to sex (default is 0).
site_batch_ind (int, optional) – Index in the batch array corresponding to site (default is 1).
num_of_sexs (int, optional) – Number of sex categories (default is 2).
num_of_datasets (int, optional) – Number of datasets used in data aggregation (required if site_id is None).
age_slices (array-like of int, optional) – Array of starting ages to define age bins. If None, defaults to np.arange(5, 80, 5).

Returns:

oscilogram (dict) – Nested dictionary with structure: oscilogram[gender][frequency_band] = list of [mean, std] values for each age slice.
age_slices (numpy.ndarray) – Array of age slice start values used for binning.

Notes

The input pickle file must contain:
- ‘quantiles’: array of shape (n_samples, n_quantiles, n_models)
- ‘synthetic_X’: array of age values of shape (n_samples, 1)
- ‘batch_effects’: array of shape (n_samples, n_batch_dims)

meganorm.utils.nm.estimate_centiles(processing_dir, bio_num, quantiles=[0.05, 0.25, 0.5, 0.75, 0.95], batch_sizes=[2, 6], age_range=(0, 100), point_num=100, outputsuffix='estimate', save=True)[source]

Estimate centile curves using a normative model for synthetic subjects across batch combinations.

Parameters:

processing_dir (str) – Path to the normative modeling output directory (Models, log, and batch files).
bio_num (int) – Number of biomarkers or target variables (i.e., number of models to load).
quantiles (list of float, optional) – List of quantiles to estimate (default is [0.05, 0.25, 0.5, 0.75, 0.95]).
batch_sizes (list of int, optional) – List indicating number of levels for each batch variable. Example: [2, 2] for two binary batch variables (e.g., sex and site).
age_range (tuple of float, optional) – Age range over which to generate synthetic samples (default is (0, 100)).
point_num (int, optional) – Number of age points per batch combination (default is 100).
outputsuffix (str, optional) – Suffix used when loading model output files (default is ‘estimate’).
save (bool, optional) – If True, saves the estimated quantiles and synthetic inputs to disk (default is True).

Returns:

q – Estimated quantile array of shape (N, Q, B) where:
- N is the number of synthetic points,
- Q is the number of quantiles,
- B is the number of biomarkers.

Return type:

np.ndarray
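
To see how batch_sizes, age_range, and point_num determine N, the sketch below builds a synthetic age grid for every batch combination. The row ordering and the name synthetic_grid_sketch are assumptions; only the counts follow directly from the parameter descriptions.

```python
from itertools import product


def synthetic_grid_sketch(batch_sizes, age_range, point_num):
    """Build synthetic ages and batch labels (illustrative, hypothetical name)."""
    start, stop = age_range
    step = (stop - start) / (point_num - 1)  # requires point_num >= 2
    ages = [start + i * step for i in range(point_num)]
    synthetic_x, batch_effects = [], []
    # One block of point_num ages per combination of batch levels.
    for combo in product(*(range(size) for size in batch_sizes)):
        for age in ages:
            synthetic_x.append(age)
            batch_effects.append(list(combo))
    return synthetic_x, batch_effects


x_demo, be_demo = synthetic_grid_sketch([2, 6], (0, 100), 100)
n_points_demo = len(x_demo)  # 2 * 6 * 100 synthetic points
```

With the default batch_sizes=[2, 6] and point_num=100, this gives N = 1200 synthetic points.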

meganorm.utils.nm.evaluate_mace(model_path, X_path, y_path, be_path, save_path=None, model_id=0, quantiles=[0.05, 0.25, 0.5, 0.75, 0.95], plot=False, outputsuffix='ms')[source]

Evaluate model calibration using the Mean Absolute Calibration Error (MACE) metric.

This function computes MACE by comparing model-predicted quantiles with the empirical distribution of outcomes across batch groups. Optionally, it plots a reliability diagram to visually assess calibration performance.

Parameters:

model_path (str) – Path to the directory containing the saved model and its metadata.
X_path (str) – Path to the test covariates (.pkl file), expected as a pandas DataFrame.
y_path (str) – Path to the true test responses (.pkl file), expected as a pandas DataFrame.
be_path (str) – Path to the batch effect file (.pkl file), with each column as a batch dimension.
save_path (str, optional) – Directory to save the reliability diagram if plot is True. Required when plotting.
model_id (int, optional) – Index of the model (biomarker) to evaluate. Corresponds to index X in ‘NM_0_X_<suffix>.pkl’.
quantiles (list of float, optional) – Quantiles to use for computing calibration (default: [0.05, 0.25, 0.5, 0.75, 0.95]).
plot (bool, optional) – Whether to generate and save a reliability diagram (default: False).
outputsuffix (str, optional) – Suffix of the saved model filename (default: “ms”).

Returns:

Mean absolute calibration error (MACE) across all batches and batch IDs.

Return type:

float

Notes

This function assumes all inputs are pickled files in the expected format.
Empirical quantiles are computed within each batch group and compared to the target quantiles.
Plotting requires matplotlib and seaborn.
Input file formats:
- X_path: shape (n_samples, n_features)
- y_path: shape (n_samples, n_outputs)
- be_path: shape (n_samples, n_batch_dims)
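
The core of the metric can be illustrated on plain lists: for each target quantile, compare the fraction of observations at or below the predicted quantile with the target, then average the absolute differences. This sketch deliberately omits the per-batch grouping described above and does not load any model; mace_sketch and the demo values are hypothetical.

```python
def mace_sketch(y_true, predicted_quantiles, quantiles):
    """Mean absolute calibration error on one group (illustrative, hypothetical name)."""
    errors = []
    for j, target in enumerate(quantiles):
        # Empirical coverage: share of observations at or below the predicted quantile.
        below = sum(1 for y, row in zip(y_true, predicted_quantiles) if y <= row[j])
        errors.append(abs(below / len(y_true) - target))
    return sum(errors) / len(errors)


y_demo = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
quantiles_demo = [0.25, 0.5, 0.75]
# The same predicted quantile values for every sample, for simplicity.
predicted_demo = [[2.5, 5.0, 8.5] for _ in y_demo]
mace_demo = mace_sketch(y_demo, predicted_demo, quantiles_demo)
```

The empirical coverages here are 0.2, 0.5, and 0.8, so the absolute errors are 0.05, 0, and 0.05. A perfectly calibrated model would give a value of 0.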

meganorm.utils.nm.haddbr_data_split(data, save_path, covariates=['age'], batch_effects=None, train_split=0.5, validation_split=None, drop_nans=False, random_seed='23d', prefix='', stratification_columns=['site', 'sex'])[source]

Splits a given DataFrame into training, validation, and test sets for normative modeling, while considering stratification based on specified categorical columns. The data is saved as pickled files for normative modeling (PCNToolkit requires paths to the files).

Parameters:

data (pd.DataFrame) – A Pandas DataFrame containing the data to be split. Created using functions like “load_camcan_data”.
save_path (str) – Path where the resulting training, validation, and test sets will be saved as pickled files.
covariates (list of str, optional, default=["age"]) – List of covariates to be used in the analysis (default is [“age”]).
batch_effects (list of str, optional, default=None) – List of batch effects to be accounted for in the HBR model. Default is None.
train_split (float, optional, default=0.5) – Proportion of the data to be used for training (default is 0.5).
validation_split (float, optional, default=None) – Proportion of the training data to be used for validation (default is None, meaning no validation set is created).
drop_nans (bool, optional, default=False) – If True, rows with missing values are dropped (default is False).
random_seed (int or str, optional, default="23d") – Seed for random number generation to ensure reproducibility (default is 23d).
prefix (str, optional, default="") – Prefix to be added to the filenames when saving the pickled data (default is “”).
stratification_columns (list of str, optional, default=["site", "sex"]) – List of categorical columns used for stratification during splitting (default is [“site”, “sex”]).

Returns:

A list of biomarker names (columns in the target y DataFrame), which represent the dependent variables for the HBR normative modeling.

Return type:

list of str

Notes

The function performs the following steps:

Drops any rows with missing values if drop_nans=True.
Creates a new column “combination” based on the specified stratification columns.
Splits the data into training, validation (optional), and test sets while preserving the stratification.
Saves the resulting splits (x_train, y_train, b_train, etc.) as pickled files in the specified save_path.
Saves the random seed used for splitting into a separate pickled file.
Returns the names of the biomarkers (columns in y_train).

Example

>>> biomarker_names = haddbr_data_split(
...     data=df,
...     save_path="./data_split/",
...     covariates=["age", "sex"],
...     batch_effects=["site"],
...     train_split=0.7,
...     validation_split=0.2,
...     random_seed=42,
... )
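
The stratification step in the notes (a “combination” column built from the stratification columns, then a split that preserves it) can be sketched with pandas alone. This is not the library implementation, which may rely on a different splitting routine; stratified_split_sketch and the demo data are hypothetical, and saving the pickled files is left out.

```python
import pandas as pd


def stratified_split_sketch(data, stratification_columns, train_split=0.5, random_seed=42):
    """Split into train/test while preserving strata (illustrative, hypothetical name)."""
    # One label per row, e.g. "A_0" for site A and sex 0.
    combination = data[stratification_columns].astype(str).agg("_".join, axis=1)
    train_index = (
        data.groupby(combination, group_keys=False)
        .sample(frac=train_split, random_state=random_seed)
        .index
    )
    return data.loc[train_index], data.drop(index=train_index)


split_demo = pd.DataFrame(
    {
        "site": ["A"] * 4 + ["B"] * 4,
        "sex": [0, 0, 1, 1, 0, 0, 1, 1],
        "age": [21, 34, 45, 52, 28, 39, 61, 70],
    }
)
train_demo, test_demo = stratified_split_sketch(split_demo, ["site", "sex"])
```

Each site-by-sex combination has two rows, so a 0.5 split places one row per combination in each set.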

meganorm.utils.nm.prepare_prediction_data(data, save_path, covariates=['age'], batch_effects=None, drop_nans=False, prefix='')[source]

Prepares and saves test data (covariates, batch effects, and targets) for normative model prediction.

Parameters:

data (pd.DataFrame) – Input dataframe containing covariates, batch effects, and target biomarkers.
save_path (str) – Directory to save the output .pkl files.
covariates (list of str, optional) – List of column names to be used as covariates (default is [“age”]).
batch_effects (list of str, optional) – List of column names to be treated as batch effects. If None, a dummy batch column is used.
drop_nans (bool, optional) – Whether to drop rows containing NaN values (default is False).
prefix (str, optional) – Prefix for the saved .pkl file names (default is “”).

Saves:

{prefix}x_test.pkl
{prefix}y_test.pkl
{prefix}b_test.pkl

Return type:

None

meganorm.utils.nm.shapiro_stat(z_scores, covariates, n_bins=10)[source]

Computes Shapiro-Wilk test statistics for z-scores stratified by covariate bins.

The z-scores are grouped into bins based on the values of the covariate, and the Shapiro-Wilk test for normality is applied within each bin for every feature. The function returns the average Shapiro-Wilk statistic across all bins for each biomarker.

Parameters:

z_scores (numpy.ndarray) – A 2D array of shape (n_samples, n_features) containing the z-scores for each subject and feature.
covariates (numpy.ndarray) – A 1D or 2D array of shape (n_samples,) or (n_samples, 1) containing the covariate values used for binning.
n_bins (int, optional) – The number of equal-width bins to divide the covariate range into. Default is 10.

Returns:

A 1D array of length n_features, where each element is the mean Shapiro-Wilk test statistic across bins for the corresponding feature. NaN is returned for bins with fewer than 3 samples.

Return type:

numpy.ndarray

Notes

The Shapiro-Wilk test is only performed for bins with at least 3 samples. Bins with fewer samples contribute NaN to the average.
The output values range from 0 to 1, where values closer to 1 suggest better adherence to a normal distribution.
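
A sketch of the binning procedure, assuming NumPy and SciPy are available. Skipping undersized bins with np.nanmean is an assumption about how the average treats NaN, and shapiro_by_bin_sketch is a hypothetical name; the demo data are random draws, not real z-scores.

```python
import numpy as np
from scipy import stats


def shapiro_by_bin_sketch(z_scores, covariates, n_bins=10):
    """Mean Shapiro-Wilk W per feature across covariate bins (illustrative)."""
    cov = np.asarray(covariates).ravel()
    edges = np.linspace(cov.min(), cov.max(), n_bins + 1)  # equal-width bins
    # Map each sample to a bin in [0, n_bins - 1]; the maximum goes to the last bin.
    bin_ids = np.clip(np.digitize(cov, edges) - 1, 0, n_bins - 1)
    per_bin = np.full((n_bins, z_scores.shape[1]), np.nan)
    for b in range(n_bins):
        rows = z_scores[bin_ids == b]
        if rows.shape[0] >= 3:  # the test needs at least 3 samples
            for f in range(z_scores.shape[1]):
                per_bin[b, f] = stats.shapiro(rows[:, f])[0]
    # Assumption: bins left as NaN are ignored when averaging.
    return np.nanmean(per_bin, axis=0)


rng = np.random.default_rng(0)
z_demo = rng.standard_normal((500, 2))
age_demo = rng.uniform(5, 80, size=500)
w_demo = shapiro_by_bin_sketch(z_demo, age_demo, n_bins=10)
```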

meganorm.utils.nm.wilcoxon_rank_test(proposed_dict, baseline_dict)[source]

Applies the Wilcoxon rank-sum test to compare metric distributions between two model configurations across multiple biomarkers. Applies FDR correction (Benjamini-Hochberg) to the resulting p-values.

Parameters:

proposed_dict (dict) – Dictionary of metrics for the proposed model configuration. Expected format: {metric: {biomarker: list of values}}.
baseline_dict (dict) – Dictionary of metrics for the baseline model configuration. Same format as proposed_dict.

Returns:

stat_df (pandas.DataFrame) – DataFrame of Wilcoxon rank-sum test statistics. Rows = metrics, Columns = biomarkers.
pval_df (pandas.DataFrame) – DataFrame of uncorrected p-values.
fdr_corrected_df (pandas.DataFrame) – DataFrame of Benjamini-Hochberg FDR-corrected p-values.
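
For reference, the Benjamini-Hochberg adjustment itself can be written in a few lines of pure Python. This sketch shows the standard step-up procedure on a flat list of p-values; how the library groups p-values before correction (per metric or across all metrics and biomarkers) is not stated here, and benjamini_hochberg_sketch is a hypothetical name.

```python
def benjamini_hochberg_sketch(p_values):
    """Benjamini-Hochberg adjusted p-values, in the original order (illustrative)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


p_demo = [0.01, 0.04, 0.03, 0.20]
fdr_demo = benjamini_hochberg_sketch(p_demo)
```

For these four values the adjusted p-values are 0.04, 0.16/3, 0.16/3, and 0.20: the raw value 0.03 would scale to 0.06 but is capped by the smaller adjusted value of the next-ranked test.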

meganorm.utils.parallel module

meganorm.utils.parallel.auto_parallel_feature_extraction(mainParallel_path, project_dir, datasets, job_configs, config_file_path, which_subjects=None, username=None, auto_rerun=True, auto_collect=True, freesurfer_home=None, freesurfer_license=None, max_try=3)[source]

Automatically submits, monitors, and reruns jobs for feature extraction on multiple subjects, and collects the results.

Parameters:

mainParallel_path (str) – Path to the mainParallel.py script that will be executed in parallel for each subject.
project_dir (str) – Root project directory containing the Features directory where results, temporary files, and configuration are stored.
datasets (dict) – Mapping of dataset names to dataset metadata (e.g., base directory, surfaces directory), used to locate subjects and merge them via glob patterns.
job_configs (dict) – Dictionary containing job configuration settings (e.g., memory, time, partition, etc.).
config_file_path (str) – Path to a JSON configuration file containing additional settings for the feature extraction jobs.
which_subjects (list or None, optional) – If provided, restrict processing to these subject IDs only. Default is None.
username (str, optional) – The SLURM username. If not provided, it will be fetched from the environment. Default is None.
auto_rerun (bool, optional) – Whether to automatically rerun failed jobs. Default is True.
auto_collect (bool, optional) – Whether to automatically collect and merge results after job completion. Default is True.
freesurfer_home (str or None, optional) – Path to the FreeSurfer installation directory, passed to each submitted job. Default is None.
freesurfer_license (str or None, optional) – Path to the FreeSurfer license file, passed to each submitted job. Default is None.
max_try (int, optional) – The maximum number of retry attempts for failed jobs. Default is 3.

Returns:

A list of failed jobs after all attempts. If no jobs failed, the list will be empty.

Return type:

list

Notes

Subjects missing a resting-state recording, failing MRI QC (when source localization and MRI QC are enabled without a template), or not present in which_subjects are excluded before submission. Excluded subject lists are written as JSON files under Features/excluded_participants.
Runner parameters are persisted to Features/Configurations/runner_params.json before job submission.
If auto_collect is True, per-subject results are merged and combined with demographic data into Features/all_features.csv.

meganorm.utils.parallel.check_jobs_status(username, start_time, delay=20)[source]

Checks the status of submitted jobs to the SLURM cluster.

Parameters:

username (str) – The SLURM username used to check the status of the jobs.
start_time (str) – The start time for the batch job submission, formatted as ‘YYYY-MM-DDTHH:MM:SS’.
delay (int, optional) – The delay, in seconds, between each status check. Default is 20 seconds.

Returns:

A list of names of jobs that have failed.

Return type:

list

meganorm.utils.parallel.check_user_jobs(username, start_time)[source]

Count the status of jobs submitted to the SLURM scheduler.

Parameters:

username (str) – The SLURM username used to check the status of the jobs.
start_time (str) – The start time for the batch job submission, formatted as ‘YYYY-MM-DDTHH:MM:SS’.

Returns:

A 3-tuple (status_counts, failed_jobs, ok):

status_counts (dict) – Counts of jobs per state (PENDING, RUNNING, COMPLETED, FAILED, CANCELLED).
failed_jobs (list) – Names of jobs that have failed.
ok (bool) – True if the sacct query succeeded, False otherwise. When False, the counts are all zero and failed_jobs is empty.

Return type:

tuple
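
To illustrate how per-state counts can be derived from sacct output, here is a minimal parsing sketch. It assumes the text comes from a command such as sacct -u <username> -S <start_time> --format=JobName,State --noheader --parsable2, which prints one JobName|State line per job. The exact command meganorm runs is not documented here, and summarize_sacct is a hypothetical name.

```python
STATES = ("PENDING", "RUNNING", "COMPLETED", "FAILED", "CANCELLED")


def summarize_sacct(output):
    """Turn 'JobName|State' lines into (status_counts, failed_jobs)."""
    counts = {state: 0 for state in STATES}
    failed = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, state_field = line.partition("|")
        words = state_field.split()
        # sacct can report e.g. "CANCELLED by 1001"; keep only the first word
        state = words[0] if words else ""
        if state in counts:
            counts[state] += 1
        if state == "FAILED":
            failed.append(name)
    return counts, failed


sample = "sub-01|COMPLETED\nsub-02|FAILED\nsub-03|CANCELLED by 1001\nsub-04|RUNNING\n"
status_counts, failed_jobs = summarize_sacct(sample)
```

States outside the five listed ones (for example TIMEOUT) are ignored in this sketch.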

meganorm.utils.parallel.collect_results(target_dir, subjects, temp_path, file_name='features', clean=True)[source]

Collects and merges the results of all jobs into a single file.

Parameters:

target_dir (str) – Path to the target directory where the merged results will be saved.
subjects (dict) – A dictionary with subject names as keys and their corresponding file paths as values.
temp_path (str) – Path to the temporary directory where individual subject result files are stored.
file_name (str, optional) – The name of the file where the merged results will be saved. Default is ‘features’.
clean (bool, optional) – Whether to remove the temporary files after merging the results. Default is True.

Returns:

This function does not return anything but writes the merged results to a CSV file in the target directory.

Return type:

None
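
The merge step can be sketched with the standard library alone. The layout used here is an assumption for illustration only: one <subject>.csv file per subject inside temp_path, each with a header row. merge_subject_csvs is a hypothetical name, and unlike collect_results it returns the output path for convenience.

```python
import csv
import os


def merge_subject_csvs(target_dir, subjects, temp_path, file_name="features", clean=True):
    """Merge per-subject CSV files into <target_dir>/<file_name>.csv."""
    rows, fieldnames = [], ["subject"]
    for subject in subjects:
        path = os.path.join(temp_path, f"{subject}.csv")
        if not os.path.exists(path):
            continue  # this subject produced no result file
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh):
                row["subject"] = subject  # tag each row with its subject
                rows.append(row)
                for key in row:
                    if key not in fieldnames:
                        fieldnames.append(key)
        if clean:
            os.remove(path)  # drop the temporary file once it has been read
    out_path = os.path.join(target_dir, f"{file_name}.csv")
    with open(out_path, "w", newline="") as fh:
        # restval fills columns that a given subject's file did not contain
        writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```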

meganorm.utils.parallel.progress_bar(current, total, bar_length=20)[source]

Displays or updates a console progress bar.

Parameters:

current (int) – The current progress (must be between 0 and total).
total (int) – The total number of steps that corresponds to full progress.
bar_length (int, optional) – The character length of the progress bar. Default is 20.
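
A console progress bar of this kind is usually drawn by rewriting a single line with a carriage return. The sketch below shows one way to do that. render_bar and show_progress are hypothetical names, and the bar characters are arbitrary choices rather than the ones meganorm prints.

```python
import sys


def render_bar(current, total, bar_length=20):
    """Return the bar as text, e.g. '[#####-----] 50%'."""
    fraction = current / total if total else 1.0
    fraction = min(max(fraction, 0.0), 1.0)  # clamp to the 0..1 range
    filled = int(round(bar_length * fraction))
    return "[" + "#" * filled + "-" * (bar_length - filled) + f"] {fraction:.0%}"


def show_progress(current, total, bar_length=20):
    """Redraw the bar in place; end the line once progress is complete."""
    sys.stdout.write("\r" + render_bar(current, total, bar_length))
    if current >= total:
        sys.stdout.write("\n")
    sys.stdout.flush()
```

Separating rendering from printing keeps the formatting logic easy to check in isolation.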

meganorm.utils.parallel.sbatch_feature_extraction_runner(project_dir, datasets, job_configs, config_file=None, time='48:00:00', mem='16GB', freesurfer_home=None, freesurfer_license=None, auto_rerun=True, auto_collect=True, max_try=5, which_subjects=None)[source]

Set up and generate a SLURM sbatch script that launches the full parallel feature-extraction pipeline as a single driver job.

Creates the project’s Features directory structure, saves the pipeline configuration (custom or default), serializes all runner parameters needed by auto_parallel_feature_extraction to a JSON file, and writes an sbatch script that runs the parallel driver when submitted to the scheduler.

Parameters:

project_dir (str) – Root project directory in which the Features directory and outputs will be created.
datasets (dict) – Mapping of dataset names to dataset metadata (e.g., base directory, surfaces directory), used to locate subjects and anatomical data.
job_configs (dict) – SLURM job configuration, including keys such as “partition”, “module”, and “slurm_username”. Updated in place with the computed “log_path”.
config_file (Config or None, optional) – A meganorm.utils.IO.Config instance specifying pipeline settings. If None, a default Config is created and saved. Default is None.
time (str, optional) – Maximum wall time for the sbatch driver job (format “HH:MM:SS”). Default is “48:00:00”.
mem (str, optional) – Memory allocation for the sbatch driver job (e.g., “16GB”). Default is “16GB”.
freesurfer_home (str or None, optional) – Path to the FreeSurfer installation directory, passed through to per-subject jobs. Default is None.
freesurfer_license (str or None, optional) – Path to the FreeSurfer license file, passed through to per-subject jobs. Default is None.
auto_rerun (bool, optional) – Whether failed per-subject jobs should be automatically resubmitted. Default is True.
auto_collect (bool, optional) – Whether results should be automatically collected and merged after job completion. Default is True.
max_try (int, optional) – Maximum number of rerun attempts for failed jobs. Default is 5.
which_subjects (list or None, optional) – Optional list restricting processing to specific subject IDs. Default is None.

Returns:

This function does not return anything; it writes runner_params.json and feature_extraction_runner.sbatch to the project’s Features directory.

Return type:

None

meganorm.utils.parallel.sbatchfile(mainParallel_path, bash_file_path, log_path=None, module='mne', time='1:00:00', memory='20GB', partition='normal', core=1, node=1, batch_file_name='batch_job', freesurfer_home=None, freesurfer_license=None, with_config=None)[source]

Generates a batch script file for submission to a job scheduler (e.g., SLURM) for parallel execution.

Parameters:

mainParallel_path (str) – Path to the mainParallel.py script that will be executed in the batch job.
bash_file_path (str) – Path where the generated batch job file will be saved.
log_path (str, optional) – Path to the log file where output from the job will be saved. Default is None.
module (str, optional) – The module to load in the batch job environment. Default is ‘mne’.
time (str, optional) – Maximum wall time for the job (format: HH:MM:SS). Default is ‘1:00:00’.
memory (str, optional) – Amount of memory allocated for the job (e.g., ‘20GB’). Default is ‘20GB’.
partition (str, optional) – The partition or queue to submit the job to. Default is ‘normal’.
core (int, optional) – Number of CPU cores to allocate for the job. Default is 1.
node (int, optional) – Number of nodes to request for the job. Default is 1.
batch_file_name (str, optional) – Name for the generated batch job file. Default is ‘batch_job’.
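freesurfer_home (str or None, optional) – Path to the FreeSurfer installation directory. Default is None.
freesurfer_license (str or None, optional) – Path to the FreeSurfer license file. Default is None.
with_config (bool or None, optional) – Whether to include the configuration in the batch file. Default is None.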

Returns:

This function generates a batch script file and saves it to the specified path.

Return type:

None
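
For orientation, a generated batch file generally consists of a shebang, a block of #SBATCH directives, and the commands to run. The sketch below builds such a script as a string. It is not the template that sbatchfile writes: build_sbatch_script is a hypothetical name, and mapping core to --cpus-per-task and module to a module load line are assumptions.

```python
def build_sbatch_script(command, job_name="batch_job", time="1:00:00",
                        memory="20GB", partition="normal", core=1, node=1,
                        log_path=None, module="mne"):
    """Return the text of a SLURM batch script that runs `command`."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={time}",
        f"#SBATCH --mem={memory}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={node}",
        f"#SBATCH --cpus-per-task={core}",  # assumption: core means CPUs per task
    ]
    if log_path is not None:
        lines.append(f"#SBATCH --output={log_path}")  # job output goes to this file
    # Blank line, then the environment setup and the actual command
    lines += ["", f"module load {module}", command, ""]
    return "\n".join(lines)


script = build_sbatch_script("python mainParallel.py", log_path="logs/job.out")
```

Writing the returned string to a file and passing that file to sbatch would submit the job.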

meganorm.utils.parallel.submit_jobs(mainParallel_path, bash_file_path, subjects, temp_path, config_file=None, job_configs=None, progress=False, freesurfer_home=None, freesurfer_license=None)[source]

Submits jobs for each subject to the SLURM cluster for parallel execution.

Parameters:

mainParallel_path (str) – Path to the mainParallel.py script that will be executed in the batch job.
bash_file_path (str) – Path where the generated batch job file will be saved.
subjects (dict) – A dictionary of subject names (keys) and their corresponding paths (values). Each subject will have a job submitted to the cluster.
temp_path (str) – Path where temporary files will be stored.
config_file (str, optional) – Path to a JSON configuration file. If provided, this will be passed to the batch job. Default is None.
job_configs (dict, optional) – Dictionary containing job-specific configurations (e.g., memory, time, partition). Defaults to None, in which case default configurations will be used.
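progress (bool, optional) – Whether to show a progress bar during job submission. Default is False.
freesurfer_home (str or None, optional) – Path to the FreeSurfer installation directory. Default is None.
freesurfer_license (str or None, optional) – Path to the FreeSurfer license file. Default is None.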

Returns:

The start time for the batch job submission, formatted as ‘YYYY-MM-DDTHH:MM:SS’.

Return type:

str
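
The returned string uses the same ‘YYYY-MM-DDTHH:MM:SS’ format as the start_time parameter of check_jobs_status and check_user_jobs. A minimal sketch of producing such a timestamp with the standard library follows; slurm_timestamp is a hypothetical helper, not part of meganorm.

```python
from datetime import datetime


def slurm_timestamp(moment=None):
    """Format a datetime as 'YYYY-MM-DDTHH:MM:SS' (defaults to now)."""
    if moment is None:
        moment = datetime.now()
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


example = slurm_timestamp(datetime(2024, 3, 5, 9, 7, 2))
```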

`add_channels`(add_list[, force_update_info])	Append new channels from other MNE objects to the instance.
`add_events`(events[, stim_channel, replace])	Add events to stim channel.
`add_proj`(projs[, remove_existing, verbose])	Add SSP projection vectors.
`add_reference_channels`(ref_channels)	Add reference channels to data that consists of all zeros.
`anonymize`([daysback, keep_his, verbose])	Anonymize measurement information in place.
`append`(raws[, preload])	Concatenate raw instances as if they were continuous.
`apply_function`(fun[, picks, dtype, n_jobs, ...])	Apply a function to a subset of channels.
`apply_gradient_compensation`(grade[, verbose])	Apply CTF gradient compensation.
`apply_hilbert`([picks, envelope, n_jobs, ...])	Compute analytic signal or envelope for a subset of channels/vertices.
`apply_proj`([verbose])	Apply the signal space projection (SSP) operators to the data.
`close`()	Clean up the object.
`compute_psd`([method, fmin, fmax, tmin, ...])	Perform spectral analysis on sensor data.
`compute_tfr`(method, freqs, *[, tmin, tmax, ...])	Compute a time-frequency representation of sensor data.
`copy`()	Return copy of the instance.
`crop`([tmin, tmax, include_tmax, ...])	Crop raw data file.
`crop_by_annotations`([annotations, verbose])	Get crops of raw data file for selected annotations.
`del_proj`([idx])	Remove SSP projection vector.
`describe`([data_frame])	Describe channels (name, type, descriptive statistics).
`drop_channels`(ch_names[, on_missing])	Drop channel(s).
`export`(fname[, fmt, physical_range, ...])	Export Raw to external formats.
`filter`(l_freq, h_freq[, picks, ...])	Filter a subset of channels/vertices.
`get_channel_types`([picks, unique, only_data_chs])	Get a list of channel type for each channel.
`get_data`([picks, start, stop, ...])	Get data in the given range.
`get_montage`()	Get a DigMontage from instance.
`interpolate_bads`([reset_bads, mode, origin, ...])	Interpolate bad MEG and EEG channels.
`interpolate_to`(sensors[, origin, method, ...])	Interpolate data onto a new sensor configuration.
`load_bad_channels`([bad_file, force, verbose])	Mark channels as bad from a text file.
`load_data`([verbose])	Load raw data.
`notch_filter`(freqs[, picks, filter_length, ...])	Notch filter a subset of channels.
`pick`(picks[, exclude, verbose])	Pick a subset of channels.
`pick_channels`(ch_names[, ordered, verbose])
`pick_types`([meg, eeg, stim, eog, ecg, emg, ...])
`plot`([events, duration, start, n_channels, ...])	Plot raw data.
`plot_projs_topomap`([ch_type, sensors, ...])	Plot SSP vector.
`plot_psd`([fmin, fmax, tmin, tmax, picks, ...])
`plot_psd_topo`([tmin, tmax, fmin, fmax, ...])
`plot_psd_topomap`([bands, tmin, tmax, ...])
`plot_sensors`([kind, ch_type, title, ...])	Plot sensor positions.
`rename_channels`(mapping[, allow_duplicates, ...])	Rename channels.
`reorder_channels`(ch_names)	Reorder channels.
`resample`(sfreq, *[, npad, window, ...])	Resample all channels.
`rescale`(scalings, *[, verbose])	Rescale channels.
`save`(fname[, picks, tmin, tmax, ...])	Save raw data to file.
`savgol_filter`(h_freq[, verbose])	Filter the data using Savitzky-Golay polynomial method.
`set_annotations`(annotations[, emit_warning, ...])	Setter for annotations.
`set_channel_types`(mapping, *[, ...])	Specify the sensor types of channels.
`set_eeg_reference`([ref_channels, ...])	Specify which reference to use for EEG data.
`set_meas_date`(meas_date)	Set the measurement start date.
`set_montage`(montage[, match_case, ...])	Set EEG/sEEG/ECoG/DBS/fNIRS channel positions and digitization points.
`time_as_index`(times[, use_rounding, origin])	Convert time to indices.
`to_data_frame`([picks, index, scalings, ...])	Export data in tabular structure as a pandas DataFrame.
