-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Acoustics Module #349
Acoustics Module #349
Conversation
# MHKiT v0.8.1 MHKiT v0.8.1, includes bug fixes in the example notebooks and fixes the dependencies to requirements updates prior to Numpy 2.0.0. Fixes MHKIT v0.8.0 installation issues (MHKiT-Software#334) by fixing the dependencies. - MHKiT-Software#335 Fixes bugs in MHKiT example notebooks - MHKiT-Software#327
…-Software#326) * tke updates * Fix shear velocity functions * More detail for tke shear production * Don't rotate heading beyond 360 degrees * Fix some typos in notebook * Rename deprecated function
…#340) * Test: Determine method using input frequency index * Feat: Use sum of sines if ifft is not computable This change allows `surface_elevation` to return a result if the user inputs a spectrum with a frequency index that does not have a zero frequency. If the non zero frequency index condition is found when the method is `ifft` we warn the user and change the method to `sum_of_sines` * Fix: Use previously found frequency index S.index may not exist for some input datasets, but f[0] does and we should use the value of f[0] here. * Test: Warn when using ifft with a non zero frequency * Lint
This PR adds a Github action to test the example notebooks as part of or CD pipeline. Additionally a timeout is added to which notebooks will fail if they exceed the given time.
This PR addresses MHKiT-Software#315 by: - Changes the data called by the Wind Toolkit tests so that they run faster - Updates the .csv files that the tests compare against - Updates a few descriptions in the metocean example, fixes a sorting issue, reduces the data downloaded there - Tests in hindcast match the notebooks and use the same cache.
Bug fix by removing matplotlib version check needed for < 3.8.0
*MHKiT v0.8.2
…ython into acoustics_module
@jmcvey3 could you take a look at my comments above? |
…ython into acoustics_module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm taking the PR on in bites. Today I got through the Acoustics example notebook. This is really great work @jmcvey3 I only have a couple of minor comments I think could add clarity for those less familiar with the field/ example.
" gain=0, \n", | ||
" start_time=\"2024-06-01T05:31:14\"\n", | ||
")\n", | ||
"print(P)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider addint the follow comments above the print or somewhere else to add a little additional context about the output
# `P` is returned as an xarray DataArray, which allows easy handling of labeled multi-dimensional arrays.
# In this case, the array represents sound pressure in Pascals (Pa) over time because the sensitivity was passed.
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"\"Smart\" hydrophones are those where the hydrophone element, pre-amplifier board, ADC, motherboard and memory card are sold in a single package. Companies that sell these often store metadata in the .wav file header, and MHKiT has a couple of wrapper functions for these hydrophones.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this paragraph I think you could add clarity by adding a sentence to the end like: "Ocean Sonics icListen and Ocean Instruments Soundtrap are two common smart hydrophones datafiles with examples as follows". This would help add context about the following two examples
examples/acoustics_example.ipynb
Outdated
"\n", | ||
"A calibration curve consists of the hydrophone's sensitivity (in units of dB rel $1 V^2/uPa^2$) vs frequency and should be applied to the spectral density we just calculated.\n", | ||
"\n", | ||
"The easiest way to do apply a sensitivity calibration curve in MHKiT is to first copy the calibration data into a CSV file, where the left column contains the calibrated frequencies and the right column contains the sensitivity values. Here we use the function in the following codeblock to read in a CSV file I created with the column headers \"Frequency\" and \"Analog Sensitivity\"." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove "do" => "The easiest way to do apply..."
remove "I" => "to read in a CSV file I created with"
examples/acoustics_example.ipynb
Outdated
} | ||
], | ||
"source": [ | ||
"spsdl[spsdl.dims[-1]].max().item()" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add a comment like "Get max frequency if we are going to keep this"
examples/acoustics_example.ipynb
Outdated
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete empty cell
mhkit/acoustics/analysis.py
Outdated
return out | ||
|
||
|
||
def time_average( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there are multiple methods what about:
aggregate_spectral_density
or bin_and_aggregate_spectral_density
?
Using aggregate here to refer to a generalized aggregation method. Or bin and aggregate to be explicit that you're binning the data in time and applying a general aggregation method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would go with "time_aggregate", since that's the dimension that's being operated on.
raise ValueError("'pressure' must be indexed by 'time' dimension.") | ||
|
||
# window length of each time series | ||
nbin = bin_length * fs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If bin and fs can be floats I think this could be a float. Maybe best practice to cast as int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"nbin" doesn't have to be a float, and it won't throw an error with the binning functions used here.
psd_adj, | ||
coords={"time": psd_adj["time"], "freq": psd_adj["freq"]}, | ||
attrs={ | ||
"units": pressure.units + "^2/Hz", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are pressure units enforced to be defined somewhere else?
Maybe we use getattr
and assign a default null value otherwise?
units = getattr(pressure, 'units', '') + '^2/Hz'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I manually hardcode them in the io.py file, but if a user wants to use their own imported data they'll run into more issues than just this, I think.
spsd_calibrated /= sensitivity_ratio # uPa^2/Hz | ||
spsd_calibrated /= 1e12 # Pa^2/Hz | ||
spsd_calibrated.attrs["units"] = "Pa^2/Hz" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You give long names elsewhere. Maybe add one here?
spsd_calibrated.attrs["long_name"] = "Calibrated Sound Pressure Spectral Density"
mhkit/acoustics/analysis.py
Outdated
method: Union[str, Dict[str, Union[float, int]]] | ||
) -> Tuple[str, Optional[Union[float, int]]]: | ||
""" | ||
Validates the 'method' parameter and returns the method name and argument (if any). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading through this the docstring could be improved by describing what this function raises and showing examples:
"""
Validates the 'method' parameter and returns the method name and its argument (if any).
Parameters
----------
method : str or dict
The aggregation method to validate. It can be either:
- A string representing one of the supported methods without additional arguments,
e.g., 'mean', 'sum'.
- A dictionary with a single key-value pair where the key is the method name and
the value is its argument, e.g., {'quantile': 0.25}.
Supported methods are:
- 'median'
- 'mean'
- 'min'
- 'max'
- 'sum'
- 'quantile' (requires an argument between 0 and 1)
- 'std'
- 'var'
- 'count'
Returns
-------
method_name : str
The validated method name in lowercase.
method_arg : float, int, or None
The argument associated with the method, if applicable; otherwise, None.
Raises
------
ValueError
- If the method name is not supported.
- If the 'quantile' method is provided without an argument or with an invalid argument.
- If the 'method' dictionary does not contain exactly one key-value pair.
- If 'method' is of an unsupported type.
TypeError
- If the key in the 'method' dictionary is not a string.
Examples
--------
>>> _validate_method('mean')
('mean', None)
>>> _validate_method({'quantile': 0.75})
('quantile', 0.75)
>>> _validate_method('quantile')
ValueError: The 'quantile' method must be provided as a dictionary with the quantile value, e.g., {'quantile': 0.25}.
>>> _validate_method({'quantile': 1.5})
ValueError: The 'quantile' method must have a float between 0 and 1 as an argument.
>>> _validate_method({'unsupported_method': None})
ValueError: Method 'unsupported_method' is not supported. Supported methods are: ['median', 'mean', 'min', 'max', 'sum', 'quantile', 'std', 'var', 'count']
"""
```
mhkit/acoustics/analysis.py
Outdated
return method_name, method_arg | ||
|
||
|
||
def band_average( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a misleading name.
What about band_aggregate
to refer to a generalized aggregation method dependent on the passed method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could work
mhkit/acoustics/analysis.py
Outdated
Returns | ||
------- | ||
mspl : xarray.DataArray (time, freq_bins) | ||
Sound pressure level [dB re 1 uPa] indexed by time and third octave bands |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decidecade bands (not third octave)
I still need to review graphics and io |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review of the acoustics graphics module
mhkit/acoustics/graphics.py
Outdated
time = spsdl.dims[0] | ||
freq = spsdl.dims[-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming that time is the first dimension and frequency is the last is probably not the best design.
What if we allowed the user to pass the data or specify the keys?
At a minimum we should check and raise an error if not:
if 'time' in spsdl.dims and 'freq' in spsdl.dims:
time = 'time'
freq = 'freq'
else:
raise ValueError("spsdl must have 'time' and 'freq' dimensions.")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well no, because you can have a "time_bins" or "freq_bins" coordinates, which are always first and second, respectively, using this codebase.
If you're smart enough to know how to a read a hydrophone file yourself you probably also have plotting functionality.
mhkit/acoustics/graphics.py
Outdated
fig, ax = plt.subplots(figsize=(6, 5), subplot_kw={"yscale": "log"}) | ||
fig.subplots_adjust(left=0.1, right=0.95, top=0.97, bottom=0.11) | ||
h = ax.pcolormesh( | ||
spsdl[time].values, spsdl[freq].values, spsdl.T, shading="nearest", **kwargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible the user will have more than the 2 dimensions expected here (time, freq)?
I think it is and we should explicitly transpose on the desired dimension:
spsdl_transposed = spsdl.transpose(freq, time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they use our I/O functions, no. But your line there is safer.
h = ax.pcolormesh( | ||
spsdl[time].values, spsdl[freq].values, spsdl.T, shading="nearest", **kwargs | ||
) | ||
fig.colorbar(h, ax=ax, label=spsdl.units) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we expect the user will always have units defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they use the I/O functions they will. I can't imagine they'd import data elsewhere.
mhkit/acoustics/graphics.py
Outdated
|
||
def plot_spectogram(spsdl, fmin=10, fmax=100000, fig=None, ax=None, **kwargs): | ||
""" | ||
Plots the spectogram of the sound pressure spectral density level. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spectrogram
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lmao I can't spell
ax: matplotlib.pyplot.axis | ||
Figure plot axis | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add some error handling to the function:
if not isinstance(fmin, (int, float)) or not isinstance(fmax, (int, float)):
raise TypeError("fmin and fmax must be numeric types.")
if fmin >= fmax:
raise ValueError("fmin must be less than fmax.")
mhkit/acoustics/graphics.py
Outdated
from .analysis import _fmax_warning | ||
|
||
|
||
def plot_spectogram(spsdl, fmin=10, fmax=100000, fig=None, ax=None, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add typehints
""" | ||
|
||
# Set dimension names | ||
freq = spsdl.dims[-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming that the frequency dimension is the last dimension may not always hold true. We should at least check but should probably make this configurable for ease of use:
if 'freq' in spsdl.dims:
freq = 'freq'
else:
raise ValueError("spsdl must have a 'freq' dimension.")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, no. Because if you do any band averaging you'll have a "freq_bins" coordinate. So I assume frequency is the second dimension because that's how I've got it set up
This comment was marked as duplicate.
This comment was marked as duplicate.
Sorry, something went wrong.
|
||
Parameters | ||
---------- | ||
spsdl: xarray DataArray (time, freq) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this function will work for an unaggregated spsdl. The way you use it in the example is post a time_averaging/ aggregation method.
Should this function have a default aggregation if passed an unaggregated spsdl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean yeah, no it won't. Each color in the plot above is of a different timestamp, and that's why the "plot_spectogram" function is there.
We can add a warning if the input has more than one timestamp, but I try to leave the plotting functions very generic and with the most flexibility possible. A user should be able to see the output, even if you or I don't find it immediately useful.
MHKiT Acoustics Module
This pull request introduces a new acoustics module to MHKiT, aimed at providing comprehensive tools for processing and analyzing passive acoustic data from hydrophone recordings. The module is designed to facilitate compliance with the IEC-TS 62600-40 standard for assessing the effects of marine energy devices on the marine environment. It includes functionalities for reading hydrophone data, performing spectral analyses, applying calibrations, calculating sound pressure levels, and visualizing results.
The addition of this acoustics module significantly enhances MHKiT's capabilities in passive acoustic data analysis, providing standardized, robust tools for the marine renewable energy community. It facilitates compliance with international standards and supports detailed acoustic monitoring and environmental impact assessments of marine energy devices.
Dependencies and Compatibility
Key Components
analysis.py
Provides core analytical functions that process sound pressure data in both the frequency and time domains:
Frequency Validation and Warning:
_fmax_warning
: Ensures the specified maximum frequency does not exceed the Nyquist frequency, adjusting it if necessary to prevent aliasing issues.Shallow Water Cutoff Frequency:
minimum_frequency
: Calculates the minimum frequency cutoff based on water depth and the speed of sound in water and seabed materials, addressing limitations in shallow water environments.Spectral Density Calculations:
sound_pressure_spectral_density
: Computes the mean square sound pressure spectral density (SPSD) using FFT binning with Hanning windowing and 50% overlap, converting time-series data into the frequency domain.Calibration:
apply_calibration
: Applies calibration adjustments to the spectral density data using a sensitivity curve, ensuring accurate representation of the hydrophone's frequency response.Spectral Density Level Calculation:
sound_pressure_spectral_density_level
: Converts mean square spectral density values to sound pressure spectral density levels (SPSDL) in decibels.Spectral Density Aggregation:
band_aggregate
: Aggregates spectral density data into fractional octave bands (e.g., third-octave, decidecade) using specified statistical methods like median or mean.time_aggregate
: Aggregates spectral density data into specified time windows, facilitating statistical analyses over time.Sound Pressure Level Calculation:
sound_pressure_level
: Computes the overall sound pressure level (SPL) within a frequency band from SPSD data.third_octave_sound_pressure_level
anddecidecade_sound_pressure_level
: Calculate SPLs across third-octave and decidecade bands, respectively, providing detailed frequency band analysis per IEC standards.io.py
Offers input/output functionalities specific to hydrophone data handling:
Data Reading:
read_hydrophone
: Main function to read a WAV file from a hydrophone, converting raw data into voltage or pressure time series based on available sensitivity data.read_soundtrap
: Specialized reader for Ocean Instruments SoundTrap hydrophone files, automatically incorporating appropriate metadata.read_iclisten
: Specialized reader for Ocean Sonics icListen hydrophone files, including metadata processing to apply hydrophone sensitivity for direct sound pressure calculation.Audio Export:
export_audio
: Converts processed sound pressure data back into a WAV file format, with optional gain adjustment to enhance playback quality.Data Extraction:
_read_wav_metadata
: Extracts metadata from a WAV file, such as bit depth and header information._calculate_voltage_and_time
: Converts raw WAV data into voltage values and generates a time index based on the sampling frequency.graphics.py
Provides plotting functions for visualizing passive acoustics data:
Spectrogram Plotting:
plot_spectrogram
: Generates a spectrogram plot from SPSDL data, using a logarithmic frequency scale by default for improved readability and analysis of acoustic data over time.Spectra Plotting:
plot_spectra
: Produces a spectral density plot with a log-transformed x-axis, allowing for clear visualization of spectral density across frequency bands.All plotting functions accept Matplotlib keyword arguments via
**kwargs
, enabling users to customize plot aspects such as color, scale, and labels.Example Notebook Workflow
An accompanying example notebook demonstrates the practical application of the module's functionalities, following procedures outlined by the IEC-TS 62600-40 standard:
Data Import and Scaling:
read_hydrophone
to read hydrophone WAV files, converting raw data into calibrated voltage or pressure values, considering peak voltage, sensitivity, and gain.Spectral Density Calculation:
sound_pressure_spectral_density
to compute SPSD, essential for understanding sound pressure distribution across frequencies.Calibration Curve Application:
apply_calibration
to adjust SPSD data based on the hydrophone's sensitivity curve, ensuring accurate spectral measurements.Spectrogram Visualization:
plot_spectrogram
to generate spectrograms that help visualize sound events and detect potential contaminations like boat noise.Window Averaging for IEC-40 Compliance:
time_aggregate
to compute statistical summaries over specified time windows, meeting standardized requirements for characterizing marine energy devices.Sound Pressure Level Calculation:
sound_pressure_level
, considering factors like shallow water cutoff frequencies and potential flow noise, adjusting frequency limits as necessary.Decidecade and Third Octave SPLs:
decidecade_sound_pressure_level
andthird_octave_sound_pressure_level
, enabling detailed analysis of sound distribution in line with IEC requirements.Optional Playback:
export_audio
to rescale and export audio files, facilitating better playback and interpretation of acoustic data.