Integrating Seal-Derived Ocean Profiles into a NOAA WOD Workflow
To integrate seal-derived ocean profiles into a NOAA World Ocean Database (WOD) workflow, you're essentially building a multi-source observational pipeline. The most widely used seal dataset comes from the Marine Mammals Exploring the Oceans Pole to Pole (MEOP) initiative.
1) Data Sources & Formats
| Dataset | Format | Variables |
|---|---|---|
| WOD (NOAA) | NetCDF / CSV | temperature, salinity, depth, time, lat, lon |
| MEOP (Seal data) | NetCDF | TEMP_ADJUSTED, PSAL_ADJUSTED, PRES_ADJUSTED, LATITUDE, LONGITUDE, JULD |
Argo floats struggle under sea ice because they can’t surface to transmit data. Seals, however, routinely:
- Dive under ice shelves
- Surface through breathing holes
- Traverse regions inaccessible to ships

This makes them uniquely effective in:
- Antarctic coastal zones
- Seasonal ice regions
- Marginal ice zones

Seal-borne tag data have the following characteristics:
- Depth profiles often reach 500–2000 m
- Data include temperature, salinity, and pressure
- Some tags also capture oxygen and fluorescence
- Sampling is opportunistic (driven by animal behavior)

Limitations:

1. Spatial bias
   - Data are constrained by animal movement patterns
   - Certain regions (e.g., tropics) are underrepresented because suitable species are absent
2. Lack of control
   - You cannot “task” a seal to sample a specific location or time
   - Sampling is irregular and non-uniform
3. Sensor limitations
   - Payload size restricts sensor precision and battery life
   - Calibration drift can be an issue compared to ship-based CTDs

2) Python Pipeline (xarray + pandas)
Install dependencies
pip install xarray netCDF4 pandas numpy scipy
Step A: Load datasets
import xarray as xr
import pandas as pd
import numpy as np
# Load WOD
wod = xr.open_dataset("wod_data.nc")
# Load MEOP (seal data)
meop = xr.open_dataset("meop_seal_data.nc")
Step B: Standardize variable names
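# WOD standard
wod_df = wod.to_dataframe().reset_index()
wod_df = wod_df.rename(columns={
    'depth': 'DEPTH',
    'temperature': 'TEMP',
    'salinity': 'PSAL',
    'lat': 'LAT',
    'lon': 'LON',
    'time': 'TIME'
})
# MEOP standardization
# Select only the needed variables first so to_dataframe() does not
# broadcast across unrelated dimensions in the file
meop_vars = ['TEMP_ADJUSTED', 'PSAL_ADJUSTED', 'PRES_ADJUSTED',
             'LATITUDE', 'LONGITUDE', 'JULD', 'PSAL_ADJUSTED_QC']
meop_df = meop[[v for v in meop_vars if v in meop]].to_dataframe().reset_index()
meop_df = meop_df.rename(columns={
    'PRES_ADJUSTED': 'DEPTH',
    'TEMP_ADJUSTED': 'TEMP',
    'PSAL_ADJUSTED': 'PSAL',
    'LATITUDE': 'LAT',
    'LONGITUDE': 'LON',
    'JULD': 'TIME'
})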
Step C: Quality control filtering
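MEOP includes QC flags. Filtering on them is a critical step that is often skipped.
# Keep only high-quality measurements (QC = 1 or 2)
if 'PSAL_ADJUSTED_QC' in meop_df:
    # Flags may be stored as bytes/characters (e.g. b'1'); convert to numbers first
    qc = meop_df['PSAL_ADJUSTED_QC'].map(lambda v: v.decode() if isinstance(v, bytes) else v)
    qc = pd.to_numeric(qc, errors='coerce')
    meop_df = meop_df[qc.isin([1, 2])]
# Drop NaNs
meop_df = meop_df.dropna(subset=['TEMP', 'PSAL', 'DEPTH'])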
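
The filter above checks only the salinity flag. The sketch below extends the same idea to temperature and shows how flags behave when they arrive as bytes, which is how character QC variables in Argo-style NetCDF files often come through. The helper name `good_qc_mask` and the toy values are hypothetical, and whether a given MEOP file carries a `TEMP_ADJUSTED_QC` variable is an assumption to check against your own data.

```python
import pandas as pd

def good_qc_mask(flags, good=(1, 2)):
    """Return a boolean mask that is True where the QC flag is in `good`.

    Accepts flags stored as bytes (b'1'), strings ('1'), or numbers (1).
    Hypothetical helper for illustration, not part of any MEOP tooling.
    """
    decoded = flags.map(lambda v: v.decode() if isinstance(v, bytes) else v)
    numeric = pd.to_numeric(decoded, errors="coerce")  # unparseable flags become NaN
    return numeric.isin(good)

# Toy table with byte-encoded flags, including a bad (4) and a blank flag
toy = pd.DataFrame({
    "TEMP": [1.2, 1.1, 0.9, 0.8],
    "PSAL": [34.1, 34.2, 34.3, 34.4],
    "TEMP_ADJUSTED_QC": [b"1", b"1", b"4", b"2"],
    "PSAL_ADJUSTED_QC": [b"1", b"3", b"1", b" "],
})

# A row survives only if both temperature and salinity pass QC
keep = good_qc_mask(toy["TEMP_ADJUSTED_QC"]) & good_qc_mask(toy["PSAL_ADJUSTED_QC"])
clean = toy[keep]
```

Requiring both flags is stricter than the salinity-only filter; if temperature-only analyses matter, the two masks can be applied to separate copies instead.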
Step D: Convert pressure → depth
Seal data uses pressure (dbar), not depth.
def pressure_to_depth(p):
    return p  # ~1 dbar ≈ 1 m (acceptable approximation)
meop_df['DEPTH'] = pressure_to_depth(meop_df['DEPTH'])
For higher accuracy, use the gsw library, which implements the TEOS-10 equations.
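
The 1 dbar ≈ 1 m shortcut drifts by a few tens of metres at 2000 dbar. If adding `gsw` as a dependency is not an option, one alternative is the older Saunders/UNESCO 1983 polynomial, which needs only NumPy. This is a sketch of that swapped-in formula, not the TEOS-10 calculation that `gsw.z_from_p` performs, and the function name `pressure_to_depth_unesco` is hypothetical.

```python
import numpy as np

def pressure_to_depth_unesco(p_dbar, lat_deg):
    """Depth in metres (positive down) from sea pressure (dbar) and latitude (degrees).

    Saunders/UNESCO 1983 polynomial for a standard ocean (S = 35, T = 0 degC).
    Hypothetical helper name; the coefficients are the published ones.
    """
    p = np.asarray(p_dbar, dtype=float)
    x = np.sin(np.deg2rad(lat_deg)) ** 2
    # Gravity as a function of latitude, with a linear pressure correction
    g = 9.780318 * (1.0 + (5.2788e-3 + 2.36e-5 * x) * x) + 1.092e-6 * p
    # Fourth-order polynomial in pressure, evaluated in nested form
    num = (((-1.82e-15 * p + 2.279e-10) * p - 2.2512e-5) * p + 9.72659) * p
    return num / g

# Published check value: p = 10000 dbar at 30 degrees latitude gives about 9712.653 m
check = float(pressure_to_depth_unesco(10000.0, 30.0))

# Shallow Antarctic example: 500 dbar at 66 degrees south
shallow = float(pressure_to_depth_unesco(500.0, -66.0))
```

In the pipeline this would replace the identity function, for example `meop_df['DEPTH'] = pressure_to_depth_unesco(meop_df['DEPTH'], meop_df['LAT'])`, since the formula needs latitude as well as pressure.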
Step E: Merge datasets
# Align columns
common_cols = ['TIME', 'LAT', 'LON', 'DEPTH', 'TEMP', 'PSAL']
wod_df = wod_df[common_cols]
meop_df = meop_df[common_cols]
# Combine
combined = pd.concat([wod_df, meop_df], ignore_index=True)
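
Once the two tables are concatenated, nothing records which rows came from which source, which makes later bias checks and de-duplication harder. A minimal sketch, assuming a hypothetical `SOURCE` column and toy rows in place of the real `wod_df` and `meop_df`, that tags provenance and parses `TIME` into a single timezone-aware dtype:

```python
import pandas as pd

# Toy stand-ins for wod_df and meop_df after column alignment
cols = ["TIME", "LAT", "LON", "DEPTH", "TEMP", "PSAL"]
wod_toy = pd.DataFrame(
    [["2019-03-01T12:00:00", -66.5, 75.0, 10.0, -1.2, 34.1],
     ["2019-03-01T12:00:00", -66.5, 75.0, 20.0, -1.3, 34.2]],
    columns=cols,
)
meop_toy = pd.DataFrame(
    [["2019-03-04T06:30:00", -67.1, 74.2, 12.0, -1.5, 34.3]],
    columns=cols,
)

# SOURCE is a hypothetical provenance column, not part of either dataset
combined_toy = pd.concat(
    [wod_toy.assign(SOURCE="WOD"), meop_toy.assign(SOURCE="MEOP")],
    ignore_index=True,
)

# Parse TIME into one timezone-aware dtype so both sources compare directly
combined_toy["TIME"] = pd.to_datetime(combined_toy["TIME"], utc=True)

counts = combined_toy["SOURCE"].value_counts().to_dict()
```

If the real `TIME` columns are numeric offsets rather than strings or datetimes, they need converting with the reference date from each file's metadata before this step.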
Step F: Interpolate onto common depth grid
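depth_grid = np.arange(0, 2001, 10)  # 0 to 2000 m in 10 m steps
def interpolate_profile(df):
    # Average duplicate depths and sort so np.interp sees increasing x values
    prof = df.groupby('DEPTH')[['TEMP', 'PSAL']].mean().sort_index()
    out = pd.DataFrame({'DEPTH': depth_grid})
    for var in ['TEMP', 'PSAL']:
        # NaN outside the sampled depth range instead of extrapolating
        out[var] = np.interp(depth_grid, prof.index, prof[var], left=np.nan, right=np.nan)
    return out
combined_interp = (
    combined.groupby(['TIME', 'LAT', 'LON'])
    .apply(interpolate_profile)
    .reset_index(level=['TIME', 'LAT', 'LON'])
    .reset_index(drop=True)
)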
3) R Pipeline (tidyverse + ncdf4)
Install packages
install.packages(c("ncdf4", "tidyverse", "oce"))
Step A–E: Load, Extract, QC, Merge, Interpolate
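library(ncdf4)
library(dplyr)
library(oce)
wod <- nc_open("wod_data.nc")
meop <- nc_open("meop_seal_data.nc")
# Note: data.frame() needs every column to have the same length. This assumes
# each variable is a 1-D vector with one value per observation; 2-D profile
# arrays (level x profile) must be flattened to that shape first.
wod_df <- data.frame(
  TEMP = ncvar_get(wod, "temperature"),
  PSAL = ncvar_get(wod, "salinity"),
  DEPTH = ncvar_get(wod, "depth"),
  LAT = ncvar_get(wod, "lat"),
  LON = ncvar_get(wod, "lon"),
  TIME = ncvar_get(wod, "time")
)
meop_df <- data.frame(
  TEMP = ncvar_get(meop, "TEMP_ADJUSTED"),
  PSAL = ncvar_get(meop, "PSAL_ADJUSTED"),
  DEPTH = ncvar_get(meop, "PRES_ADJUSTED"),
  LAT = ncvar_get(meop, "LATITUDE"),
  LON = ncvar_get(meop, "LONGITUDE"),
  TIME = ncvar_get(meop, "JULD"),
  QC = ncvar_get(meop, "PSAL_ADJUSTED_QC")
)
nc_close(wod)
nc_close(meop)
# Keep only good (1) and probably good (2) salinity; as.character() handles
# flags stored either as text or as numbers
meop_df <- meop_df %>%
  filter(as.character(QC) %in% c("1", "2")) %>%
  select(-QC) %>%
  na.omit()
combined <- bind_rows(wod_df, meop_df)
# Interpolate salinity for one profile onto a 0-2000 m grid (10 m steps)
interp_profile <- function(df) {
  grid <- seq(0, 2000, 10)
  data.frame(DEPTH = grid, PSAL = approx(df$DEPTH, df$PSAL, xout = grid)$y)
}
# Apply per profile; approx() needs at least two distinct depths
combined_interp <- combined %>%
  group_by(TIME, LAT, LON) %>%
  filter(n_distinct(DEPTH) >= 2) %>%
  group_modify(~ interp_profile(.x)) %>%
  ungroup()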
4) Advanced: Data Assimilation Readiness
- Convert to CF-compliant NetCDF
- Add metadata:
  - source = "WOD + MEOP"
  - instrument = "CTD / Animal-borne"
- Use formats compatible with: ROMS, HYCOM, ECCO
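
As a rough illustration of the metadata step, the sketch below builds a small gridded dataset with xarray and attaches CF-style attributes. The dimension name `profile`, the toy values, the `CF-1.8` version string, and the output filename are all assumptions; the `source` and `instrument` values are the ones listed above. What each assimilation system actually expects as input should be checked against its own documentation.

```python
import numpy as np
import pandas as pd
import xarray as xr

depth = np.arange(0, 50, 10)  # toy 5-level depth grid in metres

ds = xr.Dataset(
    data_vars={
        "TEMP": (("profile", "depth"), np.full((2, depth.size), 1.5)),
        "PSAL": (("profile", "depth"), np.full((2, depth.size), 34.5)),
    },
    coords={
        "depth": depth,
        "time": ("profile", pd.to_datetime(["2019-03-01", "2019-03-02"])),
        "lat": ("profile", [-66.0, -65.5]),
        "lon": ("profile", [75.0, 76.0]),
    },
)

# Variable-level attributes using CF standard names
ds["TEMP"].attrs = {"standard_name": "sea_water_temperature", "units": "degree_Celsius"}
ds["PSAL"].attrs = {"standard_name": "sea_water_practical_salinity", "units": "1"}
ds["depth"].attrs = {"standard_name": "depth", "units": "m", "positive": "down"}
ds["lat"].attrs = {"standard_name": "latitude", "units": "degrees_north"}
ds["lon"].attrs = {"standard_name": "longitude", "units": "degrees_east"}

# Global attributes, including the provenance fields
ds.attrs = {
    "Conventions": "CF-1.8",
    "source": "WOD + MEOP",
    "instrument": "CTD / Animal-borne",
}

# ds.to_netcdf("combined_profiles.nc")  # hypothetical output path
```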
5) Common Pitfalls
- Ignoring QC flags → corrupt salinity fields
- Mixing pressure & depth incorrectly
- No temporal alignment (seal data is irregular)
- Double counting profiles in overlapping regions
- Bias from animal movement (not random sampling)
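
For the double-counting pitfall, one rough way to spot the same cast appearing in both sources is to round position and time and look for keys shared across sources. This is a sketch under stated assumptions: the `SOURCE` column, the helper name `flag_cross_source_duplicates`, and the tolerances (0.01 degrees, 60 minutes) are all hypothetical and should be tuned to the data.

```python
import pandas as pd

def flag_cross_source_duplicates(profiles, decimals=2, time_freq="60min"):
    """Mark profiles whose rounded position and time occur in more than one SOURCE.

    Expects one row per profile with LAT, LON, TIME (datetime) and SOURCE columns.
    """
    lat = profiles["LAT"].round(decimals).rename("lat_key")
    lon = profiles["LON"].round(decimals).rename("lon_key")
    time = profiles["TIME"].dt.round(time_freq).rename("time_key")
    # Count distinct sources sharing each rounded (lat, lon, time) key
    n_sources = profiles.groupby([lat, lon, time])["SOURCE"].transform("nunique")
    return n_sources > 1

# Toy example: the first two rows describe nearly the same cast
profiles = pd.DataFrame({
    "SOURCE": ["WOD", "MEOP", "MEOP"],
    "LAT": [-66.501, -66.499, -60.0],
    "LON": [75.002, 75.001, 80.0],
    "TIME": pd.to_datetime(["2019-03-01 12:05", "2019-03-01 11:58", "2019-03-02 00:00"]),
})

dup = flag_cross_source_duplicates(profiles)

# Keep the WOD copy of each suspected duplicate and drop the MEOP copy
deduped = profiles[~(dup & (profiles["SOURCE"] == "MEOP"))]
```

Rounding can miss pairs that straddle a bin edge, so this is a first-pass screen; a nearest-neighbour match within a distance and time window would be more thorough.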