Paired Data
This guide covers reading, writing, and managing paired data records in HEC-DSS files using pydsstools.
Paired data records store tabular relationships between an independent variable (x-axis) and one or more dependent variables (y-axis curves). Common uses include stage-discharge rating curves, frequency-flow relationships, elevation-area-volume tables, and damage-stage functions.
Key Concepts
| Concept | Description |
|---|---|
| Paired Data | A table of x-values mapped to one or more y-value curves. DSS record type 200 (float) or 205 (double). |
| `PairedDataContainer` | Write-side container. Holds pathname, shape, x/y data, units, types, and labels before writing to DSS. |
| `PairedDataStruct` | Read-side structure returned by the Cython layer. Wraps the underlying C structure. |
| Curve | A single column of y-values sharing the same x-ordinates. A paired data record can hold many curves. |
| Label | An optional name for each curve (e.g., “1999”, “Stage”, “Flow”). |
| Shape | The tuple `(rows, cols)`: number of data points per curve and number of curves. |
| Window | An index tuple for reading or writing a sub-region of the table. |
| Preallocated Record | A paired data record with space reserved for curves to be filled in later, one at a time. |
Data Layout
```text
            Curve 0    Curve 1    Curve 2
x[0]        y[0,0]     y[0,1]     y[0,2]
x[1]        y[1,0]     y[1,1]     y[1,2]
x[2]        y[2,0]     y[2,1]     y[2,2]
...         ...        ...        ...
x[rows-1]   y[r-1,0]   y[r-1,1]   y[r-1,2]
```
All curves share the same x-ordinates.
Internally, curves are stored as rows in the C array (each row = one curve). The Python API transposes this so that each curve becomes a column in a DataFrame.
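The transpose between storage order and the DataFrame view can be pictured with plain lists. This is an illustration, not pydsstools internals: the list `y_rows` stands in for the C array, and `to_table` is a hypothetical helper.

```python
# Sketch only: plain lists stand in for the C array used by pydsstools.
x = [100.0, 200.0, 300.0]  # shared x-ordinates
y_rows = [                 # storage order: one row per curve
    [10.0, 20.0, 30.0],    # curve 0
    [15.0, 25.0, 35.0],    # curve 1
]


def to_table(x, y_rows):
    """Hypothetical helper: return one (x, y_curve0, y_curve1, ...) tuple per x-ordinate."""
    # zip(*y_rows) transposes curve-major rows into point-major rows,
    # matching the layout diagram (and the DataFrame orientation)
    return [(xi, *ys) for xi, ys in zip(x, zip(*y_rows))]


for row in to_table(x, y_rows):
    print(row)
```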
Example 1 — Read Paired Data as DataFrame
The most common use case. read_pd() returns a pandas.DataFrame with
x-values as the index, curves as columns, and a two-level column index
(primary name and label).
```python
from pydsstools.heclib.dss import HecDss

dss_file = "sample.dss"
pathname = "/PAIREDDATA/COWLITZ/FREQ-FLOW////"

with HecDss.Open(dss_file) as fid:
    df = fid.read_pd(pathname)
    print(df)
    print("Index (x-values):", df.index.tolist())
    print("Column labels:", df.columns.get_level_values("labels").tolist())
```
What the DataFrame looks like:
```text
             y0
labels     1999
x_data
0.950000   30.0
0.800000   40.0
0.600000   54.0
0.500000   60.0
...
```
Column index structure:
The DataFrame has a MultiIndex on columns with two levels:
- `primary`: sequential identifier (`y0`, `y1`, `y2`, …)
- `labels`: the curve label string from the DSS record
This design ensures columns always have a unique programmatic name (y0, y1)
even when labels are missing or duplicated.
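The naming scheme can be illustrated in a few lines. `column_keys` is a hypothetical helper, and mapping a missing label to an empty string is an assumption, since the text does not say how pydsstools fills it. The resulting dictionary has the same shape as the `{'y0': '1999'}` mapping printed in Example 4.

```python
def column_keys(labels):
    """Hypothetical helper: pair a sequential primary name with each curve label."""
    # Assumption: a missing label becomes "" so every curve still gets a key
    return [(f"y{i}", label or "") for i, label in enumerate(labels)]


# Duplicate and missing labels still yield unique primary names
keys = column_keys(["1999", "1999", None])
print(keys)  # [('y0', '1999'), ('y1', '1999'), ('y2', '')]
print(dict(keys))
```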
Example 2 — Read Paired Data with a Window
Read a subset of rows and curves using the window parameter. The window
is a 4-tuple of 0-based, inclusive indices:
(row_start, row_end, col_start, col_end).
```python
from pydsstools.heclib.dss import HecDss

dss_file = "sample.dss"
pathname = "/PAIREDDATA/COWLITZ/FREQ-FLOW////"

with HecDss.Open(dss_file) as fid:
    # Read rows 2-5 (inclusive), all curves
    df = fid.read_pd(pathname, window=(2, 5, 0, None))
    print(df)

    # Read all rows, first curve only
    df = fid.read_pd(pathname, window=(0, None, 0, 0))
    print(df)

    # Read last 3 rows, last 2 curves (negative indices supported;
    # requires a record with at least 2 curves)
    df = fid.read_pd(pathname, window=(-3, None, -2, None))
    print(df)
```
Window indexing rules:
| Feature | Behavior |
|---|---|
| Base | 0-based |
| Bounds | Inclusive at both ends |
| `None` start | Defaults to first index (0) |
| `None` end | Defaults to last index |
| Negative indices | Wrapped Python-style (`-1` is the last index) |
| Overflow end | Clipped to last valid index |
| Invalid range | Raises an error |
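The rules in the table can be expressed as a small normalization function. This is a sketch of the documented behavior, not the pydsstools implementation: `normalize_axis` and `normalize_window` are hypothetical names, and raising `ValueError` is an assumption because the table does not name the exception type.

```python
def normalize_axis(start, end, n):
    """Hypothetical helper: resolve one (start, end) pair to 0-based inclusive indices."""
    if n <= 0:
        raise ValueError("axis has no elements")
    # None defaults: first index for start, last index for end
    start = 0 if start is None else start
    end = n - 1 if end is None else end
    # Python-style wrapping for negative indices
    if start < 0:
        start += n
    if end < 0:
        end += n
    # An end index past the table is clipped to the last valid index
    end = min(end, n - 1)
    if not 0 <= start <= end:
        raise ValueError(f"invalid range after normalization: ({start}, {end})")
    return start, end


def normalize_window(window, rows, cols):
    """Hypothetical helper: apply the rules to a (row_start, row_end, col_start, col_end) tuple."""
    r0, r1, c0, c1 = window
    return (*normalize_axis(r0, r1, rows), *normalize_axis(c0, c1, cols))


print(normalize_window((-3, None, -2, None), rows=10, cols=3))  # (7, 9, 1, 2)
```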
Example 3 — Read as PairedDataStruct (Low-Level)
For lower-level access without the pandas dependency, pass dataframe=False
to get the raw PairedDataStruct.
```python
from pydsstools.heclib.dss import HecDss

dss_file = "sample.dss"
pathname = "/PAIREDDATA/COWLITZ/FREQ-FLOW////"

with HecDss.Open(dss_file) as fid:
    pds = fid.read_pd(pathname, dataframe=False)
    print("Shape:", pds.shape)        # (rows, cols)
    print("X units:", pds.x_units)    # e.g., "ft"
    print("Y units:", pds.y_units)    # e.g., "cfs"
    print("X type:", pds.x_type)      # e.g., "linear"
    print("Y type:", pds.y_type)      # e.g., "linear"
    print("Labels:", pds.y_labels)    # e.g., ["1999"]
    x, y, labels = pds.get_data()
    # x: shape (1, rows) — single row of x-ordinates
    # y: shape (cols, rows) — each row is one curve
```
PairedDataStruct properties:
| Property | Type | Description |
|---|---|---|
| … | array (1, rows) | X-ordinate values (shared by all curves) |
| … | array (cols, rows) | Y-values; each row is one curve |
| `shape` | tuple | `(rows, cols)` |
| … | int | Number of data points per curve |
| … | int | Number of curves |
| `x_units` | str | Units of the independent axis |
| `y_units` | str | Units of the dependent axis |
| `x_type` | str | Type of x-axis (e.g., “linear”) |
| `y_type` | str | Type of y-axis (e.g., “linear”) |
| `y_labels` | list[str] | Label for each curve |
Example 4 — Query Record Info and Labels
Get metadata about a paired data record without reading the full dataset.
```python
from pydsstools.heclib.dss import HecDss

dss_file = "sample.dss"
pathname = "/PAIREDDATA/COWLITZ/FREQ-FLOW////"

with HecDss.Open(dss_file) as fid:
    # Record dimensions and metadata
    info = fid.pd_info(pathname)
    print(f"Curves: {info['curve_no']}")
    print(f"Points per curve: {info['data_no']}")
    print(f"Label size: {info['label_size']}")

    # Label mapping (primary name -> label string)
    labels = fid.read_pd_labels(pathname)
    print(labels)  # e.g., {'y0': '1999'}
```
pd_info() returns:
| Key | Type | Description |
|---|---|---|
| `curve_no` | int | Number of curves (columns) |
| `data_no` | int | Number of data points (rows) |
| … | int | DSS data type code (200 = float, 205 = double) |
| `label_size` | int | Average label size in characters |
Example 5 — Write Paired Data with PairedDataContainer
Build a PairedDataContainer, populate it with data, and write to DSS.
```python
from pydsstools.core import PairedDataContainer
from pydsstools.heclib.dss import HecDss

dss_file = "output.dss"
pathname = "/BASIN/LOCATION/STAGE-FLOW///EXAMPLE/"
rows = 5
curves = 2

with HecDss.Open(dss_file) as fid:
    pdc = PairedDataContainer(pathname, (rows, curves))
    pdc.x_data = [100, 200, 300, 400, 500]
    pdc.y_data = [
        [10, 20, 30, 40, 50],  # Curve 0
        [15, 25, 35, 45, 55],  # Curve 1
    ]
    pdc.x_units = "ft"
    pdc.x_type = "linear"
    pdc.y_units = "cfs"
    pdc.y_type = "linear"
    pdc.y_labels = ["Rating 2020", "Rating 2021"]
    fid.put_pd(pdc)
```
PairedDataContainer shape convention:
`shape = (rows, cols)` where:
- `rows` = number of data points per curve (length of `x_data`)
- `cols` = number of curves (number of sub-arrays in `y_data`)
y_data is set as a list of lists (or 2-D array) where each sub-list is one
curve. Internally this is stored as a (cols, rows) C-contiguous float32
array.
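A `(cols, rows)` C-contiguous layout means each curve occupies one contiguous run of the buffer, so value `y[c][r]` sits at flat offset `c * rows + r`. The sketch below uses the standard-library `array` module with typecode `"f"` (a C `float`, 32 bits on mainstream platforms) as a stand-in for the internal array; `pack_curves` is a hypothetical helper, not part of pydsstools.

```python
from array import array


def pack_curves(y_data, shape):
    """Hypothetical helper: flatten curves into one C-contiguous (cols, rows) float32 buffer."""
    rows, cols = shape
    if len(y_data) != cols:
        raise ValueError(f"expected {cols} curves, got {len(y_data)}")
    buf = array("f")  # "f" = C float (float32 on mainstream platforms)
    for c, curve in enumerate(y_data):
        if len(curve) != rows:
            raise ValueError(f"curve {c} has {len(curve)} values, expected {rows}")
        buf.extend(curve)  # curve c occupies buf[c * rows : (c + 1) * rows]
    return buf


rows, cols = 5, 2
buf = pack_curves([[10, 20, 30, 40, 50], [15, 25, 35, 45, 55]], (rows, cols))
print(buf[1 * rows + 2])  # curve 1, row 2 -> 35.0
```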
Example 6 — Write Paired Data from a DataFrame
Pass a pathname and a DataFrame directly to put_pd(). The DataFrame index
becomes x-data, and each column becomes a curve.
```python
import pandas as pd
from pydsstools.heclib.dss import HecDss

dss_file = "output.dss"
pathname = "/BASIN/LOCATION/STAGE-FLOW///FROM-DF/"

df = pd.DataFrame(
    {
        "Rating 2020": [10, 20, 30, 40, 50],
        "Rating 2021": [15, 25, 35, 45, 55],
    },
    index=[100, 200, 300, 400, 500],
)

with HecDss.Open(dss_file) as fid:
    fid.put_pd(
        pathname,
        y_data=df,
        x_units="ft",
        x_type="linear",
        y_units="cfs",
        y_type="linear",
    )
```
How the DataFrame is mapped:
| DataFrame Part | DSS Field |
|---|---|
| Index | x-ordinates |
| Each column | One curve of y-values |
| Column names | Curve labels |
| … | Internal |
If the DataFrame has a MultiIndex on columns with a level named "labels",
those labels are used instead of the column names.
Example 7 — Preallocate and Fill Curves Individually
For large datasets or incremental writes, preallocate an empty record and fill it one curve at a time.
Step 1 — Preallocate
```python
from pydsstools.heclib.dss import HecDss

dss_file = "output.dss"
pathname = "/PAIRED/RESERVOIR/ELEV-AREA///PREALLOC/"
rows = 100
cols = 5

with HecDss.Open(dss_file) as fid:
    fid.preallocate_pd(
        pathname,
        shape=(rows, cols),
        x_units="ft",
        y_units="acres",
        label_size=31,  # characters reserved per label
    )
```
label_size controls how many characters are allocated for each curve label.
This determines the maximum label length; labels longer than this are
truncated. The minimum is 12 characters.
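The truncation rule can be sketched as a one-line slice. `fit_label` and `MIN_LABEL_SIZE` are hypothetical names, and treating a `label_size` below 12 as 12 (rather than rejecting it) is an assumption; the text states only the minimum.

```python
MIN_LABEL_SIZE = 12  # minimum stated in the text


def fit_label(label, label_size):
    """Hypothetical helper: trim a curve label to the space reserved by label_size."""
    # Assumption: a label_size below the minimum is raised to 12 rather than rejected
    size = max(label_size, MIN_LABEL_SIZE)
    return label[:size]


print(fit_label("Pool Area", 31))  # short labels are unchanged
print(fit_label("Spillway crest elevation-area 2021 revision", 31))
```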
Step 2 — Write individual curves
```python
# dss_file, pathname, and rows as defined in Step 1
with HecDss.Open(dss_file) as fid:
    # Write full curve to column 0
    y_values = [i * 2.0 for i in range(rows)]
    fid.put_pd(pathname, col_index=0, y_data=y_values, y_labels=["Pool Area"])

    # Write full curve to column 1, keeping default label
    y_values = [i * 3.0 for i in range(rows)]
    fid.put_pd(pathname, col_index=1, y_data=y_values)

    # Write partial curve to column 2 (rows 10-19 only)
    partial = [99.0] * 10
    fid.put_pd(pathname, col_index=2, y_data=partial, window=(10, 19))
```
Key points about preallocated records:
- `col_index` is 0-based in the Python API (converted to 1-based internally).
- `window` for partial writes is `(row_start, row_end)`, 0-based, inclusive.
- If `y_labels` is omitted or empty, the existing label is preserved.
- If `y_labels` is provided, the label is updated (truncated to `label_size`).
- The number of values in `y_data` must not exceed the available row range.
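These constraints can be checked up front before a write. The sketch below is a hypothetical pre-flight helper (`plan_curve_write` is not a pydsstools function); the `IndexError`/`ValueError` choices are assumptions, and the 1-based column number it returns only mirrors the internal conversion described in the list.

```python
def plan_curve_write(shape, col_index, n_values, window=None):
    """Hypothetical helper: validate a single-curve write to a preallocated (rows, cols) record.

    Returns the 1-based column number and the 0-based inclusive row range.
    """
    rows, cols = shape
    if not 0 <= col_index < cols:
        raise IndexError(f"col_index {col_index} outside 0..{cols - 1}")
    # Without a window, the whole column is the available range
    row_start, row_end = window if window is not None else (0, rows - 1)
    if not 0 <= row_start <= row_end < rows:
        raise IndexError(f"window ({row_start}, {row_end}) outside 0..{rows - 1}")
    available = row_end - row_start + 1
    if n_values > available:
        raise ValueError(f"{n_values} values do not fit in {available} rows")
    return col_index + 1, (row_start, row_end)


print(plan_curve_write((100, 5), 2, 10, window=(10, 19)))  # (3, (10, 19))
```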
Example 8 — Round-Trip: Read, Modify, Write Back
Read an existing record, modify it, and write it back under a new pathname.
```python
from pydsstools.heclib.dss import HecDss

dss_file = "sample.dss"
pathname_in = "/PAIREDDATA/COWLITZ/FREQ-FLOW////"
pathname_out = "/PAIREDDATA/COWLITZ/FREQ-FLOW///MODIFIED/"

with HecDss.Open(dss_file) as fid:
    # Read
    df = fid.read_pd(pathname_in)

    # Modify: scale all y-values by 1.1 (index and column MultiIndex are kept)
    df = df * 1.1

    # Write back under a new F-part
    fid.put_pd(
        pathname_out,
        y_data=df,
        x_units="",
        x_type="linear",
        y_units="cfs",
        y_type="linear",
    )
```
When writing back a DataFrame that was originally read with read_pd(), the
MultiIndex column structure (with the "labels" level) is automatically
picked up by put_pd(), preserving the original curve labels.
DSS Pathname Structure for Paired Data
`/A-Part/B-Part/C-Part/D-Part/E-Part/F-Part/`

| Part | Typical Use | Example |
|---|---|---|
| A | Collection or project | `PAIREDDATA` |
| B | Location | `COWLITZ` |
| C | Parameter pair | `FREQ-FLOW` |
| D | Date (often empty for paired data) | (empty) |
| E | Time interval (often empty for paired data) | (empty) |
| F | Version or variant | `MODIFIED` |
For paired data, D-part and E-part are typically empty since the data represents a relationship rather than a time series.
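Because every part is separated by a slash, a pathname with empty D- and E-parts simply contains consecutive slashes. The helpers below are a hypothetical sketch (`build_pathname` and `split_pathname` are not pydsstools functions) that reproduces the pathnames used in the examples above.

```python
def build_pathname(a="", b="", c="", d="", e="", f=""):
    """Hypothetical helper: join the six DSS parts into /A/B/C/D/E/F/ form."""
    parts = (a, b, c, d, e, f)
    if any("/" in p for p in parts):
        raise ValueError("pathname parts must not contain '/'")
    return "/" + "/".join(parts) + "/"


def split_pathname(pathname):
    """Hypothetical helper: split /A/B/C/D/E/F/ into six parts (blank parts become '')."""
    if not (pathname.startswith("/") and pathname.endswith("/")):
        raise ValueError("pathname must start and end with '/'")
    parts = pathname[1:-1].split("/")
    if len(parts) != 6:
        raise ValueError(f"expected 6 parts, got {len(parts)}")
    return parts


path = build_pathname("PAIREDDATA", "COWLITZ", "FREQ-FLOW", f="MODIFIED")
print(path)  # /PAIREDDATA/COWLITZ/FREQ-FLOW///MODIFIED/
print(split_pathname(path))
```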
API Summary
Reading
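| Method | Returns | Description |
|---|---|---|
| `read_pd(pathname)` | `DataFrame` | Read all data as pandas DataFrame. |
| `read_pd(pathname, window=...)` | `DataFrame` | Read a sub-region. |
| `read_pd(pathname, dataframe=False)` | `PairedDataStruct` | Read as low-level structure. |
| `read_pd_labels(pathname)` | `dict` | Get the mapping of primary names to curve labels. |
| `pd_info(pathname)` | `dict` | Get record dimensions and metadata. |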
Writing
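| Method | Description |
|---|---|
| `put_pd(pdc)` | Write a `PairedDataContainer`. |
| `put_pd(pathname, y_data=df, ...)` | Write from a DataFrame. |
| `put_pd(pathname, col_index=..., y_data=..., ...)` | Write one curve to a preallocated record. |
| `preallocate_pd(pathname, shape=..., ...)` | Create an empty record for incremental filling. |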
PairedDataContainer Construction
```python
from pydsstools.core import PairedDataContainer

pdc = PairedDataContainer(pathname, (rows, curves))
pdc.x_data = [...]           # list, tuple, or ndarray of length rows
pdc.y_data = [[...], [...]]  # list of curves, each of length rows
pdc.x_units = "ft"           # independent axis units
pdc.x_type = "linear"        # independent axis type
pdc.y_units = "cfs"          # dependent axis units
pdc.y_type = "linear"        # dependent axis type
pdc.y_labels = ["A", "B"]    # one label per curve (optional)
```
Accepted input types for x_data: list, tuple, numpy.ndarray, or
array.array. Internally converted to float32.
Accepted input types for y_data: list of lists, 2-D numpy.ndarray, or
tuple of lists. A 1-D input is reshaped to a single curve. Internally
converted to a (cols, rows) float32 array.
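The `y_data` coercion rules can be approximated with the standard library. This is a hypothetical sketch (`coerce_y_data` is not part of pydsstools): it handles lists and tuples only, not `numpy.ndarray`, and uses `array("f")` as a float32 stand-in, which also shows the precision loss that float32 storage implies.

```python
from array import array


def coerce_y_data(y_data):
    """Hypothetical helper: normalize y_data to a list of float32 curves."""
    items = list(y_data)  # accepts a list or tuple of curves
    # A flat sequence of numbers is treated as a single curve
    if items and all(isinstance(v, (int, float)) for v in items):
        items = [items]
    curves = [array("f", curve) for curve in items]  # "f" = C float (float32)
    if len({len(c) for c in curves}) > 1:
        raise ValueError("all curves must have the same number of values")
    return curves


single = coerce_y_data([10, 20, 30])                # 1-D input -> one curve
pair = coerce_y_data(([10, 20, 30], [15, 25, 35]))  # tuple of lists -> two curves
print(len(single), len(pair))  # 1 2
# float32 storage rounds values that are not exactly representable
print(coerce_y_data([0.1])[0][0])
```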