Paired Data

This guide covers reading, writing, and managing paired data records in HEC-DSS files using pydsstools.

Paired data records store tabular relationships between an independent variable (x-axis) and one or more dependent variables (y-axis curves). Common uses include stage-discharge rating curves, frequency-flow relationships, elevation-area-volume tables, and damage-stage functions.

Key Concepts

Concept	Description
Paired Data	A table of x-values mapped to one or more y-value curves. DSS record type 200 (float) or 205 (double).
PairedDataContainer	Write-side container. Holds pathname, shape, x/y data, units, types, and labels before writing to DSS.
PairedDataStruct	Read-side structure returned by the Cython layer. Wraps the C `zStructPairedData` and exposes x/y arrays.
Curve	A single column of y-values sharing the same x-ordinates. A paired data record can hold many curves.
Label	An optional name for each curve (e.g., “1999”, “Stage”, “Flow”).
Shape	`(rows, cols)` — rows is the number of data points per curve, cols is the number of curves.
Window	An index tuple for reading or writing a sub-region of the table.
Preallocated Record	A paired data record with space reserved for curves to be filled in later, one at a time.

Data Layout

              Curve 0       Curve 1       Curve 2
x[0]          y[0,0]        y[0,1]        y[0,2]
x[1]          y[1,0]        y[1,1]        y[1,2]
x[2]          y[2,0]        y[2,1]        y[2,2]
...           ...           ...           ...
x[rows-1]     y[rows-1,0]   y[rows-1,1]   y[rows-1,2]

All curves share the same x-ordinates.
Internally, curves are stored as rows in the C array (each row = one curve). The Python API transposes this so that each curve becomes a column in a DataFrame.
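A minimal NumPy sketch of that transposition, using made-up values. The internal array has one row per curve, and its transpose has one column per curve, which matches the DataFrame layout. These arrays are illustrative stand-ins, not data read through pydsstools.

```python
import numpy as np

# Illustrative x-ordinates shared by all curves (rows = 4)
x = np.array([0.95, 0.80, 0.60, 0.50], dtype=np.float32)

# Internal layout: shape (cols, rows), one row per curve (cols = 2)
y_internal = np.array(
    [
        [30.0, 40.0, 54.0, 60.0],  # curve 0
        [35.0, 45.0, 59.0, 65.0],  # curve 1
    ],
    dtype=np.float32,
)

# Python-facing layout: shape (rows, cols), one column per curve
y_table = y_internal.T

for xi, row in zip(x, y_table):
    print(f"{xi:.2f}", row.tolist())
```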

Example 1 — Read Paired Data as DataFrame

This is the most common use case: read_pd() returns a pandas.DataFrame with x-values as the index, one column per curve, and a two-level column index (primary name and label).

from pydsstools.heclib.dss import HecDss

dss_file = "sample.dss"
pathname = "/PAIREDDATA/COWLITZ/FREQ-FLOW////"

with HecDss.Open(dss_file) as fid:
    df = fid.read_pd(pathname)
    print(df)
    print("Index (x-values):", df.index.tolist())
    print("Column labels:", df.columns.get_level_values("labels").tolist())

What the DataFrame looks like:

                 y0
labels         1999
x_data
0.950000       30.0
0.800000       40.0
0.600000       54.0
0.500000       60.0
...

Column index structure:

The DataFrame has a MultiIndex on columns with two levels:

primary — sequential identifier (y0, y1, y2, …)
labels — the curve label string from the DSS record

This design ensures columns always have a unique programmatic name (y0, y1) even when labels are missing or duplicated.
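To illustrate the column structure, the pandas sketch below builds a DataFrame shaped like the one described above from plain arrays. The helper name `make_paired_frame` is hypothetical and the values are made up. read_pd() builds its result internally, and this sketch only mirrors the documented layout: an index named x_data and column levels named primary and labels.

```python
import numpy as np
import pandas as pd


def make_paired_frame(x, y, labels):
    """Hypothetical helper: x has length rows, y has shape (cols, rows)."""
    y = np.asarray(y)
    primary = [f"y{i}" for i in range(y.shape[0])]
    columns = pd.MultiIndex.from_arrays([primary, labels], names=["primary", "labels"])
    index = pd.Index(x, name="x_data")
    # Transpose so that each curve becomes a column
    return pd.DataFrame(y.T, index=index, columns=columns)


df = make_paired_frame(
    x=[0.95, 0.80, 0.60, 0.50],
    y=[[30.0, 40.0, 54.0, 60.0], [31.0, 41.0, 55.0, 61.0]],
    labels=["1999", ""],  # a missing label still gets a unique primary name
)

print(df["y1"])  # select a curve by its primary name
print(df.columns.get_level_values("labels").tolist())
```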

Example 2 — Read Paired Data with a Window

Read a subset of rows and curves using the window parameter. The window is a 4-tuple of 0-based, inclusive indices: (row_start, row_end, col_start, col_end).

with HecDss.Open(dss_file) as fid:
    # Read rows 2-5 (inclusive), all curves
    df = fid.read_pd(pathname, window=(2, 5, 0, None))
    print(df)

    # Read all rows, first curve only
    df = fid.read_pd(pathname, window=(0, None, 0, 0))
    print(df)

    # Read last 3 rows, last 2 curves (negative indices supported)
    df = fid.read_pd(pathname, window=(-3, None, -2, None))
    print(df)

Window indexing rules:

Feature	Behavior
Base	0-based
Bounds	Inclusive at both ends
`None` for start	Defaults to first index (0)
`None` for end	Defaults to last index
Negative indices	Wrapped Python-style (`-1` = last)
Overflow end	Clipped to last valid index
Invalid range	Raises `IndexError`
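The rules in the table can be expressed as a small pure-Python resolver. This is an illustrative sketch of the documented behavior, not pydsstools code. The function name `resolve_window` is hypothetical, and details the table does not spell out (for example, exactly which ranges count as invalid) are assumptions noted in the comments.

```python
def _resolve_axis(start, end, n):
    """Resolve one (start, end) pair to inclusive 0-based indices for length n."""
    start = 0 if start is None else start
    end = n - 1 if end is None else end
    # Negative indices wrap Python-style (-1 is the last index)
    if start < 0:
        start += n
    if end < 0:
        end += n
    # An end past the last index is clipped
    end = min(end, n - 1)
    # Assumption: a start outside [0, n-1] or an end before start is invalid
    if not 0 <= start < n or end < start:
        raise IndexError(f"invalid range ({start}, {end}) for length {n}")
    return start, end


def resolve_window(window, rows, cols):
    """Hypothetical helper: resolve (row_start, row_end, col_start, col_end)."""
    row_start, row_end, col_start, col_end = window
    return _resolve_axis(row_start, row_end, rows), _resolve_axis(col_start, col_end, cols)


print(resolve_window((2, 5, 0, None), rows=10, cols=3))       # ((2, 5), (0, 2))
print(resolve_window((-3, None, -2, None), rows=10, cols=3))  # ((7, 9), (1, 2))
```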

Example 3 — Read as PairedDataStruct (Low-Level)

For lower-level access without the pandas dependency, pass dataframe=False to get the raw PairedDataStruct.

with HecDss.Open(dss_file) as fid:
    pds = fid.read_pd(pathname, dataframe=False)

    print("Shape:", pds.shape)          # (rows, cols)
    print("X units:", pds.x_units)      # e.g., "ft"
    print("Y units:", pds.y_units)      # e.g., "cfs"
    print("X type:", pds.x_type)        # e.g., "linear"
    print("Y type:", pds.y_type)        # e.g., "linear"
    print("Labels:", pds.y_labels)      # e.g., ["1999"]

    x, y, labels = pds.get_data()
    # x: shape (1, rows) — single row of x-ordinates
    # y: shape (cols, rows) — each row is one curve
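Because x comes back as a single row and y holds one curve per row, pairing each curve with its label takes one indexing step and a loop. In this sketch, plain NumPy arrays with the documented shapes stand in for the values get_data() returns; the numbers are illustrative.

```python
import numpy as np

# Stand-ins with the documented shapes: x is (1, rows), y is (cols, rows)
x = np.array([[100.0, 200.0, 300.0]], dtype=np.float32)
y = np.array([[10.0, 20.0, 30.0], [15.0, 25.0, 35.0]], dtype=np.float32)
labels = ["Rating 2020", "Rating 2021"]

x_ords = x[0]  # flatten the single row of x-ordinates to shape (rows,)
curves = {}
for label, curve in zip(labels, y):
    # Each row of y is one curve, aligned point-by-point with x_ords
    curves[label] = list(zip(x_ords.tolist(), curve.tolist()))

print(curves["Rating 2021"])  # [(100.0, 15.0), (200.0, 25.0), (300.0, 35.0)]
```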

PairedDataStruct properties:

Property	Type	Description
`x_data`	array (1, rows)	X-ordinate values (shared by all curves)
`y_data`	array (cols, rows)	Y-values; each row is one curve
`shape`	tuple	(rows, cols)
`rows`	int	Number of data points per curve
`cols`	int	Number of curves
`x_units`	str	Units of the independent axis
`y_units`	str	Units of the dependent axis
`x_type`	str	Type of x-axis (e.g., “linear”)
`y_type`	str	Type of y-axis (e.g., “linear”)
`y_labels`	list[str]	Label for each curve

Example 4 — Query Record Info and Labels

Get metadata about a paired data record without reading the full dataset.

with HecDss.Open(dss_file) as fid:
    # Record dimensions and metadata
    info = fid.pd_info(pathname)
    print(f"Curves: {info['curve_no']}")
    print(f"Points per curve: {info['data_no']}")
    print(f"Label size: {info['label_size']}")

    # Label mapping (primary name -> label string)
    labels = fid.read_pd_labels(pathname)
    print(labels)  # e.g., {'y0': '1999'}

pd_info() returns:

Key	Type	Description
`curve_no`	int	Number of curves (columns)
`data_no`	int	Number of data points (rows)
`dtype`	int	DSS data type code (200 = float, 205 = double)
`label_size`	int	Average label size in characters
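The pd_info() fields map directly onto the (rows, cols) shape convention used elsewhere in this guide. A minimal sketch, assuming a dictionary with the keys listed above; the helper name `describe_record` and the sample values are hypothetical.

```python
# DSS paired data type codes, as listed in the table above
DTYPE_NAMES = {200: "float", 205: "double"}


def describe_record(info):
    """Hypothetical helper: summarize a pd_info()-style dictionary."""
    shape = (info["data_no"], info["curve_no"])  # (rows, cols)
    dtype = DTYPE_NAMES.get(info["dtype"], f"unknown ({info['dtype']})")
    return shape, dtype


# Illustrative values only, not read from a real file
info = {"curve_no": 2, "data_no": 5, "dtype": 200, "label_size": 12}
shape, dtype = describe_record(info)
print(f"shape={shape}, values stored as {dtype}")  # shape=(5, 2), values stored as float
```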

Example 5 — Write Paired Data with PairedDataContainer

Build a PairedDataContainer, populate it with data, and write to DSS.

from pydsstools.core import PairedDataContainer
from pydsstools.heclib.dss import HecDss

dss_file = "output.dss"
pathname = "/BASIN/LOCATION/STAGE-FLOW///EXAMPLE/"
rows = 5
curves = 2

with HecDss.Open(dss_file) as fid:
    pdc = PairedDataContainer(pathname, (rows, curves))
    pdc.x_data = [100, 200, 300, 400, 500]
    pdc.y_data = [
        [10, 20, 30, 40, 50],       # Curve 0
        [15, 25, 35, 45, 55],       # Curve 1
    ]
    pdc.x_units = "ft"
    pdc.x_type = "linear"
    pdc.y_units = "cfs"
    pdc.y_type = "linear"
    pdc.y_labels = ["Rating 2020", "Rating 2021"]
    fid.put_pd(pdc)

PairedDataContainer shape convention:

shape = (rows, cols) where:

rows = number of data points per curve (length of x_data)
cols = number of curves (number of sub-arrays in y_data)

y_data is set as a list of lists (or 2-D array) where each sub-list is one curve. Internally this is stored as a (cols, rows) C-contiguous float32 array.
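Before assigning data, it can help to check that the inputs agree with the declared shape. The following pure-Python check is a sketch of the convention above, not part of pydsstools; the function name `check_paired_shape` is hypothetical.

```python
def check_paired_shape(shape, x_data, y_data):
    """Hypothetical check that x_data and y_data match shape = (rows, cols)."""
    rows, cols = shape
    if len(x_data) != rows:
        raise ValueError(f"x_data has {len(x_data)} values, expected rows={rows}")
    if len(y_data) != cols:
        raise ValueError(f"y_data has {len(y_data)} curves, expected cols={cols}")
    for i, curve in enumerate(y_data):
        if len(curve) != rows:
            raise ValueError(f"curve {i} has {len(curve)} values, expected rows={rows}")


# Matches Example 5: 5 points per curve, 2 curves
check_paired_shape(
    (5, 2),
    [100, 200, 300, 400, 500],
    [[10, 20, 30, 40, 50], [15, 25, 35, 45, 55]],
)
```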

Example 6 — Write Paired Data from a DataFrame

Pass a pathname and a DataFrame directly to put_pd(). The DataFrame index becomes x-data, and each column becomes a curve.

import pandas as pd
from pydsstools.heclib.dss import HecDss

dss_file = "output.dss"
pathname = "/BASIN/LOCATION/STAGE-FLOW///FROM-DF/"

df = pd.DataFrame(
    {
        "Rating 2020": [10, 20, 30, 40, 50],
        "Rating 2021": [15, 25, 35, 45, 55],
    },
    index=[100, 200, 300, 400, 500],
)

with HecDss.Open(dss_file) as fid:
    fid.put_pd(
        pathname,
        y_data=df,
        x_units="ft",
        x_type="linear",
        y_units="cfs",
        y_type="linear",
    )

How the DataFrame is mapped:

DataFrame Part	DSS Field
`df.index`	x-ordinates
Each column	One curve of y-values
Column names	Curve labels
`df.values.T`	Internal `(cols, rows)` y-data array

If the DataFrame has a MultiIndex on columns with a level named "labels", those labels are used instead of the column names.
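The mapping in the table can be sketched with pandas. The helper `split_frame` below is hypothetical and only illustrates the documented rules: the index supplies x, the transposed values supply the `(cols, rows)` array, and labels come from the "labels" column level when it exists, otherwise from the column names.

```python
import numpy as np
import pandas as pd


def split_frame(df):
    """Hypothetical helper: return (x, y, labels) following the mapping table."""
    x = df.index.to_numpy(dtype=np.float32)
    # df.values has shape (rows, cols); its transpose is (cols, rows)
    y = np.ascontiguousarray(df.to_numpy(dtype=np.float32).T)
    if isinstance(df.columns, pd.MultiIndex) and "labels" in df.columns.names:
        labels = [str(v) for v in df.columns.get_level_values("labels")]
    else:
        labels = [str(c) for c in df.columns]
    return x, y, labels


df = pd.DataFrame(
    {"Rating 2020": [10, 20, 30], "Rating 2021": [15, 25, 35]},
    index=[100, 200, 300],
)
x, y, labels = split_frame(df)
print(y.shape, labels)  # (2, 3) ['Rating 2020', 'Rating 2021']
```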

Example 7 — Preallocate and Fill Curves Individually

For large datasets or incremental writes, preallocate an empty record and fill it one curve at a time.

Step 1 — Preallocate

from pydsstools.heclib.dss import HecDss

dss_file = "output.dss"
pathname = "/PAIRED/RESERVOIR/ELEV-AREA///PREALLOC/"
rows = 100
cols = 5

with HecDss.Open(dss_file) as fid:
    fid.preallocate_pd(
        pathname,
        shape=(rows, cols),
        x_units="ft",
        y_units="acres",
        label_size=31,        # characters reserved per label
    )

label_size controls how many characters are allocated for each curve label. This determines the maximum label length; labels longer than this are truncated. The minimum is 12 characters.
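The truncation rule can be sketched in a few lines. This is illustrative only: the helper name `fit_label` is hypothetical, and how pydsstools treats a requested label_size below the minimum (raising it to 12 or rejecting it) is an assumption here.

```python
MIN_LABEL_SIZE = 12  # documented minimum


def fit_label(label, label_size):
    """Hypothetical helper: truncate a label to the reserved width."""
    # Assumption: a label_size below the minimum is raised to the minimum
    size = max(label_size, MIN_LABEL_SIZE)
    return label[:size]


print(fit_label("Pool Area", 31))                            # Pool Area
print(fit_label("Pool Area at Normal Operating Level", 12))  # Pool Area at
```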

Step 2 — Write individual curves

with HecDss.Open(dss_file) as fid:
    # Write full curve to column 0
    y_values = [i * 2.0 for i in range(rows)]
    fid.put_pd(pathname, col_index=0, y_data=y_values, y_labels=["Pool Area"])

    # Write full curve to column 1, keeping default label
    y_values = [i * 3.0 for i in range(rows)]
    fid.put_pd(pathname, col_index=1, y_data=y_values)

    # Write partial curve to column 2 (rows 10-19 only)
    partial = [99.0] * 10
    fid.put_pd(pathname, col_index=2, y_data=partial, window=(10, 19))

Key points about preallocated records:

col_index is 0-based in the Python API (converted to 1-based internally).
window for partial writes is (row_start, row_end), 0-based, inclusive.
If y_labels is omitted or empty, the existing label is preserved.
If y_labels is provided, the label is updated (truncated to label_size).
The number of values in y_data must not exceed the available row range.
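The rules above can be collected into a pre-write check. This sketch is hypothetical and independent of pydsstools: `check_curve_write` is an illustrative name, and it only encodes the listed points (0-based col_index converted to 1-based, an inclusive 0-based window, and y_data not exceeding the row range).

```python
def check_curve_write(shape, col_index, y_data, window=None):
    """Hypothetical check for one curve write into a preallocated record.

    Returns the 1-based curve number and the inclusive (row_start, row_end).
    """
    rows, cols = shape
    if not 0 <= col_index < cols:
        raise IndexError(f"col_index {col_index} out of range for {cols} curves")
    row_start, row_end = (0, rows - 1) if window is None else window
    if not 0 <= row_start <= row_end < rows:
        raise IndexError(f"window ({row_start}, {row_end}) outside 0..{rows - 1}")
    available = row_end - row_start + 1  # inclusive at both ends
    if len(y_data) > available:
        raise ValueError(f"{len(y_data)} values exceed the {available} available rows")
    return col_index + 1, (row_start, row_end)


# Mirrors the partial write in Step 2: 10 values into rows 10-19 of curve 2
print(check_curve_write((100, 5), 2, [99.0] * 10, window=(10, 19)))  # (3, (10, 19))
```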

Example 8 — Round-Trip: Read, Modify, Write Back

Read an existing record, modify it, and write it back under a new pathname.

import numpy as np
from pydsstools.heclib.dss import HecDss

dss_file = "sample.dss"
pathname_in  = "/PAIREDDATA/COWLITZ/FREQ-FLOW////"
pathname_out = "/PAIREDDATA/COWLITZ/FREQ-FLOW///MODIFIED/"

with HecDss.Open(dss_file) as fid:
    # Read
    df = fid.read_pd(pathname_in)

    # Modify — scale all y-values by 1.1
    df.iloc[:, :] = df.values * 1.1

    # Write back under a new F-part
    fid.put_pd(
        pathname_out,
        y_data=df,
        x_units="",
        x_type="linear",
        y_units="cfs",
        y_type="linear",
    )

When writing back a DataFrame that was originally read with read_pd(), the MultiIndex column structure (with the "labels" level) is automatically picked up by put_pd(), preserving the original curve labels.
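Arithmetic on a DataFrame with MultiIndex columns leaves the column index intact, which is why the labels survive the modification step. A small pandas check with made-up values and no DSS file involved:

```python
import pandas as pd

# Illustrative frame with the same column structure that read_pd() returns
columns = pd.MultiIndex.from_arrays([["y0"], ["1999"]], names=["primary", "labels"])
df = pd.DataFrame(
    [[30.0], [40.0]],
    index=pd.Index([0.95, 0.80], name="x_data"),
    columns=columns,
)

scaled = df * 1.1  # element-wise; index and columns are carried over unchanged

print(scaled.columns.get_level_values("labels").tolist())  # ['1999']
print(scaled.columns.equals(df.columns))  # True
```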

DSS Pathname Structure for Paired Data

/A-Part/B-Part/C-Part/D-Part/E-Part/F-Part/

Part	Typical Use	Example
A	Collection or project	`PAIREDDATA`, `BASIN`
B	Location	`COWLITZ`, `RESERVOIR`
C	Parameter pair	`STAGE-FLOW`, `FREQ-FLOW`, `ELEV-AREA`
D	Block start date in time-series records (usually empty for paired data)	(empty)
E	Time interval in time-series records (usually empty for paired data)	(empty)
F	Version or variant	`1999`, `MODIFIED`, `PREALLOC`

For paired data, D-part and E-part are typically empty since the data represents a relationship rather than a time series.
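Splitting a pathname into its six parts is a simple string operation. The helper `split_pathname` below is a hypothetical illustration, assuming a well-formed pathname that begins and ends with a slash.

```python
PARTS = ("A", "B", "C", "D", "E", "F")


def split_pathname(pathname):
    """Hypothetical helper: map a DSS pathname to its A- through F-parts."""
    pieces = pathname.split("/")
    # A well-formed pathname yields ['', A, B, C, D, E, F, '']
    if len(pieces) != 8 or pieces[0] or pieces[-1]:
        raise ValueError(f"not a six-part pathname: {pathname!r}")
    return dict(zip(PARTS, pieces[1:7]))


parts = split_pathname("/PAIREDDATA/COWLITZ/FREQ-FLOW////")
print(parts)
print(parts["D"] == "" and parts["E"] == "")  # True: D- and E-parts are empty
```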

API Summary

Reading

Method	Returns	Description
`fid.read_pd(pathname)`	`DataFrame`	Read all data as pandas DataFrame.
`fid.read_pd(pathname, window=(...))`	`DataFrame`	Read a sub-region.
`fid.read_pd(pathname, dataframe=False)`	`PairedDataStruct`	Read as low-level structure.
`fid.read_pd_labels(pathname)`	`dict`	Get `{primary: label}` mapping.
`fid.pd_info(pathname)`	`dict`	Get record dimensions and metadata.

Writing

Method	Description
`fid.put_pd(pdc)`	Write a `PairedDataContainer` object.
`fid.put_pd(pathname, y_data=df, ...)`	Write from a DataFrame.
`fid.put_pd(pathname, col_index=i, y_data=[...])`	Write one curve to a preallocated record.
`fid.preallocate_pd(pathname, shape=(...))`	Create an empty record for incremental filling.

PairedDataContainer Construction

from pydsstools.core import PairedDataContainer

pdc = PairedDataContainer(pathname, (rows, curves))
pdc.x_data = [...]            # list, tuple, or ndarray of length rows
pdc.y_data = [[...], [...]]   # list of curves, each of length rows
pdc.x_units = "ft"            # independent axis units
pdc.x_type = "linear"         # independent axis type
pdc.y_units = "cfs"           # dependent axis units
pdc.y_type = "linear"         # dependent axis type
pdc.y_labels = ["A", "B"]     # one label per curve (optional)

Accepted input types for x_data: list, tuple, numpy.ndarray, or array.array. Internally converted to float32.

Accepted input types for y_data: list of lists, 2-D numpy.ndarray, or tuple of lists. A 1-D input is reshaped to a single curve. Internally converted to a (cols, rows) float32 array.
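A NumPy sketch of the conversions described above, written as a standalone function for illustration. The name `normalize_paired_inputs` is hypothetical; the real container performs its own conversion internally.

```python
import array

import numpy as np


def normalize_paired_inputs(x_data, y_data):
    """Hypothetical sketch: return float32 x of shape (rows,) and y of shape (cols, rows)."""
    # list, tuple, ndarray and array.array all convert through np.asarray
    x = np.asarray(x_data, dtype=np.float32).ravel()
    y = np.asarray(y_data, dtype=np.float32)
    if y.ndim == 1:
        y = y.reshape(1, -1)  # a 1-D input becomes a single curve
    y = np.ascontiguousarray(y)  # C-contiguous, one row per curve
    if y.ndim != 2 or y.shape[1] != x.size:
        raise ValueError(f"y_data shape {y.shape} does not match {x.size} x-values")
    return x, y


x, y = normalize_paired_inputs(array.array("d", [1.0, 2.0, 3.0]), [10, 20, 30])
print(x.dtype, y.shape)  # float32 (1, 3)
```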