Paired Data

This guide covers reading, writing, and managing paired data records in HEC-DSS files using pydsstools.

Paired data records store tabular relationships between an independent variable (x-axis) and one or more dependent variables (y-axis curves). Common uses include stage-discharge rating curves, frequency-flow relationships, elevation-area-volume tables, and damage-stage functions.

Key Concepts

Concept               Description
Paired Data           A table of x-values mapped to one or more y-value curves. DSS record type 200 (float) or 205 (double).
PairedDataContainer   Write-side container. Holds pathname, shape, x/y data, units, types, and labels before writing to DSS.
PairedDataStruct      Read-side structure returned by the Cython layer. Wraps the C zStructPairedData and exposes x/y arrays.
Curve                 A single column of y-values sharing the same x-ordinates. A paired data record can hold many curves.
Label                 An optional name for each curve (e.g., “1999”, “Stage”, “Flow”).
Shape                 (rows, cols): rows is the number of data points per curve, cols is the number of curves.
Window                An index tuple for reading or writing a sub-region of the table.
Preallocated Record   A paired data record with space reserved for curves to be filled in later, one at a time.

Data Layout

              Curve 0    Curve 1    Curve 2
x[0]          y[0,0]     y[0,1]     y[0,2]
x[1]          y[1,0]     y[1,1]     y[1,2]
x[2]          y[2,0]     y[2,1]     y[2,2]
 ...           ...        ...        ...
x[rows-1]     y[r-1,0]   y[r-1,1]   y[r-1,2]

  • All curves share the same x-ordinates.

  • Internally, curves are stored as rows in the C array (each row = one curve). The Python API transposes this so that each curve becomes a column in a DataFrame.
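The two layouts described above can be illustrated without pydsstools. This is a minimal sketch in plain Python; the variable names are illustrative and the code mirrors the described layout rather than the library's internal implementation.

```python
# Curve-major storage: one inner list per curve, shape (cols, rows).
x = [100.0, 200.0, 300.0]
y_curve_major = [
    [10.0, 20.0, 30.0],   # Curve 0
    [15.0, 25.0, 35.0],   # Curve 1
]

# Transpose to the table view: one inner list per x-ordinate, shape (rows, cols).
y_table = [list(row) for row in zip(*y_curve_major)]

for xi, row in zip(x, y_table):
    print(xi, row)
```

Each printed line corresponds to one row of the layout diagram: an x-ordinate followed by the y-value of every curve at that ordinate.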


Example 1 — Read Paired Data as DataFrame

This is the most common use case. read_pd() returns a pandas.DataFrame with the x-values as the index and one column per curve; the columns carry a two-level index (primary name and label).

from pydsstools.heclib.dss import HecDss

dss_file = "sample.dss"
pathname = "/PAIREDDATA/COWLITZ/FREQ-FLOW////"

with HecDss.Open(dss_file) as fid:
    df = fid.read_pd(pathname)
    print(df)
    print("Index (x-values):", df.index.tolist())
    print("Column labels:", df.columns.get_level_values("labels").tolist())

What the DataFrame looks like:

                 y0
labels         1999
x_data
0.950000       30.0
0.800000       40.0
0.600000       54.0
0.500000       60.0
...

Column index structure:

The DataFrame has a MultiIndex on columns with two levels:

  • primary — sequential identifier (y0, y1, y2, …)

  • labels — the curve label string from the DSS record

This design ensures columns always have a unique programmatic name (y0, y1) even when labels are missing or duplicated.
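The naming scheme can be sketched in a few lines of plain Python. The helper name column_keys is hypothetical, and representing a missing label as an empty string is an assumption; the source does not say how pydsstools stores missing labels.

```python
def column_keys(labels):
    """Pair each label with a sequential primary name (y0, y1, ...)."""
    # Assumption: a missing label (None) is represented as an empty string.
    return [(f"y{i}", label or "") for i, label in enumerate(labels)]


# Two duplicated labels and one missing label.
keys = column_keys(["1999", "1999", None])
print(keys)

# The primary names stay unique regardless of what the labels contain.
primaries = [primary for primary, _ in keys]
print(len(set(primaries)) == len(primaries))
```

Because the primary level is derived from the curve position alone, selecting a column by its primary name is unambiguous even when two curves share a label.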


Example 2 — Read Paired Data with a Window

Read a subset of rows and curves using the window parameter. The window is a 4-tuple of 0-based, inclusive indices: (row_start, row_end, col_start, col_end).

# Continues from Example 1: HecDss, dss_file, and pathname are already defined.
with HecDss.Open(dss_file) as fid:
    # Read rows 2-5 (inclusive), all curves
    df = fid.read_pd(pathname, window=(2, 5, 0, None))
    print(df)

    # Read all rows, first curve only
    df = fid.read_pd(pathname, window=(0, None, 0, 0))
    print(df)

    # Read last 3 rows, last 2 curves (negative indices supported)
    df = fid.read_pd(pathname, window=(-3, None, -2, None))
    print(df)

Window indexing rules:

Feature            Behavior
Base               0-based
Bounds             Inclusive at both ends
None for start     Defaults to first index (0)
None for end       Defaults to last index
Negative indices   Wrapped Python-style (-1 = last)
Overflow end       Clipped to last valid index
Invalid range      Raises IndexError
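The indexing rules can be expressed as a small resolver. This is a sketch of the rules as stated, not the library's code: the function names resolve_bounds and resolve_window are hypothetical, and exactly which ranges pydsstools treats as invalid is an assumption.

```python
def resolve_bounds(start, end, length):
    """Turn an inclusive (start, end) pair into concrete 0-based indices."""
    if start is None:
        start = 0
    if end is None:
        end = length - 1
    # Wrap negative indices Python-style (-1 is the last index).
    if start < 0:
        start += length
    if end < 0:
        end += length
    # Clip an end index that runs past the table.
    end = min(end, length - 1)
    # Assumption: anything still out of range or reversed is invalid.
    if start < 0 or start >= length or end < start:
        raise IndexError(f"invalid range ({start}, {end}) for length {length}")
    return start, end


def resolve_window(window, rows, cols):
    """Resolve (row_start, row_end, col_start, col_end) against a table shape."""
    row_start, row_end, col_start, col_end = window
    return resolve_bounds(row_start, row_end, rows) + resolve_bounds(col_start, col_end, cols)


print(resolve_window((2, 5, 0, None), rows=10, cols=3))      # (2, 5, 0, 2)
print(resolve_window((-3, None, -2, None), rows=10, cols=3)) # (7, 9, 1, 2)
print(resolve_window((0, 99, 0, 0), rows=10, cols=3))        # (0, 9, 0, 0)
```

Since both ends are inclusive, a resolved range (start, end) covers end - start + 1 rows or curves.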


Example 3 — Read as PairedDataStruct (Low-Level)

For lower-level access without the pandas dependency, pass dataframe=False to get the raw PairedDataStruct.

# Continues from Example 1: HecDss, dss_file, and pathname are already defined.
with HecDss.Open(dss_file) as fid:
    pds = fid.read_pd(pathname, dataframe=False)

    print("Shape:", pds.shape)          # (rows, cols)
    print("X units:", pds.x_units)      # e.g., "ft"
    print("Y units:", pds.y_units)      # e.g., "cfs"
    print("X type:", pds.x_type)        # e.g., "linear"
    print("Y type:", pds.y_type)        # e.g., "linear"
    print("Labels:", pds.y_labels)      # e.g., ["1999"]

    x, y, labels = pds.get_data()
    # x: shape (1, rows) — single row of x-ordinates
    # y: shape (cols, rows) — each row is one curve

PairedDataStruct properties:

Property   Type                 Description
x_data     array (1, rows)      X-ordinate values (shared by all curves)
y_data     array (cols, rows)   Y-values; each row is one curve
shape      tuple                (rows, cols)
rows       int                  Number of data points per curve
cols       int                  Number of curves
x_units    str                  Units of the independent axis
y_units    str                  Units of the dependent axis
x_type     str                  Type of x-axis (e.g., “linear”)
y_type     str                  Type of y-axis (e.g., “linear”)
y_labels   list[str]            Label for each curve
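Given arrays in these shapes, each curve can be paired with the shared x-ordinates. The sketch below uses nested lists as stand-ins for the arrays so that it runs without pydsstools; the helper name curves_as_points is hypothetical.

```python
def curves_as_points(x, y, labels):
    """Return [(label, [(x, y), ...]), ...] from x (1, rows) and y (cols, rows)."""
    x_row = x[0]  # x holds a single row of ordinates shared by all curves
    return [(label, list(zip(x_row, curve))) for label, curve in zip(labels, y)]


# Stand-in data in the documented shapes: x is (1, rows), y is (cols, rows).
x = [[0.95, 0.80, 0.60]]
y = [[30.0, 40.0, 54.0]]
labels = ["1999"]

for label, points in curves_as_points(x, y, labels):
    print(label, points)
```

A list of (label, points) pairs is returned rather than a dict so that curves with duplicated labels are not silently merged.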


Example 4 — Query Record Info and Labels

Get metadata about a paired data record without reading the full dataset.

# Continues from Example 1: HecDss, dss_file, and pathname are already defined.
with HecDss.Open(dss_file) as fid:
    # Record dimensions and metadata
    info = fid.pd_info(pathname)
    print(f"Curves: {info['curve_no']}")
    print(f"Points per curve: {info['data_no']}")
    print(f"Label size: {info['label_size']}")

    # Label mapping (primary name -> label string)
    labels = fid.read_pd_labels(pathname)
    print(labels)  # e.g., {'y0': '1999'}

pd_info() returns:

Key          Type   Description
curve_no     int    Number of curves (columns)
data_no      int    Number of data points (rows)
dtype        int    DSS data type code (200 = float, 205 = double)
label_size   int    Average label size in characters
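One use of this metadata is to estimate how large the y-data is before deciding whether to read the whole record or only a window. The sketch below assumes a dict with the keys listed above; the sample values and the helper name estimated_y_bytes are hypothetical.

```python
# Bytes per stored value for each DSS paired data type code.
BYTES_PER_VALUE = {200: 4, 205: 8}  # 200 = float (32-bit), 205 = double (64-bit)


def estimated_y_bytes(info):
    """Estimate the size of the y-data from a pd_info()-style dict."""
    return info["curve_no"] * info["data_no"] * BYTES_PER_VALUE[info["dtype"]]


# Hypothetical metadata, shaped like the dict described above.
info = {"curve_no": 5, "data_no": 100, "dtype": 200, "label_size": 31}
print(estimated_y_bytes(info))  # 2000
```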


Example 5 — Write Paired Data with PairedDataContainer

Build a PairedDataContainer, populate it with data, and write to DSS.

from pydsstools.core import PairedDataContainer
from pydsstools.heclib.dss import HecDss

dss_file = "output.dss"
pathname = "/BASIN/LOCATION/STAGE-FLOW///EXAMPLE/"
rows = 5
curves = 2

with HecDss.Open(dss_file) as fid:
    pdc = PairedDataContainer(pathname, (rows, curves))
    pdc.x_data = [100, 200, 300, 400, 500]
    pdc.y_data = [
        [10, 20, 30, 40, 50],       # Curve 0
        [15, 25, 35, 45, 55],       # Curve 1
    ]
    pdc.x_units = "ft"
    pdc.x_type = "linear"
    pdc.y_units = "cfs"
    pdc.y_type = "linear"
    pdc.y_labels = ["Rating 2020", "Rating 2021"]
    fid.put_pd(pdc)

PairedDataContainer shape convention:

shape = (rows, cols) where:

  • rows = number of data points per curve (length of x_data)

  • cols = number of curves (number of sub-arrays in y_data)

y_data is set as a list of lists (or 2-D array) where each sub-list is one curve. Internally this is stored as a (cols, rows) C-contiguous float32 array.
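A quick consistency check before building the container can catch shape mistakes early. This is a standalone sketch; check_shape is a hypothetical helper, and the validation pydsstools itself performs is not described here.

```python
def check_shape(shape, x_data, y_data):
    """Raise ValueError if x_data or y_data does not match shape = (rows, cols)."""
    rows, cols = shape
    if len(x_data) != rows:
        raise ValueError(f"x_data has {len(x_data)} values, expected {rows}")
    if len(y_data) != cols:
        raise ValueError(f"y_data has {len(y_data)} curves, expected {cols}")
    for i, curve in enumerate(y_data):
        if len(curve) != rows:
            raise ValueError(f"curve {i} has {len(curve)} values, expected {rows}")


x_data = [100, 200, 300, 400, 500]
y_data = [
    [10, 20, 30, 40, 50],   # Curve 0
    [15, 25, 35, 45, 55],   # Curve 1
]
check_shape((5, 2), x_data, y_data)  # passes silently
```

Swapping the tuple to (2, 5) fails the check, which is the most likely mistake given that y_data is supplied curve by curve.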


Example 6 — Write Paired Data from a DataFrame

Pass a pathname and a DataFrame directly to put_pd(). The DataFrame index becomes x-data, and each column becomes a curve.

import pandas as pd
from pydsstools.heclib.dss import HecDss

dss_file = "output.dss"
pathname = "/BASIN/LOCATION/STAGE-FLOW///FROM-DF/"

df = pd.DataFrame(
    {
        "Rating 2020": [10, 20, 30, 40, 50],
        "Rating 2021": [15, 25, 35, 45, 55],
    },
    index=[100, 200, 300, 400, 500],
)

with HecDss.Open(dss_file) as fid:
    fid.put_pd(
        pathname,
        y_data=df,
        x_units="ft",
        x_type="linear",
        y_units="cfs",
        y_type="linear",
    )

How the DataFrame is mapped:

DataFrame Part   DSS Field
df.index         x-ordinates
Each column      One curve of y-values
Column names     Curve labels
df.values.T      Internal (cols, rows) y-data array

If the DataFrame has a MultiIndex on columns with a level named "labels", those labels are used instead of the column names.


Example 7 — Preallocate and Fill Curves Individually

For large datasets or incremental writes, preallocate an empty record and fill it one curve at a time.

Step 1 — Preallocate

from pydsstools.heclib.dss import HecDss

dss_file = "output.dss"
pathname = "/PAIRED/RESERVOIR/ELEV-AREA///PREALLOC/"
rows = 100
cols = 5

with HecDss.Open(dss_file) as fid:
    fid.preallocate_pd(
        pathname,
        shape=(rows, cols),
        x_units="ft",
        y_units="acres",
        label_size=31,        # characters reserved per label
    )

label_size controls how many characters are allocated for each curve label. This determines the maximum label length; labels longer than this are truncated. The minimum is 12 characters.
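The effect of label_size on a label can be sketched as follows. The helper fit_label is hypothetical, and raising a requested size below 12 up to the minimum is an assumption; the library may instead reject such a value.

```python
MIN_LABEL_SIZE = 12  # minimum stated above


def fit_label(label, label_size):
    """Truncate a label to the reserved size."""
    # Assumption: a requested size below the minimum is raised to the minimum.
    size = max(label_size, MIN_LABEL_SIZE)
    return label[:size]


print(fit_label("Pool Area", 31))  # short labels pass through unchanged
print(len(fit_label("A very long curve label that will not fit", 31)))
```

Choosing label_size at preallocation time therefore fixes the longest label any curve in the record can carry.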

Step 2 — Write individual curves

# Reuses dss_file, pathname, and rows from Step 1.
with HecDss.Open(dss_file) as fid:
    # Write full curve to column 0
    y_values = [i * 2.0 for i in range(rows)]
    fid.put_pd(pathname, col_index=0, y_data=y_values, y_labels=["Pool Area"])

    # Write full curve to column 1; omitting y_labels keeps the existing label
    y_values = [i * 3.0 for i in range(rows)]
    fid.put_pd(pathname, col_index=1, y_data=y_values)

    # Write partial curve to column 2 (rows 10-19 only)
    partial = [99.0] * 10
    fid.put_pd(pathname, col_index=2, y_data=partial, window=(10, 19))

Key points about preallocated records:

  • col_index is 0-based in the Python API (converted to 1-based internally).

  • window for partial writes is (row_start, row_end), 0-based, inclusive.

  • If y_labels is omitted or empty, the existing label is preserved.

  • If y_labels is provided, the label is updated (truncated to label_size).

  • The number of values in y_data must not exceed the available row range.
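The index conventions in these points can be checked ahead of a write. The sketch below is a hypothetical pre-check, not part of pydsstools; plan_curve_write and its error behavior are assumptions.

```python
def plan_curve_write(col_index, n_values, rows, window=None):
    """Return (1-based column, row_start, row_end) for a single-curve write."""
    # Without a window, the whole curve (all rows) is the target range.
    row_start, row_end = window if window is not None else (0, rows - 1)
    if not (0 <= row_start <= row_end < rows):
        raise IndexError(f"window ({row_start}, {row_end}) is outside 0..{rows - 1}")
    available = row_end - row_start + 1  # both ends are inclusive
    if n_values > available:
        raise ValueError(f"{n_values} values do not fit in {available} rows")
    # The Python API is 0-based; the underlying layer is 1-based.
    return col_index + 1, row_start, row_end


print(plan_curve_write(col_index=2, n_values=10, rows=100, window=(10, 19)))  # (3, 10, 19)
print(plan_curve_write(col_index=0, n_values=100, rows=100))                  # (1, 0, 99)
```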


Example 8 — Round-Trip: Read, Modify, Write Back

Read an existing record, modify it, and write it back under a new pathname.

from pydsstools.heclib.dss import HecDss

dss_file = "sample.dss"
pathname_in  = "/PAIREDDATA/COWLITZ/FREQ-FLOW////"
pathname_out = "/PAIREDDATA/COWLITZ/FREQ-FLOW///MODIFIED/"

with HecDss.Open(dss_file) as fid:
    # Read
    df = fid.read_pd(pathname_in)

    # Modify — scale all y-values by 1.1
    df.iloc[:, :] = df.values * 1.1

    # Write back under a new F-part
    fid.put_pd(
        pathname_out,
        y_data=df,
        x_units="",
        x_type="linear",
        y_units="cfs",
        y_type="linear",
    )

When writing back a DataFrame that was originally read with read_pd(), the MultiIndex column structure (with the "labels" level) is automatically picked up by put_pd(), preserving the original curve labels.


DSS Pathname Structure for Paired Data

/A-Part/B-Part/C-Part/D-Part/E-Part/F-Part/

Part   Typical Use                                                   Example
A      Collection or project                                         PAIREDDATA, BASIN
B      Location                                                      COWLITZ, RESERVOIR
C      Parameter pair                                                STAGE-FLOW, FREQ-FLOW, ELEV-AREA
D      Start date in time series (often empty for paired data)
E      Time interval in time series (often empty for paired data)
F      Version or variant                                            1999, MODIFIED, PREALLOC

For paired data, D-part and E-part are typically empty since the data represents a relationship rather than a time series.
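Splitting a pathname into its six parts makes the structure explicit. This is a plain-Python sketch; split_pathname is a hypothetical helper rather than a pydsstools function.

```python
def split_pathname(pathname):
    """Split a DSS pathname into a dict keyed by part letter (A-F)."""
    if not (pathname.startswith("/") and pathname.endswith("/")):
        raise ValueError("pathname must start and end with '/'")
    parts = pathname[1:-1].split("/")
    if len(parts) != 6:
        raise ValueError(f"expected 6 parts, found {len(parts)}")
    return dict(zip("ABCDEF", parts))


parts = split_pathname("/BASIN/LOCATION/STAGE-FLOW///EXAMPLE/")
print(parts["C"])                  # STAGE-FLOW
print(repr(parts["D"]), repr(parts["E"]))  # '' ''
print(parts["F"])                  # EXAMPLE
```

The two empty strings for D and E correspond to the consecutive slashes in the pathname.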


API Summary

Reading

Method                                   Returns            Description
fid.read_pd(pathname)                    DataFrame          Read all data as pandas DataFrame.
fid.read_pd(pathname, window=(...))      DataFrame          Read a sub-region.
fid.read_pd(pathname, dataframe=False)   PairedDataStruct   Read as low-level structure.
fid.read_pd_labels(pathname)             dict               Get {primary: label} mapping.
fid.pd_info(pathname)                    dict               Get record dimensions and metadata.

Writing

Method                                            Description
fid.put_pd(pdc)                                   Write a PairedDataContainer object.
fid.put_pd(pathname, y_data=df, ...)              Write from a DataFrame.
fid.put_pd(pathname, col_index=i, y_data=[...])   Write one curve to a preallocated record.
fid.preallocate_pd(pathname, shape=(...))         Create an empty record for incremental filling.

PairedDataContainer Construction

from pydsstools.core import PairedDataContainer

pdc = PairedDataContainer(pathname, (rows, curves))
pdc.x_data = [...]            # list, tuple, or ndarray of length rows
pdc.y_data = [[...], [...]]   # list of curves, each of length rows
pdc.x_units = "ft"            # independent axis units
pdc.x_type = "linear"         # independent axis type
pdc.y_units = "cfs"           # dependent axis units
pdc.y_type = "linear"         # dependent axis type
pdc.y_labels = ["A", "B"]     # one label per curve (optional)

Accepted input types for x_data: list, tuple, numpy.ndarray, or array.array. Internally converted to float32.

Accepted input types for y_data: list of lists, 2-D numpy.ndarray, or tuple of lists. A 1-D input is reshaped to a single curve. Internally converted to a (cols, rows) float32 array.