# Paired Data

This guide covers reading, writing, and managing paired data records in
HEC-DSS files using **pydsstools**.

Paired data records store tabular relationships between an independent variable
(x-axis) and one or more dependent variables (y-axis curves). Common uses
include stage-discharge rating curves, frequency-flow relationships,
elevation-area-volume tables, and damage-stage functions.

## Key Concepts

| Concept | Description |
|---------|-------------|
| **Paired Data** | A table of x-values mapped to one or more y-value curves. DSS record type 200 (float) or 205 (double). |
| **PairedDataContainer** | Write-side container. Holds pathname, shape, x/y data, units, types, and labels before writing to DSS. |
| **PairedDataStruct** | Read-side structure returned by the Cython layer. Wraps the C `zStructPairedData` and exposes x/y arrays. |
| **Curve** | A single column of y-values sharing the same x-ordinates. A paired data record can hold many curves. |
| **Label** | An optional name for each curve (e.g., "1999", "Stage", "Flow"). |
| **Shape** | `(rows, cols)` — rows is the number of data points per curve, cols is the number of curves. |
| **Window** | An index tuple for reading or writing a sub-region of the table. |
| **Preallocated Record** | A paired data record with space reserved for curves to be filled in later, one at a time. |

### Data Layout

```
              Curve 0      Curve 1      Curve 2
x[0]          y[0,0]       y[0,1]       y[0,2]
x[1]          y[1,0]       y[1,1]       y[1,2]
x[2]          y[2,0]       y[2,1]       y[2,2]
 ...           ...          ...          ...
x[rows-1]     y[rows-1,0]  y[rows-1,1]  y[rows-1,2]
```

- All curves share the same x-ordinates.
- Internally, the C array stores each curve as a row, giving a `(cols, rows)`
  layout. The Python API transposes this so that each curve becomes a column
  in a DataFrame.
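
The relationship between the two layouts can be sketched with plain Python lists. This is an illustration only: the values are made up, and the names `y_curve_major` and `y_table` are hypothetical, not part of pydsstools.

```python
# Curve-major storage: one inner list per curve, as in the internal layout.
x = [1.0, 2.0, 3.0]
y_curve_major = [
    [10.0, 20.0, 30.0],   # curve 0
    [11.0, 21.0, 31.0],   # curve 1
]

# Transpose to the tabular view: one row per x-ordinate, one column per curve.
y_table = [list(row) for row in zip(*y_curve_major)]

for xi, row in zip(x, y_table):
    print(xi, row)

# Element [i][j] of the table is element [j][i] of the curve-major storage.
assert y_table[2][1] == y_curve_major[1][2]
```

Keeping this index swap in mind helps when moving between the DataFrame view and the low-level arrays shown later in this guide.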

---

## Example 1 — Read Paired Data as DataFrame

Reading a record into a DataFrame is the most common use case. `read_pd()`
returns a `pandas.DataFrame` with x-values as the index and one column per
curve; the columns carry a two-level index (primary name and label).

```python
from pydsstools.heclib.dss import HecDss

dss_file = "sample.dss"
pathname = "/PAIREDDATA/COWLITZ/FREQ-FLOW////"

with HecDss.Open(dss_file) as fid:
    df = fid.read_pd(pathname)
    print(df)
    print("Index (x-values):", df.index.tolist())
    print("Column labels:", df.columns.get_level_values("labels").tolist())
```

**What the DataFrame looks like:**

```
                 y0
labels         1999
x_data
0.950000       30.0
0.800000       40.0
0.600000       54.0
0.500000       60.0
...
```

**Column index structure:**

The DataFrame has a `MultiIndex` on columns with two levels:

- `primary` — sequential identifier (`y0`, `y1`, `y2`, ...)
- `labels` — the curve label string from the DSS record

This design ensures columns always have a unique programmatic name (`y0`, `y1`)
even when labels are missing or duplicated.
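
To show why the two-level index is convenient, the sketch below builds a small DataFrame by hand with the same column structure and a deliberately duplicated label. The data is invented and nothing is read from a DSS file; only standard pandas calls are used.

```python
import pandas as pd

# Hand-built stand-in for a read_pd() result: two curves sharing the label "1999".
columns = pd.MultiIndex.from_arrays(
    [["y0", "y1"], ["1999", "1999"]],
    names=["primary", "labels"],
)
df = pd.DataFrame(
    [[30.0, 33.0], [40.0, 44.0], [54.0, 59.0]],
    index=pd.Index([0.95, 0.80, 0.60], name="x_data"),
    columns=columns,
)

# Select one curve unambiguously by its primary name.
first_curve = df.xs("y0", axis=1, level="primary")

# Drop the label level to get flat columns keyed by primary name.
flat = df.droplevel("labels", axis=1)

# Build a {primary: label} lookup from the column index.
label_map = dict(df.columns.tolist())
print(label_map)
```

Selecting by label alone would return both columns here, which is the ambiguity the `primary` level avoids.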

---

## Example 2 — Read Paired Data with a Window

Read a subset of rows and curves using the `window` parameter. The window
is a 4-tuple of 0-based, inclusive indices:
`(row_start, row_end, col_start, col_end)`.

```python
# Continues from Example 1: HecDss, dss_file, and pathname are already defined.
with HecDss.Open(dss_file) as fid:
    # Read rows 2-5 (inclusive), all curves
    df = fid.read_pd(pathname, window=(2, 5, 0, None))
    print(df)

    # Read all rows, first curve only
    df = fid.read_pd(pathname, window=(0, None, 0, 0))
    print(df)

    # Read last 3 rows, last 2 curves (negative indices supported)
    df = fid.read_pd(pathname, window=(-3, None, -2, None))
    print(df)
```

**Window indexing rules:**

| Feature | Behavior |
|---------|----------|
| Base | 0-based |
| Bounds | Inclusive at both ends |
| `None` for start | Defaults to first index (0) |
| `None` for end | Defaults to last index |
| Negative indices | Wrapped Python-style (`-1` = last) |
| Overflow end | Clipped to last valid index |
| Invalid range | Raises `IndexError` |
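
The rules in the table can be expressed as a short pure-Python function. This is a sketch of the documented behavior, not the library's actual implementation; `resolve_range` and `resolve_window` are hypothetical helper names.

```python
def resolve_range(start, end, size):
    """Resolve an inclusive (start, end) pair against an axis of length `size`."""
    if start is None:
        start = 0
    if end is None:
        end = size - 1
    # Wrap negative indices Python-style (-1 is the last element).
    if start < 0:
        start += size
    if end < 0:
        end += size
    # Clip an end that runs past the last valid index.
    end = min(end, size - 1)
    if not 0 <= start <= end:
        raise IndexError(f"invalid range ({start}, {end}) for size {size}")
    return start, end


def resolve_window(window, shape):
    """Resolve (row_start, row_end, col_start, col_end) against (rows, cols)."""
    rows, cols = shape
    row_start, row_end, col_start, col_end = window
    return (resolve_range(row_start, row_end, rows)
            + resolve_range(col_start, col_end, cols))


# A table with 10 rows and 3 curves.
print(resolve_window((2, 5, 0, None), (10, 3)))       # (2, 5, 0, 2)
print(resolve_window((-3, None, -2, None), (10, 3)))  # (7, 9, 1, 2)
print(resolve_window((0, 99, 0, 0), (10, 3)))         # (0, 9, 0, 0)
```

Because both bounds are inclusive, a resolved range `(start, end)` covers `end - start + 1` rows or curves.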

---

## Example 3 — Read as PairedDataStruct (Low-Level)

For lower-level access without the pandas dependency, pass `dataframe=False`
to get the raw `PairedDataStruct`.

```python
# Continues from Example 1: HecDss, dss_file, and pathname are already defined.
with HecDss.Open(dss_file) as fid:
    pds = fid.read_pd(pathname, dataframe=False)

    print("Shape:", pds.shape)          # (rows, cols)
    print("X units:", pds.x_units)      # e.g., "ft"
    print("Y units:", pds.y_units)      # e.g., "cfs"
    print("X type:", pds.x_type)        # e.g., "linear"
    print("Y type:", pds.y_type)        # e.g., "linear"
    print("Labels:", pds.y_labels)      # e.g., ["1999"]

    x, y, labels = pds.get_data()
    # x: shape (1, rows) — single row of x-ordinates
    # y: shape (cols, rows) — each row is one curve
```

**PairedDataStruct properties:**

| Property | Type | Description |
|----------|------|-------------|
| `x_data` | array (1, rows) | X-ordinate values (shared by all curves) |
| `y_data` | array (cols, rows) | Y-values; each row is one curve |
| `shape` | tuple | (rows, cols) |
| `rows` | int | Number of data points per curve |
| `cols` | int | Number of curves |
| `x_units` | str | Units of the independent axis |
| `y_units` | str | Units of the dependent axis |
| `x_type` | str | Type of x-axis (e.g., "linear") |
| `y_type` | str | Type of y-axis (e.g., "linear") |
| `y_labels` | list[str] | Label for each curve |

---

## Example 4 — Query Record Info and Labels

Get metadata about a paired data record without reading the full dataset.

```python
# Continues from Example 1: HecDss, dss_file, and pathname are already defined.
with HecDss.Open(dss_file) as fid:
    # Record dimensions and metadata
    info = fid.pd_info(pathname)
    print(f"Curves: {info['curve_no']}")
    print(f"Points per curve: {info['data_no']}")
    print(f"Label size: {info['label_size']}")

    # Label mapping (primary name -> label string)
    labels = fid.read_pd_labels(pathname)
    print(labels)  # e.g., {'y0': '1999'}
```

`pd_info()` returns:

| Key | Type | Description |
|-----|------|-------------|
| `curve_no` | int | Number of curves (columns) |
| `data_no` | int | Number of data points (rows) |
| `dtype` | int | DSS data type code (200 = float, 205 = double) |
| `label_size` | int | Average label size in characters |
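
As a small illustration of how such a dictionary might be used, the sketch below derives the `(rows, cols)` shape and a precision name from it. The `info` literal is a made-up stand-in with the keys listed above, not output from a real file, and `DTYPE_NAMES` is a hypothetical lookup table.

```python
# Hypothetical result, shaped like the dictionary described above.
info = {"curve_no": 2, "data_no": 5, "dtype": 205, "label_size": 11}

# Record type codes named in this guide.
DTYPE_NAMES = {200: "float", 205: "double"}

shape = (info["data_no"], info["curve_no"])          # (rows, cols)
precision = DTYPE_NAMES.get(info["dtype"], "unknown")
total_values = shape[0] * shape[1]

print(f"{shape[0]} rows x {shape[1]} curves, "
      f"{precision} precision, {total_values} y-values")
```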

---

## Example 5 — Write Paired Data with PairedDataContainer

Build a `PairedDataContainer`, populate it with data, and write to DSS.

```python
from pydsstools.core import PairedDataContainer
from pydsstools.heclib.dss import HecDss

dss_file = "output.dss"
pathname = "/BASIN/LOCATION/STAGE-FLOW///EXAMPLE/"
rows = 5
curves = 2

with HecDss.Open(dss_file) as fid:
    pdc = PairedDataContainer(pathname, (rows, curves))
    pdc.x_data = [100, 200, 300, 400, 500]
    pdc.y_data = [
        [10, 20, 30, 40, 50],       # Curve 0
        [15, 25, 35, 45, 55],       # Curve 1
    ]
    pdc.x_units = "ft"
    pdc.x_type = "linear"
    pdc.y_units = "cfs"
    pdc.y_type = "linear"
    pdc.y_labels = ["Rating 2020", "Rating 2021"]
    fid.put_pd(pdc)
```

**PairedDataContainer shape convention:**

`shape = (rows, cols)` where:

- `rows` = number of data points per curve (length of `x_data`)
- `cols` = number of curves (number of sub-arrays in `y_data`)

`y_data` is set as a list of lists (or 2-D array) where each sub-list is one
curve. Internally this is stored as a `(cols, rows)` C-contiguous float32
array.
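
A quick consistency check before writing can catch a swapped shape early. The helper below is a hypothetical sketch that works on plain sequences; it is not part of pydsstools and does not reflect how the container validates its input.

```python
def check_paired_shape(shape, x_data, y_data):
    """Raise ValueError if x_data or y_data does not match shape = (rows, cols)."""
    rows, cols = shape
    if len(x_data) != rows:
        raise ValueError(f"x_data has {len(x_data)} values, expected {rows}")
    if len(y_data) != cols:
        raise ValueError(f"y_data has {len(y_data)} curves, expected {cols}")
    for i, curve in enumerate(y_data):
        if len(curve) != rows:
            raise ValueError(f"curve {i} has {len(curve)} values, expected {rows}")


x = [100, 200, 300, 400, 500]
y = [
    [10, 20, 30, 40, 50],   # curve 0
    [15, 25, 35, 45, 55],   # curve 1
]

check_paired_shape((5, 2), x, y)   # correct: 5 rows, 2 curves
```

Passing `(2, 5)` instead would raise a `ValueError`, which is the most common mix-up given that `y_data` itself is laid out as `(cols, rows)`.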

---

## Example 6 — Write Paired Data from a DataFrame

Pass a pathname and a DataFrame directly to `put_pd()`. The DataFrame index
becomes x-data, and each column becomes a curve.

```python
import pandas as pd
from pydsstools.heclib.dss import HecDss

dss_file = "output.dss"
pathname = "/BASIN/LOCATION/STAGE-FLOW///FROM-DF/"

df = pd.DataFrame(
    {
        "Rating 2020": [10, 20, 30, 40, 50],
        "Rating 2021": [15, 25, 35, 45, 55],
    },
    index=[100, 200, 300, 400, 500],
)

with HecDss.Open(dss_file) as fid:
    fid.put_pd(
        pathname,
        y_data=df,
        x_units="ft",
        x_type="linear",
        y_units="cfs",
        y_type="linear",
    )
```

**How the DataFrame is mapped:**

| DataFrame Part | DSS Field |
|----------------|-----------|
| `df.index` | x-ordinates |
| Each column | One curve of y-values |
| Column names | Curve labels |
| `df.values.T` | Internal `(cols, rows)` y-data array |

If the DataFrame has a MultiIndex on columns with a level named `"labels"`,
those labels are used instead of the column names.
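
The mapping in the table can be sketched with pandas alone. The function below is a hypothetical illustration of the described rules (`split_frame` is not a pydsstools function); it shows how the x-ordinates, curve-major y-data, and labels could be derived from a DataFrame.

```python
import pandas as pd


def split_frame(df):
    """Return (x, y, labels), with y laid out as one list per curve."""
    x = df.index.tolist()
    y = df.to_numpy().T.tolist()   # (cols, rows): one inner list per curve
    if isinstance(df.columns, pd.MultiIndex) and "labels" in df.columns.names:
        # Prefer the "labels" level when the columns are a MultiIndex.
        labels = df.columns.get_level_values("labels").tolist()
    else:
        labels = [str(name) for name in df.columns]
    return x, y, labels


df = pd.DataFrame(
    {"Rating 2020": [10, 20, 30], "Rating 2021": [15, 25, 35]},
    index=[100, 200, 300],
)
x, y, labels = split_frame(df)
print(x)
print(y)
print(labels)
```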

---

## Example 7 — Preallocate and Fill Curves Individually

For large datasets or incremental writes, preallocate an empty record and
fill it one curve at a time.

### Step 1 — Preallocate

```python
from pydsstools.heclib.dss import HecDss

dss_file = "output.dss"
pathname = "/PAIRED/RESERVOIR/ELEV-AREA///PREALLOC/"
rows = 100
cols = 5

with HecDss.Open(dss_file) as fid:
    fid.preallocate_pd(
        pathname,
        shape=(rows, cols),
        x_units="ft",
        y_units="acres",
        label_size=31,        # characters reserved per label
    )
```

`label_size` controls how many characters are allocated for each curve label.
This determines the maximum label length; labels longer than this are
truncated. The minimum is 12 characters.

### Step 2 — Write individual curves

```python
with HecDss.Open(dss_file) as fid:
    # Write full curve to column 0
    y_values = [i * 2.0 for i in range(rows)]
    fid.put_pd(pathname, col_index=0, y_data=y_values, y_labels=["Pool Area"])

    # Write full curve to column 1, keeping default label
    y_values = [i * 3.0 for i in range(rows)]
    fid.put_pd(pathname, col_index=1, y_data=y_values)

    # Write partial curve to column 2 (rows 10-19 only)
    partial = [99.0] * 10
    fid.put_pd(pathname, col_index=2, y_data=partial, window=(10, 19))
```

**Key points about preallocated records:**

- `col_index` is 0-based in the Python API (converted to 1-based internally).
- `window` for partial writes is `(row_start, row_end)`, 0-based, inclusive.
- If `y_labels` is omitted or empty, the existing label is preserved.
- If `y_labels` is provided, the label is updated (truncated to `label_size`).
- The number of values in `y_data` must not exceed the number of rows in the
  target range: the full curve, or the `window` span for a partial write.
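
These rules can be modeled with a small in-memory stand-in. The class below is purely illustrative: `PreallocatedTable` is a hypothetical name, the zero fill for unwritten cells is an assumption, and no DSS file is involved.

```python
class PreallocatedTable:
    """In-memory stand-in for a preallocated record (illustration only)."""

    def __init__(self, rows, cols, label_size=12):
        self.rows, self.cols = rows, cols
        self.label_size = label_size
        # Assumption: unwritten cells start at 0.0 and labels start empty.
        self.curves = [[0.0] * rows for _ in range(cols)]
        self.labels = [""] * cols

    def put_curve(self, col_index, y_data, window=None, label=None):
        # The window is (row_start, row_end), 0-based and inclusive.
        start, end = window if window is not None else (0, self.rows - 1)
        if len(y_data) > end - start + 1:
            raise ValueError("y_data does not fit in the selected row range")
        # Overwrite only the rows covered by y_data.
        self.curves[col_index][start:start + len(y_data)] = y_data
        if label:   # an omitted or empty label keeps the existing one
            self.labels[col_index] = label[: self.label_size]


table = PreallocatedTable(rows=20, cols=3, label_size=12)
table.put_curve(0, [i * 2.0 for i in range(20)], label="Pool Area (surveyed 2020)")
table.put_curve(2, [99.0] * 10, window=(10, 19))

print(table.labels[0])          # Pool Area (s
print(table.curves[2][9:11])    # [0.0, 99.0]
```

The truncated label shows why `label_size` is worth choosing generously at preallocation time.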

---

## Example 8 — Round-Trip: Read, Modify, Write Back

Read an existing record, modify it, and write it back under a new pathname.

```python
from pydsstools.heclib.dss import HecDss

dss_file = "sample.dss"
pathname_in  = "/PAIREDDATA/COWLITZ/FREQ-FLOW////"
pathname_out = "/PAIREDDATA/COWLITZ/FREQ-FLOW///MODIFIED/"

with HecDss.Open(dss_file) as fid:
    # Read
    df = fid.read_pd(pathname_in)

    # Modify: scale all y-values by 1.1 (index and column labels are unchanged)
    df = df * 1.1

    # Write back under a new F-part
    fid.put_pd(
        pathname_out,
        y_data=df,
        x_units="",
        x_type="linear",
        y_units="cfs",
        y_type="linear",
    )
```

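When writing back a DataFrame that was originally read with `read_pd()`, the
MultiIndex column structure (with the `"labels"` level) is automatically
picked up by `put_pd()`, preserving the original curve labels. The units and
axis types in this example are passed explicitly; to carry over the source
record's values, read them first with `dataframe=False` (Example 3).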

---

## DSS Pathname Structure for Paired Data

```
/A-Part/B-Part/C-Part/D-Part/E-Part/F-Part/
```

| Part | Typical Use | Example |
|------|-------------|---------|
| A | Collection or project | `PAIREDDATA`, `BASIN` |
| B | Location | `COWLITZ`, `RESERVOIR` |
| C | Parameter pair | `STAGE-FLOW`, `FREQ-FLOW`, `ELEV-AREA` |
| D | Start date in time series (usually empty for paired data) | (empty) |
| E | Time interval in time series (usually empty for paired data) | (empty) |
| F | Version or variant | `1999`, `MODIFIED`, `PREALLOC` |

For paired data, D-part and E-part are typically empty since the data
represents a relationship rather than a time series.
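
Splitting a pathname into its parts is straightforward string handling. The helper below is a minimal sketch with a hypothetical name (`split_pathname`); it assumes the six-part form shown above and does not cover any other pathname conventions.

```python
def split_pathname(pathname):
    """Split a DSS pathname into its six parts, keyed A to F."""
    if not (pathname.startswith("/") and pathname.endswith("/")):
        raise ValueError("pathname must start and end with '/'")
    parts = pathname[1:-1].split("/")
    if len(parts) != 6:
        raise ValueError(f"expected 6 parts, found {len(parts)}")
    return dict(zip("ABCDEF", parts))


parts = split_pathname("/PAIREDDATA/COWLITZ/FREQ-FLOW///MODIFIED/")
print(parts["C"])                   # FREQ-FLOW
print(repr(parts["D"]), repr(parts["E"]))   # '' ''
print(parts["F"])                   # MODIFIED
```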

---

## API Summary

### Reading

| Method | Returns | Description |
|--------|---------|-------------|
| `fid.read_pd(pathname)` | `DataFrame` | Read all data as pandas DataFrame. |
| `fid.read_pd(pathname, window=(...))` | `DataFrame` | Read a sub-region. |
| `fid.read_pd(pathname, dataframe=False)` | `PairedDataStruct` | Read as low-level structure. |
| `fid.read_pd_labels(pathname)` | `dict` | Get `{primary: label}` mapping. |
| `fid.pd_info(pathname)` | `dict` | Get record dimensions and metadata. |

### Writing

| Method | Description |
|--------|-------------|
| `fid.put_pd(pdc)` | Write a `PairedDataContainer` object. |
| `fid.put_pd(pathname, y_data=df, ...)` | Write from a DataFrame. |
| `fid.put_pd(pathname, col_index=i, y_data=[...])` | Write one curve to a preallocated record. |
| `fid.preallocate_pd(pathname, shape=(...))` | Create an empty record for incremental filling. |

### PairedDataContainer Construction

```python
from pydsstools.core import PairedDataContainer

pdc = PairedDataContainer(pathname, (rows, curves))
pdc.x_data = [...]            # list, tuple, or ndarray of length rows
pdc.y_data = [[...], [...]]   # list of curves, each of length rows
pdc.x_units = "ft"            # independent axis units
pdc.x_type = "linear"         # independent axis type
pdc.y_units = "cfs"           # dependent axis units
pdc.y_type = "linear"         # dependent axis type
pdc.y_labels = ["A", "B"]     # one label per curve (optional)
```

**Accepted input types for `x_data`:** list, tuple, `numpy.ndarray`, or
`array.array`. Internally converted to float32.

**Accepted input types for `y_data`:** list of lists, 2-D `numpy.ndarray`, or
tuple of lists. A 1-D input is reshaped to a single curve. Internally
converted to a `(cols, rows)` float32 array.
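
The effect of the float32 conversion and the 1-D reshaping rule can be illustrated with the standard library's `array` module, used here as a stand-in for the NumPy array the library holds internally. `to_float32_curves` is a hypothetical helper, not a pydsstools function.

```python
from array import array


def to_float32_curves(y_data):
    """Return y_data as a list of float32 arrays, one per curve."""
    y_data = list(y_data)
    # A flat sequence of numbers is treated as a single curve.
    if y_data and not hasattr(y_data[0], "__len__"):
        y_data = [y_data]
    return [array("f", curve) for curve in y_data]


single = to_float32_curves([0.1, 0.2, 0.3])
pair = to_float32_curves([[10, 20, 30], (15, 25, 35)])

print(len(single), len(single[0]))   # 1 3, i.e. (cols, rows) = (1, 3)
print(len(pair), len(pair[0]))       # 2 3
print(single[0][0] == 0.1)           # False: 0.1 is not exact in float32
```

Because of this rounding, values read back from a float record are best compared to the originals with a tolerance rather than with `==`.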