# Usage

The primary data object that all functionality is built around is the
{class}`pygsdata.GSData` object. This object stores all relevant data from
a global-signal measurement, and has useful methods for accessing parts of the data,
as well as reading/writing the data to standard formats (including HDF5 and ACQ).

The `GSData` object is considered to be immutable. This means that functions that
process the object return a *new* object with updated data. This makes reasoning about
the code much simpler, and makes it easier to write code that is reusable.
All processing functions that take in a GSData object and return a new one are
decorated with the `@gsregister` decorator, which registers the function to be
discoverable by the CLI (more on that later), but also enables automatic updating
of the `history` of the object, so no manual updating of the history is required.

## Reading/Writing Data

To read in data as a GSData object, simply use the `from_file` method:

```python
from pygsdata import GSData
data = GSData.from_file('data.acq', telescope_name="EDGES-low")
```

Notice that we passed the telescope name, which is a piece of (optional) metadata that
the ACQ file doesn't store. Any parameter to GSData that ACQ doesn't natively contain
can be passed in this way when constructing the object.
While ACQ is readable, the GSData object supports a native HDF5-based format which is
both faster to read, and is able to contain more metadata. We can write such a file:

```python
data.write_gsh5("data.gsh5")
```

This file can be read using the same method as above:

```python
data = GSData.from_file('data.gsh5')
```

Notice that here we didn't have to specify the `telescope_name` parameter, because
the file format contains this information.

## Updating the Object

As already stated, the GSData object is to be considered immutable. This means that you
can be confident that any function that "changes" your data object will in fact return
a *new* object with the updated data. Thus, you can keep a reference to the original
unchanged object if necessary. Despite this, if any arrays are the *same* between the
objects, then the memory will not be copied. Thus, if you were to inadvertantly in-place
modify one of the arrays, both objects would be affected. Don't do this.

The "official" way to update the object is to use the `update` method:

```python
data = data.update(data=data.data * 3, data_unit="uncalibrated")
```

This will return the new object. However, this doesn't update the history automatically.
The history can be updated by supplying a dictionary with at least a message:

```python
data = data.update(
    data=data.data * 3,
    data_unit="uncalibrated",
    history={"message": "Multiplied by 3"}
)
```

In actual fact, the history object that is added is a {class}`edges_analysis.pygsdata.Stamp`
object, which is a lightweight object that can be easily serialized to YAML, and adds
a default timestamp and set of code versions to the history. You can use one of these
directly if you wish:

```python
from pygsdata import Stamp
data = data.update(
    data=data.data * 3,
    data_unit="uncalibrated",
    history=Stamp(message="Multiplied by 3", timestamp=datetime.now())
)
```

If you write a function that updates a GSData object, it is better to include the
function name and the parameters it uses in the history:

```python
def multiply_by_3(data, data_unit):
    return data.update(
        data=data.data * 3,
        data_unit=data_unit,
        history={"function": "multiply_by_3", "parameters": {"data_unit": data_unit}}
    )

data = multiply_by_3(data, "uncalibrated")
```

However, if you are going to write functions that update the data, there is a better way
to do it, as we shall see now.

## Using the Register

There is a decorator defined that makes writing new functions that update GSData objects
simpler, called {func}`pygsdata.gsregister`.
This decorator does a few things: it registers the function into a global dictionary,
`GSDATA_PROCESSORS`, and it adds the function to the `history` of the object.
*Using* registered functions is simple: just call the function with the object as the
first argument, and any other parameters as keyword arguments. Since most
internally-defined functions have already been registered, you can use them out of the
box. For example:

```python
from pygsdata.select import select_freqs
from astropy import units as un

data = select_freqs(data, freq_range=(50*un.MHz, 100*un.MHz))
```

The returned `data` object has a different data-shape (it has frewer frequencies), and
the history contains a new entry. You can print that history:

```python
print(str(data.history))
```

Or just print the most recent addition to the history:

```python
print(str(data.history[-1]))
```

The `history` attribute also has a {meth}`pretty` method, which can be used with the
rich library to pretty-print the history:

```
from rich.console import Console
console = Console()
console.print(data.history.pretty())
```

Adding your own registered processor is simple -- just use the decorator over a function
with the correct signature:

```python
from pygsdata import gsregister, GSData

@gsregister("calibrate")
def pow_data(data: GSData, *, n: int=2) -> GSData:
    return data.update(data=data.data**n)
```

Note here that the first argument to the function is always a GSData instance, and the
return value is always another GSData instance. All other parameters should be keyword
arguments, and can in principle be anything, but it is best to make them types that can
easily be understood by YAML (this helps with writing out the history, and also for
defining workflows for the CLI).
Note also that the `gsregister` decorator takes a single argument: the *kind* of
processor. This is important, because it enables the workflow to make judgments on how
to call the function in certain cases, and also makes it possible to find subsets of the
available processors.


## Making Plots

The {mod}`pygsdata.plots` module contains functions that can be used to make plots
from a GSData object. For example, let's say we have a GSData file:

```python
from pygsdata import GSData, plots
data = GSData.from_file('2015_202_00.gsh5')

# Plot a flagged waterfall of the data (whether it's residuals or spectra)
plots.plot_waterfall(data)

# Plot the same but show the nsamples intsead of data
plots.plot_waterfall(data, attribute='nsamples')

# Plot the data residuals (if they exist) and don't apply any flags.
plots.plot_waterfall(data, attribute='resids', which_flags=())
```