---
title: "Data formats"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data formats}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---


The NeuroDecodeR (NDR) uses two very similar data formats called `raster format`
and `binned format`. For almost all analysis, one starts by saving data from
each site in `raster format`. One then converts the data to `binned format`
using the `create_binned_data()` which has the data from all the sites in a
single data frame at a coarser temporal resolution that is stored in a single
file. The `binned format` data is then used in all subsequent decoding analyses.
More information about what is required to have data in these specified formats
is described below.


## Raster format

Raster format data contains the data at the highest temporal resolution. For
raster data, there is a separate file that contains a data frame of data each
site (e.g., for electrophysiology experiments there is a separate file for each
single neuron, for EEG experiments there is a separate file for each EEG
channel, etc.). The reason for having data from each site in a separate file is
to prevent memory from running out of memory when trying to load data from many
sites when the data is at a high temporal resolution.

For raster format data, the number of rows in the data frame correspond to the
number of trials in the experiment. Data that is in raster format is a data
frame that must contain variables (columns) that start with the following
prefixes:

1. `labels.XXX`  These variables contain labels of which experimental conditions
were shown on a given trial.

2. `time.XXX_YYY`  These variables contain the data for a given time, where
where XXX is the start time of the data in a particular bin and YYY is the end
time. The time interval should be specified such in the form [XXX, YYY), so that
the start time is a closed interval and the end time is an open interval. Thus,
for data that is recording continuously, the value YYY of one bin, would be
equal to the value XXX for the next bin (e.g., `time.100_101`, `time.101_102`,
`time.102_103`, etc.).


There can also be two additional optional variables in a raster format data frame which are: 

3. `site_info.XXX` These variables contain additional meta data about the site.
For example, one could have a variable called `site_info.brain_area` which
indicates which brain region a given site came from. All rows for a given
`site_info.XXX` variable typically have the same value.

4. `trial_number` This variable specifies a unique number for each row
indicating which trial a given row of data came from. This is useful for data
where all sites were recorded simultaneously in order to allow one to do the
decoding on actual simultaneously recorded data (e.g., by using the `ds_basic()`
`create_simultaneously_recorded_populations` argument).

The class attribute for data in raster format should be set to
`attr(raster_data, "class") <- c("raster_data", "data.frame")`. This will enable
the plot() function to correctly plot data in raster format.


### Checking if data is in valid raster format

To test whether data correctly conforms to the requirements of raster format,
one can use the internal function `NeuroDecodeR:::test_valid_raster_data_format()`.


### Example raster-format data

Below is an example of raster format data file from the [Zhang-Desimone 7 object
data set](datasets.html).


```{r load_raster_file}

raster_dir_name <- file.path(system.file("extdata", package = "NeuroDecodeR"), "Zhang_Desimone_7object_raster_data_small_rda")
full_file_name <- file.path(raster_dir_name, "bp1001spk_01A_raster_data.rda")

# test the file is in valid raster format
NeuroDecodeR::test_valid_raster_format(full_file_name)

# load the data to see the variables in it
load(full_file_name)
head(raster_data[, 1:10])

```


## Binned format


Binned format data contains data from multiple sites (e.g., data from many
neurons, EEG channels, etc.). Data that is in binned format is very similar to
data that is in raster format except that it contains information from multiple
sites and usually contains the information at a coarser temporal resolution. For
example, binned data would typically contain firing rates over some time
interval sampled at a lower rate, as opposed to raster format data that would
typically contain individual spikes sampled at a higher rate. Binned format data
is typically created from raster format data using the function
`create_binned_data()` which converts a directory of raster format files into a
binned-format file that is used in subsequent decoding analyses.

Binned format data must be in a data frame where the number of rows in the data
frame correspond to the number of trial in all experimental recording sessions
across all sites.  The binned format data frame must also contain the variables
that start with the following prefixes:

1. `siteID.XXX` A unique number indicating a site a given row of data corresponds to.
These are typically automatically generated by the `create_binned_data()` function.

2. `labels.XXX`  These variables contain labels of which experimental conditions
occurred on a given trial. These are typically copied from the raster data when
`create_binned_data()` is called.

3. `time.XXX_YYY`  These variables contain data in a time range from [XXX, YYY).
These values are typically derived from the raster data `time.XXX_YYY` values when
the `create_binned_data()` is called.


There can also be two additional optional variables in a binned format data frame which are: 

4. `site_info.XXX` These variables contain additional meta data out the site.
For example, one could have a variable called `site_info.brain_area` which
indicated which brain region a given site came from. 

5. `trial_number` This is a variable that specifies a unique for each row
indicating which trial a given row of data came from. This is useful for data
where all sites were recorded simultaneously in order to allow one to do the
decoding on actual simultaneously recorded data (e.g., by using the `ds_basic()`
`create_simultaneously_recorded_populations` argument).


### Checking if data is in valid binned format

To test whether data correctly conforms to the requirements of binned format,
one can use the internal function `NeuroDecodeR:::test_valid_binned_data_format()`.


### Example binned-format data

Below is an example of binned format data file from the [Zhang-Desimone 7 object
data set](datasets.html).


```{r load_binned_file}

binned_file_name <- system.file("extdata/ZD_150bins_50sampled.Rda", package="NeuroDecodeR")


# test the file is in valid binned format using an internal function
NeuroDecodeR:::test_valid_binned_format(binned_file_name)


# load the data to see the variables in it
load(binned_file_name)
head(binned_data[, 1:10])


```