Information-theoretic methods are now widely used to analyze spike train data. However, developing robust implementations of these methods can be tedious and time-consuming. To facilitate further adoption of these methods, we have developed the Spike Train Analysis Toolkit (STAToolkit), a software package that implements several information-theoretic spike train analysis techniques. This implementation behaves like a typical Matlab toolbox, but the underlying computations are coded in C and optimized for efficiency.

In describing the capabilities of the toolkit, we further distinguish between what we call *information methods* and *entropy methods*.

## Getting started

How can I tell how much information is conveyed by the neural responses I have recorded?

- My experiments involve recording neural responses in two or more behavioral contexts, or in response to two or more stimuli. To what extent do my neural data represent, or encode, these categories?
*Use the direct categorical information method, metric space method, or binless method.*
- My experiments record repeated responses to the same continuously changing stimulus. How can I measure the amount of information that the neural responses contain?
*Use the direct formal information method or the context tree method.*

To help you get started, we have created demonstrations of the various information methods, which you should use as a template for your own analyses. The demonstrations illustrate how to select and use these methods, and what to look for in the output. They also guide the selection and use of the associated entropy methods, options, and parameters.

The direct method makes the fewest assumptions about the nature of the neural code, but generally requires the most data (e.g., hundreds of repeats for some stimuli). The metric space method and the binless method require only about 10 repeats per stimulus, but make assumptions about the nature of the neural code (see the references below). For multineuronal data, only the direct and metric space methods are applicable. The direct methods require one to consider time in discrete bins (i.e., data are symbol sequences), whereas the metric space and binless methods work with continuous time (i.e., data are point processes).
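To make the distinction between discrete and continuous time concrete, the following minimal Python sketch shows one way a list of spike times (a point process) could be turned into the symbol sequences that binned methods operate on. It is an illustration only, not the toolkit's C implementation; the function name `spike_times_to_words` and its parameters are hypothetical.

```python
import numpy as np

def spike_times_to_words(spike_times, t_start, t_end, bin_width, word_length):
    """Discretize one spike train into spike-count bins, then group
    consecutive bins into non-overlapping words (tuples of counts).

    Hypothetical helper for illustration; not part of the toolkit.
    """
    n_bins = int(round((t_end - t_start) / bin_width))
    edges = t_start + bin_width * np.arange(n_bins + 1)
    # counts[k] is the number of spikes falling in the k-th time bin
    counts, _ = np.histogram(spike_times, bins=edges)
    n_words = n_bins // word_length
    return [
        tuple(int(c) for c in counts[k * word_length:(k + 1) * word_length])
        for k in range(n_words)
    ]

# Four spikes in 0.8 s, 100 ms bins, words of four bins each
words = spike_times_to_words([0.05, 0.15, 0.17, 0.62], 0.0, 0.8, 0.1, 4)
print(words)
```

The choice of bin width and word length determines how many distinct symbols can occur, which is one reason binned methods tend to need more data than methods that work directly on spike times.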

This is enough to get started; further detail on the various methods contained in the STAToolkit is presented below.

## Information methods

Information methods are those methods which estimate the mutual information between an ensemble of spike trains and some other experimental variable. We distinguish between *formal* and *attribute-specific* information, as proposed by Reich et al. (2001). Formal information concerns all aspects of the response that depend on the stimulus. It is estimated from the difference between the entropy of responses to an ensemble of temporally rich stimuli and the entropy of responses to an ensemble of repeated stimuli. Attribute-specific information refers to the amount of information that responses convey about a particular experimental parameter. If the parameter describes one of several discrete categories, we refer to it as *category-specific information*.
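As a rough illustration of category-specific information, the Python sketch below computes mutual information from a table of joint counts as the difference between the response entropy and the response entropy conditioned on category. It uses the uncorrected plug-in entropy and is not the toolkit's code; the function names and the layout of `joint_counts` are assumptions made for this example.

```python
import numpy as np

def plugin_entropy(counts):
    """Plug-in entropy (bits) of a histogram of counts."""
    counts = np.asarray(counts, dtype=float).ravel()
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def category_information(joint_counts):
    """Mutual information (bits) between category and response.

    joint_counts[s, r] is the number of trials in which response
    symbol r was observed for stimulus category s (assumed layout).
    """
    joint = np.asarray(joint_counts, dtype=float)
    h_response = plugin_entropy(joint.sum(axis=0))
    # H(R|S) = sum over categories of p(s) * H(R | S = s)
    p_category = joint.sum(axis=1) / joint.sum()
    h_conditional = sum(
        p * plugin_entropy(row) for p, row in zip(p_category, joint) if p > 0
    )
    return h_response - h_conditional

# Responses that identify the category perfectly carry 1 bit for two
# equally likely categories; responses unrelated to category carry 0 bits.
print(category_information([[10, 0], [0, 10]]))
print(category_information([[5, 5], [5, 5]]))
```

With limited data the plug-in estimate is biased, which is why the choice of entropy method matters in practice.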

The current version contains implementations of four information methods:

- Direct method (formal and category-specific information)
  - Strong, S.P., Koberle, R., de Ruyter van Steveninck, R.R., and Bialek, W. (1998). Entropy and Information in Neural Spike Trains. *Physical Review Letters*. 80: 197-200.
- Metric space method (category-specific information)
  - Victor, J.D., and Purpura, K.P. (1997). Metric-space analysis of spike trains: theory, algorithms and application. *Network: Computation in Neural Systems*. 8: 127-164.
  - Aronov, D. (2003). Fast algorithm for the metric-space analysis of simultaneous responses of multiple single neurons. *Journal of Neuroscience Methods*. 124: 175-179.
- Binless method (category-specific information)
  - Victor, J.D. (2002). Binless strategies for estimation of information from neural data. *Physical Review E*. 66: 051903.
- Context tree method (formal information)
  - Kennel, M., Shlens, J., Abarbanel, H., and Chichilnisky, E.J. (2005). Estimating entropy rates with Bayesian confidence intervals. *Neural Computation*. 17: 1531-1576.
  - Shlens, J., Kennel, M., Abarbanel, H., and Chichilnisky, E.J. (2007). Estimating information rates in neural spike trains with confidence intervals. *Neural Computation*. 19: 1683-1719.

The implementations of the direct method and the metric space method have extensions for the analysis of simultaneously recorded spike trains.
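For readers unfamiliar with the metric space approach, the following Python sketch illustrates a spike-time edit distance in the spirit of Victor and Purpura (1997): inserting or deleting a spike costs 1, and shifting a spike costs `q` per unit time. It is a plain dynamic-programming illustration for a single neuron, not the toolkit's optimized C code; the function name and argument names are hypothetical.

```python
def spike_time_distance(train_a, train_b, q):
    """Edit distance between two sorted lists of spike times.

    Costs (assumed for this sketch): 1 to insert or delete a spike,
    q * |dt| to shift a spike by dt.
    """
    na, nb = len(train_a), len(train_b)
    # dist[i][j] = distance between the first i spikes of train_a
    # and the first j spikes of train_b
    dist = [[0.0] * (nb + 1) for _ in range(na + 1)]
    for i in range(na + 1):
        dist[i][0] = float(i)  # delete all i spikes
    for j in range(nb + 1):
        dist[0][j] = float(j)  # insert all j spikes
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            shift = q * abs(train_a[i - 1] - train_b[j - 1])
            dist[i][j] = min(
                dist[i - 1][j] + 1.0,        # delete a spike from train_a
                dist[i][j - 1] + 1.0,        # insert a spike from train_b
                dist[i - 1][j - 1] + shift,  # shift one spike onto the other
            )
    return dist[na][nb]

# With a small q, shifting is cheap; with a large q, it is cheaper to
# delete one spike and insert another.
print(spike_time_distance([0.1], [0.2], 1.0))
print(spike_time_distance([0.1], [0.2], 100.0))
```

The parameter `q` sets the temporal precision at which two spike trains are considered different: at `q = 0` only spike counts matter, and as `q` grows the distance becomes sensitive to ever finer timing differences.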

## Entropy methods

Entropy methods are those methods that estimate entropy from a discrete histogram, a computation common to many information-theoretic methods. For the general user, we recommend adapting the demos for one's own data; the demos select appropriate entropy methods. The advanced user may wish to substitute other entropy methods, or to use the entropy methods as standalone modules (e.g., `entropy1d`). Entropy methods are chosen by including the appropriate code in the `entropy_estimation_method` option (see information options and parameters and implied entropy options and parameters). The included methods are:

- Plug-in (`plugin`)
  - This is the classical estimator, based on the entropy formula *H* = −Σ_{i} *p*_{i} log_{2} *p*_{i}.
- Treves-Panzeri-Miller-Carlton (`tpmc`)
  - Treves, A. and Panzeri, S. (1995). The Upward Bias in Measures of Information Derived from Limited Data Samples. *Neural Computation*. 7: 399-407.
  - Miller, G.A. (1955). Note on the bias on information estimates. *Information Theory in Psychology, Problems and Methods*. II-B: 95-100.
  - Carlton, A.G. (1969). On the bias of information estimates. *Psychological Bulletin*. 71: 108-109.
- Jackknife (`jack`)
  - Efron, B. and Tibshirani, R.J. (1993). *An introduction to the bootstrap*. Chapman & Hall.
- Ma bound (`ma`)
  - Ma, S. (1981). Calculation of Entropy from Data of Motion. *Journal of Statistical Physics*. 26: 221-240.
  - The version included in the toolkit is debiased, as presented in Strong et al. (1998).
- Best upper bound (`bub`)
  - Paninski, L. (2003). Estimation of Entropy and Mutual Information. *Neural Computation*. 15: 1191-1253.
- Chao-Shen (`chaoshen`)
  - Chao, A. and Shen, T.-J. (2003). Nonparametric estimation of Shannon's index of diversity when there are unseen species in a sample. *Environmental and Ecological Statistics*. 10: 429-443.
- Wolpert-Wolf, Bayesian with a Dirichlet prior (`ww`)
  - Wolpert, D.H. and Wolf, D.R. (1995). Estimating functions of probability from a finite set of samples. *Physical Review E*. 52: 6841-6854. Erratum in *Physical Review E*. 54: 6973.
  - Wolpert, D.H. and Wolf, D.R. (1994). Estimating Functions of Probability Distributions from a Finite Set of Samples, Part 1: Bayes Estimators and the Shannon Entropy. arXiv: comp-gas/9403001.
  - Wolpert, D.H. and Wolf, D.R. (1994). Estimating Functions of Distributions from A Finite Set of Samples, Part 2: Bayes Estimators for Mutual Information, Chi-Squared, Covariance and other Statistics. arXiv: comp-gas/9403002.
- Nemenman-Shafee-Bialek (`nsb`)
  - Nemenman, I., Shafee, F., and Bialek, W. (2002). Entropy and inference, revisited. In Dietterich, T.G., Becker, S., and Ghahramani, Z., eds. *Advances in Neural Information Processing Systems 14*. Cambridge, MA: MIT Press. arXiv: physics/0108025.
  - See also the original authors' NSB Entropy Estimation project page at SourceForge.
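To give a feel for how estimators of this kind differ, the Python sketch below implements three simple ones on a histogram of counts: the plug-in estimate, the plug-in estimate with the classical first-order bias correction (*m* − 1)/(2*N* ln 2) bits, where *m* is the number of occupied bins and *N* the number of samples, and a jackknife bias-corrected estimate. These are textbook forms written for illustration; the toolkit's `plugin`, `tpmc`, and `jack` implementations may differ in detail (for example, in how the number of bins is determined), and all function names here are hypothetical.

```python
import numpy as np

def plugin_entropy(counts):
    """Plug-in entropy (bits): H = -sum_i p_i log2 p_i."""
    counts = np.asarray(counts, dtype=float).ravel()
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def bias_corrected_entropy(counts):
    """Plug-in estimate plus the first-order correction
    (m - 1) / (2 N ln 2), with m the number of occupied bins."""
    counts = np.asarray(counts, dtype=float).ravel()
    n_samples = counts.sum()
    n_occupied = np.count_nonzero(counts)
    correction = (n_occupied - 1) / (2.0 * n_samples * np.log(2))
    return plugin_entropy(counts) + correction

def jackknife_entropy(counts):
    """Jackknife bias-corrected entropy:
    N * H_all - (N - 1) * mean of the leave-one-out estimates."""
    counts = np.asarray(counts, dtype=float).ravel()
    n_samples = counts.sum()
    loo_sum = 0.0
    for i in np.flatnonzero(counts):
        reduced = counts.copy()
        reduced[i] -= 1  # remove one sample from bin i
        # bin i holds counts[i] samples, each giving the same estimate
        loo_sum += counts[i] * plugin_entropy(reduced)
    return n_samples * plugin_entropy(counts) - (n_samples - 1) / n_samples * loo_sum

counts = [5, 5]
print(plugin_entropy(counts), bias_corrected_entropy(counts), jackknife_entropy(counts))
```

Both corrections increase the estimate, reflecting the fact that the plug-in estimator tends to underestimate entropy when the number of samples is small relative to the number of bins.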

### Methods for estimating the variance of entropy estimates

The toolkit also provides estimates of the variance of entropy estimates, which can in turn be used to compute confidence limits. These results can be obtained by setting the option `variance_estimation_method` (see information options). The jackknife (`jack`) and bootstrap (`boot`) methods can be applied to any entropy estimate. Additionally, the toolkit includes a variance estimate that is specific to the NSB entropy method (`nsb_var`), and may include other specific variance estimates in the future.
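The general idea behind resampling-based variance estimates can be sketched in Python as follows. The example applies a bootstrap and a jackknife to the plug-in entropy of a histogram; it is an illustration under simple assumptions (independent samples, multinomial resampling), not the toolkit's implementation, and the function names, the number of resamples, and the random seed are all hypothetical choices.

```python
import numpy as np

def plugin_entropy(counts):
    """Plug-in entropy (bits) of a histogram of counts."""
    counts = np.asarray(counts, dtype=float).ravel()
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def bootstrap_entropy_variance(counts, n_resamples=1000, seed=0):
    """Variance of the entropy estimate across histograms resampled
    with replacement from the observed one."""
    counts = np.asarray(counts, dtype=float).ravel()
    n_samples = int(counts.sum())
    rng = np.random.default_rng(seed)
    resampled = rng.multinomial(n_samples, counts / n_samples, size=n_resamples)
    estimates = np.array([plugin_entropy(c) for c in resampled])
    return float(estimates.var(ddof=1))

def jackknife_entropy_variance(counts):
    """Jackknife variance: (N - 1) / N times the sum of squared
    deviations of the leave-one-out estimates from their mean."""
    counts = np.asarray(counts, dtype=float).ravel()
    n_samples = counts.sum()
    occupied = np.flatnonzero(counts)
    weights = counts[occupied]  # number of samples sharing each estimate
    loo = np.empty(len(occupied))
    for k, i in enumerate(occupied):
        reduced = counts.copy()
        reduced[i] -= 1
        loo[k] = plugin_entropy(reduced)
    loo_mean = np.sum(weights * loo) / n_samples
    return float((n_samples - 1) / n_samples * np.sum(weights * (loo - loo_mean) ** 2))

counts = [30, 50, 20]
print(bootstrap_entropy_variance(counts), jackknife_entropy_variance(counts))
```

Given a variance estimate, approximate confidence limits can be formed around the entropy estimate, for example as the estimate plus or minus two standard deviations, assuming the estimator is roughly normally distributed.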

## Modular implementation of information methods

Each information method has a top-level function that performs an analysis on an input data structure; these functions serve users who do not require flexibility. Each information method has also been partitioned into modules corresponding to steps that provide useful intermediate results. The table below depicts the five major information methods and the functions they call. Function `directformal` performs a formal information analysis via the direct method; function `directcat` performs a categorical information analysis via the direct method; function `metric` performs a categorical information analysis via the metric space method; function `binless` performs a categorical information analysis via the binless method; function `ctwmcmc` performs a formal information analysis via the context tree method.

All of the functions are documented in the function reference.

Included demos give examples of how the top-level functions can be used.

## Inputs to the toolkit

We have developed a text-based input file format for the toolkit that is easy to generate. Users also have the option of bypassing the text-based file format and using another means to read the data into the Matlab input data structure.

Documentation of the analysis options and parameters for information methods and entropy methods is available.

## Outputs from the toolkit

Estimated quantities are packaged in data structures with auxiliary information such as variance estimates. See this page for more information.

## The future

This toolkit is one component of a larger endeavor in the field of computational neuroinformatics. We are in the process of integrating the toolkit with Neurodatabase.org (a publicly accessible neurophysiology database), developing a web-based analysis interface, and adapting the toolkit for a dedicated parallel cluster.

We are also working with members of the computational neuroscience community to incorporate their information-theoretic techniques, and we are looking beyond information theory to other methodologies for analyzing neuroscience data. Please contact us if you would like to contribute.