Tips

If you have short tips on how to make use of the STAToolkit, please add them here. If your contribution is very lengthy, consider putting together a separate tutorial page (e.g., mytutorial), and making a reference to it on this page.

Utilities for summarizing a spike train data structure

Sometimes it is useful to provide a “snapshot” of a data structure X, i.e., a brief summary of how many sites (channels) and categories (conditions), how many spikes, their times, etc.

I wrote two utilities, one that summarizes the number of spikes, one that summarizes their times; both are provided (without warranty!) below.

The following Matlab code provides a snapshot of the sites (channels), categories (conditions), and number of spikes:

statin_summ.m
function counts=statin_summ(X,ifshow)
% counts=statin_summ(X,ifshow) summarizes a STAToolkit input
%
%  counts{category}(trial,site)=tally of counts 
%  ifshow=1 to show category-by-category summary (default)
%  ifshow=2 to show a min-max summary
%  ifshow=3 for both
%  
%   See also:  SWATCH2STATIN, TLS2SWATCH, STATIN_SUMMT.
%
if (nargin<=1) ifshow=1; end
%
if (ifshow>0)
    disp(sprintf('number of      sites: %3.0f',X.N));
    disp(sprintf('number of categories: %3.0f',X.M));
    disp('  categ   trials  counts on each channel')
    dstring=repmat('%7.0f ',1,X.N+1);
end
counts=cell(0);
trials=zeros(1,X.M);
for m=1:X.M
    trials(m)=size(X.categories(m,1).trials,1);
    counts_each=zeros(trials(m),X.N);
    for n=1:X.N
        for p=1:trials(m)
            counts_each(p,n)=length(X.categories(m,1).trials(p,n).list);
        end
    end
    counts{m}=counts_each;
    counts_site(m,:)=sum(counts_each,1);
    if (mod(ifshow,2)==1)
        disp(sprintf(cat(2,'%7.0f ',dstring),m,trials(m),counts_site(m,:)));
    end
end
if (mod(ifshow,2)==1)
    disp(sprintf(cat(2,'    all ',dstring),sum(trials),sum(counts_site)));
end
if (ifshow>=2)
    disp(sprintf(cat(2,'    min ',dstring),min(trials),min(counts_site)));
    disp(sprintf(cat(2,'    max ',dstring),max(trials),max(counts_site)));
    disp(sprintf(cat(2,'    med ',dstring),median(trials),median(counts_site)));   
end
return

The following Matlab code provides a snapshot of the sites (channels), categories (conditions), and times of spikes:

statin_summt.m
function times=statin_summt(X,ifshow)
% times=statin_summt(X,ifshow) summarizes the event times in a STAToolkit input
%
%  times{category}(firstlast,site)=first and last event times (NaN if none)
%  ifshow=1 to show category-by-category summary (default)
%  ifshow=2 to show a min-max summary
%  ifshow=3 for both
%  
%   See also:  SWATCH2STATIN, TLS2SWATCH, STATIN_SUMM.
%
if (nargin<=1) ifshow=1; end
%
if (ifshow>0)
    disp(sprintf('number of      sites: %3.0f',X.N));
    disp(sprintf('number of categories: %3.0f',X.M));
    disp('  categ   trials  [min  max] times on each channel')
    dstring=repmat('[%8.3f %8.3f]  ',1,X.N);
end
trials=zeros(1,X.M);
times=cell(0);
%make times_sites for each m
times_sites=repmat([Inf -Inf]',[1 X.N X.M]);
for m=1:X.M
    trials(m)=size(X.categories(m,1).trials,1);
    for n=1:X.N
        for p=1:trials(m)
            tvals=X.categories(m,1).trials(p,n).list;
            if length(tvals)>0
                times_sites(1,n,m)=min(times_sites(1,n,m),min(tvals));
                times_sites(2,n,m)=max(times_sites(2,n,m),max(tvals));
            end
        end
    end
    times{m}=times_sites(:,:,m);
    if (mod(ifshow,2)==1)
        disp(sprintf(cat(2,'%7.0f %7.0f  ',dstring),m,trials(m),times_sites(:,:,m)));
    end
end
if (ifshow>=2)
    disp(sprintf(cat(2,'    all %7.0f  ',dstring),sum(trials),...
        [min(times_sites(1,:,:),[],3);max(times_sites(2,:,:),[],3)]));
end
return

Jonathan D. Victor 2010/02/24 04:54

Exchange and Poisson Resampling

User Susan Travers asked about exchange resampling, a technique used in combination with the metric space analysis by the original authors, J.D. Victor and K.P. Purpura to produce surrogate datasets that match the original data in terms of spike counts in each trial, and average post-stimulus histograms for each condition. She wanted to include this analysis, which is not a part of the STAToolkit. I put Susan in touch with Jonathan Victor by email, who provided (without any warranty!) the following Matlab code.

Note that this code also provides Poisson resampling. Poisson resampling yields surrogate datasets that match the original data in terms of the average post-stimulus histograms for each condition. The Poisson-resampled datasets have spike counts that are Poisson-distributed, rather than matching the spike count distribution of the original data.

Both kinds of resampling (exchange and Poisson) destroy within-trial correlations, but preserve the firing rate envelope.

xin_res.m
function Xres=xin_res(X,mode)
% Xres=xin_res(X,mode) creates an resampled dataset of a single-channel STAToolkit
%   input structure X
%
% X: a spike train input structure
% mode
%   'exch' (default): exchange resampling: spikes within each category are
%      randomly swapped among trials, so that each trial in Xres has the same
%      number of spikes as in the original dataset
%   'pois': all spike times from a trial are randomly reassigned to all
%          trials, without replacement; resampled trials will no longer have the 
%          same number of spikes as the original trials
% Xres:  the resampled spike train
%
%   See also:  TLSINFO_SURRMAKE.
%
Xres=X;
if (nargin<=1) mode='exch'; end
for ic=1:X.M
    counts=[];
    times=[];
    indices=[];
    for itrial=1:X.categories(ic).P;
        nspikes=double(X.categories(ic).trials(itrial).Q); %this was int32
        counts=[counts nspikes];
        times=[times X.categories(ic).trials(itrial).list];
        indices=[indices repmat(itrial,1,nspikes)];
    end
    if strcmp(mode,'exch')
        indices=indices(randperm(sum(counts))); %resampled indices
    end
    if strcmp(mode,'pois')
        indices=ceil(double(X.categories(ic).P)*rand(1,sum(counts))); %resampled indices
    end
    for itrial=1:X.categories(ic).P;
        res_times=sort(times(find(indices==itrial)));
        Xres.categories(ic).trials(itrial).list=res_times;
        Xres.categories(ic).trials(itrial).Q=int32(length(res_times));
    end %itrial
end %ic
return

Note that this function takes a STAToolkit data structure, does exchange or Poisson resampling, and returns the resampled structure, but it only works for single-neuron (not multineuron) data. — Michael Repucci 2009/07/10 17:28

To Jackknife or Bootstrap

In response to a user inquiry, Jonathan Victor offered the following information:

You raise an interesting issue [to jackknife or bootstrap], and I can give you my reasons for what we do.

While it is true that in general statisticians prefer boot to jack, there is a specific reason for not wanting to do so here.

When you do a bootstrap, you will often have datasets in which you've chosen exactly the same spike train twice. This will never happen if you had collected more data. It's not a problem if, say, you were estimating firing rate, but if you are estimating information, it would tend to bias the result – since it would suggest that detailed temporal patterns can be reproduced. This bias does not arise with a jackknife.

Put another way – your point about having more surrogate data sets (as you do with the boot) is correct – it makes the error bars less variable – but they are not necessarily more reliable. That is, the additional surrogates (that you get with the boot) are of questionable value if the surrogates are not typical of the data – and bootstrap surrogates will always have more “clumpiness” than the original data.

Bottom line, I think that the jackknife is more conservative; one risks “fooling oneself” with all of the surrogates that you can generate via the bootstrap. But if you have a gross excess of data (whatever that is), then, in principle, the bootstrap would be better.

One final caveat – neither the jackknife nor the bootstrap give provably rigorous error bars, in that one of the formal conditions for them to work is violated – specifically, there are circumstances in which an infinitesimal change in a data point leads to a discrete jump in information. Both jackknife and bootstrap formally require continuity, and that kind of circumstance violates continuity. (But it is often the case that statistics are useful even if some formal condition is violated – e.g., Gaussianness – so we don't worry about this – it's just full disclosure.)

Michael Repucci 2009/08/31 18:00

The mysterious factor of N-1 in the jackknife

In response to a user inquiry, Jonathan Victor offered the following information:

There is a difference in the formulas for estimating the standard error of measurement for the jackknife and the bootstrap, and it might at first appear puzzling.

In each case, one calculates information from “resampled” datasets: in the jackknife, the resampled datasets consist of the original dataset with a single response deleted; in the bootstrap, the resampled datasets consist of random draws from the original dataset, with replacement. The next step, in each case, is to determine the standard deviation of these resampled estimates. For the bootstrap, this quantity, directly, can be taken as a standard deviation of the information estimate. But for the jackknife, it is multiplied by sqrt(N-1), where N is the number of trials.

The reason for what seems to be the extra factor of sqrt(N-1) in the jackknife is that each of the “leave-one-out” datasets is highly overlapping with the original data, so the leave-one-out estimates are highly correlated. The “extra factor” takes care of the effects of this correlation. It is required for the jackknife, but it should not be included for the bootstrap, or for resamplings that make use of shufflings or alterations of the spike trains themselves.

usage/tips.txt · Last modified: 2011/04/03 00:46 by Jonathan D. Victor