
Cosyne 2025 Tutorial - Eva Dyer - Foundations of Transformers in Neuroscience

By Cosyne Talks

Summary

Topics Covered

  • Scale Neural Data Across Brains
  • Transformers Enable All-to-All Neural Connections
  • Spike Events as Native Tokens Bypass Binning
  • TorchBrain Delivers Scalable Neural DL Tools

Full Transcript

…ator at the Salk Institute for Biological Studies in San Diego, and I am serving as this year's tutorial chair for Cosyne. In that capacity, I would like you to join me in welcoming Dr. Eva Dyer, an associate professor in the Department of Biomedical Engineering at Georgia Tech. For now, should I say "for now"? Some exciting announcements regarding affiliation soon, I hope.

Dr. Dyer leads the Neural Data Science Lab at Georgia Tech, also known as the NERDS lab, definitely one of the top lab acronyms I've encountered, focusing on data-centric AI, representation learning, and AI for science. After earning her bachelor's in electrical and computer engineering and completing her master's and PhD in the same field at Rice, Dr. Dyer served as a research scientist in the Department of Physical Medicine and Rehabilitation at Northwestern, and in 2017 joined Georgia Tech and Emory as an assistant professor in biomedical engineering. Her interests lie at the intersection of machine learning, optimization, and neuroscience, and her lab develops computational methods for discovering the principles that govern the organization and structure of the brain. She has been recognized with numerous honors, including a Sloan Research Fellowship, an NSF CAREER award, the McKnight Technological Innovations in Neuroscience Award, and recently the CIFAR Azrieli Global Scholar Award. Let's once again join in welcoming Dr. Dyer.

[Applause] Great. Thank you all for being here this first day, kicking things off with a bang. If you haven't already: we're going to be having an interactive session after the introduction that I'll give at the beginning, and you can find the materials for the interactive part and the Colab notebooks through this QR code and our tutorial website. So please do open that up, for when we get going a bit later.

(Speak closer for the recording.) Okay, great. All right. So, I'm really excited to tell you today about some work that we've been doing over the past year and a half to, as you'll discover, build out new tools for training deep learning models on neural data. The topic of the tutorial is the foundations of transformers in neuroscience. We're going to spend some time going through the basics of transformers, motivating why transformers for neural data, and in particular thinking about the advantages of these tools as we start to scale up and build larger and larger models that can coalesce data from many different sources and many different brains. So we'll talk about the basics of transformers, then I'm going to tell you about three new packages that we're releasing, which are now on pip, and talk about their features, and then we're all going to get our hands dirty working through some examples of training some of these models.

And welcome in, welcome in, we have another group of folks. Before I get started, I wanted to begin by acknowledging the team of amazing folks that have really made this possible.

First, I'd like to acknowledge and thank Mehdi Azabou and Vinam Arora. Both of them have really been the life force behind the creation of TorchBrain, which I'll tell you about, and which is the culmination of these different packages for deep learning for neuroscience; both are co-creators. Mehdi unfortunately couldn't be here today. We're so sad that he's not with us, but due to all the travel regulations and crazy stuff happening, he wasn't able to make it. Vinam is here. I'd also like to thank Shiva, who also helped to co-create the materials that you'll all be experiencing today. And then we have an amazing group of TAs who will be going around answering questions and helping you with the materials: Cole, Divian, Nanda, Jimang, Avery, and Jing Yun are here. Thank you all for your contributions. These are also folks that have been working intimately with TorchBrain, providing feedback and really helping us launch what we'll be able to share with you today. And then we have a number of other contributors that helped us with the notebooks: Alex, Ian, Sergey, and Julie. As you'll see, this is a really big and collaborative effort, a joint effort between my lab at Georgia Tech, Guillaume Lajoie and Blake Richards at Mila, and Liam Paninski's group at Columbia as part of the ARNI Institute. So if you can all join me in giving everyone a big round of applause. Thank you. They're really the ones that made all this happen, and I'm just excited to be able to tell you about it.

All right. I'm just going to quickly go through our schedule for today, to orient us. For about the first 50 minutes or an hour, I'll be talking about transformers and their application in neuroscience. At that point we'll start transitioning over to the Colab notebooks and the more interactive session. There we're going to focus on two main notebooks, which you'll see on the website that I shared at the beginning. For those of you that came in a little bit later, you should be able to find a link to the website on Whova and the Cosyne tutorial page.

We'll do two things. The first is really describing the tools and infrastructure to create datasets within this framework. We'll get a lot of chances to visualize data and go through some examples, and then we'll have a little bit of time for you to start playing with it. There's going to be a break from 2:15 to 2:45. Then, in the second part of the tutorial, we're going to go through some examples of building a few different deep learning models, both a simple multi-layer perceptron as well as two different transformer models, and then we'll also talk about fine-tuning at the end: the idea that we could have a pre-trained model that we can then adapt to a new dataset from a new brain. Okay, so that sets the stage.

There's been a lot of progress in generative AI, and in AI in many different domains, that has given us some insight into the fact that scale, the amount of data that we can ingest into these models, can really matter and can play a significant role. We think that scale is an important concept in neuroscience as well, because the brain, as we all know, is a very complex and very high-dimensional system. As we try to learn mappings between the brain and behavior, this could be a very complex system or function that we need to learn. And what we know in many domains of AI is that sometimes you need enough data to actually start to reveal very complex nonlinear mappings. So it may be that looking at one dataset, or one brain, or one particular configuration of neural states may not give us a full picture of what the brain is doing. So we really advocate for, and will talk about, the use of scaling as a potential benefit for learning these complex functions.

The way we envision this is that most datasets that we capture in neuroscience are from a limited number of individuals, a small number of subjects, potentially measuring the brain in particular configurations or certain states of consciousness, and only for some limited amount of time; we can't record a brain for all of its lifetime yet (maybe in C. elegans). So really we're seeing small subsets or snapshots of what brain activity looks like. Perhaps by combining and coalescing all of these sources of information about the brain, we can start to stitch together a more coherent and unified picture of what the brain is doing and how it's computing.

I think this visualization, provided by Dan Birman as part of the International Brain Laboratory visualization team, gives us some picture of what we might expect to see. This is visualizing the activity of neurons collected through the IBL from over a hundred different mice. What we can see is that by starting to stitch together all of these datasets, we get a much bigger range of coverage and the ability to actually see what's going on across almost the whole brain during an underlying task. I think that really nicely encapsulates some of the goals and ideas behind scaling up and aggregating more and more sources of data from the brain.

Okay. So that provides some setting, or motivation, for why scale and the integration of a lot of datasets could be important. What we've seen in a lot of other domains is that transformers form the basis for many of the advances in modern AI systems. We've seen that in language, with things like ChatGPT and all these amazing LLMs that have come out and provide really impressive capabilities, as well as in vision and vision-language models, and in many other forms of sequence data. So today we'll talk about what transformers are and how we can apply them to neural data, and I'll walk through, in some level of detail, what the building blocks are and how we take some of these tools that have been established in other domains and port them into our study of the brain.

I'll also say that one advantage of transformers is that they're very flexible systems that allow for processing of many different types of modalities. I hope I'll convince you that this is particularly important for neuroscience, because the datasets we collect often contain many different sources of information that tend to be multimodal. Maybe we're recording aspects of behavior, which is a continuous stream of temporal information; we're recording spikes, which are irregular sequences of events; and there's the trial structure, when the animal started to solve a task or not. So transformers give us a way to process multimodal data very effectively.

Okay. So now I'll talk about the basics of transformers. What is a transformer, and how does it work? The starting point is the data. If we're thinking about language, we have a sentence, which is comprised of a sequence of words or parts of words; in an image, we can think about different patches or regions of the image all being different pieces of the global piece of information that we want to understand. So we start by forming, or creating, tokens, which give us a way of representing these different sources of data. In the context of language, maybe each token is a different word that we might try to build some sort of insight across. So transformers represent sequences, or information in general, as a set or sequence of tokens, and these tokens have two main pieces of information that define them.

A token is represented both by its content, which could be the actual word that we're looking at in the sequence, or the specifics of what's happening within a part of the visual field, and, in addition to the content, by some notion of position. I'll talk about that in a very general sense, but in language the position could be as simple as the beginning or the end of the sentence. We want some way to provide the transformer with information about where in the sequence this content is occurring. Here I just have a simple example of six tokens, and each of them is represented by its content x and its position p.

Okay. So, the main building block of transformers is something called self-attention. What does self-attention do? It allows for the exchange of information across all of the tokens within our sequence. In contrast to, say, an RNN or a recurrent model that only looks at nearby points in its recent past in order to predict the future, the transformer allows all-to-all connection and flow of information across the whole sequence and across all the tokens.

The first step in self-attention is that, for the content or information in each of our tokens, we define three different objects: a set of queries, a set of keys, and a set of values. Each of these is computed by multiplying all of our tokens, all of the information in our sequence (all these x's), by a weight matrix to build the queries, keys, and values. What we can see here is that the output has the same number of tokens as what went in, but we can potentially change their dimensionality. Here is an example where two tokens are multiplied by this weight matrix; we still have two representations coming out, but they could have a different dimension.

We can now think about these keys and queries as a way for all of the tokens to talk to each other, or to ask for information from other parts of the sequence. The actual way of computing the attention scores through this self-attention computation is given by the equation here, and really it is driven by the inner product between a query from the i-th token and a key from the j-th token. And this is all-to-all: if I send out a query saying I'm looking for certain types of information, all of the other tokens can give me their keys, from which I can compute this inner product and calculate my attention score. Then there is a softmax normalization, which we can think of as sparsifying the attention scores I'm receiving from all of the other tokens within the sequence.

Finally, the output of this self-attention block, which is parameterized by the queries, keys, and values that we defined, is just the sum of the values over all the j tokens, scaled by these attention scores. So in the end, the output for token i is given by this equation here. Okay.
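In symbols, the attention scores are a_ij = softmax_j(q_i · k_j / sqrt(d)) and the output is y_i = sum_j a_ij v_j (the usual scaled dot-product form). To make that concrete, here is a minimal single-head self-attention sketch in PyTorch; it is an illustration of the mechanism described above, not the tutorial's own code, and the dimensions are arbitrary:

```python
import torch
import torch.nn.functional as F

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention over a sequence of tokens.

    x:  (n_tokens, d_model) token contents (position already added)
    Wq, Wk, Wv: (d_model, d_head) projection matrices
    """
    q = x @ Wq                        # queries, one per token
    k = x @ Wk                        # keys
    v = x @ Wv                        # values
    d = q.shape[-1]
    scores = q @ k.T / d ** 0.5       # all-to-all inner products q_i . k_j
    attn = F.softmax(scores, dim=-1)  # normalize over j for each token i
    return attn @ v                   # output_i = sum_j attn_ij * v_j

# toy usage: 6 tokens of dimension 8, projected to dimension 4
x = torch.randn(6, 8)
Wq, Wk, Wv = (torch.randn(8, 4) for _ in range(3))
y = self_attention(x, Wq, Wk, Wv)     # shape (6, 4)
```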

So, I know that we're just getting going and this is being filmed, but I really do hope that if people have questions and want to interrupt at any point, please do; I'd be happy to clarify. Yes? Okay, perfect.

[Audience question, off mic.]

So the W matrices will have some fixed dimensionality, depending on what we want the output dimension of the values, keys, or queries to be. In this example we have a 4x3 matrix. Each of the tokens is multiplied by the weight matrix in order to get a value out. Yeah.

Okay. So one thing that I didn't exactly describe yet is the idea of the position embeddings. In the previous slide I talked about how the tokens are used to compute the attention scores and the outputs. We can think about the position information in the easy example that I gave before: if we have a sequence of words, the position could be as simple as marking the beginning and the end. There are also sinusoidal (sine-cosine) position embeddings, which give us a way of parameterizing, or dividing, an interval into a fixed set of intermediate positions. But in general, we can think about position as a very general concept that tells us how the different tokens within our sequence, or our data, are connected, either in time or in space. On the left, I'm showing an example of relative position encoding, where nearby tokens weigh each other's attention more, or are more influential in sending messages to one another. This can be visualized by the idea that if this is my central token, I might be influenced more heavily by other tokens within my surround. But in more general terms, we can also think about encoding position in a more arbitrary way. Yes?

Yeah, I simplified things a little bit there. Typically you'll have some way of encoding the position of each of your tokens, and typically we'll actually add the position embedding to the content embedding before we go about computing the self-attention and the different scores. So for many transformers, you add the position encodings to the content and then apply the self-attention operation. We'll also see some models that integrate the position information into the attention calculation directly, so there are different ways to integrate position information. We'll talk about a model called POYO, which was developed in my lab and uses relative position encoding; it modulates the attention scores based on the difference between the two tokens' times. So yes, there are different ways, and what I'm showing here is a bit of an abstraction, where I'm looking at it in terms of how the attention is modulated by the position information. Yeah, thank you, it's a good clarifying point.
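To illustrate the "add position to content" variant just described, here is a minimal sinusoidal position-embedding sketch. It is a generic illustration with arbitrary sizes, not code from POYO or TorchBrain (POYO instead folds relative timing into the attention computation itself):

```python
import torch

def sinusoidal_position_embedding(n_tokens, d_model):
    """Classic sine/cosine position embeddings, one row per position."""
    pos = torch.arange(n_tokens, dtype=torch.float32).unsqueeze(1)   # (n, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)             # even dims
    freq = 1.0 / (10000 ** (i / d_model))
    pe = torch.zeros(n_tokens, d_model)
    pe[:, 0::2] = torch.sin(pos * freq)
    pe[:, 1::2] = torch.cos(pos * freq)
    return pe

# content embeddings for 6 tokens; position is added before self-attention
x = torch.randn(6, 8)
x_in = x + sinusoidal_position_embedding(6, 8)
```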

And so over on the right, I'm showing an example where we could take this to the extreme: maybe it's not sequential or temporal data at all, but instead we have information about how the different tokens are connected to one another, for instance in a graph. In that case, if two nodes are connected, they can impact each other's attention, and information can flow across all of those interconnected nodes. So this gives us a sense that transformers are actually a very general concept that can be used on many different data structures, not just sequential data but also graphs, and other domains that might not have a well-defined notion of position a priori. Okay.

So now that we've covered the basics of what transformers are, the tokenization idea as well as self-attention, we'll transition into talking about how we apply these sorts of models to neural data, and we'll talk about a few different examples that have come out over the past few years, since this has really started to get going within neural data analysis.

In order to talk about the applications of transformers to neural data, there will be three main ingredients. First, how do we tokenize neural data? How do we turn it into the sort of content and position information that I've talked about? Step two is: what type of architecture will we choose? What I described is just a basic building block, a standard self-attention mechanism, but there are other forms of architectures and other ways to process this type of data using attention in general. And step three is: how do we optimize? What is the goal, the underlying objective of the transformer, what is it trying to predict? We'll first talk about tokenization and then move on to the other parts.

Sorry, what's up? Oh, okay. So, step one: tokenization.

I think many of us are very familiar with the idea of taking populations of neurons and turning them into some vectorized input for population-level models. This first transformer for neural data, the NDT, utilizes a simple temporal-slicing, patch-based tokenization to prepare the neural data for later processing.

We can think about a simple example where we have three neurons firing action potentials, emitting spiking events, which are depicted here over time. First, if we have a really long recording, we can imagine taking a chunk out of it: a context window over which we'll build and tokenize the neural data. A simple way to start would be to choose a bin size, an amount of time over which we're going to count the number of spikes or compute the firing rate of each neuron. Now what we can do is slice along space, looking at the firing rates of all D neurons that I've recorded from jointly, and make that into our token. In this example, within the first bin the first neuron emitted two spikes, the second neuron one, and the third zero, so that results in the vector [2, 1, 0], and that's the raw part of our first token. We'll typically then process that with a small MLP to map that neural population vector into a token vector of some fixed dimension. The same would happen for the next chunk in time, [1, 1, 2], and we'll process that through the MLP; the output is five tokens that we would use in the transformer.
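A minimal sketch of this binned, population-level tokenization (spike counts per bin passed through a small MLP) might look like the following; the bin size, dimensions, and layer sizes are illustrative, not the NDT authors' exact choices:

```python
import torch
import torch.nn as nn

def bin_spikes(spike_times, spike_units, n_units, t_start, t_end, bin_size):
    """Count spikes per (time bin, unit) inside a context window."""
    n_bins = int((t_end - t_start) / bin_size)
    counts = torch.zeros(n_bins, n_units)
    for t, u in zip(spike_times, spike_units):
        b = int((t - t_start) / bin_size)
        if 0 <= b < n_bins:
            counts[b, u] += 1
    return counts  # (n_bins, n_units)

# toy data: 3 neurons over a 1 s window, 200 ms bins -> 5 tokens
spike_times = [0.05, 0.12, 0.31, 0.33, 0.40, 0.77]
spike_units = [0, 1, 0, 2, 1, 0]
counts = bin_spikes(spike_times, spike_units, n_units=3,
                    t_start=0.0, t_end=1.0, bin_size=0.2)

# a small MLP maps each population vector (one bin) to a token embedding
embed = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 32))
tokens = embed(counts)  # (5, 32): one token per time bin
```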

Another way that we can consider the tokenization is, instead of grouping all of the neurons together into one token, to split everything by channel, or by neuron. This work from 2022, EIT, was the first to propose this kind of channel-level tokenization. One of the motivations for this work is that different brains, or different neural populations, will have different dimensionality, and when we choose to group all of the neurons into one token, we're assuming things about the ordering and the dimension of the neurons that we want to process. So the motivation here was to allow for some amount of generalization by doing tokenization at the channel level. Here we can imagine taking each neuron, splitting it into bins over time, and then processing all of the neural activity for each neuron through an MLP, or some small model, to get a representation or tokenization of the data. And this provides a new kind of ability: if our tokens are at the level of individual neurons, we can start to understand how those different neurons are actually interacting through the transformer. Right. Sorry, yes?

What's an MLP? Oh, an MLP is a multi-layer perceptron, thank you, which is really just a sequence of linear layers, each followed by some sort of activation function. It's maybe one of the simplest neural network models that one might consider using, and we'll talk more about it in the interactive session; we'll actually go through a basic MLP.

How do you train it? Oh yeah, we will get there. At this point I'm just talking about how we tokenize the neural data, but there are different objectives. In the case of ChatGPT, well, ChatGPT is more of a forward-prediction model, so the idea is: can you predict what the next token will be? There are also masked models, which mask out different parts of the input, and the goal of the model is to fill in those missing components. Or you can train in a supervised way: you could train these models to predict behavior, predict a stimulus, predict any sort of input or output of the neural system as well.

Yep. Sorry, say that again? Yeah. Yes, so these first few models were initial attempts at defining new ways to use transformers on neural data, but these models were not yet used at scale. I'm going to get into some examples of how we can address some of those challenges in upcoming slides. But you raise a good point: the dimensionality with this type of approach is going to grow significantly.

We're going to do that next. Yes, great idea, that's what we thought too. Actually not this one, the next one.

That same year there were also works that took neural activity and processed two different types of patches. In this work, STNDT from 2022, they have an architecture where they do attention both over space, over different neurons, as well as over time, through two parallel branches within the transformer architecture. So they create both spatial and temporal patches.

And then, to the suggestion from the attendee back there: more recently, in 2023, we introduced a model called POYO, which sidesteps the need to bin entirely and really tries to use the true nature of neural data, at least at the spiking level, where you have events that become your tokens. Within POYO we'll also define a context window, and now the idea is to treat each spike, each event, as a token, as shown here. When we flatten things out and create a sequence where each spike is a token, there are two main things we need to communicate to the model: one is the neuron that the spike came from, and the second is the time at which the event occurred, because we're no longer binning and collapsing things into fixed bins. So now we can actually use the real, absolute timing information as a key ingredient that goes into the transformer.

How do I normalize time? In this case we define a context window, and over that context window we can define unit time; we assume it starts from zero. A lot of the context windows we use, at least in this model, are about one second long. So we accumulate all the spikes that were emitted over some fixed amount of time, start from zero as absolute time, and parameterize them accordingly.

Okay. So now we can think about this unit embedding, or rather: how do we convey information about the different neurons and the spikes that they're emitting? Within this particular framework, what we do is allow the model to learn an embedding vector for every neuron that was observed throughout the experiment. In language, we can think about the content of each token, each word, as having some sort of semantic embedding: we look at the word "dog" or the word "cat", and the representations of those two words might be close within this embedding space. Similarly, we envision that through this learnable neuron-level embedding used in the tokenization, we might also be able to discover similar semantics and organization: neurons that have similar receptive fields, have similar tuning properties, or maybe come from the same cell type could be mapped to similar latent vectors within this embedding space. This is something that is learned by the model over time. So now each of our tokens consists of this learnable neuron-level embedding, together with the temporal information about when the event was fired.

We've also extended this, taking a similar approach to tokenize calcium imaging data. Now, instead of spikes, you have a regular time series of information based on the calcium signal. In this case, we can use basically the same idea of having a neuron-level embedding for each of the neurons, but instead of spikes we represent every single timestamp as a token, and we also add information about the amplitude of the time series at that particular point in time.

So these give us a range of different possible ways of taking multivariate time series, or irregular streams of events, and boiling them down into a format where we can put them into a transformer model, optimize, and build some desired outputs.
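To make the spike-as-token idea concrete, here is a small sketch (my own illustration, not the released POYO code) of building one token per spike from a learnable unit-embedding table plus the spike's time within the context window:

```python
import torch
import torch.nn as nn

class SpikeTokenizer(nn.Module):
    """Each spike becomes a token: unit embedding (content) + spike time (position)."""
    def __init__(self, n_units, d_model):
        super().__init__()
        self.unit_embedding = nn.Embedding(n_units, d_model)  # learned per-neuron vectors

    def forward(self, spike_times, spike_units, window_start):
        content = self.unit_embedding(spike_units)   # (n_spikes, d_model)
        times = spike_times - window_start            # continuous time within the window
        return content, times                         # how time enters attention (e.g. a
                                                       # relative/rotary encoding) is up to
                                                       # the downstream model

# toy usage: 6 spikes from 3 recorded units inside a 1 s context window
tok = SpikeTokenizer(n_units=3, d_model=32)
spike_times = torch.tensor([0.05, 0.12, 0.31, 0.33, 0.40, 0.77])
spike_units = torch.tensor([0, 1, 0, 2, 1, 0])
content, times = tok(spike_times, spike_units, window_start=0.0)
```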

Okay. The next step is to consider, now that we have a tokenization, a way of taking the neural data and turning it into tokens, the different forms of architecture that have been leveraged for this. A number of existing models, NDT (so far I've talked about NDT1, but there have been some nice extensions, which we'll talk a little more about; they use very similar tokenization schemes to what I described for NDT), the multi-task masking work, which is recent work out of Columbia, and EIT, all use standard self-attention blocks, or transformers, to build inferences about the neural data.

In addition to self-attention, in POYO we also leverage a different architecture, the Perceiver IO, which was originally used for time series and audio modeling and in LLMs, adapted here to neural data. The idea behind the Perceiver, behind this idea of cross-attention, is the following. Imagine we take a one-second context window: if we have a thousand neurons, we might have a ton of tokens, depending on the firing rate, but if we only look at 10 neurons, we'd have a much smaller sequence. One of the advantages of this mechanism is that it allows us to deal with highly variable-length sequences. Instead of computing attention across all of the input tokens, which could be very long or short sequences, we use cross-attention to take all of those tokens and project them into a latent bottleneck, a set of fixed latent tokens. This is depicted here: instead of the queries, keys, and values all being defined over the tokens in our input, we have a set of learned latent tokens that provide the queries, while the spike tokens, the inputs, provide the keys and the values. We can think of it this way: a learned latent token might have certain things it is looking for within the spike sequence. Maybe it's querying the firing rate of neuron one or neuron ten, or maybe another query is about the sequence of spiking events that occurs across an ensemble of neurons. It's a very general concept: these queries can look over the entire sequence for the different types of information they're interested in extracting. Yes?

Yeah, I think in essence this would tend to have a kind of rank-limiting effect, through this bottleneck, this mapping to the latent dimension. What's nice, however, is that whereas the complexity of the self-attention mechanism with n tokens is n squared, because we have to compute attention scores over all possible pairs, we now have a computation that is n times L, and all of the further computations happen in the latent bottleneck, so they're more like L squared. So it has a similar effect, but it gives us a way of reducing computation without having to explicitly compute the all-to-all connectivity.

Okay, great. So once we have used this cross-attention mechanism to take the potentially large set of spike tokens and map them down into this set of learned latents, we then apply multiple rounds of self-attention within the latent space to compute more and more complex pieces of information from the data.
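Here is a minimal sketch of that encoder pattern: cross-attention from a small set of learned latent tokens into a variable-length set of input tokens, followed by self-attention among the latents. It is an illustration of the Perceiver-style idea under assumed dimensions, not the actual POYO or TorchBrain implementation:

```python
import torch
import torch.nn as nn

class LatentEncoder(nn.Module):
    """Cross-attend a variable number of input tokens into L learned latents,
    then run self-attention among the latents (cost ~ n*L + L^2, not n^2)."""
    def __init__(self, d_model, n_latents=64, n_heads=4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, tokens):                 # tokens: (batch, n_tokens, d_model)
        b = tokens.shape[0]
        lat = self.latents.unsqueeze(0).expand(b, -1, -1)
        # latents provide the queries; input tokens provide keys and values
        lat, _ = self.cross_attn(query=lat, key=tokens, value=tokens)
        # further processing stays in the fixed-size latent space
        lat, _ = self.self_attn(lat, lat, lat)
        return lat                             # (batch, n_latents, d_model)

# works for any number of input tokens (e.g. spikes in a 1 s window)
enc = LatentEncoder(d_model=32)
out = enc(torch.randn(1, 500, 32))             # 500 spike tokens -> 64 latents
```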

And I can just say this here: with POYO and POYO+, we train the model (which I'll talk about next, with the optimization) to solve different supervised decoding tasks. One thing that's nice is that, just as we represented each of our spike tokens as events with continuous time values, at the output we also apply a cross-attention mechanism, now querying from the latents, and what we get out are predictions of the behavior that we want to decode. This also allows us to query at irregularly spaced intervals; perhaps you have different behaviors that are happening at different frame rates or different times. It's a really flexible mechanism that lets you query at any arbitrary point in time in order to produce the resulting behavioral covariates of interest.

Okay, and then step three is: what is the objective? What do we want to train these models to do? In the community there have been two main approaches: unsupervised as well as supervised models. The first core set of techniques that has been successful is masked-modeling approaches. These models all take the same type of approach, where the idea is that you mask out different tokens, however you've tokenized your data, masking out portions of neural activity, and the goal of the transformer is to learn a rich enough representation that it can look at the context of what else is going on within the neural recording and predict what the activity was for those held-out neurons or time points.

This visualization here is from the multi-task masking paper, MtM, where they introduced a number of different masked-modeling objectives that challenge the model to learn different ways to predict neural activity from its context. A standard approach that has been explored in the past is the idea of co-smoothing: masking out a random configuration of neurons and trying to predict what those neurons were doing. In addition to this co-smoothing objective, they also show the value of introducing causal prediction tasks (can I predict what's happening in the future with these neurons?) as well as region-level masking. In this case they were working with Neuropixels, so you have activity that spans many different brain areas, and you can play a lot of cool tricks: you can mask out an entire brain region's worth of activity and see whether, given the other brain regions, you can predict what was going on within that held-out region; or an inter-region prediction task, where you mask out all the other regions and then just try to use your local regional information to predict what's going on within a held-out neuron. The field has really evolved a lot just over the past few years, and this model shows us that there can be a lot of diverse ways in which we can challenge models through these sorts of masked-prediction tasks.
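As a rough illustration of the co-smoothing flavor of masked modeling (mask a random subset of neurons, predict their binned activity from the rest), here is a sketch of the training objective only. The stand-in model, Poisson loss choice, and dimensions are my assumptions, not the MtM authors' code:

```python
import torch
import torch.nn as nn

def masked_modeling_step(model, counts, mask_frac=0.25):
    """counts: (n_bins, n_units) binned spike counts for one context window.
    Mask a random subset of neurons and ask the model to reconstruct them."""
    n_units = counts.shape[1]
    masked = torch.rand(n_units) < mask_frac           # which neurons to hide
    inputs = counts.clone()
    inputs[:, masked] = 0.0                            # zero out held-out neurons
    pred = model(inputs)                               # (n_bins, n_units) predicted log-rates
    # Poisson negative log-likelihood on the held-out neurons only
    # (in practice, ensure at least one neuron is masked)
    loss = nn.functional.poisson_nll_loss(pred[:, masked], counts[:, masked],
                                          log_input=True)
    return loss

# hypothetical stand-in model: in practice this would be a transformer encoder
model = nn.Sequential(nn.Linear(50, 128), nn.ReLU(), nn.Linear(128, 50))
loss = masked_modeling_step(model, torch.poisson(torch.ones(20, 50)))
```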

And then, of course, in many neuroscience experiments we care about taking brain activity and predicting either some aspect of the inputs coming in or the motor behaviors coming out. So within these models, the goal in the end is to predict some aspect of, say, motor activity, or of the stimulus in a visual condition. Yes?

[Audience question, partly inaudible.] Oh, in the prediction, the outputs, how faithful is the actual activity? So these are all binned; we don't exactly get the spike times directly. But yeah, it's a really great question. We have some folks here, so Cole is here, he was the senior author on this work, so maybe during the breaks we can chat a little bit more about some of the investigations there. It's a good question. Maybe you have a specific hypothesis we could talk about.

Okay. Yes? Yes, so that's a good question. There are two different ways in which you can take the pre-trained model. In one case, you could assume that you've held out some trials, or some chunks of time, from the model. You've trained your foundation model, and now you ask: how well does this predict on those held-out trials? In many cases that's a standard thing, essentially zero-shot, where I just put in the new data and then I can predict. But in some of these models, in particular POYO, where you have these unit embeddings, some content information about each of the neurons that has been learned throughout training, the pre-trained model has no access to information about brand-new neurons that it has never seen before. And so what we show in a number of these works is that you can fine-tune, or actually re-learn, the unit embeddings for a new collection of data, from a new brain the model has never seen before, with a small amount of labeled data. So at this point you can transfer onto new animals, but you need some way of figuring out aspects of the new neurons that the model has never seen before. Yep?

So I think that is still under investigation. There are some approaches that are moving us closer towards something that could be more zero-shot, in the same way as an LLM where I just put something in. Imagine it's like this: I want to go to the LLM, but I speak a completely different language that it has never seen before. I can still prompt it, but it doesn't know what my language is, and it can't fit it into the context of what it has seen before. And I think neuroscience is really unique in that, when we just have a collection of neurons that are doing stuff, there's no way to uniquely identify individual neurons without some way of realigning things. In our prior work, we did show that you can hold all of the weights of the transformer fixed and just update the unit embeddings, basically giving it a translation of the new language, without having to change the rest of the model. That at least shows us that the rules it has learned are applicable; it's just that we need to figure out this initial translation step. That's a good question, and we will be going more through fine-tuning in the interactive session.
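In PyTorch terms, that "freeze the transformer, re-learn only the unit embeddings" recipe might look roughly like the sketch below. The attribute name, forward signature, and loss are hypothetical placeholders, not the TorchBrain fine-tuning API:

```python
import torch
import torch.nn as nn

def finetune_unit_embeddings(model, finetune_loader, n_new_units, d_model, lr=1e-3):
    """Adapt a pre-trained model to a new animal by re-learning only the
    per-neuron embedding table (attribute and batch keys here are hypothetical)."""
    for p in model.parameters():
        p.requires_grad = False                        # freeze all transformer weights

    # fresh embedding table sized for the new animal's recorded units
    model.unit_embedding = nn.Embedding(n_new_units, d_model)

    opt = torch.optim.Adam(model.unit_embedding.parameters(), lr=lr)
    for batch in finetune_loader:                      # small amount of labeled data
        pred = model(batch)                            # forward signature depends on the model
        loss = nn.functional.mse_loss(pred, batch["behavior"])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```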

Okay, yes? When creating these, did you look at any of the masked-modeling, generative approaches to neural data, or were you really just focused on behavior as an output?

So I think there's been an evolution of things. Because this was a whole new way of tokenizing the data, with new architectures, we chose to start with the supervised task, but a number of folks on our team have also been looking into using some of these same building blocks on masked-modeling tasks as well. It's a good question. Yeah, all the things, all at once; but it's been a progression to get there. I will say, though, that I think having behavior can be very impactful in aligning many diverse datasets: having something to really ground the model towards learning something more common and unified. And I still think it's unclear, with masked modeling, what is actually aligning the data in that case; maybe things about the brain region, or other information, can help.

Okay. So all of this has happened really recently, over the past two years or so. What we've found is that these new tokenizations, architectures, and objectives have allowed for scaling, which was part of our motivation at the beginning: scaling both across the number of subjects and across the amount of time, the recording hours. This slide shows, across many of these different models, the progress we've seen over just the past few years in terms of the number of subjects and the amount of recording time. So transformers, and the different models I've described, provide a new pathway towards building really large-scale, potentially foundation, models for neural data that can coalesce information from all of these different sources. Yes?

In terms of all these different models? Oh, for NDT one, two, and three: those are different; there are differences in the tokenization as well as in the scale of the data, and each new model was actually retrained from scratch. We do have examples where you can pre-train and then fine-tune on other animals, other sessions, or other tasks; that would be the case where we take the model weights as-is and then just do some fine-tuning on the new data. But so far, I think because there are a lot of open questions about how to do this and what is actually beneficial, we haven't yet, as a field, started with one model as our basis and built from there. It's a good question.

Okay. So, it's 1:30 now. What I'll say is that what I've described are the building blocks, the things that have happened over the past few years in terms of building out these models. But the process to get there was really engineering- and time-intensive. Scaling, and writing efficient code that lets you easily adapt to new datasets, is really challenging. Real-world datasets, especially in neuroscience, are very multimodal and very asynchronous; they contain both events and time series. And when you have a really long recording, maybe one time you want to analyze it at a short time scale, and another time you care more about long-term trends. What this means is that we have these really rich datasets, but it's often quite hard to prepare them and get them into a format where we can actually start to scale up, or even put them into a transformer in the first place.

So what we'll talk about for the rest of the tutorial are new tools that our team has been developing that let us both sample really easily from these multimodal datasets and train deep learning models on neural data. We really think that by accelerating things and having better tools as a field, this will help us accelerate the science within the field as well. To address some of these challenges, we have created a number of new tools for training deep learning models on neural data.

This consists of three main packages, which we'll talk about today. The first is temporaldata, which gives us a way of slicing through a dataset, extracting all these different multimodal event streams, and putting them into a format that can be easily understood by PyTorch or by a deep learning framework. The second is brainsets. Actually, I'm just going to step through them here, because I have some schematics.

So, temporaldata, as I said: in the end, when we try to train a deep learning model, one of the main ingredients is our data loader, and how we sample from our dataset efficiently as we pull different chunks of data from different points in time, or from different datasets. Temporaldata is the backbone that allows us to build out those data samplers, those data loaders, within PyTorch.

One of the really cool advantages is that it allows for what's called lazy loading, which means you don't have to pull your entire dataset into memory, bin it, and move through it in a fixed way. It gives you a very flexible way to access, within the data object, some arbitrary chunk in time, and then tokenize it however you want; you could use a different bin size, or different sampling rates over which you're pulling. Maybe this feels a little bit nuanced, but for those of you that have worked with these types of datasets, you might appreciate that this is not a trivial task. So with lazy loading, instead of pulling everything into memory, we can go through and pull down a chunk quite easily when we're building out our batch for deep learning training.
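As a conceptual sketch of what such a window-sampling data loader does, here is a plain PyTorch example written against an HDF5 file with assumed dataset names; it illustrates the idea of slicing context windows on demand, not the actual temporaldata API:

```python
import h5py
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class WindowDataset(Dataset):
    """Sample fixed-length context windows from a long spike recording stored in HDF5."""
    def __init__(self, path, window=1.0):
        self.path, self.window = path, window
        self.file = None
        with h5py.File(path, "r") as f:
            # timestamps (assumed sorted) are loaded once; heavier per-spike
            # arrays stay on disk and are sliced per window below
            self.t = f["spikes/timestamps"][:]
        self.starts = np.arange(0.0, self.t[-1] - window, window)

    def __len__(self):
        return len(self.starts)

    def __getitem__(self, i):
        if self.file is None:                       # open lazily, once per worker
            self.file = h5py.File(self.path, "r")
        t0 = self.starts[i]
        lo, hi = np.searchsorted(self.t, [t0, t0 + self.window])
        return {
            "times": torch.as_tensor(self.t[lo:hi] - t0, dtype=torch.float32),
            "units": torch.as_tensor(self.file["spikes/unit_index"][lo:hi]),
        }

loader = DataLoader(WindowDataset("session.h5"), batch_size=None, shuffle=True)
```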

Okay. And then all of this comes together and culminates in what we call TorchBrain. We're hoping, you know, in the graph community they have PyTorch Geometric, and there are all these tools in other domains that just make things a lot easier and accelerate the advances, and we're hoping that TorchBrain can be the beginning of that. It's really designed to be flexible, efficient, and scalable. Within TorchBrain we can define different samplers, we have some models that are already built in that you can use, and we're hoping to expand on this further. We're really excited to work with the community and with you all; if you have things you want to contribute, we're really excited to make this a community effort.

TorchBrain provides a collection of modules for building models for neuroscience. We have state-of-the-art transformer blocks and different tokenizers; currently there are implementations of POYO and POYO+ in there, and we're really close to also incorporating different masked models (those are coming soon, the NDTs), as well as more basic building blocks like the multi-layer perceptron and temporal convolutional networks, plus the multi-task masking approach I talked about earlier.

And finally, brainsets is a collection of standardized datasets that are already prepared for training. For people interested in testing their model, or benchmarking different models on some of these existing datasets, these are already available. A lot of these, actually all of the current ones, are from non-human primates during a variety of different reaching tasks, but we're also going to be releasing the Allen Brain Observatory calcium imaging data, as well as IBL data and the FALCON benchmark, through brainsets. We're hoping that this also allows for reproducibility, and for everyone to be able to easily use new datasets to test their models.

Okay, so that takes us to the end of this first session. In summary: I've talked about how transformers provide a way to model neural data and learn very complex functions of neural dynamics. We talked about the main building blocks for transformers: how to tokenize, what the architectures are, and what some of the different objectives are, motivating all of this through the notion of wanting to scale in neuroscience and needing efficient code and infrastructure to do that. And then, finally, I gave a brief overview of the new packages that we'll be diving into now. I'll release all the slides so you have them; they also include references to many of the models that I talked about here, with a little bit of info on each.

And then, just before we switch over to the interactive portion, I wanted to give a plug: we're going to be hosting a two-day workshop here at the conference, during the workshop sessions, on foundation models for neuroscience. If you're interested in learning more about these topics, we'll have a number of really exciting talks as well as panel discussions, and I hope that you can join us for some of that. There's also an upcoming symposium that will be talking about scaling and a lot of these topics, at the Champalimaud in October, so please put that on your calendars as well.

Okay, great. So this is the time when we're going to start transitioning over to the Colab notebooks, and we'll start going through those. For each of them, I'll walk us through some of the ideas and ground us in the content, and then you will have some time to work through some different exercises in the notebook. So go ahead and open up your laptops, if you don't have them open already. You can find the materials at this link: cosyne-tutorial-2025.github.io. Cool.

So I think we have maybe about 10 minutes just to transition and give you a little break to let all that settle in. If you already have everything pulled up, do try to go ahead and run it, because we were having a lot of people trying to request an instance at once, so we just want to make sure everyone can get theirs going. We'll start back up at 1:50, so 10 minutes from now.

I think we'll get started with the first notebook.

Okay.

Alrighty. So hopefully everyone has been able to get a T4 instance. Okay, all the TAs are just so

instance. Okay, all the TAs are just so excited to help you with these next parts. Um, okay. So as I've already sort

parts. Um, okay. So as I've already sort of motivated in the slides um you know yeah the the idea behind

Torch Brain was to be able to make training deep learning models on neural data you know easy and efficient. And so

in this first notebook, we're going to kind of go through some of the main

um components that allow us to build out uh data objects that then will allow for this lazy loading and very efficient um

sampling of the data within TorchBrain. So within this notebook um

Within this notebook there are three main parts. The first is just talking about data and data objects. The second part is talking about slicing — this notion of pulling a chunk out through a data object which could actually be very large and not even fit into your memory. And then the third part goes into how we build samplers using this framework. Okay. And we have our documentation here. The documentation is very interactive, with a lot of visuals, so for any questions you have you can also dive into the documentation, and of course ask us today or in the future. Okay. So the first part is here; let me just zoom in a little more.

Okay, great. So, in terms of temporal data and these different objects, what are the main components? I've already talked about this a little bit — this idea that in neuroscience we often have multimodal data. That could be, let's say, a behavior like a reaching movement, or a cursor, which is sampled at some underlying rate and is a regular time series object. In addition to behavior we also have irregular time series. As we talked about, if we're looking at spikes, these are irregular because they're not sampled at exactly the same rate; it's just whenever a spike occurs. So we could have long stretches of time where we have no events, and in order to efficiently represent those sorts of sparse data formats, we don't want to just bin everything and put a lot of zeros in between. Really dealing with the true nature of these event-based data requires a different way of formatting that information. So we have regular time series and irregular time series — this is the idea of event data here — but we could also have irregular time series where we're missing data over some chunk, like maybe we turned off our recording device and weren't recording some pieces of the data consistently over time. And then there are intervals, which could represent any number of things. Trial start and trial end is one example, where we have an interval such that, when we process our data, we may only want to constrain ourselves to certain types of intervals rather than the whole stretch of time. These are really unique data formats that, in neuroscience and maybe other areas of science, are kind of unique to us.

to us. And so um when working with this data, we might in some cases have to do things like

okay, I have some bin size for my spike data and I have some frame rate over which I'm capturing some behavioral

information. So oftentimes we just end

information. So oftentimes we just end up aligning everything and defining the same sampling rate. So you can have like a onetoone mapping between each of your

behaviors and each of your bins, right?

And so this can for some types of analyses potentially be limiting. And so

I think, you know, this gives us a way to more flexibly define a lot of these different um sources of information or with

different frame rates. Okay. So the

Okay. So the first thing we'll do is just install the package, temporaldata. It's on pip — yay — so it's just a pip install.
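As a minimal setup sketch (assuming the PyPI name matches the spoken "temporal data" and that these classes are exposed at the package top level — check the docs if not), a Colab cell might look like:

```python
# In a Colab/Jupyter cell, a leading "!" runs a shell command:
# !pip install temporaldata

# The main data objects discussed in this notebook:
from temporaldata import (
    ArrayDict,
    IrregularTimeSeries,
    RegularTimeSeries,
    Interval,
    Data,
)
```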

And just for this tutorial, to step through these things, we're going to start by working with just a single session. We're pulling from the DANDI archive, which, if you haven't heard of it, is an open repository for a lot of different types of neural data as well as behavior, all in NWB format. So this is just pulling down one session's worth of data, but in these scalable models we can use this repository to easily pull down many different sessions or different types of data sets.

Okay. And so the session we'll use in this example is from Matt Perich and Lee Miller — Lee Miller's lab at Northwestern. They have a non-human primate performing, in this case, just a simple center-out reaching task. In some of the models we've trained there are more diverse types of reaching, for example using a touchscreen, but in this case the animal is holding a planar manipulandum and sees a cue toward one of eight different targets, and it's instructed to reach to each of the cues it receives. The data set consists of neurons from both premotor and primary motor areas, so PMd and M1, and at the output we'll have the cursor position in both X and Y. So we have a two-dimensional behavioral output that we'll be interested in visualizing and potentially decoding with a decoding-based model using transformers.

Okay. And so to start — this is really cool — this notebook is super interactive, and it might actually be useful for analyzing your own data or just playing with data. You'll see this symbol here for the different interactive cells within the notebook, and you can go through and play with them. So this is really cool: we're visualizing the neural population activity, so spikes, as well as these reach interval objects that I've already alluded to, and then the velocity in X and the velocity in Y. And here we can actually press play. Isn't that cool? You could also speed it up if you want to go through brain activity in 8x time. What we can see here really just motivates the idea that reach intervals don't always happen at exactly the same chunk in time. This could be something that is encoded within your data but is actually not so easy to standardize — you can't just say, oh, every 10 seconds we have another reach. So this is showing us all of the aspects of the data within this particular data set.

Okay. So now what I'll talk to you about is how, when we build brainsets — when we prepare raw data to be fed into a transformer or into TorchBrain — we need to encode all of these different parts of the data set as different objects. There are four main data objects within temporaldata. We have intervals, regular time series, and irregular time series, which we've already talked about, and then we also have a way of encoding just an array, or a dictionary of vectors. So in the case of the POYO model, where we have to learn unit embeddings for each of the different neurons, we can initialize and define them using this ArrayDict object. Here I have some examples of how you can read all of the different neurons within the recording into this ArrayDict using temporaldata. This also lets us encode things like the location of each of the individual units, so here we also have representations of the different brain areas that each of the neurons comes from. So this first object is representing the unit information: at this point it's basically just a lookup table where every neuron gets a unique identity.
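As a rough sketch of that idea (the field names are made up for illustration; the actual notebook reads this information out of the NWB file), the unit lookup table might look something like this:

```python
import numpy as np
from temporaldata import ArrayDict

# Hypothetical unit table: one entry per recorded neuron, each with a
# unique identity plus any per-unit metadata (here, the brain area).
units = ArrayDict(
    id=np.array(["elec01/unit0", "elec02/unit0", "elec02/unit1"]),
    brain_region=np.array(["PMd", "M1", "M1"]),
)
print(len(units.id))  # number of units in the recording
```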

In addition to that, we have our irregular time series, which lets us encode the timing and other information about the underlying spikes. Here we have an example of reading all of the spikes into this irregular time series object.
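A minimal sketch of that, with made-up spike times (the `domain="auto"` argument and the extra `unit_index` array are assumptions based on the package's documented pattern; any per-event array can ride along with the timestamps):

```python
import numpy as np
from temporaldata import IrregularTimeSeries

# Hypothetical spike events: times in seconds plus which unit fired.
spikes = IrregularTimeSeries(
    timestamps=np.array([0.012, 0.087, 0.090, 0.153, 0.401]),  # sorted times
    unit_index=np.array([0, 2, 1, 0, 2]),  # index into the unit table above
    domain="auto",  # infer the valid time range from the timestamps
)
```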

I'm not going to go through every line of this code — we don't actually have time — but I'm hoping this provides a good starting point for understanding what all the different objects are and how they come together within temporaldata. Yes? [Audience question] With a neuroscientific data set we have spiking and behavior; what's a case where it's better to use a regular time series?

So with the behavior — let's say you're just recording, I don't know, a visual cue or something that you have on the screen, and it's 30 frames per second. Really, anything where there is a regular interval over which the data is collected can be represented as a regular time series. You could potentially also store it as an irregular time series object, but encoding it as regular is nice because you basically just specify the start, the end, and the frame rate, and then everything is assumed to be regularly spaced over that interval.
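To make that trade-off concrete (plain NumPy here, not the temporaldata API): a regular series only needs a start time and a rate, while an irregular series carries an explicit timestamp for every event.

```python
import numpy as np

# Regular: a 30 fps cue signal. Sample times are implied by (start, rate),
# so only the values need to be stored.
rate, start = 30.0, 0.0
cue_values = np.random.randn(300)                        # 10 s of data
implied_times = start + np.arange(len(cue_values)) / rate

# Irregular: spike events. Each event needs its own timestamp, and long
# silent stretches simply contain no entries (no padding with zeros).
spike_times = np.array([0.012, 0.087, 1.953, 7.420])
```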

Okay, great. So now we've encoded the spikes as an irregular time series within temporaldata. This is also interactive here: we can see all of the spikes that are encoded within our data set.

Okay. The next type of data object is our regular time series. Anything that has to do with the behavior — so, in this case, things like the cursor position, velocity, and acceleration — can be encoded as a regular time series. Here we're just pulling these different fields from the NWB file, the cursor position, velocity, and acceleration, and then creating our regular time series from all of these sources of data. What's nice, as we'll see later, is that maybe one model wants to extract or predict the acceleration and another one wants to predict velocity. So we have a flexible way to query each of those different sources of data, but at this point all of that information can be stored within the same object.
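A hedged sketch of what that might look like (the field names `pos`/`vel`/`acc` and the `domain` argument follow the pattern above but are assumptions, not the verbatim notebook code):

```python
import numpy as np
from temporaldata import RegularTimeSeries, Interval

# Hypothetical cursor streams sampled at 100 Hz for 10 s.
n, rate = 1000, 100.0
cursor = RegularTimeSeries(
    sampling_rate=rate,
    pos=np.random.randn(n, 2),   # x/y position
    vel=np.random.randn(n, 2),   # x/y velocity
    acc=np.random.randn(n, 2),   # x/y acceleration
    domain=Interval(start=np.array([0.0]), end=np.array([n / rate])),
)
# One model can later query cursor.vel, another cursor.acc —
# everything lives in the same object.
```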

Okay. And this is just a visualization of the cursor position over time; we can see the complexity of the positional information embedded within this regular time series.

And then finally, the other main piece of information is the interval, which is used in this example to encode the trial tables — the start and the end of the different trials. So here the interval is encoded through the start and the end time, and then maybe you also have information about whether it was a successful or a failed trial, or any other metadata that has to do with that interval, which you can also encode in this values field, along with the target ID, that is, where the animal was reaching to. In this case we'll have eight different targets, and all of that information can be encoded into these interval objects.
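A minimal sketch of such a trial table (again, the metadata field names are illustrative assumptions):

```python
import numpy as np
from temporaldata import Interval

# Hypothetical trial table: three reaches with per-trial metadata.
trials = Interval(
    start=np.array([0.5, 3.2, 6.9]),        # trial start times (s)
    end=np.array([1.7, 4.4, 8.1]),          # trial end times (s)
    target_id=np.array([3, 0, 7]),          # which of the eight targets was cued
    success=np.array([True, True, False]),  # any other metadata rides along
)
```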

Okay. And this is looking at a visualization — sorry, I don't know if this is running. Okay, so this is now just visualizing all of the different reach intervals. We can see that there are chunks of time where the animal wasn't being instructed to make any of these movements, and those can happen over very irregularly spaced amounts of time.

Okay. And then finally, once we can extract all of these different fields out of our NWB file, or out of our data file, we can put them all together into this Data object, which is really a container for multiple temporaldata objects — all these different pieces that come together. In the end, this is the key container that we'll use to query and slice through different parts of the data set, and it has all of the essential pieces.
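Putting the pieces together might look roughly like this (self-contained toy stand-ins for the objects sketched above; the constructor arguments are assumptions to be checked against the temporaldata docs):

```python
import numpy as np
from temporaldata import Data, IrregularTimeSeries, RegularTimeSeries, Interval

# Tiny stand-ins for the spikes, behavior, and trials built earlier.
spikes = IrregularTimeSeries(timestamps=np.array([0.1, 0.4, 0.9]),
                             unit_index=np.array([0, 1, 0]), domain="auto")
cursor = RegularTimeSeries(sampling_rate=100.0, pos=np.random.randn(100, 2),
                           domain=Interval(start=np.array([0.0]),
                                           end=np.array([1.0])))
trials = Interval(start=np.array([0.1]), end=np.array([0.8]))

# The Data object is the container we later query and slice; its domain
# describes the span of time the session covers.
data = Data(spikes=spikes, cursor=cursor, trials=trials,
            domain=Interval(start=np.array([0.0]), end=np.array([1.0])))
```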

Once we define our data objects, we then store this information in an HDF5 file. The thing that's really nice about HDF5 is that it allows for the lazy loading I talked about: instead of pulling everything into memory, we can just pull down the chunks of data from this larger data set that we want in our training loop or our batches. So this is really just to show you that, under the hood, all of these brainsets have already been prepared in this way — we take all the information, put it into this Data object, and save everything as an HDF5 file so that TorchBrain can then easily access and interface with it.
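The general idea behind that lazy loading, sketched with plain h5py rather than the library's own save/load helpers: opening the file doesn't load anything, and only the requested slice is ever read into memory.

```python
import h5py
import numpy as np

# Write a large-ish array to disk once (a stand-in for a prepared brainset).
with h5py.File("session.h5", "w") as f:
    f.create_dataset("spike_times",
                     data=np.sort(np.random.rand(1_000_000) * 3600))

# Later, at training time: opening the file does NOT load the data.
f = h5py.File("session.h5", "r")
dset = f["spike_times"]      # still on disk
chunk = dset[1000:2000]      # only this slice is read into memory
f.close()
```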

Okay, so I know we're running up to the break at 2:15, so at this point I'm going to go through just these last two parts, and then we're going to have a 30-minute break for those of you who want to do whatever during the break. For those of you who want to stick around and keep playing with the notebook, we'll also have the TAs still here.

So then, in the rest of the notebook, we talk about how to slice data, or slice through data objects. This was the picture that we saw before. At a high level, we have all of these different types, or streams, of data in our data object, and the key thing when we're defining our data samplers and data loaders is that we want to specify a start and an end time and then slice through this irregular time series to get out all the data between that start and end time. So, thinking back to when we were talking about tokenization, we had a context window over which we were tokenizing, and this could be thought of as slicing out a one-second, or some other, context window from the data. Normally, if you were trying to do this yourself, there are a number of things you'd have to handle, like first checking whether there is even any data from a given field during that time. What this does is hide all of that code under the hood, and at the end of the day we can specify the start and end time in absolute time rather than in bins or samples — rather than saying "sample 10 through 30," we can just use time as the way to slice through the data object. And this is what's actually happening within TorchBrain when defining the data loaders.
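Under the hood, slicing an irregular time series by absolute time boils down to something like this (a plain NumPy sketch of the idea, not the TorchBrain code):

```python
import numpy as np

def slice_events(timestamps, values, start, end):
    """Return the events falling in [start, end), in absolute time (seconds)."""
    lo = np.searchsorted(timestamps, start, side="left")
    hi = np.searchsorted(timestamps, end, side="left")
    # Re-reference times to the start of the window, as a tokenizer would see them.
    return timestamps[lo:hi] - start, values[lo:hi]

spike_times = np.array([0.2, 0.9, 1.1, 1.8, 2.5])
unit_index = np.array([0, 1, 0, 2, 1])
t, u = slice_events(spike_times, unit_index, start=1.0, end=2.0)
# t -> [0.1, 0.8], u -> [0, 2]
```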

And when we slice through the data, it gives us some information about what's under the hood. In this example, within our slice there's just one interval that overlaps with this particular chunk of time, along with the brain-set or data-set information about the session, the position, velocity, and acceleration, and the number of neurons being encoded — 55 neurons, all part of the irregular time series.

Okay. So now we can put all of this together: we have this dataset object, and in order to generate a sample we specify the recording ID as well as the start and the end time, and then we can just slice through. At this point we're now pulling in TorchBrain — the PyTorch-based torch_brain package — in order to look at these different samples. So we can get a sample within TorchBrain by specifying the ID, the start, and the end time; here we're just pulling out that first second's worth of data. Okay.

And what's nice is that, once we've defined this object and we have this idea of how to slice through the data at some chunk in time and create a sample with it, TorchBrain provides a number of data samplers that give structured ways of pulling out these chunks of data, to make things really easy. I guess there are two main ways one might naively think about building samples for training a model. As we know, we'll need some training data, and we'll often need some validation data as well as some testing data. One option is to take the whole chunk of time — the definition of the full recording — randomly sample these one-second windows, or context windows, from the data set, and randomly assign each one to training, validation, or test. This is our random fixed window sampler, where you can just tell it how much of the data you want to sample for train, test, and validation. And then, instead of doing that randomly, we also have ways of defining your training, test, and validation sequentially. So you only train on, say, the first half of the data, the next chunk might be for validation, and the later, subsequent times could be test. So instead of randomly sampling, you can use time as a way of slicing out your train, test, and validation sets.

So this is an example of the random sampler. We define the amount of time, and each of these red boxes shows one of the slices that we pulled out. And what we can see is that if we generate this again, it will randomly sample the full chunk of time in a different way each time. So this is the random fixed window sampler — also interactive. Cool. So this gives you a lot to play around with.
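The core logic of that random fixed-window sampler, sketched in plain Python (TorchBrain's samplers add bookkeeping for recording IDs and interval constraints, but the idea is this):

```python
import numpy as np

def random_fixed_windows(total_duration, window, n_samples, seed=None):
    """Draw n_samples windows of fixed length uniformly over [0, total_duration]."""
    rng = np.random.default_rng(seed)
    starts = rng.uniform(0.0, total_duration - window, size=n_samples)
    return [(s, s + window) for s in starts]

# e.g. twenty 1-second context windows from a 600-second recording
windows = random_fixed_windows(total_duration=600.0, window=1.0, n_samples=20)
```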

Let's see. Okay. And then here I have an example where I've now split things over time, and we can see some of the different examples of what can be pulled out. Boom, boom, boom. Okay. This just shows you that we could also only sample data from, say, when a reach is occurring. So we can also do set-wise operations to say: only give me training data, or only give me valid samples, over specific intervals. And here we're seeing an example where we've left out certain chunks of time and it can automatically go in and find those. Okay.

So, we're going to get more into the details of TorchBrain, but we have a few different methods for manipulating neural data, including binning spikes. That's nice because you could also adaptively bin spikes at different rates.
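For reference, binning spike times at a chosen rate is essentially just a histogram over the window (plain NumPy sketch; TorchBrain provides its own utilities for this):

```python
import numpy as np

spike_times = np.array([0.012, 0.087, 0.090, 0.153, 0.401, 0.950])
bin_size = 0.02  # 20 ms bins; change this to re-bin at a different rate
edges = np.arange(0.0, 1.0 + bin_size, bin_size)
counts, _ = np.histogram(spike_times, bins=edges)  # spike counts per bin
```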

Okay. So now we'll give you all a break, and you can also check out some of the different exercises we have here at the bottom and hopefully start playing around with this notebook. Feel free to — yeah, we have 30 minutes now, so we'll come back at 2:45. If anyone has a question, just raise your hand and the TAs will come around and answer any questions you have. All right.

Thank you so much.

[Applause]
