We Resurrected a Dead Language
By Tsoding Daily
Full Transcript
Hello everyone and welcome to yet another recreational programming session with a Mr. Zozen. So we've been developing our B compiler for, I don't know, several months already? I think maybe a month or something like that.
And there was an issue created pretty much at the beginning of development suggesting that we try to compile existing B compilers with our B compiler. This is not necessarily self-hosting, and as I've already said multiple times, I do not really plan to self-host my compiler. You can rat-hole about definitions (people love to do that), but for me personally, self-hosting is when a compiler compiles its own source code. You may have your own definition of self-hosting; you do you. And here we have a suggestion to compile other existing B compilers, ones that are self-hosting, with our B compiler, and I said somewhere in the discussion that this could be a very cool benchmark, or I would say a milestone, for our development. Especially if we take something historical, and this actually seems to be something historical: "a project to resurrect Unix on the PDP-7 from a scan of the original assembly code." So if we can take something historical and compile it with our B compiler, we can say that the goal of this project is pretty much achieved.
And after that goal is achieved, the question will be: what do we do from there? I have a couple of ideas. I like the B programming language for recreational programming. As I've said multiple times, programming in B feels like programming in assembly but without the usual assembly [ __ ]. Once we are able to compile actual existing B compilers with our B compiler, I want to explore that side of B: a recreational programming language that is like assembly but without the assembly [ __ ]. Anyway, here is the source code
of that B compiler. It is written in B, so it's self-hosted, and it's actually pretty big: almost a thousand lines of code. Well, not really a thousand, 825. But it's actually less capable than our B compiler. As far as I know, it only compiles to PDP-7 assembly. It literally outputs PDP-7 assembly, so you need a separate assembler to translate that into machine code; it doesn't generate machine code directly. Our compiler actually targets a lot of different things, so let's rebuild it and take a look at the targets. So far it supports five targets: it compiles to Windows and Linux on x86_64, to Linux on ARM, to Uxn, which is a virtual machine, and to 6502. Because it supports a lot of things, our compiler is more complicated than this one, and on top of that it provides nice diagnostics. Here the diagnostics are pretty weak; as far as I know, they just output characters, as described in the reference by Ken Thompson. So let's go ahead and see whether our compiler is capable of compiling this source code, and if it isn't, we may try to fix our compiler, or adjust it, or something like that. This kind of exercise is actually pretty good as a reality check. We're kind of cooking in our own isolated environment, and maybe we are completely out of touch with what the B programming language actually is, so this is a very good reality check. So here is the historical B compiler. The copyright is not Ken Thompson, so I suppose... oh yeah, it's recreated from the assembly code. They didn't really have the source code; it was translated back from assembly into a higher-level language, I suppose, or something like that. It's kind of hard to tell.
Yeah, that looks like B code, honestly. There's some unusual stuff in here, and that's because the escape character is a little bit different. Anyway, let's go ahead and try to compile this thing with our compiler and see where it goes. It instantly fails. Okay, so it depends on a bunch of things, a bunch of externals, and they're not even marked as external.
Oh, this is very interesting: it's an existing function. Ah, [ __ ]. I remember that some time ago, I think it was Spearman, the main architect of our Uxn target, said that functions present in the current translation unit are not considered external functions. So essentially you're not supposed to put them into extrn. And this, I guess, is the first reality check.
I wonder if we can hack the source code so it compiles, like XT defaf? What if I put XT defaf in here? Okay, that seems to be working, more or less. All right, BL end. We can go like that. So here's the error.
All right, so is error also a function in here? Can I just find it? Yeah, it's used a lot; there is a lot of function usage, and what's interesting is that the functions are all used out of order. This is actually a very interesting point: these function semantics mean that B never needed forward declarations. C needs forward declarations; B never did. B was ahead of its time. It was kind of like a two-pass compiler, I suppose, but what it was probably doing was just leaving holes that were back-patched later. I guess that was the case.
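The back-patching idea guessed at above can be sketched in a few lines. This is a minimal illustration in C, not the historical compiler's code: the code buffer, `emit_call`, `define_func`, and the fixup list are all hypothetical names. A call to an unknown function emits a placeholder and remembers its position; when the definition shows up, every remembered hole is patched with the function's address.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical one-pass code generator (not the historical compiler's code).
   A call to a function that has not been defined yet emits a placeholder
   ("hole") and records a fixup; when the definition is reached, every
   recorded hole for that name is patched with the function's address. */

#define MAX_CODE   256
#define MAX_FUNCS  32
#define MAX_FIXUPS 64
#define HOLE       (-1)

typedef struct { const char *name; int addr; } Func;  /* defined function */
typedef struct { const char *name; int at; } Fixup;   /* hole waiting for an address */

static int   code[MAX_CODE];
static int   code_len;
static Func  funcs[MAX_FUNCS];
static int   func_count;
static Fixup fixups[MAX_FIXUPS];
static int   fixup_count;

static Func *find_func(const char *name) {
    for (int i = 0; i < func_count; i++)
        if (strcmp(funcs[i].name, name) == 0) return &funcs[i];
    return NULL;
}

/* Emit the target slot of a call. */
void emit_call(const char *name) {
    assert(code_len < MAX_CODE);
    Func *f = find_func(name);
    if (f != NULL) {
        code[code_len++] = f->addr;          /* already defined: use its address */
    } else {
        assert(fixup_count < MAX_FIXUPS);
        fixups[fixup_count++] = (Fixup){ name, code_len };
        code[code_len++] = HOLE;             /* forward reference: leave a hole */
    }
}

/* A definition starts at the current code position; back-patch earlier holes. */
void define_func(const char *name) {
    assert(func_count < MAX_FUNCS);
    funcs[func_count++] = (Func){ name, code_len };
    for (int i = 0; i < fixup_count; i++)
        if (strcmp(fixups[i].name, name) == 0)
            code[fixups[i].at] = code_len;
}
```

For example, `emit_call("error")` followed by `define_func("error")` leaves `code[0]` holding the address where `error` starts, with no forward declaration ever written.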
We could try to declare every such function in an extrn, but I think that's going to be very tedious considering the size of this code. So we actually have to do something about that. Let me see what we can do. Let me redownload this just in case and try to build it. "Could not find name." I'm going to go and find this particular error in our code. Okay, what is this function?
This is the compile primary expression function, so this is just a primary expression. When we encounter an ID, we look up that ID in the scope. And by the way, this is why we require externs for functions that are not visible yet: so that we can find them in the scope. If we can't find a name in the scope, it throws an error. And down there, after the expression is compiled, we check whether it is followed by an open paren, and if it is, we treat that specific ID as a function call. So I wonder, can we just insert some sort of hack in here? For instance, if you can't find a variable, just treat it as an external anyway.
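The hack described above, "if a name isn't in scope, treat it as an external", might look roughly like this. The stream makes the change in the compiler's own source; this sketch is in C with a hypothetical symbol table (`Symbol`, `lookup`, `resolve_or_extern` are illustrative names). It also keeps the line where the implicit external was first used, which is the kind of information a later human-readable error would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical symbol table: when an identifier is not found in scope, it is
   declared as an implicit external instead of failing with "could not find
   name". The first-use line is kept so a later diagnostic could point at it. */

typedef enum { SYM_LOCAL, SYM_GLOBAL, SYM_EXTERNAL } SymKind;

typedef struct {
    const char *name;
    SymKind kind;
    bool implicit;   /* true if created by the fallback rather than an extrn */
    int line;        /* where the name was declared or first used */
} Symbol;

#define MAX_SYMS 64
static Symbol syms[MAX_SYMS];
static int sym_count;

Symbol *declare(const char *name, SymKind kind, bool implicit, int line) {
    assert(sym_count < MAX_SYMS);
    syms[sym_count] = (Symbol){ name, kind, implicit, line };
    return &syms[sym_count++];
}

Symbol *lookup(const char *name) {
    for (int i = sym_count - 1; i >= 0; i--)  /* newest first, so inner scopes shadow */
        if (strcmp(syms[i].name, name) == 0) return &syms[i];
    return NULL;
}

/* Resolve an identifier in a primary expression, falling back to an external. */
Symbol *resolve_or_extern(const char *name, int line) {
    Symbol *s = lookup(name);
    if (s == NULL)
        s = declare(name, SYM_EXTERNAL, true, line);
    return s;
}
```

The trade-off matches what the stream observes next: an unknown name no longer stops compilation, so a genuinely missing function only surfaces later, at assembly or link time.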
Okay, so that will create an interesting situation: when you try to use a function that does not exist, I think for some of the targets we will not generate an external declaration for foo. So it's going to be a linker error, which is actually, I guess, fine. Maybe this is what we have to do here. This is kind of interesting; I'm really curious. I want to recompile the entire thing and try hello world. Here is the hello world example, hello.b. I'm going to run it, and in here I'm going to try to also call foo. It will compile, but I feel like it's not going to link. Yes... it didn't even get to the linking stage; it didn't even assemble, because foo was not marked as external, so no external declaration was generated. But if I do something like that, now it is a linker error. Here it is an assembly error. And on top of that, we can define foo in here, and in that case it is going to compile.
Okay, so just by allowing calls to non-existing functions we kind of work around that particular problem. But it's kind of a hack. So here's an idea: let's hack our compiler to the point where it can compile this B compiler, and through the hacks we will see how out of touch with reality we are. I think it's a pretty good idea. So the goal of today's stream is: hack the compiler until it can compile this compiler. Anyway, in here I want to write something like: TODO: this will make the assemblers fail when the function does not exist. Make them output a human-readable error that points to the correct location. Because right now the underlying assembler in the backend is just going to fail without providing any useful human-readable information. But it will help our compiler compile the other compiler, hopefully. Anyway,
so let's try to compile it one more time. Lexer error: character literal contains more than two characters. Ah yes, [ __ ], classic. A character literal in B allows you to put two characters in it, and they basically get packed into a 16-bit number. But what's interesting is that B historically never used backslash for escaping characters.
Originally it used star for escaping characters. So what we actually have here is a star escaped with a star, which is a single character. And by the way, nominal recently introduced a historical mode, which allows you to use star as the escape character. We can take a look at that. There were some modifications... there you go: it makes the compiler strictly follow the description of the B language from the Users' Reference by Ken Thompson, as much as possible. That's basically what we have here. If you try to run hello world with the historical mode set, there will be a lot of interesting things. For example, it won't allow you to use C++-style comments, because B never had them, so first of all you would have to do something like this. And second of all, it's going to literally treat backslash-n as backslash-n, because a newline is star-n.
Historically, that's how it's going to work. So I suppose it only makes sense to compile this B compiler in historical mode. All right, let's give it a try. I'm going to build b.b with historical mode set and... unknown escape sequence starting with r. Oh, this one is interesting. There's a little bit of weird stuff in here: r. I don't think we have that in the lexer. Let me look at the escapes. We only support n and t. And I don't remember whether Ken Thompson actually described this escape sequence. Let me see... escapes. Okay, so you have end-of-file. Why would you need to escape that? Does it have some special meaning or something? Tab... Wait, there's no r in here.
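For reference, a historical-mode escape decoder might look like this. It assumes the escape list from Ken Thompson's Users' Reference to B (`*0`, `*e`, `*(`, `*)`, `*t`, `**`, `*'`, `*"`, `*n`); note that `*(` and `*)` are how that table spells curly braces. `*r` is not in the table: it is the carriage-return extension this source uses, added here the same way the stream ends up adding it. Mapping `*e` to ASCII EOT (4) is an assumption, and `decode_star_escape` is a hypothetical name.

```c
#include <assert.h>

/* Decode the character that follows '*' in a historical-mode B string or
   character literal. Returns -1 for an unknown escape. */
int decode_star_escape(char c) {
    switch (c) {
    case '0':  return '\0';
    case 'e':  return 4;      /* end-of-file marker; EOT value is an assumption */
    case '(':  return '{';
    case ')':  return '}';
    case 't':  return '\t';
    case '*':  return '*';
    case '\'': return '\'';
    case '"':  return '"';
    case 'n':  return '\n';
    case 'r':  return '\r';   /* extension: not in Thompson's table */
    default:   return -1;
    }
}
```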
Wait, did I get scammed? Is that even a historical B compiler? It's using some extensions. It is less historically accurate than our [ __ ]. What? What? Who submitted that? I got scammed. I got [ __ ] scammed. But honestly, again, there's no very rigid description of what B was. B is more of a vibe; it was never formally standardized or anything like that. So: important milestone no more. Well, okay, it is still pretty important. It's also a historical recreation, a "project to resurrect Unix." Yeah, I think it's pretty close. Maybe not super close, but they actually support this kind of stuff. Like, I don't understand.
Can somebody explain to me what the [ __ ] the point of escaping curly braces is? Why? This is so weird. Do they have some special meaning? They don't even have a special meaning within a string literal. Ah, maybe there were keyboards that didn't have them. Okay, okay, okay. That's fair. That's actually fair. But curly braces are used within the source code itself, to denote blocks of the language. It doesn't make any sense; escaping things only makes sense if they are something special. I do not understand. Anyway, whatever. Fings. Oh, sweet, sweet summer child. Fings.
Holy [ __ ]. Yeah, [ __ ] hstrings, that's for sure. All right, we got some subs by the way, a lot of subs actually. Thank you so much, Aqua here, for the tier one with the message "Milestone today." Exactly, exactly, my friend. Thank you so much for the tier one. The Mark, thank you so much for the Twitch Prime with the message "sub zoding. Hello." Hello, hello. Hey you here, thank you so much for the sub: "Hey, love your streams." Thank you, thank you. Dex Jax, thank you so much for the Mesopot. Thank you so much, really appreciate it. Okay,
so that's kind of bizarre. I suppose this is a carriage return or whatever the [ __ ] it is. We can add support for that; it's kind of whatever. Let's freaking add it. So this is going to be that, and there we go, here is your r thingamajig. Can I rebuild the entire thing like so? Okay: literal, character, string, identifier... oh, you can have negative numbers in initialization. Well, it almost reached the end, which is kind of cool; I didn't expect that. If you take a look at how we parse literals with minuses... oopsy-doopsy, token minus. Where is the token minus? Here it is. This is a primary expression, and when we encounter a minus we literally emit a negate operation. So we support negative numbers at runtime, but if we implement a constant folder, that will just get optimized out, so who cares, am I right?
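The constant-folding remark above can be made concrete with a tiny AST pass. This is a sketch over a hypothetical two-node expression tree (`Expr`, `fold` are illustrative, not the compiler's actual data structures): a negate applied to a literal is rewritten into a negative literal, so no runtime negate instruction is needed. The negation is done on unsigned values so that it wraps in two's complement instead of overflowing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { EXPR_LIT, EXPR_NEG } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int64_t value;         /* used by EXPR_LIT */
    struct Expr *operand;  /* used by EXPR_NEG */
} Expr;

/* Fold bottom-up; returns the (possibly rewritten in place) node. */
Expr *fold(Expr *e) {
    assert(e != NULL);
    if (e->kind == EXPR_NEG) {
        Expr *inner = fold(e->operand);
        if (inner->kind == EXPR_LIT) {
            e->kind = EXPR_LIT;
            /* wrapping (two's complement) negation; avoids signed overflow on INT64_MIN */
            e->value = (int64_t)(0 - (uint64_t)inner->value);
            e->operand = NULL;
        } else {
            e->operand = inner;
        }
    }
    return e;
}
```

Folding `-(-7)` bottom-up first turns the inner node into the literal `-7`, then the outer one into `7`.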
So it will just get optimized out. In our case, the minus is not even part of the constant. But the question is: is it part of the definition in the reference? If I take a look at constants... I do remember this kind of thing. Look at that: when you define a variable, specifically a global variable, you can initialize it with an ival. An ival is either a constant or a name. The name is understandable: just the name of another variable. And if you take a look at the definition of a constant, it's written in terms of digits, which is a rather weird way of putting it, so basically a number, a character, or a string. There's no minus anywhere in here. I'm telling you, our compiler is more historically accurate than whatever the [ __ ] this is.
What the [ __ ]. Excuse me. Okay, we can try to support that. Where is compile program? This is the top level. We encounter an ID; if after the ID we have an open paren, this is a function definition, as you can see. Then there's the assembly definition, which we don't care about, and the variable definition, which is what I'm looking at right now. We're expecting lots of things here, and this is a very important part: if the token is an int literal or a char literal, we set the value as a literal, and then we use that value in the global variable definition. So what I think we have to do here is expect a minus: if it is a minus, then right after it we expect an int literal, and we just return that int literal, but negative. Look at that: if it's a minus, we expect an int literal and just emit its negative.
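The initializer change described above amounts to one extra case in the ival parser: a `-` must be immediately followed by an integer literal. This C sketch uses a hypothetical token array (`Token`, `IVal`, `parse_ival` are illustrative names, not the compiler's real types); per Thompson's reference an ival is otherwise just a constant or a name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TOK_MINUS, TOK_INT_LIT, TOK_CHAR_LIT, TOK_NAME, TOK_OTHER } TokKind;

typedef struct { TokKind kind; int64_t value; const char *name; } Token;

typedef struct { int is_name; int64_t value; const char *name; } IVal;

/* Parse one ival starting at toks[*pos]. Returns 0 on success, -1 on a
   syntax error (e.g. '-' followed by anything but an int literal). */
int parse_ival(const Token *toks, int count, int *pos, IVal *out) {
    assert(pos != NULL && out != NULL);
    if (*pos >= count) return -1;
    const Token *t = &toks[*pos];
    switch (t->kind) {
    case TOK_MINUS:  /* the extension: '-' INT_LIT becomes a negative constant */
        if (*pos + 1 >= count || toks[*pos + 1].kind != TOK_INT_LIT) return -1;
        out->is_name = 0;
        out->value = -toks[*pos + 1].value;
        out->name = NULL;
        *pos += 2;
        return 0;
    case TOK_INT_LIT:
    case TOK_CHAR_LIT:
        out->is_name = 0;
        out->value = t->value;
        out->name = NULL;
        *pos += 1;
        return 0;
    case TOK_NAME:
        out->is_name = 1;
        out->value = 0;
        out->name = t->name;
        *pos += 1;
        return 0;
    default:
        return -1;
    }
}
```

Doing this in the parser, rather than emitting a runtime negate, is what a global initializer needs, since it has to be a constant known before the program runs.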
Easy peasy lemon [ __ ] squeezy. That's how you code. That's how you [ __ ] code. All right: token type mismatch. Oh, this is because it's several one. Okay, what else do you want? Wait a freaking second. Are you for real, Bravki? [ __ ]. Just cast it. You know what? Can I just... "cannot be used as unary"? What? What the [ __ ]. Rust is a piece of [ __ ]. Ah, okay. All right. Wait, it still complains about that stuff. Does anybody know what the [ __ ] this is all about? Why would you negate an unsigned integer? It's two's complement. Exactly. Two's complement. All right.
It's kind of difficult for me to explain two's complement within a single stream, but the idea is that you have several bits, and the lower bits represent values like 1, 2, 4, 8, 16, and so on. To compute the value of the whole thing, you sum up the values of the bits that are set, and there you go: this is 10. But with two's complement, the highest bit, instead of having its usual positive value, counts as minus that value. And to negate a two's complement number, you invert its bits and add one. As far as I know, Ben Eater has a pretty good video explaining two's complement, so I'm going to redirect you to his video.
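Here is the explanation above worked through on 8 bits, as a small C sketch (function names are illustrative). Bits 0 to 6 have weights 1, 2, 4, ..., 64, and the top bit has weight -128 instead of +128; negation is "invert every bit, then add one", which wraps modulo 2^8. The same wrap-around is why negating an unsigned value is well-defined in C, whereas Rust refuses unary minus on unsigned integer types and offers methods such as `wrapping_neg` instead.

```c
#include <assert.h>
#include <stdint.h>

/* Value of an 8-bit pattern read as two's complement, from bit weights. */
int twos_value(uint8_t bits) {
    int value = 0;
    for (int i = 0; i < 7; i++)
        if (bits & (1u << i)) value += 1 << i;   /* weights 1, 2, 4, ..., 64 */
    if (bits & 0x80) value -= 128;               /* top bit counts as -2^7 */
    return value;
}

/* Negate an 8-bit pattern: invert all bits, add one (wraps mod 256). */
uint8_t twos_negate(uint8_t bits) {
    return (uint8_t)(~bits + 1);
}
```

For example, 10 is `00001010`; inverting gives `11110101`, adding one gives `11110110`, which reads as 118 - 128 = -10.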
Here is the Wikipedia article on two's complement, which is what pretty much all modern computers use for negative numbers. For Ben Eater's video about two's complement, just google it on YouTube, and I'm going to put it in the description for people watching the stream right now. So yeah, this is some sort of [ __ ]; I don't [ __ ] understand why it doesn't work. We probably have to expect it somewhere here. So I'm going to put token minus here, something like that. Still doesn't work.
Wait a freaking second. Oh, yeah: "It's a little bit complicated. This code is..." Well, I noticed, buddy. I noticed. Oh, Yui. I think I vaguely remember that pull request: couldn't find a better way to write it while keeping accurate error messages. Okay. What it's doing, I suppose, is expecting everything up front, so this is where we have to put it. We're expecting everything up front: minus, int literal; if we see an open bracket, we expect an integer, and after that we expect the same thing as here. Will that be enough? Okay, that was enough. Now it cannot find the symbol read. Interesting. The call to read... it doesn't even exist. Okay, so read seems to act like getchar. As you can see, this is straight-up getchar: read, and c is assigned to it. So one thing we can do is create a separate .b file, like utils or whatever, and just define read in it. Since we're using libb, the standard library of B, we do in fact have getchar, so read can right away return that. Maybe read is defined somewhere else and we'd have to link it somehow, but I don't want to go through all of that. "read: undefined symbol." Oh yeah, we actually have to put this file in here: butils.b. Then getchar: we don't have getchar, so let's mark it as external. Wait, ah, because of this, this is what we have to do.
Okay, now write. How is write used throughout this thing? Yeah, it's basically putchar. As far as I know, in the C standard library we do in fact have putchar, and here it is. It acts similarly to how it did in B: it just puts a single char, so we can use that. "Equivalent to putc"... putsy-tootsy. "putc is equivalent to fputc except that..." okay. So: write a character. That's understandable. All right, let's do something like this: write accepts a character, and I suppose we call putchar(x) like this.
Okay, I'm going to try to compile the entire thing one more time. Flush. What the [ __ ] is flush? Man, I feel like we're going to have a lot of these functions. I hope this is not coming from some standard library that we have to link with or something like that. I don't know. Okay, so there is something to compile the compiler with: there is a b.c of some sort. Let me take a look at this entire [ __ ]. Can we just clone this [ __ ]? What is this [ __ ]? Can we just [ __ ] clone this [ __ ]? So: find pdp7 -type f -name b.c. Oh, "implemented in a subset of the C language compatible with B." Oh, and it has flush and stuff like that. Huh. So there is a "B compiler: flush buffer to disk." Then you have sik.
What are you talking about? I don't know. Okay, so how is flush even used in here? Oh, it's basically writing and then flushing. So let's assume it's basically fflush. Let me try. But fflush requires you to provide the stream. I think if we put it into butils.b, we could do that: let's have external symbols like fflush and stdout, since we're going to be writing to stdout, and call fflush(stdout) like this. This is what we're going to have in here. All right, let me try to rebuild the whole thing. "getchar: undefined symbol." Do I have to mark it as external? I think so. Let's mark it as external, because it's coming from libc, and I suppose everything that we use from libc has to be marked as external. Okay, so
undefined reference to fout. Ah, [ __ ]. All right, what the [ __ ] is fout? It's an external symbol, and it's a freaking variable that you assign to. Damn. Oh, but it's not used very much; this is the only usage. Okay, so we're saving the value of fout, setting it to one, and then restoring it. Ah, this is the stdout. Basically, through this you say where the output goes: they save it, change it to standard error because they are reporting an error, and then restore it. That's what it is. Okay, one thing we can do here is just define it in our file and ignore it; let the code assign it. And yeah, that's actually pretty cool. Let's see. It compiled.
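In the stream the shim is a small B file that maps the historical compiler's I/O routines onto library calls declared with extrn. The same mapping, sketched in C for readers more familiar with libc: the `b_` prefixes are hypothetical (they avoid clashing with POSIX `read`/`write`), and `fout` is defined but ignored, just as the stream does.

```c
#include <assert.h>
#include <stdio.h>

/* Assigned by the compiled code to redirect output; never consulted here. */
int fout = 1;

/* read(): next character from standard input, like getchar(). */
int b_read(void) {
    return getchar();
}

/* write(c): emit one character to standard output, like putchar(). */
int b_write(int c) {
    return putchar(c);
}

/* flush(): push buffered output out; fflush needs an explicit stream. */
int b_flush(void) {
    return fflush(stdout);
}
```

Ignoring `fout` means error messages the original code meant to send elsewhere end up on standard output too; that is fine for getting the compiler to build, but worth remembering when comparing outputs.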
Excuse me. It just compiled. I mean, yeah, stream over. But I'm pretty confident this [ __ ] is going to segfault. Just because it compiled doesn't mean much; maybe it generated some bogus assembly or something like that. But it compiled, kind of, so let's see. Thank you so much, last royal raven 41, with the message "oppa hello." Hello, hello. Alex over six, thank you so much for the sub with the message "always doing cool stuff as always." Thank you, thank you, thank you. Lunrago, thank you so much for the sub with the message:
"Hello. Do you have plans on doing content about compression algorithms?" I don't really have any specific plans for compression algorithms, but what exactly is so interesting about them? I can do one stream marketed as compression algorithms, but then if I do another stream about a different compression algorithm, nobody's going to watch it, because everyone is going to feel like, oh, he's already done a compression algorithm stream. So it has to be marketed in a very specific way. If it's going to be a long continuous series, I have to be very strategic about it. I could make a stream about one single compression algorithm that I find particularly interesting, but I don't know. I'll think about it; I'll put it on my to-do list. What compression algorithm do you want me to look at? RLE.
Anyway, let's see what is going to happen. My assumption is that it reads from standard input and writes to standard output, so let's go off that assumption. All right, let me see. Okay, it's doing things. What if I... Oh, is that PDP-7 assembly? I think this is PDP-7 assembly. Does anybody know if this is PDP-7 assembly?
Looks like it. So, we had a C compiler. Let me find it: find pdp7 -type f -name b.c. Well, this is not where you put that; you have to put it in here: pdp7. Here it is. So if I now try to build that thing, cc -o... "type defaults to int," implicit int. Ah, it doesn't like implicit ints. What if I say it's a very old C? Okay, it seems to work. Yeah. Okay, so bc... Uh-huh. Oh wow. It [ __ ] worked. But then in this one something is [ __ ]. Honestly, something is truly [ __ ]. So we can use the b.c build as a reference: we can compare what this one outputs and what that one outputs. I think we lost some names. If we are reading names, they're probably read into some sort of buffer, so maybe something [ __ ] up with the buffers. And is b.b doing any pointer arithmetic?
[ __ ]. Oh boy. I think... okay, let me try. Okay, so it segfaults. Here's the interesting thing: it segfaults when I end the input. If I press Ctrl-D, which sends end-of-file, it segfaults. The question is where it segfaults. We probably won't be able to tell, because we don't really have any debug symbols or anything like that; it's very difficult to generate them with fasm. But we can try, so let's take a look at the assembly and see where exactly it fails. Okay, I'm going to press Ctrl-D. Well, at least we have the symbol where it happens. Let me see: b.b, symbol. Uh-huh. Does it do anything fishy? Does it do anything weird? Mhm. I don't really see anything. So where could it fail? Interestingly, it's in symbol. It's just symbol.
[ __ ] Okay, so here's the thing. Um
pointers, uh, so the memory in PDP7 and in B correspondently is not addressed by bytes.
That's the thing. It is not addressed by bytes. It is addressed by words.
bytes. It is addressed by words.
So essentially, right, a word on the original PDP is wider than a byte. For the picture, say it's two bytes, 16 bits (the PDP-7 word was actually 18 bits): aa, bb, cc, dd, right?
And essentially, if you're dereferencing address zero, you're referencing this pair; if you dereference address one, you're referencing this pair;
if you dereference address two, you're referencing this pair, and so on and so forth. Right. So that's basically how it works.
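To make word addressing versus byte addressing concrete, here is a toy Python model. It is a sketch only, not how the PDP-7 or the compiler represents memory: the class names WordMemory and ByteMemory are hypothetical, and the 2-byte word follows the simplified "pairs" picture from the stream rather than the real PDP-7 word.

```python
class WordMemory:
    """Toy word-addressed memory: address n names the n-th whole word."""

    def __init__(self, words):
        self.words = list(words)

    def load(self, addr):
        # Every address refers to an entire word; a single byte cannot be named.
        return self.words[addr]


class ByteMemory:
    """Toy byte-addressed memory: address n names the n-th byte."""

    def __init__(self, data):
        self.data = bytes(data)

    def load_word(self, addr, word_size):
        # Reading a word means reading word_size consecutive bytes starting at addr.
        return self.data[addr:addr + word_size]


raw = b"aabbccdd"  # four 2-byte words: aa, bb, cc, dd
wm = WordMemory(raw[i:i + 2] for i in range(0, len(raw), 2))
bm = ByteMemory(raw)

print(wm.load(1))          # b'bb': word address 1 is the second pair
print(bm.load_word(1, 2))  # b'ab': byte address 1 straddles two pairs
print(bm.load_word(2, 2))  # b'bb': the same pair needs byte address 2
```

Turning a word address into a byte address is just multiplication by the word size, which is the conversion the rest of the session revolves around.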
So this is not true on the modern platforms that we are targeting, like x86-64.
The addresses in x86-64 are in bytes. So we introduced such an interesting notion as the size of the word.
And essentially we deviated from the pointer arithmetic semantics of original B. We kind of deviated from that.
So essentially, if you have a pointer and you just dereference it, it is assumed to be addressed by bytes.
But if you have a pointer and you try to index it, this particular pointer is assumed to be pointing at an array where each element is the size of a word.
Right? So essentially, if you do something like p[1], it is equivalent to doing something like *(p + size of the word), and the size of the word in the case of x86-64 is going to be eight. So this is a huge deviation from how B originally worked, because in B p[1] is always equivalent to *(p + 1), always, right? Because you cannot address individual bytes.
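A small Python sketch of that mismatch, modelling only the address arithmetic with integers (no real memory). WORD_SIZE = 8 matches x86-64; the function names are made up for illustration.

```python
WORD_SIZE = 8  # bytes per word on x86-64


def addr_deref_plus(p, k):
    """Address read by *(p + k) in the current compiler: p is a byte address."""
    return p + k


def addr_index(p, i):
    """Address read by p[i] in the current compiler: the index is scaled by the word size."""
    return p + i * WORD_SIZE


def addr_original_b(p, i):
    """Address read by p[i] or *(p + i) in original B: everything is counted in words."""
    return p + i


p = 1000
print(addr_deref_plus(p, 1))  # 1001: one byte further
print(addr_index(p, 1))       # 1008: one word further, so p[1] != *(p + 1) here
print(addr_original_b(p, 1))  # 1001, but the unit is a word, so both forms agree
```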
And I think in here we got [ __ ] by this specific situation. Take sim buff; let's take a look at the definition of sim buff.
Uh, see, it's probably some sort of a buffer where we kind of store the names and stuff like that. And when you define an array of 10, this is an array of words, not bytes. It is an array of words, and the size of the word in x86-64 is actually eight. So this is an array of 80 bytes. But what we do in here, we add nine bytes, which shouldn't actually break anything per se, right? I didn't think it should break anything per se, but we have a lot of other code which potentially might be using the B semantics when it comes to pointer arithmetic.
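The arithmetic behind the sim buff suspicion, as a quick Python check. SYM_BUF_WORDS and locate are hypothetical names; the numbers (10 words, 8-byte words, an offset of 9) come from the discussion.

```python
WORD_SIZE = 8       # bytes per word on x86-64
SYM_BUF_WORDS = 10  # the buffer is declared as 10 words


def locate(byte_offset):
    """Split a byte offset into (word index, byte within that word)."""
    return divmod(byte_offset, WORD_SIZE)


print(SYM_BUF_WORDS * WORD_SIZE)  # 80 bytes in total
print(locate(9))                  # (1, 1): +9 bytes lands inside the second word
print(locate(9 * WORD_SIZE))      # (9, 0): +9 words in original B is the last word
```

So code written for original B that moves through the buffer with pointer + n expects to advance n words, while the x86-64 output advances n bytes.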
So yeah, 10-letter max identifier is crazy. I mean, we're talking about like 1970, right? So if I take a look at the PDP-7, when was the PDP-7... it's 1965.
We're talking about this era of computers, right? So honestly, look at the size of this program.
Look at the size of the program for 1965.
This is an insanely complex project for 1965. Imagine writing something like that in 1965, when we didn't have super powerful computers and we didn't know about programming that much. Imagine writing that.
It is actually insane. Ken Thompson is a [ __ ] genius if he managed to write that in 1965. Right? And granted, this is not exactly what he originally wrote; it's recreated from the assembly scan.
Even by 2025 standards, this is a pretty complex program, and it's written in 1965. It's insane.
Uh, I remember Ken talked about grep being 60-plus lines of code. Yeah. So for those times, it's an insanely complex program.
I mean, our compiler is also not a simple one, right? So if I run cloc: 5,000 lines of code. I mean, yeah, 5,000 lines of code.
But anyway, what should we do? What should we do? Well, I mean, we kind of already achieved the goal, right? We compiled the compiler. It even worked to some extent; incorrectly, granted, but it works. So the milestone is kind of achieved. Uh, but now I feel like, yeah, at least now I know that I have to get a little bit more serious about compatibility with the pointer arithmetic semantics of the original B, right? Because I suppose a lot of historical B programs kind of depended on that specific semantics. In fact, this one depended on it.
So let's take a look at how we're handling this kind of stuff in our compiler. If you're dereferencing, you are probably looking at the multiplication token, right?
So a primary expression starts with a multiplication sign, and what we're doing is just dereferencing, right? And dereferencing is performed at the codegen, so we dereference it as is, without modifying it or anything like that.
If you're indexing, you probably start with an open bracket, and here is the interesting thing: you compile an offset between the brackets. You've got an offset, and now the first thing you do is multiply the offset by the word size. So here is the multiplication of the offset by the word size; you save it into the result, and then you add that product of offset and word size to the base, right? That's what's going on in here.
That's what makes dereferencing and indexing actually semantically different. Because of this specific thing, by the way: in original B, p[1] was equivalent to 1[p], right? Because it would basically expand into *(p + 1), which is the same as *(1 + p). But in our compiler it is not equivalent, because the index is always multiplied by the size of the word.
Um, right. So what would it take to hack our system to support the historical semantics of pointer arithmetic?
Right?
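The p[1] versus 1[p] point can be checked with plain integers standing in for addresses. This is a sketch of the two subscript rules as described in the stream; the function names are illustrative.

```python
WORD_SIZE = 8


def original_b_subscript(a, b):
    # Original B: a[b] means *(a + b); addition commutes, so a[b] and b[a] agree.
    return a + b


def current_subscript(base, offset):
    # Current compiler: the bracketed offset is multiplied by the word size,
    # then added to the base, so swapping the operands changes the result.
    return base + offset * WORD_SIZE


p = 1000
print(original_b_subscript(p, 1) == original_b_subscript(1, p))  # True
print(current_subscript(p, 1), current_subscript(1, p))          # 1008 8001
```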
If we're going to be working with pointers, we have to store all the pointers divided by the size of the word.
Right?
So every time we take a reference or get a reference from somewhere, it has to be divided by the size of the word. You do all of the pointer arithmetic in words. As soon as you try to dereference it, we multiply that pointer by the size of the word, and only then dereference it.
So this is roughly how we can emulate that, if you know what I mean, right? Just keep all of the pointers as word indices. Word indices seem cursed, but I mean, we're trying to recreate how B historically worked.
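A toy model of that plan in Python: memory stays byte-addressed underneath, but every pointer the program holds is a word index. The class and method names are hypothetical, and this model simply refuses unaligned addresses instead of handling them.

```python
WORD_SIZE = 8


class WordPointerMachine:
    """Byte-addressed storage, but pointers handed to the program count words."""

    def __init__(self):
        self.mem = {}  # byte address -> stored word value

    def reference(self, byte_addr):
        # Taking a reference: divide the real byte address by the word size.
        if byte_addr % WORD_SIZE != 0:
            raise ValueError("unaligned address cannot be represented as a word index")
        return byte_addr // WORD_SIZE

    def store(self, word_ptr, value):
        # Storing through a pointer: multiply back to a byte address first.
        self.mem[word_ptr * WORD_SIZE] = value

    def deref(self, word_ptr):
        # Dereferencing: multiply by the word size, then load.
        return self.mem.get(word_ptr * WORD_SIZE, 0)


m = WordPointerMachine()
p = m.reference(64)    # byte address 64 -> word index 8
m.store(p + 1, 42)     # pointer arithmetic happens in words
print(m.deref(p + 1))  # 42, kept at byte address 72
```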
Uh, so yeah, this one is rather interesting. What's interesting is that word sizes are always powers of two, right? So we have two bytes, four bytes, eight bytes, and so on and so forth. If I have an absolute pointer in bytes and I want to translate it into a B pointer, which is in words, on, you know, x86-64, how would I do that? I have to divide it by eight, but since it's a power of two, I can actually use shifts.
So essentially, by shifting to the right by one, I'm effectively dividing this thing by two; by shifting it by two, I'm dividing it by two two times, effectively dividing it by four; by shifting it by three, I'm dividing it by two, by two, by two, right? Which effectively means I divide it by eight.
And bit shifting is actually a very cheap operation. So every time I have an absolute pointer, I convert it into a B pointer by shifting it right by three bits. And every time I want to dereference this particular B pointer, I shift it left, also by three, and I dereference it.
Um, that's basically how I think we could do that efficiently, right? Because, you know, dividing and multiplying is a little bit too expensive to maintain this kind of semantics, but I think shifting is going to be all right.
So yeah, I think this is what we should try to do. What do you think?
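The shift trick in Python terms, assuming 8-byte words (a shift amount of 3). to_b_pointer and to_byte_address are hypothetical helper names; in the generated code the same conversions would be shr and shl instructions.

```python
WORD_SHIFT = 3  # log2(8): an x86-64 word is 2**3 bytes


def to_b_pointer(byte_addr):
    # Absolute byte address -> B pointer (word index): shift right by 3 == divide by 8.
    return byte_addr >> WORD_SHIFT


def to_byte_address(b_ptr):
    # B pointer -> byte address before a dereference: shift left by 3 == multiply by 8.
    return b_ptr << WORD_SHIFT


# For non-negative integers the shifts agree with floor division and multiplication.
for addr in (0, 8, 64, 4096, 123456):
    assert to_b_pointer(addr) == addr // 8
    assert to_byte_address(addr) == addr * 8

print(to_b_pointer(4096), to_byte_address(512))  # 512 4096
```

A right shift by k divides by 2**k and rounds down, so the round trip is only lossless for word-aligned addresses.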
Mhm. Mhm.
Array indexing is syntactic sugar.
Yes, it is true. It is in fact syntactic sugar. So we'll have to go through all of the places where we do pointer-related stuff. Pointer-related stuff is probably located in dereferencing.
Okay. So, here is the dereferencing. Okay.
I'm going to break every place where we use dereferencing arguments.
Then taking a reference of an autovar and of an external are also related to references, so I'm going to put an underscore in here. Data offset is basically a pointer to a string. [ __ ] And pointers to strings are really interesting in that regard. I don't really know what to do with them, but I want to visit those places as well.
Right. So I suppose these are all of the places, all of the arguments that are related to references, right? We have to change their semantics everywhere we encounter them.
In terms of operations, I vaguely remember operations also had stuff related to references, specifically store. This is something that I wanted to change some time ago, but whatever. So index is the index of the autovar that stores the pointer where you want to store this argument, right? Because of that, when you store through it, we do a little bit of pointer stuff in here, so we also want to visit this place too. Do we have anything else? I can't remember anything else off the top of my head.
Uh, yeah, so I guess let's go ahead and visit all these places. So, we got some subs. Thank you so much, Zidal, for the sub with a message: "So cool following you building a compiler. Awesome work." Thank you.
Thank you so much. So, I think I also missed somebody from Brazil. Thank you so much to Cold D for this, with a message: "Cheers from Brazil. Thank you for your content." Cheers. Cheers. So many people from Brazil. Actually, it's interesting. There's a pretty strong school of informatics and programming in Brazil.
At least, this is what I noticed. Um, a lot of very cool and smart programmers are from Brazil. And I suppose this is the same school that created Lua, right?
It must be. It must be the same. Yeah. So it must be the same school that created Lua. It was actually pretty cool.
[Music] Okay. Okay. Good. So, let's go and visit all of these beautiful places where there is some reference stuff, right?
So, compiler-assisted refactoring. Let's [ __ ] go. Uh, what do we have? Okay, so this is AArch64.
Um, I don't really want to do AArch64 right now. Um, can I just do something like this? Okay, so I'm going to put a to-do. So right now, AArch64 is kind of broken, right?
So, uh, let's not do that.
Um, so this is the problem with too many targets, right? It's kind of cool that we're adding all these targets, and it's really cool that I made it so adding new targets is easy. But every time you experiment with the interface between the front end and the back end of the compiler, you kind of have to touch each individual back end now, which is kind of annoying, right? So it's kind of dangerous for experimentation. And because of that, I'm kind of hesitant about adding more targets, even though I want to, right? Because that's kind of the main value of the compiler: supporting all of these targets, not the language itself. Um, so can I just, for now... I don't want to be dealing with targets that I really don't care about. Can I disable some of them? Right. So for instance, yeah, can I just say, okay, let's not compile all of these things. Right. Uh, right. Let's not compile all of these things. Okay. So it tries to generate.
So temporarily, I'm going to actually disable all that stuff. Uh, right. I think that's a good idea.
So let's remove that as well. Uh, so this is the config. And ah, [ __ ]. So we're reading the config. Okay. So this is the runner, but it's for 6502.
Okay. Whatever. Uh, okay. So now we are going through FASM x86-64.
So, calling an external function by a reference.
Okay.
Do we call external functions by a reference somewhere in the compiler? I actually doubt that. And if we do, we'll instantly discover that by hitting this specific to-do, right?
"Calling external functions by a pointer is disabled for now." Right? So it's kind of a very weird case, and if it does exist in the compiler, we'll hit this to-do, and this to-do will tell us if this is something that we need to support. So for now, I'm disabling this entire thing, and yeah, it's an underscore, of course.
Okay, so loading an argument into the register. Uh, right. So if you're trying to dereference, okay, here's the thing: we are loading a pointer into a register, and then we dereference that pointer into the same register. So if we're going to be storing pointers that address by words, by words, not by bytes, this specific pointer that we just loaded needs to be multiplied by eight. It needs to be multiplied by eight because it's divided by eight, right? It points at particular words. So, um, on x86-64, to multiply it we have to shift it to the left. Uh, shift left... bit shift left.
So, shift, rotate... shift left. So we can try to do sb_appendf(output, c!(...)). Um, so something like that.
Uh, shift left this by three. Something like that. You can do that in FASM, if I recall correctly. So can I do something like times eight? Is that what you're saying? Yeah, you can do stuff like that, I suppose.
But I'm not sure, like, is it going to work?
Uh, let me try to test that. So this is foo.asm, right? So I have main... format ELF64 executable, and this is return, right? So this is return. fasm foo.asm. Okay, that seems to be working. So if I have rax pointing at something like, um, 69, then I'm trying to load something from rax. So maybe I have to do... well, doesn't matter, I think. Okay. So can I then multiply it by eight?
So, huh, it can be. Yeah. So as far as I can remember, x86-64 instructions have sort of embedded arguments for this kind of arithmetic operation, so this kind of arithmetic is going to be done for you very, very cheaply.
Uh, can you put anything, even not a power of two? I guess you can, huh?
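For reference, an x86-64 memory operand computes base + index*scale + displacement, and the scale lives in the SIB byte, which only has room for 1, 2, 4 or 8. Below is a Python sketch of that calculation; the function name is made up.

```python
VALID_SCALES = (1, 2, 4, 8)  # the only scale factors a SIB byte can encode


def effective_address(base=0, index=0, scale=1, disp=0):
    """Compute base + index*scale + disp like an x86-64 memory operand."""
    if scale not in VALID_SCALES:
        raise ValueError(f"scale {scale} cannot be encoded in a SIB byte")
    return base + index * scale + disp


# [rax*8] with rax = 69: the multiply happens as part of the address calculation.
print(effective_address(index=69, scale=8))  # 552
```

Some assemblers accept other multipliers by rewriting them; the NASM manual, for instance, notes that [ebx*5] assembles as [ebx*4+ebx]. That trick covers factors such as 3, 5 and 9 (the same register used as base and scaled index); a factor like 7 has no such encoding.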
Wait, x86 was specifically designed for doing this kind of [ __ ]? Seriously? That's actually kind of cool. Uh, right.
So, let me try to do that.
So, okay. So we can try to multiply it by eight. All right.
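What the dereference path ends up emitting can be sketched as a tiny Python text generator. emit_deref is a hypothetical stand-in for the compiler's codegen (which appends the same kind of lines to an output buffer), and the autovar slot rbp-8 is only an example.

```python
def emit_deref(out, reg="rax"):
    """Append a load through a word pointer held in `reg`, scaling by 8 in the operand."""
    out.append(f"    mov {reg}, [{reg}*8]\n")


out = []
out.append("    mov rax, [rbp-8]\n")  # load the B pointer (a word index) from an autovar
emit_deref(out)
print("".join(out), end="")
```

An equivalent two-instruction form is shl rax, 3 followed by mov rax, [rax]; the scaled operand just folds the shift into the load.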
Okay. So let's try to maybe rebuild the whole thing. Uh, right. So this is the dereferencing, and I forgot to put it like that. Okay. So that's actually pretty cool. Uh, when I'm taking a reference to this kind of stuff, now I have to divide it by eight.
I have to divide it by eight. And, uh, I don't think the assembler allows me to do that very easily. Right. So I loaded the address of this thing into the register, right?
So yeah, I don't see how I can easily divide that, unless maybe LEA allows me to do that. No, I don't think so. Right. So the idea here is that I will have to do sb_appendf(output, c!(...)) again. That's going to be four... And I want to shift to the right. We're doing division, so shift right, shr, by three, and this is going to be the register, right? So this is basically what we're doing: shifting to the right.
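The reference path, in the same hypothetical Python-emitter style: lea produces the absolute byte address, and a logical right shift by 3 turns it into a word index. The helper name and the stack offset are assumptions for illustration.

```python
def emit_ref_autovar(out, rbp_offset, reg="rax"):
    """Append code that takes the address of an autovar as a B (word) pointer."""
    out.append(f"    lea {reg}, [rbp-{rbp_offset}]\n")  # absolute byte address
    out.append(f"    shr {reg}, 3\n")                    # bytes -> words (divide by 8)


out = []
emit_ref_autovar(out, 16)
print("".join(out), end="")
```

shr is a logical shift, which is what you want for an address; sar would copy the sign bit instead.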
Mhm.
So then we have a reference to an external, and it's actually kind of similar. But we have to be sure that this external is aligned properly if we're taking a reference to an external variable. Uh, but I don't think we're taking references to external variables very often in here, right? So it's kind of interesting. Uh, so I'm not sure what to do about this kind of stuff, right? Because here's the thing: say you're taking a reference to something that is not aligned by a word.
By dividing it by eight, you will lose the precise value. You will lose the precise value. So, for example, consider this: if you divide 18 by 8 and multiply by 8, you're always going to end up with 16. Right? If your value is divisible by 8, dividing and multiplying back always gives you the same value. But if your value is not divisible by 8, aka unaligned (well, I mean, it has to be integer division), this is not going to be the case, right? So we also need to make sure... We can guarantee, for example, that autovars are aligned by words, because they are located on the stack.
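The alignment problem in numbers: converting a byte address to words and back is the identity only for multiples of 8. A short Python check, with round_trip as an illustrative name:

```python
WORD_SIZE = 8


def round_trip(addr):
    """Reference (divide by the word size) followed by dereference (multiply back)."""
    return (addr // WORD_SIZE) * WORD_SIZE


print(round_trip(16))  # 16: aligned, nothing lost
print(round_trip(18))  # 16: unaligned, the low bits of the address are gone
print(all(round_trip(a) == a for a in range(0, 256, WORD_SIZE)))  # True
```

Word-sized stack slots stay aligned, which is why autovars are safe; an external symbol only gets the same guarantee if the emitted data happens to keep it on a word boundary.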
Yeah, this newline. Thank you so much.
Yeah, they are located on the stack. But for the externals, we can't really guarantee that. We can't really guarantee that. Runtime check? Like, I'm not doing a runtime check right now for code that I'm hacking just to make a thing work right now. Right here, you're suggesting I write production-level code. No, I'm not writing production-level code for something that I'm trying to hack. Uh, right. Something I'm trying to hack. Yeah. And the newline too: I mean, the compiler will tell me, right? The code is just not going to assemble. It doesn't matter, whatever. Don't obsess over something that is not important. What's important here is what we do with the [ __ ] unaligned external names. That's what's important, because that's not going to properly tell you anything.
Uh, so this one is interesting. I suppose for now I'm going to put a to-do here: it's unclear what to do with unaligned external names, right? So it's really unclear.
So, what we're going to be doing with them, I don't know.
I don't know. I don't know.
Uh, so, what else? So, data offset. Here is the data offset. And for the data offset, essentially it takes a pointer, and the data might be unaligned.
So this one is interesting.
Um, to-do: what to do with unaligned data.
Do we use strings anywhere within the B compiler?
Oh wait, we don't.
Oh, that's actually cool.
Uh, so this is never going to happen in the compiler, right? Because there are no strings for that to happen. Okay, so that's fine. All right. So, uh, what do you want from me?
So, what? Incompatible types? Ah, freaking... Okay, so this is annoying.
Yeah, different types and [ __ ]. So we have to put semicolons in here. And it also means we have to put this kind of stuff in here. Uh-huh. So this is going to be something like that. Something like that.
Uh, something like that. All right. So let's [ __ ] go.
Not sure about this one, but, um, "require them to..." Like, guys, don't suggest production-level ideas to me. I'm trying to hack a compiler. Uh, right. To make it work for now, right here, right now. Okay. I think I need to take a small break. I want to make a cup of tea. Um, right. "Just insert the check to tell the..." Dude, do you understand what I'm trying to do?
Uh, it's just like... Jesus. Jesus [ __ ] Christ. Anyway, uh, let's take a break. I'm going to make a cup of tea, and after the break we're going to go through the rest of the places where we do the pointers and [ __ ] like that, and we'll try to hack them to make them work for this specific compiler. All right, we're back. So, let's continue going through the compilation. Uh, so, store ref name. Ah, this is just IR. Uh, in the case of IR, I think none of that matters. But did I... I think I literally missed store.
Yeah. Yeah. So, looking at the store... why did it... the out-of-order errors are actually [ __ ] me up a little bit. Okay. Okay.
So here we are loading a pointer from an autovar into rax, and the pointer actually points at a word, right? Then we're loading the value that we want to store by that pointer, and then we're doing this dereferencing thingy, and I suppose you can multiply it by eight, right? So this is one of the things you can do in here. Uh, all right, so I'm going to put an underscore in here.
Hopefully that is going to work. So this is just, uh, you know, codegen, or debug information, right? So none of that needs to be modified in any way, honestly.
Uh, right, so this is just intermediate representation. Uh, token. Ah, this is already not codegen, this is the compiler. Right. So when I have a primary expression starting with a multiplication sign, this is just dereferencing. None of that needs to be changed in here. Uh, what else in here?
Uh, this is referencing. Uh, and if you have... Okay, so none of that needs to be modified. I think none of that is important. Data offset. Uh, yeah, none of that needs to be modified as well.
All right. So, ref external: yeah, all of that is the same, but you still need to go through them, right, just to confirm that all of that is the same. So this is compiling a string into a data offset; that's understandable. Here is the cool thing: this is indexing. This is indexing, so we are multiplying the offset by the word size. We don't need to do that anymore. We can just take the offset and directly add it to the argument, like this. So we don't need the multiplication.
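Once pointers are word indices, the subscript no longer needs its own multiplication: p[i] is p + i, and the single scaling by the word size happens at dereference time. A Python sketch of that arrangement, with hypothetical names:

```python
WORD_SIZE = 8


def index_address(base_word_ptr, offset):
    # Both operands are counted in words, so p[i] is plain addition again.
    return base_word_ptr + offset


def deref_byte_address(word_ptr):
    # The only multiplication by the word size happens when the pointer is dereferenced.
    return word_ptr * WORD_SIZE


p = 125  # word pointer for byte address 1000
print(index_address(p, 1) == index_address(1, p))  # True: p[1] and 1[p] agree again
print(deref_byte_address(index_address(p, 1)))     # 1008
```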
I'm going to also remove this, like that. So yeah. So now this should be equivalent, right? So this is the dereferencing as usual. Uh, what else do we have in here? So this is compiling that stuff. There are some things that are not important, uh, but we still need to get through them.
So, dereferencing; this is a store. Uh-huh.
So this is the dereferencing.
Okay. So what is that semicolon uh stack grows down to talking?
Ah, this is a very interesting syntax, right? If you have a function, you can define an autovar, right? But then if you put a number in here, it does not initialize this autovar with the number. It allocates that many autovar slots on the stack and gives you some sort of a vector that is allocated on the stack, right? That's what it is. I don't think we need to do anything with that in here, right? So I'm just going to keep it like this.
So, we have a couple of warnings in here. We can try to fix the warnings, but I'm not sure how good of an idea that is. Word size is not needed anymore.
Look at that. Yeah. Because we don't compute the word size anymore. Uh, name lock is not needed, but I don't know, I feel like if I try to remove them, it's too much of a, you know, rabbit hole, right? All the things kind of depend on that. But maybe not. Uh, right. Right. We're not going to commit this code anyway, so maybe none of that matters.
Uh-huh. So let me try to remove that stuff now.
Uh-huh. So now we have that. Okay. So that seems to be working.
Okay. So, let me double-check if everything is okay. All right. Uh, right. So, this is dereferencing: we multiply it by eight when I'm dereferencing. Right. So, that's understandable. When I'm referencing, on the other hand, I have to divide it by eight, but I can't really do that directly in assembly, at least I don't know how, so I divide by eight by shifting right by three. Uh, external: I'm just loading the value inside of the external; none of that really matters. Uh, right. Data offsets: I'm not sure what to do with data offsets, so I put a to-do in here. And the most important one is store: I multiply it by eight, because I'm dereferencing; it's kind of equivalent to dereferencing. Okay, so that should more or less, hopefully, work. So let me recompile one more time.
So let's try to build the compiler.
Okay. "Character contains more than two characters." Ah yeah, historical mode. And, oh yeah, also B utils, B utils B. Okay. So that didn't break, and even the assembly didn't break, because you can do multiply by 8, right? We already confirmed that. And shift right: does it even shift?
It doesn't even use shift right.
Wait a freaking second. Wait, wait, wait, wait, wait, wait. That's interesting. Shift right... So that means it never took a reference of anything anyway. Okay. So it never needed any of that stuff at all.
Uh, right. So, that's kind of interesting. Okay. So, will that fix anything? So, what was the problem? Did anybody remember? So, PDP... I need to find an executable, uh, in PDP-7... ex... Oh my god.
Name BP.
I think I had BC. Yeah. Yeah. BC. Right.
So, this is what we expect.
Uh, so now I'm going to rebuild B, and I'm going to try to do that, and... now it's worse than ever. Right. So, like, it used to at least, you know, print something; now it's just su... Yeah. So now it's a proper BM.
Uh, okay. So, uh, let's do GDB.
Um, symbol. [ __ ]. Still symbol. Um, maybe I'm just missing something about, um... All right. So, Simbuff.
So something's sus. I feel like I'm missing something.
Uh, right. So, sim buff. If you take a look at sim buff, it's an array.
I remember we had something weird about arrays. I do remember we had something weird about arrays, because they have weird semantics in B, and we had to kind of hack them so the semantics are not that weird anymore. You know what I'm saying? You know what I'm saying? Right. So, if I take a look at this sim buff, um, yeah. What the [ __ ] is... Dollar is the current address.
Dollar is the current address. So for the simbuff, we're allocating a word, an entire word, but inside of this word we're storing the current address plus the size of the word. Effectively, it stores a value pointing right after itself, at the elements.
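A toy picture of that data layout in Python: the label's own word holds $ plus the word size, so reading the variable yields the address of the first element. lay_out_vector and the 0x1000 address are made up for illustration; the real layout is whatever the codegen emits.

```python
WORD_SIZE = 8


def lay_out_vector(label_addr, elements):
    """Map byte addresses to words: a header word pointing past itself, then the elements."""
    memory = {label_addr: label_addr + WORD_SIZE}  # the "$ + word size" word
    for i, value in enumerate(elements):
        memory[label_addr + WORD_SIZE * (i + 1)] = value
    return memory


mem = lay_out_vector(0x1000, [69, 420])
vector_value = mem[0x1000]  # reading the variable gives a pointer...
print(hex(vector_value))    # 0x1008
print(mem[vector_value])    # 69: ...to the first element
```

The stored pointer is an absolute byte address, which is exactly what clashes with the plan of keeping every pointer as a word index.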
I think I'm starting to vaguely remember, from the times when I was reviewing the vectors, because I didn't implement the vectors myself, by the way. Like, a lot of stuff in this compiler I didn't implement myself; a lot of stuff is contributed by other people, and I kind of vaguely remember reviewing this thing. In fact, the flow of pull requests to the B compiler doesn't stop. Only today, at the beginning of the day, I had 10 pull requests. I closed two of them, and now I have nine. Just listen, listen: I had 10, I spent some time, closed two, now I have nine. That's the speed at which I can actually review and merge them and stuff like that. But I really appreciate everyone who submits pull requests, right? The fact that we achieved this level of the compiler so quickly is due to the community constantly contributing. Uh, so as far as, you know... yeah, so we started about a month ago, right? So this is the last month. We already made 731 commits in one month, right? And there were a lot of people contributing and sending pull requests and stuff like that. And we reached the point where we are capable of compiling other existing B compilers. Think about that, right?
So, the historical B compiler: we can compile it. It doesn't work yet, but we can compile it. It generates some code that vaguely behaves like the compiler is supposed to behave, which is already better than the current trend of vibe coding.
Isn't that basically vibe coding? You wanted to achieve something, and it kind of roughly behaves how you want. Yeah. So we managed to do that without AI, but maybe somebody actually contributed AI-generated code.
But to be able to contribute AI-generated code, it has to be of passable quality for me. Honestly, personally, I don't give a [ __ ] if you use AI or not, right? When I look at a pull request, I never evaluate the code on whether it was written by an AI or not. I literally don't give a [ __ ]. I look at the code and look at its quality. Is it understandable? Is it easy to modify? Does it solve the problem? And stuff like that. And if it's acceptable, I'll just merge it. I don't give a [ __ ] if it's AI or not AI. And honestly, um, I think I would notice if somebody actually generated AI stuff, unless they actually cleaned it up. Um, the AI tools kind of have this really stupid habit of commenting each individual line or something like that.
Uh, like university students or whatever. But I suppose you can basically fine-tune them not to do that. "Don't comment code." Well, yeah, you can tell them not to comment code, but sometimes commenting code is actually kind of important.
"I should PR totally human code with a comment on every single line." I will actually reject it. Uh, but not because I would think that it was generated by AI, but because you commented every single [ __ ] line.
That is stupid. Whether it's AI or not AI is completely [ __ ] irrelevant.
So, um, "I just want the code. No yapping, please." Yeah, I think no-yapping prompts are really powerful. They're really powerful, especially if you know what you're doing, right? If you know exactly what you want, and you just want the LLM to, you know, do the right thing right away, a no-yapping prompt is just top-notch, in my opinion. It actually makes LLMs useful.
Okay. So, uh, here's the thing. Here is the thing.
The way we do it is that when you, um, use a vector, what you have is a pointer to the vector, right? So when you define a variable and you reference that variable, you get its value. But if you try to take the value of a vector, you get its address.
This is a rather interesting thing. So this is the difference in semantics.
All right, so let me demonstrate.
[Music] So if I do printf and I just want to print a value, and that value is in fact going to be a global with the value 69. Can I stash all of the changes, please? Yeah. Um, printf is not known: extrn printf. All right. Yeah, 69. As soon as I put brackets in here, I get the pointer.
You see what's going on here? It's a huge semantic difference: this is the value, and this is the pointer. And this is why, I believe, when you define a vector, the first element of the vector is its size, so when you try to read it, you actually read a pointer that points to the elements after that. It's an interesting hack. I'm not sure how good an idea this is, but it's an interesting hack.
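To make the value-versus-pointer difference concrete, here is a minimal Rust sketch of the layout described above. It models word-addressed memory as a Vec<u64> and assumes, following the description in the stream, that a vector's name labels a header word holding the address of the elements that follow it. Memory, declare_scalar, declare_vector and load are hypothetical names for illustration, not the compiler's actual API.

```rust
/// Word-addressed memory: a Vec index stands in for a word address.
struct Memory {
    words: Vec<u64>,
}

impl Memory {
    fn new() -> Self {
        Memory { words: Vec::new() }
    }

    /// A scalar global occupies one word that holds its value.
    fn declare_scalar(&mut self, init: u64) -> usize {
        self.words.push(init);
        self.words.len() - 1
    }

    /// A vector global gets a header word holding the address of the
    /// `len` zero-initialized elements placed right after it.
    fn declare_vector(&mut self, len: usize) -> usize {
        let header = self.words.len();
        self.words.push((header + 1) as u64);
        self.words.resize(header + 1 + len, 0);
        header
    }

    /// Evaluating a name as an rvalue loads the word the name labels.
    fn load(&self, addr: usize) -> u64 {
        self.words[addr]
    }
}

fn main() {
    let mut mem = Memory::new();
    let x = mem.declare_scalar(69);
    let v = mem.declare_vector(3);
    println!("x = {}", mem.load(x)); // the value itself
    println!("v = {}", mem.load(v)); // the address of v's first element
}
```

Loading x yields 69, while loading v yields the address of its first element, which is the same split between value and pointer shown in the demo.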
But the thing is, that pointer is an absolute pointer. We need to divide it by eight. But we can't divide it at assembly time, because we don't even know where exactly that symbol is going to be located in memory when the program is loaded. It's going to be located somewhere.
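The division itself is simple; the catch is only that it has to happen at run time. A minimal sketch, assuming a 64-bit target with 8-byte words (byte_to_word_ptr and word_ptr_to_byte are hypothetical helpers): a byte address converts losslessly to a B-style word pointer only when it is 8-byte aligned, otherwise the low three bits are silently dropped.

```rust
const WORD_SIZE: u64 = 8; // assumption: 64-bit target, 8-byte words

/// Convert an absolute byte address into a B-style word pointer.
/// Returns None if the address is not word-aligned, since the low bits
/// would otherwise be lost by the division.
fn byte_to_word_ptr(byte_addr: u64) -> Option<u64> {
    if byte_addr % WORD_SIZE != 0 {
        return None;
    }
    Some(byte_addr / WORD_SIZE) // equivalent to byte_addr >> 3
}

/// Convert a word pointer back to the byte address the CPU needs.
fn word_ptr_to_byte(word_ptr: u64) -> u64 {
    word_ptr * WORD_SIZE
}

fn main() {
    let addr = 0x401000;
    match byte_to_word_ptr(addr) {
        Some(w) => println!("{:#x} -> word {:#x} -> {:#x}", addr, w, word_ptr_to_byte(w)),
        None => println!("{:#x} is not 8-byte aligned", addr),
    }
}
```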
Um, so one way we can try to hack that... Okay, so here we are loading an external, and the problem here is that that value has to be divided by 8, right? Um, I already stashed all of these things. Oh my god. Unstash, please. Please unstash. Okay, so we unstashed it.
All right, so where is the external? The tragedy of the situation, chat, is that we loaded this entire thing, but now we have to divide this value. So shift right, but only sometimes: we have to divide it only when it's a vector. So if it's an external value, we have to know whether it's a vector or not a vector. If it's a vector, we have to do this shift right, and if it's not a vector, no shift right.
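A minimal sketch of the conditional shift, assuming x86_64 output in fasm syntax and a simplified, hypothetical Global record with name and is_vec fields. Whether the real backend loads a symbol's address with mov or lea is not shown in the stream; mov with a bare symbol is used here purely for illustration.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for the compiler's global descriptor.
struct Global {
    name: &'static str,
    is_vec: bool,
}

/// Emit x86_64 (fasm syntax) code that loads an external into rax.
fn emit_load_external(out: &mut String, global: &Global) {
    if global.is_vec {
        // Vector: the symbol's address is the value; turn the byte
        // address into a word pointer by dividing by 8.
        writeln!(out, "    mov rax, {}", global.name).unwrap();
        writeln!(out, "    shr rax, 3").unwrap();
    } else {
        // Scalar: load the word stored at the symbol.
        writeln!(out, "    mov rax, [{}]", global.name).unwrap();
    }
}

fn main() {
    let mut out = String::new();
    emit_load_external(&mut out, &Global { name: "x", is_vec: false });
    emit_load_external(&mut out, &Global { name: "v", is_vec: true });
    print!("{}", out);
}
```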
Man, I should have actually thought about B-compatible pointer arithmetic right away, because now this technical debt is biting me back. The more you ignore this semantic problem, the more it grows, because the language continues to evolve in that direction, and now you have to steer in a completely different direction.
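What B-compatible pointer arithmetic means in practice: B was designed for the word-addressed PDP-7, so adding 1 to a pointer moves to the next word, while on a byte-addressed target the same step is 8 bytes. A minimal sketch of the two conventions, assuming 8-byte words; the function names are hypothetical.

```rust
const WORD: u64 = 8; // assumed word size in bytes

/// B semantics: pointers count words, so v[i] is *(v + i).
fn b_element_addr(word_ptr: u64, i: u64) -> u64 {
    word_ptr + i
}

/// Byte-addressed semantics: the same element needs the index scaled.
fn byte_element_addr(byte_ptr: u64, i: u64) -> u64 {
    byte_ptr + i * WORD
}

/// Before touching memory on x86_64, a B word pointer must be
/// scaled back to a byte address.
fn b_ptr_to_machine(word_ptr: u64) -> u64 {
    word_ptr * WORD
}

fn main() {
    let base_bytes = 0x1000; // an 8-byte-aligned byte address
    let base_words = base_bytes / WORD;
    for i in 0..3 {
        let via_b = b_ptr_to_machine(b_element_addr(base_words, i));
        let via_bytes = byte_element_addr(base_bytes, i);
        assert_eq!(via_b, via_bytes); // both conventions reach the same word
        println!("v[{}] at {:#x}", i, via_b);
    }
}
```

Either convention works on its own; the trouble described above comes from historical B code assuming the first one while the compiler implements the second.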
[Music] So I don't freaking know. The thing is, global variables are externals, aren't they? Global variables are externals. struct Global. Oh, there we go. We have is_vec and we also have a name. Do we have immediate? What is an immediate value? It's a literal or whatever.
So what I'm thinking: in struct Compiler we have all of the globals, right? So when we're compiling this thing, if we had an array of globals, we could look up whether that name is in the array of globals, and if it is and that global is a vector, add this division thing. That's one way to do that. Globals, const Global. It would be kind of nice to have some sort of function, an unsafe find_global_by_name. Can vectors only be globals, though? Well, there are also auto vectors, but I don't think anybody uses auto vectors in B. Maybe they do.
Let me see. Let's go through the code. Mhm. I already looked through half of the codebase and haven't seen a single auto vector. Yeah, they don't use it. Okay.
So let's actually have the globals in here. I'm also going to have a name, a const char pointer, and let's return an Option, like so. So let's iterate over the globals, then dereference the whole thing. So this is going to be: if strcmp(global.name, name) == 0. And if that... I probably actually want to have it as a pointer. So let's keep it like this. Actually, let's not do it as a pointer. And by the way, this is not a name; it has to be just a Global. So something like that. And we just return Some(global), and in here None. The problem here is that we'll have to forward this globals thing through the entire call stack. But we can just follow the compilation errors. So find_global_by_name: here we supply the globals and then the name. And if let Some(global), and if global.is_vec, we can do some stuff like that, roughly.
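The lookup described above can be sketched in safe Rust as a linear search. The stream's version compares C strings with strcmp in unsafe code; this sketch assumes a Global with a &str name and an is_vec flag instead, and returns the Global by value as decided above. The name find_global_by_name mirrors the stream, but this signature is an assumption.

```rust
#[derive(Debug, Clone, Copy)]
struct Global<'a> {
    name: &'a str,
    is_vec: bool,
}

/// Linear search over the globals; returns a copy of the first match.
fn find_global_by_name<'a>(globals: &[Global<'a>], name: &str) -> Option<Global<'a>> {
    globals.iter().copied().find(|g| g.name == name)
}

fn main() {
    let globals = [
        Global { name: "x", is_vec: false },
        Global { name: "v", is_vec: true },
    ];
    if let Some(global) = find_global_by_name(&globals, "v") {
        if global.is_vec {
            println!("{} is a vector: emit the shift", global.name);
        }
    }
}
```

A linear scan is fine for the handful of globals a B program has; a hash map would only pay off with many more symbols.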
Okay, so let's go through the compilation errors. load_arg. Mhm. Man, this sucks. Can I do something like this? I want to put this as the first argument. Right. And now that should make it super easy for me to query-replace that with globals. Yeah, boom, for all of them. It just makes it easy to replace. Obviously this one is a false positive, but that's totally fine. All right, so here call_arg, the same thing: globals, const Global, like so.
So call_arg, load_arg. So generate_function. Okay, so I'm going to put globals there. We're starting to pass a lot of stuff into this function. Maybe we should start passing the entire compiler there, because all of these things are kind of interleaved and interact with each other. All right, so call_arg, this is globals, like so. What else do we have in here? Globals, Global, generate_function, another globals, const Global. Yeah, you see, this is what I was talking about: we need to forward all of these things through the call stack, and it's a little bit annoying.
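One way to avoid forwarding globals through every call, as suggested above, is to bundle the shared state into a single context and pass it as &mut self. A minimal sketch; the Compiler fields and method names are hypothetical and only model the idea, emitting comment lines instead of real code.

```rust
use std::fmt::Write;

struct Global {
    name: String,
    is_vec: bool,
}

/// Hypothetical compiler context: everything the codegen helpers share.
struct Compiler {
    globals: Vec<Global>,
    output: String,
}

impl Compiler {
    fn is_vector(&self, name: &str) -> bool {
        self.globals.iter().any(|g| g.name == name && g.is_vec)
    }

    fn load_arg(&mut self, name: &str) {
        // The lookup goes through self, so no extra parameter is needed.
        let kind = if self.is_vector(name) { "vector" } else { "scalar" };
        writeln!(self.output, "; load {} {}", kind, name).unwrap();
    }

    fn generate_function(&mut self, name: &str, args: &[&str]) {
        writeln!(self.output, "{}:", name).unwrap();
        for arg in args {
            self.load_arg(arg);
        }
    }
}

fn main() {
    let mut compiler = Compiler {
        globals: vec![Global { name: "v".to_string(), is_vec: true }],
        output: String::new(),
    };
    compiler.generate_function("main", &["v", "x"]);
    print!("{}", compiler.output);
}
```

Adding a new piece of shared state then means adding one field rather than touching every signature in the call stack.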
Okay. Now, not append, but slice. I'm pretty sure there was something I could use. Yeah, yeah, here it is. I can just use this thing. Let's go. It compiles. It's unclear... So, yeah, okay, this is not what I wanted to compile. Let's compile B. Okay. Yeah, so B, utils, and historical, and it compiles.
All right. So now, as we're loading vectors, we're actually taking the address of that vector and dividing it by eight, so it acts like a B pointer. Hopefully.
You know what I'm worried about? I'm worried about the fact that some of that stuff is not aligned. Let's give it a try. I already recompiled the B compiler, but I'm going to do that again just in case. And I'm going to try the following thing: main, like this. Oh, it works. Absolute.
So, yeah, it's kind of interesting. I'm still not sure what to do with the semantics of the pointers. I feel like this is kind of a demonstration that we should move closer to the original semantics of B. It's a little bit difficult to do that because we've already committed to weird semantics, so now we need to pivot the whole compiler to something else. I'll see how to do that.
"Missing one in the output." What are you talking about? So if I do BC... ah, that's true, that is absolutely true: missing +1. Mhm. So it's not fully working, but I suspect why, because, as I already said, this stuff heavily relies on everything being aligned properly. For example, here is the code, and the sizes of the instructions may be different, so they may be completely unaligned. And then you start the data section, which can also have arbitrary stuff, so it's also unaligned. I'm not sure... do we have data? Where is the data? We probably don't even have data in here. But the code here can be unaligned, so we have to align it.
I remember that GAS had an alignment directive where you can say something like... why can't you modify that, bro? ...something like .align, and you can specify maybe the amount of bytes or something. I don't remember. Does fasm have something like that? Let me find it. So here's the documentation for fasm, and... alignment. Oh, there's literally an align instruction: align 16. "The align directive fills the bytes that had to be skipped to perform the alignment with the nop instructions and at the same time marks this area as uninitialized data." So yeah, I suppose you can do it like that. All right, so it also exists in fasm, which is nice.
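For reference, the arithmetic behind an align directive is "pad up to the next multiple of the boundary". A minimal sketch, assuming the boundary is a power of two (fasm's documentation requires this); align_up and padding are hypothetical helpers.

```rust
/// Round `offset` up to the next multiple of `align` (a power of two).
fn align_up(offset: u64, align: u64) -> u64 {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    (offset + align - 1) & !(align - 1)
}

/// Number of padding bytes an align directive would have to insert.
fn padding(offset: u64, align: u64) -> u64 {
    align_up(offset, align) - offset
}

fn main() {
    for offset in [0u64, 5, 8, 13] {
        println!(
            "offset {:2} -> {:2} (pad {})",
            offset,
            align_up(offset, 8),
            padding(offset, 8)
        );
    }
}
```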
So here are the functions. Here are the asm functions. Here are the externals. They don't add any data in here. And suppose in here I want to append it. So what does it do? "align aligns code or data to the specified boundary. It should be followed by a numerical expression specifying the number of bytes." Okay, so that's very important. So output: align 8. Let's align it by eight bytes. Then we have a data section, right? So this is the data section. And [ __ ], after the data section, let's align it one more time, because it might not be divisible by eight. Globals: each global is the size of a word, so it doesn't require any alignment. But we don't generate anything after it anyway, so that's totally fine.
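The layout described above can be sketched as a function that assembles the output text. This is a hypothetical illustration, assuming string literals are emitted as raw bytes and each global as a single 8-byte dq; the real backend's section directives and label names are not shown in the stream. Once the globals start on an 8-byte boundary, consecutive 8-byte words stay aligned, which is why no alignment is needed after them.

```rust
use std::fmt::Write;

/// Lay out the assembly text in the order described above:
/// code, align 8, string data, align 8, then one 8-byte word per global.
fn emit_program(code: &str, data: &[u8], globals: &[&str]) -> String {
    let mut out = String::new();
    out.push_str(code);
    out.push_str("align 8\n"); // instructions can end at any byte offset
    if !data.is_empty() {
        let bytes: Vec<String> = data.iter().map(|b| b.to_string()).collect();
        writeln!(out, "str_data: db {}", bytes.join(",")).unwrap();
    }
    out.push_str("align 8\n"); // the data length need not be a multiple of 8
    for name in globals {
        writeln!(out, "{}: dq 0", name).unwrap(); // word-sized, stays aligned
    }
    out
}

fn main() {
    let asm = emit_program("main:\n    ret\n", b"hi\0", &["x", "v"]);
    print!("{}", asm);
}
```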
Maybe that's the problem, but if that's not the problem, I have no idea. Okay, so that at least compiles. And if I take a look at this kind of thing, does it have align? Oh, my mode even highlights that. So that's pretty cool. All right, I didn't see anything super fishy in here. Everything Gucci, atamaguchi. All right. So can I now try to do that? Nah, it didn't really fix it. It's kind of hard to tell. What's up with that?
So maybe the +1 is the return? Maybe it is in fact a return. Can we just have something more complicated? What if I do extrn printf? So this is the official compiler, right? Hello world with a newline. All right, so can it compile something like that? Okay, that's not bad. That's not bad. And now B. Mhm. Oh, okay. So lots of things are missing. So let's do BC. This is hello world in B. Uh-huh. So even x is kind of missing. Some of the stuff is missing, and I don't even know what the problem could be here. And given that this is a very sophisticated language, it's kind of difficult to tell. Hmm.
This one is rather interesting. It actively uses double characters. Why the [ __ ] does it use double characters? What's up with double characters? I remember that in kbman there were diagnostic messages. Yeah, diagnostics. So these are like errors: "Diagnostics consist of two letters, an optional name, and a source line number. Due to the free format of the source, the number might be high. The following is a list of the diagnostics." So it's very interesting. And it literally uses double characters. At the beginning of the line there's the error code, and this is a double character. So you can write a double character.
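A "double character" here is a character constant holding two characters packed into one word, like the two-letter diagnostic codes. A minimal sketch of that packing, assuming 64-bit words and placing the first character in the lowest byte; that byte order is an assumption, and it is exactly the kind of detail the packing and the unpacking code must agree on. pack_chars is a hypothetical name.

```rust
/// Pack up to 8 bytes of a multi-character constant like 'ab' into one
/// 64-bit word, first character in the lowest byte (an assumption).
fn pack_chars(s: &str) -> u64 {
    assert!(s.len() <= 8, "at most 8 bytes fit in a 64-bit word");
    let mut word = 0u64;
    for (i, b) in s.bytes().enumerate() {
        word |= (b as u64) << (8 * i);
    }
    word
}

fn main() {
    let w = pack_chars("ab");
    // 'a' (0x61) lands in the low byte, 'b' (0x62) right above it.
    println!("'ab' = {:#06x}", w);
}
```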
Okay, so let's do the following thing. When I write something, printf, maybe %x, maybe even %02x. So instead of printing the source code, let's actually print the characters that we're supposed to print. Um, yeah, I forgot. Okay, thank you so much. I forgot that we're in historical mode. Yeah, we're in historical mode. Um, unpacked. It's unclear. Okay, so yeah, we're using the data offset, but we're using it in a context where we're passing it into C code anyway. So we don't really need to modify any pointers or anything like that, because we don't deal with pointers into the data section. Okay. So I'm going to remove this specifically for this thing to work.
Okay. So now if I try to do B... this [ __ ] is trying to print a double character, right? It literally puts a double character into a write somewhere. I don't really know where, but it can happen. So what I can tell you is that our write utility is probably incorrect. One of the things we probably want to do is: while x is not zero, mask it with 255. So we mask out the first byte, then we shift it by a single byte and assign it back. This is probably what we want to do here: we're going to be unpacking all of these bytes.
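The fixed write loop can be sketched like this, assuming bytes are packed lowest byte first. write_packed is a hypothetical name; note that the loop stops once no set bits remain, so a zero byte sitting between non-zero bytes is still written.

```rust
use std::io::{self, Write};

/// Write the bytes packed in `x`, lowest byte first:
/// while x != 0 { emit x & 255; x >>= 8 }.
fn write_packed<W: Write>(out: &mut W, mut x: u64) -> io::Result<()> {
    while x != 0 {
        out.write_all(&[(x & 255) as u8])?; // mask out the lowest byte
        x >>= 8; // shift the next byte into place
    }
    Ok(())
}

fn main() {
    let mut stdout = io::stdout();
    write_packed(&mut stdout, 0x6261).unwrap(); // 'a' in the low byte, then 'b'
    writeln!(stdout).unwrap();
}
```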
Right. So let's try to rebuild this entire thing, and, yeah, okay, if I try to do that... is it in a different... What the? Um, yeah, here it is. Where do we pack those things? So, string storage. Yeah. Okay. So if we encounter a multi-character thing, we pack it like this. This is the packing. If I now revert this entire stuff...
Okay. [Laughter] Chat, chat, chat. We figured it out. Absolute bading. Endianness. Endian, my ass. Uh, my ass. Excuse me. Absolute bading. So let's actually see. What's the hello world? extrn printf, then printf("hello world"), and boom. That's how it's supposed to look. Can I save this entire thing just in case? Uh-huh. I'm just going to copy-paste it. It could be kind of cool to compile and run it on a PDP virtual machine, but I don't want to spend time doing that.
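The endianness surprise comes down to the packing order and the unpacking order disagreeing. A minimal sketch with two hypothetical packing orders (pack_low_first and pack_high_first) and the low-byte-first unpacking used by the write loop: only the matching pair round-trips, while the mismatched pair yields the characters reversed. Which order the compiler's string storage actually uses is not shown here.

```rust
/// Pack bytes with the first byte in the lowest 8 bits (at most 8 bytes).
fn pack_low_first(s: &[u8]) -> u64 {
    s.iter()
        .enumerate()
        .fold(0, |word, (i, &b)| word | ((b as u64) << (8 * i)))
}

/// Pack bytes with the first byte in the highest occupied 8 bits.
fn pack_high_first(s: &[u8]) -> u64 {
    s.iter().fold(0, |word, &b| (word << 8) | b as u64)
}

/// Unpack lowest byte first, as the fixed write loop does.
fn unpack_low_first(mut x: u64) -> Vec<u8> {
    let mut out = Vec::new();
    while x != 0 {
        out.push((x & 255) as u8);
        x >>= 8;
    }
    out
}

fn main() {
    let matched = unpack_low_first(pack_low_first(b"ab"));
    let mismatched = unpack_low_first(pack_high_first(b"ab"));
    println!("{}", String::from_utf8_lossy(&matched));
    println!("{}", String::from_utf8_lossy(&mismatched));
}
```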
All right. So what about hello world for our thing? I don't know if it compiled things correctly, but let's actually create a separate thing in here. Is that the same? I vaguely remember that you could diff buffers. Yeah. So tmp and tmp2, and there's no difference between them.
Chat, listen: we took a historical B compiler, compiled it with our B compiler, compiled a hello world in B with that historical compiler, compiled that with our compiler, and it produced the correct [ __ ] result.
[Music] [ __ ] you. This is what I'm talking about. This is the ultimate recreational programming experience. This is the ultimate moment.
The goal, the mission of this project is basically achieved. It is basically achieved. So after that, we go beyond just a B compiler. So this is the B compiler. We created it. We went through the same path as Ken Thompson and Dennis Ritchie went. There is something interesting about reimplementing things like that: you implement a B compiler and you literally experience the same thing those legendary people experienced. It's kind of akin to a pilgrimage. You know how religious people go to holy places and stuff like that? It's kind of akin to that. You're going through the same sort of experience as those legends, and you kind of know what they experienced.
I had a similar thing when I implemented a self-hosted Porth. When I made Porth self-hosted, I literally rediscovered Ken Thompson's hack. I didn't read about the Ken Thompson hack. I didn't try to understand it. No, no, no, no. I just did the same thing he was doing and rediscovered it myself. I realized that I mentally went to the same place where he was when he discovered it as well.
What's interesting is that I experienced a similar thing when I was implementing Tula, my Turing language. Let me actually show you Tula. It's basically a Turing machine interpreter, but with this Turing machine interpreter I also managed to implement a Turing machine interpreter in a Turing machine, a.k.a. a universal Turing machine. And I realized that I experienced the same thing Turing did when he realized that you can have a Turing machine that interprets other Turing machines. I think this is why it is important to reimplement and re-understand all of these kinds of things: you realize that all of these legendary people, maybe they are geniuses, yes, but they discovered all of these things under certain circumstances. You go their path. It's sort of a mental, intellectual pilgrimage. You know what I'm talking about? And it is kind of freaking cool to go through the same path as Ken Thompson, Dennis Ritchie, Turing, and so on and so forth. And I really like that. I really like this moment: "Oh, so that's how they came to this kind of conclusion."
It's so freaking cool. I absolutely love it. So, since this is already a B compiler, I need to start thinking about how we should go beyond a B compiler. What could be beyond a B compiler? As I already said, I see an inherent value in B, sort of ideologically. By itself, the fact that it doesn't have any types and every type is a word is limiting, but it also enables your creativity, and it does so in a similar way to assembler, because in assembler everything is kind of a word too. You have registers the size of a word and so on and so forth, so you operate on the level of words. But at the same time, B doesn't force you to go through the same [ __ ] as assembly does, because in assembly you have very platform-dependent code, and you also don't have very convenient control flow instructions and stuff like that. So what I like about B is that it simplifies programming in assembler while maintaining the assembler spirit.
That's what I like about B. B is something in between assembler and C. And I want to continue extending this language but preserve this spirit. What I want to do is probably write an article describing my vision of this language: assembler-like, but for recreational programming. And we will see how we're going to go about that. So yeah, that was pretty cool. It would be interesting to compile more complicated programs with this B compiler and discover even more bugs and stuff like that, but I've already streamed for 2 hours, so I don't really want to do that. Okay, I also need to think about how I want to go about the semantics of the pointers. I think I need to revert the semantics of the pointers back to B, you know, addressing by words. But this requires very careful planning, because it requires changing all of the platforms, and we already have five platforms, and some of them are difficult to change. So it requires a little bit of logistics. We'll see how we can go about that. All right.
So I guess that's it for today. Thanks, everyone who's watching right now. Really appreciate it. Have a good one, and I'll see you all on the next recreational programming session. I love you.