
Pablo Galindo Salgado - The soul of the beast

By EuroPython Conference

Summary

## Key takeaways

- **Python's Grammar is LL(1): A Double-Edged Sword**: Python's grammar is classified as LL(1), meaning parsing decisions are made by looking at only one token ahead. This simplifies parser implementation and execution speed but introduces complexities for certain language features. [06:11]
- **The 'Soul of the Beast': Python's Official Grammar**: The Python grammar, defined in a specific file, uses a meta-grammar similar to regular expressions to describe the language's structure. This formal definition dictates what constitutes valid Python code. [02:31]
- **LL(1) Limitations: Ambiguity and 'Hacks'**: The LL(1) nature of Python's grammar leads to ambiguities, forcing workarounds like special handling for keyword arguments and making features like parenthesized 'with' statements difficult or impossible to implement directly. [19:39]
- **Adding New Features: A Deep Dive into CPython's Pipeline**: Implementing a new operator like '->' in CPython involves modifying the grammar, tokenizer, Abstract Syntax Tree generator, compiler, and bytecode, demonstrating the interconnectedness of the language's components. [35:46]
- **Concrete Syntax Trees: More Than Just an Intermediate Step**: The Concrete Syntax Tree (CST) generated by the parser retains information about comments and formatting, making it valuable for tools like code formatters that need to reconstruct the original source code accurately. [31:30]

Topics Covered

  • Python's Grammar Is Wider Than You Think.
  • LL(1) Grammar: Python's Hidden Design Trade-off.
  • Parser 'Cheats' to Handle Python's Grammar Ambiguities.
  • Why Parenthesized 'With' Statements Cannot Exist.
  • CPython's Parser Shapes Python's Core Language Design.

Full Transcript

Thank you very much for attending. The idea of this talk is to do a deep dive into what makes Python Python, because there are several pieces that people normally identify as Python. We have the interpreter, but as you know there are multiple versions of it: we have CPython, PyPy, Stackless Python, IronPython. So the interpreter is usually implementing something else. Usually languages have what is called a spec; for example, C++ has the C++ standard, which is a set of books so heavy you could kill someone by hitting them with it. Python is a very interesting case, because CPython is not only one of the implementations of the interpreter, it is also the reference. So the question is: what is Python, really? When people think about Python, the language, the text you write that then gets executed by something, what is that?

Let me give a bit more introduction first. I'm Pablo, a Python core developer. I work at Bloomberg in the Python infrastructure team; before that I used to do black hole physics, so if someone is interested in talking about that, I'm always very happy to. I'm one of the authors and the implementer of PEP 570, and I spend my free time fixing bugs on the CPython issue tracker.

Okay, so let's start with a cool question: who thinks this is valid Python code? Raise your hand if you think it's valid. Raise your hand if you think it's invalid. That doesn't add up to everyone. This is actually valid Python code. If you look at it, this x has a type annotation; this thing that looks like a C++ template is actually saying "properties, less-than T greater-than, x"; that's an unpacking, that's an ellipsis, and that's object. And this is actually the AST of that expression. Not only does the parser parse it: if you type this thing and call spam, it actually returns, because annotations are not executed. You cannot call it with just any value, but you can try it with no problem.
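The point that annotations are parsed but not executed can be checked directly: under PEP 526, annotations for local variables inside a function body are never evaluated, so even a nonsense annotation does not raise at runtime. This is a minimal sketch; it does not reproduce the exact code on the slide:

```python
import ast

def spam():
    # Valid syntax, but the annotation is never evaluated (PEP 526),
    # so the undefined name causes no error at runtime.
    x: this_name_does_not_exist[...] = 0
    return x

print(spam())  # 0

# The parser also happily builds an AST for annotations like this:
tree = ast.parse("x: properties[T] = ...")
print(type(tree.body[0]).__name__)  # AnnAssign
```

At module or class level annotations *are* evaluated, which is why the example uses a function body.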

So, you know, what people normally think of as Python is really a small subset of what Python is, or at least of what is described in the official Python grammar. The idea is that we are going to explore exactly how big the gap is between what people think Python normally is and what it really is, and for that we need to introduce some topics regarding grammars and regular grammars. So let's

talk about that. The Python grammar describes Python source code in a file called Grammar, in the Grammar folder, but first we need to introduce how this file is written. In this file you will find rules. A rule basically looks like a name, a colon, and then a rule description; we are going to see what that is. Then you can have many alternatives for a rule: you can have the same rule name and two different descriptions. These descriptions are usually called productions, because historically the grammar is written in a way meant to generate programs from these forms, not to recognize them. Usually, when a rule has two alternatives, people write it as "this rule name is this alternative or this alternative", but it's basically the same thing. Then you will find constructions that mean "one or more of this". It's very similar to regular expressions, as you can imagine: the plus is one or more, the asterisk is zero or more, and you can also find square brackets, which mean that this chunk here is optional. So this is the complete meta-grammar: the meta-grammar is the grammar that describes the grammar itself. The top rule is basically the one you start from, and it recurses. This is just for reading the grammar file.
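For reference, the notation being described looks roughly like this. These lines mimic the style of CPython's Grammar/Grammar file from that era; the exact rules shown are illustrative rather than quoted from the slides:

```
# rule_name ':' description, with '|' separating alternatives;
# 'quoted' strings and ALL-CAPS names are terminals
stmt: simple_stmt | compound_stmt
testlist: test (',' test)* [',']          # '*' = zero or more, '[...]' = optional
while_stmt: 'while' test ':' suite ['else' ':' suite]
```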

Using these expressions we can now read the actual Python grammar, which looks like this. For example, take file_input, which is one of the start rules. It says that a file written in Python is zero or more of either a newline or a statement, followed by an end marker. Then you ask: okay, what is a statement? A statement is a simple statement or a compound statement. Then you ask what a simple statement is, and you keep going down like that, and soon enough you start seeing things you can actually recognize. For example, in compound_stmt, the bottom one, you will find the if statement, the while statement, the for statement, and you probably know what those are.
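The rules being walked through are, approximately, these, as they appeared in CPython's Grammar/Grammar file around the time of this talk (quoted from memory, so treat them as a sketch):

```
file_input: (NEWLINE | stmt)* ENDMARKER
stmt: simple_stmt | compound_stmt
compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated | async_stmt
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
```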

Okay, so let's keep looking at this file. If you look at if_stmt, for example, you will find that an if statement is the string 'if' followed by other things. The difference between the strings in yellow and the ones in white, and we will see what this means, is that one of them cannot expand any more: the string 'if' is literally the string "if"; it is not another rule. And you can more or less recognize the structure here: an if is 'if', then something that is probably a condition, then a colon, then a suite, which is basically a block, then an elif with the same structure. So it's very simple, but one of the things we need to understand is how the parser actually recognizes this and decides whether a program is valid or not. Okay, some more definitions. As I said, the symbols in yellow are called terminals, because when you are expanding a rule they cannot be expanded any more; they are just strings. The ones in white are called non-terminals, and they refer to another rule in the file. This is going to be very important in the following slides. Okay, so let's talk

the following slides okay so let's talk

about la1 grammar so it turns out that

python has a sort of weird feature which

is the grammar is in a category that is

called ll1

so let's see what it is this what is bit

there are very few languages that have

this condition and these

in addition you will see how it actually

percolates over the language one thing

that we are going to understand is how

like something that technically is

something technical right because like

the fact that this other one is not it's

not very important or it seems like it's

an implementation detail but you will

see how this actually percolates over

the language that has very deep impact

so what is another one grammar so

technically the definition is a bit

boring another one grammar is a grammar

that can be parsed by little one person

right it seems like you know one in

University someone says yeah one grammar

can be passed to one pass I said thanks

I understand huge thing now but let's

see what the what another one partially

right so the idea is that you will pass

the grammar rules sorry that the tokens

that you receive left-to-right does the

first sell you do leftmost derivation so

you stand spanning the rules from the

left now the second L and then this is

the key point here is that when you are

parsing you can only look for the next

token which means for example if you are

parsing an if condition you can only

look at the next token in the stream so

if you look at the if and then you need

to decide what rule follows you can only

look the next token not more and this is

going to be super super important it was

not that Python has two hidden extra

rules it used to be in an all check out

of the source code there was a comment

explaining the same but this has

actually been lost promise actually even

before it ported to mercurial so the the

extra rules are these two. It turns out that Python doesn't have empty productions; these are usually called epsilon productions. An empty production is basically a rule that admits the empty string as one of its possibilities: you have a rule, and the empty string satisfies that rule. It turns out that that makes everything much more complicated, but Python doesn't have them. If you know more or less how this works: when you build an LL(1) parser you need to construct things called the first sets and the follow sets, but it turns out that without empty productions you don't need the follow sets. We are going to explain what these are, so don't worry. And there is another rule, which is not super important but we will see how it manifests: at some levels of the grammar, only the last alternative can start with a non-terminal. Let's see an example. This is more of the grammar, and I want you to focus on this flow_stmt rule here. You will see that a flow statement is a break, a continue, a return, a raise or a yield. You see that the break alternative starts with a terminal, which is the word 'break'; continue starts with a terminal, which is 'continue'; return the same; but only the yield statement, which is here, starts with something that expands again. This is what I mean: only the last alternative can expand, and this simplifies the parser a lot. It isn't required for LL(1); it's just a simplification at some levels, but it makes it very fast. Okay, so let's see what

other pieces we need. First, the first sets. Because of this constraint, when you are parsing you can only look at the next token to decide which rule to use, so it is very important to know, for a given rule, say if_stmt, which tokens, which terminals, that rule can start with. This matters because, since you can only look at one token, if the rule you are trying to parse cannot start with that token, you know the input is invalid. So the first set is basically all the tokens a rule can start with. We see, for example, that simple_stmt can start with a lot of tokens, like an asterisk or a lambda; that the print statement can only start with 'print', which makes sense; that return can only start with 'return'; but others, like yield expressions, can start with a lot of things. And this is going to be key in a few slides.
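Because Python's grammar has no epsilon productions, the FIRST set of a rule depends only on the first symbol of each alternative, which makes it easy to sketch. This is a toy illustration with made-up rules, not CPython's pgen code:

```python
# Toy grammar: 'quoted' words and ALL-CAPS token names are terminals;
# anything else refers to another rule (a non-terminal).
GRAMMAR = {
    "flow_stmt": [["'break'"], ["'continue'"], ["return_stmt"]],
    "return_stmt": [["'return'", "testlist"]],
    "testlist": [["NAME"], ["NUMBER"]],
}

def first(symbol):
    """FIRST(symbol): every terminal the symbol can start with."""
    if symbol not in GRAMMAR:           # a terminal is its own FIRST set
        return {symbol}
    out = set()
    for production in GRAMMAR[symbol]:
        # No epsilon productions, so only the first symbol of each
        # alternative matters (and no FOLLOW sets are needed).
        out |= first(production[0])
    return out

print(sorted(first("flow_stmt")))  # ["'break'", "'continue'", "'return'"]
```

The LL(1) parser simply checks whether the one token of lookahead is in the FIRST set of the alternative it is about to pick.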

Okay, so let's see how CPython actually works. It's not that the parser of CPython is written by hand; what we have is a parser generator: a generator that automatically takes the grammar and produces a parser for you, and this makes everything much, much easier. Let's see how it works; it's actually relatively simple. The global idea is that from the grammar, this text file with the rules, we are going to construct something called a non-deterministic finite automaton, which sounds very scary but is actually pretty simple, and then we are going to simplify that into what is called a deterministic finite automaton.

So let's see what this is. A finite automaton is a finite set of states; it's essentially a flowchart, where you follow arrows and move between states. The idea is that you have a set of states, which are the places you can be; you have transition functions, which tell you when you can move from one state to another; and then you have a start state and one or more final states. Let's see an example. It looks like this: you start at A, say that's the start state, and then you have transitions labelled lowercase b and lowercase c. These labels are basically words in the Python programs you write, so lowercase b and lowercase c would be things like 'if' or 'while', and depending on whether you have a b or a c you move to state B or state C. Pretty easy. Then you can have a final state. This kind of automaton recognizes a language: if I end in the final state, the program is valid. If I am, for example, in C and the next token is, I don't know, an h, the program is invalid, because I can only move to the final state via a b. That's basically what the parser does: if the parser can reach a final state, your program is good and it continues; if the parser is in a state with no transition for the token it sees, it raises a SyntaxError. It's as easy as that. But it turns

out that we have two kinds of automata: the non-deterministic ones and the deterministic ones. So what is this non-determinism? There are two kinds. One of them is when one of these states, one of these uppercase A's, has two transitions with the same letter. That is one source of non-determinism, because which one do you choose? You could actually choose both; you don't know. The other kind of non-determinism is this scenario: this epsilon, which is the empty string again, is a transition you can take even without consuming a token. In this case, if I am in A, even if I don't have any token, or even if I have one and choose not to consume it, I can move to C for free. That's another source of non-determinism, because the state can be A, or actually it can be C if you want; you don't know. Cool. These automata are actually pretty easy to construct and pretty easy to reason about, but when you try to run them it can be very hard to implement a runner for them, because they may never end, or you can end up in cycles. So the idea is to simplify them into something that is easier to parse. For example, here is an

example. You have this tiny automaton with some epsilon transitions. If you look at this C, it can transition to itself via a, but if you choose a you can also go to B. So this is a simple one, only three nodes, but it has a lot of non-determinism, and constructing a deterministic one from it is actually pretty simple. If you look at the two possibilities of the non-deterministic one, you can say: okay, if I can transition to both nodes, technically it's as if this automaton is in C and B at the same time. Keep in mind that from A you can imagine a parallel universe in which you go to C and a parallel universe in which you go to B; you can consider the whole multiverse if you want. So really it's as if you had a new node called BC, a set of states, and now there is only one transition, a, going to BC. For epsilon transitions, because epsilon is free, it's as if the thing is in A and C at the same time, so it's as if you have a combined node AC, and then whatever transition there was, one per token, going to B. Applying these two rules is very simple: when you have one pattern you collapse it the first way, and when you have the other you collapse it the second way, and the idea is that you apply this iteratively. The price you pay is that if you do that for this one, you end up with this one, which has no non-determinism but is much more complicated: it has more nodes, you sometimes get empty nodes, and it has power states, all the powerset combinations; it can have up to 2 to the power of n new nodes.

This basically summarizes the difference. A non-deterministic automaton can have epsilon transitions; a deterministic one cannot, so it's easier to reason about. The non-deterministic one is simpler, as we saw, with fewer nodes; the deterministic one has more nodes. One technical point is that on a non-deterministic one it is very difficult to implement backtracking, which you sometimes want for your parser, while on a deterministic one it's trivial. And the key point here is that the non-deterministic one is super simple to construct from the grammar, you basically just read off the rules as you parse them, while the deterministic one needs the process we saw previously.
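The two collapse rules described above are the classic subset construction: a DFA state is the set of NFA states you could be in "in parallel". Here is a minimal sketch, not pgen's actual implementation; the state names and example transitions are invented:

```python
from collections import deque

def epsilon_closure(states, eps):
    """All NFA states reachable from `states` via epsilon transitions alone."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)

def nfa_to_dfa(start, moves, eps):
    """Subset construction. moves: {(state, token): set_of_states};
    eps: {state: set_of_states} for epsilon transitions."""
    start_set = epsilon_closure({start}, eps)
    dfa, queue = {}, deque([start_set])
    while queue:
        current = queue.popleft()
        if current in dfa:          # already expanded
            continue
        dfa[current] = {}
        for token in {tok for (s, tok) in moves if s in current}:
            target = set()
            for s in current:       # union of everywhere we could land
                target |= moves.get((s, token), set())
            target = epsilon_closure(target, eps)
            dfa[current][token] = target
            queue.append(target)
    return start_set, dfa

# The talk's first example of non-determinism: A moves to both B and C
# on 'a', so the DFA gets a single merged state {B, C}.
start, dfa = nfa_to_dfa("A", {("A", "a"): {"B", "C"}}, {})
print(sorted(dfa[start]["a"]))  # ['B', 'C']
```

The powerset blow-up mentioned in the talk is visible here: DFA states are `frozenset`s of NFA states, so in the worst case there are 2^n of them.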

Okay, enough theory; let's see how it works. We can actually look at these structures for the real Python grammar. This is one of the rules you have in Python, the rule for factor, which covers things like unary plus, minus and inversion, and then refers to more rules. This is the non-deterministic automaton. As you can see, if you start from the top, this is state 0. Some of these arrows don't have a token on them, like this one: for example, if you find a minus sign you go to state 8, but from state 0 to state 1 there is an epsilon transition. So if you are in state 0 you have no clue whether you have to go to the right or to the left. If you apply the process I showed you, you end up with this other one, and now you have interesting properties: there are no epsilon transitions any more; in state 0 you will transition to state 1 if you find one of these three tokens, so you can always decide which arrow to choose. But the process leaves some nodes around; for example, this state 5 is there but unreachable, so there is an extra step where the parser generator takes the deterministic automaton and simplifies it, basically removing nodes that are not used and maybe collapsing some of them. For example, these states 2 and 3 on top of factor are also eliminated, because you cannot reach them. So this is the process: you start with one of these, you make it deterministic, and then you simplify.

Let's see some other examples. For example, this one is for the comparison, when comparing two things: this is the non-deterministic one and the deterministic one, and as you can see the final one is actually very simple, but this is also a very simple rule. The idea is not that you read and understand all of them; I just want to give you a feel for how the parser generator simplifies these structures. You can see more or less how it works: state 0 always transitions depending on the token, and you can actually read off from these graphs how the grammar works, even without reading the grammar itself. For example, this is the decorated rule, and you can see in the final product that you can only decorate three things in the Python grammar: a function, a class definition, or an async function. If you want to make your own Python language and decorate something else, you just add something here and it will just work; we will see an example later.

This one is for multiplication and division and everything that has the same precedence as multiplication; this one is for the print statement.
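The "you can only decorate three things" point is easy to observe from Python itself: a decorator must be followed by a def, async def or class, so decorating a plain assignment is rejected at compile time. A small check (the decorator name deco is arbitrary, and compile() only parses, it doesn't run anything):

```python
# Valid: a decorator before a function definition.
compile("@deco\ndef f(): pass\n", "<demo>", "exec")

# Invalid: the decorated rule does not allow a plain assignment.
try:
    compile("@deco\nx = 1\n", "<demo>", "exec")
    print("accepted")
except SyntaxError:
    print("SyntaxError")  # this branch runs
```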

And this is for the if condition. So you can see how they get simplified. Now I want to show you one of them, which is my favorite. This one is the most complicated one in the grammar, and it's for parsing function definitions: the arguments in a function definition, positional arguments, keyword arguments. It looks like this. And don't worry, because this is not the non-deterministic one: this is the deterministic one, after all the complicated simplification process. Oops, this actually includes everything. So it looks very complicated, but if you look closely you can see things you will recognize: commas; this part is for the new type comments; you recognize the two stars; and you get these attractor-like nodes that have a lot of arrows going into them. The reason it looks so complicated is not that the rule itself is that complicated, because it really isn't; it's actually a bunch of rules. It's something we are going to see now: the fact that the grammar is LL(1) makes the parser super fast, very, very fast, and super simple to implement, at least the parser generator, but writing things that you might think are very simple turns out to be extremely complicated. The fact that we can write this rule at all is technically cheating, because the rule is written in a completely left-factored way: if you write the rules that naturally correspond to it, you get something we will see later, which is an ambiguity. You cannot write it that way, so you need to literally expand the rules and go almost token by token, which is something you usually don't do in recursive descent parsers. We'll see a lot of examples of this, and it makes some other things much more complicated. You will see something that a lot of people have requested for the Python grammar, even for 3.9 or 3.8, but that is actually impossible with the current parser. If you use Black, this is a feature that Black needs for some kinds of formatting, and it turns out to be impossible to implement. We'll see which one.

Okay, so let's talk about the LL(1) limitations. It turns out that LL(1) is sort of good: you get a very fast parser and it's very easy to implement, sort of. And a lot of people argue that the grammar being LL(1) makes the grammar simpler to read: humans are also parsers, and when you are reading the grammar (it's also context-free, which is a property of LL(1) grammars) you don't need to remember a bunch of other things you have seen before. In C or C++, whether you are reading a variable or a type depends on what was defined before; you cannot distinguish them without knowing what you read earlier. Here you just need the next token. So sure, it makes the grammar sort of simple, and that's what people recognize as Python: Python is simple to read and simple to reason about, and probably that's quite tied to the grammar being LL(1). But now it's the other way around: it makes some things very difficult to implement. So let's see the limitations we have

with this. Let's imagine I have this rule: a rule with two possible productions. The rule starts with the word 'do', like a do-while loop, then goes into a rule called A (we don't know what A does), then the word 'while' and then another expression. And we have another production with the same terminals, but with B instead of A; the intuition behind it doesn't matter. For A and B we have the first sets here: rule A can start with a lowercase a or b, and rule B can start with a lowercase a or c. Now imagine someone writes this program: do, b, while, c, something like that. We need to know which of the two productions this corresponds to, and it's very simple: you go and say, okay, 'do', both productions start with 'do', that's fine; then, is it A or B? We see a lowercase b, and if we look at the first sets we see that lowercase b is only in the first set of A, so we know it's the left production. Makes sense, because this lowercase b only appears in the first set of the first one. Nice: this program is valid, and it's the left one. But what happens if someone writes this? It turns out that lowercase a is in both first sets, so you actually don't know which of the two productions to use. This is the ambiguity. This thing that seems very silly is actually the problem, because you don't know which one to choose, and either you reject this rule or the parser dies. The parser generator will most of the time tell you that it cannot produce the parser, but sometimes it silently accepts it, which is really serious. So let's see some

cases where this actually happens. For example, this is what happens when you call functions; this is the rule for calling functions. You see that when you are writing an argument, the argument can be a test, which is more or less anything that evaluates to an object; it can be a comprehension; it can be a keyword argument, which is some name, some variable name, like, I don't know, reject equals true or something, where test is again something that evaluates to an object; and you can do unpacking with dictionaries and unpacking with iterables. Well, it turns out this is actually not the rule; this is false, because this rule would be ambiguous. Imagine you are parsing this: you are here and you say, okay, I can only look at the next token, let me see the first set of test. The first set of test includes NAME, but the second alternative also starts with NAME. So, seeing a NAME, which of the two do you choose? You cannot distinguish them. But I can write keyword arguments, I can call functions, so what is the actual rule? The actual rule is this one, which is very weird, because instead of something saying a name, like process equals true, it literally allows arbitrary Python expressions there. This rule allows things like a list comprehension equals a list comprehension, or a yield equals e, or something like that. But you cannot actually write that, so how does it work? It turns out the grammar allows it, and the parser will parse it perfectly happily, but much later, when we are constructing the abstract syntax tree, we check whether that test is actually just a name, and if it's something else we reject it. We are sort of cheating: if you sent the grammar file into outer space and aliens found it and implemented their own Python, they would allow some weird things; you'd go, what is this, really? And you can see here how CPython is coupled to the language itself, because this is the spec, and if you implement this spec things are tangled: this is not only Python; the AST of CPython is essentially a piece of CPython. PyPy does it a different way, and other implementations can do whatever they want, but they all have to solve this kind of limitation much later.
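You can observe this two-stage rejection from Python directly: the permissive rule lets the parse succeed, and the non-name keyword target is only rejected afterwards, still surfacing as a SyntaxError. A small check, not from the talk; the exact error message varies across versions, so it isn't asserted here:

```python
# 'f(a.b = 1)' matches the permissive grammar shape "test '=' test",
# but CPython rejects any keyword target that is not a plain name.
try:
    compile("f(a.b = 1)", "<demo>", "eval")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)

# A plain name on the left is fine:
compile("f(b = 1)", "<demo>", "eval")
```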

And it's actually everywhere. For example, this is your favorite operator. It turns out this is the same problem. This rule, which is called namedexpr_test, can be a test, something that evaluates to an object, or a NAME, which is normally a variable, followed by the walrus. But you have to do the same trick: this optional thing here is an optional part, so if you have a token, say some NAME, and you want to decide whether you are actually in the optional walrus part or actually in the test, it's the same problem. And as we saw before, test includes NAME in its first set, so this is again an ambiguity: given a NAME, you don't know if you are going into the optional part and using the walrus operator, or going into the plain test branch. So the rule is again test followed by an optional ':=' test, and again you can write a list comprehension walrus a list comprehension, and someone will complain much later. Awesome. So what

Now, what about this one? This is actually the thing that makes us very sad. In Python 3 you can have multiple context managers in the same with statement. Python 2 didn't have that: if you had six of them, you had to indent one extra level every time. So imagine you write them all in one with statement, but there are so many that they don't fit on the same line. How do you do that? Suddenly you need a backslash continuation line, which is very weird, because formatters don't know how to deal with it: that line allows arbitrary indentation on the next line, since for the parser it is literally still the same line, which is always very odd.

What do you want to write instead? You want the same thing as imports: you write import, open a parenthesis, put a bunch of names, and close the parenthesis, and formatters know how to format that because it is delimited. So you want that for with. Nice, right? Well, you cannot. You could say: the rule the grammar has is the top one, a with item and then zero or more commas each followed by another item, so let's just add another alternative: the same rule, or an open parenthesis, the same rule, and a close parenthesis. But it turns out that the first set of with_item already includes the open parenthesis, and that open parenthesis is exactly the one you just added.

You can do something very weird here, which is bad: the parser generator has a property that if you write this rule in a specific way, it will say everything looks fine and always choose the first production. You can actually write that, but then it permits one very strange construct, a parenthesized yield expression as the context manager, and it turns out the Python test suite has exactly one instance of that. When I was playing with this, it failed one test that is totally unrelated to the grammar tests.

So the short, sad story is that we cannot implement this with the current parser; it is impossible. I have spent hours writing extremely weird rules for it: removing indentation so the parser thinks you are writing multiple lines when you are not (it does not work), or instead of requiring the open and close parentheses, making both optional and trying to fix things up in the AST (you cannot, or at least not easily). So sadly you probably won't have this rule in 3.9, or maybe ever if we don't change the parser. Sorry.
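The ambiguity is easy to poke at with the `ast` module: parentheses after `with` are swallowed by the expression grammar, so they cannot group context managers. A small sketch (these particular cases behave the same way on current Python versions too):

```python
import ast

# Two context managers, no parentheses: two withitems.
two = ast.parse("with a, b: pass").body[0]
assert len(two.items) == 2

# Parentheses around a single expression are just expression
# parentheses; they belong to the `atom` rule, not to `with`.
one = ast.parse("with (a) as x: pass").body[0]
assert len(one.items) == 1

# `(a, b)` here is parsed as a *tuple* used as one context
# manager, not as two parenthesized managers.
tup = ast.parse("with (a, b) as x: pass").body[0]
assert len(tup.items) == 1
assert isinstance(tup.items[0].context_expr, ast.Tuple)
```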

Cool. So the takeaway of this is the following. You will see a lot of people saying the Python grammar is great because it is LL(1), one of the few mainstream grammars with that property, and that it is an extremely nice property we should be thankful for, because it keeps the parser very simple and very fast. That is sort of true, don't get me wrong; I think everybody sort of agrees with it. But it has side effects which are not desirable. What should be a purely technical decision (it is just one parser among many; LL(k) parsers can also handle LL(1) grammars, and you could choose a different parser generator entirely) peculiarly seeps into everything, and it has impacts like this one: something at the very end of the pipeline, a very technical thing, has the consequence that you cannot allow this rule, which is actually the title of the talk. It is interesting because it makes you reflect on what this language, Python, really is, and on how much of it is deliberate versus accidental.

So let's continue with that idea, because this is one of the core topics of this talk that I want you to think about, especially when you see new grammar rules proposed for the language, and when you hear people talk about pythonic Python or clear Python. What Python really is, is often not what you think it is, but what the parser allows.

Let's continue. So what does the parser produce? We saw how the grammar describes the language; what is the end result of the parser? The end result is a parse tree, also called a concrete syntax tree. For example, take x + y. The parser generates this tree; the numbers in it are linked to #defines in a C header, so if you substitute in the names you can read it. What happens is that the expression goes rule by rule: you start with the eval input, then a testlist, then you go down, down, down the grammar until you reach a rule that can actually recognize a terminal, then you go sideways until you find the x, then you backtrack (basically you go back up the stack), find the plus sign, and do the same again. The important thing is that this intermediate tree is very much an artifact of your parser: if I write different rules, this tree will be different. That is why there is an extra step, which we are not going to cover here, that converts this thing into an AST, the abstract syntax tree, which is abstracted from the parser and tied to the language itself. Instead of this big concrete tree, you get basically just an expression with three nodes: two for the operands and one for the operator.
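This collapse from concrete to abstract tree can be seen directly with the stdlib `ast` module; a small sketch:

```python
import ast

tree = ast.parse("x + y", mode="eval")
node = tree.body
# One node for the operation, two for the operands; none of the
# dozen intermediate concrete-syntax rules survive the abstraction.
assert isinstance(node, ast.BinOp)
assert isinstance(node.op, ast.Add)
assert node.left.id == "x" and node.right.id == "y"
print(ast.dump(node))
```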

But the concrete tree is still useful; it is actually what formatters use, and in particular black uses these trees. The reason is that at this level you still have information about comments (they are still in the tree, even though they are not part of anything you can execute) and about line breaks. So you can take this tree and reconstruct your original program exactly, and that is what formatters need: they don't want to lose your comments, or some special indentation you wrote. So even though it is an intermediate representation, it is still something useful.
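The comment point is observable with stdlib tools: the token stream, and the concrete tree built on top of it, still carries comments, while the AST has already thrown them away, which is why formatters work below the AST level. A sketch:

```python
import ast
import io
import tokenize

src = "x + y  # keep me\n"

# The tokenizer still sees the comment...
toks = list(tokenize.generate_tokens(io.StringIO(src).readline))
assert any(t.type == tokenize.COMMENT and "keep me" in t.string for t in toks)

# ...but the AST has no trace of it: comments are not executable.
dumped = ast.dump(ast.parse(src))
assert "keep me" not in dumped
```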

Okay, so let's see how you make a new grammar rule; let's see if I can do this quickly.

Awesome, let's make this bigger. If you start an interpreter here (this is Python 3.9, because it is from the future) and you try to use a decorator with a lambda in it, say @lambda something: 32, it says: SyntaxError. It turns out you cannot write lambda decorators, or basically arbitrary expressions as decorators. So let's change that. It is actually pretty simple. This is the CPython source tree; you go to the grammar file, which lives in Grammar/Grammar, you find the rule for the decorator, and it says: an '@' and something called dotted_name. So let's change dotted_name to testlist, which is basically everything. Then there is one extra step: I am going to defy all the gods of language design and go into ast.c. It is actually pretty simple too: the code there is converting the decorator node and it expects a dotted_name, so let's blindly change that to testlist as well. If you build now it will fail, because we have not regenerated the parser, so we run make regen-grammar, which regenerates it, and then we compile. Awesome. Let's see what happens: we decorate a function with @lambda whatever: 32, and the "function" we get back is just 32. Much better, right? Don't do that at home.
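As it happens, exactly this relaxation later landed in Python 3.9 as PEP 614, which changed the decorator rule to accept any expression, so on 3.9 or newer you can check it without patching CPython. A sketch, assuming Python 3.9+:

```python
import ast

# Pre-3.9 grammar: decorator: '@' dotted_name [ '(' [arglist] ')' ] NEWLINE
# 3.9+ (PEP 614): any expression is allowed after the '@'.
src = "@(lambda f: f)\ndef g():\n    return 32\n"
tree = ast.parse(src)
deco = tree.body[0].decorator_list[0]
assert isinstance(deco, ast.Lambda)

# And it runs: the lambda receives the function and returns it unchanged.
ns = {}
exec(compile(src, "<demo>", "exec"), ns)
assert ns["g"]() == 32
```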

Okay, awesome, let's see another one. That first change was pretty simple, and the reason is that it is just a substitution: you make the grammar a bit less restrictive, which is always easy to implement. So let's implement a more complicated one; not the most complicated you will ever see, but still. Let's implement a new operator. It will be an arrow, and it has to be a right arrow, because a left arrow is basically "less than" followed by "minus something", and some people use that already. So the idea is: you put an object on the left, you write the arrow, you put an object on the right, and this invokes some magic method; let's call it a dunder right-arrow method or something like that.

How do you do this? You go to the grammar file and give the arrow the same precedence as the multiplication sign: in the rule that says you can multiply, matrix-multiply, divide and all these things, you add the arrow. Then you regenerate the grammar, as you just saw. Because this is a new token, you also need to go to the tokens file and say: there is a new token, it is called RARROW, and it has this arbitrary number. And then you need to teach the tokenizer to recognize it. It looks like scary C code or whatever, but it is actually pretty simple, because the way it works is a switch that is consuming characters: do you see a '-'? Good. Do you then see a '>'? Good, emit RARROW. Pretty simple. With these changes, you run the tokenizer and it recognizes the arrow as an operator. Cool, so we already have the tokenizer.
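In fact, stock CPython already tokenizes '->' as a single RARROW token (it is used for return annotations), produced by exactly the kind of two-character switch described above, so you can watch this part work with the `tokenize` module:

```python
import io
import token
import tokenize

# Tokenizing does not run the parser, so `a -> b` tokenizes fine
# even though the grammar rejects it as an expression.
toks = list(tokenize.generate_tokens(io.StringIO("a -> b\n").readline))
arrows = [t for t in toks if t.string == "->"]
assert len(arrows) == 1
assert arrows[0].exact_type == token.RARROW
```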

So we can move forward. Now we would need to teach the parser to emit this thing, but it turns out that part is already implemented, because the rule for multiplication and division is agnostic about the actual sign you put in the middle, so we inherit it for free. What we need next is to teach Python to emit bytecode for the new operation. So you go to the compiler (the compiler basically takes the AST and emits the bytecode) and find the code for binary operations, the a plus b case. You will see that this is not the grammar any more; it is already abstracted. Then you say: if the actual operator is the new arrow operator, emit this new opcode, which I am going to call something like BINARY_RARROW. Then you go to the file defining the bytecodes and add the new opcode: it is just giving it a name and a number. What you are saying is basically that when the compiler sees this operator in the AST, it emits this opcode. You don't need to understand everything; it is actually very simple even as C code goes. At this point, if you disassemble the compiled code, you will see a bytecode for the right arrow.
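You can see the slot this new opcode would occupy by disassembling an existing binary operator with the stdlib `dis` module; a hypothetical BINARY_RARROW would appear at the same point for `a -> b`. Sketch:

```python
import dis

# `a * b` compiles to one binary instruction between the two loads.
# (The opcode is BINARY_MULTIPLY before 3.11 and BINARY_OP after.)
ops = [ins.opname for ins in dis.get_instructions("a * b")]
assert any(name.startswith("BINARY") for name in ops)
print(ops)
```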

Now the thing that is left is teaching Python to actually call something on the class, and that is very simple too. This is the eval loop, ceval, one of the core pieces here, and what it says is basically: if we have the right-arrow opcode, grab the object on the left and grab the object on the right (it is a stack machine, so you just pop two values from the stack), and then call this function, a "binary number right arrow" helper that I am going to implement; the rest is bookkeeping and cleanup. The only important part is the first three lines: get the two operands and call the function. The function is going to be super simple, because all of the machinery is already implemented. I just need to go to the file defining the type object, so every class, and register the new magic method, plus the reflected one, because if the left object doesn't have it, Python has to also try the right object: a dunder right-arrow, its reflected version, and the signature. It is just registering it, and the helper function I said I would implement basically calls the binary-op machinery, which already exists, with this slot. Again, you need to know what to change, but it literally was just one line.

And with that you compile and you can try the whole thing. You create this Arrow class (it is Haskell-flavored, so Haskellers will be happy): it receives a function that it stores, and it implements this dunder arrow, which is basically a map, so it maps the stored function over whatever is on the right. Now you can try f -> this list, and it squares every element, and it works. I mean, you have to believe me.
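The end result can be emulated in pure Python without patching anything, by borrowing an existing operator slot: here `__matmul__` (the '@' operator) stands in for the hypothetical '->' and its dunder, since it flows through the same type-slot and binary-op machinery the new method would hook into. The `Arrow` name and the mapping behavior follow the talk's demo; the rest is a sketch:

```python
class Arrow:
    """Functor-ish wrapper: store a function, map it over the operand.

    `arrow @ xs` stands in for the talk's `arrow -> xs`, whose real
    implementation would live in a new dunder right-arrow type slot.
    """

    def __init__(self, func):
        self.func = func

    def __matmul__(self, operand):
        # Map the stored function over whatever is on the right.
        return [self.func(x) for x in operand]


square = Arrow(lambda x: x ** 2)
assert square @ [1, 2, 3] == [1, 4, 9]
```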

Awesome, so it's that simple. Well, not always that simple: the two PEPs implemented recently, the walrus operator in 3.8 and PEP 570, positional-only parameters, were PRs changing on the order of two thousand lines of code. For the walrus in particular the problem is not the grammar, it is the scoping: it interacts with a lot of scoping rules and is very difficult, and Emily did a really excellent job there. And the slash for positional-only parameters is actually very difficult to parse. Do you remember that enormous, gigantic automaton for the rule? You need to fit something into that, and playing that game by hand is not cool, and you also need to keep it more or less fast.

So let's wrap up. What did we learn? We learned what the Python grammar is, how it is defined, and how it works. We learned how these things are done in CPython specifically; this is tied to one interpreter, and other implementations will do things differently. We also learned how we generate the parser, and it is actually a very fast parser; I think someone may be able to confirm this, but I believe PyPy also has an LL(1) parser. That is an interesting thing to think about, because it turns out that a lot of packages and alternative implementations of Python rely, one way or another, on the grammar being LL(1): formatters, packages that give you parse trees, implementations of the language, and so on. So imagine that, because we want to put these parentheses around the with statement items, we reimplement the parser (we have actually had discussions, and there is even a test implementation) and change it into something more general, maybe a PEG parser or LL(k) or something like that. This would make life much harder for everyone else, because all the existing LL(1) parsers out there would not be able to parse the new rules or implement these parenthesized with expressions. It is something to think about: CPython is sometimes seen as just one implementation of the language, the main one or one of the most used, but it is actually extremely coupled with the language itself, so every time you change something it has effects everywhere.

Then we learned how all of these pieces work together, and how, using the parts we saw before, the parser and all that, we can change the language in different ways. The more complicated changes are genuinely difficult, but the simple ones are easy, and that is actually a sign of good design: the fact that it was so easy is because there is a strong design connecting all these pieces. It is an old implementation, but it is very solid.

And you have learned something that I think is very important: the limitations of the grammar. Even if you think a rule is very, very simple ("come on, it is just parentheses around it"), it is super easy to fall into one of these ambiguities; they are literally everywhere, and it is very difficult to implement grammar rules while keeping everything consistent, which also matters. And even if you manage to implement a new grammar rule, the price is that you may be constraining everything that comes after: some existing statements already force certain paths in the grammar just to work the way they do. So even if you manage to implement something, you may be limiting other things in the future; that is something to think about if you ever want to maintain an LL(1) grammar.

And this is everything I have. I don't know if there is any time for questions, but I hope you enjoyed it.

[Music]

Thank you, Pablo. We have a minute for questions.

[Audience] With the with statement, adding the parentheses for the parenthesized with list is ambiguous because there is another thing you can do with parentheses. What is that other thing, there in the with statement?

[Pablo] So, anything that has parentheses can already appear there. For example, you can write a parenthesized yield expression as the context manager; that is valid, and the parentheses belong to the yield. If you added the new rule, then when the parser sees the open parenthesis it would go into the node that parses the with-item list, and later it would say "this is unbalanced, because I already consumed that parenthesis". Technically you can also write an arbitrary number of parentheses around the context manager, which is silly but valid as well, and all of those parentheses belong to the test. Concretely, if you look at the atom rule in the grammar, it includes the open parenthesis and then goes into a yield expression or a testlist comprehension, which, as you already know, includes basically everything. So you can write an open parenthesis followed by anything, and the grammar recurses back into itself. That is why it works.
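That recursion through the atom rule is easy to confirm with the `ast` module: any number of parentheses around the context manager still yields a single with item. Sketch:

```python
import ast

stmt = ast.parse("with ((((a)))) as x: pass").body[0]
# All four parens are expression parens swallowed by `atom`;
# the with statement still has exactly one item, and the
# context expression is just the bare name.
assert len(stmt.items) == 1
assert isinstance(stmt.items[0].context_expr, ast.Name)
```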

thank you very much

Unfortunately there's no more time for further questions.

[Applause]
