pdffffffff

CSE340 Fall 2024 – Homework 1Due: Friday September 13 , 2024, by 11:59 PM on Gradescope




All submissions should be typed, including parse trees. No credit will be given to handwritten
answers
Remember that no late submissions are accepted for homework assignments (see the
syllabus).
Answer the questions in the order they are asked.
Do not write the questions together with your answers. Just write the answers.
Problem 1. Consider the following list of tokens over the alphabet { `a’, `b’ , `c’ , … , `z’ , `B’ , `C’ , … , `Z’ ,
`0’ , `1’ , `2’ , … , `9’ , `_’ }. Make sure that you read the alphabet carefully !!!!!!

T1 = { “_1__”, “_2___”, “bcd1e” } (highlights are to help you count the underscores)

T2 = { “_b_d_” }

SUPERID = strings that are ID (see below) and that have “sup” as prefix. In other words,
SUPERID is a string that is an ID and that starts with “sup”. For example, “bsup123”is not
a SUPERID, but “sup_” is a SUPERID. “sup__” is not SUPERID because even though
“sup__” has “sup” as prefix, “sup__” itself is not an ID because it ends with two ‘_’
(underscore).

ID = Set of strings that starts with a letter or underscore followed by zero or more
letters and/or digits and that ends with a letter, digit or underscore. For example, “_”,
“_11”, “_1b1b” are IDs, but “___” (three underscores) and “b__” (ending with two
underscores) are not IDs according to this definition. The input “___” consists of two
IDs, the first one is “__” (two underscores) and the second one is “_”. The input “b__”
also consists of two IDs which are equal “b_” and “_” respectively. The input “____” (4
underscores) also consists of two IDs which are both equal to “__” (two underscores
each).

CRAZY = Set of strings that consist of an ID, followed by “_crazy_” followed by an ID. For
example, “__crazy_b” is a CRAZY because it consists of the is “_” followed by “_crazy_”
followed by the ID “b”, but “_crazy_b”, “_crazy_” and “a_crazy_” are not a CRAZY
because they are not the concatenation of an ID followed by “_crazy_” followed by
another ID. Note that unlike SUPERID which are ID’s that start with “sup”, CRAZID are
not IDs.

NUM = Set of strings that consist of 1 or more digits. According to this definition, “00” is
a NUM.
In addition to the symbols in the alphabet, the input can contain the symbols `#’, and `%‘ which are
treated as separators but are otherwise ignored. This means that getToken() should stop when it
reaches `#’, or `%‘ and returns the longest matching prefix up to but not including the separator. The
separator itself (`#’, or `%’) cannot be part of a token, so, if the first tokens ends just before the
separator, the next token starts after the separators. `#’, and `%‘ are the only separators. Space
characters are not separators for this problem. If a call to getToken() starting at a given position returns
ERROR, the next call to getToken() should start by advancing one position relative to the previous call.
For example, for the input “ABC”, if we call getToken(), we get ERRROR because A is not part of the
alphabet. The next call to getToken() will start at the B after the A and will return ID.
Problem 1, Part I
For each of the following inputs, give the value of the token resulting from the first two calls to
lexer.getToken() (give only the values of the token_type and the lexeme, but not the line number)
1. Input: %_crzy_crazyabc!_
t11 = lexer.getToken();
t12 = lexer.getToken();
2. Input: _1__003_2_%_crazy__
t21 = lexer.getToken();
t22 = lexer.getToken();
3. Input: &__crazy_bcd1e!_crazyabc!&_
t31 = lexer.getToken();
t32 = lexer.getToken();
4. Input: super_supper_103_2_%_crazy__crazyabc!_
t41 = lexer.getToken();
t42 = lexer.getToken();
5. Input: ###%%%a_b_d_cra!!!zy__crazy_
t51 = lexer.getToken();
t52 = lexer.getToken();
Problem 1, Part II
Assume we are given an input such that if we call lexer.getToken() repeatedly on the input we obtain
the following sequence of token/lexeme pairs:
(T1, “_1__”) , (T2, “_b_d_”) , (ID, “_1234”) , (ID, “bcdefg”) , (EOF, “”)
Now, assume that we are starting execution from the beginning of the input and the following sequence
of statements is executed:
t6 = lexer.peek(3);
t7 = lexer.getToken();
t8 = lexer.peek(5);
t9 = lexer.getToken();
t10 = lexer.peek(3);
What are the values of t6, t7, t8, t9 and t10?
For this problem, you are only asked to give the values of the tokens and you are not asked to explain
how you obtained them.
Answer Format
For part I, just list all the tokens and their lexemes. For example, if all the tokens are (ID, “bc”), you
should answer
1. t11 = (ID, “bc”)
t12 = (ID, “bc”)
2. t21 = (ID, “bc”)
t22 = (ID, “bc”)
3. t31 = (ID, “bc”)
t32 = (ID, “bc”)
4. t41 = (ID, “bc”)
t42 = (ID, “bc”)
5. t51 = (ID, “bc”)
t52 = (ID, “bc”)
For part II, use the same format. For example, if all the tokens are (ID, “a”), you should answer
t6 = (ID, “a”)
t7 = (ID, “a”)
t8 = (ID, “a”)
t9 = (ID, “a”)
t10 = (ID, “a”)
Problem 2. Consider the following input for the parser of the expression grammar that I covered in class:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
LPAREN
NUM
PLUS
NUM
RPAREN
MULT
LPAREN
NUM
MULT
NUM
RPAREN
MULT
NUM
PLUS
NUM
“1”
“2”
“3”
“4”
“5”
As we have seen, the parsing functions call each other recursively starting with parse_input() (refer to
the first set of notes for the actual code of the parsing functions).
1. Which token is consumed first by the fourth call to parse_F() (specify the number of the token.
The numbers are highlighted in yellow in the first row of the table above)
2. The token (NUM, “5”) at 13 is consumed by which call to parse_F()
“6”
3. What is the total number of calls to parse_F() when this input is parsed?
4. What is the total number of calls to parse_E() when this input is parsed?
5. How many calls to lexer.peek(1) in the various parsing functions return PLUS while parsing this
input?
Answer format. For each question just provide the answer with no explanation. For example, if the
answer is 1 for 2.1 through 2.5, you just write:
1. 1
2. 1
3. 1
4. 1
5. 1
Problem 3. Consider the grammar
𝑆 → 𝐴𝐵𝐶
𝐴 → 𝑎𝐴𝑏| 𝑎𝐴𝑏𝑏𝑏|𝜀
𝐵 → 𝑏𝑏𝐵𝑐|𝑏𝐵𝑐𝑐|𝜀
𝐶 → 𝑐 𝑐 𝑐 𝑐 𝐶 𝑑 | 𝑐 𝑐 𝐶 𝑑 |𝜀
Where 𝑆 is the start symbol, 𝑆, 𝐴, 𝐵 and 𝐶 are the non-terminals and 𝑎, 𝑏, 𝑐 and 𝑑 are the terminals. For
each of the following indicate if (0) the input has no parse tree according to the grammar, (1) the input
has one unique parse tree according to the grammar, (2) the input has only two parse trees according
to the grammar or (3) the input has three or more parse trees.
1.
2.
3.
4.
5.
aabbbbbbbc
aabbbbccd
aabbccccd
aaabbbbbbccccccd
Give an example of an input that has three parse trees according to the grammar. The input
should not be more than 30 characters long.
6. Draw two of the parse trees for the input given in part 5 (typed not handwritten)
Answer format. For questions 3.1 through 3.5 just write the number. For the last question, just give the
string. For example, if all strings have no parse trees according to the grammar and a a a a a a a a a b b b
b b b b b b has three parse trees according to the grammar, you should write
1. 0
2. 0
3. 0
4. 0
5. 0
6. 9 a’s followed by 9 b’s (don’t write the string. Write the number of a’s and the number of b’s)
7. For this part, you should draw the parse trees. Your answer should be typed not handwritten.
Note
1. If a question asks you to give derivations, you should not answer by giving a parse tree instead
of a derivation. You don’t get any credit for parse trees in that case.
2. If a question asks you to give parse trees, you should not answer by giving a derivation instead
of a parse tree. You don’t get any credit for derivations in that case.

Save Time On Research and Writing
Hire a Pro to Write You a 100% Plagiarism-Free Paper.
Get My Paper
Are you stuck with your online class?
Get help from our team of writers!

Order your essay today and save 20% with the discount code RAPID