Z80-ASM - the Z80 assembler
z80-asm is a two-pass, syntax-directed compiler. Everything is driven from the
compile function.
This function first calls lexical_analysis, which parses the next line of the
source and fills the global structure lex. It performs both lexical and
syntactic analysis. Syntactic analysis is very simple because the assembler
syntax is simple (line-oriented source, at most one instruction per line, fixed
order of label, instruction and arguments), so it is done at the same time as
lexical analysis.
After lexical analysis returns, the compile function calls a translating
function chosen according to the instruction body. Each translating function
handles a group of instructions (e.g. arithmetic instructions, load
instructions, bit instructions ...) and also performs semantic analysis: it
checks the number of arguments, their types and so on. When everything is OK,
the translating function translates the instruction and returns. Translating
functions can be found in compile.c, with their headers in compile.h.
The compiler has a 64 KB memory buffer representing the Z80's memory.
Translating functions write the translated code to this buffer. The buffer is
the variable unsigned char memory[65536]; in the file z80-asm.c.
Because labels can be used before they are defined, the compiler passes through
the source code twice. During the first pass labels are collected into a table
that stores the label text and the label address. The table is hashed, so
lookup takes constant time on average and insertion is simple. I think it's the
best way to store labels: a binary search would take logarithmic time, and
storing labels in the order they appear would make searching linear.
During the first pass nothing is written to memory, but the translating
functions don't know that. They "translate" normally, so possible errors are
already revealed in the first pass. What happens when an instruction uses a
label during the first pass? The hash-table lookup returns the current address,
so everything is OK.
During the second pass instructions are translated completely. If a label
occurs, it is looked up in the table and its address is used. If there is no
such label, an error message is generated.
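The two-pass label handling described above can be sketched roughly as follows.
This is only an illustration: the names add_label, lookup_label and TABLE_SIZE,
the chaining scheme and the hash function are assumptions, not the actual
z80-asm code. The sketch follows the text literally in that a first-pass query
always answers with the current address.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 256            /* hypothetical number of buckets */

struct label {
    char *name;
    unsigned short address;
    struct label *next;           /* chain of labels sharing one bucket */
};

static struct label *table[TABLE_SIZE];

/* Simple string hash (an assumption; the original hash is not shown). */
static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* First pass: remember where a label was defined.
 * Returns 0 on success, 1 if memory ran out. */
int add_label(const char *name, unsigned short address)
{
    unsigned h = hash(name);
    struct label *l = malloc(sizeof *l);
    if (l == NULL)
        return 1;
    l->name = malloc(strlen(name) + 1);
    if (l->name == NULL) {
        free(l);
        return 1;
    }
    strcpy(l->name, name);
    l->address = address;
    l->next = table[h];
    table[h] = l;
    return 0;
}

/* First pass: every query answers with the current address, so the
 * translating functions can run normally.
 * Second pass: the real address is returned, or 1 if the label is unknown. */
int lookup_label(const char *name, int pass, unsigned short current,
                 unsigned short *out)
{
    struct label *l;
    if (pass == 1) {
        *out = current;
        return 0;
    }
    for (l = table[hash(name)]; l != NULL; l = l->next) {
        if (strcmp(l->name, name) == 0) {
            *out = l->address;
            return 0;
        }
    }
    return 1;
}
```

Duplicate label definitions are not detected in this sketch; a real table would
probably check for them in add_label.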
- int convert(const struct seznam_type *seznam,int len,char *txt);
Performs a binary search for txt in the list seznam, whose length is
len. It is called to convert a string (instruction name, register name,
flag name, ...) to a token. The tables for converting these strings to tokens
are in instr.c, instr.h, regs.c, regs.h and z80-asm.h. Tokens are represented
by numbers; in the sources they appear as macros.
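A binary search of this kind might look like the sketch below. The layout of
struct seznam_type is not shown in this text, so the sketch uses a hypothetical
struct name_token and a hypothetical function find_token; the token values and
the -1 "not found" result are likewise assumptions.

```c
#include <string.h>

/* Hypothetical table entry; the real struct seznam_type may differ. */
struct name_token {
    const char *name;
    int token;
};

/* Binary search in a table sorted by name (strcmp order).
 * Returns the token, or -1 if txt is not in the table (assumed convention). */
int find_token(const struct name_token *table, int len, const char *txt)
{
    int lo = 0, hi = len - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(txt, table[mid].name);
        if (cmp == 0)
            return table[mid].token;
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return -1;
}

/* Example table, sorted alphabetically; the token values are made up. */
static const struct name_token instr_table[] = {
    { "ADD", 1 }, { "HALT", 2 }, { "LD", 3 }, { "NOP", 4 }, { "RET", 5 },
};
```

The table has to be kept sorted for the search to work, which is why adding a
new instruction name means inserting it at the right position.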
- int test_number(char *txt,int *value);
Converts the string txt to an integer stored at the location pointed to by
value. Recognized numbers may be signed or unsigned, in decimal, hexadecimal
(prefixed with 0x) or binary (prefixed with %) notation. Characters are
accepted too. A character is enclosed in apostrophes, as in the C language, and
its value is its ASCII code. If txt isn't recognized as a number, 0 is
returned; otherwise 1 is returned.
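The accepted notations can be illustrated with a small parser. This is a sketch
under assumptions, not the real test_number: the name parse_number is
hypothetical, escape sequences inside character literals are not handled, and
overflow is not checked.

```c
#include <ctype.h>

/* Returns 1 and stores the result in *value if txt is a number,
 * returns 0 otherwise (same convention as described for test_number). */
int parse_number(const char *txt, int *value)
{
    int sign = 1, base = 10, digits = 0;
    long v = 0;

    /* character literal of the form 'c' (no escape sequences here) */
    if (txt[0] == '\'' && txt[1] != '\0' && txt[2] == '\'' && txt[3] == '\0') {
        *value = (unsigned char)txt[1];
        return 1;
    }

    /* optional sign */
    if (*txt == '+' || *txt == '-') {
        if (*txt == '-')
            sign = -1;
        txt++;
    }

    /* base prefix: 0x for hexadecimal, % for binary */
    if (txt[0] == '0' && (txt[1] == 'x' || txt[1] == 'X')) {
        base = 16;
        txt += 2;
    } else if (txt[0] == '%') {
        base = 2;
        txt++;
    }

    for (; *txt != '\0'; txt++, digits++) {
        int d;
        if (isdigit((unsigned char)*txt))
            d = *txt - '0';
        else if (isxdigit((unsigned char)*txt))
            d = tolower((unsigned char)*txt) - 'a' + 10;
        else
            return 0;
        if (d >= base)            /* digit not valid in this base */
            return 0;
        v = v * base + d;
    }
    if (digits == 0)              /* a bare sign or prefix is not a number */
        return 0;

    *value = (int)(sign * v);
    return 1;
}
```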
- int convert_arg(struct argument_type *arg,char *txt);
Converts the string txt to a struct argument_type. This function is called
from the lexical parser to convert arguments. It determines the type of the
argument and its value. If the argument can't be recognized, 1 is returned; on
success 0 is returned.
Lexical analysis has a problem telling the carry flag from the C register: the
lexical parser doesn't know which "C" it is; only the semantic parser does.
This is solved with a simple but slightly dirty trick: the tokens "C register"
and "carry flag" have the same value. The relevant translating functions then
accept both types, register and flag.
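The shared-token trick can be pictured like this. All names and numeric values
in the sketch (A_FLAG, R_C, F_C, struct arg, is_carry_condition) are
hypothetical and chosen only for illustration; the real token tables are in the
files listed above.

```c
/* Hypothetical argument type tags. */
enum { A_REG = 1, A_FLAG = 2 };

/* Hypothetical register tokens. */
enum { R_B = 0, R_C = 1, R_D = 2 };

/* Hypothetical flag tokens; F_C deliberately shares the value of R_C. */
enum { F_NZ = 10, F_Z = 11, F_NC = 12, F_C = R_C };

struct arg {
    int type;
    int value;
};

/* A translating function for conditional instructions accepts "C" whether
 * the lexical parser tagged it as a register or as a flag.
 * Returns 1 if the argument means "carry", 0 otherwise. */
int is_carry_condition(const struct arg *a)
{
    return (a->type == A_FLAG || a->type == A_REG) && a->value == F_C;
}
```

The cost of the trick is that the type tag alone no longer says what "C" means;
each translating function has to know from the instruction which reading
applies.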
- int lexical_analysis(char *line);
This function performs the whole lexical analysis. As input it gets a string
containing a line of the source code. The parser reads the string from left to
right, skipping whitespace. If a semicolon is read, the parser stops.
Instruction parts are expected in this order: label, instruction body,
arguments. Labels are added to the table in the first pass. The instruction
body is parsed by the convert function; arguments are parsed by the
convert_arg function.
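The left-to-right splitting can be sketched as below. The sketch assumes that a
label is terminated by a colon and that arguments are separated by commas;
neither detail is stated in this text. The names split_line, struct
parsed_line, MAX_ARGS and MAX_WORD are hypothetical, and the real function
stores tokens in lex rather than strings.

```c
#include <string.h>

#define MAX_ARGS 4                /* hypothetical limits */
#define MAX_WORD 32

struct parsed_line {
    char label[MAX_WORD];
    char instr[MAX_WORD];
    char args[MAX_ARGS][MAX_WORD];
    int nargs;
};

static const char *skip_ws(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* Copy text up to the first character from stop (or end of string) into
 * dst, trim trailing whitespace, and return the position of the stop. */
static const char *take(const char *p, const char *stop, char *dst)
{
    size_t n = strcspn(p, stop);
    size_t m = n < MAX_WORD - 1 ? n : MAX_WORD - 1;
    memcpy(dst, p, m);
    dst[m] = '\0';
    while (m > 0 && (dst[m - 1] == ' ' || dst[m - 1] == '\t'))
        dst[--m] = '\0';
    return p + n;
}

/* Split one source line into label, instruction body and arguments.
 * Returns 0 on success, 1 if there are too many arguments. */
int split_line(const char *line, struct parsed_line *out)
{
    const char *p = skip_ws(line);
    const char *word_end;

    memset(out, 0, sizeof *out);

    /* the first word is a label if it is directly followed by ':' */
    word_end = p + strcspn(p, " \t:;\n");
    if (*word_end == ':') {
        p = take(p, ":", out->label) + 1;
        p = skip_ws(p);
    }

    p = take(p, " \t;\n", out->instr);
    p = skip_ws(p);

    /* arguments run until a semicolon (comment) or the end of the line */
    while (*p != '\0' && *p != ';' && *p != '\n') {
        if (out->nargs == MAX_ARGS)
            return 1;
        p = take(p, ",;\n", out->args[out->nargs++]);
        if (*p == ',')
            p = skip_ws(p + 1);
    }
    return 0;
}
```

A semicolon inside a character literal such as ';' would be treated as a
comment start here, so a real parser needs one more check for that case.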
- int compile(void);
This is the main compiling function. It controls the entire translation. It's
described in section 1.
- struct argument_type
- unsigned char type - type of the argument. It can be for example A_REG, A_NUM, A_CONST, A_STRING, ...
- int value - value of the argument
- char *text - used only for string arguments. If the argument is a string, value is not used and text
is used instead. Otherwise text is unused.
- unsigned char label - flag telling whether a number was a constant or a label (for arguments of type A_NUM or
A_PODLE_NUM). Using this flag, the translating functions for the jr and djnz instructions can decide between writing a
fixed address and computing a relative address.
- struct argument_type *next,*previous - pointers to the next and previous members of the argument list. The argument
list is a doubly linked list. Its ends are marked by a zero value of next or previous, respectively.
- struct lex_type - structure for communication between lexical_analysis function and compile function.
- int instruction - instruction token. For example: I_NOP, I_LD, I_RET, I_EX, I_HALT, ...
- struct argument_type *arg - doubly linked list of arguments
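Put together, the two structures and the use of the label flag might look like
the following sketch. The field names are taken from the list above; the
numeric values of A_REG and A_NUM, the helper append_arg and the function
jr_operand are assumptions made for illustration. The sketch also assumes that
a plain constant is written as the displacement byte unchanged, while a label
is converted to a displacement relative to the address following the 2-byte jr
or djnz instruction.

```c
#include <stddef.h>

/* Hypothetical tag values; only the names appear in the text. */
enum { A_REG = 1, A_NUM = 2 };

struct argument_type {
    unsigned char type;                    /* A_REG, A_NUM, ... */
    int value;                             /* value of the argument */
    char *text;                            /* only for string arguments */
    unsigned char label;                   /* 1 = label, 0 = constant */
    struct argument_type *next, *previous; /* doubly linked list */
};

struct lex_type {
    int instruction;                       /* instruction token */
    struct argument_type *arg;             /* first argument or NULL */
};

/* Append arg to the end of the argument list held in lex. */
void append_arg(struct lex_type *lex, struct argument_type *arg)
{
    struct argument_type *last = lex->arg;

    arg->next = NULL;
    arg->previous = NULL;
    if (last == NULL) {
        lex->arg = arg;
        return;
    }
    while (last->next != NULL)
        last = last->next;
    last->next = arg;
    arg->previous = last;
}

/* Operand byte for a jr/djnz instruction located at address at.
 * Returns 0 on success, 1 if the displacement does not fit in a byte. */
int jr_operand(const struct argument_type *arg, unsigned short at,
               unsigned char *out)
{
    int disp = arg->label ? arg->value - (at + 2) : arg->value;

    if (disp < -128 || disp > 127)
        return 1;
    *out = (unsigned char)disp;
    return 0;
}
```

For example, a jr at address 0x8000 that jumps to a label at the same address
gets the operand byte 0xFE, i.e. a displacement of -2.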
Other compiler variables are here.
The main compiler functions are built as library functions so they can easily
be called from a separate program.
To use the compiler as a library, build the asm.a library by typing
make asm.a, include the file asm.h in your source and define the
following functions:
- void take_line(char *line);
- this function loads one line of the source code into the line variable
- warning: this function doesn't allocate memory; the memory is allocated
by the compiler
- the maximum length of a line is MAX_TEXT
- void error(int line_number,char *line,char* err_message);
- this function prints an error message to the output
- line_number is the number of the line where the error occurred
- line is the source line where the error occurred
- err_message is the error message from the compiler
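A minimal pair of callbacks could look like the sketch below, which feeds the
compiler from an in-memory array instead of a file. Several things here are
assumptions: the placeholder value of MAX_TEXT (the real one comes from asm.h),
the source array and counters, and in particular the empty line stored at the
end of input, since this text does not say how the end of the source is
signalled.

```c
#include <stdio.h>
#include <string.h>

#ifndef MAX_TEXT
#define MAX_TEXT 256              /* placeholder; asm.h defines the real value */
#endif

/* Hypothetical in-memory source, terminated by NULL. */
static const char *source[] = { "start: ld a,5", "  jr start", NULL };
static int next_line = 0;
static int error_count = 0;

/* Copy the next source line into the buffer supplied by the compiler.
 * The buffer is not allocated here. */
void take_line(char *line)
{
    if (source[next_line] == NULL) {
        line[0] = '\0';           /* assumed end-of-input behaviour */
        return;
    }
    strncpy(line, source[next_line++], MAX_TEXT - 1);
    line[MAX_TEXT - 1] = '\0';    /* strncpy may leave the buffer unterminated */
}

/* Report an error on stderr and count it. */
void error(int line_number, char *line, char *err_message)
{
    fprintf(stderr, "line %d: %s\n  %s\n", line_number, err_message, line);
    error_count++;
}
```

Counting errors in the callback is optional, but it lets the calling program
decide afterwards whether the generated code is worth saving.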
There are some variables for controlling the compiler; they must be set before
calling the compiler:
- int WARNINGS;
- if this flag is on, the compiler checks whether you overwrite code at any
address and writes a warning message to stderr
- 0 means off
- 1 means on
- int pruchod;
- number of the compiler pass
- 1 is the first pass, when the source is compiled and labels are read and
stored in the label table; if a label is used as an instruction argument, a
zero value is written to memory instead of the real label value; this allows
labels to be used before they are defined
- 2 means the code isn't compiled and only label addresses are written to memory
- int disable_defx;
- with this flag you can prohibit the use of the defb, defw and defs
instructions; using these instructions when they're prohibited causes an
error
- 1 disables the defx instructions
- 0 enables them
- unsigned short address;
- address of lowest used memory byte
- this is especially useful for saving code to a file
- this variable is read-only
- unsigned short end;
- address of highest used memory byte
- analogous to address above
- this variable is read-only
- unsigned char *memory_ptr;
- pointer to start of the Z80 memory
- this variable MUST be set before using the compiler
- int line;
- number of currently compiled line
- read only variable
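As an illustration of how address and end can be used for saving code to a
file, the following sketch writes the used part of the memory image to an open
stream. The function name save_range is hypothetical; in a real program the
arguments would be memory_ptr, address and end after a successful compile.

```c
#include <stdio.h>

/* Write bytes first..last (inclusive) of the memory image to out.
 * Returns 0 on success, 1 on failure or if the range is empty. */
int save_range(const unsigned char *mem, unsigned short first,
               unsigned short last, FILE *out)
{
    size_t n;

    if (last < first)
        return 1;
    n = (size_t)last - first + 1;
    return fwrite(mem + first, 1, n, out) == n ? 0 : 1;
}
```

Note that the saved file contains no load address, so the caller has to record
the value of address separately if the code is to be loaded back to the same
place.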
And now to start compiling:
- first call the asm_init function
- then call the compile function; it returns 0 when
everything's OK, otherwise it returns 1
- finally call the asm_close function, which cleans up after the compiler