Z80-ASM - the Z80 assembler

0. Contents

0. Contents
1. How it works
2. Labels
3. Functions
4. Data
5. The compiler library

1. How it works

z80-asm is a two-pass, syntax-directed compiler. Everything is controlled from the compile function.

This function first calls lexical_analysis, which parses the next line of the source and fills the global structure lex. lexical_analysis performs both lexical and syntactic analysis. The assembler syntax is simple: the source is line-oriented, there is at most one instruction per line, and the order of label, instruction and arguments is fixed. Syntactic analysis is therefore trivial and is done at the same time as lexical analysis.

After lexical analysis returns, compile calls a translating function selected by the instruction body. Each translating function handles a group of instructions (e.g. arithmetic instructions, load instructions, bit instructions, ...) and also performs semantic analysis. It checks that the number of arguments is correct, that the argument types are valid, and so on. When everything is OK, the translating function translates the instruction and returns. The translating functions are in compile.c, and their headers are in compile.h.

The compiler has a 64 KB memory buffer representing the Z80's memory. Translating functions write the translated code to this buffer, which is the variable unsigned char memory[65536]; in z80-asm.c.
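To make the dispatch concrete, here is a minimal sketch in C. It is not the project's code. The token values, the pc variable and the functions translate, translate_simple and emit are hypothetical. Only two one-byte instructions are handled (nop = 0x00, halt = 0x76). The sketch shows a translating function doing its semantic check (argument count) and then writing the opcode into the 64 KB buffer.

```c
/* The 64 KB image of Z80 memory that translating functions write into. */
static unsigned char memory[65536];
static unsigned short pc;            /* hypothetical: current output address */

/* Hypothetical instruction tokens; the real ones (I_NOP, I_HALT, ...) are macros. */
enum { I_NOP = 1, I_HALT, I_LD };

/* Write one byte at the current address and advance it. */
static void emit(unsigned char byte)
{
    memory[pc++] = byte;
}

/* Translating function for a group of simple instructions without arguments. */
static int translate_simple(int instruction, int nargs)
{
    if (nargs != 0)
        return 1;                    /* semantic error: wrong number of arguments */
    switch (instruction) {
    case I_NOP:  emit(0x00); return 0;   /* nop  */
    case I_HALT: emit(0x76); return 0;   /* halt */
    default:     return 1;
    }
}

/* Dispatch on the instruction body, as compile does after lexical_analysis. */
static int translate(int instruction, int nargs)
{
    switch (instruction) {
    case I_NOP:
    case I_HALT:
        return translate_simple(instruction, nargs);
    default:
        return 1;                    /* other groups (ld, arithmetic, bit ...) omitted */
    }
}
```

In the real compiler the dispatch key comes from lex.instruction. Each instruction group has its own translating function.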

2. Labels

Because labels can be used before they are defined, the compiler passes through the source code twice. During the first pass, labels are collected into a table that stores each label's text and address. The table is a hash table, so a lookup takes constant time on average and adding a label is simple. I think this is the best way to store labels. Binary search in a sorted table would take logarithmic time, and storing labels in the order they appear would make searching linear.
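Below is a minimal sketch of such a label table. It assumes separate chaining and the djb2 string hash; the real table's layout, hash function and size may differ. add_label, find_label and TABLE_SIZE are hypothetical names, and rejecting a second definition of the same label is an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 256               /* hypothetical number of buckets */

struct label {
    char *name;
    unsigned short address;
    struct label *next;              /* chain of labels in the same bucket */
};

static struct label *table[TABLE_SIZE];

/* djb2 string hash reduced to a bucket index. */
static unsigned hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Returns 0 on success, 1 if the label already exists or memory runs out. */
int add_label(const char *name, unsigned short address)
{
    unsigned b = hash(name);
    struct label *l;

    for (l = table[b]; l; l = l->next)
        if (strcmp(l->name, name) == 0)
            return 1;                /* duplicate definition */
    l = malloc(sizeof *l);
    if (!l)
        return 1;
    l->name = malloc(strlen(name) + 1);
    if (!l->name) {
        free(l);
        return 1;
    }
    strcpy(l->name, name);
    l->address = address;
    l->next = table[b];              /* insert at the head of the chain */
    table[b] = l;
    return 0;
}

/* Returns 1 and stores the address if the label is found, 0 otherwise. */
int find_label(const char *name, unsigned short *address)
{
    struct label *l;
    for (l = table[hash(name)]; l; l = l->next)
        if (strcmp(l->name, name) == 0) {
            *address = l->address;
            return 1;
        }
    return 0;
}
```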

During the first pass nothing is written to memory, but the translating functions don't know this. They "translate" normally, so possible errors are revealed already in the first pass. What happens when an instruction uses a label during the first pass? The label lookup returns the current address, so the instruction can still be translated. For example, a relative jump (jr, djnz) to the current address is always in range.

During the second pass instructions are fully translated. When a label occurs, it is looked up in the table and its address is used. If there is no such label, an error message is generated.
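The following sketch shows why returning the current address in the first pass is harmless. A Z80 relative jump stores a signed displacement measured from the address after the 2-byte jr instruction. A jump to its own address therefore encodes as -2, which is always in range. If the lookup returned a fixed placeholder instead, a jr far from that placeholder would be rejected as out of range. jr_offset, jr_in_range and the pass-aware resolve helper are hypothetical, and 16-bit address wrap-around is ignored.

```c
#include <stdbool.h>

/* Displacement for "jr target" placed at address pc: the CPU adds the
   signed byte to the address of the next instruction (pc + 2).
   16-bit wrap-around is ignored in this sketch. */
static int jr_offset(unsigned short pc, unsigned short target)
{
    return (int)target - ((int)pc + 2);
}

static bool jr_in_range(int offset)
{
    return offset >= -128 && offset <= 127;
}

/* Hypothetical pass-aware label lookup. In pass 1 every label resolves to
   the current address; in pass 2 an unknown label is an error (returns 0). */
static int resolve(int pass, int found, unsigned short value,
                   unsigned short pc, unsigned short *result)
{
    if (pass == 1) {
        *result = pc;
        return 1;
    }
    if (!found)
        return 0;                    /* second pass: undefined label */
    *result = value;
    return 1;
}
```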

3. Functions

int convert(const struct seznam_type *seznam,int len,char *txt);
Uses binary search to find txt in the list seznam, whose length is len (so the list must be sorted). It is called to convert a string (instruction name, register name, flag name, ...) to a token. The tables for converting these strings to tokens are in instr.c, instr.h, regs.c, regs.h and z80-asm.h. Tokens are numbers, defined as macros in the sources.
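Below is a sketch of the binary search, not the project's code. It assumes each seznam entry holds a name and a token, that the table is sorted in strcmp order, and that matching is case-sensitive. The struct layout and the -1 "not found" return value are assumptions.

```c
#include <string.h>

/* Assumed layout of one table entry: keyword text and its token. */
struct seznam_type {
    const char *name;
    int token;
};

/* Binary search in a table sorted by name (strcmp order).
   Returns the token, or -1 if txt is not in the table (assumed convention). */
int convert(const struct seznam_type *seznam, int len, char *txt)
{
    int lo = 0, hi = len - 1;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(txt, seznam[mid].name);
        if (cmp == 0)
            return seznam[mid].token;
        if (cmp < 0)
            hi = mid - 1;            /* txt sorts before the middle entry */
        else
            lo = mid + 1;            /* txt sorts after the middle entry */
    }
    return -1;
}
```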
int test_number(char *txt,int *value);
Converts the string txt to an integer and stores it in *value. Recognized numbers are signed or unsigned, in decimal, hexadecimal (prefixed with 0x) or binary (prefixed with %). Characters are accepted too. A character is enclosed in apostrophes as in C, and its value is its ASCII code. If txt isn't recognized as a number, 0 is returned; otherwise 1 is returned.
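Below is a minimal reimplementation sketch of the documented behaviour, not the project's code. Its limitations: it accepts only a single plain character between apostrophes (no C escape sequences such as '\n'), and it does not check for overflow. Accepting a leading + sign is also an assumption.

```c
/* Value of one digit in the given base, or -1 if it is not a valid digit. */
static int digit_value(char c, int base)
{
    int d;
    if (c >= '0' && c <= '9') d = c - '0';
    else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
    else return -1;
    return d < base ? d : -1;
}

/* Returns 1 and stores the number in *value, or 0 if txt is not a number. */
int test_number(char *txt, int *value)
{
    const char *p = txt;
    int sign = 1, base = 10, result = 0;

    /* Character constant such as 'A': its value is the ASCII code. */
    if (p[0] == '\'' && p[1] != '\0' && p[2] == '\'' && p[3] == '\0') {
        *value = (unsigned char)p[1];
        return 1;
    }
    if (*p == '-' || *p == '+') {    /* optional sign */
        if (*p == '-')
            sign = -1;
        p++;
    }
    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
        base = 16;                   /* hexadecimal: 0x prefix */
        p += 2;
    } else if (p[0] == '%') {
        base = 2;                    /* binary: % prefix */
        p++;
    }
    if (*p == '\0')
        return 0;                    /* prefix or sign without digits */
    for (; *p; p++) {
        int d = digit_value(*p, base);
        if (d < 0)
            return 0;                /* not a digit in this base */
        result = result * base + d;
    }
    *value = sign * result;
    return 1;
}
```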
int convert_arg(struct argument_type *arg,char *txt);
Converts the string txt to a struct argument_type. It is called from the lexical parser to convert arguments and determines the type and the value of the argument. If the argument can't be recognized, 1 is returned; on success 0 is returned.
It is a problem to distinguish the carry flag from the C register during lexical analysis. The lexical parser doesn't know which "C" is meant; only the semantic analysis knows. This is solved with a simple but slightly dirty trick: the token for the C register and the token for the carry flag have the same value. The translating functions concerned then accept both types, register and flag.
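The trick can be illustrated with a small sketch in which all token values, argument kinds and the function jp_cond_opcode are hypothetical. The opcodes themselves are the real Z80 encodings: jp c,nn is 0xDA and jp nc,nn is 0xD2.

```c
/* Hypothetical argument kinds and token values (not the project's). */
enum { A_REG = 1, A_FLAG = 2 };

#define R_B  0   /* register B */
#define R_C  1   /* register C and ...                     */
#define F_C  1   /* ... the carry flag share the same value */
#define F_NC 2   /* no-carry flag */

struct arg {
    unsigned char type;
    int value;
};

/* Opcode for "jp cc,nn", or -1 on a semantic error. Because R_C == F_C,
   the lexer may tag "c" as a register; this function accepts it either way. */
int jp_cond_opcode(const struct arg *cond)
{
    if (cond->type != A_REG && cond->type != A_FLAG)
        return -1;                   /* wrong argument type */
    if (cond->value == F_C)
        return 0xDA;                 /* jp c,nn */
    if (cond->type == A_FLAG && cond->value == F_NC)
        return 0xD2;                 /* jp nc,nn */
    return -1;                       /* e.g. "jp b,nn" is invalid */
}
```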
int lexical_analysis(char *line);
This function performs the whole lexical analysis. Its input is a string containing one line of the source code. The parser reads the string from left to right and skips whitespace; when it reads a semicolon, it stops. The parts of an instruction are expected in this order: label, instruction body, arguments. Labels are added to the table in the first pass. The instruction body is parsed by the convert function, and the arguments are parsed by the convert_arg function.
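Below is a sketch of the left-to-right split into label, instruction body and arguments. It rests on several assumptions: a label is an identifier followed by ':' (the real label syntax may differ), and arguments are kept as text instead of being passed to convert_arg. A semicolon inside a character constant is not handled. split_line, struct parsed_line, MAX_ARGS and MAX_TOK are hypothetical names.

```c
#include <ctype.h>
#include <string.h>

#define MAX_ARGS 4
#define MAX_TOK  32

/* Hypothetical result of splitting one line; the real parser fills lex. */
struct parsed_line {
    char label[MAX_TOK];
    char instr[MAX_TOK];
    char args[MAX_ARGS][MAX_TOK];
    int nargs;
};

static const char *skip_ws(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* Returns 0 on success, 1 on a syntax error (token too long, too many args). */
int split_line(const char *line, struct parsed_line *out)
{
    const char *p = skip_ws(line);
    size_t n = 0;

    memset(out, 0, sizeof *out);

    /* Optional label: identifier immediately followed by ':' (assumed). */
    while (isalnum((unsigned char)p[n]) || p[n] == '_')
        n++;
    if (n > 0 && p[n] == ':') {
        if (n >= MAX_TOK)
            return 1;
        memcpy(out->label, p, n);
        p = skip_ws(p + n + 1);
    }

    /* Instruction body: the next run of letters. */
    for (n = 0; isalpha((unsigned char)p[n]); n++)
        ;
    if (n >= MAX_TOK)
        return 1;
    memcpy(out->instr, p, n);
    p = skip_ws(p + n);

    /* Comma-separated arguments, ending at ';' (comment) or end of line. */
    while (*p != '\0' && *p != ';' && *p != '\n') {
        size_t len;
        if (out->nargs == MAX_ARGS)
            return 1;
        for (n = 0; p[n] && p[n] != ',' && p[n] != ';' && p[n] != '\n'; n++)
            ;
        len = n;
        while (len > 0 && (p[len - 1] == ' ' || p[len - 1] == '\t'))
            len--;                   /* trim trailing whitespace */
        if (len >= MAX_TOK)
            return 1;
        memcpy(out->args[out->nargs++], p, len);
        p += n;
        if (*p == ',')
            p = skip_ws(p + 1);
    }
    return 0;
}
```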
int compile(void);
This is the main compiling function; it controls the entire translation. It is described in section 1.

4. Data

struct argument_type
- unsigned char type - type of the argument, for example A_REG, A_NUM, A_CONST, A_STRING, ...
- int value - value of the argument
- char *text - used only for string arguments. If the argument is a string, value is not used and text holds the string; otherwise text is unused.
- unsigned char label - flag telling whether the number was a constant or a label (for arguments of type A_NUM or A_PODLE_NUM). Using this flag, the translating functions for the jr and djnz instructions can decide between writing a fixed address and computing a relative address.
- struct argument_type *next,*previous - pointers to the next and previous members of the argument list. The argument list is doubly linked; its ends are marked by a zero (NULL) value of next (last member) or previous (first member).
struct lex_type - structure for communication between lexical_analysis function and compile function.
- int instruction - instruction token. For example: I_NOP, I_LD, I_RET, I_EX, I_HALT, ...
- struct argument_type *arg - bidirectional list of arguments
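The two structures can be written out as documented. The helpers append_arg, count_args and free_args below are hypothetical, and so are the numeric values given to A_REG and A_NUM. The sketch shows how the doubly linked argument list is built and walked.

```c
#include <stdlib.h>

enum { A_REG = 1, A_NUM = 2 };       /* hypothetical values */

struct argument_type {
    unsigned char type;              /* A_REG, A_NUM, A_CONST, A_STRING, ... */
    int value;
    char *text;                      /* only for string arguments */
    unsigned char label;             /* 1 if the number came from a label */
    struct argument_type *next, *previous;
};

struct lex_type {
    int instruction;                 /* I_NOP, I_LD, ... */
    struct argument_type *arg;       /* first member of the argument list */
};

/* Append a new argument at the end of the list; returns it or NULL. */
struct argument_type *append_arg(struct lex_type *lex, unsigned char type, int value)
{
    struct argument_type *a = malloc(sizeof *a), *tail;
    if (!a)
        return NULL;
    a->type = type;
    a->value = value;
    a->text = NULL;
    a->label = 0;
    a->next = a->previous = NULL;    /* NULL marks both ends of the list */
    if (!lex->arg) {
        lex->arg = a;
        return a;
    }
    for (tail = lex->arg; tail->next; tail = tail->next)
        ;
    tail->next = a;
    a->previous = tail;
    return a;
}

/* Count the arguments by walking forward until next is NULL. */
int count_args(const struct lex_type *lex)
{
    int n = 0;
    const struct argument_type *a;
    for (a = lex->arg; a; a = a->next)
        n++;
    return n;
}

/* Free every member (and its text) and leave an empty list. */
void free_args(struct lex_type *lex)
{
    struct argument_type *a = lex->arg;
    while (a) {
        struct argument_type *next = a->next;
        free(a->text);
        free(a);
        a = next;
    }
    lex->arg = NULL;
}
```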

The other compiler variables are described in section 5.

5. The compiler library

The main compiler functions are built as a library, so they can easily be called from a separate program.

To use the compiler as a library, build the asm.a library by typing make asm.a, include asm.h in your source, and define the following functions:

void take_line(char *line);
- this function loads one line of the source code into the line buffer
- warning: this function doesn't allocate memory for line; the buffer is allocated by the compiler
- the maximum length of a line is MAX_TEXT
void error(int line_number,char *line,char* err_message);
- this function prints an error message to the output
- line_number is the number of the line where the error occurred
- line is the source line where the error occurred
- err_message is the error message from the compiler
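As an example, here is one way a program might define the two callbacks. The in-memory source_lines array and the next_line counter are hypothetical stand-ins for reading a file. The fallback MAX_TEXT value is only there so the sketch compiles without asm.h; the real value comes from asm.h. Returning an empty line past the end of the source is an assumption, because this document does not say how the end of the source is signalled.

```c
#include <stdio.h>
#include <string.h>

#ifndef MAX_TEXT
#define MAX_TEXT 256                 /* stand-in; the real value is in asm.h */
#endif

/* Hypothetical in-memory source; a real program would read a file. */
static const char *source_lines[] = { "start: ld a, 1", "halt" };
static int next_line = 0;

/* Copy the next source line into the buffer allocated by the compiler. */
void take_line(char *line)
{
    const int count = (int)(sizeof source_lines / sizeof source_lines[0]);
    if (next_line < count) {
        strncpy(line, source_lines[next_line++], MAX_TEXT - 1);
        line[MAX_TEXT - 1] = '\0';   /* never exceed MAX_TEXT bytes */
    } else {
        line[0] = '\0';              /* assumed end-of-source convention */
    }
}

/* Report an error together with the offending source line. */
void error(int line_number, char *line, char *err_message)
{
    fprintf(stderr, "line %d: %s\n  %s\n", line_number, err_message, line);
}
```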

The following variables control the compiler or report its state. The writable ones must be set before calling the compiler:

int WARNINGS;
- if this flag is on, the compiler checks whether you overwrite code at any address and writes a warning message to stderr
- 0 means off
- 1 means on
int pruchod;
- number of the compiler pass
- 1 is the first pass, in which the source is compiled and labels are read and stored in the label table; if a label is used as an instruction argument, zero is written to memory instead of the real label value, which allows labels to be used before they are defined
- 2 is the second pass, in which code isn't compiled and only label addresses are written to memory
int disable_defx;
- with this flag you can prohibit the use of the defb, defw and defs instructions; using these instructions when they're prohibited causes an error
- 1 disables the defx instructions
- 0 enables them
unsigned short address;
- address of the lowest used memory byte
- useful especially for saving the code to a file
- read-only variable
unsigned short end;
- address of the highest used memory byte
- similar meaning to the one above
- read-only variable
unsigned char *memory_ptr;
- pointer to start of the Z80 memory
- this variable MUST be set before using the compiler
int line;
- number of the line currently being compiled
- read-only variable
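To illustrate how address and end help when saving code to a file, here is a minimal sketch that writes the used memory range as a raw binary. save_code is a hypothetical helper; after compiling, one would call it as save_code(memory_ptr, address, end, f). Treating end < address as "nothing was compiled" is an assumption.

```c
#include <stdio.h>

/* Write memory[lo..hi] (inclusive) to f as raw bytes.
   Returns the number of bytes written. */
size_t save_code(const unsigned char *memory, unsigned short lo,
                 unsigned short hi, FILE *f)
{
    if (hi < lo)
        return 0;                    /* assumed: empty range, nothing compiled */
    return fwrite(memory + lo, 1, (size_t)hi - lo + 1, f);
}
```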

To start compiling:

- first call the asm_init function
- then call the compile function; it returns 0 when everything is OK and 1 otherwise
- finally call the asm_close function, which cleans up after the compiler