Programming in assembly language. General characteristics of the IBM PC assembler instruction set (basic set of instructions, basic operand addressing modes). Program structure in assembly language. Assembly language instructions

General information about assembly language

Symbolic assembly language can largely eliminate the disadvantages of machine language programming.

Its main advantage is that in assembly language all program elements are presented in symbolic form. Converting symbolic command names into their binary codes is the job of a special program, the assembler, which frees the programmer from labor-intensive work and eliminates the errors that would otherwise be inevitable.

Symbolic names introduced when programming in assembly language usually reflect the semantics of the program, and command abbreviations reflect their main function. For example: PARAM - parameter, TABLE - table, MASK - mask, ADD - addition, SUB - subtraction, etc. Such names are easy for a programmer to remember.

Programming in assembly language requires more sophisticated tools than programming in machine language: a computer system based on a microcomputer or PC with a set of peripheral devices (alphanumeric keyboard, character display, floppy disk drive and printer), as well as resident or cross-programming systems for the required types of microprocessors. Assembly language makes it possible to write and debug efficiently much more complex programs than machine language does (up to 1 - 4 KB).

Assembly languages are machine-oriented, i.e., they depend on the machine language and structure of the corresponding microprocessor, since in them each microprocessor instruction is assigned a specific symbolic name.

Assembly languages provide a significant increase in programmer productivity compared to machine languages while retaining the ability to use all software-accessible hardware resources of the microprocessor. This allows qualified programmers to write programs that run faster and occupy less memory than programs created in a high-level language.

In this regard, almost all programs for controlling input/output devices (drivers) are written in assembly language, despite the presence of a fairly large range of high-level languages.

An assembly language defines the following:

mnemonics (symbolic names) for each machine instruction of the microprocessor;

a standard format for the lines of a program written in assembly language;

a format for specifying the various addressing modes and instruction variants;

a format for specifying character constants and integer constants in various number systems;

pseudo-commands that control the process of assembling (translating) a program.

In assembly language, a program is written line by line, that is, one line is allocated for each command.

For microcomputers built on the most common types of microprocessors there may be several variants of assembly language, but usually one is widely used in practice: the so-called standard assembly language.

Programming at the machine instruction level is the minimum level at which programs can be written. The system of machine instructions must be sufficient to implement the required actions by issuing instructions to the computer hardware.

Each machine command consists of two parts:

· the operation part, which determines “what to do”;

· the operand part, which defines the objects to be processed, “what to do it with”.

The microprocessor machine command, written in assembly language, is one line with the following syntactic form:

label command/directive operand(s) ;comments

The only mandatory field in the line is the command or directive.

The label, command/directive, and operands (if any) are separated by at least one space or tab character.

If a command or directive needs to be continued on the next line, the backslash character is used: \.

By default, assembly language does not distinguish between uppercase and lowercase letters when writing commands or directives.

Direct addressing: The effective address is determined directly by the offset field of the machine instruction, which can be 8, 16, or 32 bits in size.

mov eax, sum ; eax = sum

The assembler replaces sum with the corresponding address (offset) in the data segment, which is addressed by the ds register by default; when the instruction executes, the value stored at sum is placed in the eax register.
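The two stages described above (the name is resolved when the program is assembled, the value is fetched when the instruction runs) can be modeled in a few lines. This is a minimal Python sketch, not real assembler output: the byte-array data segment, the symbol table and the offset 4 for `sum` are assumptions made for illustration.

```python
# Hypothetical 8-byte data segment; the variable `sum` is assumed to sit at offset 4.
data_segment = bytearray(8)
symbol_table = {"sum": 4}  # symbol name -> offset in the data segment (assumed layout)

# Put the 32-bit value 1000 into `sum`, little-endian as on x86.
data_segment[4:8] = (1000).to_bytes(4, "little")


def assemble_direct_operand(name: str) -> int:
    """Assembly time: replace the symbolic name with its offset (the displacement field)."""
    return symbol_table[name]


def execute_mov_eax(offset: int) -> int:
    """Run time: read the 32-bit value at DS:offset, as `mov eax, sum` does."""
    return int.from_bytes(data_segment[offset:offset + 4], "little")


displacement = assemble_direct_operand("sum")
eax = execute_mov_eax(displacement)
print(displacement, eax)  # 4 1000
```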

Indirect addressing, in turn, has the following types:

· indirect base (register) addressing;

· indirect base (register) addressing with displacement;

· indirect index addressing;

· indirect base-index addressing.

Indirect base (register) addressing. With this addressing, the effective address of the operand can be held in any of the general purpose registers except sp/esp and bp/ebp (these are special registers for working with the stack segment). Syntactically, this addressing mode is expressed by enclosing the register name in square brackets.

mov eax, [esi] ; eax = *esi, the value at the address held in esi
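All the indirect modes listed above compute the effective address from the same ingredients: an optional base register, an optional index register (which 32-bit x86 code can multiply by a scale factor of 1, 2, 4 or 8) and an optional displacement. A minimal Python sketch with hypothetical register contents:

```python
# Hypothetical register contents.
regs = {"ebx": 0x1000, "esi": 0x20}


def effective_address(base=None, index=None, scale=1, disp=0):
    """Effective address = base + index * scale + displacement, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4 or 8")
    ea = disp
    if base is not None:
        ea += regs[base]
    if index is not None:
        ea += regs[index] * scale
    return ea & 0xFFFFFFFF


print(hex(effective_address(base="esi")))                        # [esi]          0x20
print(hex(effective_address(base="ebx", disp=8)))                # [ebx+8]        0x1008
print(hex(effective_address(index="esi", scale=4, disp=0x100)))  # [esi*4+100h]   0x180
print(hex(effective_address(base="ebx", index="esi", disp=2)))   # [ebx+esi+2]    0x1022
```

A negative displacement simply wraps around modulo 2^32, which is why the result is masked.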

Introduction.

The language in which the original program is written is called the input language, and the language into which it is translated for execution by the processor is called the output language. The process of converting the input language into the output language is called translation. Since processors can execute only programs in binary machine language, which is not used for programming, all source programs must be translated. There are two known ways to translate: compilation and interpretation.

In compilation, the source program is first completely translated into an equivalent program in the output language, called the object program, and then executed. This process is implemented by a special program called a compiler. A compiler whose input language is a symbolic representation of the machine (output) language of binary codes is called an assembler.

In interpretation, each line of text in the source program is analyzed (interpreted) and the command specified in it is executed immediately. This method is implemented by an interpreter program. Interpretation takes a long time. To make it more efficient, instead of processing each line directly, the interpreter first converts all command lines into symbols (tokens). The resulting sequence of symbols is then used to perform the functions assigned to the original program.

The assembly language discussed below is implemented using compilation.

Features of the language.

Main features of the assembler:

● instead of binary codes, the language uses symbolic names called mnemonics. For example, the addition command uses the mnemonic ADD, subtraction uses SUB, multiplication uses MUL, division uses DIV, etc. Symbolic names are also used to address memory cells. To program in assembly language, instead of binary codes and addresses you need to know only the symbolic names, which the assembler translates into binary codes;

● each statement corresponds to one machine command (code), i.e. there is a one-to-one correspondence between machine commands and statements in an assembly language program;

● the language provides access to all objects and commands. High-level languages do not have this ability. For example, assembly language allows you to check bits of the flag register, whereas a typical high-level language does not. Note that systems programming languages (for example, C) often occupy an intermediate position: in terms of access to the hardware they are closer to assembly language, but they have the syntax of a high-level language;

● assembly language is not a universal language. Each specific group of microprocessors has its own assembler. High-level languages do not have this drawback.

Unlike high-level languages, writing and debugging a program in assembly language takes a lot of time. Despite this, assembly language has received wide use due to the following circumstances:

● a program written in assembly language is significantly smaller and runs much faster than a program written in a high-level language. For some applications these indicators are of primary importance, for example, many system programs (including compilers), programs on credit cards and cell phones, device drivers, etc.;

● some procedures require full access to the hardware, which is usually impossible to do in a high-level language. This case includes interrupts and interrupt handlers in operating systems, as well as device controllers in embedded real-time systems.

In most programs, a small percentage of the code accounts for a large percentage of the execution time. Typically, 1% of the program is responsible for 50% of the execution time, and 10% of the program for 90%. Therefore, in real conditions a program is often written using both assembly language and a high-level language.

Statement format in assembly language.

An assembly language program is a list of commands (statements, sentences), each of which occupies a separate line and contains four fields: a label field, an operation field, an operand field, and a comment field. Each field has a separate column.

Label field.

Column 1 is allocated for the label field. A label is a symbolic name, or identifier, of a memory address. It is needed so that you can:

● jump to the command conditionally or unconditionally;

● gain access to the location where data is stored.

Such statements are given a label. Names are made up of (capital) letters of the English alphabet and digits. A name must begin with a letter and end with a colon separator. A label with a colon can be written on a separate line, with the opcode on the next line in column 2, which simplifies the compiler's work. Without the colon, a label could not be distinguished from an operation code when they are on separate lines.

In some versions of assembly language, colons are placed only after instruction labels, not after data labels, and the length of a label may be limited to 6 or 8 characters.

There must not be duplicate names in the label field, since each label is associated with a command address. If a command or data item never needs to be referenced during program execution, its label field remains empty.
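The label rules above translate directly into a checker. The sketch below is a hypothetical Python helper, not part of any real assembler: it assumes labels start in column 1 and end with a colon, applies the 8-character limit mentioned for some assemblers, and reports invalid and duplicate names instead of stopping at the first one.

```python
import re

# A letter first, then letters or digits; the 8-character limit is an assumption.
LABEL_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")
MAX_LABEL_LEN = 8


def scan_labels(lines):
    """Collect 'NAME:' labels written in column 1; report invalid and duplicate names."""
    labels, errors = {}, []
    for number, line in enumerate(lines, start=1):
        code = line.split(";", 1)[0]          # drop the comment field
        if not code or code[0].isspace():
            continue                          # column 1 empty: no label on this line
        name, sep, _ = code.partition(":")
        if not sep:
            continue                          # no colon: not a label in this simple model
        name = name.strip().upper()           # labels are case-insensitive
        if not LABEL_RE.fullmatch(name) or len(name) > MAX_LABEL_LEN:
            errors.append(f"line {number}: invalid label {name!r}")
        elif name in labels:
            errors.append(f"line {number}: duplicate label {name!r}")
        else:
            labels[name] = number
    return labels, errors


labels, errors = scan_labels([
    "START:  MOV AX, 1",
    "        JMP START",
    "LOOP1:  DEC AX",
    "START:  NOP",
    "9LIVES: NOP",
])
print(labels)
print(errors)
```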

Operation code field.

This field contains the mnemonic code of a command or pseudo-command (see below). Command mnemonics are chosen by the language developers. In some assembly languages one mnemonic is chosen for loading a register from memory and another for saving the contents of a register to memory. In other assembly languages the same name can be used for both operations; in x86 assembly language, for example, MOV serves in both directions. While the choice of mnemonic names can be arbitrary, whether two different machine instructions are needed is determined by the processor architecture.

The mnemonics of registers also depend on the assembler version (Table 5.2.1).

Operand field.

This field contains the additional information needed to perform the operation. For jump commands, the operand field specifies the address to jump to; it also holds the addresses and registers that serve as operands of the machine command. As an example, here are the operands that can be used for 8-bit processors:

● numerical data presented in different number systems. To indicate the number system, the constant is followed by a Latin letter: B for binary, H for hexadecimal and D for decimal (D may be omitted), with a separate letter for octal. If the first digit of a hexadecimal number is A, B, C, D, E or F, an insignificant 0 (zero) is added in front;

● codes of the internal microprocessor registers and of memory cells M (sources or receivers of information), in the form of the letters A, B, C, D, E, H, L and M, or their addresses in any number system (for example, 10B, a register address written in binary);

● identifiers: for the register pairs BC, DE and HL, their first letters B, D and H; for the pair formed by the accumulator and the flag register, a separate identifier (PSW in Intel 8080 assembly language); for the program counter, PC; for the stack pointer, SP;

● labels indicating the addresses of operands or of the next instructions in conditional (if the condition is met) and unconditional jumps. For example, the operand M1 in an unconditional jump command means that control must pass to the command whose label field contains the identifier M1;

● expressions, which are built by combining the data discussed above with arithmetic and logical operators.

Note that the method for reserving data space depends on the language version. For example, the developers of one assembler used a Define Word directive and later introduced an alternative name that assemblers for other processors had had from the very beginning, while another language version uses a Define Constant directive.

Processors handle operands of different lengths. Assembler developers made different decisions about how to specify the length, for example:

● registers of different lengths have different names: EAX holds 32-bit operands, AX 16-bit operands and AH 8-bit operands;

● in assemblers for some processors, a suffix is added to each operation code to indicate the operand type, for example the suffix ".B" for byte operands;

● in others, different opcodes are used for operands of different lengths, for example separate opcodes to load a byte, a halfword and a word into a 64-bit register.
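In x86 the shorter register names refer to parts of the longer register, so a value placed in EAX is also visible through AX, AH and AL (AL, the low byte, is the companion of AH). A small Python sketch of that bit-level relationship:

```python
def subregisters(eax: int) -> dict:
    """Split a 32-bit EAX value into the x86 names for its parts."""
    eax &= 0xFFFFFFFF
    return {
        "EAX": eax,                 # all 32 bits
        "AX": eax & 0xFFFF,         # low 16 bits
        "AH": (eax >> 8) & 0xFF,    # bits 8..15
        "AL": eax & 0xFF,           # bits 0..7
    }


print({name: hex(value) for name, value in subregisters(0x12345678).items()})
# {'EAX': '0x12345678', 'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
```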

Comments field.

This field provides explanations of the program's actions. Comments do not affect the operation of the program and are intended for humans. They may be needed to modify a program, which without such comments may be completely incomprehensible even to experienced programmers. A comment begins with a special character and is used to explain and document programs. The starting character of a comment can be:

● a semicolon (;), in the languages for x86 processors (as in the mov examples above);

● an exclamation point (!), in the languages for some other processors.

A comment that occupies a separate line must also begin with this character.
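The four fields described above (label, operation, operands, comment) can be separated mechanically. A simplified Python sketch, assuming labels in column 1 ending with a colon, comma-separated operands and a comment character that never appears inside a string constant; the comment character is a parameter so that both the ';' and '!' conventions can be tried:

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    label: Optional[str]
    opcode: Optional[str]
    operands: list
    comment: Optional[str]


def split_fields(line: str, comment_char: str = ";") -> Statement:
    """Split one source line into label, operation, operand and comment fields."""
    code, sep, comment = line.partition(comment_char)
    label = None
    if code and not code[0].isspace() and ":" in code:
        label, code = code.split(":", 1)       # label in column 1, ended by a colon
        label = label.strip()
    parts = code.split(None, 1)                # opcode, then the rest of the line
    opcode = parts[0].upper() if parts else None
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    return Statement(label, opcode, operands, comment.strip() if sep else None)


print(split_fields("M1:  MOV EAX, SUM ; load the sum"))
print(split_fields("  ADD R1, R2 ! sum", comment_char="!"))
```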

Pseudo-commands (directives).

In assembly language there are two main types of commands:

● basic instructions, which are the equivalent of the processor's machine code. These commands perform all the processing intended by the program;

● pseudo-commands, or directives, which serve the process of translating the program into machine code. As an example, Table 5.2.2 shows some pseudo-commands from an assembler for one processor family.

When programming, there are situations in which the algorithm requires the same chain of commands to be repeated many times. In this case you can:

● write the required sequence of commands wherever it is needed. This approach increases the size of the program;

● arrange this sequence as a procedure (subroutine) and call it when necessary. This solution has its drawbacks: each time, a special procedure call command and a return command must be executed, which, if the sequence is short and frequently used, can greatly reduce the speed of the program.

The simplest and most effective way to repeat a chain of commands many times is to use a macro, which can be thought of as a pseudo-command that causes a group of commands frequently found in the program to be translated again at each use.

A macro, or macro command, is characterized by three aspects: the macro definition, the macro call and the macro expansion.

Macro definition

A macro definition is a designation for a frequently repeated sequence of program commands, used to refer to that sequence in the program text.

The macro definition has the following structure:

name MACRO parameters
...                ; list of statements (the body of the macro)
ENDM

Three parts can be distinguished in this structure:

● the header of the macro, including its name, the MACRO pseudo-command and a set of parameters;

● the body of the macro, marked with dots;

● the ENDM command that ends the macro definition.

The macro definition parameter set contains a list of all parameters given in the operand field for the selected group of instructions. If these parameters were given earlier in the program, then they do not need to be indicated in the macro definition header.

To assemble the selected group of commands again, a call is used, consisting of the macro name and a list of parameters with other values.

When the assembler encounters a macro definition during compilation, it stores it in the macro definition table. Whenever the name of the macro subsequently appears in the program as an opcode, the assembler replaces it with the body of the macro.

Using a macro name as an opcode is called a macro call, and replacing it with the body of the macro is called macro expansion.

If a program is represented as a sequence of characters (letters, numbers, spaces, punctuation marks and carriage returns to move to a new line), then macro expansion consists of replacing some chains from this sequence with other chains.

Macro expansion occurs during the assembly process, not during program execution. Such manipulation of character strings is the task of the macro facilities.

The assembly process is carried out in two passes:

● on the first pass, all macro definitions are saved and macro calls are expanded: the source program is read and converted into a program in which all macro definitions are removed and each macro call is replaced by the body of the macro;

● the second pass processes the resulting program without macros.

Macros with parameters.

To work with repeated sequences of commands, the parameters of which can take different values, macro definitions are provided:

● with actual parameters that are placed in the operand field of the macro call;

● with formal parameters. During macro expansion, each formal parameter appearing in the body of the macro is replaced by the corresponding actual parameter.

Consider an example of using macros with parameters.

Program 1 contains two similar sequences of commands that differ only in the pair of variables being swapped; the first one swaps P and Q.

Program 2 includes a macro with two formal parameters, P1 and P2. During macro expansion, each occurrence of P1 in the macro body is replaced by the first actual parameter (P in the first call), and P2 is replaced by the second actual parameter (Q), reproducing the commands of Program 1. In the macro call of Program 2, P is the first actual parameter and Q is the second.

Program 1            Program 2
MOV EAX,P            MOV EAX,P1
MOV EBX,Q            MOV EBX,P2
MOV Q,EAX            MOV P2,EAX
MOV P,EBX            MOV P1,EBX
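Macro expansion with parameters is plain text substitution. A minimal Python sketch, using the four-line body of the Program 2 macro (the macro header and call syntax are not modeled):

```python
import re

# Body of the macro from Program 2, with formal parameters P1 and P2.
BODY = ["MOV EAX,P1", "MOV EBX,P2", "MOV P2,EAX", "MOV P1,EBX"]
FORMALS = ["P1", "P2"]


def expand(body, formals, actuals):
    """Replace every formal parameter (as a whole word) by the matching actual parameter."""
    if len(formals) != len(actuals):
        raise ValueError("wrong number of actual parameters")
    if not formals:
        return list(body)                     # macro without parameters
    mapping = dict(zip(formals, actuals))
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, formals)) + r")\b")
    return [pattern.sub(lambda m: mapping[m.group(1)], line) for line in body]


for line in expand(BODY, FORMALS, ["P", "Q"]):
    print(line)
# MOV EAX,P
# MOV EBX,Q
# MOV Q,EAX
# MOV P,EBX
```

Matching all formal parameters with one regular expression replaces them in a single pass, so an actual parameter that looks like another formal (for example, the actual parameters P2 and P1) is not substituted a second time.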

Extended capabilities.

Let's look at some advanced features of the macro language.

If a macro containing a conditional jump command and the label to be jumped to is called two or more times, the label will be duplicated (the duplicate label problem), which causes an error. One solution is for the programmer to pass a different label as a parameter in each call. Some assemblers instead allow the label to be declared local, and the assembler then automatically generates a different label each time the macro is expanded.
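The automatic generation of local labels can be sketched as follows. This is a hypothetical Python model: the `??nnnn` naming scheme and the x86 absolute-value body (CMP, JGE, NEG) are illustrative assumptions, not the output format of a particular assembler.

```python
import re


class MacroExpander:
    """Expands macro bodies, renaming labels declared local on every expansion."""

    def __init__(self):
        self.next_id = 0                  # shared by all expansions in one assembly

    def expand(self, body, local_labels):
        if not local_labels:
            return list(body)
        fresh = {}
        for name in local_labels:
            fresh[name] = f"??{self.next_id:04d}"   # hypothetical naming scheme
            self.next_id += 1
        pattern = re.compile(r"\b(" + "|".join(map(re.escape, local_labels)) + r")\b")
        return [pattern.sub(lambda m: fresh[m.group(1)], line) for line in body]


ABS_BODY = ["        CMP AX,0", "        JGE SKIP", "        NEG AX", "SKIP:"]
expander = MacroExpander()
first = expander.expand(ABS_BODY, ["SKIP"])
second = expander.expand(ABS_BODY, ["SKIP"])
print(first[1].strip(), first[3], "|", second[1].strip(), second[3])
# JGE ??0000 ??0000: | JGE ??0001 ??0001:
```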

Some assemblers also allow macros to be defined inside other macros. This advanced feature is very useful in combination with conditional assembly. Consider the following:

M1 MACRO
   IF WORDSIZE GT 16
M2    MACRO
      ...
      ENDM
   ELSE
M2    MACRO
      ...
      ENDM
   ENDIF
   ENDM

The M2 macro is defined in both branches of the IF statement. However, which definition is used depends on whether the program is assembled for a 16-bit or a 32-bit processor. If M1 is not called, macro M2 will not be defined at all.

Another advanced feature is that macros can call other macros, including themselves - recursive call. In the latter case, to avoid an endless loop, the macro must pass a parameter to itself that changes with each expansion, and also check this parameter and end the recursion when the parameter reaches a certain value.
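A recursive macro terminates only because its parameter changes on each expansion and is tested. The Python function below models the expansion of a hypothetical macro FILL (its MASM-style definition is shown in the docstring); the depth limit stands in for the nesting limit an assembler would impose if the stop test were missing.

```python
def expand_fill(n: int, depth: int = 0, max_depth: int = 100) -> list:
    """Expand a call FILL n of a hypothetical recursive macro defined as:

        FILL MACRO N
             IF N GT 0
             DB 0
             FILL N-1        ; recursive call with a smaller parameter
             ENDIF
             ENDM
    """
    if depth > max_depth:                 # safety net, like an assembler's nesting limit
        raise RecursionError("macro nesting too deep")
    if n <= 0:                            # IF N GT 0 is false: empty expansion, recursion ends
        return []
    return ["DB 0"] + expand_fill(n - 1, depth + 1, max_depth)


print(expand_fill(3))  # ['DB 0', 'DB 0', 'DB 0']
```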

Implementing macro facilities in an assembler.

When using macros, the assembler must be able to perform two functions: save macro definitions and expand macro calls.

Saving macro definitions.

All macro names are stored in a table. Each name is accompanied by a pointer to the corresponding macro so that it can be called if necessary. Some assemblers have a separate table for macro names, others have a general table in which, along with macro names, all machine instructions and directives are located.

When a macro definition is encountered during assembly, the following are created:

● a new table entry with the name of the macro, the number of parameters and a pointer to another table, the macro definition table, where the body of the macro will be stored;

● a list of formal parameters.

The body of the macro, which is simply a string of characters, is then read and stored in the macro definition table. Formal parameters occurring in the body of the macro are marked with a special character.

The internal representation of the macro from Program 2 in the example above is:

MOV EAX,&P1; MOV EBX,&P2; MOV &P2,EAX; MOV &P1,EBX;

where the semicolon is used as the carriage return character and the ampersand & marks a formal parameter.

Extending macro calls.

Whenever a macro definition is encountered during assembly, it is stored in the macro table. When a macro is called, the assembler temporarily stops reading input from the input device and begins reading the stored macro body. Each formal parameter taken from the stored body is replaced by the corresponding actual parameter supplied in the call. The ampersand & in front of the parameters allows the assembler to recognize them.
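Storing a macro in the internal form shown above and expanding a call from it can be sketched in Python. `SWAP` is a hypothetical name for the Program 2 macro, since its real name is not given here, and the table layout is an assumption.

```python
import re


def store_macro(body_lines, formals):
    """Internal form: '; ' separates lines and '&' marks every formal parameter."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, formals)) + r")\b")
    return "; ".join(pattern.sub(r"&\1", line) for line in body_lines) + ";"


def expand_call(stored, formals, actuals):
    """Read the stored body back, replacing each &-marked formal by the actual parameter."""
    mapping = dict(zip(formals, actuals))
    text = re.sub(r"&(\w+)", lambda m: mapping[m.group(1)], stored)
    return [line.strip() for line in text.split(";") if line.strip()]


# Macro table: name -> (formal parameters, internal representation).
formals = ["P1", "P2"]
macro_table = {"SWAP": (formals, store_macro(
    ["MOV EAX,P1", "MOV EBX,P2", "MOV P2,EAX", "MOV P1,EBX"], formals))}

print(macro_table["SWAP"][1])
# MOV EAX,&P1; MOV EBX,&P2; MOV &P2,EAX; MOV &P1,EBX;
print(expand_call(macro_table["SWAP"][1], formals, ["P", "Q"]))
# ['MOV EAX,P', 'MOV EBX,Q', 'MOV Q,EAX', 'MOV P,EBX']
```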

Despite the fact that there are many versions of assembler, the assembly processes have common features and are similar in many ways. The operation of a two-pass assembler is discussed below.

Two-pass assembler.

A program consists of a number of statements. Therefore, it would seem that the following sequence of actions could be used for assembly:

● read the next statement and translate it into machine language;

● write the resulting machine code to a file, and the corresponding part of the listing to another file;

● repeat these steps until the entire program is translated.

However, this approach does not work well. An example is the so-called forward reference problem. If the first statement is a jump to statement P located at the very end of the program, the assembler cannot translate it: it must first determine the address of statement P, and to do this it must read the entire program. Each complete reading of the source program is called a pass. The forward reference problem can be solved with two passes in one of the following ways:

● on the first pass, collect all symbol definitions (including labels) and store them in a table, and on the second pass, read and assemble each statement. This method is relatively simple, but a second pass through the source program requires additional time for I/O operations;

● on the first pass, convert the program into an intermediate form and save it in a table, and then perform the second pass over the table rather than over the source program. This method saves time, since the second pass performs no I/O operations.

First pass.

The goal of the first pass is to build a symbol table. As noted above, another goal of the first pass is to save all macro definitions and expand calls as they appear, so symbol definition and macro expansion take place in the same pass. A symbol can be either a label or a value to which a specific name is assigned by a directive such as EQU (for example, a name whose value is the buffer size).

By assigning values to the symbolic names in the label field, the assembler in effect determines the address each command will have during program execution. For this purpose, during assembly it keeps an instruction location counter as a special variable. At the beginning of the first pass this variable is set to 0, and after each processed command it is increased by the length of that command. As an example, Table 5.2.3 shows a program fragment with the length of each command and the counter values. During the first pass, the symbol name table, the directive table and the operation code table are generated, as well as a literal table if necessary. A literal is a constant for which the assembler automatically reserves memory. Note that modern processors have instructions with immediate operands, so their assemblers generally do not need literals.
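A first pass that builds the symbol table with an instruction location counter can be sketched as below. The instruction lengths are hypothetical (Table 5.2.3 is not reproduced here), the syntax is simplified, and EQU is handled as a directive that defines a value without generating code. DONE is used before it is defined, which is exactly the forward reference the first pass resolves.

```python
# Hypothetical instruction lengths in bytes; real lengths depend on the
# processor and on the operands, so these values are only illustrative.
LENGTHS = {"MOV": 3, "ADD": 2, "JMP": 3, "DEC": 1, "HLT": 1}


def first_pass(lines):
    """Build the symbol table; labels get the current location counter value."""
    symbols, location = {}, 0                      # the counter starts at 0
    for line in lines:
        code = line.split(";", 1)[0]               # ignore the comment field
        if ":" in code:                            # "LABEL: opcode operands"
            label, code = code.split(":", 1)
            symbols[label.strip().upper()] = location
        parts = code.split()
        if len(parts) >= 3 and parts[1].upper() == "EQU":
            symbols[parts[0].upper()] = int(parts[2])  # NAME EQU value: no code emitted
        elif parts:
            location += LENGTHS[parts[0].upper()]  # advance by the command length
    return symbols, location


program = [
    "        JMP DONE     ; forward reference: DONE is not known yet",
    "LOOP1:  DEC B",
    "        ADD A,B",
    "        JMP LOOP1",
    "DONE:   HLT",
    "SIZE    EQU 16",
]
symbols, program_length = first_pass(program)
print(symbols, program_length)  # {'LOOP1': 3, 'DONE': 9, 'SIZE': 16} 10
```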

The symbol name table contains one entry for each name (Table 5.2.4). Each entry contains the name itself (or a pointer to it), its numerical value and sometimes additional information, which may include:

● the length of the data field associated with the symbol;

● relocation bits (which indicate whether the value of a symbol changes if the program is loaded at a different address than the assembler assumed);

● information about whether the symbol can be accessed from outside the procedure.

Symbolic names are labels; they can also be defined by directives.

Directive table.

This table lists all the directives, or pseudo-commands, that are encountered when assembling a program.

Operation code table.

For each operation code, the table has separate columns: operation code designation, operand 1, operand 2, hexadecimal value of the operation code, command length and command type (Table 5.2.5). Operation codes are divided into groups depending on the number and type of operands. The command type determines the group number and specifies the procedure that is called to process all commands in that group.

Second pass.

The goal of the second pass is to create the object program, to print the assembly listing if necessary, and to output the information the linker needs to link procedures assembled at different times into one executable file.

In the second pass (as in the first), the lines containing statements are read and processed one by one. The source statement and the hexadecimal object code derived from it can be printed or placed in a buffer for later printing. After the instruction location counter has been updated, the next statement is read.

The source program may contain errors, for example:

● a symbol is not defined or is defined more than once;

● the opcode is represented by an invalid name (due to a typo), does not have enough operands, or has too many operands;

● the statement that ends the program is missing.

Some assemblers can detect an undefined symbol and substitute a value for it. In most cases, however, on encountering an erroneous statement the assembler displays an error message and tries to continue the assembly process.
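The second pass can then translate statements using the finished symbol table, collecting errors instead of stopping. To keep the output real, the sketch uses three genuine Intel 8080 (KR580VM80) opcodes: NOP 00H, HLT 76H and JMP C3H, whose 16-bit address is stored low byte first. The symbol table is passed in directly, and using address 0 as a stand-in for an undefined symbol is an assumption.

```python
# Mnemonic -> (opcode byte, number of operands); real Intel 8080 codes.
OPCODES = {"NOP": (0x00, 0), "HLT": (0x76, 0), "JMP": (0xC3, 1)}


def second_pass(lines, symbols):
    """Translate each statement using the symbol table; collect errors instead of stopping."""
    code, errors = bytearray(), []
    for number, line in enumerate(lines, start=1):
        text = line.split(";", 1)[0]
        if ":" in text:
            text = text.split(":", 1)[1]          # the label was handled in the first pass
        parts = text.split()
        if not parts:
            continue
        mnemonic, operands = parts[0].upper(), parts[1:]
        if mnemonic not in OPCODES:
            errors.append(f"line {number}: unknown opcode {mnemonic}")
            continue
        opcode, operand_count = OPCODES[mnemonic]
        if len(operands) != operand_count:
            errors.append(f"line {number}: {mnemonic} expects {operand_count} operand(s)")
            continue
        code.append(opcode)
        if operand_count:
            name = operands[0].upper()
            if name not in symbols:
                errors.append(f"line {number}: undefined symbol {name}")
            address = symbols.get(name, 0)        # placeholder keeps assembly going
            code += address.to_bytes(2, "little")  # 8080: low byte first
    return bytes(code), errors


program = ["        JMP DONE", "        NOP", "DONE:   HLT"]
code, errors = second_pass(program, {"DONE": 4})
print(code.hex().upper(), errors)  # C304000076 []
```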


Topic 2.5 Basics of processor programming

As the length of the program increases, it becomes increasingly difficult to remember the codes of various operations. Mnemonics provide some assistance in this regard.

A language for the symbolic coding of commands is called an assembly language (assembler).

Assembly language is a language in which each statement corresponds to exactly one machine command.

Assembly is the conversion of a program from assembly language into a machine-language program: symbolic operation names are replaced by machine codes and symbolic addresses by absolute or relative numbers, library programs are incorporated, and sequences of symbolic instructions are generated from macro commands by substituting specific parameters. The assembler program itself is usually located in ROM or loaded into RAM from external media.

Assembly language has several features that distinguish it from high-level languages:

1. This is a one-to-one correspondence between assembly language statements and machine instructions.

2. An assembly language programmer has access to all objects and instructions present on the target machine.

Understanding the basics of programming in machine-oriented languages ​​is useful for:



Better understanding of PC architecture and more competent use of computers;

To develop more rational structures of algorithms for programs for solving applied problems;

The ability to view and correct executable programs with the .exe and .com extensions, compiled from any high-level language, if the source programs are lost (by loading them into the DEBUG debugger and disassembling them into assembly language);

Writing programs for the most critical tasks (a program written in a machine-oriented language is usually more efficient, i.e. 30-60 percent shorter and faster than programs obtained by translation from high-level languages);

To implement procedures included in the main program in the form of separate fragments in the event that they cannot be implemented either in the high-level language used or using OS service procedures.

A program in assembly language can only run on one family of computers, while a program written in a high-level language can potentially run on different machines.

The assembly language alphabet is made up of ASCII characters.

Only integers are used. They are written as follows:

Binary numbers end with the letter B;

Decimal numbers end with the letter D;

Hexadecimal numbers end with the letter H.
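A minimal Python sketch of how an assembler might interpret these suffixes. It treats a constant without a suffix as decimal and enforces the rule that a hexadecimal constant must begin with a digit (so A5H is written 0A5H). Octal constants are omitted because their suffix letter is not given here.

```python
def parse_number(token: str) -> int:
    """Convert an integer constant with an optional B, D or H suffix to its value."""
    text = token.upper()
    if text.endswith("H"):
        digits, base = text[:-1], 16
        if not digits[:1].isdigit():
            raise ValueError(f"{token}: a hexadecimal constant must start with a digit")
    elif text.endswith("B"):
        digits, base = text[:-1], 2
    elif text.endswith("D"):
        digits, base = text[:-1], 10
    else:
        digits, base = text, 10          # no suffix: decimal
    if not digits.isalnum():
        raise ValueError(f"{token}: malformed constant")
    return int(digits, base)             # raises ValueError for digits invalid in the base


print(parse_number("1101B"), parse_number("0FFH"), parse_number("255"))  # 13 255 255
```

The H test comes first because B and D are also hexadecimal digits: 0BH must be read as hexadecimal 11, not as a binary constant.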

RAM, registers, data presentation

Each series of microprocessors (MPs) has its own programming language, its assembly language.

Assembly language occupies an intermediate position between machine codes and high-level languages. Programming in it is easier than in machine codes. A program in assembly language uses the capabilities of a specific machine (more precisely, of a specific MP) more efficiently than a program in a high-level language, which is simpler for the programmer than assembly language. Let's look at the basic principles of programming in machine-oriented languages using the assembly language for the MP KR580VM80 as an example. A general methodology is used for programming in this language, while the specific techniques for writing programs depend on the architecture and command system of the target MP.

Programming model of a microprocessor system based on the MP KR580VM80

The programming model of the MPS is shown in Figure 1.

Figure 1 - Programming model: MP, ports, memory; flag register bits S, Z, AC, P, C

From the programmer's point of view, the MP KR580VM80 has the following program-accessible registers.

A– 8-bit accumulator register. It is the main register of the MP. Any operation performed in an ALU involves placing one of the operands to be processed in the accumulator. The result of an operation in the ALU is also usually stored in A.

B, C, D, E, H, L – 8-bit general purpose registers (GPR), which form the internal memory of the MP. They are designed to store the information being processed as well as the results of operations. When 16-bit words are processed, the registers form the pairs BC, DE and HL, and each pair is named by its first letter: B, D, H. In a register pair, the first register holds the high-order byte. Registers H and L have a special property: they are used both for storing data and for storing 16-bit addresses of RAM cells.
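Because the first register of a pair holds the high-order byte, the 16-bit value of a pair is the first register shifted left by 8 bits plus the second register. A small Python sketch (the register contents are hypothetical):

```python
def pair_value(high: int, low: int) -> int:
    """Combine two 8-bit registers into a 16-bit pair; the first register is the high byte."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


def split_pair(value: int) -> tuple:
    """Split a 16-bit value back into (high, low) register contents."""
    return (value >> 8) & 0xFF, value & 0xFF


regs = {"H": 0x08, "L": 0x00}
hl = pair_value(regs["H"], regs["L"])   # the HL pair, referred to as H
print(hex(hl))  # 0x800, i.e. address 0800H
```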

FL – flag (status) register, an 8-bit register that stores five flags describing the result of arithmetic and logical operations in the MP. The FL format is shown in Figure 1.

Bit C (CY - carry) is set to 1 if an arithmetic operation produced a carry out of the high-order bit of the byte.

Bit P (parity) is set to 1 if the number of ones in the result is even.

Bit AC (auxiliary carry) stores the carry out of the low-order tetrad (4 bits) of the result.

Bit Z (zero) is set to 1 if the result of the operation is 0.

Bit S (sign) is set to 1 if the result is negative (bit 7 of the result is 1) and to 0 if it is positive.
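How the five flags follow from an 8-bit addition can be shown in a short Python sketch. It models only addition; other instructions affect the flags differently, so treat it as an illustration rather than a complete emulation.

```python
def add8_flags(a: int, b: int) -> tuple:
    """Add two 8-bit values and compute the five KR580VM80 (Intel 8080) flags."""
    total = a + b
    result = total & 0xFF
    flags = {
        "S": (result >> 7) & 1,                        # copy of bit 7 (sign)
        "Z": int(result == 0),                         # result is zero
        "AC": int((a & 0x0F) + (b & 0x0F) > 0x0F),     # carry out of the low tetrad
        "P": int(bin(result).count("1") % 2 == 0),     # even number of ones
        "C": int(total > 0xFF),                        # carry out of bit 7
    }
    return result, flags


print(add8_flags(0x0F, 0x01))  # (16, {'S': 0, 'Z': 0, 'AC': 1, 'P': 0, 'C': 0})
```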

SP– stack pointer, a 16-bit register, designed to store the address of the memory cell where the last byte inserted onto the stack was written.

PC – program counter, a 16-bit register designed to store the address of the next instruction to be executed. The contents of the program counter are automatically incremented by 1 immediately after each instruction byte is fetched.

The initial memory area, addresses 0000H–07FFH, contains the control program and demonstration programs. This is the ROM area.

0800H–0AFFH – address area for the programs under study (RAM).

0B00H–0BB0H – address area for data (RAM).

0BB0H – starting address of the stack (RAM).

A stack is a specially organized area of RAM intended for temporary storage of data or addresses. The last number written to the stack is popped first. The stack pointer stores the address of the last stack cell in which information was written. When a subroutine is called, the return address to the main program is automatically stored on the stack. As a rule, at the beginning of each subroutine the contents of all registers involved in its execution are saved on the stack, and at the end of the subroutine they are restored from the stack.
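The stack discipline can be modelled in a few lines of Python. This sketch assumes i8080 PUSH/POP semantics for a register pair (PUSH stores the high byte at SP-1 and the low byte at SP-2, then decreases SP by 2, so the stack grows toward lower addresses) and uses the stack start 0BB0h from the memory map above; the class `Stack8080` is hypothetical.

```python
class Stack8080:
    """Hypothetical model of the stack: 64 KB of memory plus the SP register."""

    def __init__(self, sp: int = 0x0BB0) -> None:
        self.mem = bytearray(0x10000)
        self.sp = sp

    def push(self, value: int) -> None:
        """PUSH: high byte to SP-1, low byte to SP-2, then SP -= 2."""
        self.mem[(self.sp - 1) & 0xFFFF] = (value >> 8) & 0xFF
        self.mem[(self.sp - 2) & 0xFFFF] = value & 0xFF
        self.sp = (self.sp - 2) & 0xFFFF

    def pop(self) -> int:
        """POP: low byte from SP, high byte from SP+1, then SP += 2."""
        low = self.mem[self.sp]
        high = self.mem[(self.sp + 1) & 0xFFFF]
        self.sp = (self.sp + 2) & 0xFFFF
        return (high << 8) | low


s = Stack8080()
s.push(0x1234)                  # e.g. PUSH B with BC = 1234h
s.push(0xABCD)                  # e.g. PUSH D with DE = ABCDh
print(f"SP = {s.sp:04X}h")      # SP = 0BACh
print(f"{s.pop():04X}")         # ABCD  (last in, first out)
print(f"{s.pop():04X}")         # 1234
print(f"SP = {s.sp:04X}h")      # SP = 0BB0h
```

Popping in the reverse order of pushing is exactly what a subroutine does when it restores the registers it saved on entry.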

Data format and command structure of assembly language

The memory of the MP KR580VM80 is an array of 8-bit words called bytes. Each byte has its own 16-bit address, which determines its position in the sequence of memory cells. The MP can address 65536 bytes of memory, which can be contained in both ROM and RAM.

Data Format

Data is stored in memory as 8-bit words:

D7 D6 D5 D4 D3 D2 D1 D0

The least significant bit is bit 0, the most significant bit is bit 7.

A command is characterized by its format, i.e., the number of bits allocated for it, which are divided byte-by-byte into certain functional fields.

Command Format

MP KR580VM80 commands have a one-, two- or three-byte format. Multibyte commands must be placed in adjacent memory cells. The command format depends on the specifics of the operation being performed.

The first byte of the command contains the operation code (opcode), which is written in mnemonic form in assembly language.

The opcode determines the command format, the actions the MP must perform on the data during execution and the addressing method, and it may also contain information about where the data is located.

The second and third bytes may contain data on which operations are performed, or addresses indicating the location of the data. The data on which actions are performed are called operands.
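As a sketch of the three formats, the Python snippet below assembles one command of each length by hand. The opcodes 78h (MOV A,B), 3Eh (MVI A) and 21h (LXI H) are the standard i8080 encodings; the helper functions are hypothetical. The three-byte case shows the low byte of a 16-bit operand going into the second byte and the high byte into the third.

```python
def mov_a_b() -> bytes:
    """MOV A,B: single-byte command, both operands are registers."""
    return bytes([0x78])


def mvi_a(data8: int) -> bytes:
    """MVI A,data: two-byte command, the 8-bit operand is byte 2."""
    return bytes([0x3E, data8 & 0xFF])


def lxi_h(data16: int) -> bytes:
    """LXI H,data16: three-byte command, low byte in byte 2, high byte in byte 3."""
    return bytes([0x21, data16 & 0xFF, (data16 >> 8) & 0xFF])


program = mov_a_b() + mvi_a(0x05) + lxi_h(0x0B00)
print(program.hex(" ").upper())   # 78 3E 05 21 00 0B
```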

The single-byte command format is shown in Figure 2.

Figure 4

In assembly language commands, the operation code is written as a shortened form of English words, called mnemonic notation. Mnemonics (from the Greek for “the art of memorization”) make it easier to remember commands by their function.

Before execution, the source program is translated by a translation program called an assembler into the language of code combinations, that is, machine language; in this form it is placed in the memory of the MP and executed.


Addressing methods

All operand codes (input and output) must be located somewhere. They can be in the internal registers of the MP (the most convenient and fastest option), in system memory (the most common option) or, finally, in I/O devices (the rarest case). The location of the operands is determined by the instruction code. There are different methods by which the instruction code can specify where to take the input operand and where to place the output operand; these are called addressing methods.

For MP KR580VM80, the following addressing methods exist:

Immediate;

Direct;

Register;

Indirect;

Stacked.

Immediate addressing assumes that the (input) operand is located in memory immediately after the operation code. The operand is usually a constant that must be sent somewhere, added to something, etc. The data occupies the second byte of the command, or the second and third bytes, with the low byte of the data in the second byte and the high byte in the third byte.

Direct (also called absolute) addressing assumes that the operand (input or output) is located in memory at an address whose code is placed in the program immediately after the instruction code. It is used in three-byte commands.

Register addressing assumes that the operand (input or output) is in an internal register of the MP. It is used in single-byte commands.

Indirect addressing assumes that an internal register (register pair) of the MP contains not the operand itself but its address in memory.

Stack addressing assumes that the command does not contain an address: memory cells are addressed through the contents of the 16-bit SP (stack pointer) register.
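The following Python sketch contrasts immediate, direct, register and indirect addressing by showing where each method fetches its operand. The memory contents and register values are made up for illustration, the mnemonics are standard i8080 commands (MVI, LDA, MOV), the function names are hypothetical, and stack addressing is left out for brevity.

```python
mem = bytearray(0x10000)                        # 64 KB address space
regs = {"A": 0, "B": 0x22, "H": 0x0B, "L": 0x01}

mem[0x0B00] = 0x33      # byte reached by direct addressing
mem[0x0B01] = 0x44      # byte reached indirectly through HL


def immediate(data8: int) -> int:
    """MVI A,data8: the operand is inside the command itself."""
    return data8


def direct(addr: int) -> int:
    """LDA addr: bytes 2 and 3 of the command hold the operand's address."""
    return mem[addr]


def register(name: str) -> int:
    """MOV A,B: the operand sits in an internal register."""
    return regs[name]


def indirect() -> int:
    """MOV A,M: the pair HL holds the operand's address, M = (HL)."""
    return mem[(regs["H"] << 8) | regs["L"]]


for mnemonic, value in [("MVI A,11h", immediate(0x11)),
                        ("LDA 0B00h", direct(0x0B00)),
                        ("MOV A,B", register("B")),
                        ("MOV A,M", indirect())]:
    regs["A"] = value                           # every command loads A
    print(f"{mnemonic:10} -> A = {regs['A']:02X}h")
```

Only the last case needs a register pair: the address 0B01h is assembled from H (high byte) and L (low byte).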

Command system

The MP command system is the complete list of elementary actions the MP is capable of performing. Controlled by these commands, the MP performs simple actions such as elementary arithmetic and logical operations, data transfer, comparison of two values, etc. The KR580VM80 has 78 commands (244 when all their modifications are counted).

The following groups of commands are distinguished:

Data transmission;

Arithmetic;

Logical;

Jump commands;

Input/output, control and stack commands.


Symbols and abbreviations used when describing commands and composing programs

Symbol      Meaning
ADDR        16-bit address
DATA        8-bit data
DATA 16     16-bit data
PORT        8-bit I/O device address
BYTE 2      Second byte of the command
BYTE 3      Third byte of the command
R, R1, R2   One of the registers A, B, C, D, E, H, L
RP          One of the register pairs: B specifies the pair BC, D specifies DE, H specifies HL
RH          First (high-order) register of the pair
RL          Second (low-order) register of the pair
∧           Logical multiplication (AND)
∨           Logical addition (OR)
⊕           Addition modulo two (exclusive OR)
M           Memory cell whose address is the contents of the register pair HL, i.e. M = (HL)

1. PC architecture

1.1. Registers

1.1.1. General purpose registers

1.1.2. Segment registers

1.1.3. Flag register

1.2. Organization of memory

1.3. Data presentation

1.3.1. Data types

1.3.2. Representation of characters and strings

2. Program statements in assembler

2.1. Assembly language commands

2.2. Addressing modes and machine instruction formats

3. Pseudo-operators

3.1. Data definition directives

3.2. Structure of an assembler program

3.2.1. Program segments. The ASSUME directive

3.2.3. Simplified segmentation directives

4. Assembling and linking the program

5. Data transfer commands

5.1. General commands

5.2. Stack commands

5.3. I/O commands

5.4. Address transfer commands

5.5. Flag transfer commands

6. Arithmetic commands

6.1. Arithmetic operations on binary integers

6.1.1. Addition and subtraction

6.1.2. Commands to increment and decrement the destination by one

6.2. Multiplication and division

6.3. Change of sign

7. Logical operations

8. Shifts and cyclic shifts

9. String operations

10. Logic and organization of programs

10.1. Unconditional jumps

10.2. Conditional jumps

10.4. Procedures in assembly language

10.5. INT interrupts

10.6. System software

10.6.1.1. Reading the keyboard

10.6.1.2. Displaying characters on the screen

10.6.1.3. Ending programs

10.6.2.1. Selecting display modes

11. Disk memory

11.2. File allocation table

11.3. Disk I/O operations

11.3.1. Writing a file to disk

11.3.1.1. ASCIIZ data

11.3.1.2. File number (handle)

11.3.1.3. Creating a disk file

11.3.2. Reading a disk file

Introduction

Assembly language is a symbolic representation of machine language. At the lowest hardware level, all processes in a personal computer (PC) are driven only by machine language commands (instructions). Problems that concern the hardware, or merely depend on it (such as increasing the speed of a program), cannot truly be solved without knowledge of assembler.

Assembler is a convenient form of writing commands addressed directly to PC components, and it requires knowledge of the properties and capabilities of the integrated circuit that contains these components, namely the PC microprocessor. Thus, assembly language is directly tied to the internal organization of the PC, and it is no coincidence that almost all high-level language compilers provide access to the assembly level of programming.

The study of assembler is a necessary part of the training of a professional programmer. Assembly language programming requires knowledge of PC architecture, which allows you to create more efficient programs in other languages and to combine them with assembly language programs.

The manual discusses programming in assembly language for computers based on Intel microprocessors.

This tutorial is addressed to everyone who is interested in processor architecture and the basics of programming in Assembly language, primarily to software product developers.

1. PC architecture

Computer architecture is an abstract representation of a computer, which reflects its structural, circuitry and logical organization.

All modern computers have some common and individual architectural properties. Individual properties are unique to a specific computer model.

The concept of computer architecture includes:

    computer block diagram;

    means and methods of access to elements of the computer block diagram;

    set and availability of registers;

    organization and methods of addressing;

    method of presentation and format of computer data;

    a set of computer machine instructions;

    machine instruction formats;

    interrupt handling.

The main elements of computer hardware are the system unit, keyboard, display devices, disk drives, printing devices (printers) and various communications equipment. The system unit consists of a motherboard, a power supply and expansion slots for additional cards. The motherboard contains the microprocessor, read-only memory (ROM), random access memory (RAM) and a coprocessor.

1.1. Registers

Inside the microprocessor, information is held in a group of 32 registers (16 user registers and 16 system registers), which are available to the programmer to varying degrees. Since the manual is devoted to programming the 8088–i486 microprocessors, it is most logical to start with the internal registers that are accessible to the user.

User registers are used by the programmer to write programs. These registers include:

    eight 32-bit general purpose registers: EAX/AX/AH/AL, EBX/BX/BH/BL, ECX/CX/CH/CL, EDX/DX/DH/DL, EBP/BP, ESI/SI, EDI/DI, ESP/SP;

    six 16-bit segment registers: CS, DS, SS, ES, FS, GS;

    status and control registers: EFLAGS/FLAGS flag register, and EIP/IP command pointer register.

Parts of one 32-bit register are separated by slashes. The prefix E (Extended) denotes a 32-bit register. To work with bytes, registers with the suffixes L (low) and H (high) are used; for example, AL and CH denote the low byte of AX and the high byte of CX.
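A minimal Python model of this overlay, under the assumption that only EAX is modelled: reading AX, AH or AL extracts bits of EAX, and writing AL changes only the low byte. The class `Register32` and its property names are hypothetical; note that the model, like the hardware, gives the upper 16 bits no name of their own.

```python
class Register32:
    """Models EAX together with its named parts AX, AH and AL."""

    def __init__(self, value: int = 0) -> None:
        self.value = value & 0xFFFFFFFF      # full 32-bit EAX

    @property
    def ax(self) -> int:
        return self.value & 0xFFFF           # bits 0-15

    @property
    def ah(self) -> int:
        return (self.value >> 8) & 0xFF      # bits 8-15

    @property
    def al(self) -> int:
        return self.value & 0xFF             # bits 0-7

    @al.setter
    def al(self, byte: int) -> None:
        # Like "mov al, byte": only bits 0-7 change, the rest of EAX is kept.
        self.value = (self.value & 0xFFFFFF00) | (byte & 0xFF)


eax = Register32(0x12345678)
print(f"AX={eax.ax:04X} AH={eax.ah:02X} AL={eax.al:02X}")   # AX=5678 AH=56 AL=78
eax.al = 0xFF
print(f"EAX={eax.value:08X}")                                # EAX=123456FF
```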

1.1.1. General purpose registers

EAX/AX/AH/AL (accumulator register) – accumulator. Used in multiplication and division, in I/O operations and in some string operations.

EBX/BX/BH/BL (base register) – base register, often used when addressing data in memory.

ECX/CX/CH/CL (count register) – counter, used to count the number of loop repetitions.

EDX/DX/DH/DL (data register) – data register, used to store intermediate data. Some commands require its use.

All registers in this group allow access to their “lower” parts. Only the lower 16-bit and 8-bit parts of these registers can be addressed independently; the upper 16 bits are not available as separate objects.

To support string commands, which process sequences of 32-, 16- or 8-bit elements one after another, the following registers are used:

ESI/SI (source index register) – source index. Contains the address of the current element in the source string.

EDI/DI (destination index register) – destination index. Contains the current address in the destination string.

The microprocessor architecture supports a data structure called the stack at both the hardware and software level: there are special instructions and special registers for working with it. Note that the stack grows toward lower addresses.

ESP/SP (stack pointer register) – stack pointer register. Contains a pointer to the top of the stack in the current stack segment.

EBP/BP (base pointer register) – stack base pointer register. Used to organize random access to data inside the stack.

1.1.2. Segment registers

The microprocessor software model has six segment registers: CS, SS, DS, ES, GS, FS. Their existence is due to the specific way Intel microprocessors organize and use RAM. The microprocessor hardware supports a program structure consisting of segments, and the segment registers indicate the segments that are available at a given moment. The microprocessor supports the following segment types:

    Code segment. Contains the program's commands. It is accessed through the CS register (code segment register), which contains the address of the segment of machine instructions that the microprocessor has access to.

    Data segment. Contains the data processed by the program. It is accessed through the DS register (data segment register), which stores the address of the current program's data segment.

    Stack segment. This segment is the area of memory called the stack. The microprocessor organizes the stack on the principle “last in, first out”. The stack is accessed through the SS register (stack segment register), which contains the address of the stack segment.

    Additional data segment. The processed data can also be located in three additional data segments. By default, data is assumed to be in the data segment; when additional data segments are used, they must be specified explicitly in the command with special segment override prefixes. The addresses of the additional data segments are held in the ES, GS and FS registers (extension data segment registers).

Control and status registers

The microprocessor contains several registers that hold information about the state of both the microprocessor itself and the program whose commands are currently loaded into the pipeline. These are:

    the instruction pointer register EIP/IP;

    the flag register EFLAGS/FLAGS.

Using these registers, you can obtain information about the results of command execution and influence the state of the microprocessor itself.

EIP/IP (instruction pointer register) – instruction pointer. The EIP/IP register is 32 or 16 bits wide and contains the offset of the next instruction to be executed relative to the contents of the CS segment register, that is, within the current code segment. This register is not directly accessible, but it can be changed by jump instructions.

EFLAGS/FLAGS (flag register) – flag register, 32/16 bits wide. Individual bits of this register have a specific functional purpose and are called flags. A flag is a bit that takes the value 1 ("flag set") if some condition is met and 0 ("flag cleared") otherwise. The low part of this register is identical to the FLAGS register of the i8086.

1.1.3. Flag register

The flag register is 32 bits wide and is named EFLAGS (Fig. 1). Individual bits of the register have a specific functional purpose and are called flags; each is assigned a name (ZF, CF, etc.). The lower 16 bits of EFLAGS form the 16-bit FLAGS register used when executing programs written for the i8086 and i286 microprocessors.

Fig.1 Flag Register

Some flags are called condition flags: they change automatically when commands are executed and record certain properties of the result (for example, whether it is zero). Other flags are called state flags: they are changed by the program and influence the further behavior of the processor (for example, they block interrupts).

Condition flags:

CF (carry flag) – carry flag. Set to 1 if an addition of integers produces a carry out of the most significant bit that does not fit into the operand, or if, when unsigned numbers are subtracted, the first is less than the second. In shift commands, CF receives the bit shifted out of the operand. CF also records features of the multiplication result (for MUL, it is set when the high half of the product is not zero).

OF (overflow flag) – overflow flag. Set to 1 if adding or subtracting signed integers produces a result whose magnitude exceeds the range of the operand, so that the result spills over into the sign bit.

ZF (zero flag) – zero flag. Set to 1 if the result of the command is 0.

SF (sign flag) – sign flag. Set to 1 if an operation on signed numbers produces a negative result; it repeats the most significant bit of the result.

PF (parity flag) – parity flag. Set to 1 if the low-order byte of the result contains an even number of binary ones. Usually taken into account only in I/O operations.

AF (auxiliary carry flag) – auxiliary carry flag. Records the carry (or borrow) out of bit 3 of the result, which is needed for operations on binary-coded decimal (BCD) numbers.
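To make CF and OF concrete, this Python sketch computes the condition flags that an 8-bit SUB (and therefore CMP) would set on x86. It assumes byte operands in the range 0 to 255; the function names `sign_bit` and `sub8_flags` are hypothetical. The two sample calls show CF reacting to an unsigned borrow (1 - 2) and OF to a signed overflow (-128 - 1).

```python
def sign_bit(value: int) -> int:
    """Bit 7 of a byte, i.e. its sign when read as a signed number."""
    return (value >> 7) & 1


def sub8_flags(a: int, b: int) -> tuple[int, dict]:
    """Return the byte result of a - b and the condition flags SUB would set."""
    a &= 0xFF
    b &= 0xFF
    result = (a - b) & 0xFF
    flags = {
        "CF": int(a < b),                                   # unsigned borrow
        "OF": int(sign_bit(a) != sign_bit(b)                # signs differ and
                  and sign_bit(result) != sign_bit(a)),     # result sign flipped
        "ZF": int(result == 0),
        "SF": sign_bit(result),
        "PF": int(bin(result).count("1") % 2 == 0),         # parity of the low byte
        "AF": int((a & 0x0F) < (b & 0x0F)),                 # borrow out of bit 3 (BCD)
    }
    return result, flags


print(sub8_flags(0x01, 0x02))   # (255, {'CF': 1, 'OF': 0, 'ZF': 0, 'SF': 1, 'PF': 1, 'AF': 1})
print(sub8_flags(0x80, 0x01))   # (127, {'CF': 0, 'OF': 1, 'ZF': 0, 'SF': 0, 'PF': 0, 'AF': 1})
```

The same byte pattern is judged twice: CF answers "did the unsigned subtraction borrow?", while OF answers "is the signed result out of range?".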

State flags:

DF (direction flag) – direction flag. Sets the direction in which string commands process strings: when DF=0, strings are processed “forward” (from beginning to end); when DF=1, in the opposite direction.

IOPL (input/output privilege level) – I/O privilege level. Used in protected mode of microprocessor operation to control access to I/O commands, depending on the privilege of the task.

NT (nested task) – task nesting flag. Used in protected mode of microprocessor operation to record the fact that one task is nested within another.

System flags:

IF (interrupt flag) – interrupt flag. When IF=0, the processor stops responding to incoming (maskable) interrupts; when IF=1, interrupts are no longer blocked.

TF (trap flag) – trace flag. When TF=1, the processor generates interrupt 1 after executing each command, which can be used to trace a program while debugging it.

RF (resume flag) – resume flag. Used when processing interrupts from the debug registers.

VM (virtual 8086 mode) – virtual 8086 mode flag. When VM=1, the processor operates in virtual 8086 mode; when VM=0, it operates in real or protected mode.

AC (alignment check) – alignment check flag. Enables alignment checking when memory is accessed.
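The effect of DF on string commands can be sketched in Python. The hypothetical function below imitates REP MOVSB (copy CX bytes from SI to DI, stepping by +1 when DF=0 and by -1 when DF=1) on a flat bytearray, ignoring segments. The sample shows why the direction matters when the source and destination overlap.

```python
def rep_movsb(mem: bytearray, si: int, di: int, cx: int, df: int) -> tuple[int, int]:
    """Copy cx bytes from mem[si] to mem[di] one at a time; return final (SI, DI)."""
    step = -1 if df else 1      # DF=0: forward (increment), DF=1: backward (decrement)
    while cx:
        mem[di] = mem[si]       # one MOVSB step
        si += step
        di += step
        cx -= 1                 # REP decrements CX until it reaches zero
    return si, di


# Shift "ABCDEF" one byte to the right (source and destination overlap).
fwd = bytearray(b"ABCDEF")
rep_movsb(fwd, si=0, di=1, cx=5, df=0)
print(fwd)    # bytearray(b'AAAAAA'): the first byte is smeared forward

bwd = bytearray(b"ABCDEF")
rep_movsb(bwd, si=4, di=5, cx=5, df=1)
print(bwd)    # bytearray(b'AABCDE'): copying from the end preserves the data
```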

1.2. Organization of memory

The physical memory that the microprocessor can access is called random access memory (RAM). RAM is a sequence of bytes, each with its own unique address (its number), called the physical address. Physical addresses range from 0 to 4 GB. The memory management mechanism is implemented entirely in hardware.

The microprocessor hardware supports several models of using RAM:

    segmented model. In this model, memory for programs is divided into contiguous memory areas (segments), and the program itself can only access the data that is located in these segments;

    page model. In this case, RAM is treated as a set of fixed-size blocks of 4 KB. The main application of this model is the organization of virtual memory, which allows programs to use more memory than is physically installed. For a Pentium microprocessor, the size of possible virtual memory can reach 4 TB.

The use and implementation of these models depends on the operating mode of the microprocessor:

    Real address mode (real mode). The mode is similar to the operation of the i8086 processor. Necessary for the operation of programs developed for early processor models.

    Protected mode. Protected mode provides multitasking, memory protection based on a four-level privilege mechanism, and paged memory organization.

    Virtual 8086 mode. In this mode, it becomes possible to run several programs written for the i8086, so real-mode programs can operate.

Segmentation is an addressing mechanism that ensures the existence of several independent address spaces. A segment is an independent, hardware-supported block of memory.

Each program can generally consist of any number of segments, but it has direct access to three main ones (code, data and stack) and to one to three additional data segments. The operating system places the program segments in RAM at specific physical addresses and then loads these addresses into the appropriate registers. Within a segment, the program accesses addresses linearly relative to the beginning of the segment, that is, starting from address 0 and ending with the address equal to the segment size minus one. The relative address, or offset, that the microprocessor uses to access data within a segment is called the effective address.

Formation of a physical address in real mode

In real mode, physical addresses range from 0 to 1 MB, and the maximum segment size is 64 KB. When memory is accessed, the physical address is determined by the starting address of the segment and the offset within the segment. The segment starting address is taken from the corresponding segment register, which holds only the most significant 16 bits of the physical address of the segment start. The missing low four bits of the 20-bit address are obtained by shifting the value of the segment register to the left by 4 bits; the shift is performed in hardware. The resulting 20-bit value is the real physical address of the beginning of the segment. Thus a physical address is specified as a “segment:offset” pair, where “segment” is the upper 16 bits of the starting address of the memory segment that contains the cell, and “offset” is the 16-bit address of the cell counted from the beginning of that segment; the value 16*segment + offset gives the absolute address of the cell. For example, if the CS register holds 1234h, the address pair 1234h:507h defines the absolute address 16*1234h + 507h = 12340h + 507h = 12847h. Such a pair is stored as a double word in “inverted” form, as numbers are: the first word contains the offset and the second the segment, and each word is in turn stored low byte first. For example, the pair 1234h:5678h is stored as | 78 | 56 | 34 | 12 |.
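The address arithmetic and the inverted storage form can be checked with a short Python sketch. `physical_address` applies 16*segment + offset as a 4-bit left shift plus an addition; `far_pointer_bytes` uses the standard `struct` module with little-endian 16-bit words. Both function names are hypothetical, and the sketch does not model the wraparound that the 8086 performs above FFFFFh.

```python
import struct


def physical_address(segment: int, offset: int) -> int:
    """Real-mode address: segment shifted left by 4 bits, plus the offset."""
    return (segment << 4) + offset


def far_pointer_bytes(segment: int, offset: int) -> bytes:
    """Offset word first, then segment word; each word stored low byte first."""
    return struct.pack("<HH", offset, segment)


print(f"{physical_address(0x1234, 0x0507):05X}h")    # 12847h
print(far_pointer_bytes(0x1234, 0x5678).hex(" "))    # 78 56 34 12
```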

This mechanism for generating a physical address makes it possible to make the software relocatable, that is, independent of specific loading addresses in RAM.

Programming at the machine instruction level is the lowest level at which programs can be written. The machine instruction set must be sufficient to implement the required actions by issuing instructions to the computer hardware.

Each machine command consists of two parts:

  • the operation part, which determines “what to do”;
  • the operand part, which defines the objects to be processed, “what to do it with”.

The microprocessor machine command, written in assembly language, is one line with the following syntactic form:

label command/directive operand(s) ;comments

Only the command or directive field is mandatory.

The label, command/directive, and operands (if any) are separated by at least one space or tab character.

If a command or directive needs to be continued on the next line, the backslash character is used: \.

By default, assembly language does not distinguish between uppercase and lowercase letters when writing commands or directives.

Example lines of code:

Count db 1 ;Name, directive, one operand
mov eax,0 ;Command, two operands
cbw ;Command, no operands

Labels

A label in assembly language can contain the following characters:

  • all letters of the Latin alphabet;
  • numbers from 0 to 9;
  • special characters: _, @, $, ?.

A period can be used as the first character of a label, but some compilers do not recommend using this character. Reserved Assembler names (directives, operators, command names) cannot be used as labels.

The first character of a label must be a letter or a special character (not a digit). The maximum label length is 31 characters. A label written on a line that does not contain an assembler directive must end with a colon (:).

Commands

A command tells the translator what action the microprocessor should perform. In a data segment, a command (or directive) defines a field, workspace or constant. In a code segment, a command specifies an action, such as moving (mov) or adding (add).

Directives

The assembler has a number of operators that control the process of assembly and listing. These operators are called directives. They act only while the program is being assembled and, unlike commands, do not generate machine code.

Operands

Operand – an object on which a machine command or programming language statement is executed.
An instruction may have one or two operands, or no operands at all. The number of operands is implicitly specified by the instruction code.
Examples:

  • No operands: ret ;Return
  • One operand: inc ecx ;Increase ecx
  • Two operands: add eax,12 ;Add 12 to eax

The label, command (directive), and operand do not have to start at any particular position in the line. However, it is recommended to write them in a column to make the program easier to read.

Operands can be:

  • identifiers;
  • strings of characters enclosed in single or double quotes;
  • integers in binary, octal, decimal or hexadecimal notation.

Identifiers

Identifiers – sequences of valid characters used to denote program objects such as operation codes, variable names, and label names.

Rules for recording identifiers.

  • The identifier can consist of one or more characters.
  • As symbols you can use letters of the Latin alphabet, numbers and some special characters: _, ?, $, @.
  • An identifier cannot begin with a digit character.
  • The length of the identifier can be up to 255 characters.
  • The translator accepts the first 32 characters of the identifier and ignores the rest.
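The identifier rules above translate directly into a small checker. This Python sketch encodes only the listed rules (allowed characters, no leading digit, at most 255 characters, 32 significant characters) and, as an acknowledged simplification, does not reject reserved names, which a real translator also does; the function names are hypothetical.

```python
import string

ALLOWED = set(string.ascii_letters + string.digits + "_?$@")
MAX_LENGTH = 255        # longest identifier accepted
SIGNIFICANT = 32        # characters the translator actually takes into account


def is_valid_identifier(name: str) -> bool:
    """Check a name against the listed rules (reserved words are not checked)."""
    return (
        1 <= len(name) <= MAX_LENGTH
        and not name[0].isdigit()
        and all(ch in ALLOWED for ch in name)
    )


def significant_part(name: str) -> str:
    """The part of the identifier the translator takes into account."""
    return name[:SIGNIFICANT]


print(is_valid_identifier("Count"))       # True
print(is_valid_identifier("2count"))      # False
print(is_valid_identifier("@@loop$1"))    # True
```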
Comments

Comments are separated from the executable part of the line by a semicolon (;). Everything after the semicolon up to the end of the line is a comment. Comments make a program clearer, especially where the purpose of a group of commands is not obvious. A comment can contain any printable characters, including spaces, and can occupy an entire line or follow a command on the same line.

Assembly program structure

A program written in assembly language can consist of several parts called modules. Each module can contain one or more data, stack and code segments, declared using the appropriate directives. Every complete assembler program must include one main module, from which its execution begins. Before declaring segments, you must specify the memory model using the .MODEL directive.

An example of a “do nothing” program in assembly language:

.686P
.MODEL FLAT, STDCALL
.DATA
.CODE
START:

RET
END START

This program contains only one microprocessor command. This command is RET. It ensures that the program terminates correctly. In general, this command is used to exit a procedure.
The rest of the program concerns the operation of the translator.
.686P - commands of the P6-family processors (Pentium Pro) are allowed. This directive selects the supported instruction set by naming the processor model. The letter P at the end of the directive tells the translator to accept also the privileged commands used in protected mode.
.MODEL FLAT, STDCALL - the flat memory model, which is used by the Windows operating system; STDCALL selects the procedure calling convention (arguments are pushed from right to left, and the called procedure removes them from the stack).
.DATA is a program segment containing data.
.CODE is a program block containing code.
START - a label. Labels play a major role in assembler, which cannot be said of modern high-level languages.
END START - the end of the program and a message to the translator that program execution should begin with the START label.
Each module must contain an END directive to mark the end of its source code. All lines that follow the END directive are ignored. If you omit the END directive, an error is generated.
The label specified after the END directive tells the translator which module is the main one and where in it program execution begins. If the program contains one module, the label after the END directive can be omitted.



