Basic components of assembly language and instruction structure. Data format and structure of assembly language commands. In the discipline "System programming"

Introduction.

The language in which the source program is written is called the input language, and the language into which it is translated for execution by the processor is called the output language. The process of converting the input language into the output language is called translation. Since processors can execute only programs in binary machine language, which is not used for programming, every source program must be translated. There are two known methods of translation: compilation and interpretation.

In compilation, the source program is first completely translated into an equivalent program in the output language, called the object program, and then executed. This process is carried out by a special program called a compiler. A compiler whose input language is a symbolic representation of the machine (output) language of binary codes is called an assembler.

In interpretation, each line of text in the source program is analyzed (interpreted) and the command specified in it is executed immediately. This method is carried out by an interpreter program. Interpretation takes a long time. To make it more efficient, instead of processing each line separately, the interpreter first converts all command strings into a sequence of symbols. The generated sequence of symbols is then used to perform the functions assigned to the original program.

The assembly language discussed below is implemented using compilation.

Features of the language.

Main features of the assembler:

● instead of binary codes, the language uses symbolic names - mnemonics. For example, the addition command uses the mnemonic ADD, subtraction uses SUB, multiplication uses MUL, division uses DIV, and so on. Symbolic names are also used to address memory cells. To program in assembly language you need to know only the symbolic names, which the assembler translates into binary codes and addresses;

● each statement corresponds to one machine command (code), i.e. there is a one-to-one correspondence between machine commands and statements in an assembly language program;

● the language provides access to all objects and commands of the machine. High-level languages do not have this ability. For example, assembly language allows you to check bits of the flag register, while a typical high-level language does not. Note that systems programming languages (for example, C) often occupy an intermediate position: in terms of access to the machine they are closer to assembly language, but they have the syntax of a high-level language;

● assembly language is not a universal language. Each specific group of microprocessors has its own assembler. High-level languages do not have this drawback.

Unlike programming in high-level languages, writing and debugging a program in assembly language takes a lot of time. Despite this, assembly language remains widely used for the following reasons:

● a program written in assembly language can be significantly smaller and faster than the same program written in a high-level language. For some applications these properties are decisive, for example many system programs (including compilers), programs on smart cards and in cell phones, device drivers, etc.;

● some procedures require full access to the hardware, which is usually impossible in a high-level language. This includes interrupts and interrupt handlers in operating systems, as well as device controllers in embedded real-time systems.

In most programs, a small percentage of the code accounts for a large percentage of the execution time. Typically, 1% of the program is responsible for 50% of the execution time, and 10% of the program is responsible for 90% of the execution time. Therefore, in practice both assembler and one of the high-level languages are used to write a given program: assembler for the small time-critical part and the high-level language for the rest.
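The effect of the 10%/90% rule can be estimated with simple arithmetic. The sketch below is only an illustration: the function name and the factor by which the rewritten part becomes faster (3x here) are assumptions, not figures from the text.

```python
def overall_speedup(hot_time_fraction: float, hot_speedup: float) -> float:
    """Speedup of the whole program when only its hot part runs faster.

    hot_time_fraction: share of the run time spent in the hot code (0..1).
    hot_speedup: assumed factor by which the hot code becomes faster.
    """
    untouched = 1.0 - hot_time_fraction           # time of the code left as is
    new_time = untouched + hot_time_fraction / hot_speedup
    return 1.0 / new_time


# 10% of the code takes 90% of the time; assume assembler makes it 3x faster.
speedup = overall_speedup(0.90, 3.0)
print(f"whole program runs about {speedup:.2f}x faster")
```

Under these assumptions, rewriting a tenth of the code gives most of the benefit that rewriting the whole program would give, which is why mixing the two kinds of language is attractive.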

Operator format in assembly language.

An assembly language program is a list of commands (statements, sentences), each of which occupies a separate line and contains four fields: a label field, an operation field, an operand field, and a comment field. Each field has a separate column.

Label field.

Column 1 is allocated for the label field. A label is a symbolic name, or identifier, of a memory address. It is needed in order to:

● make a conditional or unconditional transition to the command;

● gain access to the location where the data is stored.

Such statements are given a label. A name is written with (capital) letters of the English alphabet and digits. The name must begin with a letter and end with a colon separator. A label with a colon can be written on a separate line, with the opcode on the next line in column 2, which simplifies the assembler's work. Without the colon, a label could not be distinguished from an operation code when the two are on separate lines.

In some versions of assembly language, colons are placed only after instruction labels, not after data labels, and the length of the label may be limited to 6 or 8 characters.

There should not be identical names in the label field, since the label is associated with command addresses. If during program execution there is no need to call a command or data from memory, then the label field remains empty.

Operation code field.

This field contains the mnemonic code of a command or pseudo-command (see below). The mnemonic codes are chosen by the language developers. In some assembly languages one mnemonic is chosen for loading a register from memory and a different mnemonic for saving the contents of a register to memory. In other assembly languages the same name can be used for both operations; x86 assemblers, for example, use MOV in both directions. While the choice of mnemonic names can be arbitrary, the need for two separate machine instructions is determined by the processor architecture.

The mnemonics of registers also depend on the assembler version (Table 5.2.1).

Operand field.

This field holds the additional information needed to perform the operation. For jump commands it specifies the address to jump to; for other commands it specifies the addresses and registers that serve as operands of the machine command. As an example, here are the operands that can be used for 8-bit processors:

● numerical data, presented in different number systems. To indicate the number system, the constant is followed by one of the Latin letters B, Q, H or D, for the binary, octal, hexadecimal and decimal systems respectively (the D may be omitted). If the first digit of a hexadecimal number is A, B, C, D, E or F, an insignificant 0 (zero) is added in front;

● codes of the internal microprocessor registers and of the memory cell M (sources or receivers of information), written as the letters A, B, C, ..., M or as their addresses in any number system (for example, 10B is a register address in the binary system);

● identifiers for register pairs (named by their first letters), for the pair of accumulator and flag register, for the program counter and for the stack pointer;

● labels indicating the addresses of operands or of the next instruction in conditional (if the condition is met) and unconditional jumps. For example, the operand M1 in a jump command means an unconditional jump to the command whose label field contains the identifier M1;

● expressions, which are constructed by linking the data discussed above using arithmetic and logical operators.

Note that the method for reserving data space depends on the language version. The developers of one assembly language chose DW (Define Word) and later introduced an alternative that had been in the language of another processor family from the very beginning; yet another language version uses DC (Define Constant).

Processors process operands of different lengths. To specify the length, assembler developers made different decisions, for example:

● registers of different lengths have different names: EAX for 32-bit operands, AX for 16-bit operands and AH for 8-bit operands;

● suffixes are added to each operation code, for example the suffix ".B" for the byte type;

● different opcodes are used for operands of different lengths, for example separate opcodes to load a byte, a halfword and a word into a 64-bit register.

Comments field.

This field provides explanations of what the program does. Comments do not affect the operation of the program and are intended for humans. They may be needed when a program is modified, since without them a program may be completely incomprehensible even to experienced programmers. A comment begins with a special character, which depends on the assembler:

● a semicolon (;) in the assembly languages of some processor families;

● an exclamation point (!) in the assembly languages of others.

Each separate comment line begins with this leading character.
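The four-field statement format described above can be illustrated with a small parser. This is a minimal sketch under stated assumptions: a label ends with a colon, a comment starts with a semicolon, operands are separated by commas, and the names Statement and parse_statement are hypothetical. It does not handle a semicolon or colon inside a quoted string.

```python
from typing import List, NamedTuple, Optional


class Statement(NamedTuple):
    label: Optional[str]
    opcode: Optional[str]
    operands: List[str]
    comment: Optional[str]


def parse_statement(line: str) -> Statement:
    """Split one source line into label, operation, operand and comment fields."""
    # Comment field: everything after the first semicolon (assumed comment character).
    code, has_comment, comment = line.partition(";")
    comment_text = comment.strip() if has_comment else None

    # Label field: a name terminated by a colon.
    label = None
    code = code.strip()
    if ":" in code:
        label, _, code = code.partition(":")
        label = label.strip()
        code = code.strip()

    if not code:  # a label (or comment) alone on the line
        return Statement(label, None, [], comment_text)

    # Operation field, then the comma-separated operand field.
    fields = code.split(None, 1)
    opcode = fields[0].upper()
    rest = fields[1] if len(fields) > 1 else ""
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    return Statement(label, opcode, operands, comment_text)


print(parse_statement("M1: MOV EAX, P ; load P"))
```

A label written on its own line, as the label section allows, simply yields a statement with an empty operation field.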

Pseudo-commands (directives).

In assembly language there are two main types of commands:

● basic instructions, which are the equivalent of processor machine code. These commands perform all the processing intended by the program;

● pseudo-commands, or directives, which serve the process of translating the program into machine code. As an example, Table 5.2.2 shows some pseudo-commands of one assembler.

When programming, there are situations when, according to the algorithm, the same chain of commands must be repeated many times. To get out of this situation you can:

● write the required sequence of commands whenever it appears. This approach leads to an increase in the volume of the program;

● arrange this sequence into a procedure (subroutine) and call it when necessary. This solution has its drawbacks: each time, a special procedure call command and a return command must be executed, which can greatly reduce the speed of the program if the sequence is short and frequently used.

The simplest and most effective way to repeat a chain of commands many times is to use a macro, which can be thought of as a pseudo-command that causes a group of commands occurring often in a program to be translated again.

A macro, or macro command, is characterized by three aspects: macro definition, macro call and macro expansion.

Macro definition.

A macro definition assigns a name to a frequently repeated sequence of program commands; the name is then used to refer to the sequence in the program text.

The macro definition has the following structure:

name MACRO list of parameters ; macro definition header
...                            ; macro body
ENDM                           ; end of the macro definition

Three parts can be distinguished in this structure:

● the header of the macro, which includes the name, the pseudo-command MACRO and a set of parameters;

● the body of the macro, marked with dots;

● the ENDM command, which ends the macro definition.

The parameter set of the macro definition lists all parameters that appear in the operand field of the selected group of instructions. If these parameters were defined earlier in the program, they need not be listed in the macro definition header.

To assemble the selected group of commands again, a call is used that consists of the name of the macro and a list of parameters with other values.

When the assembler encounters a macro definition during assembly, it stores it in the macro definition table. At each subsequent appearance of the macro name in the program, the assembler replaces the name with the body of the macro.

Using a macro name as an opcode is called a macro call, and replacing it with the body of the macro is called macro expansion.

If a program is represented as a sequence of characters (letters, numbers, spaces, punctuation marks and carriage returns to move to a new line), then macro expansion consists of replacing some chains from this sequence with other chains.

Macro expansion occurs during assembly, not during program execution. Macro facilities are thus a means of manipulating strings of characters.

The assembly process is carried out in two passes:

● on the first pass, all macro definitions are saved and all macro calls are expanded. The original program is read and converted into a program from which all macro definitions are removed and in which each macro call is replaced by the body of the macro;

● the second pass processes the resulting program without macros.

Macros with parameters.

To work with repeated sequences of commands whose parameters can take different values, macro definitions are provided:

● with actual parameters, which are placed in the operand field of the macro call;

● with formal parameters. During macro expansion, each formal parameter appearing in the body of the macro is replaced by the corresponding actual parameter.

Example of using macros with parameters.

Program 1 contains two similar sequences of commands, differing in that the first swaps P and Q and the second swaps a different pair of variables. Program 2 includes a macro with two formal parameters, P1 and P2. During macro expansion, each occurrence of P1 within the macro body is replaced by the first actual parameter, and each occurrence of P2 by the second actual parameter. In the macro call of Program 2 that corresponds to the first sequence, P is the first actual parameter and Q is the second.

Program 1          Program 2

MOV EBX,Q          MOV EAX,P1

MOV Q,EAX          MOV EBX,P2

MOV P,EBX          MOV P2,EAX
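The replacement of formal parameters by actual parameters can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the macro name SWAP, the table layout, the second pair of variables R and S, and the fourth body line (MOV P1,EBX), which completes the swap but does not appear in the listing above.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical macro table: macro name -> (formal parameters, body lines).
MACROS: Dict[str, Tuple[List[str], List[str]]] = {
    "SWAP": (
        ["P1", "P2"],
        ["MOV EAX,P1", "MOV EBX,P2", "MOV P2,EAX", "MOV P1,EBX"],
    ),
}


def expand_macros(source: List[str],
                  macros: Dict[str, Tuple[List[str], List[str]]]) -> List[str]:
    """Replace every macro call in `source` by the macro body with actual parameters."""
    expanded: List[str] = []
    for line in source:
        fields = line.split(None, 1)
        if not fields or fields[0] not in macros:
            expanded.append(line)          # ordinary statement, copied unchanged
            continue
        formals, body = macros[fields[0]]
        actuals = [a.strip() for a in fields[1].split(",")] if len(fields) > 1 else []
        if len(actuals) != len(formals):
            raise ValueError(f"wrong number of parameters in: {line!r}")
        if not formals:                    # macro without parameters
            expanded.extend(body)
            continue
        binding = dict(zip(formals, actuals))
        # Match formal parameter names only as whole words.
        pattern = re.compile(r"\b(" + "|".join(re.escape(f) for f in formals) + r")\b")
        for body_line in body:
            expanded.append(pattern.sub(lambda m: binding[m.group(1)], body_line))
    return expanded


for statement in expand_macros(["SWAP P,Q", "SWAP R,S"], MACROS):
    print(statement)
```

The substitution is done in a single pass over each body line, so an actual parameter that happens to be spelled like a formal parameter is not substituted a second time.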

Extended capabilities.

Let's look at some advanced features of macro assemblers.

If a macro containing a conditional jump command and the label to be jumped to is called two or more times, the label is duplicated (the duplicate label problem), which causes an error. One solution is for the programmer to pass a separate label as a parameter on each call. Alternatively, the label can be declared local to the macro, and the assembler then automatically generates a different label each time the macro is expanded.
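Automatic generation of distinct labels can be sketched as follows. The class name, the numeric suffix scheme and the sample macro body are assumptions chosen for illustration; a real assembler uses its own naming scheme for generated labels.

```python
import re
from typing import List


class LocalLabelExpander:
    """Renames labels declared local so that every expansion gets unique names."""

    def __init__(self) -> None:
        self._expansions = 0

    def expand(self, body: List[str], local_labels: List[str]) -> List[str]:
        self._expansions += 1
        if not local_labels:
            return list(body)
        suffix = f"_{self._expansions}"    # hypothetical naming scheme
        pattern = re.compile(
            r"\b(" + "|".join(re.escape(name) for name in local_labels) + r")\b"
        )
        # Both the definition of the label and every jump to it are renamed.
        return [pattern.sub(lambda m: m.group(1) + suffix, line) for line in body]


# A macro body with a conditional jump to its own label (absolute value of EAX).
BODY = ["CMP EAX,0", "JGE DONE", "NEG EAX", "DONE:"]

expander = LocalLabelExpander()
first = expander.expand(BODY, ["DONE"])
second = expander.expand(BODY, ["DONE"])
print(first)
print(second)
```

Without the renaming, both expansions would define DONE and the second definition would be reported as a duplicate symbol.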

Some assemblers allow macros to be defined inside other macros. This feature is very useful in combination with conditional assembly. Consider a macro M1 whose body contains a fragment that begins:

IF WORDSIZE GT 16
M2 MACRO

The M2 macro can be defined in both branches of the IF statement; which definition is used depends on whether the program is assembled for a 16-bit or a 32-bit processor. If M1 is not called, then macro M2 will not be defined at all.

Another advanced feature is that macros can call other macros, including themselves - recursive call. In the latter case, to avoid an endless loop, the macro must pass a parameter to itself that changes with each expansion, and also check this parameter and end the recursion when the parameter reaches a certain value.

On the use of macro facilities in assembler.

To support macros, the assembler must be able to perform two functions: save macro definitions and expand macro calls.

Saving macro definitions.

All macro names are stored in a table. Each name is accompanied by a pointer to the corresponding macro so that it can be called if necessary. Some assemblers have a separate table for macro names, others have a general table in which, along with macro names, all machine instructions and directives are located.

When a macro definition is encountered during assembly, the assembler creates:

● a new table element with the name of the macro, the number of parameters and a pointer to another table, the macro definition table, where the body of the macro will be stored;

● a list of formal parameters.

The body of the macro, which is simply a string of characters, is then read and stored in the macro definition table. Formal parameters occurring in the body of the macro are marked with a special character.

The internal representation of the macro from the example above (Program 2) is:

MOV EAX,&P1; MOV EBX,&P2; MOV &P2,EAX; MOV &P1,EBX;

where the semicolon is used as the carriage return character and the ampersand & marks a formal parameter.

Expanding macro calls.

Whenever a macro definition is encountered during assembly, it is stored in the macro table. When a macro is called, the assembler temporarily stops reading input from the input device and begins reading the stored macro body. Formal parameters found in the macro body are replaced by the actual parameters provided by the call. The ampersand & in front of the parameters allows the assembler to recognize them.
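Storing a macro body in internal form and expanding it on a call can be sketched as below. The function names are hypothetical, and the four-line body with formal parameters P1 and P2 is an assumption modelled on the swap example; only the conventions themselves (semicolon as line separator, ampersand as parameter marker) come from the text.

```python
import re
from typing import Dict, List


def to_internal(body: List[str], formals: List[str]) -> str:
    """Store a macro body as one string: '&' marks formals, ';' ends each line."""
    if not formals:
        return "".join(line + ";" for line in body)
    pattern = re.compile(r"\b(" + "|".join(re.escape(f) for f in formals) + r")\b")
    return "".join(pattern.sub(lambda m: "&" + m.group(1), line) + ";" for line in body)


def expand_internal(internal: str, binding: Dict[str, str]) -> List[str]:
    """Read the stored body, replacing each &formal by its actual parameter."""
    text = re.sub(r"&(\w+)", lambda m: binding[m.group(1)], internal)
    return [line for line in text.split(";") if line]


stored = to_internal(
    ["MOV EAX,P1", "MOV EBX,P2", "MOV P2,EAX", "MOV P1,EBX"], ["P1", "P2"]
)
print(stored)
print(expand_internal(stored, {"P1": "P", "P2": "Q"}))
```

Because the parameters are already marked when the definition is saved, expansion only has to look for the marker character and never has to compare names against the parameter list again.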

Despite the fact that there are many versions of assembler, the assembly processes have common features and are similar in many ways. The operation of a two-pass assembler is discussed below.

Two-pass assembler.

A program consists of a number of statements. It would therefore seem that assembly could proceed as follows:

● read the next statement;

● translate it into machine language;

● write the resulting machine code to a file, and the corresponding part of the listing to another file;

● repeat these steps until the entire program is translated.

However, this approach does not work in general. An example is the so-called forward reference problem. If the first statement is a jump to statement P, located at the very end of the program, the assembler cannot translate it: it must first determine the address of statement P, and to do this it must read the entire program. Each complete reading of the source program is called a pass. The forward reference problem can be solved with two passes in either of two ways:

● on the first pass, collect all symbol definitions (including labels) and store them in a table, and on the second pass, read and assemble each statement. This method is relatively simple, but the second pass over the source program costs additional time for I/O operations;

● on the first pass, convert the program into an intermediate form and save it in a table, and perform the second pass over the table rather than over the source program. This method saves time, since the second pass performs no I/O operations.
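The first of the two methods can be sketched on a toy language. The assumptions are deliberately crude and are not taken from the text: every instruction occupies one address unit, a label stands alone on its line, and only JMP takes a label operand.

```python
from typing import Dict, List, Tuple


def first_pass(lines: List[str]) -> Dict[str, int]:
    """Collect label definitions; instructions are assumed to be one unit long."""
    symbols: Dict[str, int] = {}
    address = 0
    for line in lines:
        text = line.strip()
        if text.endswith(":"):
            symbols[text[:-1]] = address   # the label names the next instruction
        elif text:
            address += 1
    return symbols


def second_pass(lines: List[str], symbols: Dict[str, int]) -> List[Tuple[str, str]]:
    """Translate each statement, now that every label address is known."""
    output: List[Tuple[str, str]] = []
    for line in lines:
        text = line.strip()
        if not text or text.endswith(":"):
            continue
        opcode, _, operand = text.partition(" ")
        operand = operand.strip()
        if opcode == "JMP":
            operand = str(symbols[operand])   # forward reference resolved here
        output.append((opcode, operand))
    return output


PROGRAM = ["JMP P", "NOP", "NOP", "P:", "HLT"]
table = first_pass(PROGRAM)
print(table)
print(second_pass(PROGRAM, table))
```

A single pass would fail on the very first line, because P is not yet in the table when JMP P is read.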

First pass.

The goal of the first pass is to build a symbol table. As noted above, another goal of the first pass is to save all macro definitions and expand macro calls as they appear. Consequently, both symbol definition and macro expansion occur in one pass. A symbol can be either a label or a value to which a specific name is assigned by a directive, for example a named constant that holds a buffer size.

By assigning values to the symbolic names in the label field, the assembler in effect determines the address that each command will have during program execution. For this purpose, the assembler maintains an instruction address counter as a special variable during assembly. At the beginning of the first pass this variable is set to 0, and after each command is processed it is incremented by the length of that command. As an example, Table 5.2.3 shows a program fragment with the command lengths and counter values. During the first pass the assembler builds tables of symbolic names, directives and operation codes and, if necessary, a literal table. A literal is a constant for which the assembler automatically reserves memory. Note that modern processors have instructions with immediate operands, so their assemblers do not support literals.
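The work of the instruction address counter can be sketched as follows. The instruction lengths in the table are made-up values for illustration, not real encodings, and the function name and sample program are likewise assumptions.

```python
from typing import Dict, List

# Hypothetical instruction lengths in bytes (illustrative, not real encodings).
INSTRUCTION_LENGTH = {"MOV": 5, "ADD": 2, "JMP": 5}


def build_symbol_table(lines: List[str]) -> Dict[str, int]:
    """First pass: give every label the counter value at the point of definition."""
    counter = 0                       # instruction address counter starts at 0
    symbols: Dict[str, int] = {}
    for line in lines:
        label, has_label, rest = line.partition(":")
        if has_label:
            name = label.strip()
            if name in symbols:
                raise ValueError(f"symbol defined more than once: {name}")
            symbols[name] = counter
        else:
            rest = line
        fields = rest.split()
        if fields:                    # advance by the length of this command
            counter += INSTRUCTION_LENGTH[fields[0].upper()]
    return symbols


FRAGMENT = [
    "START: MOV EAX,I",
    "ADD EAX,EBX",
    "LOOP1: MOV J,EAX",
    "JMP LOOP1",
    "DONE:",
]
print(build_symbol_table(FRAGMENT))
```

With these assumed lengths, START receives the value 0, LOOP1 the value 7 and DONE the value 17, which is the running total of the lengths of the commands that precede each label.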

Symbol Name Table

contains one element for each name (Table 5.2.4). Each element of the symbolic name table contains the name itself (or a pointer to it), its numerical value, and sometimes some additional information, which may include:

● the length of the data field associated with the symbol;

● relocation bits (which indicate whether the value of a symbol changes if the program is loaded at a different address than the assembler assumed);

● information about whether the symbol can be accessed from outside the procedure.

Symbolic names are labels. They can also be defined by directives.

Directive table.

This table lists all the directives, or pseudo-commands, that are encountered when assembling a program.

Operation code table.

For each operation code, the table has separate columns: operation code designation, operand 1, operand 2, hexadecimal value of the operation code, command length and command type (Table 5.2.5). Operation codes are divided into groups depending on the number and type of operands. The command type determines the group number and specifies the procedure that is called to process all commands in that group.

Second pass.

The goal of the second pass is to create the object program and, if necessary, print the assembly listing, and to output the information the linker needs to link procedures that were assembled at different times into one executable file.

In the second pass (as in the first), the lines containing the statements are read and processed one by one. The original statement and the object code derived from it, in hexadecimal, can be printed or placed in a buffer for later printing. After the instruction address counter is updated, the next statement is read.

The source program may contain errors, for example:

● a symbol is not defined or is defined more than once;

● the opcode is an invalid name (due to a typo), does not have enough operands, or has too many operands;

● a required statement is missing.

Some assemblers can detect an undefined symbol and replace it. However, in most cases, when it encounters an error statement, the assembler displays an error message on the screen and attempts to continue the assembly process.


NATIONAL UNIVERSITY OF UZBEKISTAN NAMED AFTER MIRZO ULUGBEK

FACULTY OF COMPUTER TECHNOLOGY

On the topic: Semantic parsing of an EXE file.

Completed:

Tashkent 2003.

Preface.

Assembly language and command structure.

EXE file structure (semantic parsing).

COM file structure.

The principle of action and spread of the virus.

Disassembler.

Programs.

Preface

The profession of a programmer is amazing and unique. Nowadays it is impossible to imagine science and everyday life without modern technology, and almost every human activity relies on computers. This drives their rapid development. Although personal computers appeared not so long ago, software has made colossal progress in that time, and its products will be in wide use for a long time. Computer-related knowledge has expanded explosively, as has the corresponding technology. Leaving the commercial side aside, there are no outsiders in this profession: many people develop programs not for profit but of their own free will, out of passion. Of course, this should not affect the quality of a program; here, too, there is competition and a demand for quality work, stability and compliance with modern requirements.

It is also worth noting the appearance of microprocessors in the early 1970s, which replaced large sets of discrete components. There are several types of microprocessors, which differ from each other in word size and built-in instruction set. The most common are Intel, IBM, Celeron and AMD processors, among others, and they are related to the architecture of Intel processors. The spread of microcomputers caused attitudes towards assembly language to be reconsidered for two main reasons. First, programs written in assembly language require significantly less memory and execution time. Second, knowledge of assembly language and of the resulting machine code provides an understanding of the machine's architecture, which working in a high-level language is unlikely to provide.

Although most software professionals develop in high-level languages such as Pascal, C or Delphi, in which programs are easier to write, the most powerful and efficient software is written entirely or partially in assembly language. High-level languages were designed to hide the technical features of specific computers. Assembly language, in turn, is designed around the specifics of a particular processor. Therefore, to write an assembly language program for a specific computer, you must know its architecture.

These days the main form of a software product is the EXE file. On the positive side, this would seem to let the author of a program be confident in its integrity, but often this is far from the case. There is also the disassembler. Using a disassembler, you can discover the interrupts a program uses and its code. A person well versed in assembler will have no difficulty remaking the entire program to his taste. Perhaps this is where the most intractable problem arises - the virus. Why do people write viruses? Some ask this question with surprise, some with anger, but nevertheless there are still people who take up this task not in order to cause harm but out of interest in system programming. Viruses are written for various reasons: some people are interested in system calls, others want to improve their knowledge of assembler. I will try to explain all this in my course work, which covers not only the structure of the EXE file but also assembly language.

Assembly Language.

It is interesting to follow, from the time of the appearance of the first computers to the present day, the transformation of programmers’ ideas about assembly language.

Once upon a time, assembly was a language without which you could not make a computer do anything useful. Gradually the situation changed. More convenient means of communicating with a computer appeared. But, unlike other languages, assembler did not die; moreover, it could not do this in principle. Why? In search of an answer, let's try to understand what assembly language is in general.

In short, assembly language is a symbolic representation of machine language. All processes in a machine at the lowest hardware level are driven only by machine language commands (instructions). From this it is clear that, despite the common name, assembly language is different for each type of computer. This applies both to the appearance of programs written in assembly language and to the ideas that the language reflects.

It is impossible to truly solve problems related to hardware (or even, moreover, dependent on hardware, such as increasing the speed of a program), without knowledge of assembler.

A programmer or any other user can use any high-level tools, even programs for constructing virtual worlds, and perhaps not even suspect that the computer does not actually execute the commands of the language in which the program is written, but their transformed representation: a boring and dull sequence of commands from a completely different language, machine language. Now let's imagine that such a user has a non-standard problem, or that something simply does not work. For example, the program must work with some unusual device or perform other actions that require knowledge of how computer hardware operates. No matter how smart the programmer is, no matter how good the language in which the program is written, he cannot do without knowledge of assembler. It is no coincidence that almost all high-level language compilers provide a means of linking their modules with assembler modules or support access to the assembly level of programming.

Of course, the time of computer generalists has already passed. As they say, you cannot embrace the immensity. But there is something in common, a kind of foundation on which any serious computer education is built. This is knowledge about the principles of computer operation, its architecture and assembly language as a reflection and embodiment of this knowledge.

A typical modern computer (i486 or Pentium based) consists of the following components (Figure 1).

Fig. 1. Computer and peripherals

Fig. 2. Block diagram of a personal computer

From the figure (Figure 1) it is clear that the computer is made up of several physical devices, each of which is connected to one unit, called the system unit. If we think logically, it is clear that it plays the role of some kind of coordinating device. Let's look inside the system unit (no need to try to get inside the monitor - there is nothing interesting there, and besides, it is dangerous): open the case and see some boards, blocks, connecting wires. To understand their functional purpose, let's look at the block diagram of a typical computer (Fig. 2). It does not claim absolute accuracy and is intended only to show the purpose, interconnection and typical composition of the elements of a modern personal computer.

Let's discuss the diagram in Fig. 2 in a somewhat unconventional style.
It is common for a person, when encountering something new, to look for some associations that can help him understand the unknown. What associations does the computer evoke? For example, I often associate a computer with the person himself. Why?

When a person created a computer, somewhere deep inside himself he thought that he was creating something similar to himself. The computer has organs for receiving information from the outside world - a keyboard, a mouse, and magnetic disk drives. In Fig. 2 these organs are located to the right of the system buses. The computer has organs that “digest” the information received - these are CPU and RAM. And finally, the computer has speech organs that produce the results of processing. These are also some of the devices on the right.

Modern computers are, of course, far from human. They can be compared to creatures that interact with the outside world at the level of a large but limited set of unconditioned reflexes.
This set of reflexes forms a system of machine commands. No matter how high a level you communicate with a computer, it ultimately comes down to a boring and monotonous sequence of machine commands.
Each machine command is a kind of stimulus that triggers one or another unconditioned reflex. The reaction to this stimulus is always unambiguous and is "hardwired" in the microcommand block in the form of a microprogram. This microprogram carries out the machine command, but at the level of signals supplied to particular logic circuits of the computer, thereby controlling its various subsystems. This is the so-called principle of microprogram control.

Continuing the analogy with a person, we note that many operating systems, compilers for hundreds of programming languages and so on have been invented so that the computer can eat properly. But all of them are, in fact, just a platter on which food (programs) is delivered according to certain rules to the stomach (the computer). Only the computer's stomach prefers a monotonous diet: give it structured information in the form of strictly organized sequences of zeros and ones, whose combinations make up machine language.

Thus, although outwardly a polyglot, the computer understands only one language - the language of machine instructions. Of course, to communicate and work with a computer, it is not necessary to know this language, but almost any professional programmer sooner or later is faced with the need to study it. Fortunately, the programmer does not have to try to comprehend the meaning of various combinations of binary numbers, since back in the 50s, programmers began to use a symbolic analogue of machine language for programming, which was called assembly language. This language accurately reflects all the features of machine language. That is why, unlike high-level languages, assembly language is different for each type of computer.

From all of the above, we can conclude that since assembly language is “native” for a computer, the most effective program can only be written in it (provided that it is written by a qualified programmer). There is one small “but” here: this is a very labor-intensive process that requires a lot of attention and practical experience. Therefore, in reality, they mainly write programs in assembler that should provide effective work with hardware. Sometimes program sections that are critical in terms of execution time or memory consumption are written in assembler. Subsequently, they are formalized in the form of subroutines and combined with code in a high-level language.

It makes sense to start learning the assembly language of any computer only after finding out which part of the computer is left visible and accessible for programming in this language. This is the so-called programming model of the computer, part of which is the programming model of the microprocessor; it contains 32 registers that are, to one degree or another, available to the programmer.

These registers can be divided into two large groups:

16 user registers;

16 system registers.

Assembly language programs use registers very intensively. Most registers have a specific functional purpose.

User registers are so called because the programmer can use them when writing programs. These registers include (Fig. 3):

eight 32-bit registers that programmers can use to store data and addresses, also called general purpose registers (GPR);

six segment registers: cs, ds, ss, es, fs, gs;

status and control registers:

Flags register eflags/flags;

Command pointer register eip/ip.

Fig. 3. User registers of i486 and Pentium microprocessors

Why are many of these registers shown with a slash? These are not different registers but parts of one large 32-bit register, which can be used in a program as separate objects. This was done to keep programs written for the earlier 16-bit Intel microprocessors, starting with the i8086, working. The i486 and Pentium microprocessors have mostly 32-bit registers. Their number, with the exception of segment registers, is the same as in the i8086, but their size is larger, which is reflected in their names: they have the prefix e (Extended).

General purpose registers
All registers in this group allow access to their “lower” parts (see Fig. 3). Looking at this figure, note that only the lower 16-bit and 8-bit parts of these registers can be addressed on their own. The upper 16 bits of these registers are not available as independent objects. This was done, as noted above, for compatibility with the earlier 16-bit Intel microprocessors.
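The way the parts of one register overlap can be illustrated with a small model. This is only a sketch in Python, not assembler code; the class name Register32 and its attribute names e, x, h, l are hypothetical and simply mirror the eax/ax/ah/al naming.

```python
class Register32:
    """Toy model of one general purpose register such as eax/ax/ah/al."""

    def __init__(self, value=0):
        self.e = value & 0xFFFFFFFF   # full 32-bit register, e.g. eax

    @property
    def x(self):                      # low 16 bits, e.g. ax
        return self.e & 0xFFFF

    @x.setter
    def x(self, v):
        self.e = (self.e & 0xFFFF0000) | (v & 0xFFFF)

    @property
    def h(self):                      # bits 8..15, e.g. ah
        return (self.e >> 8) & 0xFF

    @h.setter
    def h(self, v):
        self.e = (self.e & 0xFFFF00FF) | ((v & 0xFF) << 8)

    @property
    def l(self):                      # bits 0..7, e.g. al
        return self.e & 0xFF

    @l.setter
    def l(self, v):
        self.e = (self.e & 0xFFFFFF00) | (v & 0xFF)


eax = Register32(0x12345678)
eax.l = 0xFF          # like "mov al, 0FFh"
eax.h = 0x00          # like "mov ah, 0"
print(hex(eax.e))     # 0x123400ff
eax.x = 0xABCD        # like "mov ax, 0ABCDh"; the upper 16 bits are untouched
print(hex(eax.e))     # 0x1234abcd
```

Note that the model offers no separate name for the upper 16 bits, which matches the remark that they are not available as an independent object.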

Let us list the registers belonging to the group of general purpose registers. Since these registers are physically located in the microprocessor inside an arithmetic logic unit (ALU), they are also called ALU registers:

eax/ax/ah/al (Accumulator register) - accumulator.
Used to store intermediate data. Some commands require the use of this register;

ebx/bx/bh/bl (Base register) - base register.
Used to store the base address of some object in memory;

ecx/cx/ch/cl (Count register) - counter register.
Used in commands that perform repetitive actions. Its use is often implicit and hidden in the algorithm of the corresponding command.
For example, the loop command, in addition to transferring control to a command located at a certain address, decreases the value of the ecx/cx register by one and analyzes it;

edx/dx/dh/dl (Data register) - data register.
Just like the eax/ax/ah/al register, it stores intermediate data. In some commands its use is mandatory, and in some of them it is used implicitly.
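The implicit use of the counter register by the loop command can be sketched as follows. This is a simplified Python model that assumes a 16-bit counter in cx; the function name run_loop and the callback body are hypothetical stand-ins for the commands between the label and the loop command.

```python
def run_loop(cx, body):
    """Model "label: <body> / loop label" for a 16-bit counter in cx."""
    iterations = 0
    while True:
        body(cx)                    # commands between the label and loop
        iterations += 1
        cx = (cx - 1) & 0xFFFF      # loop first decreases cx by one ...
        if cx == 0:                 # ... and jumps back only while cx is not zero
            return iterations


seen = []
count = run_loop(3, seen.append)
print(count, seen)    # 3 [3, 2, 1]
```

In this model a starting value of 0 wraps around to 0FFFFh on the first decrement, so the body runs 65536 times; this is why the counter is usually checked before entering such a loop.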

The following two registers are used to support so-called chain (string) operations, that is, operations that sequentially process chains of elements, each of which can be 32, 16 or 8 bits long:

esi/si (Source Index register) - source index.
In chain operations this register contains the current address of the element in the source chain;

edi/di (Destination Index register) - destination index.
In chain operations this register contains the current address in the destination chain.

In the microprocessor architecture, a data structure such as a stack is supported at the hardware and software level. To work with the stack, there are special commands in the microprocessor instruction system, and in the microprocessor software model there are special registers for this:

esp/sp (Stack Pointer register) - stack pointer register.
Contains a pointer to the top of the stack in the current stack segment.

ebp/bp (Base Pointer register) - stack frame base pointer register.
Designed to organize random access to data inside the stack.

A stack is a program area for temporary storage of arbitrary data. Of course, data can also be stored in a data segment, but in that case a separate named memory cell must be created for each temporarily stored item, which increases the size of the program and the number of names used. The convenience of the stack lies in the fact that its area is reusable, and storing data on the stack and retrieving it from there is done using the efficient push and pop commands without specifying any names.

The stack is traditionally used, for example, to save the contents of registers used by a program before calling a subroutine, which, in turn, will use the processor registers "for its own purposes." The original contents of the registers are popped off the stack after the subroutine returns. Another common technique is to pass a subroutine the parameters it requires via the stack. The subroutine, knowing in what order the parameters were placed on the stack, can take them from there and use them during its execution.

A distinctive feature of the stack is the peculiar order in which its data is retrieved: at any given time only the top element is available, i.e. the element most recently pushed onto the stack. Popping the top element makes the next element available. Stack elements are located in the memory area allocated for the stack, starting from the bottom of the stack (i.e., from its maximum address) at sequentially decreasing addresses. The address of the top, accessible element is stored in the stack pointer register SP.

Like any other area of program memory, the stack must be part of some segment or form a separate segment. In either case, the segment address of this segment is placed in the stack segment register SS. Thus, the register pair SS:SP describes the address of the accessible stack cell: SS stores the segment address of the stack, and SP stores the offset of the last data item saved on the stack (Fig. 4, a). Note that in the initial state the stack pointer SP points to a cell that lies under the bottom of the stack and is not part of it.

Fig. 4. Stack organization: a - initial state, b - after loading one element (in this example, the contents of the AX register), c - after loading the second element (the contents of the DS register), d - after unloading one element, e - after unloading two elements and returning to the initial state.

Loading onto the stack is carried out by a special stack command, push. This instruction first decrements the contents of the stack pointer by 2 and then places the operand at the address in SP. If, for example, we want to temporarily store the contents of the AX register on the stack, we should execute the command

push AX

The stack goes into the state shown in Fig. 4, b. It can be seen that the stack pointer has shifted up by two bytes (towards lower addresses) and the operand specified in the push command has been written to this address. The next stack loading command, for example

push DS

will put the stack into the state shown in Fig. 4, c. The stack now stores two elements, and only the top one, pointed to by the stack pointer SP, is accessible. If after some time we need to restore the original contents of the registers saved on the stack, we must execute pop commands to unload them from the stack:

pop DS
pop AX
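The push and pop sequence described above can be traced with a small model. This is a sketch in Python under simplifying assumptions: the stack segment is a 16-byte array, only 16-bit operands are handled, there are no overflow checks, and the class name Stack16 and the register values are hypothetical.

```python
class Stack16:
    """Toy model of the SS:SP stack for 16-bit operands."""

    def __init__(self, size_bytes=16):
        self.mem = bytearray(size_bytes)  # memory of the stack segment
        self.sp = size_bytes              # initial SP: the cell just under the stack bottom

    def push(self, value):
        self.sp -= 2                      # first decrement SP by 2 ...
        self.mem[self.sp:self.sp + 2] = (value & 0xFFFF).to_bytes(2, "little")

    def pop(self):
        value = int.from_bytes(self.mem[self.sp:self.sp + 2], "little")
        self.sp += 2                      # pop reads the top element, then increments SP
        return value


stack = Stack16()
ax, ds = 0x1234, 0x0B00      # hypothetical register contents
stack.push(ax)               # push AX  -> SP = 14
stack.push(ds)               # push DS  -> SP = 12
ds_restored = stack.pop()    # pop DS
ax_restored = stack.pop()    # pop AX
print(hex(ax_restored), hex(ds_restored), stack.sp)   # 0x1234 0xb00 16
```

The values come back in the reverse order of loading, and SP returns to its initial value, as in the last state of Fig. 4.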

How big should the stack be? It depends on how intensively it is used in the program. If, for example, you plan to store an array of 10,000 bytes on the stack, then the stack must be at least this size. It should be borne in mind that in some cases the stack is automatically used by the system, in particular, when executing the int 21h interrupt command. With this command, the processor first pushes the return address onto the stack, and then DOS pushes the contents of the registers and other information related to the interrupted program onto the stack. Therefore, even if a program does not use a stack at all, it must still be present in the program and be at least several dozen words in size. In our first example, we allocated 128 words to the stack, which is certainly enough.

Structure of an assembler program

An assembly language program is a collection of blocks of memory called memory segments. A program may consist of one or more such block segments. Each segment contains a collection of language sentences, each of which occupies a separate line of program code.

There are four types of assembler statements:

commands or instructions that are symbolic analogues of machine commands. During the translation process, assembler instructions are converted into the corresponding commands of the microprocessor instruction set;

macrocommands - sentences of program text formatted in a certain way, which are replaced by other sentences during translation;

directives, which are instructions to the assembler translator to perform certain actions. Directives have no counterparts in machine representation;

comment lines containing any characters, including letters of the Russian alphabet. Comments are ignored by the translator.

Assembly syntax

The sentences that make up a program can be a syntactic construct corresponding to a command, macro, directive, or comment. In order for the assembler translator to recognize them, they must be formed according to certain syntactic rules. To do this, it is best to use a formal description of the syntax of the language, like the rules of grammar. The most common ways to describe a programming language in this way are syntax diagrams and extended Backus-Naur forms. For practical use syntax diagrams are more convenient. For example, the syntax of assembly language statements can be described using the syntax diagrams shown in the following figures.

Fig. 5. Assembly sentence format

Fig. 6. Directive format

Fig. 7. Format of commands and macros

In these figures:

label name - an identifier whose value is the address of the first byte of the sentence in the source code of the program that it designates;

name - an identifier that distinguishes this directive from other directives of the same name. As a result of the assembler's processing of a particular directive, certain characteristics may be assigned to that name;

an operation code (OPC) and a directive are mnemonic symbols for the corresponding machine instruction, macro instruction or translator directive;

operands are parts of a command, macro, or assembler directive that designate the objects on which actions are performed. Assembly language operands are described by expressions with numeric and text constants, labels and variable identifiers using operator signs and some reserved words.

How to use syntax diagrams? It's very simple: all you need to do is find and then follow a path from the diagram's input (on the left) to its output (on the right). If such a path exists, the sentence or construction is syntactically correct. If there is no such path, the translator will not accept the construction. When working with syntax diagrams, pay attention to the direction of traversal indicated by the arrows, since some of the paths may be followed from right to left. In essence, syntax diagrams reflect the logic of the translator's operation when parsing the input sentences of the program.

Acceptable characters when writing program text are:

All letters: A-Z, a-z. In this case, uppercase and lowercase letters are considered equivalent;

Numbers from 0 to 9;

Signs ?, @, $, _, &;

Separators: , . ( ) < > { } + / * % ! " " ? \ = # ^.

Assembly language sentences are formed from lexemes, which are syntactically inseparable sequences of valid language symbols that make sense to the translator.

The lexemes are:

identifiers are sequences of valid characters used to designate program objects such as operation codes, variable names, and label names. The rule for writing identifiers is as follows: an identifier can consist of one or more characters. The characters may be letters of the Latin alphabet, digits and some special characters: _, ?, $, @. An identifier cannot begin with a digit. The length of an identifier can be up to 255 characters, although the translator accepts only the first 32 and ignores the rest. The length of possible identifiers can be adjusted using the command-line option mv. In addition, the translator can be instructed either to distinguish between uppercase and lowercase letters or to ignore the difference (the default).
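The identifier rule can be expressed as a short check. This is a sketch in Python that follows only the rules stated above; it does not test for reserved words, and the names IDENTIFIER_RE, is_valid_identifier and same_identifier are hypothetical.

```python
import re

# Letters, digits and the special characters _ ? $ @; the first character is not a digit.
IDENTIFIER_RE = re.compile(r"[A-Za-z_?$@][A-Za-z0-9_?$@]*")
SIGNIFICANT_CHARS = 32   # the translator looks only at the first 32 characters


def is_valid_identifier(name):
    return 1 <= len(name) <= 255 and IDENTIFIER_RE.fullmatch(name) is not None


def same_identifier(a, b, case_sensitive=False):
    """Compare two names the way the default translator settings are described."""
    a, b = a[:SIGNIFICANT_CHARS], b[:SIGNIFICANT_CHARS]
    if not case_sensitive:
        a, b = a.lower(), b.lower()
    return a == b


print(is_valid_identifier("count_1"))      # True
print(is_valid_identifier("1count"))       # False
print(same_identifier("Start", "START"))   # True
```

With the default settings, two names that differ only after the 32nd character are treated as the same name, which the second function makes explicit.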

Assembler commands.

Assembler commands are the means of conveying your requirements to the computer; they include a mechanism for transferring control within a program (loops and jumps), for logical comparisons and for program organization. Programmable tasks are rarely simple: most programs contain a series of loops in which several commands are repeated until a certain condition is met, and various checks that determine which of several actions should be performed. Some instructions transfer control, changing the normal sequence of steps by directly modifying the offset value in the instruction pointer. As mentioned earlier, different processors have different commands; here we look at a number of commands for the 80186, 80286 and 80386 processors.

To describe the state of the flags after executing a certain command, we will use a selection from a table reflecting the structure of the eflags flag register:

The bottom row of this table shows the values of the flags after the command is executed. The following notations are used:

1 - after the command is executed, the flag is set (equal to 1);

0 - after the command is executed, the flag is reset (equal to 0);

r - the value of the flag depends on the result of the command;

After the command is executed, the flag is not defined;

space - after the command is executed, the flag does not change;

The following notation is used to represent operands in syntax diagrams:

r8, r16, r32 - an operand in one of the registers of byte size, word or double word;

m8, m16, m32, m48 - memory operand size byte, word, double word or 48 bits;

i8, i16, i32 - immediate operand size byte, word or double word;

a8, a16, a32 - relative address (offset) in the code segment.

Commands (in alphabetical order):

*These commands are described in detail.

ADD
(ADDition)

Addition

Command diagram:

add destination, source

Purpose: addition of two source and destination operands of size byte, word or double word.

Work algorithm:

add the source and destination operands;

write the addition result to the receiver;

set flags.

State of flags after command execution:

Application:
The add command is used to add two integer operands. The result of the addition is placed at the address of the first operand. If the result goes beyond the bounds of the destination operand (an overflow occurs), this situation should be handled by analyzing the cf flag and then, if necessary, using the adc command. For example, let's add the values in the ax register and the ch memory area, taking into account the possibility of overflow.
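The role of the cf flag and the adc command can be illustrated with a model of 16-bit addition. This is a sketch in Python, not the assembler fragment itself; the function name add16 and the sample values are hypothetical.

```python
def add16(dst, src, carry_in=0):
    """Return (result, cf) for a 16-bit add; carry_in=1 models adc with cf set."""
    total = (dst & 0xFFFF) + (src & 0xFFFF) + carry_in
    return total & 0xFFFF, total >> 16   # cf is the bit carried out of bit 15


# Add two 32-bit values as pairs of 16-bit words: add for the low words,
# then adc for the high words so that the carry is not lost.
a, b = 0x0001FFF0, 0x00000020
low, cf = add16(a & 0xFFFF, b & 0xFFFF)            # like "add" on the low words
high, _ = add16(a >> 16, b >> 16, carry_in=cf)     # like "adc" on the high words
print(hex(low), cf, hex((high << 16) | low))       # 0x10 1 0x20010
```

Without the second step the carry out of the low word would be dropped and the 32-bit sum would be wrong by 10000h.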

Object code (three formats):

Register plus register or memory:

|000000dw|modregr/m|

AX register (AL) plus immediate value:

|0000010w|--data--|data if w=1|

Register or memory plus immediate value:

|100000sw|mod000r/m|--data--|data if sw=01|
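The second of these formats, AX (AL) plus an immediate value, is simple enough to assemble by hand. The following Python sketch builds the bytes for that one format only; it assumes a 16-bit operand size (no prefix bytes), and the function name encode_add_acc_imm is hypothetical.

```python
def encode_add_acc_imm(value, word):
    """Encode "add al, imm8" (word=False) or "add ax, imm16" (word=True)."""
    w = 1 if word else 0
    opcode = 0b00000100 | w                    # |0000010w|
    size = 2 if word else 1                    # the second data byte is present only if w=1
    data = (value & (0xFFFF if word else 0xFF)).to_bytes(size, "little")
    return bytes([opcode]) + data


print(encode_add_acc_imm(5, word=False).hex())       # 0405
print(encode_add_acc_imm(0x1234, word=True).hex())   # 053412
```

The immediate value is stored with its low byte first, so 1234h appears in the code as the bytes 34h, 12h.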

CALL
(CALL)

Calling a procedure or task

Command diagram:

Purpose:

transferring control to a near or far procedure with storing the address of the return point on the stack;

switching tasks.

Work algorithm:
determined by the operand type:

Near label - the contents of the eip/ip command pointer are pushed onto the stack and the new address value corresponding to the label is loaded into the same register;

Far label - the contents of the cs register and the eip/ip command pointer are pushed onto the stack. Then new address values corresponding to the far label are loaded into the same registers;

r16, r32 or m16, m32 - define a register or memory cell containing the offset in the current code segment to which control is transferred. When control is transferred, the contents of the eip/ip command pointer are pushed onto the stack;

Memory pointer - defines a memory location containing a 4 or 6 byte pointer to the called procedure. The structure of such a pointer is 2+2 or 2+4 bytes. The interpretation of such a pointer depends on the operating mode of the microprocessor:

State of flags after command execution (except task switching):

executing the command does not affect the flags

When a task is switched, the flag values ​​are changed according to information about the eflags register in the TSS status segment of the task being switched to.
Application:
The call command allows you to organize a flexible and multi-variant transfer of control to a subroutine while preserving the address of the return point.

Object code (four formats):

Direct addressing in a segment:

|11101000|disp-low|disp-high|

Indirect addressing in a segment:

|11111111|mod010r/m|

Indirect addressing between segments:

|11111111|mod011r/m|

Direct addressing between segments:

|10011010|offset-low|offset-high|seg-low|seg-high|

CMP
(CoMPare operands)

Operand comparison

Command diagram:

cmp operand1,operand2

Purpose: comparison of two operands.

Work algorithm:

perform the subtraction (operand1 - operand2);

depending on the result, set the flags; do not change operand1 and operand2 (that is, do not store the result).

Application:
This command is used to compare two operands by subtraction without changing the operands. The flags are set according to the result of the command. The cmp command is used together with the conditional jump commands and the set-byte-on-condition command setcc.
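How the flags follow from the subtraction can be modelled in a few lines. This is a Python sketch for 16-bit operands that computes only zf, cf, sf and of (pf and af are left out); the function name cmp16 is hypothetical.

```python
def cmp16(op1, op2):
    """Return the flags that a 16-bit "cmp op1, op2" would set."""
    op1 &= 0xFFFF
    op2 &= 0xFFFF
    diff = (op1 - op2) & 0xFFFF          # the result is computed but not stored

    def sign(v):
        return v >> 15                   # most significant bit of a 16-bit value

    return {
        "zf": int(diff == 0),            # the operands are equal
        "cf": int(op1 < op2),            # unsigned borrow
        "sf": sign(diff),                # sign bit of the difference
        # signed overflow: the operands have different signs and the sign
        # of the difference differs from the sign of op1
        "of": int(sign(op1) != sign(op2) and sign(diff) != sign(op1)),
    }


print(cmp16(5, 5))        # {'zf': 1, 'cf': 0, 'sf': 0, 'of': 0}
print(cmp16(1, 2))        # {'zf': 0, 'cf': 1, 'sf': 1, 'of': 0}
```

Neither operand is modified by the function, which mirrors the fact that cmp discards the difference and keeps only the flags.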

Object code (three formats):

Register or memory with register:

|001110dw|modregr/m|

Immediate value with AX (AL) register:

|0011110w|--data--|data if w=1|

Immediate value with register or memory:

|100000sw|mod111r/m|--data--|data if sw=01|

DEC
(DECrement operand by 1)

Decreasing an operand by one

Command diagram:

dec operand

Purpose: Decrease the value of an operand in memory or register by 1.

Work algorithm:
the command subtracts 1 from the operand.

State of flags after command execution:

Application:
The dec instruction is used to decrement the value of a byte, word, double word in memory or register by one. However, note that the command does not affect the cf flag.

Object code (two formats):

Register: |01001reg|

Register or memory: |1111111w|mod001r/m|

DIV
(DIVide unsigned)

Unsigned division

Command diagram:

div divisor

Purpose: perform a division of two unsigned binary values.

Operating algorithm:
The command requires two operands - the dividend and the divisor. The dividend is specified implicitly and its size depends on the size of the divisor, which is specified in the command:

if the divisor is a byte in size, then the dividend must be located in the ax register. After the operation, the quotient is placed in al and the remainder in ah;

if the divisor is a word in size, then the dividend must be located in the register pair dx:ax, with the low-order part of the dividend located in ax. After the operation, the quotient is placed in ax and the remainder in dx;

if the divisor is a double word in size, then the dividend must be located in the register pair edx:eax, with the low-order part of the dividend located in eax. After the operation, the quotient is placed in eax and the remainder in edx.

State of flags after command execution:

Application:
The command performs an integer division of the operands, producing the result of the division as the quotient and the remainder of the division. When performing a division operation, an exception may occur: 0 - division error. This situation occurs in one of two cases: the divisor is 0 or the quotient is too large to fit into the eax/ax/al register.
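The three cases of the div command and the two causes of the division error can be summarised in one model. This is a sketch in Python; the function name div_unsigned is hypothetical, and Python exceptions stand in for exception 0 of the microprocessor.

```python
def div_unsigned(dividend, divisor, bits):
    """Model div for a divisor of `bits` (8, 16 or 32) bits.

    The dividend is twice as wide (ax, dx:ax or edx:eax). Returns
    (quotient, remainder) as they would land in al/ah, ax/dx or eax/edx.
    """
    if divisor == 0:
        raise ZeroDivisionError("exception 0: the divisor is zero")
    quotient, remainder = divmod(dividend, divisor)
    if quotient >= 1 << bits:
        raise OverflowError("exception 0: the quotient does not fit")
    return quotient, remainder


print(div_unsigned(0x0103, 0x10, bits=8))     # (16, 3)
```

For example, dividing ax = 1000h by a byte divisor of 10h gives a quotient of 100h, which does not fit into al, so the model raises an error just as the processor would raise the division exception.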

Object code:

|1111011w|mod110r/m|

INT
(INTerrupt)

Calling the interrupt service routine

Command diagram:

int interrupt_number

Purpose: call the interrupt service routine with the interrupt number specified by the command operand.

Operating algorithm:

push the flags register eflags/flags and the return address onto the stack. When writing a return address, the contents of the segment register cs are written first, then the contents of the command pointer eip/ip;

reset the if and tf flags to zero;

transfer control to the interrupt service program with the specified number. The control transfer mechanism depends on the operating mode of the microprocessor.

State of flags after command execution:

Application:
As you can see from the syntax, there are two forms of this command:

int 3 - has its own individual operation code 0cch and occupies one byte. This circumstance makes it very convenient for use in various software debuggers to set breakpoints by replacing the first byte of any command. The microprocessor, encountering a command with operation code 0cch in the sequence of commands, calls the interrupt processing program with vector number 3, which serves to communicate with the software debugger.

The second form of the command occupies two bytes, has an opcode of 0cdh and allows you to initiate a call to an interrupt service routine with a vector number in the range 0–255. Features of control transfer, as noted, depend on the operating mode of the microprocessor.
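The two forms can be told apart by the bytes they produce. The following Python sketch chooses the one-byte form for vector 3 and the two-byte form otherwise, as an assembler typically does; the function name encode_int is hypothetical.

```python
def encode_int(vector):
    """Return the machine code bytes for "int vector"."""
    if not 0 <= vector <= 255:
        raise ValueError("interrupt vector must be in the range 0..255")
    if vector == 3:
        return bytes([0xCC])          # one-byte breakpoint form
    return bytes([0xCD, vector])      # two-byte form: opcode 0cdh + vector number


print(encode_int(3).hex())       # cc
print(encode_int(0x21).hex())    # cd21
```

Because the breakpoint form is a single byte, a debugger can overwrite the first byte of any command with 0cch without disturbing the bytes of the following command.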

Object code (two formats):

Interrupt 3 (breakpoint): |11001100|

Specified interrupt number: |11001101|--type--|

JCC
(Jump if condition)

Jump if condition is met

JCXZ/JECXZ
(Jump if CX=Zero/Jump if ECX=Zero)

Jump if CX/ECX is zero

Command diagram:

jcc label
jcxz label
jecxz label

Purpose: transition within the current command segment depending on some condition.

Command algorithm (except jcxz/jecxz):
Checking the state of the flags depending on the opcode (it reflects the condition being checked):

if the condition being tested is true, then go to the cell indicated by the operand;

if the condition being checked is false, then transfer control to the next command.
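The algorithm above can be sketched as a table of conditions over the flags. This Python model covers only a few of the jcc commands, and the names CONDITIONS and next_address as well as the addresses are hypothetical.

```python
# Conditions tested by a few jcc commands, written as functions of the flags.
CONDITIONS = {
    "je":  lambda f: f["zf"] == 1,          # equal / zero
    "jne": lambda f: f["zf"] == 0,          # not equal / not zero
    "jb":  lambda f: f["cf"] == 1,          # unsigned below
    "jae": lambda f: f["cf"] == 0,          # unsigned above or equal
    "jl":  lambda f: f["sf"] != f["of"],    # signed less
    "jge": lambda f: f["sf"] == f["of"],    # signed greater or equal
}


def next_address(mnemonic, flags, label_address, fallthrough_address):
    """Return the address of the command executed after the jcc."""
    taken = CONDITIONS[mnemonic](flags)
    return label_address if taken else fallthrough_address


flags = {"zf": 0, "cf": 1, "sf": 1, "of": 0}    # e.g. after comparing 1 with 2
print(hex(next_address("jb", flags, 0x120, 0x102)))    # 0x120
print(hex(next_address("je", flags, 0x120, 0x102)))    # 0x102
```

The jump itself never changes the flags; it only reads the state left by an earlier command such as cmp.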

Algorithm for the jcxz/jecxz command:
Checking the condition that the contents of the ecx/cx register are equal to zero:

if the condition being checked

Command structure in assembly language

Programming at the level of machine commands is the lowest level at which computer programming is possible. The machine command system must be sufficient to implement the required actions by issuing instructions to the machine hardware. Each machine instruction consists of two parts: an operation part, which determines “what to do”, and an operand part, which determines the objects being processed, that is, “what to do it with”. A microprocessor machine instruction written in assembly language is a single line of the following form:

label command/directive operand(s) ; comments

The label, command/directive, and operand are separated by at least one space or tab character. The operands of the command are separated by commas.

An assembler command tells the translator what action the microprocessor should perform. Assembler directives are parameters specified in the program text that affect the assembly process or the properties of the output file. The operand specifies the initial value of the data (in the data segment) or the elements on which the command acts (in the code segment). An instruction may have one or two operands, or none; the number of operands is implied by the instruction code. If a command or directive needs to be continued on the next line, the backslash character "\" is used. By default, the assembler does not distinguish between upper and lower case letters in commands and directives.

Examples of directives and commands:

Count db 1 ; Name, directive, one operand
mov eax, 0 ; Command, two operands

Identifiers

Identifiers are sequences of valid characters used to denote variable names and label names. An identifier may consist of one or more of the following characters: all letters of the Latin alphabet; digits from 0 to 9; the special characters _, @, $, ?. A dot can be used as the first character of a label. Reserved assembler names (directives, operators, command names) cannot be used as identifiers. The first character of an identifier must be a letter or a special character. The maximum length of an identifier is 255 characters, but the translator accepts the first 32 and ignores the rest.

Labels

All labels written on a line that does not contain an assembler directive must end with a colon ":". The label, command (directive), and operand do not have to start at any particular position in the line. It is recommended to write them in columns to make the program more readable.

Comments

Using comments in a program improves its clarity, especially where the purpose of a set of commands is unclear. A comment begins with a semicolon (;) on any line of the source module. All characters to the right of ";" up to the end of the line are a comment. A comment can contain any printable characters, including spaces. A comment can occupy an entire line or follow a command on the same line.
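The line format described in this part (label or name, command or directive, operands, comment) can be illustrated with a small splitter. This is a simplified sketch in Python, not the translator's real parser: it does not handle semicolons or commas inside string constants, it recognises only three data directives, and the names DATA_DIRECTIVES and split_statement are hypothetical.

```python
DATA_DIRECTIVES = {"db", "dw", "dd"}   # small illustrative subset


def split_statement(line):
    """Split one source line into (label_or_name, operation, operands, comment)."""
    code, _, comment = line.partition(";")        # everything after ';' is a comment
    tokens = code.split(None, 1)                  # fields are separated by spaces or tabs
    label = None
    if tokens and tokens[0].endswith(":"):        # a label on a command line ends with ':'
        label = tokens[0][:-1]
        tokens = tokens[1].split(None, 1) if len(tokens) > 1 else []
    elif len(tokens) == 2 and tokens[1].split(None, 1)[0].lower() in DATA_DIRECTIVES:
        label = tokens[0]                         # a name in front of a data directive
        tokens = tokens[1].split(None, 1)
    operation = tokens[0].lower() if tokens else None
    operands = [op.strip() for op in tokens[1].split(",")] if len(tokens) > 1 else []
    return label, operation, operands, comment.strip()


print(split_statement("Count db 1 ; Name, directive, one operand"))
# ('Count', 'db', ['1'], 'Name, directive, one operand')
print(split_statement("start: mov eax, 0 ; Command, two operands"))
# ('start', 'mov', ['eax', '0'], 'Command, two operands')
```

The operation is lowercased because, by default, the assembler does not distinguish between upper and lower case in commands and directives.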

Assembly language program structure

A program written in assembly language can consist of several parts called modules, each of which can define one or more data, stack, and code segments. Any complete program in assembly language must include one main module from which its execution begins. A module may contain program, data and stack segments declared using the appropriate directives.

Memory models

Before declaring segments, you need to specify the memory model using the directive

.MODEL modifier memory_model, calling_convention, OS_type, stack_parameter

Basic memory models of assembly language:

Memory model | Code addressing | Data addressing | Operating system | Interleaving code and data
TINY | NEAR | NEAR | MS-DOS | Acceptable
SMALL | NEAR | NEAR | MS-DOS, Windows | No
MEDIUM | FAR | NEAR | MS-DOS, Windows | No
COMPACT | NEAR | FAR | MS-DOS, Windows | No
LARGE | FAR | FAR | MS-DOS, Windows | No
HUGE | FAR | FAR | MS-DOS, Windows | No
FLAT | NEAR | NEAR | Windows 2000, Windows XP, Windows NT | Acceptable

The tiny model only works in 16-bit MS-DOS applications. In this model, all data and code are located in one physical segment. The size of the program file in this case does not exceed 64 KB. The small model supports one code segment and one data segment. Data and code are addressed as near when using this model. The medium model supports multiple code segments and one data segment, with all references in code segments considered far by default and references in the data segment considered near. The compact model supports several data segments that use far data addressing and one code segment that uses near addressing. The large model supports multiple code segments and multiple data segments. By default, all references to code and data are considered far. The huge model is almost equivalent to the large memory model.

The flat model assumes an unsegmented program configuration and is used only in 32-bit operating systems. This model is similar to the tiny model in that the data and code are located in a single segment, but that segment is 32-bit. To develop a program for the flat model, one of the directives .386, .486, .586 or .686 should be placed before the .model flat directive. The choice of processor selection directive determines the set of instructions available when writing programs. The letter p after the processor selection directive means protected operating mode. Data and code addressing is near, with all addresses and pointers being 32-bit.

The modifier parameter is used to define segment types and can take the following values: use16 (segments of the selected model are used as 16-bit) or use32 (segments of the selected model are used as 32-bit). The calling_convention parameter determines the method of passing parameters when a procedure is called from other languages, including high-level languages (C++, Pascal). The parameter can take the following values: C, BASIC, FORTRAN, PASCAL, SYSCALL, STDCALL.

The OS_type parameter is OS_DOS by default, and at the moment this is the only supported value for this parameter. The stack_parameter parameter can be set to NEARSTACK (the SS register is equal to DS; the data and stack areas are located in the same physical segment) or FARSTACK (the SS register is not equal to DS; the data and stack areas are located in different physical segments). The default value is NEARSTACK.

An example of a program that does nothing

.686P
.MODEL FLAT, STDCALL
.DATA
.CODE
START:
RET
END START

RET is a microprocessor command. It ensures that the program terminates correctly. The rest of the program concerns the operation of the translator.

.686P allows protected mode commands of the P6 family (Pentium Pro, Pentium II). This directive selects the supported set of assembler instructions by indicating the processor model.

.MODEL FLAT, STDCALL selects the flat memory model. This memory model is used in the Windows operating system. STDCALL is the procedure calling convention used.

.DATA is a program segment containing data. This program does not use the stack, so the .STACK segment is missing.

.CODE is a program segment containing code.

START is a label.

END START marks the end of the program and tells the translator that program execution should begin at the START label. Every program must contain an END directive to mark the end of the program's source code. All lines that follow the END directive are ignored. The label specified after the END directive tells the translator the name of the main module from which program execution begins. If the program contains one module, the label after the END directive can be omitted.

Assembly language translators

A translator is a program or hardware device that converts a program written in one programming language into a program in the target language, called object code. In addition to supporting machine instruction mnemonics, each translator has its own set of directives and macro tools, often incompatible with anything else. The main assembly language translators:

MASM (Microsoft Assembler);

TASM (Borland Turbo Assembler);

FASM (Flat Assembler), a freely distributed multi-pass assembler written by Tomasz Grysztar (Poland);

NASM (Netwide Assembler), a free assembler for the Intel x86 architecture, created by Simon Tatham with Julian Hall and currently developed by a small team of developers on SourceForge.net.

Translating a program in Microsoft Visual Studio 2005

1) Create a project by selecting the File->New->Project menu and specifying the project name (hello.prj) and the project type: Win32 Project. In the additional options of the project wizard, specify “Empty Project”.

2) In the project tree (View->Solution Explorer), add the file that will contain the program text: Source Files->Add->New Item.

3) Select the Code C++ file type, but specify a name with the .asm extension.

5) Set the compiler parameters. Right-click the project file, choose the Custom Build Rules menu and select Microsoft Macro Assembler in the window that appears.

Right-click the hello.asm file in the project tree, open the Properties menu and check that General->Tool is set to Microsoft Macro Assembler.

6) Compile the file by selecting Build->Build hello.prj.

7) Run the program by pressing F5 or selecting the Debug->Start Debugging menu.

Programming in Windows OS

Programming in Windows is based on the use of API functions (Application Programming Interface). There are about 2000 of them. A Windows program consists largely of calls to these functions. As a rule, all interaction with external devices and operating system resources takes place through them. The Windows operating system uses a flat memory model: the address of any memory cell is determined by the contents of a single 32-bit register. There are three types of program structure for Windows: dialog (the main window is a dialog), console or windowless, and classic (windowed, frame-based).

Calling Windows API functions

In the help file, any API function is presented in the form

type function_name(FA1, FA2, FA3)

where type is the return value type and FAx are the formal arguments in the order they appear. For example:

int MessageBox(HWND hWnd, LPCTSTR lpText, LPCTSTR lpCaption, UINT uType);

This function displays a window with a message and one or more exit buttons. Meaning of the parameters: hWnd is the handle (descriptor) of the window in which the message window will appear; lpText is the text that will appear in the window; lpCaption is the text in the window title; uType is the window type, which in particular determines the number of exit buttons.

Almost all API function parameters are actually 32-bit integers: HWND is a 32-bit integer, LPCTSTR is a 32-bit pointer to a string, and UINT is a 32-bit integer. The suffix "A" appended to a function name (MessageBoxA) selects the version of the function that takes ANSI strings; the suffix "W" selects the Unicode version.

When using MASM, you must append @N to the function name, where N is the number of bytes that the passed arguments occupy on the stack. For Win32 API functions, this number is the number of arguments n multiplied by 4 (the size of each argument in bytes): N = 4*n. A function is called with the assembler CALL instruction, and all of its arguments are passed through the stack (PUSH instruction). Arguments are pushed in reverse order, from right to left, so that in memory they lie left to right from lower to higher addresses. For MessageBox, the uType argument is pushed first. The call to this function looks like this: CALL MessageBoxA@16
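The rule N = 4*n can be checked with a short sketch. It is written in Python rather than assembly purely for illustration; the helper name decorated_name is hypothetical, and the sketch assumes that every argument occupies 4 bytes, as stated above for Win32 API functions.

```python
def decorated_name(function: str, arg_count: int, bytes_per_arg: int = 4) -> str:
    """Return the function name with the @N suffix, where N = bytes_per_arg * arg_count."""
    if arg_count < 0:
        raise ValueError("arg_count must be non-negative")
    return f"{function}@{bytes_per_arg * arg_count}"


# MessageBoxA takes four arguments: hWnd, lpText, lpCaption, uType.
print(decorated_name("MessageBoxA", 4))
```

For a function with four arguments the suffix is @16, which matches the CALL instruction shown in the text.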

The result of executing any API function is usually an integer returned in the EAX register. The OFFSET directive yields an "offset in a segment" or, in high-level language terms, a "pointer" to the beginning of a string. The EQU directive, like #define in C, defines a constant. The EXTERN directive tells the translator that a function or identifier is external to this module.

Example of a "Hello everyone!" program:

.686P
.MODEL FLAT, STDCALL
.STACK 4096
.DATA
MB_OK EQU 0
STR1 DB "My first program", 0
STR2 DB "Hello everyone!", 0
HW DD ?
EXTERN MessageBoxA@16:NEAR
.CODE
START:
    PUSH MB_OK            ; uType: a single OK button
    PUSH OFFSET STR1      ; lpCaption: window title
    PUSH OFFSET STR2      ; lpText: message text
    PUSH HW               ; hWnd: window handle
    CALL MessageBoxA@16
    RET
END START

The INVOKE directive

The MASM translator also lets you simplify function calls with a macro facility, the INVOKE directive:

INVOKE function, parameter1, parameter2, ...

There is no need to add @16 to the function name, and the parameters are written in exactly the order in which they are given in the function description. The translator's macro facilities place the parameters on the stack. To use the INVOKE directive, the function prototype must be declared with the PROTO directive, in the form:

MessageBoxA PROTO :DWORD, :DWORD, :DWORD, :DWORD

If a program uses many Win32 API functions, it is advisable to use the include directive:

include C:\masm32\include\user32.inc
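Conceptually, an INVOKE line stands for the same PUSH and CALL sequence that was written by hand in the "Hello everyone!" program. The Python sketch below only illustrates that correspondence; expand_invoke is a hypothetical helper, not part of MASM, and the code MASM actually generates may differ in detail. It assumes that every parameter occupies 4 bytes on the stack.

```python
def expand_invoke(function: str, *params: str) -> list[str]:
    """Expand an INVOKE-style call into explicit PUSH and CALL lines.

    Parameters are given in declaration order (left to right) and are
    pushed in reverse, so the last parameter is pushed first.
    """
    lines = [f"PUSH {p}" for p in reversed(params)]
    lines.append(f"CALL {function}@{4 * len(params)}")
    return lines


for line in expand_invoke("MessageBoxA", "HW", "OFFSET STR2", "OFFSET STR1", "MB_OK"):
    print(line)
```

The printed lines repeat the body of the hand-written example: MB_OK is pushed first, HW last, and the call uses the name MessageBoxA@16.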

Topic 2.5 Basics of processor programming

As the length of the program increases, it becomes increasingly difficult to remember the codes of various operations. Mnemonics provide some assistance in this regard.

The language of symbolic command coding is called assembly language.

Assembly language is a language in which each statement corresponds to exactly one machine command.

Assembling is the conversion of a program from assembly language, that is, the preparation of a machine-language program by replacing symbolic operation names with machine codes and symbolic addresses with absolute or relative numbers, as well as by including library routines and generating sequences of symbolic instructions from the specific parameters given in macro instructions. The program that performs this conversion, the assembler, is usually located in ROM or loaded into RAM from external media.

Assembly language has several features that distinguish it from high-level languages:

1. There is a one-to-one correspondence between assembly language statements and machine instructions.

2. An assembly language programmer has access to all objects and instructions present on the target machine.

Understanding the basics of programming in machine-oriented languages is useful for:

a better understanding of PC architecture and more competent use of computers;

developing more rational algorithm structures for programs that solve applied problems;

viewing and correcting executable programs with the .exe and .com extensions, compiled from any high-level language, when the source programs have been lost (by loading them into the DEBUG debugger and disassembling them into assembly language);

writing programs for the most critical tasks (a program written in a machine-oriented language is usually more efficient: 30-60 percent shorter and faster than programs obtained by translation from high-level languages);

implementing procedures included in the main program as separate fragments when they cannot be implemented either in the high-level language being used or with OS service procedures.

A program in assembly language can only run on one family of computers, while a program written in a high-level language can potentially run on different machines.

The assembly language alphabet is made up of ASCII characters.

Numbers are integers only. The following forms are distinguished:

binary numbers, ending with the letter B;

decimal numbers, ending with the letter D;

hexadecimal numbers, ending with the letter H.
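The suffix rule can be illustrated with a small converter. This is a sketch in Python, not part of any assembler; parse_number is a hypothetical name, and treating a literal without a suffix as decimal is an assumption of the sketch.

```python
def parse_number(literal: str) -> int:
    """Convert an assembler numeric literal with a B, D, or H suffix to an integer.

    A literal without a suffix is treated as decimal (assumption of this sketch).
    """
    bases = {"B": 2, "D": 10, "H": 16}
    text = literal.strip().upper()
    if not text:
        raise ValueError("empty literal")
    suffix = text[-1]
    if suffix in bases and len(text) > 1:
        # Everything before the suffix is the digit string in that base.
        return int(text[:-1], bases[suffix])
    return int(text, 10)


for literal in ("1010B", "25D", "0FFH"):
    print(literal, "=", parse_number(literal))
```

The sketch does not check the common assembler convention that a hexadecimal literal must begin with a decimal digit (0FFH rather than FFH).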

RAM, registers, data representation

Each family of microprocessors (MPs) has its own programming language: its assembly language.

Assembly language occupies an intermediate position between machine codes and high-level languages. Programming in it is easier than in machine codes, and an assembly language program uses the capabilities of a specific machine (more precisely, of the MP) more efficiently than a program in a high-level language, although a high-level language is simpler for the programmer. Let us look at the basic principles of programming in machine-oriented languages using the assembly language of the KR580VM80 MP as an example. The general programming methodology applies to this language; the specific techniques for writing programs depend on the architecture and command system of the target MP.

Software model of a microprocessor system based on the KR580VM80 MP

The software model of the microprocessor system (MPS) is shown in Figure 1.

Figure 1. Software model of the MPS: MP, ports, memory; flag register bits S, Z, AC, P, C.

From the programmer's point of view, the KR580VM80 MP has the following program-accessible registers.

A – 8-bit accumulator register. It is the main register of the MP. Any operation performed in the ALU requires one of the operands to be placed in the accumulator. The result of an ALU operation is also usually stored in A.

B, C, D, E, H, L – 8-bit general-purpose registers (GPRs). They form the internal memory of the MP and are designed to store the data being processed as well as the results of operations. When 16-bit words are processed, the registers form the pairs BC, DE, and HL, and each pair is named by its first letter: B, D, H. The first register of a pair holds the high-order byte. Registers H and L have a special property: they are used both for storing data and for storing the 16-bit addresses of RAM cells.
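How two 8-bit registers form one 16-bit value can be shown with a few lines of arithmetic. The sketch is in Python for illustration only, and the function names pair_value and split_pair are hypothetical.

```python
def pair_value(high: int, low: int) -> int:
    """Combine two 8-bit registers into the 16-bit value of a register pair."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


def split_pair(value: int) -> tuple[int, int]:
    """Split a 16-bit value into (first register, second register) of a pair."""
    return (value >> 8) & 0xFF, value & 0xFF


# With H = 08H and L = 10H, the pair HL holds the address 0810H.
print(format(pair_value(0x08, 0x10), "04X"))
```

The first register of the pair supplies the high-order byte, so swapping the two registers gives a different address.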

FL – flag register (sign register), an 8-bit register that stores five flags describing the result of arithmetic and logical operations in the MP. The FL format is shown in the figure.

Bit C (CY, carry) – set to 1 if there was a carry out of the high-order bit of the byte during an arithmetic operation.

Bit P (parity) – set to 1 if the number of ones in the bits of the result is even.

Bit AC (auxiliary carry) – stores the carry out of the low-order nibble (tetrad) of the result.

Bit Z (zero) – set to 1 if the result of the operation is 0.

Bit S (sign) – set to 1 if the result is negative and to 0 if the result is non-negative.
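The flag definitions above can be tried out on an 8-bit addition. The following Python sketch is an illustration under the stated definitions, not a model of the real ALU; add_with_flags is a hypothetical name, and only addition is covered.

```python
def add_with_flags(a: int, b: int) -> tuple[int, dict[str, int]]:
    """Add two 8-bit values and return (8-bit result, flags)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "C": 1 if total > 0xFF else 0,                      # carry out of bit 7
        "P": 1 if bin(result).count("1") % 2 == 0 else 0,   # even number of ones
        "AC": 1 if (a & 0x0F) + (b & 0x0F) > 0x0F else 0,   # carry out of bit 3
        "Z": 1 if result == 0 else 0,                       # result is zero
        "S": (result >> 7) & 1,                             # copy of the high-order bit
    }
    return result, flags


print(add_with_flags(0x7F, 0x01))
```

Adding 7FH and 01H gives 80H: there is no carry out of the byte, but there is a carry out of the low-order nibble, and the high-order bit of the result is 1, so the result reads as negative.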

SP – stack pointer, a 16-bit register that stores the address of the memory cell into which the last byte pushed onto the stack was written.

PC – program counter, a 16-bit register that stores the address of the next instruction to be executed. The contents of the program counter are automatically incremented by 1 immediately after each instruction byte is fetched.

The initial memory area, addresses 0000H – 07FFH, contains the control program and demonstration programs. This is the ROM area.

0800H – 0AFFH – address area for the programs under study (RAM).

0B00H – 0BB0H – address area for data (RAM).

0BB0H – starting address of the stack (RAM).

A stack is a specially organized area of RAM intended for temporary storage of data or addresses. The last number written to the stack is the first to be popped. The stack pointer stores the address of the last stack cell to which information was written. When a subroutine is called, the return address to the main program is automatically saved on the stack. As a rule, each subroutine begins by saving on the stack the contents of all registers it uses and ends by restoring them from the stack.
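The last-in, first-out behaviour and the role of the stack pointer can be sketched as follows. The model is in Python and is only an illustration: the class name Stack8080 is hypothetical, and the byte order (high byte at SP-1, low byte at SP-2) follows the Intel 8080 convention, with which the KR580VM80 is assumed here to be compatible.

```python
class Stack8080:
    """Minimal stack model: memory is a byte array, SP is a 16-bit pointer."""

    def __init__(self, sp: int = 0x0BB0, size: int = 0x10000) -> None:
        self.memory = bytearray(size)
        self.sp = sp  # starting address of the stack

    def push(self, value16: int) -> None:
        """Store a 16-bit value: high byte at SP-1, low byte at SP-2."""
        self.sp = (self.sp - 1) & 0xFFFF
        self.memory[self.sp] = (value16 >> 8) & 0xFF
        self.sp = (self.sp - 1) & 0xFFFF
        self.memory[self.sp] = value16 & 0xFF

    def pop(self) -> int:
        """Read back the 16-bit value written last and release its two cells."""
        low = self.memory[self.sp]
        self.sp = (self.sp + 1) & 0xFFFF
        high = self.memory[self.sp]
        self.sp = (self.sp + 1) & 0xFFFF
        return (high << 8) | low


stack = Stack8080()
stack.push(0x0812)   # return address saved when a subroutine is called
stack.push(0x1234)   # register pair saved at the start of the subroutine
print(format(stack.pop(), "04X"))  # register pair restored first
print(format(stack.pop(), "04X"))  # return address restored last
```

After both values are popped, SP returns to its starting address, which is why a subroutine must restore exactly what it saved before returning.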

Data format and command structure of assembly language

The memory of the KR580VM80 MP is an array of 8-bit words called bytes. Each byte has its own 16-bit address, which determines its position in the sequence of memory cells. With 16-bit addresses, the MP can address 2^16 = 65536 bytes of memory, which may reside in both ROM and RAM.

Data Format

Data is stored in memory as 8-bit words:

D7 D6 D5 D4 D3 D2 D1 D0

The least significant bit is bit 0, the most significant bit is bit 7.
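The bit numbering D7 ... D0 corresponds to simple shift-and-mask operations. The following Python sketch is only an illustration; the function names bit and bits_d7_to_d0 are hypothetical.

```python
def bit(value: int, n: int) -> int:
    """Return bit Dn of a byte (0 = least significant, 7 = most significant)."""
    if not 0 <= n <= 7:
        raise ValueError("bit number must be in 0..7")
    return (value >> n) & 1


def bits_d7_to_d0(value: int) -> str:
    """Format a byte as eight binary digits in the order D7 ... D0."""
    return format(value & 0xFF, "08b")


print(bits_d7_to_d0(0x81), bit(0x81, 7), bit(0x81, 0))
```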

A command is characterized by its format, i.e., the number of bits allocated for it, which are divided byte-by-byte into certain functional fields.

Command Format

KR580VM80 MP commands have a one-, two-, or three-byte format. Multibyte commands must be placed in adjacent memory cells. The command format depends on the specifics of the operation being performed.

The first byte of the command contains the operation code, which is written in mnemonic form in assembly language.

It determines the command format, the actions the MP must perform on the data during execution, and the addressing method; it may also contain information about the location of the data.

The second and third bytes may contain the data on which the operation is performed or addresses indicating the location of the data. The data on which actions are performed are called operands.
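The three command lengths can be illustrated by building the byte sequences for one command of each kind. The sketch is in Python; the opcodes and mnemonics are taken from the Intel 8080 instruction set, with which the KR580VM80 is assumed here to be compatible, and the function names are hypothetical.

```python
# Operation codes from the Intel 8080 instruction set (assumed to apply to the KR580VM80).
MOV_A_B = 0x78  # one byte:    MOV A, B
MVI_A = 0x3E    # two bytes:   MVI A, DATA
LDA = 0x3A      # three bytes: LDA ADDR


def encode_mov_a_b() -> bytes:
    """One-byte command: the operation code alone names both registers."""
    return bytes([MOV_A_B])


def encode_mvi_a(data: int) -> bytes:
    """Two-byte command: operation code followed by 8-bit data."""
    return bytes([MVI_A, data & 0xFF])


def encode_lda(addr: int) -> bytes:
    """Three-byte command: operation code, then the low and high bytes of the address."""
    return bytes([LDA, addr & 0xFF, (addr >> 8) & 0xFF])


for code in (encode_mov_a_b(), encode_mvi_a(0x25), encode_lda(0x0B00)):
    print(len(code), code.hex(" ").upper())
```

In the three-byte command the low byte of the address comes second and the high byte third, so the address 0B00H is stored as 00 0B.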

The single-byte command format is shown in Figure 2.

Figure 4

In assembly language commands, the operation code is written as an abbreviated English word, a mnemonic. Mnemonics (from the Greek word for the art of memorization) make it easier to remember commands by their function.

Before execution, the source program is translated by a translation program called an assembler into the language of code combinations, that is, machine language. In this form it is placed in memory and then used when the commands are executed.


Addressing methods

All operand codes (input and output) must be located somewhere. They can be located in the internal registers of the MP (the most convenient and fastest option), in system memory (the most common option), or in I/O devices (the rarest case). The location of the operands is determined by the instruction code. There are different methods by which the instruction code can specify where to take the input operand from and where to place the output operand. These methods are called addressing methods.

For the KR580VM80 MP, the following addressing methods exist:

Immediate;

Direct;

Register;

Indirect;

Stack.

Immediate addressing assumes that the (input) operand is located in memory immediately after the instruction code. The operand is usually a constant that needs to be sent somewhere, added to something, and so on. The data is contained in the second byte, or in the second and third bytes, of the command, with the low byte of the data in the second byte of the command and the high byte in the third.

Direct (also called absolute) addressing assumes that the operand (input or output) is located in memory at an address whose code is located in the program immediately after the instruction code. It is used in three-byte commands.

Register addressing assumes that the operand (input or output) is in an internal register of the MP. It is used in single-byte commands.

Indirect (implicit) addressing assumes that an internal register of the MP contains not the operand itself but its address in memory.

Stack addressing assumes that the command does not contain an address. Memory cells are addressed through the contents of the 16-bit SP register (stack pointer).
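The difference between these methods is where the operand comes from, which a small model makes concrete. The Python sketch below is an illustration only: the class Machine and its method names are hypothetical, and the commands mentioned in the comments (MVI, LDA, MOV, PUSH) are Intel 8080 mnemonics, assumed here to apply to the KR580VM80.

```python
class Machine:
    """Tiny model of the registers and memory needed to show operand sources."""

    def __init__(self) -> None:
        self.memory = bytearray(0x10000)            # 64 KB address space
        self.reg = {name: 0 for name in "ABCDEHL"}  # 8-bit registers
        self.sp = 0x0BB0                            # starting address of the stack

    def hl(self) -> int:
        return (self.reg["H"] << 8) | self.reg["L"]

    def load_constant(self, r: str, data: int) -> None:
        """The operand is part of the command itself (like MVI R, DATA)."""
        self.reg[r] = data & 0xFF

    def load_from_address(self, addr: int) -> None:
        """The command carries the 16-bit address of the operand (like LDA ADDR)."""
        self.reg["A"] = self.memory[addr & 0xFFFF]

    def move_register(self, dst: str, src: str) -> None:
        """Both operands are in internal registers (like MOV R1, R2)."""
        self.reg[dst] = self.reg[src]

    def load_via_hl(self, r: str) -> None:
        """The register pair HL holds the address of the operand (like MOV R, M)."""
        self.reg[r] = self.memory[self.hl()]

    def push_pair(self, high: str, low: str) -> None:
        """No address in the command: cells are addressed through SP (like PUSH RP)."""
        self.sp = (self.sp - 1) & 0xFFFF
        self.memory[self.sp] = self.reg[high]
        self.sp = (self.sp - 1) & 0xFFFF
        self.memory[self.sp] = self.reg[low]


m = Machine()
m.memory[0x0B00] = 0x42
m.load_constant("H", 0x0B)
m.load_constant("L", 0x00)
m.load_via_hl("B")           # B <- memory[HL]
m.load_from_address(0x0B00)  # A <- memory[0B00H]
m.move_register("C", "A")    # C <- A
m.push_pair("B", "C")        # memory[SP-1] <- B, memory[SP-2] <- C
print(hex(m.reg["A"]), hex(m.reg["B"]), hex(m.reg["C"]), hex(m.sp))
```

All three registers end up with the same byte, but each received it by a different route: through an address inside the command, through an address held in HL, and from another register.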

Command system

The command system of the MP is the complete list of elementary actions that the MP is capable of performing. Controlled by these commands, the MP performs simple actions such as elementary arithmetic and logical operations, data transfers, comparisons of two values, and so on. The KR580VM80 MP has 78 commands (244 when modifications are counted).

The following groups of commands are distinguished:

Data transfer;

Arithmetic;

Logical;

Jump commands;

Input/output, control, and stack commands.


Symbols and abbreviations used when describing commands and composing programs

Symbol       Meaning
ADDR         16-bit address
DATA         8-bit data
DATA 16      16-bit data
PORT         8-bit I/O device address
BYTE 2       Second byte of the command
BYTE 3       Third byte of the command
R, R1, R2    One of the registers: A, B, C, D, E, H, L
RP           One of the register pairs: B specifies the pair BC; D specifies the pair DE; H specifies the pair HL
RH           First register of the pair
RL           Second register of the pair
∧            Logical multiplication (AND)
∨            Logical addition (OR)
⊕            Addition modulo two (exclusive OR)
M            A memory cell whose address is given by the contents of the register pair HL, i.e. M = (HL)


