What is the power of an alphabet? Information volume of a text and units of measurement of information. Methods for measuring information in electronic form

PROBLEM SOLVING

When information is stored and transmitted using technical devices, it should be treated as a sequence of symbols - signs (letters, digits, color codes of image points, etc.).

The set of symbols of a sign system (its alphabet) can be regarded as the set of possible states (events).
Then, if we assume that every symbol is equally likely to appear in a message, the number of possible events N can be calculated as N = 2^i, where i is the information weight of one symbol.
The amount of information I in a message can be calculated by multiplying the number of characters K by the information weight i of one character: I = K · i.
So we have the formulas needed to determine the amount of information with the alphabetical approach: N = 2^i and I = K · i.

The following combinations of known (Given) and sought (Find) quantities are possible:

Type | Given | Find | Formula
  1  | i     | N    | N = 2^i
  2  | N     | i    | N = 2^i
  3  | i, K  | I    | I = K · i
  4  | i, I  | K    | I = K · i
  5  | I, K  | i    | I = K · i
  6  | N, K  | I    | both formulas
  7  | N, I  | K    | both formulas
  8  | I, K  | N    | both formulas

If we add to these problems tasks on converting between quantities written in different units of measurement, representing the quantities as powers of two, we get 9 types of problems.
Let's consider tasks of all types. Let's agree that when converting from one unit of information to another we will build a chain of values; this reduces the probability of a computational error.
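The three relations used throughout these problems can be sketched as small Python helpers (the function names are ours, not from the text):

```python
import math

def alphabet_power(i):
    """N = 2^i: number of symbols encodable with i bits."""
    return 2 ** i

def symbol_weight(n):
    """i = ceil(log2 N): minimum bits per symbol for an N-symbol alphabet."""
    return math.ceil(math.log2(n))

def message_volume(k, i):
    """I = K * i: total information in a K-symbol message, in bits."""
    return k * i

print(alphabet_power(5))       # a 5-bit symbol gives a 32-character alphabet
print(symbol_weight(32))       # conversely, 32 characters need 5 bits
print(message_volume(140, 5))  # 140 symbols of 5 bits -> 700 bits
```

The `math.ceil` in `symbol_weight` handles alphabets whose size is not an exact power of two, since a whole number of binary digits is always required.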

Problem 1. A message has been received with an information volume of 32 bits. What is this volume in bytes?

Solution: There are 8 bits in one byte. 32:8=4
Answer: 4 bytes.

Problem 2. The volume of an information message is 12582912 bits. Express it in kilobytes and megabytes.

Solution: Since 1 KB = 1024 bytes = 1024 · 8 bits, 12582912 : (1024 · 8) = 1536 KB, and
since 1 MB = 1024 KB, 1536 : 1024 = 1.5 MB.
Answer: 1536 KB and 1.5 MB.
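The conversion chain from Problem 2 can be checked step by step in Python:

```python
bits = 12582912
bytes_ = bits // 8    # 8 bits per byte
kb = bytes_ // 1024   # 1 KB = 1024 bytes
mb = kb / 1024        # 1 MB = 1024 KB
print(kb, mb)         # 1536 1.5
```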

Task 3. A computer has 512 MB of RAM. Which of the following numbers of bits does this value exceed?

1) 10,000,000,000 bits  2) 8,000,000,000 bits  3) 6,000,000,000 bits  4) 4,000,000,000 bits
Solution: 512 · 1024 · 1024 · 8 bits = 4,294,967,296 bits, which is greater than 4,000,000,000 bits.
Answer: 4.

Task 4. Determine the number of bits in two megabytes, using only powers of 2.
Solution: Since 1 byte = 8 bits = 2^3 bits, and 1 MB = 2^10 KB = 2^20 bytes = 2^23 bits, we get 2 MB = 2^24 bits.
Answer: 2^24 bits.
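The powers-of-two bookkeeping in Task 4 is easy to verify, since 2 · 2^20 · 2^3 = 2^24:

```python
two_mb_bits = 2 * 2**20 * 8   # 2 MB * 2^20 bytes/MB * 8 bits/byte
print(two_mb_bits == 2**24)   # True
```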

Task 5. How many megabytes of information does a 2^23-bit message contain?
Solution: Since 1 byte = 8 bits = 2^3 bits,
2^23 bits = 2^23 : 2^3 bytes = 2^20 bytes = 2^10 KB = 1 MB.
Answer: 1 MB

Task 6. One character of an alphabet “weighs” 4 bits. How many characters are in this alphabet?
Solution:
Given: i = 4 bits. Find: N.

N = 2^i = 2^4 = 16.

Answer: 16

Task 7. Each character of an alphabet is written using 8 digits of binary code. How many characters are in this alphabet?
Solution:
Given: i = 8 bits. Find: N.

N = 2^i = 2^8 = 256.

Answer: 256

Task 8. The Russian alphabet is sometimes estimated at 32 letters. What is the information weight of one letter of such an abbreviated Russian alphabet?
Solution:
Given: N = 32. Find: i.

2^i = 32 = 2^5, so i = 5.

Answer: 5

Task 9. An alphabet consists of 100 characters. How much information does one character of this alphabet carry?
Solution:
Given: N = 100. Find: i.

Since 2^6 = 64 < 100 ≤ 128 = 2^7, a whole number of binary digits requires i = 7.

Answer: 7 bits

Problem 10. The Chichevok tribe has 24 letters and 8 numbers in its alphabet. There are no punctuation marks or arithmetic signs. What is the minimum number of binary digits they need to encode all the characters? Please note that words must be separated from each other!
Solution:
Given: 24 letters + 8 digits + 1 word separator (space) = 33 characters. Find: i.

Since 2^5 = 32 < 33 ≤ 64 = 2^6, i = 6.

Answer: 6

Problem 11. A book, typed using a computer, contains 150 pages. Each page has 40 lines, and each line has 60 characters. How much information is in the book? Give your answer in kilobytes and megabytes.
Solution:
Given: K = 150 · 40 · 60 = 360000 characters, i = 8 bits = 1 byte. Find: I.

I = K · i = 360000 bytes = 360000 : 1024 ≈ 351.6 KB ≈ 0.34 MB.

Answer: ≈ 351.6 KB, or about 0.34 MB

Problem 12. The information volume of a book text, typed on a computer using Unicode encoding, is 128 kilobytes. Determine the number of characters in the text of the book.
Solution:
Given: I = 128 KB, i = 16 bits = 2 bytes (Unicode). Find: K.

K = I : i = 128 · 1024 : 2 = 65536.

Answer: 65536

Problem 13. A 1.5 KB information message contains 3072 characters. Determine the information weight of one character of the alphabet used.
Solution:
Given: I = 1.5 KB = 1536 bytes = 12288 bits, K = 3072. Find: i.

i = I : K = 12288 : 3072 = 4 bits.

Answer: 4

Problem 14. A message written in letters from a 64-character alphabet contains 20 characters. How much information does it carry?
Solution:
Given: N = 64, K = 20. Find: I.

N = 64 = 2^6, so i = 6 bits; I = K · i = 20 · 6 = 120 bits.

Answer: 120 bits

Problem 15. How many characters does a message written using a 16-character alphabet contain if its size is 1/16 of a megabyte?
Solution:
Given: N = 16, I = 1/16 MB. Find: K.

N = 16 = 2^4, so i = 4 bits; I = 2^20 : 16 = 65536 bytes = 524288 bits; K = I : i = 524288 : 4 = 131072.

Answer: 131072
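The arithmetic in Problem 15 can be confirmed with a short Python check:

```python
import math

size_bits = (2**20 // 16) * 8   # 1/16 MB expressed in bits
i = int(math.log2(16))          # 4 bits per character of a 16-symbol alphabet
print(size_bits // i)           # 131072 characters
```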

Problem 16. The size of a message containing 2048 characters is 1/512 of a megabyte. What is the size of the alphabet in which the message is written?
Solution:
Given: I = 1/512 MB, K = 2048. Find: N.

I = 2^20 : 512 = 2048 bytes = 16384 bits; i = I : K = 16384 : 2048 = 8 bits; N = 2^i = 2^8 = 256.

Answer: 256
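Problem 16 chains both formulas together, first finding i and then N, which in Python looks like:

```python
size_bits = (2**20 // 512) * 8   # 1/512 MB in bits = 16384
i = size_bits // 2048            # bits per character: 8
print(2 ** i)                    # power of the alphabet: 256
```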

Tasks for independent solution:

  1. Each character of the alphabet is written using 4 digits of binary code. How many characters are in this alphabet?
  2. The alphabet for writing messages consists of 32 characters; what is the information weight of one character? Don't forget to indicate the unit of measurement.
  3. The information volume of text typed on a computer using Unicode encoding (each character is encoded by 16 bits) is 4 KB. Determine the number of characters in the text.
  4. The volume of the information message is 8192 bits. Express it in kilobytes.
  5. How many bits of information does a 4 MB message contain? Give the answer in powers of 2.
  6. A message written in letters from the 256-character alphabet contains 256 characters. How much information does it carry in kilobytes?
  7. How many different sound signals are there that consist of sequences of short and long calls, if the length of each signal is 6 calls?
  8. A meteorological station monitors air humidity. The result of one measurement is an integer from 20 to 100%, written using the smallest possible number of bits. The station made 80 measurements. Determine the information volume of the observation results.
  9. The data transfer rate via an ADSL connection is 512,000 bps. A file of 1500 KB in size is transferred through this connection. Determine the file transfer time in seconds.
  10. Determine the speed of the modem if it can transmit a raster image of 640x480 pixels in 256 s. There are 3 bytes for each pixel. What if there are 16 million colors in the palette?
The topic of determining the amount of information based on the alphabetical approach is used in tasks A1, A2, A3, A13, B5 of the Unified State Examination test materials.

There are several ways to measure the amount of information. One of them is called alphabetical.

The alphabetical approach allows you to measure the amount of information in a text (symbolic message) composed of characters of a certain alphabet.

An alphabet is a set of letters, signs, digits, brackets, etc.
The number of characters in the alphabet is called its power.

With the alphabetic approach, it is believed that each character of the text has a specific information weight. The information weight of a symbol depends on the power of the alphabet.

What is the minimum power of an alphabet that can be used to record (encode) information?

The smallest possible alphabet contains two characters, for example 0 and 1. This is the binary alphabet, and the information weight of its symbol is taken as the minimum unit of information - 1 bit.

A combination of 2, 3, etc. bits will be called a binary code.

How many characters can be encoded with two bits?

Symbol number          | 1  | 2  | 3  | 4
Two-digit binary code  | 00 | 01 | 10 | 11

So 2 bits can encode 4 characters.

How many characters can be encoded with three bits?

Symbol number            | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8
Three-digit binary code  | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111

It follows that in an alphabet with a power of 8 characters the information weight of each character is 3 bits.

We can conclude that in an alphabet with a power of 16 characters the information weight of each character will be 4 bits.

Let us denote the power of the alphabet by N and the information weight of a symbol by b.

The relationship between the power of the alphabet N and the information weight of the symbol b:

N | 2     | 4      | 8      | 16
b | 1 bit | 2 bits | 3 bits | 4 bits

Measuring information.

Alphabetical approach to information measurement.

The same message can carry a lot of information for one person and not carry it at all for another person. With this approach, it is difficult to determine the amount of information unambiguously.

The alphabetic approach allows us to measure the information volume of a message presented in some language (natural or formal), regardless of its content.

To express quantitatively any quantity, first of all, a unit of measurement is necessary. Measurement is carried out by comparing the measured value with a unit of measurement. The number of times a unit of measurement “fits” into the measured value is the result of the measurement.

In the alphabetical approach, it is believed that each character of a message has a specific information weight - it carries a fixed amount of information. All characters of the same alphabet have the same weight, which depends on the power of the alphabet. The information weight of a symbol of the binary alphabet is taken as the minimum unit of information and is called 1 bit.

Please note that the name of the unit of information “bit” comes from the English phrase binary digit.

1 bit is taken as the minimum unit of information. It is believed that this is the information weight of a symbol of the binary alphabet.

1.6.2. Information weight of a character of an arbitrary alphabet

Earlier we found out that the alphabet of any natural or formal language can be replaced by a binary alphabet. In this case, the power of the original alphabet N is related to the bit depth i of the binary code required to encode all its characters by the relation N = 2^i.

The information weight i of an alphabet symbol and the power N of the alphabet are related to each other by the relation: N = 2^i.

Task 1. The Pulti alphabet contains 8 characters. What is the information weight of a symbol of this alphabet?

Solution. Let's make a brief statement of the conditions of the problem.

Given: N = 8. Find: i.

The relationship between the quantities i and N is known: N = 2^i.

Taking into account the initial data: 8 = 2^i. Hence: i = 3.

Answer: 3 bits.

1.6.3. Information volume of the message

The information volume of a message (the amount of information in a message), represented by symbols of a natural or formal language, is made up of the information weights of its constituent symbols.

The information volume I of a message equals the product of the number of characters K in the message and the information weight i of an alphabet character: I = K · i.

Problem 2. The message, written in the 32-character alphabet, contains 140 characters. How much information does it carry?

Task 3. An information message with a volume of 720 bits consists of 180 characters. What is the power of the alphabet in which this message is written?

1.6.4. Units of information

Nowadays, text preparation is mainly carried out using computers. We can talk about a “computer alphabet”, which includes the following characters: lowercase and uppercase Russian and Latin letters, digits, punctuation marks, arithmetic signs, brackets, etc. This alphabet contains 256 characters. Since 256 = 2^8, the information weight of each character in this alphabet is 8 bits. A value equal to eight bits is called a byte. 1 byte is the information weight of a symbol of an alphabet with a power of 256.

1 byte = 8 bits

Bit and byte are “small” units of measurement. In practice, larger units are used to measure information volumes:

1 kilobyte = 1 KB = 1024 bytes = 2^10 bytes

1 megabyte = 1 MB = 1024 KB = 2^10 KB = 2^20 bytes

1 gigabyte = 1 GB = 1024 MB = 2^10 MB = 2^20 KB = 2^30 bytes

1 terabyte = 1 TB = 1024 GB = 2^10 GB = 2^20 MB = 2^30 KB = 2^40 bytes
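These unit relationships can be captured in a small lookup table (a sketch; the dictionary name is ours), which makes any conversion a single multiplication and division:

```python
# Bytes per unit, using the binary (1024-based) definitions above.
UNITS = {"byte": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40}

def convert(value, src, dst):
    """Convert between information units, e.g. 1536 KB -> MB."""
    return value * UNITS[src] / UNITS[dst]

print(convert(1536, "KB", "MB"))  # 1.5
```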

Task 4. A 4 KB information message consists of 4096 characters. What is the information weight of the symbol of the alphabet used? How many characters does the alphabet with which this message is written contain?

Problem 5. 128 athletes participate in cyclocross. A special device registers each participant's passing of the intermediate finish, recording its number in a chain of zeros and ones of minimum length, the same for each athlete. What will be the information volume of the message recorded by the device after 80 cyclists have completed the intermediate finish?

Solution. The 128 participants' numbers are encoded using the binary alphabet. The required bit depth of the binary code (chain length) is 7, since 128 = 2^7. In other words, the message recorded by the device when one cyclist passes the intermediate finish carries 7 bits of information. When 80 athletes complete the intermediate finish, the device will record 80 · 7 = 560 bits, or 70 bytes of information.
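The cyclocross computation follows the same pattern as the earlier problems and can be checked directly:

```python
import math

i = math.ceil(math.log2(128))  # bits needed per participant number: 7
bits = 80 * i                  # 80 recorded intermediate finishes
print(bits, bits // 8)         # 560 bits, i.e. 70 bytes
```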


1.4.3. Information volume of the message

The information volume of a message (the amount of information in a message), represented by symbols of a natural or formal language, consists of the information weights of its constituent symbols.

Problem 2. The message, written in the 32-character alphabet, contains 140 characters. How much information does it carry?

Solution. N = 32 = 2^5, so i = 5 bits; I = K · i = 140 · 5 = 700 bits.

Answer: 700 bits.

Problem 3. An information message with a volume of 720 bits consists of 180 characters. What is the power of the alphabet in which this message is written?

Solution. i = I : K = 720 : 180 = 4 bits; N = 2^i = 2^4 = 16.

Answer: 16 characters.


Problem 4. A 4 KB information message consists of 4096 characters. What is the information weight of a character of this message? How many characters does the alphabet in which this message is written contain?
Solution. I = 4 KB = 4096 bytes = 32768 bits; i = 32768 : 4096 = 8 bits; N = 2^8 = 256.

Answer: 8 bits; 256 characters.

The most important

With the alphabetical approach, it is believed that each character of a certain message has a certain information weight - it carries a fixed amount of information.

1 bit is the minimum unit of information.

The information weight i of an alphabet symbol and the power N of the alphabet are related to each other by the relation: N = 2^i. The information volume I of a message equals the product of the number of characters K in the message and the information weight i of a character: I = K · i.

1 byte = 8 bits.

Byte, kilobyte, megabyte, gigabyte and terabyte are units of measurement of information. Each subsequent unit is 1024 (2^10) times larger than the previous one.

In computer science, an alphabet is a system of signs that can be used to convey an information message. To understand the essence of this definition, here are some additional theoretical facts:

  1. Any message is composed of characters of an alphabet. For example, this article is a message; it consists of characters of the Russian alphabet.
  2. A symbol is the minimal meaningful unit of the alphabet. Indivisible units are also called atoms. The characters of the Russian alphabet are “a”, “b”, “c”, and so on.
  3. In theory, an alphabet does not need to be encoded in any way. For example, in a printed book the characters of the alphabet stand for themselves and therefore have no encoding.

But in practice we have the following: the computer does not understand what letters are. Therefore, to transmit an information message, it must first be encoded in a language that the computer can understand. In order to move further, it is necessary to introduce additional terms.

What is the power of the alphabet

By the power of an alphabet we mean the total number of characters in it, so to find out how powerful an alphabet is, you just need to count its characters. For the Russian alphabet, the power is 33, or 32 characters if “ё” is not used.

Let's assume that all the characters in our alphabet occur with equal probability. This assumption can be understood as follows: let's say we have a bag of labeled cubes. The number of cubes in it is infinite, and each is signed with only one symbol. Then, with a uniform distribution, no matter how many cubes we take out of the bag, the number of cubes with different symbols will be the same, or will tend to this as the number of cubes we take out of the bag increases.

Estimation of the weight of information messages

Almost a hundred years ago, American engineer Ralph Hartley developed a formula that can be used to estimate the amount of information in a message. His formula works for equally probable events and looks like this:

i = log2 M

where i is the number of indivisible information atoms (bits) in the message and M is the power of the alphabet. Using mathematical transformations, we can determine that the power of the alphabet is calculated as M = 2^i.

This formula defines, in general form, the relationship between the number of equally probable events M and the amount of information i.

Calculating power

Most likely, you already know from your school computer science course that modern computing systems built on von Neumann architecture use a binary information encoding system. This is how both programs and data are encoded.

In order to represent text in a computing system, a uniform eight-digit code is used. The code is called uniform because it is built from a fixed set of elements, 0 and 1, and values are specified by a particular order of these elements. Using an eight-bit code, we can encode an alphabet of 256 characters, because according to Hartley's formula M = 2^8 = 256.

This situation with binary character encoding developed historically, but in theory other alphabets could be used to represent data. For example, in a four-character alphabet each character would weigh not one but two bits, in an eight-character alphabet 3 bits, and so on. This is calculated using the binary logarithm given above (i = log2 M).

Since in an alphabet with a power of 256 characters eight binary digits are allocated to one character, an additional unit of information was introduced: the byte. One byte holds one ASCII character and contains eight bits.

How information is measured

Eight-bit encoding of text messages, based on the ASCII character table, accommodates the basic set of Latin and Cyrillic characters in upper and lower case, digits, punctuation symbols and other basic characters.

In order to measure larger amounts of data, special prefixes are used with the words byte and bit:

kilo-: 1 KB = 2^10 bytes = 1024 bytes
mega-: 1 MB = 2^20 bytes = 1024 KB
giga-: 1 GB = 2^30 bytes = 1024 MB
tera-: 1 TB = 2^40 bytes = 1024 GB

Many people who have studied physics will argue that it would be rational to use the classical prefixes (kilo-, mega- and so on) for units of information, but this is not entirely correct: those prefixes denote multiplication by powers of ten, whereas in computer science the binary measurement system is used everywhere.

Correct names of data units

In order to eliminate these inaccuracies and inconveniences, in March 1999 the International Electrotechnical Commission approved new prefixes for the units used to measure amounts of information in computing: “kibi”, “mebi”, “gibi”, “tebi”, “pebi”, “exbi”. These units have not yet fully taken root, so widespread adoption of the standard will most likely take time. The transition from the classic units to the newly approved ones looks like this:

1 KiB (kibibyte) = 2^10 bytes, whereas 1 kB (kilobyte) = 10^3 bytes
1 MiB (mebibyte) = 2^20 bytes, whereas 1 MB (megabyte) = 10^6 bytes
1 GiB (gibibyte) = 2^30 bytes, whereas 1 GB (gigabyte) = 10^9 bytes

Let's assume that we have a text that contains K characters. Then, using the alphabetical approach, we can calculate the amount of information V that it contains: it equals the product of the number of characters and the information weight of one character.

Using Hartley's formula, we know how to calculate the amount of information through the binary logarithm. Assuming that the number of alphabet characters is N and the number of characters in the information message record is K, we obtain the following formula for calculating the information volume of the message:

V = K ⋅ log 2 N
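This formula can be sketched in Python (the function name is ours); for alphabets whose power is not an exact power of two, the per-character weight is rounded up to a whole number of bits:

```python
import math

def message_volume_bits(k, n):
    """V = K * log2(N), rounded up per character for non-power-of-two N."""
    return k * math.ceil(math.log2(n))

print(message_volume_bits(140, 32))  # 140 characters of a 32-symbol alphabet -> 700 bits
```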

The alphabetical approach means that the information volume depends only on the power of the alphabet and the size of the message (that is, the number of characters in it), and is in no way related to its semantic content for a person.

Power calculation examples

In computer science lessons, they often give problems to find the power of the alphabet, the length of a message, or the volume of information. Here is one such task:

"The text file occupies 11 KB of disk space and contains 11264 characters. Determine the alphabet power of this text file."

Solution: 11 KB = 11 · 1024 = 11264 bytes, so each character weighs 11264 : 11264 = 1 byte = 8 bits; hence the power of the alphabet is N = 2^8 = 256 characters.
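The same reasoning, written out as a Python check:

```python
size_bytes = 11 * 1024       # 11 KB of disk space
chars = 11264
i = size_bytes * 8 // chars  # bits per character: 8
print(2 ** i)                # power of the alphabet: 256
```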

Thus, in an alphabet with a power of 256 characters each character carries only 8 bits of information, which in computer science is called one byte. A byte describes 1 character of the ASCII table, which, if you think about it, is not a lot at all.

Is one byte a lot or a little?

Modern data warehouses like Google and Facebook data centers contain no less than tens of petabytes of information. The exact amount of data, however, will be difficult to calculate even for them, because then it will be necessary to stop all processes on the servers and deny users access to recording and editing their personal information.

But to imagine such incredible amounts of data, you need to clearly understand that everything is made up of small details. It is necessary to understand what the power of the alphabet is (256) and how many bits 1 byte of information contains (as you remember, 8).



