The processor as a bottleneck: the evolution of the PC bottleneck problem.

FX vs Core i7 | Looking for bottlenecks with the Eyefinity configuration

We've seen processor performance double every three to four years. Yet the most demanding game engines we tested are as old as Core 2 Duo processors. Naturally, CPU bottlenecks should be a thing of the past, right? As it turns out, GPU speed grows even faster than CPU performance. Thus, the debate about buying a faster CPU or increasing graphics power continues.

But there always comes a time when arguing is pointless. For us, it came when games began to run smoothly on the largest monitor with a native resolution of 2560x1600. And if a faster component can provide an average of 200 rather than 120 frames per second, the difference will still not be noticeable.

Since fast graphics adapters had run out of higher resolutions to conquer, AMD introduced Eyefinity technology and Nvidia introduced Surround. Both technologies let you play across more than one monitor, and running at 5760x1080 has become an objective reality for high-end GPUs. Essentially, three 1920x1080 displays are cheaper and more impressive than a single 2560x1600 display. Hence the reason to spend extra money on more powerful graphics solutions.

But is a powerful processor really necessary to play without stuttering at 5760x1080? The question turned out to be an interesting one.

AMD recently introduced a new architecture, and we bought a boxed FX-8350. In the article "AMD FX-8350 Review and Test: Will Piledriver Fix Bulldozer's Shortcomings?" we found a lot to like about the new processor.

From an economic point of view, in this comparison Intel will have to prove that it is not only faster than the AMD chip in games, but also justifies the high price difference.


Both motherboards belong to the Asus Sabertooth family, but the company is asking a higher price for the model with the LGA 1155 socket, which further complicates Intel's budget situation. We specifically selected these platforms to make performance comparisons as fair as possible, without taking cost into account.

FX vs Core i7 | Configuration and tests

While we were waiting for the FX-8350 to arrive in the test lab, we ran tests on the boxed sample. Considering that the AMD processor reaches 4.4 GHz without any problems, we started testing the Intel chip at the same frequency. It later turned out that we had underestimated our samples, as both CPUs reached 4.5 GHz at the selected voltage level.

We did not want to delay publication due to repeated testing at higher frequencies, so we decided to leave the test results at 4.4 GHz.

Test configuration
Intel CPU: Intel Core i7-3770K (Ivy Bridge): 3.5 GHz, 8 MB shared L3 cache, LGA 1155, overclocked to 4.4 GHz at 1.25 V
Intel motherboard: Asus Sabertooth Z77, BIOS 1504 (08/03/2012)
Intel CPU cooler: Thermalright MUX-120 with Zalman ZM-STG1 paste
AMD CPU: AMD FX-8350 (Vishera): 4.0 GHz, 8 MB shared L3 cache, Socket AM3+, overclocked to 4.4 GHz at 1.35 V
AMD motherboard: Asus Sabertooth 990FX, BIOS 1604 (10/24/2012)
AMD CPU cooler: Sunbeamtech Core-Contact Freezer with Zalman ZM-STG1 paste
Network: built-in Gigabit LAN controller
Memory: G.Skill F3-17600CL9Q-16GBXLD (16 GB), DDR3-2200, CAS 9-11-9-36, 1.65 V
Video cards: 2 x MSI R7970-2PMD3GD5/OC: GPU at 1010 MHz, GDDR5-5500
Storage: Mushkin Chronos Deluxe DX 240 GB, SATA 6 Gb/s SSD
Power supply: Seasonic X760 SS-760KM: ATX12V v2.3, EPS12V, 80 PLUS Gold
Software and drivers
Operating system: Microsoft Windows 8 Professional RTM x64
Graphics driver: AMD Catalyst 12.10

Due to their high efficiency and quick installation, we have been using Thermalright MUX-120 and Sunbeamtech Core Contact Freezer coolers for several years. However, the mounting brackets that come with these models are not interchangeable.


G.Skill F3-17600CL9Q-16GBXLD memory modules are rated for DDR3-2200 at CAS 9 and use Intel XMP profiles for semi-automatic configuration. The Sabertooth 990FX applies the XMP values via Asus DOCP.

The Seasonic X760 power supply provides the high efficiency needed to evaluate platform differences.

StarCraft II does not support AMD Eyefinity technology, so we decided to use older games: Aliens vs. Predator and Metro 2033.

Test configuration (3D games)
Aliens vs. Predator: AvP Tool v.1.03, SSAO/tessellation/shadows on
Test setup 1: High texture quality, no AA, 4x AF
Test setup 2: Very High texture quality, 4x AA, 16x AF
Battlefield 3: Campaign mode, "Going Hunting", 90-second Fraps run
Test setup 1: Medium quality (no AA, 4x AF)
Test setup 2: Ultra quality (4x AA, 16x AF)
F1 2012: Steam version, built-in benchmark
Test setup 1: High quality, no AA
Test setup 2: Ultra quality, 8x AA
Elder Scrolls V: Skyrim: Update 1.7, Celedon Aethirborn level 6, 25-second Fraps run
Test setup 1: DX11, High detail, no AA, 8x AF, FXAA on
Test setup 2: DX11, Ultra detail, 8x AA, 16x AF, FXAA on
Metro 2033: Full version, built-in benchmark, "Frontline" scene
Test setup 1: DX11, High, AAA, 4x AF, no PhysX, no DoF
Test setup 2: DX11, Very High, 4x AA, 16x AF, no PhysX, DoF on

FX vs Core i7 | Test results

Battlefield 3, F1 2012 and Skyrim

But first, let's take a look at power consumption and efficiency.

The power consumption of the non-overclocked FX-8350 is not as bad as you might fear compared to the Intel chip, although it is in fact higher. However, the chart does not show the whole picture. At stock settings we never saw the chip run at 4 GHz under sustained load. Instead, while processing eight threads in Prime95, it lowered its multiplier and voltage to stay within its rated thermal envelope. Throttling artificially limits the CPU's power consumption. Setting a fixed multiplier and voltage for overclocking significantly increases this figure for the Vishera processor.

At the same time, not all games can use the FX-8350's ability to process eight threads at once, so they never push the chip hard enough to trigger its throttling mechanism.

As already noted, throttling is not activated during games on the non-overclocked FX-8350 because most games cannot fully load the processor. In fact, games benefit from Turbo Core technology, which boosts the processor frequency to 4.2 GHz. The AMD chip fares worst in the average performance chart, where Intel is noticeably ahead.

For the efficiency chart, we take the average power consumption and average performance across all four configurations as a baseline. Measured this way, the performance per watt of the AMD FX-8350 is about two-thirds of Intel's result.
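
As a rough illustration of how such an efficiency figure is derived, here is a minimal Python sketch. The frame rates and power draws below are invented placeholders, not the averages from our charts.

```python
# Sketch of the performance-per-watt calculation described above, with made-up
# numbers purely for illustration.

def perf_per_watt(avg_fps: float, avg_power_w: float) -> float:
    """Performance per watt: average frame rate divided by average power draw."""
    return avg_fps / avg_power_w

# Hypothetical averages over the four test configurations:
intel = perf_per_watt(avg_fps=100.0, avg_power_w=200.0)   # 0.50 fps/W
amd   = perf_per_watt(avg_fps=90.0,  avg_power_w=270.0)   # ~0.33 fps/W

print(f"AMD efficiency relative to Intel: {amd / intel:.0%}")  # roughly two-thirds
```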

FX vs Core i7 | Will AMD FX be able to catch up with the Radeon HD 7970?

When we talk about good and affordable hardware, we like to use phrases such as "80% of the performance for 60% of the cost". Metrics like these always look compelling, because we routinely measure performance, power consumption, and efficiency. However, they take into account the cost of only one component, and components, as a rule, cannot work alone.

Adding up the components used in today's review, the Intel-based system comes to $1900 and the AMD platform to $1724, not counting the case, peripherals, or operating system. If we consider "complete" builds, it is worth adding roughly another $80 for a case, which brings us to $1984 for Intel and $1804 for AMD. The savings on a complete configuration with the AMD processor is $180, which is not much as a percentage of the total system cost. In other words, the processor's favorable price is diluted by the rest of a high-end PC's components.
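
The arithmetic behind that conclusion, using only the figures quoted above:

```python
# The price math from the paragraph above, spelled out.
intel_parts, amd_parts = 1900, 1724        # components only, USD
case = 80                                  # rough cost of a case
intel_total, amd_total = intel_parts + case, amd_parts + case

savings = intel_total - amd_total          # 1984 - 1804 = 180
print(f"Intel build: ${intel_total}, AMD build: ${amd_total}")
print(f"AMD savings: ${savings} ({savings / intel_total:.1%} of the Intel build)")
# -> about 9%, which is why the CPU's price advantage gets diluted.
```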

As a result, we are left with two thoroughly biased ways to compare price and performance. We admit this openly, so we hope we will not be judged too harshly for the results presented.

The comparison is more favorable to AMD if we include only the cost of the motherboard and CPU, which widens its advantage. The result is a chart like this:

As a third alternative, you can treat the motherboard and processor as an upgrade, assuming the case, power supply, memory, and drives are left over from the previous system. Most likely, a pair of Radeon HD 7970 cards was not part of the old configuration, so it is most reasonable to count the processors, motherboards, and graphics adapters. So we add the two Tahiti-based cards ($800) to the list.

The AMD FX-8350 looks better than Intel (especially in games, at the settings we chose) in only one case: when the rest of the system is "free". Since the other components cannot be free, the FX-8350 cannot become the more economical purchase for gaming either.

Intel and AMD video cards

Our test results have long shown that ATI graphics chips are more processor-dependent than Nvidia chips. As a result, when testing high-end GPUs we equip our test benches with Intel processors, sidestepping platform shortcomings that could interfere with isolating graphics performance and adversely affect the results.

We hoped that the release of AMD's Piledriver would change the situation, but even a few impressive improvements were not enough for the CPU team to match the efficiency of AMD's own graphics team. Well, let's wait for AMD chips based on the Steamroller architecture, which promises to be 15% faster than Piledriver.

When building a gaming PC, the most expensive part is the graphics card, and you want it to get your money's worth. Then the question arises: what processor should I choose for this video card so that it does not limit it in games? Our specially prepared material will help you with this dilemma.

Introduction

So it turns out that the main component in a computer is the processor, and it commands everything else. It is the processor that tells your video card which objects to draw, and it also calculates the physics of objects (it handles quite a few operations itself). If the video card is not working at full capacity while the processor cannot go any faster, a "bottleneck" effect occurs: system performance is limited by its weakest component.

In reality there are always workloads in which the video card barely breaks a sweat while the processor runs at full capacity, but here we are talking about games, so we will reason within that paradigm.

How is the load distributed between processors and video card?

It should be noted that changing the settings in the game changes the ratio of processor and video card load.

As the resolution and graphics settings increase, the load on the video card increases faster than on the processor. This means that if the processor is not a bottleneck at lower resolutions, it won’t be at higher resolutions either.

With a decrease in resolution and graphics settings, the opposite is true: the load on the processor for rendering one frame remains almost unchanged, while the video card's job becomes much easier. In such a situation, the processor is more likely to become the bottleneck.

What are the signs of bottleneck?

To carry out the test you need a monitoring program; look at its "GPU Load" graph.

You also need to know the processor load. You can see this in the Task Manager's performance monitoring, which has a CPU load graph.

So what are the signs that the processor is not letting the video card reach its full potential?

  • GPU load is not close to 100%, but CPU load is always around this mark
  • The GPU load graph fluctuates a lot (maybe a poorly optimized game)
  • When changing graphics settings, FPS does not change

These are the signs by which you can tell whether a bottleneck is occurring in your case.
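
As a rough sketch of this check, the script below samples CPU and GPU load for a while and flags a likely CPU bottleneck. It assumes an Nvidia card with nvidia-smi on the PATH and the psutil package installed; the 90%/80% thresholds are arbitrary, not an official rule.

```python
# Sample CPU and GPU load for a while and flag a likely CPU bottleneck.
import subprocess
import time

import psutil


def gpu_load_percent() -> float:
    # Query GPU utilization from nvidia-smi (first GPU only).
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return float(out.strip().splitlines()[0])


def sample(seconds: int = 60, interval: float = 1.0):
    cpu, gpu = [], []
    psutil.cpu_percent(None)              # prime the CPU counter
    for _ in range(int(seconds / interval)):
        time.sleep(interval)
        cpu.append(psutil.cpu_percent(None))
        gpu.append(gpu_load_percent())
    return sum(cpu) / len(cpu), sum(gpu) / len(gpu)


if __name__ == "__main__":
    avg_cpu, avg_gpu = sample()
    print(f"average CPU load: {avg_cpu:.0f}%, average GPU load: {avg_gpu:.0f}%")
    if avg_cpu > 90 and avg_gpu < 80:     # arbitrary thresholds for illustration
        print("Looks like a CPU bottleneck: the card is waiting for the processor.")
```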

How to choose a processor?

To do this, I advise you to look at processor tests in the game you are interested in. There are sites that specialize in exactly this.

An example of a test in the game Tom Clancy's The Division:

Typically, when testing processors in different games, the graphics settings and resolution are specified. Conditions are selected such that the processor is the bottleneck. In this case, you can find out how many frames in a given resolution a particular processor is capable of. This way you can compare processors with each other.

Games are different (thank you, Captain Obvious) and their processor requirements differ too. In one game everything will be fine and the processor will handle every scene without problems, while in another the video card will sit idle while the processor struggles to keep up with its share of the work.

This is most influenced by:

  • complexity of physics in the game
  • complex space geometry (many large buildings with many details)
  • artificial intelligence

Our advice

  • When choosing, we advise you to focus on just such tests with the graphics settings you need and the FPS you need (what your card can handle).
  • It is advisable to look at the most demanding games if you want to be sure that future new products will work well.
  • You can also buy a processor with some headroom. Games currently run well even on chips that are four years old, which means a good processor will keep you happy in games for a long time to come.
  • If the FPS in a game is fine but the load on the video card is low, give it more work: raise the graphics settings so that the video card runs at full capacity.
  • When using DirectX 12, the load on the processor should decrease slightly, which will reduce the demands on it.

Technological progress does not advance evenly in all areas; that much is obvious. In this article we will look at which components, and at which times, improved more slowly than the rest and so became the weak link. So today's topic is the evolution of weak links: how they arose, what they affected, and how they were eliminated.

CPU

From the earliest personal computers, the bulk of the calculations fell on the CPU. This was because chips were not exactly cheap, so most peripherals used processor time for their own needs. And there were very few peripherals at that time. Soon, as the range of PC applications expanded, this paradigm was revised. The time came for all kinds of expansion cards to flourish.



In the days of the "twos" and "threes" (not the Pentium II and III, as younger readers might think, but the i286 and i386 processors), the tasks assigned to these systems were not very complex: mainly office applications and calculations. Expansion cards already relieved the processor in part; for example, an MPEG decoder card decoded MPEG-compressed files without the CPU's involvement. A little later, standards began to appear that put less load on the processor during data exchange. One example was the PCI bus (introduced with the i486), work over which loaded the processor to a lesser extent. Other examples include PIO and (U)DMA.


Processors gained power at a good pace; a clock multiplier appeared, since the system bus speed was limited, and a cache appeared to mask accesses to RAM running at a lower frequency. The processor was still the weak link, and system speed depended almost entirely on it.



Meanwhile Intel, after the successful Pentium, released a new generation: the Pentium MMX. The company wanted to shake things up and move more of the calculations onto the processor. The MMX (MultiMedia eXtensions) instruction set, intended to speed up audio and video processing, helped a great deal here. With it, MP3 music finally played back smoothly, and acceptable MPEG-4 playback on the CPU alone became possible.

The first bottlenecks on the bus

Systems based on the Pentium MMX were already limited more by memory bandwidth. The 66 MHz bus was a bottleneck for the new processor, despite the transition to the new SDRAM memory type, which improved performance per megahertz. For this reason, bus overclocking became very popular: the bus was set to 83 MHz (or 75 MHz), which gave a very noticeable boost. Often even a lower final processor frequency was compensated for by the higher bus frequency; for the first time, more speed was achieved at a lower clock. The amount of RAM became another bottleneck. For SIMM memory this was at most 64 MB, but more often 32 MB or even 16. This seriously complicated the use of programs, since every new version of Windows, as everyone knows, likes to "eat a lot of tasty RAM" (c). There were even rumors of a conspiracy between memory manufacturers and Microsoft.



Intel, meanwhile, began developing the expensive and therefore not very popular Socket 8 platform, while AMD continued to develop Socket 7. Unfortunately, the latter used a slow FPU (Floating Point Unit, the module for fractional-number operations) inherited from the newly acquired NexGen, which meant a lag behind the competitor in multimedia tasks, primarily games. The move to a 100 MHz bus gave the processors the necessary bandwidth, and the full-speed 256 KB L2 cache of the AMD K6-3 improved the situation so much that system speed was now characterized only by processor frequency rather than bus frequency, although this was partly due to the slow FPU. Office applications, which depend on ALU power, ran faster on it than on the competitor's solutions thanks to the fast memory subsystem.

Chipsets

Intel abandoned the expensive Pentium Pro, which had an L2 cache die integrated into the processor package, and released the Pentium II. This CPU had a core very similar to the Pentium MMX core. The main differences were the L2 cache, which was located on the processor cartridge and ran at half the core frequency, and a new bus, AGTL. With the help of new chipsets (in particular the i440BX), the bus frequency was raised to 100 MHz and bandwidth rose accordingly. In terms of efficiency (the ratio of random read speed to the theoretical maximum), these chipsets became some of the best, and to this day Intel has not beaten that figure. The i440BX-series chipsets had one weak link: the south bridge, whose functionality no longer met the requirements of the time. The old south bridge from the i430 series, used in Pentium-based systems, was carried over. It was this circumstance, along with the fact that the chipset halves were linked over the PCI bus, that prompted manufacturers to release hybrids combining the i440BX north bridge with the VIA (686A/B) south bridge.



Meanwhile, Intel was demonstrating DVD movie playback without any helper decoder card. But the Pentium II did not win wide recognition because of its high cost, and the need for cheap analogues became obvious. The first attempt, an Intel Celeron without L2 cache, was unsuccessful: in speed the Covington chips trailed their competitors badly and did not justify their price. Intel then made a second attempt, which proved successful: the Mendocino core, beloved by overclockers, with half the cache size (128 KB versus 256 KB for the Pentium II) but running at full processor frequency rather than at half speed like the Pentium II's cache. Thanks to this, speed in most tasks was no lower, and the lower price attracted buyers.

The first 3D and again the bus

Immediately after the release of the Pentium MMX, the popularization of 3D technologies began. At first these were professional applications for developing models and graphics, but the real era was opened by 3D games, or more precisely, the Voodoo 3D accelerators created by 3dfx. These accelerators became the first mainstream cards for creating 3D scenes, which relieved the processor during rendering. It was from this time that the evolution of three-dimensional games began. Quite quickly, scene calculations using the central processor began to lose to those performed using video accelerators, both in speed and quality.



With the advent of a new, powerful subsystem - graphics - which began to rival the central processor in the volume of data it handled, a new bottleneck emerged: the PCI bus. In particular, Voodoo 3 and older cards gained speed simply from overclocking the PCI bus to 37.5 or 41.5 MHz. Clearly, video cards needed to be given a sufficiently fast bus. That bus (or rather, port) became AGP, the Accelerated Graphics Port. As the name suggests, it is a dedicated graphics bus, and by specification it could have only one slot. The first version of AGP supported AGP 1x and 2x speeds, corresponding to single and double PCI 32/66 speeds, that is, 266 and 533 MB/s. The slow mode was added for compatibility, and it was this mode that caused problems for quite a long time, on all chipsets except those released by Intel. According to rumors, these problems were related to the fact that only Intel held a license for the port, and to its obstruction of the competing Socket 7 platform.



AGP has improved things and the graphics port is no longer a bottleneck. Video cards switched to it very quickly, but the Socket7 platform suffered from compatibility problems almost until the very end. Only the latest chipsets and drivers were able to improve this situation, but even then nuances arose.

And then there are the hard drives!

Then came Coppermine: frequencies rose, performance grew, and new video cards brought more pipelines and memory. The computer had already become a multimedia center on which people played music and watched movies. Integrated sound cards with their weak specifications gave way to the SBLive!, which became the people's choice. But something still spoiled the idyll. What was it?



That factor was the hard drive, whose growth slowed and stalled at around 40 GB. For collectors of movies (in MPEG4 back then), this was a source of frustration. The problem was soon solved, and quite quickly: drives grew to 80 GB and beyond and ceased to worry most users.


AMD released a very good platform, Socket A, and a K7-architecture processor that the marketers named Athlon (technical name Argon), as well as the budget Duron. The Athlon's strengths were its bus and a powerful FPU, which made it an excellent processor for serious calculations and for games, leaving its competitor, the Pentium 4, the role of an office machine, where powerful systems were never really required. Early Durons had a very small cache and a slow bus, which made it hard to compete with the Intel Celeron (Tualatin). But thanks to better scalability (due to the faster bus) they responded better to frequency increases, so the older models easily outran Intel's solutions.

Between two bridges


During this period two bottlenecks appeared at once. The first was the bus between the bridges. Traditionally, PCI was used for this purpose. It is worth remembering that PCI as used in desktop computers has a theoretical throughput of 133 MB/s. In practice the speed depends on the chipset and the application and varies from 90 to 120 MB/s. On top of that, the bandwidth is shared among all the devices connected to the bus. If two IDE channels with a theoretical throughput of 100 MB/s each (ATA-100) hang off a bus with a theoretical throughput of 133 MB/s, the problem is obvious. LPC, PS/2, SMBus, and AC'97 have low bandwidth requirements, but Ethernet, ATA-100/133, PCI, and USB 1.1/2.0 already operate at speeds comparable to the inter-bridge link. For a long time there was no problem: USB was not used, Ethernet was needed infrequently and mostly at 100 Mbit/s (12.5 MB/s), and hard drives could not even come close to the interface's maximum speed. But time passed and the situation changed, so it was decided to create a dedicated inter-hub (bridge-to-bridge) bus.
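
A back-of-the-envelope check of why the shared bus became a problem, using the figures above:

```python
# Rough check of the shared-PCI problem described above.
pci_bus = 133            # theoretical PCI throughput, MB/s
ide_channels = 2 * 100   # two ATA-100 channels, MB/s
ethernet = 100 / 8       # 100 Mbit/s NIC, ~12.5 MB/s

demand = ide_channels + ethernet
print(f"peak demand ~{demand:.0f} MB/s vs {pci_bus} MB/s of shared PCI bandwidth")
# -> the devices alone can ask for well over what the bus can deliver,
#    which is why a dedicated inter-hub link was introduced.
```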


VIA, SiS, and Intel each released their own variants of such a bus. They differed, first of all, in throughput. They started at the level of PCI 32/66, about 266 MB/s, but the main thing had been done: the PCI bus now served only its own devices and no longer had to carry traffic to other buses. This improved the speed of working with peripherals (relative to the bridge architecture).


The throughput of the graphics port was also increased. Fast Writes mode was introduced, which made it possible to write data to video memory directly, bypassing system memory, as well as Side Band Addressing, which used an additional 8-bit part of the bus, normally reserved for technical data, for transfers. The gain from Fast Writes appeared only under high processor load; in other cases it was negligible. Thus the difference between the 8x and 4x modes was within the margin of error.

CPU dependency

Another bottleneck that emerged, and is still relevant today, was CPU dependence. This phenomenon arose from the rapid development of video cards and meant that the "processor - chipset - memory" combination was not powerful enough for the video card. After all, the frame rate in a game is determined not only by the video card but also by this chain, since it is the chain that feeds the card commands and data to process. If the chain cannot keep up, the video subsystem hits a ceiling determined primarily by the chain. That ceiling depends on the card's power and the settings used, but there are also cards that hit such a ceiling at any settings in a particular game, or at the same settings in most modern games with almost any processor. For example, the GeForce 3 was heavily limited by the performance of the Pentium III and of Pentium 4 processors based on the Willamette core. The somewhat higher-end GeForce 4 Ti was already starved by the Athlon 2100+-2400+, and the gain from improving the rest of the combination was very noticeable.



How was performance improved? At first AMD, reaping the benefits of its efficient architecture, simply raised processor frequencies and refined the manufacturing process, while chipset makers raised memory bandwidth. Intel continued its policy of increasing clock speeds; conveniently, the Netburst architecture was designed for exactly that. Intel processors on the Willamette and Northwood cores with a 400QPB (quad-pumped bus) were inferior to competing solutions with a 266 MHz bus. After the introduction of 533QPB the processors became roughly equal in performance. But then, instead of the 667 MHz bus used in server products, Intel decided to move desktop processors straight to an 800 MHz bus in order to build a performance reserve against the Barton core and the new flagship Athlon XP 3200+. Intel processors were heavily limited by bus frequency, and even 533QPB could not supply a sufficient flow of data. That is why the 3.0 GHz CPU on the 800 MHz bus outperformed the 3.06 GHz processor on the 533 MHz bus in practically everything except a small number of applications.


Support for new frequency modes for memory was also introduced, and a dual-channel mode appeared. This was done to equalize the bandwidth of the processor and memory bus. Dual-channel DDR mode exactly matched QDR at the same frequency.
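
A quick calculation shows why the two matched; the 64-bit bus widths used here are standard figures, not taken from this article:

```python
# Why dual-channel DDR "exactly matched" the quad-pumped bus.
bus_width_bytes = 8                        # 64-bit data paths on both sides

fsb_800 = 800e6 * bus_width_bytes          # 200 MHz clock, quad-pumped -> 800 MT/s
ddr400_dual = 2 * 400e6 * bus_width_bytes  # two channels of DDR-400 (200 MHz, double data rate)

print(f"FSB: {fsb_800 / 1e9:.1f} GB/s, dual-channel DDR-400: {ddr400_dual / 1e9:.1f} GB/s")
# both come out to 6.4 GB/s, so neither side starves the other
```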


For AMD, dual-channel mode was a formality and brought a barely noticeable gain. The new Prescott core did not deliver a clear increase in speed and in places was even inferior to the old Northwood. Its main purpose was the transition to a new manufacturing process and the possibility of further frequency increases. Heat output grew significantly due to leakage currents, which put an end to plans for a model running at 4.0 GHz.

Through the ceiling to a new memory

The Radeon 9700/9800 and GeForce 5 generation did not cause CPU-dependence problems for the processors of its day. But the GeForce 6 generation brought most systems to their knees: the performance jump was very noticeable, so CPU dependence grew. Top processors on the Barton (Athlon XP 2500+ to 3200+) and Northwood/Prescott (3.0-3.4 GHz, 800 MHz FSB) cores ran into a new limit: the memory frequency and the bus. AMD suffered especially; its 400 MHz bus was not enough to feed a good FPU. The Pentium 4 was in a better position and showed good results at minimal timings. But JEDEC was unwilling to certify higher-frequency, lower-latency memory modules. So there were two options: either a complex quad-channel mode or a switch to DDR2. The latter happened, and the LGA775 (Socket T) platform was introduced. The bus remained the same, but memory frequencies were no longer limited to 400 MHz; rather, they started from there.



AMD solved the problem better in terms of scalability. The K8 generation, technically called Hammer, in addition to increasing the number of instructions per clock cycle (partly due to a shorter pipeline), had two innovations with a reserve for the future. They were the built-in memory controller (or rather, the north bridge with most of its functionality) and the fast universal HyperTransport bus, which served to connect the processor with the chipset or processors with each other in a multiprocessor system. The built-in memory controller made it possible to avoid the weak link - the chipset-processor connection. FSB as such ceased to exist, there was only a memory bus and an HT bus.


This allowed the Athlon 64 to easily overtake Intel's existing Netburst-based solutions and expose the flawed ideology of the long pipeline. Tejas had many problems and never saw the light of day. These processors easily realized the potential of GeForce 6 cards, as, to be fair, did the top Pentium 4 models.


But then an innovation appeared that made processors the weak link for a long time: multi-GPU. It was decided to revive the ideas of 3dfx's SLI and implement them as NVIDIA SLI. ATI responded symmetrically with CrossFire. These were technologies for rendering scenes with two cards. The doubling of the video subsystem's theoretical power, plus the cost of splitting the frame into parts on the processor, skewed the system. The top Athlon 64 could keep such a combination busy only at high resolutions. The release of the GeForce 7 and the ATI Radeon X1000 widened this imbalance further.


Along the way a new bus, PCI Express, was developed. This bidirectional serial bus is intended for peripherals and offers very high speed. It replaced AGP and PCI, although it has not completely displaced the latter. Thanks to its versatility, speed, and low implementation cost, it quickly superseded AGP, even though at the time it brought no speed gain at all; there was no difference between them. But from the standpoint of unification it was a very good step. Boards are now being produced with PCI-E 2.0 support, which doubles throughput (500 MB/s per lane in each direction versus the previous 250 MB/s). This, too, gave current video cards no gain. A difference between PCI-E modes appears only when video memory is insufficient, which already indicates an imbalance in the card itself. One such card is the GeForce 8800GTS 320 MB; it reacts very sensitively to changes in PCI-E mode. But taking an unbalanced card just to evaluate the gain from PCI-E 2.0 is hardly a reasonable decision. Cards with Turbocache and Hypermemory support, technologies that use system RAM as video memory, are another matter: there the increase in memory bandwidth is roughly twofold, which has a positive effect on performance.


You can see whether the video card has enough memory in any review of devices with different VRAM sizes. Where there will be a sharp drop in frames per second, there is a lack of VideoRAM. But it happens that the difference becomes very noticeable only in non-playable modes - resolution 2560x1600 and AA/AF at maximum. Then the difference between 4 and 8 frames per second will be twofold, but it is obvious that both modes are impossible in real conditions, and therefore they should not be taken into account.

A new answer to video chips

The release of the new Core 2 architecture (technical name Conroe) improved the CPU-dependence situation and drove GeForce 7 SLI setups without any problem. But Quad SLI and the GeForce 8 arrived in time to take revenge and restore the imbalance, which persists to this day. The situation only worsened with the release of 3-way SLI and the upcoming Quad SLI on the GeForce 8800, and of 3-way and 4-way CrossFire X. The release of Wolfdale raised clock speeds slightly, but overclocking this processor is not enough to properly feed such video subsystems. 64-bit games are very rare, and gains in that mode are seen only in isolated cases. Games that benefit from four cores can be counted on the fingers of one hand. As usual, Microsoft props everything up: its new OS loads both memory and processor in earnest. It has been implied that 3-way SLI and CrossFire X will work exclusively under Vista. Given its appetite, gamers may be forced to buy quad-core processors, since the load is spread across cores more evenly than in Windows XP. If the OS must eat up a fair share of processor time, let it at least eat up the cores the game is not using anyway. However, I doubt the new operating system will be satisfied with the cores allotted to it.



The Intel platform is growing obsolete. Four cores already suffer badly from a lack of memory bandwidth and from the delays of bus switching. The bus is shared, and a core needs time to take control of it. With two cores this is tolerable, but with four the time losses become more noticeable. The system bus also long ago stopped keeping up in bandwidth. The impact of this factor was softened by improving the efficiency of the asynchronous mode, which Intel implemented well. Workstations suffer even more because of an unfortunate chipset whose memory controller delivers only up to 33% of the theoretical memory bandwidth. An example is the Intel Skulltrail platform losing in most gaming applications (the 3DMark06 CPU test is not a gaming application :)) even with identical video cards. That is why Intel announced the new Nehalem generation, which implements an infrastructure very similar to AMD's: an integrated memory controller and the QPI peripheral bus (technical name CSI). This will improve platform scalability and pay off in dual-processor and multi-core configurations.


AMD currently has several bottlenecks of its own. The first is related to the caching mechanism: because of it there is a certain bandwidth ceiling, dependent on processor frequency, that cannot be exceeded even with faster memory modes. For example, on an average processor the difference in memory performance between DDR2-667 and DDR2-800 may be about 1-3%, and for a real task it is generally negligible. It is therefore best to pick the optimal frequency and lower the timings; the controller responds very well to them. Implementing DDR3 therefore makes little sense: its high timings only hurt, and there may be no gain at all. AMD's other problem now is the slow (despite SSE128) processing of SIMD instructions. It is for this reason that Core 2 is so far ahead of K8/K10. The ALU, which has always been Intel's strong point, has become even stronger and in some cases can be several times faster than its counterpart in Phenom. That is the main problem with AMD processors: weak "math".


Generally speaking, weak links are very task specific. Only “epoch-making” ones were considered. So, in some tasks, the speed may be limited by the amount of RAM or the speed of the disk subsystem. Then more memory is added (the amount is determined using performance counters) and RAID arrays are installed. The speed of games can be increased by disabling the built-in sound card and purchasing a normal discrete one - Creative Audigy 2 or X-Fi, which load the processor less by processing effects with their chip. This applies to a greater extent to AC’97 sound cards and to a lesser extent to HD-Audio (Intel Azalia), since the latter has fixed the problem of high processor load.


Remember, the system should always be tailored to specific tasks. Often, if you can choose a balanced video card (and the choice according to price categories will depend on prices that vary greatly in different places), then, say, with a disk subsystem such an opportunity is not always available. Very few people need RAID 5, but for a server it is an indispensable thing. The same applies to a dual-processor or multi-core configuration, useless in office applications, but a “must have” for a designer working in 3Ds Max.

The latest version of Windows has a built-in rating of the performance of different PC components. It gives an overview of the system's performance and bottlenecks, but you will not find any detail about the components' speed parameters there. In addition, this diagnostic cannot run a hardware stress test, which is useful for understanding peak loads when launching modern games. Third-party benchmarks of the 3DMark family likewise give only scores in arbitrary points. It is no secret that many hardware manufacturers optimize their video cards and other components specifically to earn the maximum number of points in 3DMark. The program even lets you compare your hardware's performance with similar setups from its database, but you will not get specific values.

Therefore, PC testing should be done separately, taking into account not only the benchmark’s performance assessment, but also real specifications, recorded as a result of equipment inspection. We have selected for you a set of utilities (both paid and free) that allow you to get specific results and identify weak links.

Image processing speed and 3D

Testing video cards is one of the most important steps in assessing a PC's power. Manufacturers of modern video adapters supply them with software and drivers that allow the GPU to be used not only for image rendering but also for other calculations, for example video encoding. So the only reliable way to find out how efficiently computer graphics are processed is to turn to a special application that measures the device's performance.

Checking video card stability

Program: FurMark 1.9.1 Website: www.ozone3d.net The FurMark program is one of the fastest and easiest tools for checking the operation of a video adapter. The utility tests the performance of a video card using OpenGL technology as a basis. The proposed visualization algorithm uses multi-pass rendering, each layer of which is based on GLSL (OpenGL shader language).

To load the graphics card's processor, this benchmark renders an abstract 3D image with a torus covered in fur. The need to process a large amount of hair leads to the maximum possible load on the device. FurMark checks the stability of the video card and also shows changes in the temperature of the device as the load increases.

In the FurMark settings you can specify the resolution at which the hardware will be tested, and on completion the program presents a brief report on the PC configuration with a final score in arbitrary points. This value is convenient for comparing the overall performance of several video cards. You can also run the standard 1080p and 720p presets.

Virtual stereo walk

Program: Unigine Heaven DX11 Benchmark Website: www.unigine.com One of the surest ways to find out what a new computer is capable of is to run games on it. Modern games use all hardware resources: video card, memory, and processor. However, not everyone has the time or desire for such entertainment. You can use the Unigine Heaven DX11 Benchmark instead. The test is based on the Unigine game engine (games such as Oil Rush, Dilogus: The Winds of War, Syndicates of Arkon and others are built on it), which supports the DirectX 9, 10, and 11 and OpenGL graphics APIs. After launch, the program generates a demo visualization, drawing a virtual environment in real time. The user sees a short video, a virtual walk through a fantasy world, rendered by the video card. In addition to three-dimensional objects, the engine simulates complex lighting, modeling a global illumination system with multiple reflections of light rays off scene elements.

You can test your computer in stereo mode; in the benchmark settings you can choose the stereoscopic output standard: anaglyph 3D, separate frames for the right and left eye, and so on.

Despite the fact that the title of the program mentions the eleventh version of DirectX, this does not mean that Unigine Heaven is intended only for modern video cards. In the settings of this test, you can select one of the earlier versions of DirectX, as well as set an acceptable level of picture detail and specify the quality of shader rendering.

Finding the weak link

In a situation where a user is overwhelmed by the desire to increase the performance of his computer, the question may arise: which component is the weakest? What will make the computer faster - replacing the video card, processor or installing a huge amount of RAM? To answer this question, it is necessary to test individual components and determine the “weak link” in the current configuration. A unique multi-testing utility will help you find it.

Load simulator

Program: PassMark PerformanceTest Website: www.passmark.com PassMark PerformanceTest analyzes almost any device present in the PC configuration - from motherboard and memory to optical drives.

A special feature of PassMark PerformanceTest is that it uses a large number of different tasks, scrupulously measuring computer performance in different situations. At some point it may even seem that someone has taken control of the system: windows open on their own, their contents scroll, and images appear on the screen. All this is the benchmark simulating the most typical tasks usually performed in Windows. Along the way it checks data compression speed, records the time needed to encrypt information, applies filters to photos, measures the rendering speed of vector graphics, plays short 3D demo clips, and so on.

At the end of testing, PassMark PerformanceTest provides a total score and offers to compare this result with data obtained on PCs with different configurations. For each of the tested parameters, the application creates a diagram on which the weak components of the computer are clearly visible.

Checking the disk system

Disk system throughput can be the biggest bottleneck in PC performance. Therefore, knowing the real characteristics of these components is extremely important. Testing a hard drive will not only determine its read and write speeds, but will also show how reliably the device operates. To check your drive, we recommend trying two small utilities.

Exams for HDD

Programs: CrystalDiskInfo and CrystalDiskMark Website: http://crystalmark.info/software/index-e.html These programs were created by the same developer and complement each other perfectly. Both of them are free and can work without installation on a computer, directly from a flash drive.

Most hard drives implement SMART self-diagnostics, which makes it possible to predict potential drive failures. With CrystalDiskInfo you can assess the real state of your HDD in terms of reliability: it reads the SMART data and reports the number of problem sectors, the number of read-head positioning errors, the disk spin-up time, and the current temperature of the device. If the temperature is too high, the drive's remaining life before failure will be short. The program also shows the firmware version and reports how long the hard drive has been in use.

CrystalDiskMark is a small application that measures write and read speeds. This disk checking tool differs from similar utilities in that it allows you to use different conditions for writing and reading data - for example, measuring readings for blocks of different sizes. The utility also allows you to set the number of tests to be performed and the amount of data used for them.
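
For a rough idea of what such a utility measures, here is a crude Python sketch of a sequential write/read test. It is not a substitute for CrystalDiskMark: operating-system caching is only partly avoided, the file path and sizes are arbitrary, and the numbers should be treated as estimates.

```python
# Crude sequential write/read speed estimate for the drive holding PATH.
import os
import time

PATH = "disk_test.tmp"     # place this on the drive you want to test
SIZE_MB = 256
BLOCK = 1024 * 1024        # 1 MiB blocks

def write_test() -> float:
    data = os.urandom(BLOCK)
    start = time.perf_counter()
    with open(PATH, "wb", buffering=0) as f:
        for _ in range(SIZE_MB):
            f.write(data)
        os.fsync(f.fileno())               # force the data out of the page cache
    return SIZE_MB / (time.perf_counter() - start)

def read_test() -> float:
    start = time.perf_counter()
    with open(PATH, "rb", buffering=0) as f:
        while f.read(BLOCK):
            pass
    return SIZE_MB / (time.perf_counter() - start)

if __name__ == "__main__":
    print(f"sequential write: {write_test():.1f} MB/s")
    print(f"sequential read:  {read_test():.1f} MB/s")
    os.remove(PATH)
```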

Speedometer for web surfing

The real speed of a network connection usually differs from what is shown in its settings or promised by the provider, and as a rule it is lower. Transfer speed can be affected by many factors: the level of electromagnetic interference in the room, the number of users working on the network at the same time, cable quality, and so on.

Network Speed Estimate

Program: SpeedTest Website: www.raccoonworks.com If you want to know the actual data transfer speed on your local network, the SpeedTest program will help. It lets you determine whether the provider delivers the promised parameters. The utility measures the data transfer speed between two user machines, as well as between a remote server and a personal computer.

The program consists of two parts, a server and a client. To measure the speed of data transfer from one computer to another, the first user launches the server part and points it at an arbitrary file (preferably a large one) that will be used for the test. The second participant launches the client component and enters the server parameters: address and port. The two applications establish a connection and begin exchanging data. During the transfer, SpeedTest plots a graph and collects statistics on how long it took to copy the data across the network. If you test several remote PCs, the program keeps adding new curves to the graph.
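
The same idea can be sketched in a few dozen lines of Python: a receiver and a sender that time how fast raw data moves between two machines. The port number and payload size are arbitrary choices, and this is only a simplified illustration of what SpeedTest-style tools do.

```python
# Minimal LAN throughput test: run "server" on one PC, "client <server-ip>" on the other.
import socket
import sys
import time

PORT = 5001
PAYLOAD_MB = 100
BLOCK = 64 * 1024

def server() -> None:
    with socket.create_server(("", PORT)) as srv:
        conn, addr = srv.accept()
        with conn:
            received, start = 0, time.perf_counter()
            while chunk := conn.recv(BLOCK):
                received += len(chunk)
            secs = time.perf_counter() - start
            print(f"{received / 2**20:.0f} MB from {addr[0]} "
                  f"at {received / 2**20 / secs:.1f} MB/s")

def client(host: str) -> None:
    block = b"\0" * BLOCK
    with socket.create_connection((host, PORT)) as conn:
        start = time.perf_counter()
        for _ in range(PAYLOAD_MB * 2**20 // BLOCK):
            conn.sendall(block)
    secs = time.perf_counter() - start
    print(f"sent {PAYLOAD_MB} MB in {secs:.1f} s ({PAYLOAD_MB / secs:.1f} MB/s)")

if __name__ == "__main__":
    server() if sys.argv[1] == "server" else client(sys.argv[2])
```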

In addition, SpeedTest will check the speed of the Internet: in “Web Page” mode, the program tests the connection to any site. This parameter can also be assessed by going to the specialized resource http://internet.yandex.ru.

Malfunctions in RAM may not appear immediately, but under certain loads. To be sure that the selected modules will not let you down in any situation, it is better to test them thoroughly and choose the fastest ones.

Memory Olympics

Program: MaxxMEM2 - PreView Website: www.maxxpi.net This program is designed to test memory speed. In a very short time it runs several tests: it measures the time needed to copy data in RAM, determines read and write speeds, and reports the memory latency. In the utility's settings you can set the test priority and compare your result with figures obtained by other users. From the program menu you can jump straight to the online statistics on the official MaxxMEM2 website and see which memory is the fastest.
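
For a rough, do-it-yourself analogue of such a memory test, the sketch below times a large in-memory copy with numpy. The buffer size is arbitrary and the result depends heavily on caches, so only relative comparisons between runs on the same machine are meaningful.

```python
# Rough memory copy bandwidth estimate using numpy.
import time

import numpy as np

SIZE_MB = 512
reps = 5

src = np.frombuffer(np.random.bytes(SIZE_MB * 2**20), dtype=np.uint8)
dst = np.empty_like(src)

best = float("inf")
for _ in range(reps):
    start = time.perf_counter()
    np.copyto(dst, src)               # plain memory-to-memory copy
    best = min(best, time.perf_counter() - start)

print(f"copy bandwidth: ~{SIZE_MB / best / 1024:.1f} GB/s (best of {reps} runs)")
```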

For sound, speed is not important

When testing most devices, data processing speed is usually important. But with regard to the sound card, this is not the main indicator. It is much more important for the user to check the characteristics of the analog and digital audio path - to find out how much the sound is distorted during playback and recording, measure the noise level, etc.

Comparison with the standard

Program: RightMark Audio Analyzer 6.2.3 Website: http://audio.rightmark.org The creators of this utility offer several ways to check audio performance. The first option is self-diagnosis of the sound card. The device reproduces a test signal through the audio path and immediately records it. The waveform of the received signal should ideally match the original. Deviations indicate sound distortion by the audio card installed in your PC.

The second and third testing methods are more accurate: they use a reference audio signal generator or an additional sound card. In both cases the quality of the signal source is taken as the reference, although the extra devices introduce a certain error of their own. When a second sound card is used, its output distortion must be minimal; the device should have better characteristics than the card being tested. At the end of the test you can also determine parameters such as the card's frequency response, its noise level, its harmonic distortion, and so on.
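
The loopback idea itself is simple enough to sketch in Python. The example below assumes the sounddevice package and a card whose output is physically or virtually routed back to its input, and it ignores latency alignment and level calibration, so it is only a toy illustration of the principle, not a replacement for RightMark's measurements.

```python
# Play a 1 kHz tone, record it at the same time, and crudely estimate the deviation.
import numpy as np
import sounddevice as sd

FS = 48000                       # sample rate, Hz
DUR = 2.0                        # seconds
t = np.arange(int(FS * DUR)) / FS
tone = (0.5 * np.sin(2 * np.pi * 1000 * t)).astype(np.float32)

recorded = sd.playrec(tone, samplerate=FS, channels=1)
sd.wait()
recorded = recorded[:, 0]

# Crude signal-to-residue estimate, ignoring latency alignment and scaling.
residue = recorded - tone
snr_db = 10 * np.log10(np.mean(tone**2) / np.mean(residue**2))
print(f"rough SNR of the loopback path: {snr_db:.1f} dB")
```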

In addition to the basic functions available in the free edition, the more powerful version of RightMark Audio Analyzer 6.2.3 PRO also includes support for a professional ASIO interface, four times more detailed spectrum resolution and the ability to use direct Kernel Streaming data transfer.

It is important that no one interferes

When running any performance test, keep in mind that the final results are affected by many factors, especially the performance of background services and applications. Therefore, for the most accurate assessment of your PC, it is recommended to first disable the anti-virus scanner and close all running applications, right down to the email client. And, of course, to avoid errors in measurements, you should stop all work until the program completes testing the equipment.

The theory of constraints was formulated in the 1980s and concerned the management of manufacturing enterprises. In short, its essence is that every production system contains constraints that limit its efficiency. If you eliminate the key constraint, the system will perform much better than if you try to influence the whole system at once. Therefore, improving production should start with eliminating bottlenecks.

Today the term bottleneck is used in any industry: in services, software development, logistics, and everyday life.

What is bottleneck

A bottleneck is a point in a production system where congestion occurs because material arrives faster than it can be processed. It is often a station with less capacity than the preceding node. The term comes from the analogy with the narrow neck of a bottle, which slows the flow of liquid out of it.


Bottleneck: the narrow point in a production process

In manufacturing, the bottleneck effect causes downtime and production costs, reduces overall efficiency and increases delivery times to customers.

There are two types of bottlenecks:

  1. Short-term bottlenecks are caused by temporary problems. A good example is sick leave or vacation of key employees. No one on the team can fully replace them, and work stops. In production, this may be the breakdown of one machine in a group, when its load is redistributed among the remaining equipment.
  2. Long-term bottlenecks operate constantly. An example is the constant delay of monthly reports in a company because one person must process a huge amount of information that arrives in an avalanche at the very end of the month.

How to identify bottleneck in production process

There are several ways to search for bottlenecks in production, of varying complexity, with or without special tools. Let's start with the simpler methods, based on observation.

Queues and congestion

The process on a production line that has the largest queue of work-in-process units in front of it is usually a bottleneck. This bottleneck search method is suitable for piece-by-piece conveyor production, for example, on a bottling line. It is clearly visible where bottles accumulate in the line, and which mechanism has insufficient power, often breaks down, or is serviced by an inexperienced operator. If there are several congestion points on the line, then the situation is more complicated, and additional methods must be used to find the most critical bottleneck.

Bandwidth

The throughput of the entire production line directly depends on the output of the bottleneck equipment. This characteristic will help you find the main bottleneck of the production process. Increasing the output of a piece of equipment that is not a bottleneck will not significantly affect the overall output of the line. By checking all the equipment one by one, you can identify the bottleneck - that is, the step whose power increase will most affect the output of the entire process.

Full capacity

Most production lines track the utilization percentage of each piece of equipment. Machines and stations have a fixed capacity and are used at some percentage of it in the production process. The station that runs at the highest percentage of its capacity is the bottleneck. Such equipment constrains the utilization of the rest. If you increase the bottleneck's capacity, the capacity of the entire line rises.

Waiting

The production process also accounts for downtime and waiting time. When there is a bottleneck on the line, the equipment immediately downstream of it stands idle for long stretches: the bottleneck delays production, and the next machine does not receive enough material to run continuously. When you find a machine with a long waiting time, look for the bottleneck at the previous step.
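
The detection rules above can be illustrated with a toy calculation: given per-station cycle times (invented numbers), the slowest station sets the line's throughput and shows up as the one with the highest utilization.

```python
# Toy bottleneck detection: the station with the longest cycle time limits the line.
stations = {             # station -> seconds needed per unit (invented figures)
    "cutting": 25,
    "welding": 40,       # slowest step
    "painting": 30,
    "packing": 20,
}

slowest = max(stations.values())
line_throughput = 3600 / slowest                 # units per hour
print(f"line output limited to {line_throughput:.0f} units/hour by the bottleneck")

for name, cycle in stations.items():
    utilization = cycle / slowest                # share of time the station is busy
    tag = "  <- bottleneck" if cycle == slowest else ""
    print(f"{name:9s} utilization {utilization:5.0%}{tag}")
```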

In addition to monitoring production, the following tools are used to identify bottlenecks:

Value Stream Mapping, a map of the value streams

Once you understand the cause or causes of bottlenecks, you need to determine actions to expand the bottleneck and increase production. You may need to relocate employees to the problem area or hire additional staff and equipment.

Bottlenecks can occur where operators reconfigure equipment to produce a different product. In this case, you need to think about how to reduce downtime. For example, changing the production schedule to reduce the number of changeovers or reduce their impact.

How to reduce the impact of bottlenecks

Bottleneck Management suggests manufacturing companies take three approaches to reduce the impact of bottlenecks.

First approach

Increasing the capacity of existing bottlenecks.

There are several ways to increase the capacity of bottlenecks:

  1. Add resources to the limiting process. It is not necessary to hire new employees. Cross-functional staff training can reduce the impact of bottlenecks at little cost. In this case, workers will service several stations at once and facilitate the passage of bottlenecks.
  2. Ensure uninterrupted supply of parts to the bottleneck. Always keep an eye on the work-in-process before the bottleneck, manage the flow of resources to the bottleneck station, take into account overtime, during which the equipment must also always have parts to process.
  3. Make sure the bottleneck only works with quality parts. Don't waste power and bottleneck time on scrap processing. Place quality control points in front of bottleneck stations. This will increase the throughput of the process.
  4. Check the production schedule. If a process produces several different products that require different amounts of bottleneck time, adjust the schedule so that the overall demand on the bottleneck decreases.
  5. Increase the operating time of the limiting equipment. Let the bottleneck run longer than the other equipment. Assign an operator to keep it going during lunch breaks, scheduled downtime and, if necessary, overtime. Although this method will not reduce cycle time, it keeps the bottleneck working while the rest of the equipment is idle.
  6. Reduce downtime. Avoid planned and unplanned downtime. If the bottleneck equipment fails during operation, immediately send a repair crew to fix it and get it running again. Also try to reduce the time needed to change the equipment over from one product to another.
  7. Improve the process at the bottleneck. Use VSM to eliminate non-value-adding activities and reduce the time spent adding value while eliminating waste. In the end you will get a shorter cycle time.
  8. Redistribute the load on the bottleneck. If possible, split the operation into parts and assign them to other resources. The result is shorter cycle times and increased power.


Second approach

Sale of surplus production produced by non-bottleneck equipment.

For example, you have 20 injection presses on your line but use only 12 of them, because the bottleneck equipment cannot process the output of all 20. In this case you can find other companies interested in subcontracting injection-molding work. The arrangement will be profitable because you will receive more from the subcontracting than your variable costs.


Third approach

Reduce unused capacity.

The third option for optimizing production is to sell off the equipment with surplus capacity and reduce or reassign the personnel who operate it. The capacity of all the equipment will then be balanced.


Examples of bottleneck outside of production

Transport

A classic example is traffic jams, which can constantly form in certain places, or appear temporarily during an accident or road work. Other examples are a river lock, a forklift, a railway platform.

Computer networks

A slow WiFi router connected to an efficient, high-bandwidth network is a bottleneck.

Communication

A developer who spends six hours a day in meetings and only two hours writing code.

Software

Applications also have bottlenecks - these are code elements where the program “slows down”, forcing the user to wait.

Computer hardware

Computer bottlenecks are hardware limitations in which the performance of the whole system is held back by a single component. The processor is often viewed as the component limiting the graphics card.

Bureaucracy

In everyday life, we often encounter bottlenecks. For example, when forms for passports or driver's licenses suddenly run out and the entire system stops. Or when you need to undergo a medical examination, but the fluorography room is open only three hours a day.

Verdict

Bottlenecks in production, management and life are points of potential improvement.

Extending the bottleneck will provide a significant increase in productivity and efficiency.

And not paying attention to the limiting elements of the system means not making enough profit and working below your capabilities.



