Web resource security scanning. How to protect a web application: basic tips, tools, useful links. Proposed scientific novelty

Research has shown that more than 70% of scanned websites contain one or more vulnerabilities.

As a web application owner, how do you ensure that your site is protected from online threats? Or from leakage of confidential information?

If you use a cloud-based security solution, regular vulnerability scanning is likely part of your security plan.

However, if not, you need to perform routine scans and take the necessary actions to mitigate risks.

There are two types of scanners.

1. Commercial - gives you the ability to automate scanning for continuous security, with reporting, alerts, detailed risk-mitigation instructions, etc. Some of the well-known names in this industry are:

Acunetix
Detectify
Qualys

2. Open source/free - you can download and run security checks on demand.

Not all of them cover as wide a range of vulnerabilities as the commercial ones.

Let's take a look at the following open source vulnerability scanners.

1. Arachni

Arachni is a high-performance security scanner built on top of Ruby for modern web applications.

It is available in binary format for Mac, Windows and Linux.

It performs active and passive checks.

Not only is it a solution for a basic static or CMS website; Arachni is also capable of integrating with the following platforms:

Windows, Solaris, Linux, BSD, Unix
Nginx, Apache, Tomcat, IIS, Jetty
Java, Ruby, Python, ASP, PHP
Django, Rails, CherryPy, CakePHP, ASP.NET MVC, Symfony

Some of the vulnerabilities it can detect:

NoSQL / blind / SQL / code / LDAP / command / XPath injection
Cross-site request forgery
Path traversal
Local/remote file inclusion
HTTP response splitting
Cross-site scripting
Unvalidated DOM redirects
Source code disclosure

2. XssPy

XssPy is a Python-based XSS (cross-site scripting) vulnerability scanner used by many organizations, including Microsoft, Stanford, Motorola, and Informatica.

XssPy, written by Faizan Ahmad, is a smart tool: instead of checking only the home page or a single page, it checks every link on the website.

XssPy also checks subdomains.

3. w3af

w3af, an open source project started back in late 2006, is based on Python and is available for Linux and Windows. w3af can detect more than 200 vulnerabilities, including the OWASP Top 10.

It supports various logging methods for reporting, for example:

CSV
HTML
Console
Text
XML
Email

It is built on a plugin architecture and you can check out all the available plugins.

4. Nikto

An open source project sponsored by Netsparker, it aims to find web server misconfigurations, plugins, and vulnerabilities on the Internet.

5. Wfuzz

Wfuzz (Web Fuzzer) is an application assessment tool for penetration testing.

You can fuzz the data in any field of an HTTP request to exercise the web application and test how it handles the input.

Wfuzz requires Python on the computer you want to run the scan on.
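
As a rough illustration of the idea (a sketch using Python's requests library rather than Wfuzz itself; the target URL and payload list are made up for the example), fuzzing means substituting each payload from a wordlist into a chosen request field and watching how the responses differ:

```python
# Conceptual sketch of HTTP parameter fuzzing using the requests library.
# Wfuzz itself marks the injection point with the keyword FUZZ on its
# command line; here the fuzzed field is simply the GET parameter "q".
import requests

TARGET = "http://testsite.example/search.php"   # placeholder target
PAYLOADS = ["admin", "' OR '1'='1", "../../etc/passwd", "<script>alert(1)</script>"]

for payload in PAYLOADS:
    response = requests.get(TARGET, params={"q": payload}, timeout=10)
    # Differences in status code or response length often hint that the
    # input is handled in an interesting (possibly vulnerable) way.
    print(f"{payload!r:45} -> {response.status_code}, {len(response.text)} bytes")
```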

6. OWASP ZAP

ZAP (Zed Attack Proxy) is one of the most famous penetration testing tools and is actively maintained by hundreds of volunteers around the world.

It is a cross-platform Java tool that can even run on Raspberry Pi.

ZAP sits between the browser and the web application to intercept and inspect messages.

Some ZAP features worth mentioning:

Fuzzer
Automated and passive scanners
Supports multiple scripting languages
Forced browsing

7. Wapiti

Wapiti crawls the web pages of a given target and looks for scripts and data entry forms to see whether they are vulnerable.

This is not a source code audit but a black-box check.

It supports GET and POST HTTP methods, HTTP and HTTPS proxies, multiple authentications, etc.

8. Vega

Vega, developed by Subgraph, is multi-platform software written in Java that finds XSS, SQLi, RFI and many other vulnerabilities.

Vega has a convenient GUI and can perform automated scans by logging into the application with given credentials.

If you are a developer, you can use the Vega API to create new attack modules.

9. SQLmap

As you can guess from the name, with it you can perform penetration testing on a database to find flaws.

It works with Python 2.6 or 2.7 on any OS. If SQL injection testing is what you need, sqlmap will be more useful than ever.
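
A hedged example of driving sqlmap from a Python script via subprocess; -u (target URL), --batch (non-interactive mode) and --dbs (enumerate databases) are standard sqlmap options, while the URL itself is only a placeholder:

```python
# Minimal wrapper that shells out to sqlmap; assumes sqlmap is on PATH.
import subprocess

def run_sqlmap(target_url: str) -> str:
    cmd = ["sqlmap", "-u", target_url, "--batch", "--dbs"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout

if __name__ == "__main__":
    print(run_sqlmap("http://testsite.example/listproducts.php?cat=1"))
```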

10. Grabber

This little Python-based tool does a few things quite well.

Some of the features of Grabber:

JavaScript Source Code Analyzer
Cross-site scripting, SQL injection, blind SQL injection
Testing PHP Applications Using PHP-SAT

11. Golismero

A framework for managing and running popular security tools such as Wfuzz, DNSrecon, sqlmap, OpenVAS, robot analyzer, etc.

Golismero can consolidate the results from other tools and present them in a single report.

12. OWASP Xenotix XSS

OWASP Xenotix XSS is an advanced framework for finding and exploiting cross-site scripting.

It has three built-in intelligent fuzzers for fast scanning and improved results.

13. Metascan

A scanner for finding web application vulnerabilities, from domestic developers.

Author: Maksadkhan Yakubov, Bogdan Shklyarevsky.

This article discusses the problems of administering web resources, as well as methods, techniques, and recommendations for secure administration and protection against hacking and cyberattacks.

The first step in designing, creating or operating a secure website is to ensure that the server hosting it is as secure as possible.

The main component of any web server is the operating system. Ensuring its security is relatively simple: just install the latest security updates in a timely manner.

It is worth remembering that hackers also tend to automate their attacks, using malware that goes through one server after another looking for one where an update is out of date or has not been installed. Therefore, it is recommended to ensure that updates are installed promptly and correctly; any server running outdated versions of updates may be subject to attack.

You should also keep all software running on the web server up to date. Any software that is not a necessary component (for example, a DNS server, or remote administration tools such as VNC or Remote Desktop Services) should be disabled or removed. If remote administration tools are required, take care not to use default passwords or passwords that can be easily guessed. This applies not only to remote administration tools, but also to user accounts, routers, and switches.

The next important point is antivirus software. Its use is a mandatory requirement for any web resource, regardless of whether the platform is Windows or Unix. Combined with a flexible firewall, antivirus software is one of the most effective ways of protecting against cyberattacks. When a web server becomes the target of an attack, the attacker immediately tries to upload hacking tools or malware in order to exploit security vulnerabilities. Without high-quality antivirus software, a security breach can go undetected for a long time and lead to undesirable consequences.

The best option for protecting information resources is a multi-layered approach. On the front line are the firewall and the operating system; the antivirus behind them is ready to fill any gaps that arise.

Based on the operating system parameters and the web server functionality, the following general techniques for protecting against cyberattacks can be cited:

  • Do not install unnecessary components. Each component carries with it a separate threat; the more there are, the higher the total risk.
  • Keep your operating system and applications up to date with security updates.
  • Use antivirus software, enable automatic installation of its updates, and regularly check that they are installed correctly.

Some of these tasks may seem difficult, but remember that a single security hole is enough for an attack. The potential risks include theft of data and traffic, blacklisting of the server's IP address, damage to the organization's reputation, and website instability.

Vulnerabilities are usually graded into five criticality levels that determine the current state of a web resource (Table 1). Typically, attackers, depending on their goals and skills, try to gain a foothold on the hacked resource and disguise their presence.

A site hack cannot always be recognized by external signs (mobile redirects, spam links on pages, other people's banners, defacement, etc.). If the site is compromised, these external signs may be absent: the resource can operate normally, without interruptions, errors, or appearing on antivirus blacklists. But this does not mean the site is safe. The problem is that it is difficult to notice the fact of a hack and the upload of hacker scripts without a security audit, while the web shells, backdoors and other hacker tools can remain on the hosting for a long time without being used for their intended purpose. At some point, however, they begin to be actively exploited by an attacker, causing problems for the site owner: for spam or hosting phishing pages the site is blocked by the hoster (or part of its functionality is disabled), while redirects or viruses on the pages lead to bans by antiviruses and sanctions from search engines. In such a case, the site must urgently be "cleaned" and then protected against hacking so that the situation does not repeat itself.

Standard antiviruses often fail to recognize some types of Trojans and web shells; the reason may be untimely updates or outdated software. When checking a web resource for viruses and scripts, you should use antivirus programs of different specializations: a Trojan missed by one antivirus program can be detected by another. Figure 1 shows an example of an antivirus software scan report; it is worth noting that the other antivirus programs were unable to detect the malware.

Trojans such as "PHP/Phishing.Agent.B", "Linux/Roopre.E.Gen", and "PHP/Kryptik.AE" are used by attackers to remotely control a computer. Such programs often reach a website through email, free software, other websites, or chat rooms. Most of the time, such a program masquerades as a useful file, but it is actually a malicious Trojan that collects users' personal information and transfers it to attackers. In addition, it can automatically connect to certain websites and download other types of malware onto the system. To avoid detection and removal, "Linux/Roopre.E.Gen" may disable security features. This Trojan is developed using rootkit technology, which allows it to hide inside the system.

  • "PHP/WebShell.NCL" is a Trojan horse program that can perform various functions, such as deleting system files, loading malware, hide existing components or downloaded personal information and other data. This program can bypass general anti-virus scanning and enter the system without the user's knowledge. This program is capable of installing a backdoor for remote users to take control of an infected website. Using this program, an attacker can spy on a user, manage files, install additional software, and control the entire system.
  • "JS/TrojanDownloader.FakejQuery. A" - a Trojan program, the main targets of which are sites developed using the CMS “WordPress” and “Joomla”. When an attacker hacks a website, they run a script that simulates the installation of WordPress or Joomla plugins and then injects malicious JavaScript code into the header.php file.
  • "PHP/small.NBK" - is a malicious application that allows hackers to gain remote access to computer system, allowing them to modify files, steal personal information, and install more malicious software. These types of threats, called Trojan Horses, are usually downloaded by an attacker or downloaded by another program. They may also appear due to the installation of infected applications or online games, as well as when visiting infected sites.

Unfortunately, hacker scripts are not detected by external signs or by external scanners. Therefore, neither search engine antiviruses nor antivirus software installed on the webmaster’s computer will report site security problems. If scripts are located somewhere in the site’s system directories (not in the root or images) or are injected into existing scripts, they will also not be accidentally noticed.

Figure 1. Example of an antivirus software scan report

Therefore, the following recommendations may be necessary measures to protect web resources:

  1. Regularly back up all file system content, databases, and event logs (log files).
  2. Regularly update the content management system (CMS) to the latest stable version.
  3. Use complex passwords. Password requirements: at least eight characters, including upper- and lower-case letters and special characters.
  4. Use security add-ons or plugins to prevent attacks such as XSS or SQL injection.
  5. Install add-ons (plugins, templates, or extensions) only from trusted sources or official developer websites.
  6. Scan the file system at least once a week with antivirus programs, using up-to-date signature databases.
  7. Use a CAPTCHA mechanism to protect the website against password brute-forcing during authorization and against automated submission of any request form (feedback form, search, etc.).
  8. Block access to the website's administrative control panel after a certain number of unsuccessful login attempts.
  9. Correctly configure the website security policy through the web server configuration file, taking into account parameters such as:
  • limit the number of IP addresses used by the administrator to access the administrative control panel of the website in order to prevent access to it from unauthorized IP addresses;
  • disallow the transmission of any HTML tags other than text-formatting ones (e.g. p, b, i, u) to prevent XSS attacks (a minimal sketch of such a filter follows this list).
  10. Move files containing information about database access, FTP access, etc. from their default directories to other locations, and then rename these files.
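
As a minimal sketch of recommendation 9 (the tag whitelist), the filter below strips every HTML tag except p, b, i and u and escapes the remaining text. It is only an illustration built on Python's standard html.parser; a production site would rather use a maintained sanitization library.

```python
# Illustrative whitelist filter: keeps only simple text-formatting tags
# (p, b, i, u), drops all attributes, and escapes everything else.
from html.parser import HTMLParser
from html import escape

ALLOWED_TAGS = {"p", "b", "i", "u"}

class TagWhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:          # attributes (onclick=, href=, ...) are dropped
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))    # plain text is HTML-escaped

def sanitize(html_text: str) -> str:
    f = TagWhitelistFilter()
    f.feed(html_text)
    return "".join(f.out)

print(sanitize('<p onclick="x()">Hello <script>alert(1)</script><b>world</b></p>'))
# expected: <p>Hello alert(1)<b>world</b></p>
```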

Even a relatively inexperienced hacker can quite easily hack a Joomla website if no protection is provided. Unfortunately, webmasters often put off protecting their site from hacking, considering it a non-essential matter. Restoring access to a site takes much more time and effort than taking measures to protect it. The security of a web resource is the task not only of the developer and the hoster, who must ensure maximum security of the servers, but also of the site administrator.

Introduction

Web technologies have gained enormous popularity in modern business. Most large companies' sites are sets of applications that offer interactivity, personalization tools, and means of interacting with customers (online stores, remote banking services), and often integration with the company's internal corporate applications.

However, once a website becomes available on the Internet, it becomes a target for cyberattacks. The simplest way to attack a website today is to exploit vulnerabilities in its components. And the main problem is that vulnerabilities have become quite common on modern websites.

Vulnerabilities are an imminent and growing threat. They are, for the most part, the result of security defects in the web application code and misconfiguration of website components.

Here are some statistics. According to the High-Tech Bridge report on web security trends for the first half of 2016:

  • over 60% of web services or APIs for mobile applications contain at least one dangerous vulnerability that allows the database to be compromised;
  • 35% of sites vulnerable to XSS attacks are also vulnerable to SQL injections and XXE attacks;
  • 23% of sites contain the POODLE vulnerability, and only 0.43% contain Heartbleed;
  • cases of exploitation of dangerous vulnerabilities (for example, allowing SQL injection) during RansomWeb attacks have increased 5 times;
  • 79.9% of web servers have misconfigured or insecure HTTP headers;
  • the currently required updates and fixes are installed on only 27.8% of web servers.

To protect web resources, information security specialists use a variety of tools. For example, SSL certificates are used to encrypt traffic, and a Web Application Firewall (WAF) is installed on the perimeter of the web servers, which requires serious configuration and a long self-learning period. An equally effective means of ensuring website security is periodic checking of the security status (searching for vulnerabilities), and the tools for carrying out such checks are website security scanners, which are the subject of this review.

Our website already published a review dedicated to web application security scanners, "", which covered products from market leaders. In this review we will not revisit those topics, but will focus on free website security scanners.

The topic of free software is especially relevant today. Due to the unstable economic situation in Russia, many organizations (both commercial and public sector) are currently optimizing their IT budgets, and there is often not enough money to purchase expensive commercial products for security analysis. At the same time, there are many free and open-source utilities for finding vulnerabilities that people simply do not know about, and some of them are not inferior in functionality to their paid competitors. Therefore, in this article we will talk about the most interesting free website security scanners.

What are website security scanners?

Website security scanners are software (or hardware-software) tools that search for web application defects (vulnerabilities) that lead to violation of the integrity of system or user data, its theft, or gaining control over the system as a whole.

Using website security scanners, you can detect vulnerabilities in the following categories:

  • coding stage vulnerabilities;
  • vulnerabilities in the implementation and configuration phase of a web application;
  • vulnerabilities of the website operation stage.

Vulnerabilities at the coding stage include vulnerabilities associated with incorrect processing of input and output data (SQL injections, XSS).

Vulnerabilities at the website implementation stage include vulnerabilities associated with incorrect settings of the web application environment (web server, application server, SSL/TLS, framework, third-party components, presence of DEBUG mode, etc.).

Vulnerabilities at the website operation stage include those associated with the use of outdated software, simple passwords, storing archived copies in publicly accessible locations on the web server, publicly accessible service modules (phpinfo), etc.

How website security scanners work

In general, the operating principle of a website security scanner is as follows (a minimal sketch follows the list):

  • Collection of information about the object under study.
  • Audit of website software for vulnerabilities using vulnerability databases.
  • Identifying system weaknesses.
  • Formation of recommendations for their elimination.
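
A minimal sketch of this workflow, using the requests library and a made-up fingerprint database purely for illustration (real scanners rely on large, regularly updated vulnerability databases):

```python
# Illustrative scanner workflow: information gathering -> check against a
# "vulnerability database" -> identify weaknesses -> report findings.
import requests  # assumes the requests library is installed

KNOWN_VULNERABLE = {            # hypothetical fingerprint database, not real CVE data
    "Apache/2.2": "End-of-life branch; multiple known issues",
    "PHP/5.4": "Unsupported PHP version",
}

def scan(url: str) -> list:
    findings = []
    resp = requests.get(url, timeout=10)

    # 1. Collect information about the target (banner grabbing).
    banner = " ".join(resp.headers.get(h, "") for h in ("Server", "X-Powered-By"))

    # 2. Audit the software versions against the database.
    for fingerprint, note in KNOWN_VULNERABLE.items():
        if fingerprint in banner:
            findings.append(f"{fingerprint}: {note}")

    # 3. Identify other weaknesses, e.g. missing security headers.
    for header in ("Content-Security-Policy", "X-Frame-Options"):
        if header not in resp.headers:
            findings.append(f"Missing security header: {header}")

    # 4. The findings double as remediation hints in a real report.
    return findings

if __name__ == "__main__":
    for finding in scan("https://example.com"):
        print(finding)
```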

Categories of website security scanners

Website security scanners, depending on their purpose, can be divided into the following categories (types):

  • Network scanners - this type of scanner identifies available network services, determines their versions, determines the OS, etc.
  • Scanners for vulnerabilities in web scripts - this type of scanner searches for vulnerabilities such as SQL injection, XSS, LFI/RFI, etc., and for errors (undeleted temporary files, directory indexing, etc.).
  • Exploit finders - this type of scanner is designed for automated searching for exploits in software and scripts.
  • Injection automation tools - utilities that deal specifically with finding and exploiting injections.
  • Debuggers - tools for fixing errors and optimizing code in a web application.

There are also universal utilities that include the capabilities of several categories of scanners at once.

Below is a brief overview of free website security scanners. Since there are a lot of free utilities, only the most popular free tools for analyzing the security of web technologies are included. When deciding whether to include a particular utility, specialized resources on web technology security were consulted.

A Brief Review of Free Website Security Scanners

Network scanners

Nmap

Scanner type: network scanner.

Nmap (Network Mapper) is a free and open source utility. It is designed to scan networks with any number of objects and to determine the state of the objects on the scanned network, as well as the ports and their corresponding services. To do this, Nmap uses many different scanning methods, such as UDP, TCP connect, TCP SYN (half-open), FTP proxy (FTP bounce), reverse-ident, ICMP (ping), FIN, ACK, Xmas tree, SYN and NULL scanning.

Nmap also supports a wide range of additional features, namely: determining the operating system of a remote host using TCP/IP stack fingerprints, “invisible” scanning, dynamic calculation of latency and packet retransmission, parallel scanning, identifying inactive hosts using parallel ping polling, scanning using false hosts, detecting the presence of packet filters, direct (without using a portmapper) RPC scanning, scanning using IP fragmentation, as well as arbitrary indication of IP addresses and port numbers of scanned networks.
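
A hedged example of launching Nmap from Python via subprocess: -sS (TCP SYN scan), -O (OS detection) and -p (port range) are standard Nmap options; the SYN scan and OS detection require root privileges, and scanme.nmap.org is Nmap's official test host:

```python
# Thin wrapper around the nmap binary; assumes nmap is installed and on PATH.
import subprocess

def run_nmap(target: str, ports: str = "1-1024") -> str:
    cmd = ["nmap", "-sS", "-O", "-p", ports, target]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    print(run_nmap("scanme.nmap.org"))
```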

Nmap has received Security Product of the Year status from magazines and communities such as Linux Journal, Info World, LinuxQuestions.Org and Codetalker Digest.

Platform: The utility is cross-platform.

You can find out more about the Nmap scanner.

IP Tools

Scanner type: network scanner.

IP Tools is a protocol analyzer that supports filtering rules, adapter filtering, packet decoding, protocol descriptions, and much more. Detailed information about each packet is presented in a tree-style view, and a right-click menu allows you to scan the selected IP address.

In addition to the packet sniffer, IP Tools offers a complete set of network tools, including adapter statistics, IP traffic monitoring, and much more.

You can find out more about the IP-Tools scanner.

Skipfish

Cross-platform web vulnerability scanner Skipfish from programmer Michal Zalewski performs a recursive analysis of a web application and its dictionary-based check, after which it creates a site map annotated with comments about the detected vulnerabilities.

The tool is being developed internally by Google.

The scanner performs a detailed analysis of the web application. It is also possible to create a dictionary for subsequent testing of the same application. Skipfish's detailed report contains information about detected vulnerabilities, the URL of the resource containing the vulnerability, and the request sent. In the report, the obtained data is sorted by severity level and vulnerability type. The report is generated in html format.

It is worth noting that the Skipfish web vulnerability scanner generates a very large amount of traffic, and scanning takes a very long time.

Platforms: MacOS, Linux, Windows.

You can find out more about the Skipfish scanner.

Wapiti

Scanner type: scanner for searching for vulnerabilities in web scripts.

Wapiti is a console utility for auditing web applications. It works on the “black box” principle.

Wapiti works as follows: first the scanner analyzes the site structure, searches for available scripts, and analyzes parameters. It then turns on its fuzzer and continues scanning until all vulnerable scripts are found.

Wapiti works with the following types of vulnerabilities:

  • File disclosure (Local and remote include/require, fopen, readfile).
  • Database Injection (PHP/JSP/ASP SQL Injections and XPath Injections).
  • XSS (Cross Site Scripting) injection (reflected and permanent).
  • Command Execution detection (eval(), system(), passthru()…).
  • CRLF Injection (HTTP Response Splitting, session fixation...).
  • XXE (XML external entity) injection.
  • Use of known potentially dangerous files.
  • Weak .htaccess configurations that can be bypassed.
  • Presence of backup files giving sensitive information (source code disclosure).

Wapiti is included in the utilities of the Kali Linux distribution. You can download the sources from SourceForge and use them on any distribution based on the Linux kernel. Wapiti supports GET and POST HTTP request methods.

Platforms: Windows, Unix, MacOS.

You can find out more about the Wapiti scanner.

Nessus

The Nessus scanner is a powerful and reliable tool that belongs to the family of network scanners and allows you to search for vulnerabilities in network services offered by operating systems, firewalls, filtering routers, and other network components. To search for vulnerabilities, it uses both standard means of testing and collecting information about the configuration and operation of the network, and special means that emulate an attacker's actions to penetrate systems connected to the network.

You can find out more about the Nessus scanner.

bsqlbf-v2

Scanner type: injection automation tool.

bsqlbf-v2 is a script written in Perl: a brute forcer for blind SQL injection. The scanner works with both integer and string values in the URL.

Supported DBMS: MS SQL, MySQL, PostgreSQL, Oracle.

You can find out more about the bsqlbf-v2 scanner.

Debuggers

Burp Suite

Scanner type: debugger.

Burp Suite is a collection of relatively independent, cross-platform applications written in Java.

The core of the complex is the Burp Proxy module, which performs the functions of a local proxy server; the remaining components of the set are Spider, Intruder, Repeater, Sequencer, Decoder and Comparer. All components are interconnected into a single whole in such a way that data can be sent to any part of the application, for example, from Proxy to Intruder to conduct various checks on the web application, from Intruder to Repeater for more thorough manual analysis of HTTP headers.

Platforms: cross-platform software.

You can find out more about the Burp Suite scanner.

Fiddler

Scanner type: debugger.

Fiddler is a debug proxy that logs all HTTP(S) traffic. The tool allows you to examine this traffic, set a breakpoint and “play” with incoming or outgoing data.

Functional features of Fiddler:

  • Ability to control all requests, cookies, parameters transmitted by Internet browsers.
  • Function for changing server responses on the fly.
  • Ability to manipulate headers and requests.
  • Function for simulating limited channel bandwidth.

Platforms: cross-platform software.

You can find out more about the Fiddler scanner.

N-Stalker Web Application Security Scanner X Free Edition

Scanner type: scanner for searching for vulnerabilities in web scripts, exploit search tool.

An effective tool for web services is N-Stealth Security Scanner from N-Stalker. The company sells a more fully featured version of N-Stealth, but its free trial version is quite suitable for a simple evaluation. The paid product includes more than 30 thousand web server security tests, but even the free version detects more than 16 thousand specific flaws, including vulnerabilities in such widely used web servers as Microsoft IIS and Apache. For example, N-Stealth looks for vulnerable Common Gateway Interface (CGI) and Hypertext Preprocessor (PHP) scripts and uses attacks against SQL Server, typical cross-site scripting scenarios, and other gaps in popular web servers.

N-Stealth supports both HTTP and HTTP Secure (HTTPS - using SSL), matches vulnerabilities against the Common Vulnerabilities and Exposures (CVE) dictionary and the Bugtraq database, and generates decent reports. N-Stealth is used to find the most common vulnerabilities in web servers and helps identify the most likely attack vectors.

Of course, for a more reliable assessment of the security of a website or applications, it is recommended to purchase a paid version.

You can find out more about the N-Stealth scanner.

Conclusions

Testing websites to identify vulnerabilities is a good preventative measure. Currently, there are many commercial and freely available website security scanners. At the same time, scanners can be both universal (comprehensive solutions) and specialized, designed only to identify certain types of vulnerabilities.

Some free scanners are quite powerful tools and show great depth and good quality website checks. But before using free utilities to analyze the security of websites, you need to make sure of their quality. Today there are already many methods for this (for example, Web Application Security Scanner Evaluation Criteria, OWASP Web Application Scanner Specification Project).

Only comprehensive solutions can provide the most complete picture of the security of a particular infrastructure. In some cases, it is better to use several security scanners.

1. Goal and objectives

The purpose of the work is to develop algorithms for increasing the security of access to external information resources from corporate educational networks, taking into account their characteristic security threats, as well as the characteristics of the user population, security policies, architectural solutions, and resource support.

Based on the goal, the following tasks are solved in the work:

1. Perform an analysis of the main threats to information security in educational networks.

2. Develop a method for limiting access to unwanted information resources in educational networks.

3. Develop algorithms for scanning web pages, searching for direct links, and downloading files for further analysis of potentially malicious code on sites.

4. Develop an algorithm for identifying unwanted information resources on websites.

2. Relevance of the topic

Modern intelligent training systems are Web-based and provide their users with the ability to work with various types of local and remote educational resources. The problem of safe use of information resources (IR) posted on the Internet is becoming ever more relevant. One of the methods used to solve this problem is restricting access to unwanted information resources.

Operators providing Internet access to educational institutions are required to ensure that access to unwanted information is limited. The restriction is carried out by filtering by operators using lists that are regularly updated in accordance with the established procedure. However, given the purpose and user audience of educational networks, it is advisable to use a more flexible, self-learning system that will dynamically recognize unwanted resources and protect users from them.

In general, access to unwanted resources carries the following threats: propaganda of illegal and antisocial actions (political extremism, terrorism, drug addiction, distribution of pornography and other materials); distraction of students from using computer networks for educational purposes; and difficulty accessing the Internet due to overload of external channels with limited bandwidth. The resources listed above are also often used to spread malware, with the associated threats.

Existing systems for restricting access to network resources have the ability to check not only individual packets for compliance with specified restrictions, but also their content - content transmitted through the network. Currently, content filtering systems use the following methods for filtering web content: by DNS name or specific IP address, by keywords within web content and by file type. To block access to a specific Web site or group of sites, you must specify multiple URLs that contain inappropriate content. URL filtering provides thorough control over network security. However, it is impossible to predict in advance all possible inappropriate URLs. In addition, some web sites with dubious content do not work with URLs, but exclusively with IP addresses.

One way to solve the problem is to filter content received via the HTTP protocol. The disadvantage of existing content filtering systems is the use of statically generated access control lists. To fill them, developers of commercial content filtering systems hire employees who divide content into categories and rank records in the database.

To eliminate the shortcomings of existing content filtering systems for educational networks, it is relevant to develop web traffic filtering systems with dynamic determination of the category of a web resource based on the content of its pages.

3. Proposed scientific novelty

An algorithm for restricting access of users of intelligent learning systems to unwanted resources on Internet sites, based on the dynamic formation of access lists to information resources through their delayed classification.

4. Planned practical results

The developed algorithms can be used in systems for restricting access to unwanted resources in intelligent computer learning systems.

5. Review of research and development

5.1 Overview of research and development on the topic at the global level

The problems of ensuring information security are addressed in the work of such well-known scientists as N.N. Bezrukov, P.D. Zegzda, A.M. Ivashko, A.I. Kostogryzov, V.I. Kurbatov, K. Lendver, D. McLean, A.A. Moldovyan, N.A. Moldovyan, A.A. Malyuk, E.A. Derbin, R. Sandhu, J.M. Carroll, and others. At the same time, despite the overwhelming volume of text sources in corporate and open networks, research in the field of information security methods and systems aimed at analyzing security threats and at restricting access to unwanted resources in computer-based training with Web access remains limited.

In Ukraine, the leading researcher in this area is V.V. Domarev. His dissertation research is devoted to the problems of creating complex information security systems. He is the author of the books "Security of Information Technologies. Methodology for Creating Protection Systems" and "Security of Information Technologies. A Systematic Approach", among others, and of more than 40 scientific articles and publications.

5.2 Review of research and development on the topic at the national level

At Donetsk National Technical University, S.S. Khimka worked on the development of models and methods for creating an information security system for an enterprise's corporate network, taking into account various criteria. The protection of information in educational systems was studied by Yu.S.

6. Problems of restricting access to web resources in educational systems

The development of information technology currently allows us to speak of two aspects of describing resources: Internet content and the access infrastructure. The access infrastructure is usually understood as the set of hardware and software that provides data transmission in IP packet format, while content is defined as a combination of the form of presentation (for example, a sequence of characters in a certain encoding) and the content (semantics) of the information. Among the characteristic properties of such a description, the following should be highlighted:

1. independence of content from the access infrastructure;

2. continuous qualitative and quantitative changes in content;

3. the emergence of new interactive information resources (“live journals”, social media, free encyclopedias, etc.), in which users directly participate in the creation of online content.

When solving problems of access control to information resources, developing security policies is of great importance; these are resolved in relation to the characteristics of the infrastructure and of the network content. The higher the level of description of the information security model, the more access control is oriented toward the semantics of network resources. Obviously, the MAC and IP addresses (link and network layer) of network device interfaces cannot be tied to any category of data, since the same address can represent different services. Port numbers (transport layer) generally give an idea of the type of service, but do not qualitatively characterize the information provided by that service. For example, it is not possible to assign a particular Web site to one of the semantic categories (media, business, entertainment, etc.) based solely on transport layer information.

Ensuring information security at the application level comes close to the concept of content filtering, i.e. access control that takes the semantics of network resources into account. Consequently, the more content-oriented an access control system is, the more differentiated an approach it can implement with respect to different categories of users and information resources. In particular, a semantically oriented control system can effectively restrict the access of students in educational institutions to resources that are incompatible with the learning process.

Possible options for the process of obtaining a web resource are presented in Fig. 1.

Figure 1 - The process of obtaining a web resource via the HTTP protocol

To ensure flexible control over the use of Internet resources, it is necessary to introduce an appropriate policy for the use of resources by an educational organization in the operator company. This policy can be implemented either manually or automatically. “Manual” implementation means that the company has a special staff who monitor the activity of educational institution users in real time or using logs from routers, proxy servers or firewalls. Such monitoring is problematic because it requires a lot of labor. To provide flexible control over the use of Internet resources, the company must provide the administrator with a tool to implement the organization's resource use policy. Content filtering serves this purpose. Its essence lies in the decomposition of information exchange objects into components, analysis of the contents of these components, determining the compliance of their parameters with the accepted policy for the use of Internet resources and taking certain actions based on the results of such analysis. In the case of filtering web traffic, the objects of information exchange are understood to be web requests, the contents of web pages, and files transferred upon user request.

Users of the educational organization gain access to the Internet exclusively through a proxy server. Every time you try to gain access to a particular resource, the proxy server checks whether the resource is included in a special database. If such a resource is placed in the database of prohibited resources, access to it is blocked, and the user is given a corresponding message on the screen.

If the requested resource is not in the database of prohibited resources, then access to it is granted, but a record of visiting this resource is recorded in a special service log. Once a day (or at other intervals), the proxy server generates a list of the most visited resources (in the form of a list of URLs) and sends it to experts. Experts (system administrators), using the appropriate methodology, check the resulting list of resources and determine their nature. If the resource is of a non-target nature, the expert classifies it (porn resource, gaming resource) and makes a change to the database. After making all the necessary changes, the updated version of the database is automatically sent to all proxy servers connected to the system. The filtering scheme for non-target resources on proxy servers is shown in Fig. 2.
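
A simplified sketch of this proxy-side workflow (the data structures and names are assumptions made for illustration, not part of any particular product):

```python
# Block resources found in the prohibited-resources database, log everything
# else, and periodically report the most visited URLs for expert review.
from collections import Counter
from typing import Optional

prohibited = {"http://blocked.example/"}   # database of prohibited resources
visit_log = Counter()                      # service log of visited URLs

def check_request(url: str) -> bool:
    """Called by the proxy for every user request; True means access is allowed."""
    if url in prohibited:
        return False                       # blocked; the user sees a message
    visit_log[url] += 1                    # allowed, but recorded in the log
    return True

def daily_report(top_n: int = 100) -> list:
    """Once a day: the most visited resources (URLs), sent to the experts."""
    return [url for url, _ in visit_log.most_common(top_n)]

def apply_expert_verdict(url: str, category: Optional[str]) -> None:
    """Experts classify non-target resources (e.g. "porn resource", "gaming
    resource"); the database is then updated and pushed to all proxy servers
    (distribution is not shown here)."""
    if category is not None:
        prohibited.add(url)
```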

Figure 2 - Basic principles of filtering non-target resources on proxy servers

The problems with filtering non-target resources on proxy servers are as follows. Centralized filtering requires high-performance equipment and high-throughput communication channels at the central node, and failure of the central node leads to failure of the entire filtering system.

With decentralized filtering "in the field", directly on the organization's workstations or servers, the cost of deployment and support is high.

When filtering by address at the stage of sending a request, there is no preventive reaction to unwanted content, and "masked" websites are difficult to filter.

When filtering by content, large amounts of information must be processed when receiving each resource, and resources built with technologies such as Java or Flash are complex to process.

7. Information security of web resources for users of intelligent learning systems

Let's consider the possibility of controlling access to information resources using a common solution based on the hierarchical principle of integrating access control tools to Internet resources (Fig. 3). Restricting access to unwanted IR from IOS can be achieved through a combination of technologies such as firewalling, the use of proxy servers, analysis of anomalous activity to detect intrusions, bandwidth limitation, filtering based on content analysis, filtering based on access lists. In this case, one of the key tasks is the formation and use of up-to-date access restriction lists.

Filtering of unwanted resources is carried out in accordance with current regulatory documents based on lists published in accordance with the established procedure. Restriction of access to other information resources is carried out on the basis of special criteria developed by the operator of the educational network.

User access below the specified frequency, even to a potentially unwanted resource, is acceptable. Only in-demand resources are subject to analysis and classification, that is, those for which the number of user requests has exceeded a specified threshold. Scanning and analysis are carried out some time after the number of requests exceeds the threshold value (during the period of minimal load on external channels).

It is not just single web pages that are scanned, but all resources associated with them (by analyzing the links on the page). As a result, this approach allows you to determine the presence of links to malware during resource scanning.

Figure 3 - Hierarchy of access control tools to Internet resources


Automated classification of resources is carried out on the corporate server of the client that owns the system. The classification time is determined by the method used, which is based on the concept of deferred resource classification. It assumes that user access below a specified frequency, even to a potentially unwanted resource, is acceptable; this avoids costly on-the-fly classification. Only in-demand resources are subject to analysis and automated classification, that is, resources for which the frequency of user requests has exceeded a specified threshold. Scanning and analysis are carried out some time after the number of requests exceeds the threshold value (during the period of minimal load on external channels).

The method implements a scheme for dynamically building three lists: "black" (ChSP), "white" (BSP) and "grey" (GSP). Access to resources on the black list is prohibited. The white list contains verified, allowed resources. The grey list contains resources that have been requested by users at least once but have not yet been classified. The initial formation and subsequent "manual" adjustment of the black list are based on official information about the addresses of prohibited resources provided by the authorized government body. The initial content of the white list consists of resources recommended for use. Any request for a resource not on the black list is granted; if the resource is not on the white list, it is placed on the grey list, where the number of requests to it is recorded. If the frequency of requests exceeds a certain threshold value, the resource is classified automatically, and on the basis of the result it is included in the black or white list.
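
A simplified sketch of the deferred-classification scheme with three lists (the structures, names, and threshold value are illustrative assumptions, not the original implementation):

```python
REQUEST_THRESHOLD = 100          # requests before a grey-listed resource is queued

blacklist = {"http://banned.example/"}   # prohibited resources (official lists + classification results)
whitelist = {"http://edu.example/"}      # verified allowed resources
greylist = {}                            # url -> number of requests so far
pending = set()                          # in-demand resources awaiting classification

def handle_request(url: str) -> bool:
    """Return True if access is allowed, False if it is blocked."""
    if url in blacklist:
        return False                     # black-listed: access denied
    if url in whitelist:
        return True
    # Grey list: access is allowed, but requests are counted.
    greylist[url] = greylist.get(url, 0) + 1
    if greylist[url] > REQUEST_THRESHOLD:
        pending.add(url)                 # classify later, during low channel load
    return True

def classify_pending(classify) -> None:
    """Run during periods of minimal load; `classify` is the automated content
    classifier (for example, a naive Bayes model as sketched in section 8)."""
    for url in list(pending):
        target = blacklist if classify(url) == "undesirable" else whitelist
        target.add(url)
        greylist.pop(url, None)
        pending.discard(url)
```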

8. Algorithms for determining the information security of web resources for users of intelligent training systems

Access restriction algorithm. Restriction of access to unwanted resources on Internet sites is based on the following definition of the risk of access to an unwanted IR in the IOS. The risk of access to the unwanted i-th IR, assigned to the k-th IR class, is a value proportional to the expert assessment of the damage that an unwanted IR of this type causes to the IOS or to the user, and to the number of accesses to this resource over a given period of time:

R_i = c_k · N_i(t), where c_k is the expert assessment of the damage from an IR of class k and N_i(t) is the number of accesses to the i-th resource over the period t.

By analogy with the classical definition of risk as the product of the probability of a threat being realized and the cost of the damage caused, this definition interprets risk as the mathematical expectation of the amount of possible damage from access to an unwanted IR. In this case, the amount of expected damage is determined by the degree of impact of the IR on the personalities of users, which in turn is directly proportional to the number of users who experienced this impact.

When analyzing any web resource from the point of view of the desirability or undesirability of access to it, the following main components of each of its pages must be considered: the content, that is, the text and other information (graphics, photos, video) posted on the page; content posted on other pages of the same website (internal links can be obtained from the content of loaded pages using regular expressions); and links to other sites (both in terms of possible downloads of viruses and Trojans and in terms of the presence of unwanted content). An algorithm for restricting access to unwanted resources using lists is shown in Fig. 4.

Figure 4 - Algorithm for restricting access to unwanted resources

Algorithm for identifying unwanted Web pages. To classify content (web page texts), the following problems must be solved: specifying the classification categories; extracting information from source texts that can be analyzed automatically; creating collections of classified texts; and building and training a classifier that works with the resulting data sets.

The training set of classified texts is analyzed, identifying terms - the most frequently used word forms in general and for each classification category separately. Each source text is represented as a vector, the components of which are the characteristics of the occurrence of a given term in the text. In order to avoid vector sparsity and reduce their dimension, it is advisable to reduce word forms to their initial form using morphological analysis methods. After this, the vector should be normalized, which allows us to achieve a more correct classification result. For one web page, two vectors can be generated: for the information displayed to the user, and for the text provided to search engines.

There are various approaches to constructing web page classifiers. The most commonly used are: the Bayesian classifier; neural networks; linear classifiers; and the support vector machine (SVM). All of the above methods require training on a training collection and testing on a test collection. For binary classification, one can choose the naive Bayes approach, which assumes that the characteristics in the vector space are independent of each other. We will assume that all resources must be classified as desirable or undesirable. The entire collection of web page text samples is then divided into two classes, C = {C1, C2}, with prior probability P(Ci), i = 1, 2, for each class. With a sufficiently large collection of samples, we can assume that P(Ci) is equal to the ratio of the number of samples of class Ci to the total number of samples. For some sample D to be classified, using the conditional probability P(D|Ci) and Bayes' theorem, the value P(Ci|D) can be obtained:

P(Ci|D) = P(D|Ci) · P(Ci) / P(D);

taking into account that P(D) is constant, we obtain:

P(Ci|D) ∝ P(D|Ci) · P(Ci).

Assuming that the terms t1, …, tn of sample D are independent of each other in the vector space, we obtain the following relation:

P(D|Ci) = P(t1|Ci) · P(t2|Ci) · … · P(tn|Ci).

In order to more accurately classify texts whose characteristics are similar (for example, to distinguish between pornography and fiction that describes erotic scenes), weighting coefficients should be introduced:

If kn = k; if kn is less than k, then kn = 1/|k|. Here M is the frequency of all terms in the sample database, and L is the number of all samples.
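
A minimal sketch of the classification step described above, using scikit-learn (the library choice and the toy training texts are assumptions made for this example; a real system would train on a large labelled collection and lemmatize word forms first):

```python
# Web page texts are turned into term-count vectors and classified with a
# naive Bayes model into "desirable" and "undesirable" classes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training collection standing in for the classified text collections.
train_texts = [
    "lecture notes on linear algebra and calculus",
    "free online casino slots win money now",
    "university course schedule and exam results",
    "adult content explicit photos click here",
]
train_labels = ["desirable", "undesirable", "desirable", "undesirable"]

vectorizer = CountVectorizer()           # builds the term vector space
X_train = vectorizer.fit_transform(train_texts)

classifier = MultinomialNB()             # naive Bayes: terms assumed independent
classifier.fit(X_train, train_labels)

def classify_page(text: str) -> str:
    """Return the predicted class for the text of one web page."""
    return classifier.predict(vectorizer.transform([text]))[0]

print(classify_page("casino bonus win real money"))          # expected: undesirable
print(classify_page("syllabus for the programming course"))  # expected: desirable
```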

9. Directions for improving algorithms

In the future, it is planned to develop an algorithm for analyzing links in order to detect the introduction of malicious code into the code of a web page and compare the Bayesian classifier with the support vector machine.

10. Conclusions

An analysis of the problem of restricting access to web resources in educational systems has been carried out. The basic principles of filtering non-target resources on proxy servers were selected based on the formation and use of current access restriction lists. An algorithm has been developed for restricting access to unwanted resources using lists, which makes it possible to dynamically generate and update access lists to IR based on an analysis of their content, taking into account the frequency of visits and user population. To identify unwanted content, an algorithm based on a naive Bayes classifier has been developed.

List of sources

  1. Zima V. M. Security of Global Network Technologies / V. Zima, A. Moldovyan, N. Moldovyan. 2nd ed. St. Petersburg: BHV-Petersburg, 2003. 362 p.
  2. Vorotnitsky Yu. I. Protection from Access to Unwanted External Information Resources in Scientific and Educational Computer Networks / Yu. I. Vorotnitsky, Xie Jinbao // Proc. XIV Int. Conf. "Comprehensive Information Protection". Mogilev, 2009. pp. 70-71.

The best web services for examining sites for vulnerabilities. HP estimates that 80% of all vulnerabilities are caused by incorrect web server settings, use of outdated software, or other problems that could easily have been avoided.

The services in the review help identify such situations. Typically, scanners check against a database of known vulnerabilities. Some of them are quite simple and only check open ports, while others work more carefully and even try to perform SQL injection.

WebSAINT

SAINT is a well-known vulnerability scanner on which the WebSAINT and WebSAINT Pro web services are built. As an Approved Scanning Vendor, the service carries out ASV scanning of websites for organizations that require it under the terms of PCI DSS certification. It can run on a schedule, conduct periodic checks, and generate various reports based on the scan results. WebSAINT scans TCP and UDP ports at specified addresses on the user's network. The "professional" version adds pentests, web application scanning, and custom reports.

ImmuniWeb

The ImmuniWeb service from High-Tech Bridge uses a slightly different approach to scanning: in addition to automatic scanning, it also offers manual pentests. The procedure begins at the time specified by the client and takes up to 12 hours. The report is reviewed by company employees before being sent to the client. It specifies at least three ways to eliminate each identified vulnerability, including options for changing the source code of the web application, changing firewall rules, and installing a patch.

You have to pay more for human labor than for an automated check: a full scan with ImmuniWeb pentests costs $639.

BeyondSaaS

BeyondTrust's BeyondSaaS will cost even more. Customers are offered a subscription for $3,500, after which they can conduct an unlimited number of audits throughout the year. A one-time scan costs $700. Websites are checked for SQL injections, XSS, CSRF and operating system vulnerabilities. The developers state that the probability of false positives is no more than 1%, and in the reports they also indicate options for correcting problems.

BeyondTrust offers other vulnerability scanning tools, including the free Retina Network Community, which is limited to 256 IP addresses.

Dell Secure Works

Dell Secure Works is perhaps the most advanced of the web scanners reviewed. It runs on QualysGuard Vulnerability Management technology and checks web servers, network devices, application servers and DBMS both within the corporate network and on cloud hosting. The web service complies with PCI, HIPAA, GLBA and NERC CIP requirements.



