Most of the articles, descriptions and instructions written here are applicable to the most common Debian-based Linux derivatives. Depending on the respective operating system, there may be minor or major discrepancies.
This website is for educational purposes only. Please do not deploy anything from it in production environments.
No warranty or compensation is given for loss of data or hardware.
It should also be mentioned that this modest web server is hosted at home on a Raspberry Pi type 4B.
This article is related to Webalizer log analyser.
Precautions, cutback, elimination, prevention.
Referrer spam (also known as referral spam, log spam or referrer bombing[1]) is a kind of spamdexing (spamming aimed at search engines). The technique involves making repeated web site requests using a fake referrer URL to the site the spammer wishes to advertise. Sites that publish their access logs, including referer statistics, will then inadvertently link back to the spammer's site. These links will be indexed by search engines as they crawl the access logs, improving the spammer's search engine ranking. Except for polluting their statistics, the technique does not harm the affected sites.
Source: Wikipedia
»... , the technique does not harm the affected sites.«
Caution: that is not correct!
Bad referral links can harm the affected sites. They can damage your reputation with the major, popular search engines, and clickable links may even direct your visitors or yourself to infected websites.
You'll need access and full administrative rights to deal with:
/var/www/html                               # http root directory
/var/www/html/.htaccess                     # directive file, server behaviour
/var/www/html/robots.txt                    # directive file, webcrawlers
/var/www/html/OutputDir/                    # Webalizer output directory
/var/www/html/OutputDir/webalizer.current   # Webalizer main database
/etc/webalizer/webalizer.conf               # Webalizer configuration file
/var/log/apache2/access.log                 # Apache server access log file
/var/log/apache2/error.log                  # Apache server error log file
Rules of thumb: Never use so-called bulk submission services, even if they are offered for free. Sooner or later you will be confronted with a lot of scammers and spammers, first of all in your mailbox. Start with the »Big Three«: Bing, Yahoo! and Google. Always submit your websites in the ordinary manner.
Reputable search engines obey the robots.txt file. Instruct the search engine crawlers not to crawl your statistics pages, so that they do not publish the links in their directories. Set this at the top of your robots.txt.
Keep in mind: like the .htaccess, the robots.txt is read from top to bottom, much like a batch file.
User-agent: *
Disallow: /OutputDir/
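To check that the rule above really keeps crawlers out of the statistics directory, the robots.txt text can be fed to Python's standard `urllib.robotparser`. This is only a sketch: the host `example.com` and the agent name `ExampleBot` are placeholders, and it shows how a well-behaved crawler would read the file, not what a spammer's bot will do.

```python
from urllib.robotparser import RobotFileParser

# The two lines recommended above, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /OutputDir/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if a crawler that obeys robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# example.com is a placeholder host, not a real deployment.
print(is_crawlable(ROBOTS_TXT, "https://example.com/OutputDir/index.html"))  # False
print(is_crawlable(ROBOTS_TXT, "https://example.com/index.html"))            # True
```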
Never use Webalizer's standard output directory '/webalizer'; spammers may search for it.
Move the content of '/webalizer' to your HDD, then copy it back into the new directory.
root@raspberry:~# nano /etc/webalizer/webalizer.conf
# whatever you name it
OutputDir /var/www/html/OutputDir
Leave the META-tag with "noindex, nofollow, none".
HTMLHead <meta name="robots" content="noindex, nofollow, none">
The LinkReferrer option determines whether entries in the referrer table are plain text or HTML links.
# standard format, plain text
# never use the HTML link option
LinkReferrer no
HideURL /OutputDir
IgnoreURL /OutputDir
Understanding Apache's server »access.log« file. Read from the left to the right.
303.202.101.321                      # decimal IP address of the client
ba04d64a                             # hexadecimal IP address of the client, or
303-202-101-321.client.example.com   # IP address of the client combined with a host name
-                                    # identity of the client determined by identd
                                     # on the client's machine. Returns a hyphen (-)
                                     # if this information is not available
[28/Dec/2017:10:34:12]               # time that the request was received
"GET /pic.png HTTP/1.1"              # request line: HTTP method used & resource requested
200                                  # status code the server sends back
5867                                 # size of the object requested
"http://spam.com.ua/"                # the referral link. Returns a hyphen (-)
                                     # if this information is not available
"Mozilla/5.0 (...)"                  # the User-agent. Returns a hyphen (-)
                                     # if this information is not available
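The left-to-right field order above can be turned into a small parser. The sketch below assumes Apache's usual "combined" log format, which also carries a remote-user field and a time zone offset that the overview above does not list; the sample line and its address 203.0.113.7 are made up for illustration.

```python
import re

# One named group per field, in the order the fields appear on the line.
# Assumption: Apache's "combined" format (host, ident, user, time, request,
# status, size, referrer, user agent).
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str):
    """Return the fields of one access.log line as a dict, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line built from the values shown above.
SAMPLE = (
    '203.0.113.7 - - [28/Dec/2017:10:34:12 +0000] '
    '"GET /pic.png HTTP/1.1" 200 5867 '
    '"http://spam.com.ua/" "Mozilla/5.0 (...)"'
)

fields = parse_line(SAMPLE)
print(fields["status"], fields["referrer"])  # 200 http://spam.com.ua/
```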
Once you have been hit by an unscrupulous SEO service provider, what next? Apply the measures in ❸ first.
Consult /var/log/apache2/access.log and /var/log/apache2/error.log for investigation.
For Apache 2.4+, study https://httpd.apache.org/docs/2.4/logs.html.
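When going through the access log, it helps to see which referrer hosts show up most often. The following sketch counts them; it assumes the combined log format, in which the referrer is the second quoted field on each line, and the sample lines are invented. It will miscount lines whose user agent contains escaped quotes.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

QUOTED = re.compile(r'"([^"]*)"')


def referrer_hosts(lines):
    """Count referrer host names; lines without a usable referrer are skipped."""
    counts = Counter()
    for line in lines:
        quoted = QUOTED.findall(line)
        if len(quoted) < 3:          # expect request, referrer, user agent
            continue
        host = urlsplit(quoted[1]).hostname  # None for "-" or an empty referrer
        if host:
            counts[host] += 1
    return counts


# Invented sample lines; a real run would iterate over the log file instead.
SAMPLE_LOG = [
    '203.0.113.7 - - [28/Dec/2017:10:34:12 +0000] "GET /pic.png HTTP/1.1" 200 5867 "http://spam.com.ua/" "Mozilla/5.0 (...)"',
    '203.0.113.8 - - [28/Dec/2017:10:34:15 +0000] "GET / HTTP/1.1" 200 1024 "http://spam.com.ua/offer" "Mozilla/5.0 (...)"',
    '198.51.100.2 - - [28/Dec/2017:10:35:01 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (...)"',
]

counts = referrer_hosts(SAMPLE_LOG)
print(counts.most_common(5))  # [('spam.com.ua', 2)]
```

On the server itself, the same function can be fed an open file, for example `referrer_hosts(open("/var/log/apache2/access.log"))`, run with sufficient rights to read the log.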
Limit the activities of »bad bots« with .htaccess directives. This only limits the bandwidth they take.
You may limit access from various IPs, domains and top-level domains (TLDs).
Set this near the top of the .htaccess. Blocked requests receive error code 403 »Access forbidden«.
<RequireAll>
    Require not host domaina.com
    Require not host domainb.com
    Require not host domainc.com
    Require not ip 111.222.333.444
    Require not ip 555.666.777
    Require not ip 888.999
    Require all granted
</RequireAll>
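The shorter `Require not ip` entries above are partial addresses: Apache treats the first one to three octets as a prefix that covers the whole subnet. A rough Python sketch of that matching, for IPv4 only and with documentation addresses as placeholder rules, looks like this; it is an illustration of the idea and not Apache's own code.

```python
def matches_partial_ip(client_ip: str, rule: str) -> bool:
    """True if the rule's octets are a prefix of the client's octets."""
    client_octets = client_ip.split(".")
    rule_octets = rule.rstrip(".").split(".")
    return client_octets[: len(rule_octets)] == rule_octets


def is_blocked(client_ip: str, rules) -> bool:
    """True if any full or partial address rule covers the client."""
    return any(matches_partial_ip(client_ip, rule) for rule in rules)


# Placeholder rules: one full address, one three-octet and one two-octet prefix.
BLOCKED = ["198.51.100.44", "203.0.113", "192.0"]

print(is_blocked("203.0.113.9", BLOCKED))  # True
print(is_blocked("203.0.11.9", BLOCKED))   # False
```

Comparing whole octets rather than raw strings matters here: the prefix `203.0.113` must not catch `203.0.11.9`, and a prefix such as `198.5` must not catch `198.51.100.44`.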
Apart from that, if you also get spam in your forum, guestbook or any other board, deploy:
<Limit POST>
    <RequireAll>
        Require not host domaina.com
        Require not host domainb.com
        Require not host domainc.com
        Require not ip 111.222.333.444
        Require not ip 555.666.777
        Require not ip 888.999
        Require all granted
    </RequireAll>
</Limit>
Once you have changed the OutputDir, redirect unwanted visitors to a harmless external page.
Redirect /OutputDirOld https://duckduckgo.com/about
ErrorDocument 403 https://duckduckgo.com/about
That's not all. Now we go a step further, to the root of the evil.
root@raspberry:~# nano /etc/webalizer/webalizer.conf
Scroll down and look for the examples to »IgnoreReferrer«.
Here you can list whatever you want to keep out of the referrer statistics.
Reject top level domains (TLD), domains, IPs, certain expressions appearing in domain names.
# hides a top-level domain (ccTLD)
IgnoreReferrer .ru
IgnoreReferrer .gs
IgnoreReferrer .blog
# hides a domain
IgnoreReferrer westio.
IgnoreReferrer essaydates.
# hides anything containing the expression »casino«
IgnoreReferrer casino
IgnoreReferrer pharma
IgnoreReferrer cheap
# hides all non-https domains
IgnoreReferrer http://
# pass if the referrer is a »friendly« bot
IncludeReferrer google.
IncludeReferrer duckduckgo.
IncludeReferrer yahoo.
IncludeReferrer bing.
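To get a feeling for which referrers such a list would hide, the rules can be tried out in a few lines of Python before touching the live configuration. This sketch rests on two assumptions: patterns are matched as plain substrings anywhere in the referrer, and an Include entry wins over an Ignore entry. Webalizer's wildcard handling is not modelled, so treat the result as an approximation.

```python
# Shortened pattern lists taken from the examples above.
IGNORE = [".ru", "westio.", "casino", "http://"]
INCLUDE = ["google.", "duckduckgo."]


def is_hidden(referrer: str, ignore=IGNORE, include=INCLUDE) -> bool:
    """Approximate whether a referrer would be dropped from the statistics."""
    # Assumption: an Include pattern overrides any Ignore pattern.
    if any(pattern in referrer for pattern in include):
        return False
    # Assumption: a pattern matches if it occurs anywhere in the referrer.
    return any(pattern in referrer for pattern in ignore)


print(is_hidden("http://best-casino.example/"))  # True
print(is_hidden("http://www.google.com/"))       # False
print(is_hidden("https://example.org/"))         # False
```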
Quite a good question. In fact, it is a matter of time until it is solved.
Meanwhile you can tidy up manually. webalizer.current is a standard ASCII file.
Study it carefully from top to bottom until you know what you want to remove.
Make a backup first!
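One way to take that backup is a timestamped copy next to the original, so that several editing sessions never overwrite each other's safety copy. The helper below is a minimal sketch; the name `backup_file` and the `.bak` naming scheme are made up for this example.

```python
import shutil
import time
from pathlib import Path


def backup_file(path) -> Path:
    """Copy the file to '<name>.<YYYYmmdd-HHMMSS>.bak' in the same directory."""
    source = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = source.with_name(f"{source.name}.{stamp}.bak")
    shutil.copy2(source, target)  # copy2 also preserves the modification time
    return target


# Usage on the server (needs write access to the output directory):
# backup_file("/var/www/html/OutputDir/webalizer.current")
```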
root@raspberry:~# nano /var/www/html/OutputDir/webalizer.current
28-Dec 2017
Updated 08-Feb 2021
The SetEnvIf and SetEnvIfNoCase directives can be used in your global Apache (2.4+) configuration file, e.g. if you get lots of visits from search engine spiders (bots), certain IPs or so-called »referrer spammers«.
I found that this other method is very effective against referrer spam.
Please follow the internal link.
Regularly study Apache's access.log as well as error.log.
04-Mar 2018
»ufw« is a front end application for »iptables«: a comfortable command line tool for managing your personal »iptables« rules in Linux. Here you get the basic handling of a simple but effective personal firewall for IPv4 & IPv6.
Follow this link: Install & configure the so-called »ufw« | Uncomplicated firewall for Linux web servers.
07-Jun 2018
»IgnoreReferrer name« ignores all referrers that match »name«, even if only partly.
Now open the terminal text editor and edit the configuration file /etc/webalizer/webalizer.conf.
user@raspberry:~ $ sudo su
[sudo] Password for user: ******
root@raspberry:/home/user# nano /etc/webalizer/webalizer.conf
You may copy the snippets and paste into the appropriate sections.
20-Jul 2019
Updated 16-Nov 2021