
The prologue


Most of the articles, descriptions and instructions written here apply to the most common Debian-based Linux derivatives. Depending on the operating system, there may be minor or major differences.
This website is for educational purposes only. Please do not deploy anything in production environments.
No warranty or compensation is given for loss of data or hardware.

It should also be mentioned that this modest web server is hosted at home on a Raspberry Pi 4B.




Referrer ghost spam

This article relates to the Webalizer log analyser.
It covers precautions, cutback, elimination and prevention.

Prologue


Referrer spam (also known as referral spam, log spam or referrer bombing[1]) is a kind of spamdexing (spamming aimed at search engines). The technique involves making repeated web site requests using a fake referrer URL to the site the spammer wishes to advertise. Sites that publish their access logs, including referer statistics, will then inadvertently link back to the spammer's site. These links will be indexed by search engines as they crawl the access logs, improving the spammer's search engine ranking. Except for polluting their statistics, the technique does not harm the affected sites.
Source: Wikipedia


»... , the technique does not harm the affected sites.«


Caution! That is not correct!

Bad referral links may well harm the affected sites. They can damage your reputation with the major, popular search engines, and clickable links may even direct your visitors, or yourself, to infected websites.


Preparation


You'll need access and full administrative rights to the following files and directories:

     
     /var/www/html                               # http root directory
     /var/www/html/.htaccess                     # directive file, server behaviour
     /var/www/html/robots.txt                    # directive file, webcrawlers
     
     /var/www/html/OutputDir/                    # Webalizer output directory
     /var/www/html/OutputDir/webalizer.current   # Webalizer main database
     
     /etc/webalizer/webalizer.conf               # Webalizer configuration file
     
     /var/log/apache2/access.log                 # Apache server access log file
     /var/log/apache2/error.log                  # Apache server error log file
     

Precautions


Rule of thumb: never use so-called bulk submission services, even if they are offered for free. Sooner or later you will be confronted with a lot of scammers and spammers, and earliest of all in your mailbox. Start with the »Big Three«: Bing, Yahoo! and Google. Always submit your websites in the ordinary manner.


Reputable search engines obey the robots.txt file. Instruct the search engine crawlers not to crawl your statistics pages, so that they will not publish the links in their indexes. Put this at the top of your robots.txt.


Keep in mind: like .htaccess, robots.txt is processed from top to bottom, much like a batch file.


     User-agent: *
     Disallow: /OutputDir/
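
The rule can be checked offline before relying on it. The following is a minimal sketch using Python's standard `urllib.robotparser`, assuming the statistics live under `/OutputDir/` as above. The crawler name `ExampleBot` and the page names are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The same two lines as in the robots.txt snippet above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /OutputDir/
"""

def is_allowed(path, robots_txt=ROBOTS_TXT, agent="ExampleBot"):
    """Return True if a well-behaved crawler may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

for path in ("/OutputDir/index.html", "/index.html"):
    print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Bear in mind that this only models crawlers that honour robots.txt. Spammers ignore the file.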
     

Never use Webalizer's standard output directory '/webalizer'. Spammers could search for it.

Move the content of '/webalizer' to your HDD and copy it back into the new directory.


root@raspberry:~# nano /etc/webalizer/webalizer.conf


     OutputDir /var/www/html/OutputDir     # whatever you name it
     

Leave the META-tag with "noindex, nofollow, none".


     HTMLHead <meta name="robots" content="noindex, nofollow, none">
     

The LinkReferrer option determines whether entries in the referrer table are plain text or an HTML link.


     LinkReferrer  no                      # standard format, plain text
                                           # never use the HTML link option

     HideURL       /OutputDir
     IgnoreURL     /OutputDir
     

Intermezzo


Understanding Apache's »access.log« file. Read each entry from left to right.


     303.202.101.321            # decimal IP address of the client
     ba04d64a                   # hexadecimal IP address of the client
     
                                    or
                               
     303-202-101-321            # host name of the client, resolved from its
     .client.example.com        # IP address if HostnameLookups is enabled
                               
     -                          # identity of the client determined by identd
                                # on the client's machine. Returns a hyphen (-)
                                # if this information is not available
                               
     [28/Dec/2017:10:34:12]     # time that the request was received
     
     "GET /pic.png HTTP/1.1"    # request line: HTTP method, resource, protocol
     
     200                        # status code the server sends back
     
     5867                       # size of the object returned to the client, in bytes
     
     "http://spam.com.ua/"      # the referral link. Returns a hyphen (-) 
                                # if this information is not available
                              
     "Mozilla/5.0 (...)"        # the User-agent. Returns a hyphen (-) 
                                # if this information is not available
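
The fields above can also be split by a script. The following is a minimal sketch, assuming your server writes Apache's »combined« log format. A custom LogFormat would need a different pattern. The sample line is invented and uses documentation-only addresses.

```python
import re

# Pattern for Apache's "combined" log format (assumption: default LogFormat).
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Split one access.log line into named fields, or return None."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

# Invented sample entry for illustration only.
SAMPLE = ('192.0.2.10 - - [28/Dec/2017:10:34:12 +0100] '
          '"GET /pic.png HTTP/1.1" 200 5867 '
          '"http://spam.example/" "Mozilla/5.0 (X11; Linux)"')

fields = parse_line(SAMPLE)
print(fields["host"], fields["status"], fields["referrer"])
```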
     

Cutback


Once an evil SEO-service provider has targeted your site, what's next? First apply the precautions above.


Consult /var/log/apache2/access.log and /var/log/apache2/error.log for investigation.
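
For a first overview it helps to count which referrer hosts show up most often. The following is a minimal sketch, again assuming the »combined« log format. The function name `top_referrer_hosts` is made up for this example.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# In the "combined" format the referrer is the first quoted field after
# the status code and the size (assumption: default LogFormat).
REFERRER = re.compile(r'" \d{3} \S+ "([^"]*)"')

def top_referrer_hosts(lines, limit=10):
    """Count referrer host names, ignoring empty ("-") referrers."""
    hosts = Counter()
    for line in lines:
        match = REFERRER.search(line)
        if not match or match.group(1) in ("", "-"):
            continue
        host = urlsplit(match.group(1)).hostname
        if host:
            hosts[host] += 1
    return hosts.most_common(limit)
```

To run it against the real log, pass an open file object, e.g. `top_referrer_hosts(open("/var/log/apache2/access.log", errors="replace"))`. Reading that file usually requires root rights.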


Study the Apache 2.4 log files documentation at https://httpd.apache.org/docs/2.4/logs.html.


Limit the activities of »bad bots« with .htaccess directives. This only limits the bandwidth they take.


You may limit access from various IPs, domains and top level domains (TLD).


Set this near the top of the .htaccess. It results in error code 403 »Access forbidden«.


     <RequireAll>
        Require not host domaina.com
        Require not host domainb.com
        Require not host domainc.com
        Require not ip 111.222.333.444
        Require not ip 555.666.777
        Require not ip 888.999
        Require all granted
     </RequireAll>
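
A partial address in »Require not ip« covers a whole subnet: one to three leading octets stand for every address that starts with them. The following is a minimal sketch of that idea with Python's `ipaddress` module, so a block list can be tested before it goes live. The helper names and the block list are hypothetical, and the addresses come from documentation-only ranges.

```python
import ipaddress

def partial_to_network(partial):
    """Turn a partial IPv4 address such as '198.51' into a network.

    Missing octets are padded with zeros and the prefix length is
    8 bits per octet given.
    """
    octets = partial.split(".")
    padded = octets + ["0"] * (4 - len(octets))
    return ipaddress.ip_network(f"{'.'.join(padded)}/{8 * len(octets)}")

def is_blocked(client_ip, blocked):
    """Return True if client_ip falls inside any blocked entry."""
    address = ipaddress.ip_address(client_ip)
    return any(address in partial_to_network(entry) for entry in blocked)

# Hypothetical block list: one full address and two partial ones.
BLOCKED = ["203.0.113.7", "198.51.100", "192.0"]

print(is_blocked("198.51.100.25", BLOCKED))
```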
     

Apart from that, if you also get spam in your forum, guestbook or any other board, deploy:


     <Limit POST>
        # negated Require lines are only valid inside <RequireAll>
        <RequireAll>
           Require not host domaina.com
           Require not host domainb.com
           Require not host domainc.com
           Require not ip 111.222.333.444
           Require not ip 555.666.777
           Require not ip 888.999
           Require all granted
        </RequireAll>
     </Limit>
     

Once you have changed the OutputDir, redirect unwanted visitors to a harmless external page.

     
     Redirect /OutputDirOld https://duckduckgo.com/about
     ErrorDocument 403 https://duckduckgo.com/about
     

That's not all. Now we try harder and move on to the core of the evil.


root@raspberry:~# nano /etc/webalizer/webalizer.conf


Scroll down and look for the examples to »IgnoreReferrer«.
Here you can set whatever you desire to keep off as referrals.
Reject top level domains (TLD), domains, IPs, certain expressions appearing in domain names.

     
     IgnoreReferrer   .ru              # hides a top level domain (ccTLD)
     IgnoreReferrer   .gs
     IgnoreReferrer   .blog
     IgnoreReferrer   westio.          # hides a domain
     IgnoreReferrer   essaydates.      # hides a domain
     
     IgnoreReferrer   casino           # hides anything with expression »casino«
     IgnoreReferrer   pharma
     IgnoreReferrer   cheap
     
     IgnoreReferrer   http://          # hides all non-https domains


	 
     IncludeReferrer  google.          # pass if referrer from a »friendly« bot
     IncludeReferrer  duckduckgo.
     IncludeReferrer  yahoo.
     IncludeReferrer  bing.
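
Before reloading the statistics, the effect of such a list can be previewed. The following is a rough sketch, assuming that patterns match as plain substrings and that an include pattern overrides any ignore pattern. It does not reproduce Webalizer's wildcard handling, and the two lists are shortened examples.

```python
# Shortened example lists in the spirit of the configuration above.
IGNORE = [".ru", "casino", "pharma", "http://"]
INCLUDE = ["google.", "duckduckgo."]

def is_hidden(referrer, ignore=IGNORE, include=INCLUDE):
    """Approximate whether a referrer would be dropped from the report.

    Assumption: plain substring matching, include beats ignore.
    """
    if any(pattern in referrer for pattern in include):
        return False
    return any(pattern in referrer for pattern in ignore)

for url in ("http://best-casino.example/", "https://www.google.com/search"):
    print(url, "->", "hidden" if is_hidden(url) else "kept")
```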
     

Elimination mission impossible


Quite a good question. The fact is that it is only a matter of time until you get it solved.
Meanwhile you can tidy up manually. webalizer.current is a standard ASCII file.


Study it carefully from top to bottom so that you know exactly what you want to wipe out.
Make a backup first!
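
One way to take that backup is a timestamped copy next to the original. The following is a minimal sketch. The function name `backup_file` and the `.bak` naming scheme are just one possible choice.

```python
import shutil
import time
from pathlib import Path

def backup_file(path, stamp=None):
    """Copy path to '<name>.<timestamp>.bak' next to it and return the copy."""
    source = Path(path)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target = source.with_name(f"{source.name}.{stamp}.bak")
    shutil.copy2(source, target)  # copy2 also preserves timestamps and mode
    return target
```

Called as `backup_file("/var/www/html/OutputDir/webalizer.current")`, it needs write permission in that directory.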


root@raspberry:~# nano /var/www/html/OutputDir/webalizer.current


28-Dec 2017
Updated 08-Feb 2021


Prevent ghost spam by the »env=!dontlog« directive


The SetEnvIf and SetEnvIfNoCase directives can be used in your global Apache (2.4+) configuration file to mark requests that should not be logged, e.g. if you get lots of visits from search engine spiders (bots), certain IPs or so-called »referrer spammers«.
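
To give an idea of what such directives look like, the following sketch generates them from a list of spam domains. It is an illustration under assumptions, not the method described behind the internal link. The function name `dontlog_config` and the domain are hypothetical, and the log path assumes the Debian default `${APACHE_LOG_DIR}`.

```python
import re

def dontlog_config(spam_domains, log_path="${APACHE_LOG_DIR}/access.log"):
    """Build SetEnvIfNoCase lines plus a conditional CustomLog line."""
    lines = [
        # re.escape() keeps dots in the domain from acting as wildcards.
        f'SetEnvIfNoCase Referer "{re.escape(domain)}" dontlog'
        for domain in spam_domains
    ]
    # Requests that set the "dontlog" variable are skipped by this log.
    lines.append(f"CustomLog {log_path} combined env=!dontlog")
    return "\n".join(lines)

print(dontlog_config(["spam.example"]))
```

The generated CustomLog line is meant to replace the existing one in the server or virtual host configuration. It cannot go into .htaccess.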


I found that another method is very effective against referrer spam.

Please follow the internal link.


Regularly study Apache's access.log as well as error.log.


04-Mar 2018


Set up the UFW firewall against bad SEO-service providers


»ufw« is a comfortable command line front end for managing your personal »iptables« rules in Linux. Here you get the basic handling of a simple but effective personal firewall for IPv4 & IPv6.


Follow this link: Install & configure the so-called "ufw" | Uncomplicated firewall for Linux web servers.
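
Once ufw is set up, spammer addresses found in the logs can be turned into deny rules. The following is a minimal sketch that only builds the command strings and validates each address first. The function name `ufw_deny_commands` and the addresses are hypothetical, and nothing is executed.

```python
import ipaddress

def ufw_deny_commands(sources):
    """Return one 'ufw deny from ...' command per valid address or network."""
    commands = []
    for source in sources:
        # ip_network() raises ValueError on typos, before anything reaches ufw.
        network = ipaddress.ip_network(source, strict=False)
        commands.append(f"ufw deny from {network}")
    return commands

for command in ufw_deny_commands(["203.0.113.7", "198.51.100.0/24"]):
    print(command)
```

Review the printed commands before running them as root. Rule order matters in ufw, so a deny rule added after a general allow rule for the web ports may never be reached.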


07-Jun 2018


Webalizer | Personalized spam silencer


»IgnoreReferrer name« = Ignores all referrers that match »name«, even if only partly.


Now open the terminal text editor and edit the configuration file /etc/webalizer/webalizer.conf.

The configuration file is a series of keywords and values, where empty lines and lines beginning with hash marks (#) are ignored.


user@raspberry:~ $ sudo su
[sudo] Password for user: ******
root@raspberry:/home/user# nano /etc/webalizer/webalizer.conf


You may copy the snippets and paste them into the appropriate sections.





20-Jul 2019
Updated 16-Nov 2021

dosboot.org 2024 | Design and layout handmade in Northwest Europe