Raspberry Pi as server

The prologue


Most of the articles, descriptions and instructions written here apply to the most common Debian-based Linux derivatives. Depending on the respective operating system, there may be minor or major discrepancies.
This website is for educational purposes only. Please do not deploy anything described here in a production environment.
No warranty or compensation is given for loss of data or hardware.

It should also be mentioned that this modest web server is hosted at home on a Raspberry Pi 4 Model B.


The Raspberry Pi mini-computer board deployed as a multi-purpose server
A competent all-rounder for domestic purposes and micro-enterprises


Raspberry Pi : Apache as a multi-site web server. Print server, scan server, backup and NAS server.

Raspberry Pi is a series of small single-board computers (SBCs) developed in the United Kingdom by the Raspberry Pi Foundation in association with Broadcom. The mini-computer has quickly become a favourite of hobbyists, and projects can be started with any suitable Linux distribution. Even an aged RasPi, e.g. the Model 2B with its armv7l processor, can still handle simple tasks quite well.


»robots.txt« | Prologue

Apache v2.4+ web server | »robots.txt« and xml-sitemaps, RSS feeds


Place the »robots.txt« file in the top-level directory (document root) of your Apache v2.4+ web server, e.g.


/var/www/html/


Use all lower case for the filename : »robots.txt« - never Robots.TXT or something similar.


»robots.txt« tells search engine spiders (bots) how they may crawl and index your web content.

If no »robots.txt« file is in place, the web server log will record a 404 error whenever a bot requests it. Upload a blank text file named »robots.txt« if you want to stop those 404 error messages.
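
On a Debian-based system this is a one-liner. The path and the www-data owner below assume Apache's default Debian setup :


        # create an empty robots.txt so bot requests stop producing 404s
        sudo touch /var/www/html/robots.txt
        sudo chown www-data:www-data /var/www/html/robots.txt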


Some search engines allow you to specify the address of an XML sitemap, but if your site is small you do not need to create one.


Please also note that if your »robots.txt« contains errors and spiders are unable to recognize the commands, they will simply continue crawling through your domain structure.


Subdomains | How to deal with the »robots.txt« correctly ❓


Assume two subdomains point to the folders /images/ and /contents/.


Alter your main .htaccess file in the document root, and the system will in future redirect requests from both /contents/ and /images/ to the copy in the root :


        Redirect 301 /contents/robots.txt http://example.com/robots.txt
        Redirect 301 /images/robots.txt http://example.com/robots.txt
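
Afterwards a quick check with curl should answer »301 Moved Permanently« with a Location: header pointing at the copy in the root :


        # verify both redirects from the command line
        curl -I http://example.com/contents/robots.txt
        curl -I http://example.com/images/robots.txt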

10-Jan 2021
Updated 08-Feb 2021


The plain robots.txt | A longer and a shorter version


The longer edition, validated with the »Bing Webmaster Tools« :


        #    robots.txt
        #    https://yourdomain.tld
        #    27-Jul 2022

        # Deny bots that do not benefit the site

        User-agent: AdsBot-Google
        User-agent: AdsBot-Google-Mobile
        User-agent: APIs-Google
        User-agent: Amazonbot
        User-agent: Baiduspider
        User-agent: Barkrowler
        User-agent: Bytespider
        User-agent: DomainStatsBot
        User-agent: facebookexternalhit
        User-agent: Facebot
        User-agent: linkdexbot
        User-agent: MJ12bot
        User-agent: PetalBot
        User-agent: SBSearch
        User-agent: Silk
        User-agent: Sogou
        User-agent: YandexBot
        User-agent: YandexImages
        Disallow: /

        # Global rules

        User-agent: Applebot
        User-agent: Bingbot
        User-agent: BingPreview
        User-agent: coccocbot-image
        User-agent: coccocbot-web
        User-agent: DuckDuckBot
        User-agent: DuckDuckBot-Https
        User-agent: DuckDuckGo-Favicons-Bot
        User-agent: Ecosia-Bot
        User-agent: Exabot
        User-agent: Google-Read-Aloud
        User-agent: Googlebot
        User-agent: Googlebot-Image
        User-agent: Googlebot-News
        User-agent: Googlebot-Video
        User-agent: ia_archiver
        User-agent: IonCrawl
        User-agent: Mediapartners-Google
        User-agent: MojeekBot
        User-agent: Msnbot
        User-agent: Neevabot
        User-agent: Quantify
        User-agent: SeekportBot
        User-agent: SeznamBot
        User-agent: Slurp
        User-agent: Storebot-Google
        User-agent: Twitterbot
        Disallow: /cgi-bin/
        Disallow: /directory/
        Disallow: /directory-2/
        Disallow: /page.html
        Disallow: /page-2.php
        Disallow: /pages/page-3.asp

        # Specific rules

        User-agent: WellKnownBot
        Allow: /.well-known/security.txt
        Allow: /ads.txt
        Allow: /humans.txt
        Allow: /robots.txt
        Disallow: /

        # Sitemaps

        Sitemap: https://yourdomain.tld/sitemap.xml
        Sitemap: https://yourdomain.tld/urllist.txt
        Sitemap: https://yourdomain.tld/sitemap-images.xml
        Sitemap: https://yourdomain.tld/urllist-images.txt
        Sitemap: https://yourdomain.tld/sitemap-videos.xml
        Sitemap: https://yourdomain.tld/urllist-videos.txt

        #    End Of File
        
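
Whichever edition you choose, confirm what the server actually delivers once the file is uploaded; a plain fetch from the command line is enough :


        # fetch the live file; the output should match the uploaded edition
        curl -s https://yourdomain.tld/robots.txt | head -n 20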

The shorter edition, validated with the »Bing Webmaster Tools« :


        #    robots.txt
        #    https://yourdomain.tld
        #    27-Jul 2022

        # Deny bots that do not benefit the site

        User-agent: AdsBot-Google
        User-agent: AdsBot-Google-Mobile
        User-agent: APIs-Google
        User-agent: Amazonbot
        User-agent: Baiduspider
        User-agent: Barkrowler
        User-agent: Bytespider
        User-agent: DomainStatsBot
        User-agent: facebookexternalhit
        User-agent: Facebot
        User-agent: linkdexbot
        User-agent: MJ12bot
        User-agent: PetalBot
        User-agent: SBSearch
        User-agent: Silk
        User-agent: Sogou
        User-agent: YandexBot
        User-agent: YandexImages
        Disallow: /

        # Global rules

        User-agent: *
        Disallow: /cgi-bin/
        Disallow: /directory/
        Disallow: /directory-2/
        Disallow: /page.html
        Disallow: /page-2.php
        Disallow: /pages/page-3.asp

        # Specific rules

        User-agent: WellKnownBot
        Allow: /.well-known/security.txt
        Allow: /ads.txt
        Allow: /humans.txt
        Allow: /robots.txt
        Disallow: /

        # Sitemaps

        Sitemap: https://yourdomain.tld/sitemap.xml
        Sitemap: https://yourdomain.tld/urllist.txt
        Sitemap: https://yourdomain.tld/sitemap-images.xml
        Sitemap: https://yourdomain.tld/urllist-images.txt
        Sitemap: https://yourdomain.tld/sitemap-videos.xml
        Sitemap: https://yourdomain.tld/urllist-videos.txt

        #    End Of File
        

»robots.txt« | Which configuration does this server deploy ❓


This is no secret : https://dosboot.org/robots.txt


21-Jan 2021


Structure of an XML sitemap


        <?xml version="1.0" encoding="UTF-8"?>
        <!-- sitemap.xml -->
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
        <url> <loc>http://www.yourdomain.tld/index.php</loc> <lastmod>2017-10-13T00:27:25+00:00</lastmod> <changefreq>weekly</changefreq> <priority>1.00</priority> </url>
        <url> <loc>http://www.yourdomain.tld/contents/admin.html</loc> <lastmod>2017-10-13T00:27:25+00:00</lastmod> <changefreq>weekly</changefreq> <priority>0.50</priority> </url>
        </urlset>
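
Before submitting a sitemap anywhere, a quick well-formedness check saves a round trip. xmllint is part of Debian's libxml2-utils package; the path below assumes the default document root :


        # prints nothing on success, reports syntax errors otherwise
        xmllint --noout /var/www/html/sitemap.xml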

Structure of an XML image sitemap
Used primarily by Googlebot-Image


        <?xml version="1.0" encoding="UTF-8"?>
        <!-- sitemap-images.xml -->
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
        <!-- just one image in your page -->
        <url> <loc>http://www.yourdomain.tld/contents/admin.html</loc> <image:image> <image:loc>http://www.yourdomain.tld/images/loony.jpg</image:loc> </image:image> </url>
        <!-- more than just one image in your page -->
        <url> <loc>http://www.yourdomain.tld/index.php</loc> <image:image> <image:loc>http://www.yourdomain.tld/images/fiddler.png</image:loc> </image:image> <image:image> <image:loc>http://www.yourdomain.tld/images/backpiper.gif</image:loc> </image:image> </url>
        </urlset>

How to PING your *.xml sitemaps to MS Bing, Yahoo! & Google ❓


Assuming the sitemaps are located at https://yourdomain.tld/, ping the URLs below to get the updated *.xml files fetched.


Bing (MSN) / Yahoo! Slurp


        https://www.bing.com/webmaster/ping.aspx?sitemap=https://yourdomain.tld/sitemap.xml

18-Sep 2018 : Anonymous URL Submission Tool Being Retired
Saying goodbye is never easy, but the time has come to announce the withdrawal of anonymous (non-signed-in) support for Bing's URL submission tool. Webmasters will still be able to log in and access the Submit URL tool in Bing Webmaster Tools ... Bing blogs



Strangely enough, submissions of the simple text-based feed urllist.txt to Bing are still accepted.


Google Inc.


        https://www.google.com/webmasters/sitemaps/ping?sitemap=https://yourdomain.tld/sitemap.xml
        https://www.google.com/webmasters/sitemaps/ping?sitemap=https://yourdomain.tld/sitemap-images.xml

Google sitemap notification.
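
Both ping endpoints can also be hit from the command line, e.g. with curl, using exactly the URLs listed above :


        # notify Bing and Google about an updated sitemap
        curl -s "https://www.bing.com/webmaster/ping.aspx?sitemap=https://yourdomain.tld/sitemap.xml"
        curl -s "https://www.google.com/webmasters/sitemaps/ping?sitemap=https://yourdomain.tld/sitemap.xml"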


Attention! You should never overdo it with anonymous submissions.


09-Oct 2017
Updated 04-Jul 2021


Structures of a blank »ads.txt« and a blank »sellers.json«


Here are two examples to avoid any 404 document error messages.

Both files are written for domains that have no direct advertising sellers and no resellers.


Put the files in the document root of your web server and the bots are satisfied.


»ads.txt«


        # ads.txt        https://example.com/
        example.com, * , DIRECT, *
        example.com, * , RESELLER, *
        # no advertisements to be found here
        

»sellers.json«


        {
          "contact_email": "",
          "contact_address": "",
          "version": "1.0",
          "identifiers": [
            {
              "name": "",
              "value": ""
            }
          ],
          "sellers": [
            {
              "seller_id": "",
              "seller_type": "",
              "name": "",
              "domain": "example.com",
              "comment": "nothing to sell/resell here"
            }
          ]
        }
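
Hand-written JSON is easy to get wrong, so run a syntax check before uploading; python3 is preinstalled on Raspberry Pi OS :


        # prints the parsed JSON on success, a parse error otherwise
        python3 -m json.tool sellers.json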

23-May 2021


The structure of a really simple RSS feed


        <?xml version="1.0" encoding="UTF-8"?>
        <rss version="2.0"
             xmlns:content="http://purl.org/rss/1.0/modules/content/"
             xmlns:wfw="http://wellformedweb.org/CommentAPI/"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:atom="http://www.w3.org/2005/Atom">

        <channel>

        <title>Main header</title>
        <link>http://example.com</link>
        <description>Use a short header description</description>
        <language>en-gb</language>

        <item>
        <title>First article</title>
        <description>Use a short description</description>
        <!-- Link to article -->
        <link>http://example.com/one.html</link>
        <!-- Date published -->
        <pubDate>Mon, 30 Apr 2018 00:00:00 +0200</pubDate>
        </item>

        <item>
        <title>Second article</title>
        <description>Use a short description</description>
        <!-- Link to article -->
        <link>http://example.com/second.html</link>
        <!-- Date published -->
        <pubDate>Mon, 30 Apr 2018 00:00:00 +0200</pubDate>
        </item>

        </channel>

        </rss>

The <pubDate> element is not required, but be aware that its format is strict.
The +0200 indicates a time 2 hours ahead of GMT.


If even one character is missing - e.g. a closing tag - you get a white screen in your web browser.
The same happens if you add characters other than standard letters and numerals, such as an unescaped ampersand (&).
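
The same xmllint check as for the sitemaps catches such slips before your readers do; the feed filename below is just an assumption :


        # reports e.g. a missing closing tag or an unescaped '&'
        xmllint --noout /var/www/html/feed.xml && echo "feed is well-formed"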


20-Jul 2018
