Make an XML sitemap offline with Perl
Disclaimer:
The use of this code is at your own risk.
Description:
Creates an XML sitemap for Google.
Google uses XML sitemaps to index websites.
A sitemap is simply a list of the URLs of your website in XML format.
See https://www.google.com/sitemaps/protocol.html
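To show what such a list looks like, here is a minimal sketch that prints a sitemap with a single entry. The sub name minimal_sitemap and the URL https://www.example.com/index.html are placeholders for illustration only; they are not part of the script described here.

```perl
use strict;
use warnings;

# Hypothetical helper: build a minimal sitemap for a list of URLs.
# Assumption: the URLs passed in are already XML-escaped.
sub minimal_sitemap {
    my @urls = @_;
    my $xml = qq{<?xml version="1.0" encoding="UTF-8"?>\n};
    $xml .= qq{<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n};
    $xml .= "<url>\n <loc>$_</loc>\n</url>\n" for @urls;   # one <url> block per URL
    $xml .= "</urlset>\n";
    return $xml;
}

print minimal_sitemap('https://www.example.com/index.html');
```

Each page gets one <url> block with a mandatory <loc> tag; tags such as <priority> are optional.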
How to use:
1: Make sure Perl is installed on your Win32 system.
http://strawberryperl.com/
2: Put this script in the root directory of your site.
https://comweb.nl/perl/makeSitemapXML/makeSitemapXML.zip
Unzip it; the archive contains these instructions and the makeSitemapXML.pl file.
Do not forget to edit line 16 (the $prefix of your site).
3: Open the command prompt and cd to this directory.
4: Execute: perl makeSitemapXML.pl > mySitemapName.xml
5: Upload mySitemapName.xml to your Internet site.
6: Register mySitemapName.xml at Google so that your site gets indexed.
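Before uploading, it can help to check how many URLs ended up in the file, since the sitemap protocol allows at most 50,000 URLs per sitemap file. The following is only a sketch: the file name checkSitemap.pl and the sub count_locs are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: count the <loc> entries in a sitemap given as text.
sub count_locs {
    my ($xml) = @_;
    my $count = () = $xml =~ /<loc>/g;   # count all matches
    return $count;
}

# Usage (assumed file name): perl checkSitemap.pl mySitemapName.xml
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    local $/;                            # read the whole file at once
    my $xml = <$fh>;
    close($fh);
    my $n = count_locs($xml);
    print "$n URLs found\n";
    print "Warning: more than 50000 URLs, split the sitemap\n" if $n > 50000;
}
```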
Purpose:
I wrote this because I could not find any useful tools for easily creating XML sitemaps.
My site contains more than 800 HTML links.
It is just a quick fix.
Performance:
It seems to work perfectly, but it is a beta version.
Disadvantage: be careful not to upload more than needed from your hard drive.
The script indexes the current directory and all directories below it.
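One possible way to limit what gets indexed is a skip list of directory names. This is a sketch and not part of the script; the directory names and the sub dir_is_wanted are assumptions chosen for the example.

```perl
use strict;
use warnings;

# Assumption: these directory names are examples only; adjust them to your site.
my %skip_dir = map { $_ => 1 } qw(private backup .git);

# Hypothetical helper: true if a directory with this name should be indexed.
sub dir_is_wanted {
    my ($name) = @_;
    return !$skip_dir{$name};
}

# Inside getdir this could be used as:
#   next if -d $file && !dir_is_wanted($file);
print dir_is_wanted('images')  ? "index images\n"  : "skip images\n";
print dir_is_wanted('private') ? "index private\n" : "skip private\n";
```

The check compares directory names only, so a directory called backup is skipped at every level of the tree.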
Donation:
If you want to support my work, thank you.
Source Code description:
A full description will be added later.
For now, see the comments in the source file.
1: #!/usr/bin/perl
2:
3: use strict;
4: use warnings;
5: use Cwd 'abs_path';
6: use POSIX;
7:
8: # (c)2019 Comweb NL All RIGHTS RESERVED
9: # RJHMM van den Bergh sales2@comweb.nl
10: # use at own risk, no liability
11:
12: # usage: cd to/Your/HTML/root/directory/on/your/win32/system
13: # perl makeSitemapXML.pl > mySitemapName.xml
14:
15: # EDIT THIS ONE IF NEEDED
16: my $prefix="https://www.comweb.nl";
17:
18: print <<HEADER;
19: <?xml version="1.0" encoding="UTF-8"?>
20: <urlset
21: xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
22: xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
23: xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
24: http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
25:
26: <!-- Created by Perl script provided by Comweb Netherlands -->
27: <!-- (c)Comweb.nl RJHM van den Bergh -->
28: <!-- Free for non commercial use -->
29:
30: HEADER
31:
32: my $startlocationDirectory=abs_path(".");
33: my $level=0; # directory level
34:
35: getdir($startlocationDirectory);
37:
38: sub getdir
39: {
40: my $dir = shift;
41: my $path="";
42: opendir (my $dh, $dir) || die "Cannot opendir: $dir: $!";
43:
44: while( my $file = readdir($dh))
45: {
46: next if ($file =~ m[^\.{1,2}$]); # Ignore . and ..
47: if (-d $file) # is directory?
48: {
49: # directory found add a comment in the sitemap for readability
50: print "<!-- DIR $prefix/$file -->\n";
51:
52: # descend one directory level (into the subdirectory)
53: $path = $dir .'/'. $file; # note: $file here is a directory name
54: chdir($path); $level++; # change working directory
55: $path=getcwd(); # get working directory
56: $path =~ s/\\/\//g ; # change \ (w32) to / (Internet url)
57:
58: getdir($path); # recursive call of this function
59:
60: # leaving the subdirectory,
61: # so go back to the parent directory
62: chdir(".."); $level--; # change working directory
63: $path=getcwd(); # get working directory
64: $path =~ s/\\/\//g ; # change \ (w32) to / (Internet url)
65:
66: } elsif (-f $file) #no directory perhaps is a file?
67: {
68: # it is not a directory but a file
69: # only look for certain types of extensions
70: if ($file =~ m/\.(html|htm|txt|shtml|zip|java|php|jpg|png|jpeg|gif|avi|pdf|mpg|mpeg)$/i)
71: {
72: # print the <url> (mandatory)
73: print "<url>\n";
74:
75: # calculate Internet location url
76: my $filepath=getcwd(); # get working directory
77: $filepath =~ s/\\/\//g ; # change \ (w32) to / (Internet url)
78: # need to replace first part of path c:\.... with https://...... ($prefix)
79: my $location = $filepath;
80: $location =~ s/^\Q$startlocationDirectory\E/$prefix/; # \Q..\E: treat the path literally
81: $location .="/$file"; # do not forget to append the filename
82:
83: # some characters are not allowed in sitemap urls, like & < > and whitespace
84: my $replace="&amp;";
85: $location =~ s/\&/$replace/g ; # substitute & with &amp;
86: $replace="%20";
87: $location =~ s/ /$replace/g ; # substitute whitespace with %20
88:
89: # print the <loc>https://....myurl</loc> (mandatory)
90: print " <loc>$location</loc>\n";
91:
92: # calculate a priority
93: my $priority=ceil((1*0.9**$level)*100)/100;
94: # print the <priority>x.x</priority> tag
95: print " <priority>$priority</priority>\n";
96:
97: # close the <url> tag
98: print "</url>\n";
99: }
100: }
101: }
102: closedir ($dh) || die "Failed to close directory: $!";
103: }
104:
105: # print footer with closing </urlset>
106: print <<FOOTER;
107:
108: </urlset>
109:
110: <!-- Created by Perl script provided by Comweb Netherlands -->
111: <!-- (c)2019 Comweb.nl RJHM van den Bergh -->
112: <!-- Free for non commercial use -->
113: <!-- use at own risk, beta version -->
114:
115: FOOTER
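The comment on line 83 of the listing names & < > and whitespace as problem characters. A slightly fuller escaping step could look like the sketch below; the sub name escape_loc and the example URL are made up for illustration, and this is not the author's code.

```perl
use strict;
use warnings;

# Hypothetical helper: escape a URL for use inside a <loc> tag.
sub escape_loc {
    my ($url) = @_;
    $url =~ s/&/&amp;/g;    # must come first, so later entities are not escaped twice
    $url =~ s/</&lt;/g;
    $url =~ s/>/&gt;/g;
    $url =~ s/"/&quot;/g;
    $url =~ s/'/&apos;/g;
    $url =~ s/ /%20/g;      # whitespace as in the listing
    return $url;
}

print escape_loc("https://www.example.com/a b&c.html"), "\n";
# prints https://www.example.com/a%20b&amp;c.html
```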