making them store-and-forward friendly.
@menu
+* Index files for freqing: FreqIndex.
* Postfix::
* Web feeds: Feeds.
* Web pages: WARCs.
* BitTorrent and huge files: BitTorrent.
+* Downloading service: DownloadService.
* Git::
* Multimedia streaming: Multimedia.
@end menu
+@node FreqIndex
+@section Index files for freqing
+
+In many cases you do not know the exact list of files on the remote
+machine you want to freq from, because the files there can change. It is
+useful to run a cron job on that machine that creates a file listing,
+which you can then freq and search:
+
+@example
+0 4 * * * cd /storage ; tmp=`mktemp` ; \
+ tree -f -h -N --du --timefmt \%Y-\%m-\%d |
+ zstdmt -19 > $tmp && chmod 644 $tmp && mv $tmp TREE.txt.zst ; \
+ tree -J -f --timefmt \%Y-\%m-\%d |
+ zstdmt -19 > $tmp && chmod 644 $tmp && mv $tmp TREE.json.zst
+@end example
+
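Once the listing has been freq-ed to your side, it can be searched
locally. The following is only a minimal sketch under stated assumptions:
the @code{search_listing} helper and the @env{DECOMPRESS} variable are
hypothetical names, not part of NNCP, and @file{TREE.txt.zst} is assumed
to have already arrived in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: print the listing entries that match a pattern.
# DECOMPRESS is an assumed override hook; it defaults to "zstd -dc".
search_listing() {
    listing=$1
    pattern=$2
    # Left unquoted on purpose so that "zstd -dc" splits into command and flag.
    ${DECOMPRESS:-zstd -dc} "$listing" | grep -i -- "$pattern"
}

# Usage (assuming TREE.txt.zst was already received):
#   search_listing TREE.txt.zst '\.iso'
```

Paths found this way can then be requested from the remote node with
@command{nncp-freq}.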
@node Postfix
@section Integration with Postfix
-This section is taken from @url{http://www.postfix.org/nncp_README.html,
+This section is taken from @url{http://www.postfix.org/UUCP_README.html,
Postfix and UUCP} manual and just replaces UUCP-related calls with NNCP
ones.
@itemize
-@item You need an @ref{nncp-mail} program that extracts the sender
+@item You need an @ref{nncp-exec} program that extracts the sender
address from mail that arrives via NNCP, and that feeds the mail into
the Postfix @command{sendmail} command.
@item Define a @command{pipe(8)} based mail delivery transport for
delivery via NNCP:
-@verbatim
+@example
/usr/local/etc/postfix/master.cf:
nncp unix - n n - - pipe
- flags=F user=nncp argv=nncp-mail -quiet $nexthop $recipient
-@end verbatim
+ flags=F user=nncp argv=nncp-exec -quiet $nexthop sendmail $recipient
+@end example
-This runs the @command{nncp-mail} command to place outgoing mail into
+This runs the @command{nncp-exec} command to place outgoing mail into
the NNCP queue after replacing @var{$nexthop} by the receiving NNCP
node and after replacing @var{$recipient} by the recipients. The
-@command{pipe(8)} delivery agent executes the @command{nncp-mail}
+@command{pipe(8)} delivery agent executes the @command{nncp-exec}
command without assistance from the shell, so there are no problems with
shell meta characters in command-line parameters.
@item Specify that mail for @emph{example.com}, should be delivered via
NNCP, to a host named @emph{nncp-host}:
-@verbatim
+@example
/usr/local/etc/postfix/transport:
example.com nncp:nncp-host
.example.com nncp:nncp-host
-@end verbatim
+@end example
See the @command{transport(5)} manual page for more details.
@item Enable @file{transport} table lookups:
-@verbatim
+@example
/usr/local/etc/postfix/main.cf:
transport_maps = hash:$config_directory/transport
-@end verbatim
+@end example
@item Add @emph{example.com} to the list of domains that your site is
willing to relay mail for.
-@verbatim
+@example
/usr/local/etc/postfix/main.cf:
relay_domains = example.com ...other relay domains...
-@end verbatim
+@end example
See the @option{relay_domains} configuration parameter description for
details.
@itemize
-@item You need an @ref{nncp-mail} program that extracts the sender
+@item You need an @ref{nncp-exec} program that extracts the sender
address from mail that arrives via NNCP, and that feeds the mail into
the Postfix @command{sendmail} command.
@item Specify that all remote mail must be sent via the @command{nncp}
mail transport to your NNCP gateway host, say, @emph{nncp-gateway}:
-@verbatim
+@example
/usr/local/etc/postfix/main.cf:
relayhost = nncp-gateway
default_transport = nncp
-@end verbatim
+@end example
Postfix 2.0 and later also allows the following more succinct form:
-@verbatim
+@example
/usr/local/etc/postfix/main.cf:
default_transport = nncp:nncp-gateway
-@end verbatim
+@end example
@item Define a @command{pipe(8)} based message delivery transport for
mail delivery via NNCP:
-@verbatim
+@example
/usr/local/etc/postfix/master.cf:
nncp unix - n n - - pipe
- flags=F user=nncp argv=nncp-mail -quiet $nexthop $recipient
-@end verbatim
+ flags=F user=nncp argv=nncp-exec -quiet $nexthop sendmail $recipient
+@end example
-This runs the @command{nncp-mail} command to place outgoing mail into
+This runs the @command{nncp-exec} command to place outgoing mail into
the NNCP queue. It substitutes the hostname (@emph{nncp-gateway}, or
-whatever you specified) and the recipients before executing the command.
-The @command{nncp-mail} command is executed without assistance from the
-shell, so there are no problems with shell meta characters.
+whatever you specified) and the recipients before execution of the
+command. The @command{nncp-exec} command is executed without assistance
+from the shell, so there are no problems with shell meta characters.
@item Execute the command @command{postfix reload} to make the changes
effective.
supports them too.
After installing @command{rss2email}, create configuration file:
-@verbatim
-% r2e new rss-robot@address.com
-@end verbatim
+
+@example
+$ r2e new rss-robot@@address.com
+@end example
+
and add feeds you want to retrieve:
-@verbatim
-% r2e add https://git.cypherpunks.ru/cgit.cgi/nncp.git/atom/?h=master
-@end verbatim
+
+@example
+$ r2e add https://git.cypherpunks.ru/cgit.cgi/nncp.git/atom/?h=master
+@end example
+
and run the process:
-@verbatim
-% r2e run
-@end verbatim
+
+@example
+$ r2e run
+@end example
@node WARCs
@section Integration with Web pages
Simple HTML web page can be downloaded very easily for sending and
viewing it offline after:
-@verbatim
-% wget http://www.example.com/page.html
-@end verbatim
+
+@example
+$ wget http://www.example.com/page.html
+@end example
But most web pages contain links to images, CSS and JavaScript files,
required for complete rendering.
@url{https://www.gnu.org/software/wget/, GNU Wget} supports parsing such
documents and understanding page dependencies. You can download
the whole page with its dependencies the following way:
-@verbatim
-% wget \
+
+@example
+$ wget \
--page-requisites \
--convert-links \
--adjust-extension \
--random-wait \
--execute robots=off \
http://www.example.com/page.html
-@end verbatim
+@end example
+
that will create @file{www.example.com} directory with all files
necessary to view @file{page.html} web page. You can create single file
compressed tarball with that directory and send it to remote node:
-@verbatim
-% tar cf - www.example.com | xz -9 |
- nncp-file - remote.node:www.example.com-page.tar.xz
-@end verbatim
+
+@example
+$ tar cf - www.example.com | zstd |
+ nncp-file - remote.node:www.example.com-page.tar.zst
+@end example
But there are multi-paged articles, there are the whole interesting
sites you want to get in a single package. You can mirror the whole web
site by utilizing @command{wget}'s recursive feature:
-@verbatim
-% wget \
+
+@example
+$ wget \
--recursive \
--timestamping \
-l inf \
--no-parent \
[...]
http://www.example.com/
-@end verbatim
+@end example
There is a standard for creating
@url{https://en.wikipedia.org/wiki/Web_ARChive, Web ARChives}:
@strong{WARC}. Fortunately again, @command{wget} supports it as an
output format.
-@verbatim
-% wget \
+
+@example
+$ wget \
--warc-file www.example_com-$(date '+%Y%m%d%H%M%S') \
--no-warc-compression \
--no-warc-keep-log \
[...]
http://www.example.com/
-@end verbatim
+@end example
+
That command will create uncompressed @file{www.example_com-XXX.warc}
web archive. By default, WARCs are compressed using
@url{https://en.wikipedia.org/wiki/Gzip, gzip}, but, in example above,
-we have disabled it to compress with stronger @command{xz}, before
-sending via @command{nncp-file}.
+we have disabled it to compress with stronger and faster
+@url{https://en.wikipedia.org/wiki/Zstd, zstd}, before sending via
+@command{nncp-file}.
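The compress-and-send step could be sketched as follows. This is an
illustration, not a script shipped with NNCP: the @code{warc_stamp} and
@code{send_warc} helpers are hypothetical, @env{NNCP_FILE} is an override
hook invented here for dry runs, and @command{zstd} is assumed to be
installed.

```shell
#!/bin/sh
# Timestamp suffix for archive names:
# year, month, day, hour, minute, second.
warc_stamp() {
    date '+%Y%m%d%H%M%S'
}

# Compress an uncompressed WARC and queue it for a node.
# $1: path to the .warc file, $2: destination node name.
send_warc() {
    # --rm deletes the source file after successful compression.
    zstd -q --rm "$1" &&
        ${NNCP_FILE:-nncp-file} "$1.zst" "$2:"
}

# Usage (assuming www.example_com-XXX.warc exists):
#   send_warc www.example_com-XXX.warc remote.node
```

In @command{date} format strings @code{%m} is the month and @code{%M} is
the minute, so the order above gives names that sort chronologically.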
There is plenty of software that acts as an HTTP proxy for your browser,
allowing you to view such WARC files. However, you can also extract files
from the archive using the @url{https://pypi.python.org/pypi/Warcat, warcat}
utility, producing the usual directory hierarchy:
-@verbatim
-% python3 -m warcat extract \
+
+@example
+$ python3 -m warcat extract \
www.example_com-XXX.warc \
--output-dir www.example.com-XXX \
--progress
-@end verbatim
-
-Also you can create separate NNCP node those mail receiver will be the
-script downloading website's page and send you its WARC representation
-as a file. You can configure @option{sendmail} option like this:
-
-@verbatim
-% cat /usr/local/etc/nncp.yaml
-[...]
- stargrave.org:
- [...]
- sendmail: ["/bin/sh", "/path/to/warcer.sh"]
-[...]
-@end verbatim
-
-And @file{warcer.sh} contents are:
-
-@verbatim
-#!/bin/sh -ex
-
-user_agent="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
-
-name="$1"
-read cmdline
-
-tmp=$(mktemp -d)
-cd $tmp
-warc_name=$name-$(date '+%Y%M%d%H%m%S')
-wget \
- --page-requisites \
- --convert-links \
- --adjust-extension \
- --restrict-file-names=ascii \
- --span-hosts \
- --random-wait \
- --execute robots=off \
- --user-agent "$user_agent" \
- --reject '*.woff*,*.ttf,*.eot,*.js' \
- --tries 10 \
- --warc-file $warc_name \
- --no-warc-compression \
- --no-warc-keep-log \
- $cmdline || :
-xz -9 "$warc_name".warc
-nncp-file "$warc_name".warc.xz $NNCP_SENDER:
-rm -r $tmp
-@end verbatim
-
-Now you can queueu that node to send you some website's page:
-
-@verbatim
-% echo http://www.nncpgo.org/Postfix.html |
- nncp-mail remote.node nncp-postfix-page
-@end verbatim
+@end example
@node BitTorrent
@section BitTorrent and huge files
accelerate HTTP*/*FTP downloads by segmented multiple parallel
connections.
-You can queue you files after they are completely downloaded:
-@verbatim
-% cat send-downloaded.sh
-#!/bin/sh
-
-if [ "$2" -eq 0 ]; then
- # This could be downloaded .torrent file itself
- exit 0
-fi
-
-if [ "$2" -gt 1 ]; then
- # This is directory downloaded with BitTorrent
- wholedir="$(dirname "$3")"
- name=$(basename "$wholedir")
- cd "$wholedir"/..
- tartmp=$(mktemp ./finished.XXXXXX)
- tar cf $tartmp "$name"
- nncp-file -chunked $(( 1024 * 100 )) $tartmp remote:"$name".tar
- rm $tartmp
-else
- nncp-file -chunked $(( 1024 * 100 )) "$3" remote:
-fi
-
-% aria2c \
- --on-download-complete send-downloaded.sh \
- http://example.org/file.iso \
- http://example.org/file.iso.asc
-% aria2c \
- --on-bt-download-complete send-downloaded.sh \
- http://example.org/file.torrent
-@end verbatim
+You can queue your files after they are completely downloaded.
+@file{aria2-downloaded.sh} contents:
+
+@verbatiminclude aria2-downloaded.sh
Also you can prepare
@url{http://aria2.github.io/manual/en/html/aria2c.html#files, input file}
with the jobs you want to download:
-@verbatim
-% cat jobs
+
+@example
+$ cat jobs
http://www.nncpgo.org/download/nncp-0.11.tar.xz
out=nncp.txz
http://www.nncpgo.org/download/nncp-0.11.tar.xz.sig
out=nncp.txz.sig
-% aria2c \
- --on-download-complete send-downloaded.sh \
+$ aria2c \
+ --on-download-complete aria2-downloaded.sh \
--input-file jobs
-@end verbatim
+@end example
+
and all that downloaded (@file{nncp.txz}, @file{nncp.txz.sig}) files
will be sent to @file{remote.node} when finished.
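A jobs file like the one above can also be generated. The sketch below
assumes the input file layout shown in the example, a URL line followed
by an indented @code{out=} option line. The @code{job_entry} and
@code{make_jobs} helpers are hypothetical names used only for
illustration.

```shell
#!/bin/sh
# Print one aria2 input-file entry: the URL, then an indented out= option.
# $1: URL to download, $2: output file name.
job_entry() {
    printf '%s\n  out=%s\n' "$1" "$2"
}

# Print entries for a tarball and its signature
# (URLs and names taken from the example above).
make_jobs() {
    base=http://www.nncpgo.org/download/nncp-0.11.tar.xz
    job_entry "$base" nncp.txz
    job_entry "$base.sig" nncp.txz.sig
}

# Usage:
#   make_jobs > jobs
```

Option lines must begin with whitespace, which is how
@command{aria2c} tells them apart from URL lines.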
+@node DownloadService
+@section Downloading service
+
+The previous sections describe downloading manually and sending the
+results to a remote node. But you may wish to initiate a download
+remotely. That is easily solved with @ref{CfgExec, exec} handles.
+
+@verbatim
+exec: {
+ warcer: ["/bin/sh", "/path/to/warcer.sh"]
+ wgeter: ["/bin/sh", "/path/to/wgeter.sh"]
+ aria2c: [
+ "/usr/local/bin/aria2c",
+ "--on-download-complete", "aria2-downloaded.sh",
+ "--on-bt-download-complete", "aria2-downloaded.sh"
+ ]
+}
+@end verbatim
+
+@file{warcer.sh} contents:
+
+@verbatiminclude warcer.sh
+
+@file{wgeter.sh} contents:
+
+@verbatiminclude wgeter.sh
+
+Now you can ask that node to send you a website's page, a file or
+BitTorrent downloads:
+
+@example
+$ echo http://www.nncpgo.org/Postfix.html |
+ nncp-exec remote.node warcer postfix-whole-page
+$ echo http://www.nncpgo.org/Postfix.html |
+ nncp-exec remote.node wgeter postfix-html-page
+$ echo \
+ http://www.nncpgo.org/download/nncp-0.11.tar.xz \
+ http://www.nncpgo.org/download/nncp-0.11.tar.xz.sig |
+ nncp-exec remote.node aria2c
+@end example
+
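The general shape of such a handle can be sketched as below. This is a
simplified illustration, not the contents of @file{warcer.sh} or
@file{wgeter.sh}. It assumes the handle gets its extra argument as
@code{$1}, the request body on standard input, and the requesting node
name in @env{NNCP_SENDER}. The @code{fetch_handle} function and the
@env{FETCH} and @env{NNCP_FILE} override hooks are hypothetical.

```shell
#!/bin/sh
# Sketch of an exec handle: $1 is the argument given after the handle name,
# the request body (a URL) arrives on stdin, and NNCP_SENDER names the
# node that made the request.
fetch_handle() {
    name=$1
    read -r url || return 1
    tmp=$(mktemp -d) || return 1
    out="$tmp/$name-$(date '+%Y%m%d%H%M%S')"
    # FETCH and NNCP_FILE are assumed override hooks for dry runs.
    if ${FETCH:-wget --output-document} "$out" "$url"; then
        # Send the downloaded file back to whoever asked for it.
        ${NNCP_FILE:-nncp-file} "$out" "$NNCP_SENDER:"
        status=$?
    else
        status=1
    fi
    rm -r "$tmp"
    return $status
}

# When installed as a handle script, the body would end with:
#   fetch_handle "$@"
```

The temporary directory is removed whether or not the download succeeds,
so failed requests do not leave files behind.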
@node Git
@section Integration with Git
everything you need.
Use it to create bundles containing all required blobs/trees/commits and tags:
-@verbatim
-% git bundle create repo-initial.bundle master --tags --branches
-% git tag -f last-bundle
-% nncp-file repo-initial.bundle remote.node:repo-$(date % '+%Y%M%d%H%m%S').bundle
-@end verbatim
+
+@example
+$ git bundle create repo-initial.bundle master --tags --branches
+$ git tag -f last-bundle
+$ nncp-file repo-initial.bundle remote.node:repo-$(date '+%Y%m%d%H%M%S').bundle
+@end example
Work with Git as usual: commit, add, branch, checkout, etc. When
you decide to queue your changes for sending, create a diff-ed bundle and
transfer it:
-@verbatim
-% git bundle create repo-$(date '+%Y%M%d%H%m%S').bundle last-bundle..master
+
+@example
+$ git bundle create repo-$(date '+%Y%m%d%H%M%S').bundle last-bundle..master
or maybe
-% git bundle create repo-$(date '+%Y%M%d').bundle --since=10.days master
-@end verbatim
+$ git bundle create repo-$(date '+%Y%m%d').bundle --since=10.days master
+@end example
A bundle received on the remote machine acts like a usual remote:
-@verbatim
-% git clone -b master repo-XXX.bundle
-@end verbatim
+
+@example
+$ git clone -b master repo-XXX.bundle
+@end example
+
overwrite @file{repo.bundle} file with newer bundles you retrieve and
fetch all required branches and commits:
-@verbatim
-% git pull # assuming that origin remote points to repo.bundle
-% git fetch repo.bundle master:localRef
-% git ls-remote repo.bundle
-@end verbatim
+
+@example
+$ git pull # assuming that origin remote points to repo.bundle
+$ git fetch repo.bundle master:localRef
+$ git ls-remote repo.bundle
+@end example
Bundles are also useful when cloning huge repositories (like Linux has).
Git's native protocol does not support any kind of interrupted download
bundle, you can add an ordinary @file{git://} remote and fetch the
difference.
+Also you can find the following exec-handler useful:
+
+@verbatiminclude git-bundler.sh
+
+It lets you request bundles like this:
+@code{echo some-old-commit..master | nncp-exec REMOTE bundler REPONAME}.
+
@node Multimedia
@section Integration with multimedia streaming
and @emph{YouTube}.
When your multimedia becomes an ordinary file, you can transfer it easily.
-@verbatim
-% youtube-dl \
- --exec 'nncp-file {} remote.node:' \
+
+@example
+$ youtube-dl \
+ --exec 'nncp-file @{@} remote.node:' \
'https://www.youtube.com/watch?list=PLd2Cw8x5CytxPAEBwzilrhQUHt_UN10FJ'
-@end verbatim
+@end example