narf, version 0.93.01
cleanfeed-inn 0.95 modified for MD5 body hashes
(or as patches against 0.95)
our filter (last updated December 9 1997)
To start with, you need a copy of narf and a perl filter for it. Narf is somewhat tuned for our filter (and has only been run with it), but with minor customization, if any, it should work with any INN perl filter (a good source of links to them is Jeremy Nixon's Anti-Spam Software page). Unlike perl filters running inside INN itself, narf's filters can use all of perl's features; in particular, they can load modules with use statements. Narf requires perl 5.004. Our filter and the MD5-modified cleanfeed-inn also both require an installed copy of the perl MD5 module from CPAN.
You should also have read the main narf page, particularly the how it works section.
Narf needs to know the paths to various directories and files. The most important is $FILTDIR, the directory that holds the filter_innd.pl filter; it must be set in narf itself and cannot be set at runtime by a command-line argument. Although you can set most of the others with command-line arguments (see the comments at the start of the script for details), you should probably set your normal defaults in the script.
There are a number of other defaults and options to set in the configuration section. Pay particular attention to the normal value of $verbose, since logging accepted articles can create very big logs. Our logs often reach sixty or seventy megabytes a day, and while we find the information useful, you may opt for a slimmer log.
Make sure that the file you have chosen for narf's saved list of recognized EMP signatures is writable by your news user. If it is not, recognized EMP signatures will not be saved across runs. If the file doesn't exist yet, make sure that the news user can create it in that directory; you may want to touch and chown the file the first time around.
If you will be feeding posts by local users through narf, make sure that $DUMPDIR is defined and is a directory that the news user can create files in.
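Both of the permission requirements above can be checked before the first run. The following is a minimal Python sketch, not part of narf; the paths in it are hypothetical placeholders for whatever you configured in narf. Python's os.access answers for the real user ID, so run it as your news user (for example via su); run as root, it will report nearly everything as writable.

```python
import os


def can_create_in(directory):
    """Return True if the current user may create new files in directory."""
    # Creating a file needs both write and search (execute) permission
    # on the directory itself.
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)


def check_narf_paths(sigfile, dumpdir=None):
    """Return a list of problems with narf's writable paths (empty means OK).

    sigfile is the saved EMP signature file; dumpdir is $DUMPDIR, which is
    only checked if given (i.e. if you feed local posts through narf).
    """
    problems = []
    if os.path.exists(sigfile):
        if not os.access(sigfile, os.W_OK):
            problems.append(f"{sigfile}: exists but is not writable")
    elif not can_create_in(os.path.dirname(sigfile) or "."):
        problems.append(f"{sigfile}: missing, and its directory is not writable")
    if dumpdir is not None and not can_create_in(dumpdir):
        problems.append(f"{dumpdir}: not a directory this user can create files in")
    return problems


if __name__ == "__main__":
    # Hypothetical paths; substitute the values configured in narf.
    for problem in check_narf_paths("/var/news/narf/empsigs", "/var/news/narf/dump"):
        print("problem:", problem)
```

An empty result means narf should be able to save its EMP signatures and, if you use it, write rejected local articles into $DUMPDIR.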
Reconfigure your NNTP daemon to write incoming batches and articles into the source batch directory you've edited into narf. If you already filter local posts through some spam-checking NNTP filter, you may want to arrange for it to write POST transfers from your users to a different place than IHAVE transfers from your peers. Narf does an acceptable job of filtering local posts if configured correctly, and we strongly recommend that you filter local posts somehow.
If narf rejects a post that it identifies as local, it always saves a copy of the rejected article in $DUMPDIR, under a naming scheme designed to make such saved articles easy to spot. This relies on $DUMPDIR being configured; if you are going to feed local posts through narf, we strongly recommend that you configure it. Because of how narf works, the posting user gets no indication that their article has been rejected; all they will see is that their articles are not showing up.
Narf requires you to provide a filter to run articles through to determine whether to accept or reject them. Narf can use INN perl filters unmodified, although it will perform better if you modify narf, the filter, or both. A discussion of how to write an INN perl filter is beyond the scope of this document; if you are interested, start from one of the existing ones. The filter you choose must be installed as filter_innd.pl in $FILTDIR.
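Because the filter must sit at a fixed name in $FILTDIR, a quick check that it is there and compiles can save a confusing first run. This Python sketch is hypothetical, not a narf feature. It runs perl -c only if a perl binary is on the PATH; a filter that depends on things only INN (or narf) provides at run time may need the syntax check skipped.

```python
import os
import shutil
import subprocess


def check_filter(filtdir, syntax_check=True):
    """Return a list of problems with filtdir/filter_innd.pl (empty means OK)."""
    path = os.path.join(filtdir, "filter_innd.pl")
    if not os.path.isfile(path):
        return [f"{path}: no such file"]
    if not os.access(path, os.R_OK):
        return [f"{path}: not readable"]
    perl = shutil.which("perl")
    if syntax_check and perl:
        # 'perl -c' compiles the file without running its main code (BEGIN
        # blocks and use statements still run), so it also catches missing
        # modules such as MD5.
        result = subprocess.run([perl, "-c", path], capture_output=True, text=True)
        if result.returncode != 0:
            return [f"{path}: perl -c failed:\n{result.stderr.strip()}"]
    return []
```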
Our filter is oriented towards aggressive spam discarding, including lots of judgement calls about things we don't want. It's suitable mainly for people who want to be as aggressive as we are or who want to see what can be done.
Jeremy Nixon's cleanfeed filter is designed to be safer, rejecting only things that are pretty certain to be spam (although it is still healthily aggressive); a discussion of what it rejects is here. We recommend that you use our small revision of it, which uses MD5 body hashes as its primary method of detecting new spam, since this seems to work very well; you can get that version here (0.95-based version), or fetch a current copy of cleanfeed-inn and see if our diffs apply cleanly. We have successfully run narf with both this filter and an unmodified cleanfeed 0.95 in tests, but do not use either in production (since we prefer our own filter).
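The idea behind MD5 body hashing is that a spam run posts the same body many times, so counting identical body hashes catches it even when the headers vary. The Python sketch below illustrates only that general technique; it is not the modified cleanfeed's code, and its whitespace normalization and threshold are assumptions.

```python
import hashlib
from collections import Counter


class BodyHashTracker:
    """Flag article bodies that have been seen too many times.

    Hypothetical sketch: the normalization and threshold are assumptions,
    not the rules used by the modified cleanfeed.
    """

    def __init__(self, threshold=10):
        self.threshold = threshold
        self.counts = Counter()

    @staticmethod
    def body_hash(body):
        # Collapse runs of whitespace so trivially reformatted copies hash alike.
        normalized = " ".join(body.split())
        return hashlib.md5(normalized.encode("utf-8", "replace")).hexdigest()

    def seen(self, body):
        """Record one copy of body; return True once it exceeds the threshold."""
        digest = self.body_hash(body)
        self.counts[digest] += 1
        return self.counts[digest] > self.threshold


if __name__ == "__main__":
    tracker = BodyHashTracker(threshold=2)
    for copy in ["Buy now!\n", "Buy  now!", "Buy now!\n\n"]:
        print(tracker.seen(copy))  # prints False, False, True
```

A real filter must also expire old hashes so that memory stays bounded, which is one reason filters carry sizing settings for how much they keep in memory.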
Every filter, and especially ours, contains policy decisions that you may not agree with. You should at least skim the source and make sure that you agree with everything it is doing and with all the sorts of spam that it is killing. Your chosen filter may also contain options that you need to tweak, or sizing information for how much data it stores in memory. If you are using our filter (which we recommend, of course), then you probably want to read about why we have our filter do what it does.
If you do not do this and your chosen filter discards all of your newsfeed, you have only yourself to blame.
If you are not using our filter, you should examine narf and edit it if necessary. Potentially troublesome spots are its recognition of which spam rejections are safe to block cancels for and its method of saving any recognized EMP signatures your filter uses. If you are using a filter derived from cleanfeed-inn 0.95, both of these will work and you do not need to do anything; otherwise, see the narf source code.
There are a number of perl code tweaks that you can make to a filter to make what narf logs more interesting; generally they involve defining subroutines in the filter that narf calls. Subroutines to define:
As shipped, both narf and our filter default to rejecting cancels when they can. Fuller explanations of why they do this are on their main pages, here for narf and here for our filter. If you do not want this behavior, you need to change the setting of $cancreject in narf and of $block_cancels in our filter. Changing the setting in only one of them does not turn the behavior off in the other.
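Since the two settings are independent, it can be worth checking that they agree after editing. The Python sketch below assumes (an assumption, not something narf or the filter guarantees) that each setting is a simple one-line literal assignment such as $cancreject = 1; in the respective perl source; anything more elaborate needs checking by hand.

```python
import re

# Perl literals that are false; anything else is treated as true.
FALSE_LITERALS = {"0", "''", '""', "'0'", '"0"', "undef"}


def perl_setting(source, name):
    """Return the literal last assigned to scalar $name in perl source, or None.

    Only recognizes simple one-line assignments like '$name = 1;'.
    """
    pattern = re.compile(
        r"^\s*(?:my\s+|our\s+)?\$" + re.escape(name) + r"\s*=\s*([^;]+?)\s*;", re.M
    )
    matches = pattern.findall(source)
    return matches[-1] if matches else None


def perl_true(literal):
    return literal not in FALSE_LITERALS


def cancel_settings_agree(narf_source, filter_source):
    """True if narf's $cancreject and the filter's $block_cancels match."""
    narf_value = perl_setting(narf_source, "cancreject")
    filter_value = perl_setting(filter_source, "block_cancels")
    if narf_value is None or filter_value is None:
        raise ValueError("setting not found; check the sources by hand")
    return perl_true(narf_value) == perl_true(filter_value)
```

Read narf and $FILTDIR/filter_innd.pl into strings and pass them to cancel_settings_agree; a False result means one side still blocks cancels.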
Narf should be run as your news user, and it needs to be restarted automatically on reboot; edit whatever startup scripts you have to arrange for this. When narf starts, its standard output and standard error should be redirected into a file. Anything that appears there is either a goof in editing your filter or a serious problem.
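In most startup scripts this is a single shell redirection. As an illustration of what is involved, here is a Python sketch that starts narf detached from the terminal with both output streams appended to one file; the narf path is a hypothetical placeholder, and the calling process must already be running as the news user.

```python
import subprocess


def start_narf(command, log_path):
    """Start narf in the background with stdout and stderr appended to log_path.

    command is the argument list, e.g. ["/usr/local/news/bin/narf"]
    (a hypothetical path).
    """
    log = open(log_path, "ab")
    try:
        # Both streams go to the same file; anything appearing there is
        # either a filter editing mistake or a serious problem.
        return subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # detach from the caller's session
        )
    finally:
        log.close()  # the child keeps its own copy of the descriptor
```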
narfsum (last updated November 16 1997)
narfhippo (last updated November 22 1997)
If you intend to do anything with the log narf produces, you may want to start with our log summary software. Narfsum is a shell and awk script that produces our rejection volume reports, while narfhippo is a perl program that produces our rejection sources reports.
Narfsum was written by P Kern, while narfhippo (modeled on the SpamHippo reports) was written by Chris Siebenmann.
Before starting narf for the first time you should read about narf operation.
This page is part of our narf pages.