March 10, 2010

UTF8 with Perl CGI.pm and Ajax

It took me about three hours to figure out something this simple, so I'm posting this here!

If you're doing Ajax with CGI.pm in Perl, you'll want to exchange your data in UTF-8. To do so, you need only one special line of code*:

my $json = decode utf8=>$q->param('json');

This decodes the UTF-8 bytes of the request parameter into a Perl character string, so Perl treats $json as text rather than raw bytes.

If you want to put this (or part of it) in a database, you don't need to do anything special because the DBD driver will convert your string from UTF8 to the database encoding (UCS-2 in my case).

*Of course, you'll also need to use Encode;.

Posted by gfk at 1:55 PM | Comments (0) | TrackBack

January 23, 2008

msdb is huge!

On our SQL Server 2005 DB, I found out that the msdb.dbo.sysmaintplan_logdetail table was more than 6 GB, so I ran the following SQL code to reduce it to 60 MB. It took about 10 minutes to run the query, so don't panic.

I mainly post it here to remember what to run next time I have this problem, but it could be useful to someone else. :-)

USE msdb;
ALTER TABLE [dbo].[sysmaintplan_log] DROP CONSTRAINT [FK_sysmaintplan_log_subplan_id];
ALTER TABLE [dbo].[sysmaintplan_logdetail] DROP CONSTRAINT [FK_sysmaintplan_log_detail_task_id];

truncate table msdb.dbo.sysmaintplan_logdetail;
truncate table msdb.dbo.sysmaintplan_log;

ALTER TABLE [dbo].[sysmaintplan_log] WITH CHECK ADD CONSTRAINT [FK_sysmaintplan_log_subplan_id] FOREIGN KEY([subplan_id])
REFERENCES [dbo].[sysmaintplan_subplans] ([subplan_id]); 

ALTER TABLE [dbo].[sysmaintplan_logdetail] WITH CHECK ADD CONSTRAINT [FK_sysmaintplan_log_detail_task_id] FOREIGN KEY([task_detail_id])
REFERENCES [dbo].[sysmaintplan_log] ([task_detail_id]) ON DELETE CASCADE;

From: http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=2573257&SiteID=1

Posted by gfk at 10:29 AM | Comments (0) | TrackBack

August 26, 2007

MythDora: MythTV that works!

I tried once more to install MythTV on an old PC (P4 1.6 GHz, 512 MB RAM) that has been gathering dust at home for months. I tried KnoppMyth in the past, with resounding failure. This time, I decided to try mythDora.

I'm really impressed with mythDora. I only had to pop the DVD in the drive and boot the PC, and it worked on the first try! Well, not the first, since I had screwed up my hard drive config, but it did on the second try! Well, not the second, since the installer froze for no reason 20 minutes into the installation, but it worked on the third try! That's pretty good compared to my earlier experiences with KnoppMyth: about 15 failed installs in as many frustrating hours.

Here are the details of my PC:

And I'm using this for satellite reception:

Everything was recognized automagically except for the USB-UIRT IR Blaster. The remote control that came with the Hauppauge PVR-250 didn't work at first because I had answered yes to the "Will you use an IR Blaster" question. I had to delete /etc/modprobe.d/irblaster and reboot and it worked fine.

IR Blaster

I had some trouble setting up the IR Blaster (USB-UIRT), but after a couple of hours, I had it working. Here's a summary of how I did it:

ln /usr/sbin/lircd /usr/sbin/irblaster
cp /etc/init.d/lircd /etc/init.d/irblaster
wget http://lirc.sourceforge.net/remotes/coolsat/Pro_4000 -O /etc/irblaster.conf

In /etc/init.d/irblaster, replace every instance of "lircd" with "irblaster". Then put this into /etc/sysconfig/irblaster:

IRBLASTER_OPTIONS=" --driver=usb_uirt_raw --device=/dev/ttyUSB0 --output=/dev/lircd1 --listen=8766 --pidfile=/var/run/irblaster.pid /etc/irblaster.conf"

Then put the following into /usr/local/bin/change_chan.sh:

 
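#!/bin/sh
# Usage: change_chan.sh CHANNEL
# Sends each digit of CHANNEL to the receiver through the IR blaster.
REMOTE_NAME=Coolsat_Pro_4000
for digit in $(echo "$1" | sed -e 's/./& /g'); do
 irsend --device=/dev/lircd1 SEND_ONCE "$REMOTE_NAME" "$digit"
 sleep 0.333
done
# Finish by sending the O key.
irsend --device=/dev/lircd1 SEND_ONCE "$REMOTE_NAME" O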
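
To check what the channel-changing script would send without touching the hardware, a dry-run variant can help. This is only a sketch: channel_commands is a hypothetical helper name (not part of LIRC), and it prints the irsend commands instead of running them. The remote name, device path and final O key are taken from the script above.

```shell
#!/bin/sh
# Dry-run sketch: print the irsend commands for a channel number.
# channel_commands is a hypothetical helper, not part of LIRC.
REMOTE_NAME=Coolsat_Pro_4000
DEVICE=/dev/lircd1

channel_commands() {
    # Split the channel number into separate digits, e.g. 123 -> 1 2 3.
    for digit in $(printf '%s' "$1" | sed -e 's/./& /g'); do
        echo "irsend --device=$DEVICE SEND_ONCE $REMOTE_NAME $digit"
    done
    # The script finishes by sending the O key.
    echo "irsend --device=$DEVICE SEND_ONCE $REMOTE_NAME O"
}
```

Calling channel_commands 123 prints four lines: one per digit, then the O key.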

I'm not sure why (and this is what gave me so much trouble), but something needs to read from /dev/lircd1 for the IR blaster to work. So I added irw /dev/lircd1 >/dev/null 2>&1 & to /etc/rc.d/rc.local. It's not very elegant, but it works...

Munin

I wanted a better idea of the resources taken by MythTV, so I installed munin-node on the PVR. I already have a munin server on my LAN.

yum install munin-node 
echo "allow ^10\.10\.16\.2$" >> /etc/munin/munin-node.conf
/etc/init.d/munin-node start 
Add this line to /etc/sysconfig/iptables:
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 4949 -j ACCEPT
/etc/init.d/iptables restart

Posted by gfk at 4:50 PM | Comments (0) | TrackBack

September 16, 2006

QL: Where is your money going?

Le Québécois Libre recently published a link to a government site that provides information about the organizations subsidized by the Quebec government.

Unfortunately, the site is hard to navigate and doesn't let you manipulate the information as easily as a database would. So I decided to write a web scraper to "scrape" the information from the site and put it in a format that is easier to work with.

Perl to the rescue

I used two Perl scripts to download the 400 or so pages of data, then extract the information and write it to a CSV file.

The result

I get 5685 grants, rather than the 5057 announced by the site. As for the money, my total is $580,513,358.87 rather than $631,151,405.31. It's rather strange that I have more grants adding up to less than the government's count; let's assume it's caused by an error on my part rather than an error on the site.

The document is available in Excel 2000 format [1 MB] for the general public, in ZIP-compressed Excel 2000 format [322 KB], and in ZIP-compressed CSV format [191 KB] for those who want to import it into a database.

The big winner!

If you sort the document by total grant amount in descending order, the big winner is Boscoville 2000, an "agritourism and ecological farm" that received a grant of $2,500,000.

The millionaires' club

Here are the 25 organizations that received more than one million dollars in grants.

Organization | Ministry | Total amount
Boscoville 2000 | MSSS | $2,500,000.00
Centre de crise de Québec | MSSS | $2,174,334.00
L'institut de recherche du centre universitaire de santé de McGill | MSSS | $2,146,969.00
P.L.A.C.E. Rive Sud, Projet local d'aide en création d'emploi inc. | MESSF(Emploi) | $2,027,576.00
Club des petits déjeuners du Québec | MESSF(Solidarité sociale) | $1,600,000.00
Association I.R.I.S. | MSSS | $1,347,162.00
Centre des femmes de Montréal | MESSF(Emploi) | $1,336,970.00
Carrefour jeunesse-emploi de l'Outaouais | MESSF(Emploi) | $1,304,577.00
La maison Jean Lapointe inc. | MSSS | $1,259,929.00
GIT Société inc. | MESSF(Emploi) | $1,254,275.00
Centre d'intégration en emploi Laurentides (C.I.E. Laurentides) | MESSF(Emploi) | $1,229,189.00
Programme d'encadrement clinique et d'hébergement P.E.C.H. | MSSS | $1,221,462.00
Centre de crise Le transit | MSSS | $1,215,403.00
Maison Les étapes inc. | MSSS | $1,212,761.00
Comité régional d'intégration au travail inc. | MESSF(Emploi) | $1,202,755.00
Tracom inc. | MSSS | $1,197,550.00
Le centre d'aide 24/7 | MSSS | $1,149,604.00
Centre de recherche d'emploi de l'Est (CREE) inc. | MESSF(Emploi) | $1,125,591.00
Association d'entraide Ville-Marie inc. | MSSS | $1,092,459.00
Groupe conseil St-Denis inc. | MESSF(Emploi) | $1,092,124.00
Makitautik centre d'hébergement communautaire de Kangirsuk | MSP | $1,078,401.00
Centre de formation Option-Travail Ste-Foy | MESSF(Emploi) | $1,044,199.00
Rond-point jeunesse au travail | MESSF(Emploi) | $1,041,556.00

Posted by gfk at 12:20 PM | Comments (0)

June 10, 2006

Fun with perl and Google Video

A couple of weeks ago, I started playing with Google Video. It's a great resource and I really love it. The only thing that bugs me about it is that you can't download the video to your hard drive. You can download a Google Video Player (GVP) file that's effectively a pointer to the video, but you can't download the actual MPG or AVI.

Well, I recently found out that you could. So I modified a perl module to make it very easy to do!

CPAN has a Perl module called WWW::Google::Video. However, it seems that the module was written before Google started offering a GVP file for download. So I extended the module to support the GVP file.

I sent my modified version to the author (莉洛) but until it makes its way into CPAN, I'll keep a local copy for your downloading pleasure.

I also added a demo CGI to the documentation; you should check it out.

Here's the source code for the demo:

#!/usr/bin/perl -T

use strict;
use warnings "all";
use WWW::Google::Video;
use CGI qw/:standard/;

print header;

if (param()) {
  # Anchor the match so the URL must start with the Google Video prefix.
  die "Not a Google Video URL." unless param('url') =~ m#^\Qhttp://video.google.com/videoplay?\E#;
  my $vid = WWW::Google::Video->new;
  $vid->fetch(param('url'));

  print start_html('Fun with perl and with Google Video: '.$vid->{title}),"\n";
  print h1('Fun with perl and Google Video: '.$vid->{title}),"\n";
  print p,$vid->{description},p,"\n";
  print p,"[ ",a({href=>'http://video.google.com/videoplay?docid='.$vid->{docid}},"See on Google Video")," | ";
  print a({href=>$vid->{avi}},"Download as DivX AVI")," ]",p,"\n";
  print p,"You will need the ",a({href=>'http://divx.com/'},"DivX Player")," in order to play the downloaded AVI.",p,"\n";

  foreach(@{ $vid->{pic} }){
    print img({src=>$_}),"\n";
  }
} else {
  print start_html('Fun with perl and Google Video'),"\n";
  print h1('Fun with perl and Google Video'),"\n";
}

print start_form('get'),"\n";
print "Enter the Google Video URL that you want to play:",p,textfield('url','',65),p,"\n";
print submit,"\n",end_form,"\n";

print end_html,"\n";

Posted by gfk at 10:09 AM | Comments (0)

February 28, 2006

Simple query_string based http accelerator for a dynamic web page

A particular web site serves pretty fancy statistics using SQL queries generated on the fly by an ASP page. The ASP page receives its input as GET parameters. As the site gets more popular, the load on the SQL server is rising, and the time needed to execute one big SQL query is now around 30 seconds.

After an analysis of the requests, I found out that 80% of the requests were for the same three stats -- another example of the principle of locality. So I thought of using a reverse caching proxy (http accelerator) to reduce the load on the server. I tried squid and mod_proxy. However, both programs understandably don't cache the data from pages that are given GET parameters. So I had to make my own solution: Perl to the rescue!

This script sits on a web server and receives queries with GET parameters. It does a SHA1 checksum of the parameters and will use this checksum as the filename for the cache file. If a cache file for these params exists and the modification date does not exceed the expiration time, it will send that cache file to the client. Otherwise, it will query the dynamic page for the data and store the result in cache.
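
The lookup described above (hash the parameters, then check the age of the cache file) can be sketched in a few lines of shell. This is an illustration under assumptions, not the script itself: cache_key and is_fresh are hypothetical helper names, and the sketch assumes GNU coreutils (sha1sum, stat -c, date +%s). It uses a hex digest rather than Base64, so the file name never contains a slash.

```shell
#!/bin/sh
# Sketch of the cache lookup; assumes GNU coreutils (sha1sum, stat -c).
# cache_key and is_fresh are hypothetical helper names.

# Map a query string to a fixed-length, filesystem-safe name (hex SHA-1).
cache_key() {
    printf '%s' "$1" | sha1sum | cut -d' ' -f1
}

# Succeed if FILE is non-empty and was modified at most MAX_AGE seconds ago.
is_fresh() (
    file=$1
    max_age=$2
    [ -s "$file" ] || exit 1
    now=$(date +%s)
    mtime=$(stat -c %Y "$file")
    [ $((now - mtime)) -le "$max_age" ]
)
```

A caller would build the file name with something like file="$(cache_key "$QUERY_STRING").html" and serve that file when is_fresh "$file" 3600 succeeds, falling back to the dynamic page otherwise.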

It is meant to be simple and specialized for my own application. You will certainly have to tweak it, but that should be easy since it's so simple.

The stats are generated by the ASP page using real-time data, but the data does not change very fast -- the most active users usually check the site 2 or 3 times a day. So I decided to use an expiration time of 1 hour. You should adjust this to suit your needs.

This script can query any kind of dynamic web page, be it ASP, PHP or anything else.

#
# Simple query_string based http accelerator for a dynamic web page
#
# Copyright (C) 2005 Guillaume Filion <guillaume@filion.org>
#
# Version 1.0 2006-02-28 Initial release
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License version 2
# as published by the Free Software Foundation.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
#

use strict;
use Digest::SHA1 qw(sha1_base64);
use LWP::UserAgent;

# What dynamic web page will provide the responses.
# May not be a good idea to make this publicly accessible.
use constant SRC => 'http://localhost/stats.asp';

# Directory where the cached versions of the pages will be stored.
# There's no mechanism for deleting the old cached pages, so I have a 
# cron job empty this directory every night.
use constant DIR => 'D:/modstats/cache/';

# How long, in seconds, do we keep the cached versions of the pages.
use constant EXPIRATION => 3600; # 3600 secs = 1 hour

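chdir(DIR) or die "Cannot chdir to " . DIR . ": $!";

my $param = $ENV{'QUERY_STRING'};
# Accept only name=value pairs separated by "&"; the pattern is anchored
# so that the whole query string has to match.
die "Tainted query_string: ($param)" unless $param =~ /^(?:\w+=[\w%]+&?)*$/;
my $digest = sha1_base64($param);
$digest =~ s/\//-/g; # Base64 is not your friend if you're using it for filenames...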

# Check if we have a cached response for this query
my $load=1;
my $file = "$digest.html";
if (-s $file) {
	my $cdate = (stat $file)[9];
	if (time - $cdate > EXPIRATION) {
		unlink($file);
		$load=1;
	} else {
		$load=0;
	}
}

my $content;
if ($load) {
	# Load the page and save it into the cache
	my $ua = LWP::UserAgent->new;
	my $response = $ua->get(SRC . '?' . $param);
	$content = $response->content;
	
	if ($response->is_success()) {
		# Report problems with warn so they go to the server log instead of
		# being printed ahead of the Content-type header.
		if (open(my $cache, '>', $file)) {
			print $cache $content;
			close($cache);
		} else {
			warn "Cannot write to file $file: $!";
		}
	} else {
		warn $response->status_line;
	}
} else {
	# Load the response from the cache
	if (open(my $cache, '<', $file)) {
		local $/; # slurp mode: read the whole file at once
		$content = <$cache>;
		close($cache);
	} else {
		warn "Cannot open file $file: $!";
	}
}

# Send it back to the client.
print "Content-type:text/html\n\n$content";

Posted by gfk at 4:42 PM | Comments (0)

January 6, 2006

APT NO_PUBKEY error

For a couple of days, I've been getting this error when running sudo aptitude update on Debian:

W: GPG error: http://gulus.usherbrooke.ca testing Release: The 
following signatures couldn't be verified because the public 
key is not available: NO_PUBKEY 010908312D230C5F
W: You may want to run apt-get update to correct these problems

The solution is to install the new keys for 2006 with this command:

wget http://ftp-master.debian.org/ziyi_key_2006.asc -O - | \
apt-key add -

Props to pthichat for the solution.

Posted by gfk at 4:09 PM | Comments (0)

December 31, 2005

CPAN.pm on OpenSlug

During the holidays, I installed and ran BackupPC on a Linksys NSLU2. The result was too slow (too little RAM, too much swapping) to be useful -- backups would take five times longer than with a PC with enough RAM.

However, I learned a few tricks along the way. The most interesting of them is how to run CPAN.pm on the SLUG.

Installing and configuring OpenSlug

The first thing is to install OpenSlug. The documentation is pretty clear, so I won't rewrite it. Just take a look at it and come back here when you're at step 8 of Initialising OpenSlug.

OpenSlug uses ipkg, which is similar to apt on Debian. The first thing to do is to update your slug.

ipkg update
ipkg upgrade

I personally can't live without screen, sudo and nano, and I'm using them to install CPAN, so it might be a good idea to do the same.

ipkg install screen sudo nano

Installing perl

Now for the real thing! We start by installing Perl.

ipkg install perl perl-dev perl-misc perl-modules

Unlike other distros, OpenSlug has packaged every single perl module in its own package, so don't be surprised to see a gazillion packages being installed.

Installing CPAN

A lot of CPAN modules need a compiler. So we'll need to install a native compilation environment.

There are two ways to do this. The full-blown one is to install openslug-native, which will install all sorts of utilities for application development.

ipkg install openslug-native

However, openslug-native installs some packages that you don't need (python for example), so if you want to install fewer packages, you can install only these:

ipkg install gcc gcc-symlinks make binutils libc6-dev coreutils libexpat-dev

Install ccache

Lots of Perl modules want ccache to compile properly. At the time of writing, it's not available as an ipkg package, so you'll need to compile it. It's available at http://ccache.samba.org/

wget http://ccache.samba.org/ftp/ccache/ccache-2.4.tar.gz
tar zxvf ccache-2.4.tar.gz
cd ccache-2.4
./configure
make
make install

CPAN.pm

No Perl installation would be complete without CPAN. And since the version of Perl included in OpenSlug is particularly thin, CPAN can be used to make it a little more healthy.

When I first wrote this article, the version of CPAN that's included with OpenSlug did not work. I sent a patch to the author (Andreas Koenig) and it has been included in version 1.81. Go take a look on the CPAN site and download the latest version of CPAN.

wget http://www.cpan.org/modules/by-module/CPAN/CPAN-1.83.tar.gz
tar zxvf CPAN-1.83.tar.gz
cd CPAN-1.83/
perl Makefile.PL
make
make test
make install

Use CPAN

Once CPAN is installed, you should be able to use it to install a good dose of packages. If you haven't done so already, opening a screen session by typing screen would be a good idea, since this takes quite a long time: a couple of hours. When you first enter CPAN, it will ask some configuration questions; it's a good idea to set the policy on building prerequisites to follow.

perl -MCPAN -e shell
install Bundle::libnet
reload cpan
install Bundle::CPAN
quit

Congratulations, you now have CPAN, and all the power of perl modules, on your slug.

Posted by gfk at 11:43 AM | Comments (0)

October 24, 2005

ConnexLink 900 MHz link: PPP

With the slug happily running OpenSlug, it was now time to set up a Serial/PPP link between my linux server (ali) and the slug.

It was one of the rare times, in just about every project I have done, that everything worked on the first try!

USB-Serial Converter

Since the slug doesn't have a serial port, the first thing that I had to do was to connect my USB-Serial converter. I reused an old one used for connecting my Palm to my PowerBook.

I then installed the necessary module (pl2303) with ipkg install kernel-module-pl2303 followed by a modprobe pl2303. Since I'm using the memstick version of OpenSlug, I have almost no log files and therefore no way of knowing whether it had worked, but lsmod listed both the pl2303 and the usbserial modules as loaded -- a good sign.

Serial/PPP link

It was time to set up the serial link. I plugged the USB-Serial adapter into ali using a serial cross-over cable. The first thing that I had to do was to install the required software with ipkg install ppp ppp-tools kernel-module-ppp-async kernel-module-ppp-deflate kernel-module-ppp-generic. I then followed the instructions of the Serial-Laplink-HOWTO.

I was surprised to see how easy it was. I just copied and pasted the code example, changed ppp_laplink_server to ali's IP address and ppp_laplink_client to the IP address that I wanted to give the slug over the PPP link (different from the one it got from the Ethernet). Voilà! It worked on the first try!


root@ali:/etc/ppp# /usr/sbin/pppd /dev/ttyS1 nodetach
Using interface ppp0
Connect: ppp0 <--> /dev/ttyS1
PAP peer authentication succeeded for gfk
Deflate (15) compression enabled
found interface eth0 for proxy arp
local IP address 192.168.0.2
remote IP address 192.168.0.43
-----
root@slug:/etc/ppp# /usr/sbin/pppd /dev/ttyUSB0 nodetach
Using interface ppp0
Connect: ppp0 <--> /dev/ttyUSB0
Remote message: Login ok
PAP authentication succeeded
Deflate (15) compression enabled
local IP address 192.168.0.43
remote IP address 192.168.0.2

Benchmarks

I tested the reliability of the link with a 15-minute ping flood:

gfk@ali:~$ sudo ping -f 192.168.0.43
PING 192.168.0.43 (192.168.0.43) 56(84) bytes of data.
.
--- 192.168.0.43 ping statistics ---
74602 packets transmitted, 74601 received, 0% packet loss, time 997772ms
rtt min/avg/max/mdev = 15.099/19.923/39.925/0.239 ms, pipe 3, ipg/ewma 13.374/19.929 ms

Then I measured the bandwidth with iperf. Note that those speeds are all half-duplex.

gfk@ali:~$ iperf -c 192.168.0.43
------------------------------------------------------------
Client connecting to 192.168.0.43, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 5] local 192.168.0.2 port 43315 connected with 192.168.0.43 port 5001
[ 5] 0.0-10.0 sec 3.65 MBytes 3.05 Mbits/sec

I was a bit stunned by these results -- getting 3 Mbps on a 0.12 Mbps link is a little too good to be true -- but I realised that iperf must have been using really simple data for its test. And everyone should know that simple data compresses really well. When using "real world" data or even compressed data, I get more realistic figures:

gfk@ali:~$ iperf -c 192.168.0.43 -F maildir.tar
------------------------------------------------------------
Client connecting to 192.168.0.43, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 6] local 192.168.0.2 port 43312 connected with 192.168.0.43 port 5001
[ 6] 0.0-10.4 sec 400 KBytes 314 Kbits/sec
gfk@ali:~$ iperf -c 192.168.0.43 -F maildir-best.tar.gz
------------------------------------------------------------
Client connecting to 192.168.0.43, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 6] local 192.168.0.2 port 43310 connected with 192.168.0.43 port 5001
[ 6] 0.0-11.5 sec 136 KBytes 96.9 Kbits/sec

These numbers are still very impressive. In my planning I estimated that I would get about a 37% speed gain, but these "real world" results give a gain of around 224%. If anyone has an explanation, I'd be happy to hear it.
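
One reading of the 224% figure is that it compares the throughput measured for the plain tar file (314 Kbits/sec) with the throughput measured for the already-compressed archive (96.9 Kbits/sec). The sketch below only redoes that arithmetic; gain_pct is a made-up helper name and the interpretation itself is an assumption.

```shell
#!/bin/sh
# gain_pct NEW BASE: percentage gain of NEW over BASE, rounded to a whole number.
# Hypothetical helper, used only to check the arithmetic.
gain_pct() {
    LC_ALL=C awk -v new="$1" -v base="$2" 'BEGIN { printf "%.0f\n", (new / base - 1) * 100 }'
}
```

With the iperf figures above, gain_pct 314 96.9 prints 224.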

Auto reconnect

The Serial-Laplink-HOWTO talks about a getty-like installation of pppd. getty is the program that lets you use the PC console as a terminal; it is started automatically when Linux boots and restarted every time it exits. That's pretty much what we want for our PPP link!

To use this, I added this line to /etc/inittab on ali:

pd:2345:respawn:/usr/sbin/pppd /dev/ttyS1 nodetach

On the slug, I needed to load the pl2303 (USB-Serial converter) module before starting the PPP link, so I made a simple script /usr/local/ppp_link:


#!/bin/sh
/sbin/modprobe pl2303
/usr/sbin/pppd /dev/ttyUSB0 nodetach

Then I added this line to /etc/inittab on the slug:

pd:2345:respawn:/usr/local/ppp_link

With this setup, the PPP link is started on boot and is automatically restarted if it goes down.

With this working so well, the next step will be to order a pair of ConnexLink Radio Modems!

Posted by gfk at 3:20 PM | Comments (0)

October 22, 2005

ConnexLink 900 MHz link: NSLU2

The first thing I wanted to do in this project was to see if I could use an NSLU2 as an Ethernet-serial bridge on the remote side.

The NSLU2 is a single-board computer made by Linksys. It was designed as a Network Attached Storage device, but smart folks found out that it was running Linux and figured out ways to make it behave like a tiny Linux server. And best of all, you can get one for less than US$100. Linux users have affectionately nicknamed the NSLU2 "the slug".

I had already bought a slug a couple months ago, knowing that it would come in handy one day or another -- it turns out I was right. I didn't do much with it back then, so when I decided to use it for the 900 MHz link, I first had to learn how it worked.

I went to the NSLU2-Linux web site and installed Unslung, the recommended firmware for new users. The process was simple, if a bit long, and when it was done I was able to log into the NSLU2 and use it as a Linux server.

However, when I tried to install PPP, I found out that it was not available as a package for Unslung. I asked on the NSLU2-Linux mailing list and people suggested that I use OpenSlug instead, a less user-friendly but more powerful version of the firmware.

That's what I did, and I actually found it easier to use. It doesn't have all the features of the original slug, but I don't need them anyway.

Overclocking the slug

For some unknown reason, Linksys decided to run the slug at 133 MHz, despite the fact that Intel, the maker of the chip, designed it to run at 266 MHz -- twice as fast. Fortunately, the smart folks at NSLU2-Linux found an easy way to restore the full speed: I only needed to remove one miniature resistor. I used an X-Acto blade to cut the solder on the left side of the resistor, then carefully inserted the tip of the blade under the resistor to lift it; the solder on the right side broke and the resistor fell onto the table.

I sprayed compressed air on the board to make sure that no solder residue would get stuck somewhere and cause a short circuit. The whole operation took about 30 seconds; I found it much harder to open the case than to remove the resistor.

Force Power-On

When I plug in the slug, it doesn't start automatically; I need to press the power button. This is not easy to do if the slug is in a sealed box on top of a 20' pole. Fortunately, once again, the smart folks at NSLU2-Linux have come up with a solution.

Unfortunately, it doesn't seem very easy for someone with poor soldering skills like me. I'm not sure what I'll do. I may try to do it myself or ask a more talented solderer...

Posted by gfk at 3:24 PM | Comments (0)

October 14, 2005

ConnexLink 900 MHz link: Planning

I rediscovered the Aerocomm ConnexLink modems today. I had read about them in the past, but for some reason I didn't find them interesting. I certainly find them interesting today.

I would like to use them for a 2.9 km link in a rural area. I did a path analysis of the link: I would have 30 dB of fade margin without considering interference or obstruction (mainly by trees), so this link definitely looks feasible.
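
For readers who want to redo this kind of path analysis, the usual starting point is the free-space path loss formula, FSPL(dB) = 20 log10(d in km) + 20 log10(f in MHz) + 32.44. The sketch below is a rough illustration only: fspl_db and fade_margin_db are hypothetical helper names, 900 MHz is the nominal band, and the radio figures (transmit power, antenna gains, cable loss, receiver sensitivity) must be supplied by the caller because none of them are given in this post.

```shell
#!/bin/sh
# Free-space path loss in dB for a distance in km and a frequency in MHz.
fspl_db() {
    LC_ALL=C awk -v d="$1" -v f="$2" 'BEGIN {
        printf "%.1f\n", 20 * log(d) / log(10) + 20 * log(f) / log(10) + 32.44
    }'
}

# Fade margin in dB = received power - receiver sensitivity, where
# received power = TX power + antenna gains - cable losses - path loss.
# Arguments: tx_dbm tx_gain_dbi rx_gain_dbi cable_loss_db sensitivity_dbm fspl_db
# All radio figures are caller-supplied assumptions.
fade_margin_db() {
    LC_ALL=C awk -v tx="$1" -v gtx="$2" -v grx="$3" -v loss="$4" -v sens="$5" -v fspl="$6" 'BEGIN {
        printf "%.1f\n", tx + gtx + grx - loss - fspl - sens
    }'
}
```

For the 2.9 km path, fspl_db 2.9 900 gives about 100.8 dB of free-space loss; the fade margin then depends entirely on the radio and antenna figures plugged into fade_margin_db.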

The ConnexLink is a serial radio modem, so to use it for Internet connectivity I'd need a conversion from TCP/IP to serial, I guess using PPP. I'm looking into setting up a Linux router with real-time compression on both ends of such a link.

Throughput Estimate

The manufacturer gives an over-the-air throughput of 78 Kbps. I know that PPP does data compression automagically, so I did a little experiment to get an idea of what gzip compression can do. All my mailboxes are stored in Maildir format, most of them containing plain text messages. The PC I'm doing this experiment on is an AMD Athlon XP2000+ (1.3GHz) with 512 MB RAM running Debian Linux testing.


gfk@ali:~$ tar -c Maildir/* > maildir.tar
gfk@ali:~$ ls -lh maildir.tar
-rw-r--r-- 1 gfk gfk 252M Oct 13 21:02 maildir.tar
gfk@ali:~$ time gzip --best maildir.tar
real 1m2.976s
user 0m49.160s
sys 0m2.660s
gfk@ali:~$ ls -lh maildir.tar.gz
-rw-r--r-- 1 gfk gfk 155M Oct 13 21:04 maildir-best.tar.gz
gfk@ali:~$ gunzip maildir.tar.gz
gfk@ali:~$ time gzip --fast maildir.tar
real 0m46.027s
user 0m32.470s
sys 0m3.080s
gfk@ali:~$ ls -lh maildir.tar.gz
-rw-r--r-- 1 gfk gfk 161M Oct 13 21:07 maildir.tar.gz

That gives between 36% and 38% compression. For a 78 Kbps link that would raise it to about 110 Kbps for plain text.
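
The estimate can be redone from the file sizes in the transcript (252 MB uncompressed, 155 MB with --best, 161 MB with --fast). The helpers below are hypothetical names used only for the arithmetic. Note that scaling the link rate by the size ratio gives roughly 122 to 127 Kbps, a little above 110 Kbps; the lower figure corresponds to adding about 37% to 78 Kbps. Either way this is a rough ceiling that ignores protocol overhead.

```shell
#!/bin/sh
# Hypothetical helpers for redoing the estimate; sizes in MB, rates in Kbps.

# Percentage of the original size saved by compression.
savings_pct() {
    LC_ALL=C awk -v orig="$1" -v comp="$2" 'BEGIN { printf "%.1f\n", (1 - comp / orig) * 100 }'
}

# Effective payload rate if the link carries data compressed from ORIG down to COMP.
effective_kbps() {
    LC_ALL=C awk -v link="$1" -v orig="$2" -v comp="$3" 'BEGIN { printf "%.1f\n", link * orig / comp }'
}
```

For example, savings_pct 252 161 prints 36.1 and effective_kbps 78 252 161 prints 122.1.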

My mother is paying $25 per month for a 128 Kbps Internet link with Videotron. If I'm able to set up a PtP link from my house to hers with this setup, it would pay for itself pretty fast.

Price list

To give me an idea of how much such an adventure would cost me, I made a preliminary parts list.

Radios

Aerocomm ConnexLink

Price: US$250 for both radios at Mouser

Serial-Ethernet bridge

Linksys NSLU2. Requires a 256 MB USB memory stick.

Price: CAN$115 for the NSLU2 at FutureShop, CAN$50 for the memory stick at Wal-Mart.

Weatherproof enclosure

7"x6"x2" NEMA 6 (Submersible) box made by Pacific Wireless

Price: US$21.75 at FAB-Corp.

Antenna

Cushcraft PC9013N 13 elements 900 MHz yagi

Price: CAN$135 at Hutton.

POE Injector

IEEE 802.3af compliant POE injector/splitter.

Price: US$24.95 for the injector, US$24.95 for the splitter at Pacific Wireless

Connectors

Weatherproof ethernet cable passthrough Price: US$10 (10 pack)

Pigtail RPSMA Male-N Female Bulkhead pigtail. Price: CAN$30

Cables

15' LMR-400 with N-Male connectors on both ends. Price: CAN$60 (2x)

50' Weatherproof CAT5e cable. Price: ?

Posted by gfk at 3:26 PM | Comments (0)

June 15, 2005

Temperature regulator for a spa

In the spring of 1996, my father bought a three-person spa. Back then, spas were not well known, so he got it for the bargain price of CAN $800. In 2005, a similar model costs about CAN $4000. The only thing a 2005 spa has that the 1996 version doesn't is an electronic temperature regulator; ours has a bimetal thermostat instead. The bimetal thermostat is very inaccurate and so hard to set correctly that we would usually set it in the spring and not touch it for the rest of the summer. This means that we heated the spa 24/7 between May and September when we actually used it for a couple of hours a week on average. Comfort suffered too: with the spa set to 37°C, the temperature would swing from 35°C (too cold) to 39°C (too hot).

In the late 90's, we contacted the pool and spa shop that sold us the spa to ask whether we could buy the thermostat from their newest models and fit it to our old one. They basically gave us a blank stare, but finally told us that the controller cost CAN $500 and that they had no idea whether it would work. At the time I was in university and taking lots of electronics courses, so I was confident that I could come up with a home-made solution that would be much more affordable and more flexible. It actually took me quite a while to build, mainly because of my proverbial laziness, but I finally put it to work in the spring of 2004.

Components

There are three main components in this project:

Temperature sensor

I'm using a 1-Wire temperature sensor (DS18B20) made by Dallas Semiconductor/Maxim. Contrary to what the name implies, a 1-Wire device actually needs three wires: one for communication, plus power and ground. Simple devices like the DS18B20 can work in parasite power mode, in which case they need only two wires: communication and ground.

Usually, 1-Wire networks use telephone wires with RJ11 connectors. In my setup, the RJ11 connector is connected to the serial port of a PC in my basement using an adapter (DS9097U) made by Dallas Semi. I received both the adapter and the sensor as free samples from Dallas Semi.

Relay controller

While the 1-Wire family includes a remote switch, it was not trivial for me to make it work with a 12 V relay. I tried for a while with the DS2405, but finally opted for a relay board controlled by the PC's parallel port. I ordered the kit 74 made by KitsRus for about CAN$ 50 assembled. I'm using an open source program made by James Cameron (Quozl) to control it from Linux. You might want to check out Quozl's page in detail, as it lists many interesting projects; the Emu Fat Monitor is especially interesting.

If I had to redo the project, I would consider using this 1-Wire relay board by AAG Electronica. While this is more expensive than the kit 74, it would free up the parallel port on the PC.

100 feet of 4 conductor microphone cable is run between the PC in the basement and the spa on the deck. Two of the wires are used for the temperature sensor, and the two others carry a 12 V signal from the relay board to a 10 A relay located under the spa. The actual current used by the spa heater is 6 A.

Brain aka The Computer

The computer is an upgraded PC running Debian Linux. At the time of writing it is powered by an AMD Athlon XP 2000+, has 512 MB RAM and two 80 GB drives in a RAID 1 configuration. Of course, this is several million times more powerful than what is needed to run the temperature controller. In fact, the PC is also running a BackupPC server, a mail server, a DNS server, a database server (MySQL) and a few other things. I recently bought an NSLU2 single board computer and I'm considering hacking it to run Linux and use it as the brain of the system. I'll keep you posted on this...

The spa temperature is read from the sensor using DigiTemp, an open source program for Linux. Each minute, three temperature samples are taken, then the median of the samples is stored in a MySQL table, along with the date and time. When this is done, a Perl script calculates the average of the last 15 readings (one per minute), then compares this to a "goal" value set from the web interface. It then decides whether to activate the relay using Quozl's program. I use a 15-minute moving average rather than just the last temperature reading to add some latency to the system, in the hope that it will prevent it from oscillating too fast between on and off.
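The control logic described above can be sketched in a few lines. This is a minimal Python illustration, not the Perl script I actually run: the function names are made up, the `readings` list stands in for the MySQL table, and I'm assuming a plain "below the goal means heat" comparison.

```python
from statistics import mean, median

WINDOW = 15  # number of one-minute readings in the moving average


def minute_reading(samples):
    """Reduce the samples taken within one minute to their median,
    so that a single glitchy sample is ignored."""
    return median(samples)


def heater_should_run(readings, goal, window=WINDOW):
    """Compare the average of the last `window` readings with the goal.

    `readings` holds per-minute temperatures in degrees C, oldest first
    (a stand-in for the database table). Returns True when the relay
    should be switched on.
    """
    recent = readings[-window:]
    return mean(recent) < goal


# Two minutes of samples; the 45.2 in the first minute is a glitch.
readings = [minute_reading(s) for s in ([36.9, 37.0, 45.2], [37.0, 37.1, 37.0])]
print(readings)
print(heater_should_run(readings, goal=37.5))
```

Taking the median first and averaging afterwards means one bad sample never reaches the average at all, while the 15-minute window slows down the on/off switching.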

Web interface

The web interface is made with PHP and jpgraph. It shows a graph of the spa temperature for the last 12 hours, how many minutes the heater was on, the derivative of the temperature, and more. Notice that the temperature line is blue when the heater is inactive (off) and red when the heater is active (on).

In the future I might make an automatic mode where the computer gets the weather forecast from the Internet and uses that to decide what temperature to keep the spa at. For example, if the forecast says that it will be sunny with more than 25°C, it sets the spa to 35°C; if it's going to rain or the temperature will be below 25°C, it sets the spa to 25°C. I'm not sure how useful it would be though...
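The rule itself would be tiny. Here is a hedged Python sketch; the function name and its two parameters are hypothetical, fetching the forecast is left out entirely, and I'm assuming that a forecast of exactly 25°C counts as a cold day.

```python
def spa_setpoint(sunny, forecast_high_c):
    """Return the spa target temperature in degrees C for a forecast.

    Sunny and warmer than 25 C -> keep the spa at 35 C.
    Rain, or 25 C and below    -> let it drop to 25 C.
    """
    if sunny and forecast_high_c > 25:
        return 35.0
    return 25.0


print(spa_setpoint(sunny=True, forecast_high_c=28))
print(spa_setpoint(sunny=False, forecast_high_c=28))
```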

Conclusion

It was really worth it. We now usually keep the temperature at 35°C on hot days, and at 30°C or 25°C on colder ones, when we're not going to use the spa. About an hour before entering the water, we log on to the web interface and set the temperature to 37.5°C. With the web interface it's possible to log on to the spa from anywhere in the world - with the correct password - and set its temperature. Half an hour before leaving work, I log on to the spa, set it to 37.5°C and it's ready when I arrive home! I also estimate that we save at least 30% of the cost of heating the spa, which is a few hundred dollars per summer. We also improved the insulation, so it's hard to tell which measure saves the most, but the result is that it's much cheaper and more comfortable than before to have that spa!

Posted by gfk at 7:51 PM | Comments (0)

mai 2, 2005

FreeBSD ports and php session

Having problems with sessions in the FreeBSD port of php4? Read on.

If you're getting this PHP4 error on FreeBSD,

Fatal error: Call to undefined function: session_start()

you need to install the FreeBSD port called "php4-session".

Simple as that.

Posted by gfk at 9:01 PM | Comments (0)

avril 4, 2005

HTTP Load Balancing and Fail Over using DNS

This article describes how to set up two web servers and two load balancers in a cheap and efficient manner. Three scenarios are presented: the first with one Internet Service Provider (ISP), the second with two ISPs, and the last with two ISPs where one is preferred over the other.

In the web site business, we usually try to keep the best uptime possible. A great way of doing this is by having two identical web servers. In order to make two web servers look like one to the outside world, we need a load balancer. Having only one load balancer, however, introduces a new single point of failure, which is what we're trying to remove by adding a new web server. That's why we'll be using two load balancers. Other articles describe ways of doing this with CARP; I describe it using DNS. Both solutions have strengths and weaknesses:

System: DNS
Strengths:
  • Stateless
  • DNS is well established and tested
  • The load balancers work in parallel
Weaknesses:
  • Not really what DNS was designed for
  • Will introduce a delay (caused by a DNS timeout) when a load balancer is down
  • We need to use a small Time To Live (TTL) value for the web server's hostname, so that will increase the DNS traffic.
  • In the worst case scenario, a client will not be able to reach the web server for about 30 seconds.

System: CARP
Strengths:
  • Designed for this exact purpose.
  • Designed by the OpenBSD guys
  • In the worst case scenario, a client will not be able to reach the web server for a few seconds (existing connections will be cut).
Weaknesses:
  • Introduces states in the system
  • Another daemon is running on each server
  • Only one load balancer can work at a time (the other is a hot backup)
  • Relatively young technology/implementation

UPDATE (2005-04-22): After writing this article, I had the chance to read this analysis that says "that, while a majority of clients and local DNS servers honor DNS TTLs, a significant fraction does not (up to 47% of clients and local DNS servers collectively, and 14% of local DNS servers in our measurements). Moreover, those that violate TTLs do so by a large amount, in excess of 2 hours." That pretty much puts the last nail in the DNS approach's coffin. However as pointed out by the analysis, if you have a limited number of clients, you can make sure that they configure their DNS correctly so that they will respect your TTL values.

One ISP

 ISP1 -----<+>----------------------------------------
         Router      |       |       |       |
                     |       |       |       |
                   [LB1]   [LB2]   [WWW1]--[WWW2]

LB1 and LB2 are small servers (likely a Soekris), each running two daemons: a DNS server (tinydns) and an HTTP load balancer (pen).

WWW1 and WWW2 are two web servers, with a dedicated link in between for real-time synchronisation.

The nameservers on the registrar's list and in each DNS zone described below are: lb1.example.tld and lb2.example.tld.

LB1

This is our first load balancer/DNS server. It has a complete zone for example.tld, with a special resource record (RR) that makes www.example.tld point to itself.

DNS RR: www.example.tld. 30 IN CNAME lb1.example.tld

tinydns syntax: Cwww.example.tld:lb1.example.tld:30

Pen syntax: pen -j /var/empty -u nobody -h lb1:80 www1:80 www2:80

LB2

This is a mirror of LB1; it makes www.example.tld point to itself.

DNS RR: www.example.tld. 30 IN CNAME lb2.example.tld

tinydns syntax: Cwww.example.tld:lb2.example.tld:30

Pen syntax: pen -j /var/empty -u nobody -h lb2:80 www1:80 www2:80
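Since both load balancers follow the same pattern, the two lines each one needs can be generated. Here is a minimal Python sketch; `lb_config` and its parameters are names I made up for illustration, and it only builds the strings shown above without touching tinydns or pen.

```python
def lb_config(lb_host, web_hosts, name="www.example.tld", ttl=30, port=80):
    """Return (tinydns_line, pen_command) for one load balancer.

    The CNAME makes `name` point at the very balancer that answers the
    DNS query, so a client that can resolve the name can also reach pen.
    """
    domain = name.split(".", 1)[1]  # "example.tld"
    tinydns_line = f"C{name}:{lb_host}.{domain}:{ttl}"
    backends = " ".join(f"{host}:{port}" for host in web_hosts)
    pen_command = f"pen -j /var/empty -u nobody -h {lb_host}:{port} {backends}"
    return tinydns_line, pen_command


for lb in ("lb1", "lb2"):
    dns_line, pen_command = lb_config(lb, ["www1", "www2"])
    print(dns_line)
    print(pen_command)
```

The same function covers a per-interface setup by passing names such as "lb1-isp1" and ["www1-isp1", "www2-isp1"].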

In this scenario, the only single point of failure is the network. To avoid this problem, we must use another ISP, as shown in the example below.

Two ISPs without preference between ISPs

 ISP1 -----<+>-------------------------------------------
         Router      |       |       |       |
                     |       |       |       |
                   [LB1]   [LB2]  [WWW1]--[WWW2]
                     |       |       |       |
         Router      |       |       |       |
 ISP2 -----<+>-------------------------------------------

LB1 and LB2 have four daemons running on each server: two DNS servers (tinydns) and two load balancers (pen), that is, one DNS server and one load balancer per interface.

WWW1 and WWW2 are two web servers, with a dedicated link in between for real-time synchronisation. They have three interfaces, one for each ISP and one for the synchronisation.

The nameservers on the registrar's list and in each zone below are: lb1-isp1.example.tld, lb2-isp1.example.tld, lb1-isp2.example.tld, and lb2-isp2.example.tld.

LB1-ISP1 (ISP1 network interface on LB1)

DNS RR: www.example.tld. 30 IN CNAME lb1-isp1.example.tld

tinydns syntax: Cwww.example.tld:lb1-isp1.example.tld:30

Pen syntax: pen -j /var/empty -u nobody -h lb1-isp1:80 www1-isp1:80 www2-isp1:80

LB1-ISP2 (ISP2 network interface on LB1)

DNS RR: www.example.tld. 30 IN CNAME lb1-isp2.example.tld

tinydns syntax: Cwww.example.tld:lb1-isp2.example.tld:30

Pen syntax: pen -j /var/empty -u nobody -h lb1-isp2:80 www1-isp2:80 www2-isp2:80

LB2-ISP1 (ISP1 network interface on LB2)

DNS RR: www.example.tld. 30 IN CNAME lb2-isp1.example.tld

tinydns syntax: Cwww.example.tld:lb2-isp1.example.tld:30

Pen syntax: pen -j /var/empty -u nobody -h lb2-isp1:80 www1-isp1:80 www2-isp1:80

LB2-ISP2 (ISP2 network interface on LB2)

DNS RR: www.example.tld. 30 IN CNAME lb2-isp2.example.tld

tinydns syntax: Cwww.example.tld:lb2-isp2.example.tld:30

Pen syntax: pen -j /var/empty -u nobody -h lb2-isp2:80 www1-isp2:80 www2-isp2:80

This scenario has the great advantages of being completely state-less (in a conceptual way) and having no single point of failure - except human error, maybe...

Two ISPs with preference between ISPs

   ISP1 -----<+>-------------------------------------------
 Preferred  Router      |       |       |       |
                        |       |       |       |
                      [LB1]   [LB2]  [WWW1]--[WWW2]
                        |       |       |       |
            Router      |       |       |       |
   ISP2 -----<+>-------------------------------------------

In some cases, one ISP is cheaper than the other, so for example if ISP1 was cheaper, you'd rather use it as much as possible. It's very similar to the scenario without preference, except for the DNS servers on the side of ISP2. I'll only show the configuration for these servers.

You need to have a way of figuring out if ISP1 is working. For this, we must introduce the concept of states, so this scenario is not "state-less" anymore.

If ISP1 is working (state ISP1UP), then we should only use it; otherwise we should use ISP2 (state ISP1DOWN). An easy way of detecting this would be to ping an external server every 30 seconds from the ISP1 interface of each DNS server (LB1 and LB2). This is actually the hardest step of the whole process, because ISP1 might be up only for some people, and it would be pretty hard to figure out when to consider it down or up.
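One way to make the up/down decision less jumpy is to flip the state only after several probes in a row disagree with it. The Python sketch below illustrates that idea; the `Isp1Monitor` class and the threshold of 3 are my own assumptions, not something I have deployed, and the ping itself is left out (`probe_ok` is whatever your probe returned).

```python
class Isp1Monitor:
    """Track ISP1UP / ISP1DOWN from a stream of probe results.

    The state changes only after `threshold` consecutive results that
    contradict it, so one lost ping does not move the traffic to ISP2.
    """

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.up = True      # assume ISP1 works until proven otherwise
        self._streak = 0    # consecutive results contradicting the state

    def record(self, probe_ok):
        """Feed one probe result (True = reply received); return the state."""
        if probe_ok == self.up:
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= self.threshold:
                self.up = not self.up
                self._streak = 0
        return "ISP1UP" if self.up else "ISP1DOWN"


monitor = Isp1Monitor()
for result in (True, False, False, False, True):
    print(monitor.record(result))
```

With a probe every 30 seconds and a threshold of 3, a real outage is noticed after about a minute and a half. This does not solve the "up only for some people" problem, it only filters out short glitches.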

LB1-ISP2 (ISP2 network interface on LB1)

DNS pseudo RR:
if (ISP1UP) then
    www.example.tld. 30 IN CNAME lb1-isp1.example.tld
else
    www.example.tld. 30 IN CNAME lb1-isp2.example.tld
end if

LB2-ISP2 (ISP2 network interface on LB2)

DNS pseudo RR:
if (ISP1UP) then
    www.example.tld. 30 IN CNAME lb2-isp1.example.tld
else
    www.example.tld. 30 IN CNAME lb2-isp2.example.tld
end if

This way, when ISP1 is running, ISP2 will only be transporting about half of the DNS queries, which should not amount to a lot of bandwidth compared to the HTTP traffic.
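The pseudo RRs above boil down to choosing one of two CNAME targets. As a small Python sketch (the function name is hypothetical, and how the chosen record gets loaded into tinydns is not covered):

```python
def www_cname(lb, isp1_up, domain="example.tld", ttl=30):
    """Return the RR that the ISP2-side DNS server on `lb` should serve.

    While ISP1 is up, clients are sent to the ISP1 interface of the same
    load balancer; otherwise they stay on ISP2.
    """
    isp = "isp1" if isp1_up else "isp2"
    return f"www.{domain}. {ttl} IN CNAME {lb}-{isp}.{domain}"


print(www_cname("lb1", isp1_up=True))
print(www_cname("lb1", isp1_up=False))
```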

Posted by gfk at 7:03 PM | Comments (1)

novembre 12, 2004

How to kill a unix process

Were you ever told to kill a Unix process using "kill -9 pid"? Did you know that that was wrong? Here's a copy of a Usenet posting by Randal L. Schwartz explaining why.

I keep a printed copy of the message posted on the wall.

Newsgroup: comp.unix.questions
Date: 2001-01-20 (20 Jan 2001 09:58:20 -0800)
From: merlyn@stonehenge.com (Randal L. Schwartz)
Subject: Dangerous use of Kill -9 (was Re: kill users script?)

>>>>> "dipwad" == dipwad writes:

dipwad> kill -9 $PID
...

dipwad> Maybe this will get you started in the direction you wish to take.

Only if you want to make a script which has many unintended consequences.

As I've said before...

No no no. Don't use kill -9.

It doesn't give the process a chance to cleanly:

1) release IPC resources (shared memory, semaphores, message queues)

2) clean up temp files

3) inform its children that it is going away

4) reset its terminal characteristics

and so on and so on and so on.

Generally, send 15 (SIGTERM), and wait a second or two, and if that
doesn't work, send 2 (SIGINT), and if that doesn't work, send 1
(SIGHUP). If that doesn't, REMOVE THE BINARY because the program is
badly behaved!

Don't use kill -9. Don't bring out the combine harvester just to tidy
up the flower pot.

Just another Useless Use of Usenet,
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
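
For scripts, the escalation described in the posting (15, then 2, then 1, with a pause after each) can be sketched in Python. This is only an illustration under assumptions: the function name and the 2 second default are mine, and it works on a child process started with subprocess so that the exit can be observed reliably. It deliberately never sends signal 9.

```python
import signal
import subprocess

# Order suggested in the posting: 15 (SIGTERM), 2 (SIGINT), 1 (SIGHUP).
ESCALATION = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP)


def stop_politely(proc, grace=2.0):
    """Ask a child process to exit, escalating through ESCALATION.

    `proc` is a subprocess.Popen object. After each signal we wait up to
    `grace` seconds. Returns the signal after which the process exited,
    or None if it survived all three.
    """
    for sig in ESCALATION:
        proc.send_signal(sig)
        try:
            proc.wait(timeout=grace)
            return sig
        except subprocess.TimeoutExpired:
            continue  # still running, try the next signal
    return None
```

If the function returns None, the program ignored all three polite requests, which is the "badly behaved" case from the posting.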

Posted by gfk at 6:34 PM | Comments (0)

novembre 10, 2004

The history of Solaris

A while ago, Philip Pokorny explained to me what was up with Solaris (SunOS) version numbers. It's worth keeping as a reference.

In case you were looking for a longer history, check out this excerpt from the Solaris 8 System Administrator Guide.

Sun's operating system was originally based on BSD and called SunOS.
Numbered 4.x in line with the BSD numbering convention.

When Sun started working with AT&T and switched to the SysV code base, they renamed their OS Solaris.

In a fit of marketing confusion, they decided to rename SunOS 4.3 ->
Solaris 1.0. The new AT&T code base was called Solaris 2.0 but uname
reports it as 5.0 so that scripts that compared uname versions would see
that Solaris 2 was newer (5 > 4) than SunOS 4.

Solaris version 2.0, 2.1, 2.2, 2.3, were all generally pretty bad. 2.4
was the first Solaris release that was stable enough for my previous
employer to use. 2.5 was much better and 2.5.1 was rock solid.

Solaris 2.6 was pretty good, but major changes were coming.

With Solaris 2.7, the marketing types got involved again and renamed
the OS. They decided they couldn't compete with Red Hat 6 and Windows
NT 4.1 with a name like Solaris 2.7 so they renamed it "Solaris 7", but
uname *still* says 5.7.

The tradition continues with Solaris 8 (uname -> 5.8)
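
The numbering described above is regular enough to put in a small function. This is a Python sketch with a made-up name; it only handles the 5.x releases reported by uname and assumes the "Solaris 2.x" naming applies up to 5.6 and the short name from 5.7 onwards, as explained above.

```python
def solaris_name(uname_release):
    """Translate a SunOS release from `uname -r` (e.g. "5.8") into the
    marketing name (e.g. "Solaris 8")."""
    parts = uname_release.split(".")
    major, minor = int(parts[0]), int(parts[1])
    if major != 5:
        raise ValueError("only SunOS 5.x releases are handled here")
    rest = ".".join(parts[1:])  # keeps a third component such as 5.5.1
    if minor <= 6:
        return "Solaris 2." + rest
    return "Solaris " + rest


for release in ("5.5.1", "5.6", "5.7", "5.8"):
    print(release, "->", solaris_name(release))
```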

Posted by gfk at 8:51 PM | Comments (1)

novembre 9, 2004

Simple email virus scanning with Sanitizer and ClamAV

This article is the result of hours of trying to figure out how to clean up my email box. I was tired of receiving several spams and viruses every day. I managed to set up SpamAssassin pretty easily, but I had several frustrating problems with automated antivirus email scanning.

Note that this article was written in 2003, so it might not be accurate anymore. By the way, I'm now using qpsmtpd to scan my incoming mail.

I spent several hours trying to make Amavis work; at times it would even eat my emails without saying anything. When I finally managed to make it work on my system, its daemon used 18 MB of RAM, which is a lot for a program that only starts an external antivirus program.

I finally abandoned Amavis and searched for something else that would simply work (and work simply). Surprisingly, it only took me about half an hour on Google to find what I was looking for. Had I done that before trying Amavis, it would have saved me a lot of pain. Here's what I found.

Anomy Sanitizer

Anomy Sanitizer is a relatively simple Perl script that is called from procmail and is designed to manage email attachments. It decides what to do with them depending on their filenames, using regular expressions. It works with an external antivirus program, and it is really easy to make it work with just about any command line virus scanner out there.

Here's the formal description available on its web site.

The Anomy sanitizer is what most people would call an email virus scanner. That description is not totally accurate, but it does cover one of the more important jobs that the sanitizer can do for you - it can scan email attachments for viruses. Other things it can do:

  • Disable potentially dangerous HTML code, such as javascript, within incoming email.
  • Protect you from email-based break-in attempts which exploit bugs in common email programs (Outlook, Eudora, Pine, ...).
  • Block or "mangle" attachments based on their file names. This way if you don't need to receive e.g. visual basic scripts, then you don't have to worry about the security risk they imply (the ILOVEYOU virus was a visual basic program). This lets you protect yourself and your users from whole classes of attacks, without relying on complex, resource intensive and outdated virus scanning solutions.

The sanitizer is designed not to waste important system resources (CPU, memory, disk space) unnecessarily, and does so by treating its input as a stream which is scanned and rewritten a little bit at a time.

One of the core ideas behind the design of the sanitizer, is that just because a message contains an infected attachment doesn't mean that the rest of it shouldn't be delivered. Email often contains important information, and it is vital that a tool like this interrupt the normal flow of communication as little as possible. It's common courtesy to inform the user of any changes that are made. The Anomy sanitizer tries to follow these rules.

The sanitizer is based on solid foundations - most of the ideas implemented in the first versions of the sanitizer were ported from John D. Hardin's email security through procmail package. The sanitizer, like the code it is based on, is Free Software in the GNU sense of the term - the sanitizer may be modified and redistributed according to the terms of the GNU General Public License.

Sanitizer development is for the most part sponsored by FRISK Software International. Please consider buying their anti-virus products to show your appreciation.

Now that I had something to parse the messages and fire up the antivirus, I only needed an antivirus program.

Clam AntiVirus

I spent a lot of time trying every antivirus program listed on Amavis' web site and was disappointed by every single one of them. I finally found satisfaction when searching on packages.debian.org, where I found Clam AntiVirus (ClamAV for short).

ClamAV is a simple antivirus program using the OpenAntiVirus.org virus signatures. It's written in C and, like Sanitizer, it's licensed under the GPL. In my humble opinion, the only thing it lacks is a way to repair infected files like other antivirus programs do.

Here's the formal description from its web site.

Clam AntiVirus is an anti-virus toolkit for UNIX. The main purpose of this software is the integration with mail servers (attachment scanning). The package provides a flexible and scalable multi-threaded daemon, a command line scanner, and a tool for automatic updating via Internet. The programs are based on a shared library distributed with the Clam AntiVirus package, which you can use in your own software. Clam AV uses a virus database from OpenAntiVirus.org, we also help with signature generating.

Installation

I did the installation on Debian so I just ran apt-get install sanitizer clamav clamav-testfiles and everything was installed and configured.

On other systems, you may have to use RPMs or compile the source; check out the ClamAV docs and the Sanitizer docs for more information.

Configure Clam Antivirus

On Debian this step is done by the installer, but you'll have to do it manually for other operating systems.

The only thing you need to do to configure Clam Antivirus is to have its virus signatures downloaded every night.

Set up a cronjob to start this every night: sudo -u clamav /usr/bin/freshclam --quiet -l /var/log/clam-update.log

The definitions will be saved to directory /var/lib/oav-virussignatures

Test ClamScan

To make sure that your Antivirus installation is working correctly, do the following steps.

ali:# cd /usr/share/clamav-testfiles/
ali:# /usr/bin/clamscan --recursive --unzip --unrar --tar --tgz
Archive:  /usr/share/clamav-testfiles/test2.zip
  inflating: clamtest                
/tmp/fe5346b6358e37b5/clamtest: ClamAV-Test-Signature FOUND
/usr/share/clamav-testfiles/test2.zip: Infected Archive FOUND
/usr/share/clamav-testfiles/README: OK
/usr/share/clamav-testfiles/test1: ClamAV-Test-Signature FOUND
/usr/share/clamav-testfiles/ha: OK

UNRAR 2.71 freeware Copyright (c) 1993-2000 Eugene Roshal


Extracting from /usr/share/clamav-testfiles/test3.rar

Extracting test2.zip Ok
All OK
Archive: /tmp/5928df3a8015eea3/test2.zip
inflating: clamtest
/tmp/b1c9441d58318672/clamtest: ClamAV-Test-Signature FOUND
/tmp/5928df3a8015eea3/test2.zip: Infected Archive FOUND
/usr/share/clamav-testfiles/test3.rar: Infected Archive FOUND

----------- SCAN SUMMARY -----------
Known viruses: 7271
Scanned directories: 4
Scanned files: 5
Infected files: 3
Data scanned: 0.00 Mb
I/O buffer size: 131072 bytes
Time: 2.403 sec (0 m 2 s)

If it reports 3 infected files like in the example above, then it is working properly. You can remove the clamav-testfiles if you want, they won't be needed anymore.
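
If you want to check this from a script rather than by eye, the summary is easy to parse. Here is a small Python sketch; the function name is made up, and it assumes the summary contains an "Infected files:" line formatted as in the output above.

```python
import re


def infected_count(scan_output):
    """Extract the number after "Infected files:" from clamscan output."""
    match = re.search(r"^Infected files:\s*(\d+)\s*$", scan_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Infected files:' line found")
    return int(match.group(1))


# Summary lines copied from the test run above.
sample = """----------- SCAN SUMMARY -----------
Known viruses: 7271
Scanned directories: 4
Scanned files: 5
Infected files: 3
"""
print(infected_count(sample))
```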

Configure Sanitizer

Sanitizer works with policies: you define policies for attachments depending on their filenames -- mainly their extension -- using regular expressions. The actions that you can choose for each policy are: Accept, Mangle, Save or Drop.

I decided to define two simple policies:

  1. Drop any attachment of type: pif, com, cmd, bat
  2. Scan any attachment of type: exe, zip, tar, tgz, tar.gz, rar, doc, xls, pps, ppt
    1. Drop it if it contains a virus
    2. Accept it if it doesn't contain a virus
    3. Save it in quarantine in case of an error.



The default policy is to accept attachments.

This policy is quite lax, which is what I wanted; you may want to make yours stricter.
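
To see which attachments fall under which policy, here is a small Python sketch that applies the same two regular expressions used in the configuration file further down. It is an illustration only, not Sanitizer's own code: the names are made up, "scan" stands for handing the file to the virus scanner, and I'm assuming the first matching rule wins.

```python
import re

# (pattern, policy) pairs, tried in order.
RULES = [
    (re.compile(r"(?i)\.(pif|com|cmd|bat)$"), "drop"),
    (re.compile(r"(?i)\.(exe|zip|tar|tgz|tar\.gz|rar|doc|xls|pps|ppt)$"), "scan"),
]
DEFAULT_POLICY = "accept"


def policy_for(filename):
    """Return the policy that applies to an attachment file name."""
    for pattern, policy in RULES:
        if pattern.search(filename):
            return policy
    return DEFAULT_POLICY


for name in ("RUNME.PIF", "report.doc", "backup.tar.gz", "photo.jpg"):
    print(name, "->", policy_for(name))
```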

Here's the Sanitizer configuration (/etc/sanitizer.conf) file that I'm using.

#
# All feature switches.
#
header_info         = X-Virus-Scanned: Scanned by Sanitizer/ClamAV
header_url          = 0      # Don't show the Mailtools URL
feat_verbose        = 0      # Warn user about unscanned parts, etc.
feat_log_inline     = 1      # Inline logs: 0 = Off, 1 =  Maybe, 2 = Force
feat_log_stderr     = 0      # Don't print log to standard error
feat_log_xml        = 0      # Don't use XML format for logs.
feat_log_trace      = 0      # Omit trace info from logs.
feat_log_after      = 0      # Don't add any scratch space to part headers.
feat_files          = 1      # Enable filename-based policy decisions.
feat_force_name     = 0      # Don't force all parts (except text/html parts) to
                             # have file names.
feat_boundaries     = 0      # Don't replace all boundary strings with our own
                             # NOTE:  Always breaks PGP/MIME messages!
feat_lengths        = 1      # Protect against buffer overflows and null
                             # values.
feat_scripts        = 1      # Defang incoming shell scripts.
feat_html           = 1      # Defang active HTML content.
feat_webbugs        = 1      # Web-bugs are allowed.
feat_trust_pgp      = 0      # Don't scan PGP signed message parts.
feat_uuencoded      = 1      # Sanitize inline uuencoded files.
feat_forwards       = 1      # Sanitize forwarded messages
feat_testing        = 0      # This isn't a test-case configuration.
feat_fixmime        = 1      # Fix invalid MIME, if possible.
feat_paranoid       = 0      # Don't be excessively paranoid about MIME headers


# Create temporary or saved files using this template.
# An attachment named "dude.txt" might be saved as
#
# /var/quarantine/att-dude-txt.A9Y
#
# Note: The directory must exist and be writable by
# the user running the sanitizer.
#
file_name_tpl = /var/quarantine/att-$F.$$$

#
# Policies
#

# We have two policies, in addition to the default which is
# to accept attachments.
file_list_rules = 2
file_default_policy = accept
file_default_filename = unnamed.file

# Policy num. 1
# Delete obviously executable attachments. This list is VERY
# incomplete! This is a perl regular expression, see "man
# perlre" for info. The (?i) prefix makes the regexp case
# insensitive.
#
# There is only one policy, since we aren't using an external
# scanner.
#
file_list_1 = (?i)\.(pif|com|cmd|bat)$
file_list_1_policy = drop
file_list_1_scanner = 0

# Policy num. 2
# Scan all files for Viruses, using clamscan
# Always define FOUR potential policies, which depend on the
# exit code returned by the scanner. Which code means what is
# defined in the scanner line, which must contain THREE entries.
# The fourth policy is used for "anything else".
#
# "accept" if the file is clean (exit status 0)
# "mangle" if the file was dirty, but is now clean (clamav doesn't
# support file disinfection, so we use the fictitious exit status 66)
# "drop" if the file is still dirty (exit status 1)
# "save" if clamscan returns some other exit code or an error occurs.
#
file_list_2 = (?i)\.(exe|zip|tar|tgz|tar\.gz|rar|doc|xls|pps|ppt)$
file_list_2_policy = accept:mangle:drop:save
file_list_2_scanner = 0:66:1:/usr/bin/clamscan --recursive --unzip --unrar --tar --tgz %FILENAME

I believe that the syntax of policy 2 needs a little explanation. The first argument (file_list_2) states which files should be processed by this policy. The second (file_list_2_policy) and third (file_list_2_scanner) arguments are interdependent. The second argument specifies what to do with the attachment depending on the exit code of the virus scanner. As you can see, in this configuration, if clamscan returns 0 we accept the attachment; if it returns 66 we mangle the attachment; if it returns 1 we drop the attachment; and if it returns something else we save the attachment in quarantine.
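
The pairing between the two lines can be made explicit with a few lines of Python. This is a simplified illustration of the mapping, not Sanitizer's actual implementation; the function name is made up and it assumes exactly one exit code per slot, as in my configuration.

```python
def action_for_exit(policy, scanner, exit_code):
    """Pick the action for a scanner exit code.

    `policy` is the value of file_list_N_policy (four actions) and
    `scanner` the value of file_list_N_scanner (three exit codes followed
    by the command line). The fourth action is the fallback.
    """
    actions = policy.split(":")
    codes = scanner.split(":", 3)[:3]  # the fourth field is the command
    for code, action in zip(codes, actions):
        if int(code) == exit_code:
            return action
    return actions[3]


POLICY = "accept:mangle:drop:save"
SCANNER = "0:66:1:/usr/bin/clamscan --recursive --unzip --unrar --tar --tgz %FILENAME"

for code in (0, 66, 1, 2):
    print(code, "->", action_for_exit(POLICY, SCANNER, code))
```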

Test sanitizer

I created an email message containing a fake virus signature to test sanitizer. Save this file and pass it through sanitizer using the following command.

sanitizer /etc/sanitizer.conf < fake-email-virus.txt > fake-email-sane.txt 

The infected attachment should be replaced by the following text in the sane version.


NOTE: An attachment was deleted from this part of the message,
because it failed one or more checks by the virus scanning system.
See the attached sanitization log for more details or contact your
system administrator.

The removed attachment's name was:
test1.doc

It might be a good idea to contact the sender and warn them that
their system is infected.

If you get similar results, this means that Sanitizer is properly configured. Congratulations! Only one thing is left: telling procmail to use sanitizer.

Configuring procmail to use sanitizer

Add this at the top of your .procmailrc file.

:0fw:
| /usr/bin/sanitizer /etc/sanitizer.conf

Since it is pretty short, let me explain what it does.

The first line: the :0 marks the beginning of a new rule. The f means that this rule is a filter. The w tells procmail to wait for sanitizer to finish. The last colon tells procmail to use a lockfile to ensure that only one message is processed at a time. Using a lockfile is important because otherwise, if you were to receive a batch of several emails at once, procmail would process them all in parallel, effectively mounting a denial-of-service attack on the system. Believe me, I learned that the hard way! ;-)

The second line is simpler: the pipe (|) means that the message should be directed to the standard input of sanitizer. Then we have the path to sanitizer and to its configuration.

That's it, you now have a simple virus scanner attached to your inbox. You only need to test it to be sure that it works properly.

Test the whole thing

To test your new configuration, fire up your favorite mailer program and send yourself three emails: one containing only text (no attachment), one containing a non-infected attachment and one containing one of the clamav-testfiles as an attachment. Make sure that you receive these emails and that they are modified according to your sanitizer configuration.

Posted by gfk at 7:25 PM | Comments (0)