circle.ch weblog by Urs Gehrig

A weblog about libre software, law, technology, politics and the like.
2013-05-11T15:38:07

17. October 2003

Bayesian Filtering of Spam
@ 07:54:23

This blog now gets about 3 hits per day, and I am talking about comment spam. This is no fun at all. I considered turning off the back-channels such as comments and trackback, and eventually pingback too, but that is no fun either. So I started to look around for spam-blocking solutions. Besides blacklists, mentioned earlier here, BayesianClassification caught my eye, so I added a new wiki page as a starting point. Although most solutions are intended to fight email spam, I guess they could be combined with blacklists.
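
As a rough illustration of what Bayesian classification means for a comment, the sketch below combines per-token spam probabilities the way Paul Graham's "A Plan for Spam" describes: take the tokens whose probability lies farthest from a neutral 0.5 and combine them with the naive Bayes formula. The function name and the sample probabilities are made up for illustration, and this is not code from any of the implementations mentioned in this post.

```python
from math import prod


def combined_spam_probability(token_probs, most_interesting=15):
    """Combine per-token spam probabilities into a single score.

    token_probs: one probability per token that a comment containing
    that token is spam (hypothetical input, normally taken from a
    trained token database).
    """
    # Clamp so that a single token can never force the result to 0 or 1.
    clamped = [min(max(p, 0.01), 0.99) for p in token_probs]
    # Keep only the tokens that deviate most from the neutral value 0.5.
    picked = sorted(clamped, key=lambda p: abs(p - 0.5), reverse=True)
    picked = picked[:most_interesting]
    spam = prod(picked)
    ham = prod(1.0 - p for p in picked)
    return spam / (spam + ham)


if __name__ == "__main__":
    # Two strongly spammy tokens outweigh one mildly innocent one.
    print(combined_spam_probability([0.99, 0.95, 0.30]))
```

A comment would then be treated as spam when the score exceeds a chosen threshold. Graham's essay uses 0.9, but that value is a tuning choice.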

There is currently no PHP implementation, so I started to port Craig Morrison's C implementation [2] of Paul Graham's "A Plan for Spam" [1]. Unfortunately I got stuck, because the PECL sqlite extension does not yet include the sqlite_compile and sqlite_step functions, which Craig's version uses to do some fun stuff with SQLite:
sqlite_compile() is used as a precursor to sqlite_step(). It takes an
SQL statement and "compiles" it into a VM (virtual machine) that sqlite
uses for each successive call to sqlite_step().

What it does is allow me to query a database without using a callback
function. Each call to sqlite_step() returns the next row in the result
set from the initial sqlite_compile() call.

I needed to do that, because I have to do lookups in the same database
and I could not do those lookups recursively inside the callback which
would be needed when using sqlite_exec(). The use of the virtual
machines allows me to maintain a separate state for each lookup that I
need to do.
Craig sent me this answer by email after I asked him whether there is a workaround that gets by without these functions. Thanks, Craig.
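
The pattern Craig describes, walking one result set row by row while doing further lookups in the same database, can be sketched with Python's standard sqlite3 module, where each cursor keeps its own statement state. This is only an illustration of the idea, not a translation of Craig's code; the table names `seen` and `spam` and both function names are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE seen (token TEXT PRIMARY KEY, hits INTEGER);
        CREATE TABLE spam (token TEXT PRIMARY KEY, hits INTEGER);
        INSERT INTO seen VALUES ('cheap', 1), ('hello', 5), ('pills', 2);
        INSERT INTO spam VALUES ('cheap', 7), ('pills', 9);
    """)
    return conn


def nested_lookup(conn):
    """Step through 'seen' and look each token up in 'spam' on the way."""
    outer = conn.cursor()  # state of the outer query
    inner = conn.cursor()  # separate state for the per-row lookup
    outer.execute("SELECT token, hits FROM seen ORDER BY token")
    results = []
    # Each loop iteration fetches the next row of the outer query,
    # much like one call to sqlite_step() in the C version.
    for token, hits in outer:
        inner.execute("SELECT hits FROM spam WHERE token = ?", (token,))
        row = inner.fetchone()
        spam_hits = row[0] if row else 0  # token never seen in spam
        results.append((token, hits, spam_hits))
    return results


if __name__ == "__main__":
    for entry in nested_lookup(make_demo_db()):
        print(entry)
```

Because the inner lookup runs on its own cursor, it does not disturb the position of the outer query, which is what a callback passed to sqlite_exec() cannot offer.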

[1] http://www.paulgraham.com/spam.html
[2] http://sourceforge.net/projects/bayesiancfilter



