comment spam / new search server / etc

PostPosted: Wed Oct 12, 2005 1:10 pm
by slug
Over the last couple of months I've been spending a lot of time fighting the comment spammers.
Two months ago, the comment spam was exploding and a good 20% of comments were ads for online poker or pharmaceuticals.
Our defense has evolved somewhat. At first I was just searching for and deleting the spam after it happened. This is why for a while, the comment count displayed on the profile page was incorrect.
Gradually we compiled a list of obvious spam and reworked the code to catch the spam before it goes into a comment or your guestbook. Also when I do go delete spam I find, the comment counts are updated properly.
We still have some improvements to make to the spam shield, but it looks like only about 1% of comments are spam now.

About a month ago we moved the search to a new server and performance has greatly improved. Now we'll be able to work on improving the search features instead of just trying to keep it running.

Emily has been working on various features, including the improved favorites page. She's finishing up a similar feature that I think has the potential to be really useful, or at least fun.

My main project currently is improving overall site performance.
We're getting close to finally getting a serious database machine.
Lots of meetings recently with Sun, Oracle, providers, and banks to make it happen. The system we're looking at should be at least twice as powerful as the current machine and can be expanded well beyond that.
Due to cash limitations, most of our main database machines have had around a year of life before they became overloaded. I expect this machine to last at least 3-5 years. In the meantime, I've been making lots of tiny efficiency improvements, but things are still slow during peak hours.
Believe me, this causes us much pain since we know everyone viewing the site is frustrated. We're doing everything we can right now to solve this problem for good.

Meanwhile, I've also been working on things such as a much improved statistics system. You'll be able to see graphs of daily hitcounts on any individual image or gallery, as well as aggregate statistics. Some of you may remember seeing these graphs a couple years ago, but they never went public because they were far too much of a system hog.
Recently, I redesigned how the statistics are collected and with the new database machine on the way, they should be no problem.

-slug

PostPosted: Wed Oct 12, 2005 4:20 pm
by arjunrc
Slug, any ETA on when we can see a beta of the stats system?
nedstat seems to be randomly showing popups every once in a while after it transitioned to webstats - and I'd love to rip it off as soon as I have a better real time stats option.

regds
arjun

PostPosted: Wed Oct 12, 2005 5:10 pm
by srijith
slug wrote:
Gradually we compiled a list of obvious spam and reworked the code to catch the spam before it goes into a comment or your guestbook. Also when I do go delete spam I find, the comment counts are updated properly.
We still have some improvements to make to the spam shield, but it looks like only about 1% of comments are spam now.

Slug, you might want to take some hints from the various weblog antispam plugins that are performing pretty well. The two basic ideas that they use are:

1) Spammers post links that they want users to click and visit. Hence, every time a spam comment is identified, the links in it are blacklisted so that the next comment containing those URLs will never make it past the check.

2) Spam is usually posted from compromised/infected machines. Thus, the code checks whether the IP address the comment comes from is listed in any of several RBLs, and if it is, the comment is not allowed. There are workarounds that spammers use for this, like open proxies, Tor, etc., but it takes them that extra bit of effort, and that might be enough to deter them.
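
To make the two ideas above concrete, here is a minimal Python sketch. It is not taken from any particular plugin: the function names, the `blacklist` set, and the `dnsbl.example` zone used below are all made up for illustration. The RBL part relies on the usual DNSBL convention of reversing the IPv4 octets and looking the result up under the list's zone, where a successful lookup means "listed".

```python
import re
import socket
from urllib.parse import urlparse

# Rough pattern for http/https links inside a comment body.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_hosts(comment):
    """Return the set of lowercase host names linked from a comment."""
    hosts = set()
    for url in URL_RE.findall(comment):
        try:
            host = urlparse(url).hostname
        except ValueError:
            continue  # malformed URL, ignore it
        if host:
            hosts.add(host.lower().rstrip("."))
    return hosts


def links_blacklisted_host(comment, blacklist):
    """Idea 1: reject a comment that links to any known spam host."""
    return not extract_hosts(comment).isdisjoint(blacklist)


def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: IPv4 octets reversed, then the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Idea 2: an IP counts as listed if the DNSBL name resolves.

    `resolve` is a parameter so the check can be tried without network access.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except OSError:  # includes socket.gaierror (name not found)
        return False
```

A comment would be rejected if either `links_blacklisted_host(text, blacklist)` or `is_listed(poster_ip, zone)` returns true. Note that a DNS failure is treated as "not listed" here, so an RBL outage lets comments through rather than blocking everyone.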

PostPosted: Thu Oct 13, 2005 11:37 am
by slug
Thanks srijith.

Option 1 is what I'm using now. I've collected 3646 bad hosts in our blacklist so far, all added manually. The part I still need to add is automatic blacklisting to block obvious spam hosts that contain keywords such as phentermine, viagra, poker, casino, etc. I just need to set it up so that a match is automatically counted as spam, and then I can go back and review any automatic additions and undelete anything that turns out to be a false positive.
I'm not too worried about false positives, though. Even if legitimate conversations about cialis and vioxx are disallowed on PBase, not many will be sad.
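
Roughly, the keyword-based auto-blacklisting with later review could look like the Python sketch below. This is only an illustration of the idea, not the actual PBase code: the `HostBlacklist` class, its three sets, and the method names are hypothetical, and the keyword list is just the examples mentioned above.

```python
from dataclasses import dataclass, field

# Example keywords only; a real list would be longer.
SPAM_KEYWORDS = ("phentermine", "viagra", "poker", "casino")


@dataclass
class HostBlacklist:
    confirmed: set = field(default_factory=set)  # added or approved by hand
    pending: set = field(default_factory=set)    # auto-added, awaiting review
    cleared: set = field(default_factory=set)    # reviewed, found legitimate

    def is_blocked(self, host):
        host = host.lower()
        return host in self.confirmed or host in self.pending

    def auto_add(self, host):
        """Block a host immediately if its name contains a spam keyword.

        Returns True when the host was newly auto-blacklisted.
        """
        host = host.lower()
        if host in self.cleared or self.is_blocked(host):
            return False
        if any(word in host for word in SPAM_KEYWORDS):
            self.pending.add(host)
            return True
        return False

    def review(self, host, is_spam):
        """Manual review step: confirm the block or undo a false positive."""
        host = host.lower()
        self.pending.discard(host)
        if is_spam:
            self.confirmed.add(host)
        else:
            self.cleared.add(host)
```

Hosts in `pending` are blocked right away, so spam is caught before review happens. The `cleared` set keeps a host that was marked as a false positive from being re-added automatically the next time it shows up.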

Hadn't thought of using Option 2 for comments. We use RBLs for our email.
RBLs might help, but after watching the IPs of spam comments, it seems somewhat hopeless. Even when someone posts 100 identical spam comments, they often come from 100 distinct IP addresses.

Anyway, the spam situation is mostly under control. Once I add the keyword-based autoblacklisting of URLs, we'll catch almost all of them.

-slug

PostPosted: Thu Oct 13, 2005 11:46 am
by slug
Arjun,
I'll let you know when there's something to look at for the stats.

There are three parts to the problem.
1 Efficient collection of the data.
2 Efficient lookup of the data.
3 Creating the graphs and displaying.

Part 1 is done, but we only have a few days of data now.
Part 3 is done. You might not think the graphs are pretty but they work, and this part can be improved/redone at any time to make things look better.

I just have to finish up Part 2, which is especially difficult at the moment since the database is already struggling. Most likely, the current database server will be dedicated to stats once we upgrade to the new machine.
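
For anyone curious what Parts 1 and 2 can look like in general, here is a small Python sketch. It is an assumption-laden illustration, not the redesigned PBase system, whose details aren't described here: the `HitStats` class, the `daily_hits` table, and the use of SQLite are all invented for the example. The idea shown is to count hits in memory, write them to the database in batches as one row per image per day, and read a date range back for graphing.

```python
import sqlite3
from collections import Counter


class HitStats:
    def __init__(self, db):
        self.db = db
        self.buffer = Counter()  # (image_id, day) -> hits not yet written
        db.execute(
            "CREATE TABLE IF NOT EXISTS daily_hits ("
            " image_id INTEGER NOT NULL,"
            " day TEXT NOT NULL,"  # ISO date, e.g. '2005-10-12'
            " hits INTEGER NOT NULL DEFAULT 0,"
            " PRIMARY KEY (image_id, day))"
        )

    def record(self, image_id, day):
        """Collection: count the hit in memory, no database write per view."""
        self.buffer[(image_id, day)] += 1

    def flush(self):
        """Write buffered counts in one transaction, one row per image/day."""
        with self.db:
            for (image_id, day), n in self.buffer.items():
                self.db.execute(
                    "INSERT OR IGNORE INTO daily_hits (image_id, day, hits)"
                    " VALUES (?, ?, 0)",
                    (image_id, day),
                )
                self.db.execute(
                    "UPDATE daily_hits SET hits = hits + ?"
                    " WHERE image_id = ? AND day = ?",
                    (n, image_id, day),
                )
        self.buffer.clear()

    def daily_counts(self, image_id, first_day, last_day):
        """Lookup: (day, hits) rows for one image, ready to graph."""
        rows = self.db.execute(
            "SELECT day, hits FROM daily_hits"
            " WHERE image_id = ? AND day BETWEEN ? AND ? ORDER BY day",
            (image_id, first_day, last_day),
        )
        return rows.fetchall()
```

Because the primary key is `(image_id, day)`, the lookup for one image's graph is an index range scan rather than a pass over raw hit logs, which is the kind of thing that keeps Part 2 cheap.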
-slug

PostPosted: Thu Oct 13, 2005 10:00 pm
by jcboyd
Slug,
Many thanks for keeping us informed about this type of information. Please continue to use this venue to let your subscribers know what is good or bad that is happening. This will go a long way in keeping us happy. We hate being left in the dark.

thanks again
john

Re: comment spam / new search server / etc

PostPosted: Fri Oct 14, 2005 3:27 am
by cjmorgan
Thank-you Slug. Whatever can be done to improve the system and make it more consistent and efficient is very much appreciated.
So again, thank-you.
CJ

PostPosted: Sat Oct 15, 2005 3:29 am
by rtwo
Slug ... thank you so much for the extensive information... it helps me to better understand what you are dealing with and why there are problems. I am impressed with your willingness to let us know.
Keep it up, and again thank you for posting and for ALL your efforts.
Robin

PostPosted: Mon Nov 28, 2005 1:32 pm
by clickaway
thanks for keeping us informed slug.

looking forward to the new beast to improve performance.

and remind emily that her new favorites program is fabulous!

Ray

PostPosted: Tue Nov 29, 2005 6:05 am
by robertwhite
Thanks for keeping us informed. That certainly goes a long way to keeping us happy. Looking forward to seeing the new improvements.
Thanks

PostPosted: Sat Dec 03, 2005 5:24 pm
by johnwaine
Thanks for the information. It really helps maintain morale and confidence in PBase.

Spam

PostPosted: Fri Dec 16, 2005 5:00 pm
by michelep
I never received spam before. Since Dec. 7th I have received 5.

Will this ever really be stopped once it starts ??

Michele :x

PostPosted: Tue Jan 17, 2006 9:55 am
by fpeeters
Same here. I did not receive any comment spam for three years, but for about a week now I've been getting one or two per day.

hm

PostPosted: Tue Jan 24, 2006 1:19 am
by mikej86
I just deleted about 12 spam entries from my guestbook and am really pissed off and offended that it happened in the first place. What is the point of this? Half of the messages said one thing and half said another, but neither implied a specific product or service; it was just nonsensical jargon. They were all from different email addresses as well. UGH!

-mike

worm virus.. please read

PostPosted: Mon Jan 30, 2006 5:25 pm
by reflectionsbyruth
I think spammers leaving messages in guest books is one way for them to harvest email addresses.
There is a major virus going around that seems to be hitting many.

Please read this. The link is safe, don't worry; it was sent around by our security dept at work ;)

http://us.mcafee.com/virusInfo/default. ... s_k=138091