
News

comment spam / new search server / etc

slug
Site Admin
 
Posts: 598

comment spam / new search server / etc

Post Wed Oct 12, 2005 1:10 pm


Over the last couple of months I've been spending a lot of time fighting the comment spammers.
Two months ago, the comment spam was exploding and a good 20% of comments were ads for online poker or pharmaceuticals.
Our defense has evolved somewhat. At first I was just searching for and deleting the spam after it happened. This is why for a while, the comment count displayed on the profile page was incorrect.
Gradually we compiled a list of obvious spam and reworked the code to catch the spam before it goes into a comment or your guestbook. Also when I do go delete spam I find, the comment counts are updated properly.
We still have some improvements to make to the spam shield, but it looks like only about 1% of comments are spam now.

About a month ago we moved the search to a new server and performance has greatly improved. Now we'll be able to work on improving the search features instead of just trying to keep it running.

Emily has been working on various features, including the improved favorites page. She's finishing up a similar feature that I think has the potential to be really useful, or at least fun.

My main project currently is improving overall site performance.
We're getting close to finally getting a serious database machine.
Lots of meetings recently with Sun, Oracle, providers, and banks to make it happen. The system we're looking at should be at least twice as powerful as the current machine and can be expanded well beyond that.
Due to cash limitations, most of our main database machines have had around a year of life before they became overloaded. I expect this machine to last at least 3-5 years. In the meantime, I've been making lots of tiny efficiency improvements, but things are still slow during peak hours.
Believe me, this causes us much pain since we know everyone viewing the site is frustrated. We're doing everything we can right now to solve this problem for good.

Meanwhile, I've also been working on things such as a much improved statistics system. You'll be able to see graphs of daily hitcounts on any individual image or gallery, as well as aggregate statistics. Some of you may remember seeing these graphs a couple years ago, but they never went public because they were far too much of a system hog.
Recently, I redesigned how the statistics are collected, and with the new database machine on the way, they should be no problem.

-slug

arjunrc
 
Posts: 1003


Post Wed Oct 12, 2005 4:20 pm


Slug, any ETA on when we can see a beta of the stats system?
nedstat seems to be randomly showing popups every once in a while since it transitioned to webstats, and I'd love to rip it out as soon as I have a better real-time stats option.

regds
arjun

srijith
Moderator
 
Posts: 2321
Location: Amsterdam


Post Wed Oct 12, 2005 5:10 pm


slug wrote:
Gradually we compiled a list of obvious spam and reworked the code to catch the spam before it goes into a comment or your guestbook. Also when I do go delete spam I find, the comment counts are updated properly.
We still have some improvements to make to the spam shield, but it looks like only about 1% of comments are spam now.

Slug, you might want to take some hints from the various weblog antispam plugins that are performing pretty well. The two basic ideas that they use are:

1) Spammers post links that they want users to click and visit. Hence, every time a spam comment is identified, the links in it are blacklisted so that the next comment containing these URLs will never make it past the check.

2) Spam is usually posted from compromised/infected machines. Thus, the code checks whether the IP address the comment comes from is listed in any of several RBLs, and if it is, the comment is not allowed. There are workarounds that spammers use for this, like open proxies and Tor, but it takes them that extra bit of effort, and that might be enough to deter them.
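
A minimal Python sketch of the two checks above, not the code of any particular plugin. The function names, the simple URL pattern, and the `dnsbl.example.org` zone are all assumptions made for illustration. The DNS resolver is passed in as a parameter so the lookup can be swapped out or stubbed.

```python
import re
import socket

# Assumption: a deliberately simple pattern for http/https links.
URL_HOST_RE = re.compile(r"https?://([^/\s:]+)", re.IGNORECASE)


def extract_hosts(comment):
    """Return the set of lowercased host names linked from a comment."""
    return {host.lower() for host in URL_HOST_RE.findall(comment)}


def has_blacklisted_link(comment, blacklist):
    """Idea 1: flag a comment that links to any blacklisted host."""
    return not extract_hosts(comment).isdisjoint(blacklist)


def dnsbl_query_name(ip, zone):
    """Build the RBL lookup name: IPv4 octets reversed, then the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def is_listed(ip, zone="dnsbl.example.org", resolve=socket.gethostbyname):
    """Idea 2: treat an address as listed if its RBL name resolves."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except OSError:  # socket.gaierror (name not found) is a subclass
        return False
```

A comment would be held back if either `has_blacklisted_link(text, blacklist)` or `is_listed(poster_ip)` returns True. The zone name here is a placeholder, so a real deployment would substitute the RBLs it actually trusts.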

slug
Site Admin
 
Posts: 598


Post Thu Oct 13, 2005 11:37 am


Thanks srijith.

Option 1 is what I'm using now. I've collected 3646 bad hosts in our blacklist so far. Right now I do it all manually. The part I still need to add is some automatic blacklisting to block obvious spam hosts that contain keywords such as phentermine, viagra, poker, casino, etc. I just need to set it up so that it will automatically count a match as spam, but I can then go back, review any automatic additions, and undelete anything that turns out to be a false positive.
I'm not too worried about false positives, though. Even if legitimate conversations about cialis and vioxx are disallowed on PBase, not many will be sad.
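
A rough Python sketch of how keyword-based automatic blacklisting with a review step could look. This is an illustration under assumptions, not PBase's code: the class, its method names, and the `allowed` set for reviewed false positives are hypothetical, and the keyword list is just the examples named above.

```python
# Keywords taken from the examples above; a real list would be longer.
SPAM_KEYWORDS = ("phentermine", "viagra", "poker", "casino")


class HostBlacklist:
    def __init__(self, hosts=()):
        self.hosts = set(hosts)       # every blocked host
        self.pending_review = set()   # hosts that were added automatically
        self.allowed = set()          # hosts a reviewer cleared by hand

    def check_host(self, host):
        """Return True if the host is blocked, auto-adding keyword matches."""
        host = host.lower()
        if host in self.allowed:
            return False
        if host in self.hosts:
            return True
        if any(word in host for word in SPAM_KEYWORDS):
            self.hosts.add(host)
            self.pending_review.add(host)
            return True
        return False

    def confirm(self, host):
        """Reviewer agrees the automatic addition is spam."""
        self.pending_review.discard(host.lower())

    def reject(self, host):
        """Reviewer found a false positive: unblock it and remember that."""
        host = host.lower()
        self.pending_review.discard(host)
        self.hosts.discard(host)
        self.allowed.add(host)
```

Keeping automatic additions in a separate `pending_review` set is what makes the later manual review possible: only those hosts need a second look, and a rejected host is not re-added the next time it matches a keyword.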

Hadn't thought of using Option 2 for comments. We use RBLs for our email.
RBLs might help, but after watching the IPs of spam comments, it seems somewhat hopeless. Even when someone posts 100 identical spam comments, they often come from 100 distinct IP addresses.

Anyway, the spam situation is mostly under control. After I add the keyword-based autoblacklisting of URLs, we'll get almost all of them.

-slug

slug
Site Admin
 
Posts: 598


Post Thu Oct 13, 2005 11:46 am


Arjun,
I'll let you know when there's something to look at for the stats.

There are three parts to the problem.
1 Efficient collection of the data.
2 Efficient lookup of the data.
3 Creating the graphs and displaying.

Part 1 is done, but we only have a few days of data now.
Part 3 is done. You might not think the graphs are pretty but they work, and this part can be improved/redone at any time to make things look better.

I just have to finish up Part 2, which is especially difficult at the moment since the database is already struggling. Most likely, the current database server will be dedicated to stats once we upgrade to the new machine.
-slug
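
For readers wondering what Parts 1 and 2 might involve, here is a small Python sketch. It is purely an assumption about one common approach (buffer hits in memory, write them out in batches, keep one counter per item per day) and not a description of the actual PBase design. The class name, its methods, and the in-memory `_store` standing in for a database table are all hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


class DailyHitStats:
    def __init__(self):
        self._buffer = Counter()  # hits recorded since the last flush
        self._store = Counter()   # stand-in for a table keyed by (item_id, day)

    def record_hit(self, item_id, day):
        """Part 1: collection is a cheap in-memory increment."""
        self._buffer[(item_id, day)] += 1

    def flush(self):
        """Write buffered counts to the store in one batch."""
        self._store.update(self._buffer)
        self._buffer.clear()

    def daily_series(self, item_id, start, days):
        """Part 2: look up one count per day, with 0 for days without hits."""
        series = []
        for n in range(days):
            day = start + timedelta(days=n)
            series.append((day, self._store[(item_id, day)]))
        return series


stats = DailyHitStats()
first_day = date(2005, 10, 12)
stats.record_hit("image-1", first_day)
stats.record_hit("image-1", first_day)
stats.record_hit("image-1", first_day + timedelta(days=2))
stats.flush()
series = stats.daily_series("image-1", first_day, 3)  # counts: 2, 0, 1
```

The list returned by `daily_series` is the kind of input a graphing step (Part 3) would consume.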

jcboyd
 
Posts: 640


Post Thu Oct 13, 2005 10:00 pm


Slug,
Many thanks for keeping us informed about this type of information. Please continue to use this venue to let your subscribers know what is good or bad that is happening. This will go a long way in keeping us happy. We hate being left in the dark.

thanks again
john
Photography Is More - Than Just Clicking The Shutter!
http://www.pbase.com/jcboyd

cjmorgan
 
Posts: 231

Re: comment spam / new search server / etc

Post Fri Oct 14, 2005 3:27 am


Thank you, Slug. Whatever can be done to improve the system and make
it more consistent and efficient is very much appreciated.
So again, thank you.
CJ

rtwo
 
Posts: 232


Post Sat Oct 15, 2005 3:29 am


Slug ... thank you so much for the extensive information... it helps me to better understand what you are dealing with and why there are problems. I am impressed with your willingness to let us know.
Keep it up, and again thank you for posting and for ALL your efforts.
Robin
Robin Reid
"Lets snap to it" <grin>

clickaway
 
Posts: 2689


Post Mon Nov 28, 2005 1:32 pm


thanks for keeping us informed slug.

looking forward to the new beast to improve performance.

and remind emily that her new favorites program is fabulous!

Ray

robertwhite
 
Posts: 206


Post Tue Nov 29, 2005 6:05 am


Thanks for keeping us informed. That certainly goes a long way to keeping us happy. Looking forward to seeing the new improvements.
Thanks

johnwaine
 
Posts: 520


Post Sat Dec 03, 2005 5:24 pm


Thanks for the information. It really helps maintain morale and confidence in PBase.

michelep
 
Posts: 7

Spam

Post Fri Dec 16, 2005 5:00 pm


I never received spam before. Since Dec. 7th I have received 5.

Will this ever really be stopped once it starts ??

Michele :x

fpeeters
 
Posts: 1


Post Tue Jan 17, 2006 9:55 am


Same here. I did not receive any comment spam for three years, but for about a week now I've been getting one or two per day.

mikej86
 
Posts: 27

hm

Post Tue Jan 24, 2006 1:19 am


I just deleted about 12 spam entries from my guestbook and am really pissed off and offended that it happened in the first place. What is the point of this? Half of the messages said one thing and half said another, but neither implied a specific product or service; it was just nonsensical jargon. They were all from different email addresses as well. UGH!

-mike

reflectionsbyruth
 
Posts: 449

worm virus.. please read

Post Mon Jan 30, 2006 5:25 pm


I think spammers leaving messages in guest books is one way for them to harvest email addresses.
There is a major virus going around that seems to be hitting many.

Please read this.. the link is safe, don't worry.. it was sent around by our security dept at work ;)

http://us.mcafee.com/virusInfo/default. ... s_k=138091
