General Interest >> Science and Technology

    #4449647 - 07/25/05 07:42 AM (12 years, 9 months ago)

for a forums system that i'm working on designing, i've been having a bit of trouble with the 'censorship' system. right now i have a large config file filled with swear words, and i use a regexp to replace any instance of these words with **** (hehe smart enough to put in the proper number of asterisks, so the savvy reader can figure out which swear word has been bleeped). however, this runs into problems with, for example, the word classic turning into cl***ic. i was thinking that i could have two config files: one with words that should always be censored (fuck, shit, cunt, asshole, etc) and another with words that should only be censored when they stand on their own (ass, for example). however, this would require a huge list of 'ass' words that should be censored (asshole, assface, assbreath, etc); otherwise they would be ignored, since 'ass' appears legitimately inside other words.

has anybody dealt with a problem like this before? any suggestions?

(and ps - yes i know censorship is crappy, but some clients of mine would like to have the option to turn it on...)
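
The substring problem and the two-list idea described in this post can be sketched in a few lines of Python. This is only an illustration: the word lists and the names `naive_censor`, `two_list_censor`, `ALWAYS` and `STANDALONE` are assumptions standing in for the poster's config files, not the actual forum code.

```python
import re

# Hypothetical stand-ins for the two config files described in the post.
ALWAYS = ["asshole", "shit"]   # censored wherever they appear
STANDALONE = ["ass"]           # censored only as a whole word

def mask(match):
    # Replace the matched text with the same number of asterisks.
    return "*" * len(match.group(0))

def naive_censor(text, words):
    # Plain substring matching: this is what turns "classic" into "cl***ic".
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    return pattern.sub(mask, text)

def two_list_censor(text):
    # Substring match for ALWAYS, whole-word match (\b ... \b) for STANDALONE.
    anywhere = "|".join(re.escape(w) for w in ALWAYS)
    whole = "|".join(re.escape(w) for w in STANDALONE)
    pattern = re.compile(rf"(?:{anywhere})|\b(?:{whole})\b", re.IGNORECASE)
    return pattern.sub(mask, text)

print(naive_censor("a classic ass", ["ass"]))   # a cl***ic ***
print(two_list_censor("a classic ass"))         # a classic ***
print(two_list_censor("assface"))               # assface (not in either list)
```

The last call shows the drawback raised in the post: every compound such as "assface" has to be added to the always-censor list by hand, or it slips through.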


Re: censorship... [Re: Krishna]
    #4449675 - 07/25/05 08:11 AM (12 years, 9 months ago)

> has anybody dealt with a problem like this before? any suggestions?

The problem with censorship is that 'what is wrong' is in the eyes of the beholder.  For example, is the statement: "I rode my ass into work today" good or bad?  Assuming that the ass in question is an animal, then there is no vulgarity in the wording, but every censor I know would block the line.

For a really good system, I would work on a natural language parser to help identify the usage of the words, not simply the patterns of letters.  This is a rather large undertaking and probably not practical for your project, unless you are a PhD candidate and this is your dissertation. :smile:

If you want something quick and dirty, I would go with three files.  Prefixes, suffixes, and anywheres.  The prefix file would list patterns that are not allowed to start a word.  The suffix file would list patterns that are not allowed to end a word.  The anywhere file would list patterns that are not allowed anywhere.
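
A rough sketch of the three-list scheme, under stated assumptions: the list contents are hypothetical placeholders ("frak" is a made-up word used only to exercise the suffix rule), the names `build_pattern` and `censor` are invented for illustration, and a real system would load the lists from the three files.

```python
import re

# Hypothetical contents of the three files.
PREFIXES = ["ass"]    # not allowed to start a word
SUFFIXES = ["frak"]   # not allowed to end a word (made-up placeholder)
ANYWHERE = ["shit"]   # not allowed anywhere

def alternation(words):
    # Escape each entry so it is matched literally.
    return "|".join(re.escape(w) for w in words)

def build_pattern(prefixes, suffixes, anywhere):
    parts = []
    if prefixes:
        parts.append(rf"\b(?:{alternation(prefixes)})")   # anchored at word start
    if suffixes:
        parts.append(rf"(?:{alternation(suffixes)})\b")   # anchored at word end
    if anywhere:
        parts.append(rf"(?:{alternation(anywhere)})")     # unanchored
    return re.compile("|".join(parts), re.IGNORECASE)

PATTERN = build_pattern(PREFIXES, SUFFIXES, ANYWHERE)

def censor(text):
    # Mask only the matched part, keeping the original length.
    return PATTERN.sub(lambda m: "*" * len(m.group(0)), text)

print(censor("a classic asshole"))   # a classic ***hole
print(censor("bullshit happens"))    # bull**** happens
print(censor("assume"))              # ***ume
```

Anchoring "ass" at the start of a word leaves "classic" alone, but as the last call shows, it still hits innocent words such as "assume", so the prefix list needs care or an exception list.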

You may want to match on the Soundex of the word rather than the word itself.  This will help catch people who use words like 'fuxers'.  You will probably also want to normalize simple substitutions where letters are written as digits or symbols.  For example, "@55h013"... (dunno how anal you want your censor to be).
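
Both ideas can be sketched as follows. The Soundex function follows the standard American Soundex rules; the substitution table, the banned list and the names `normalize` and `is_blocked` are assumptions made for illustration (for instance, "1" is mapped to "l" here, though it could equally stand for "i").

```python
# Standard Soundex digit groups.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue              # h and w do not separate equal codes
        code = CODES.get(ch, "")  # vowels and y have no code
        if code and code != prev:
            result += code
        prev = code               # a vowel resets prev, so repeats across it count
    return (result + "000")[:4]   # pad or truncate to four characters

# Assumed symbol-to-letter table; extend as needed.
LEET = str.maketrans({"@": "a", "4": "a", "3": "e", "1": "l",
                      "0": "o", "5": "s", "$": "s", "7": "t"})

def normalize(word):
    return word.lower().translate(LEET)

BANNED = ["fuckers", "asshole"]               # hypothetical banned list
BANNED_CODES = {soundex(w) for w in BANNED}

def is_blocked(word):
    return soundex(normalize(word)) in BANNED_CODES

print(soundex("fuxers"), soundex("fuckers"))  # F262 F262
print(normalize("@55h013"))                   # asshole
print(is_blocked("classic"))                  # False
```

Soundex is deliberately coarse, so it also produces false positives: "fixers" has the same code, F262, and would be blocked by this sketch.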

Finally, remember, legally speaking it is often better to be censor-free than to claim censorship and mess up.  I have read cases where a site got sued for having a bad censor that let things slip through.  The site would have been fine had they not tried to censor at all (or at least made no claims about trying to censor).  In our lawsuit-friendly world, be careful.

Just another spore in the wind.

Re: censorship... [Re: Seuss]
    #4449705 - 07/25/05 09:10 AM (12 years, 9 months ago)

thanks seuss (hehe this post was mostly made hoping that you could come to the rescue). for the time being, we aren't being paid nearly enough to develop a good natural language processing system, but this 'three pattern' system seems like a fairly good place to start. it would at least take care of the problem of classic turning into cl***ic, while still turning asshole into ***hole.

hehe that's always the problem with developing applications for clients - can't exactly say to them "well, it would greatly benefit our company to have done the r&d to create a good natural language processing system, so couldn't you add about 6 more zeros to the end of that cheque you cut for us?"



Copyright 1997-2018 Mind Media. Some rights reserved.
