General Interest >> Science and Technology

Krishna
कृष्ण,LOL

Registered: 05/08/03
Posts: 23,284
Loc: oakland
censorship...
    #4449647 - 07/25/05 07:42 AM (11 years, 4 months ago)

for a forum system that i'm designing, i've been having a bit of trouble with the 'censorship' system. right now i have a large config file filled with swear words, and i use a regexp to replace any instance of these words with asterisks (hehe, smart enough to put in the proper number of asterisks, so the savvy reader can figure out which swear word was bleeped). however, this runs into problems with, for example, the word classic turning into cl***ic. i was thinking that i could have two config files: one with words that should always be censored (fuck, shit, cunt, asshole, etc) and another with words that should only be censored when they stand on their own (ass, for example). however, this would require a huge list of 'ass' words to censor (asshole, assface, assbreath, etc); otherwise they would be ignored, because 'ass' appears legitimately inside other words.
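The two-list idea described above could be sketched roughly as follows. This is only an illustration in Python, not the forum's actual code: the lists `ALWAYS` and `STANDALONE` and the function `censor` are hypothetical names, and the word lists are hard-coded here where the post would load them from config files.

```python
import re

# Hypothetical word lists; the post would load these from two config files.
ALWAYS = ["asshole", "shit"]   # censored wherever they appear, even inside other words
STANDALONE = ["ass"]           # censored only when they form a whole word


def _stars(match):
    # Replace the matched text with the same number of asterisks.
    return "*" * len(match.group(0))


def _alternation(words):
    # Longest entries first, so a long word wins over a shorter one it contains.
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


ALWAYS_RE = re.compile(_alternation(ALWAYS), re.IGNORECASE)
# \b anchors the match to word boundaries, so "classic" is left alone.
STANDALONE_RE = re.compile(r"\b(?:" + _alternation(STANDALONE) + r")\b", re.IGNORECASE)


def censor(text):
    text = ALWAYS_RE.sub(_stars, text)
    return STANDALONE_RE.sub(_stars, text)


print(censor("a classic ass"))    # a classic ***
print(censor("what an asshole"))  # what an *******
print(censor("assface"))          # assface (not on either list, so it slips through)
```

The last call shows the drawback raised in the post: every compound has to be listed explicitly, or it passes unchanged.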

has anybody dealt with a problem like this before? any suggestions?

(and ps - yes i know censorship is crappy, but some clients of mine would like to have the option to turn it on...)


Seuss
Error: divide byzero

Registered: 04/27/01
Posts: 23,480
Loc: Caribbean
Re: censorship... [Re: Krishna]
    #4449675 - 07/25/05 08:11 AM (11 years, 4 months ago)

> has anybody dealt with a problem like this before? any suggestions?

The problem with censorship is that 'what is wrong' is in the eyes of the beholder.  For example, is the statement: "I rode my ass into work today" good or bad?  Assuming that the ass in question is an animal, then there is no vulgarity in the wording, but every censor I know would block the line.

For a really good system, I would work on a natural language parser to help identify the usage of the words, not simply the patterns of letters.  This is a rather large undertaking and probably not practical for your project, unless you are a PhD candidate and this is your dissertation. :smile:

If you want something quick and dirty, I would go with three files.  Prefixes, suffixes, and anywheres.  The prefix file would list patterns that are not allowed to start a word.  The suffix file would list patterns that are not allowed to end a word.  The anywhere file would list patterns that are not allowed anywhere.
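A minimal sketch of the three-list scheme, in Python, might look like the following. The names `PREFIXES`, `SUFFIXES`, `ANYWHERE`, and `censor` are hypothetical, and the lists are hard-coded for illustration where the suggestion above would read them from three files.

```python
import re

# Hypothetical stand-ins for the three files.
PREFIXES = ["ass"]    # patterns not allowed to start a word
SUFFIXES = ["shit"]   # patterns not allowed to end a word
ANYWHERE = ["fuck"]   # patterns not allowed anywhere


def _alt(words):
    # Longest entries first, so a long pattern wins over a shorter one it contains.
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


# One combined pattern: anywhere-matches, then word-start matches, then word-end matches.
PATTERN = re.compile(
    r"(?:%s)|\b(?:%s)|(?:%s)\b" % (_alt(ANYWHERE), _alt(PREFIXES), _alt(SUFFIXES)),
    re.IGNORECASE,
)


def censor(text):
    # Replace each match with the same number of asterisks.
    return PATTERN.sub(lambda m: "*" * len(m.group(0)), text)


print(censor("a classic asshole"))  # a classic ***hole
print(censor("bullshit happens"))   # bull**** happens
print(censor("I assume so"))        # I ***ume so
```

The last line is a false positive: a prefix rule for "ass" also hits "assume", so the prefix and suffix lists still need care, or an exception list on top.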

You may want to match on the soundex of the word rather than the word itself.  This will help catch people that use words like 'fuxers'.  You will probably also want to make simple letter substitutions for letters that can be expressed as symbols.  For example, "@55h013"... (donno how anal you want your censor to be).
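Both ideas can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: the substitution table `LEET` is a small hypothetical sample rather than a complete mapping, `soundex` is a hand-written version of the standard American Soundex rules, and `BLOCKED_WORDS` and `is_blocked` are made-up names.

```python
# Hypothetical symbol-to-letter table; a real one would be longer.
LEET = str.maketrans({"@": "a", "4": "a", "5": "s", "$": "s",
                      "0": "o", "1": "l", "3": "e", "7": "t"})

# Soundex digit for each consonant group.
_CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        _CODES[ch] = str(digit)


def normalize(word):
    # Lowercase the word and undo simple symbol-for-letter substitutions.
    return word.lower().translate(LEET)


def soundex(word):
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = []
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    # First letter plus three digits, padded with zeros.
    return (letters[0].upper() + "".join(out) + "000")[:4]


# Hypothetical block list, stored as Soundex codes.
BLOCKED_WORDS = ["fuckers"]
BLOCKED_CODES = {soundex(w) for w in BLOCKED_WORDS}


def is_blocked(word):
    return soundex(normalize(word)) in BLOCKED_CODES


print(normalize("@55h013"))   # asshole
print(soundex("fuxers"))      # F262
print(is_blocked("fuxers"))   # True
print(is_blocked("figures"))  # True
```

The last line shows the cost: Soundex is coarse, and "figures" gets the same code (F262) as the blocked word, so matching on Soundex alone would produce false positives unless it is combined with another check.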

Finally, remember that, legally speaking, it is often better to be censor-free than to claim censorship and mess up.  I have read about cases where a site got sued for having a bad censor that let things slip through.  The site would have been fine had they not tried to censor at all (or at least made no claims about trying to censor).  In our lawsuit-friendly world, be careful.


--------------------
Just another spore in the wind.


Krishna
कृष्ण,LOL

Registered: 05/08/03
Posts: 23,284
Loc: oakland
Re: censorship... [Re: Seuss]
    #4449705 - 07/25/05 09:10 AM (11 years, 4 months ago)

thanks seuss (hehe, this post was mostly made hoping that you would come to the rescue). for the time being, we aren't being paid nearly enough to develop a good natural language processing system, but this 'three pattern' system seems like a fairly good place to start. it would at least take care of the problem of classic turning into cl***ic, while still turning asshole into ***hole.

hehe, that's always the problem with developing applications for clients: can't exactly say to them "well, it would greatly benefit our company to have done the r&d to create a good natural language processing system, so couldn't you add about 6 more zeros to the end of that cheque you cut for us?"

:smile:


Copyright 1997-2016 Mind Media. Some rights reserved.
