Friday, November 19, 2010

 

More on Victoria Free Stuff

I've got the Freecycle bit working now on my Victoria Free Stuff site. Freecycle has a subject line protocol for postings: when you offer something you are supposed to prefix the subject with OFFER, and if someone takes the item you are supposed to post another message with the same subject, prefixed with TAKEN instead. If everyone did this, and all did it the same way, it would be perfect :) But humans, being the creative, deviant and/or lazy creatures that they are, tend not to adhere to the rules perfectly. By and large, MOST people do post correctly, but often with variations on the theme. Below is part of the regexp I use to pick out whether a posting is an offer or a taken.

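# Strip a leading OFFER-style tag (and an optional colon) from the subject.
if ($link_text =~ s/^(?:offered|re[-\s]?offer|on\s+offer|offer|free|repost):?//) { $type = 'offer'; }
# Strip a leading TAKEN-style tag (and an optional colon) from the subject.
if ($link_text =~ s/^(?:taken|on hold|onhold|spoken for):?//) { $type = 'taken'; }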

See what I have to deal with?? Lol. And that's after I've got things already cleaned up a bit ;)

I discard everything that I cannot classify as an offer or a taken post, so WANTED posts and general conversation on the Freecycle board get dumped by my system.
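For anyone curious, the classify-or-discard step can be sketched roughly like this. It is a minimal sketch, not the actual site code: the `classify_subject` helper, the lowercasing and trimming, the `\b` word boundary (so "freezer" is not read as FREE) and the sample subjects are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($type, $rest) for offer/taken subjects,
# or an empty list for anything else (WANTED posts, general chatter).
sub classify_subject {
    my ($subject) = @_;
    my $text = lc $subject;       # assumption: match on lowercased text
    $text =~ s/^\s+|\s+$//g;      # trim surrounding whitespace
    my $type;
    if ($text =~ s/^(?:offered|re[-\s]?offer|on\s+offer|offer|free|repost)\b:?\s*//) {
        $type = 'offer';
    }
    elsif ($text =~ s/^(?:taken|on\s?hold|spoken for)\b:?\s*//) {
        $type = 'taken';
    }
    return defined $type ? ($type, $text) : ();
}

# Made-up sample subjects.
my @subjects = ('OFFER: Blue couch', 'Taken: blue couch', 'WANTED: bike', 'Re-offer: desk lamp');

my @kept;
for my $subject (@subjects) {
    # An empty return list means "cannot classify", so the post is discarded.
    my ($type, $rest) = classify_subject($subject) or next;
    push @kept, { type => $type, subject => $rest };
}

printf "%-5s %s\n", $_->{type}, $_->{subject} for @kept;
```

The WANTED subject falls through both patterns and is simply skipped, which is all "dumped by my system" needs to mean here.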

The system groups the posts by author and then attempts to match up TAKEN posts with their original OFFER posts. Over the last 3 months of messages on Victoria Freecycle it finds the OFFER about 80% of the time. It first tries to match by exact subject; at present I think about 2/3 of the TAKENs it is able to match up are perfect subject matches. Failing that, it attempts a token-based matchup: if 85% of the words in the TAKEN post subject match words from an OFFER, it accepts that as a match as well. This provides the other third of the successful TAKEN matchups. And of course once the system believes an item is TAKEN it will de-list it from the combined results :).
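Here is a rough sketch of that two-pass matchup, assuming each post has already been reduced to an author, a type and a cleaned-up subject. The helper names (`token_overlap`, `find_offer`), the way words are split, the case-insensitive exact match and the sample posts are my own assumptions for illustration, and message ordering and dates are ignored entirely.

```perl
use strict;
use warnings;

# Fraction of the TAKEN subject's words that also appear in the OFFER subject.
sub token_overlap {
    my ($taken, $offer) = @_;
    my @words = grep { length } split(/\W+/, lc($taken));
    return 0 unless @words;
    my %in_offer = map { $_ => 1 } grep { length } split(/\W+/, lc($offer));
    my $hits = grep { $in_offer{$_} } @words;   # count of shared words
    return $hits / @words;
}

# Hypothetical matcher: returns the index of the matching OFFER, or undef.
# $offers is an array ref of subjects posted by the same author.
sub find_offer {
    my ($taken, $offers, $threshold) = @_;
    $threshold //= 0.85;

    # Pass 1: exact subject match (case-insensitive here, as an assumption).
    for my $i (0 .. $#$offers) {
        return $i if lc($offers->[$i]) eq lc($taken);
    }

    # Pass 2: best token overlap at or above the threshold.
    my ($best, $best_score) = (undef, 0);
    for my $i (0 .. $#$offers) {
        my $score = token_overlap($taken, $offers->[$i]);
        ($best, $best_score) = ($i, $score)
            if $score >= $threshold && $score > $best_score;
    }
    return $best;
}

# Made-up posts standing in for the cleaned-up Freecycle messages.
my @posts = (
    { author => 'ann', type => 'offer', subject => 'blue couch, fairfield' },
    { author => 'bob', type => 'offer', subject => 'kids bike' },
    { author => 'ann', type => 'offer', subject => 'desk lamp' },
    { author => 'ann', type => 'taken', subject => 'Blue couch' },
    { author => 'bob', type => 'taken', subject => 'kids bike' },
);

# Group OFFER subjects by author.
my %offers_by_author;
push @{ $offers_by_author{ $_->{author} } }, $_->{subject}
    for grep { $_->{type} eq 'offer' } @posts;

# Resolve each TAKEN against its own author's offers only.
for my $taken (grep { $_->{type} eq 'taken' } @posts) {
    my $offers = $offers_by_author{ $taken->{author} } || [];
    my $i = find_offer($taken->{subject}, $offers);
    next unless defined $i;                      # unmatched TAKEN: leave listings alone
    print "de-list: $offers->[$i] ($taken->{author})\n";
    splice @$offers, $i, 1;                      # an offer can only be taken once
}
```

Restricting the search to the same author keeps the token pass from pairing one person's TAKEN with somebody else's similar-sounding OFFER.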

It's not perfect, but it's as good as I can make a computer deal with the imperfection with the limited resources I have right now. To do better I think I would have to dedicate some serious time and effort to finding a way to do a contextual analysis of the contents of the messages, etc. Stuff that I would need a patron to finance! LOL :) So I'll stick with this simple thing that works pretty well for now.

Of course this would all be MOOT if Freecycle would switch to a more modern way of doing things, something like a craigslist site but with a forum attached. FreeMesa.org is, I think, doing something like this. I encourage the switch. If Victoria people start using FreeMesa I'll scrape that site too. It's so much easier to deal with de-listing items on the scraped sites (Usedvictoria.com, Craigslist.org, Kijiji.ca) because, quite simply, when an item disappears from the scrape results we know we should de-list it in our own system.
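To show how little is needed on the scraped sites, here is a minimal sketch of de-listing by disappearance. The `ids_to_delist` helper and the listing IDs are hypothetical; it assumes every scraped item carries some stable identifier, which may not be how the real scraper keys things.

```perl
use strict;
use warnings;

# Hypothetical helper: given the IDs we currently list and the IDs seen in
# the latest scrape, return the IDs that vanished and should be de-listed.
sub ids_to_delist {
    my ($listed, $scraped) = @_;
    my %still_there = map { $_ => 1 } @$scraped;
    return grep { !$still_there{$_} } @$listed;
}

my @listed  = qw(cl-101 cl-102 uv-7 kj-33);   # made-up listing IDs
my @scraped = qw(cl-101 uv-7 kj-34);          # latest scrape: two gone, one new

print "de-list: $_\n" for ids_to_delist(\@listed, \@scraped);
```

No subject parsing and no guessing: anything we list that the latest scrape no longer shows just gets dropped.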

Now that the Freecycle aspect is working, and I've tweaked the scraper script to my reasonable liking, I can call this project complete. Like the chan thread post/bump lulz project. Like the craigslist and gmail account creation projects. Like the proxy-list scrape project. Like the captcha bypass sales/data entry (captchaquest.com/chws.ca combo) project. It's all done. Well, rather, if I had patronage to make captchaquest more of a fun game I have ideas for it, but not the time or money.

So I did my thing. I wrote a lot of code, and I had a lot of fun. But now I need to make some money. I had hoped this stuff could become made of money, but it did not pan out, or I would have to do immoral things that I'm not ok with in order to make it pan out. I still have some ideas on something lolgasmic to get CL PVAs but I'll discuss that another time.

Ta ta for now, time to work on my resume and spam that shit around via bots (lol, be annoying till you hire me? haha. it might work. probably not. might be funny. might get myself blacklisted. but then i wouldn't want to work for _those_ people anyway. lol)
