This is kind of silly, but I wanted to see how hard it would be to write. If you want to see a unique list of every word you have ever used on your page (sorted by number of occurrences, in descending order), you can do so at www.digitalmediatree.com/word_count.php3?global=x:x:x: (where x:x:x: is the page's global location; for instance, this page is 0:1:1:). Here's my page, for example: jim's page.

(Please note: this is highly unpolished. The script makes certain crude assumptions about what a word is, and applies other crude transformations to the HTML so that no unintended HTML gets into the printout. It just strips out most punctuation. Not so pretty, but still slightly interesting. I could make it better, but it's functional enough in this state for what I'm interested in.)
- jim 11-02-2000 11:10 pm
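
For anyone curious what that kind of crude counting looks like, here is a minimal sketch in Python. It is not the actual word_count script; the function name `word_counts`, the two regular expressions, and the sample text are all assumptions made up for illustration. It strips anything that looks like a tag, drops most punctuation, splits on whitespace, and lists words by descending count.

```python
import re
from collections import Counter

# Hypothetical patterns, chosen for illustration only.
TAG_RE = re.compile(r"<[^>]*>")            # crude: anything between < and >
NON_WORD_RE = re.compile(r"[^a-z0-9'\s]")  # crude: drop most punctuation


def word_counts(html: str) -> list[tuple[str, int]]:
    """Return (word, count) pairs, most frequent first."""
    text = TAG_RE.sub(" ", html).lower()   # remove tags, fold case
    text = NON_WORD_RE.sub(" ", text)      # replace punctuation with spaces
    counts = Counter(text.split())         # a "word" is whatever is left between spaces
    # Sort by count descending, then alphabetically so ties have a fixed order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


sample = "<p>Love it or not, <i>love</i> is a word. Is it?</p>"
for word, count in word_counts(sample):
    print(count, word)
```

Replacing tags and punctuation with a space (rather than nothing) keeps neighboring words from running together, at the cost of splitting a word that has a tag in the middle of it.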

jim, can you do that for the Sustenance page and link it here?
- Skinny 11-03-2000 2:17 am


  • Here is Sustenance. This doesn't count the comments by default, but that's easy to add: just append &comments=on to the URL, like this (the same works for any page now). Ask me if you don't know the location for a page; otherwise, here are bill, nola, arboretum, and group.
    - jim 11-03-2000 5:36 am


    • 14 love on yours--zero on "mine"
      - Skinny 11-03-2000 5:55 am



Now the word count total for each word is linked to the search engine, which will return the posts in which the word is used, if you're curious about checking (I used 'love' 14 times? Where?). This actually uncovered lots of weaknesses in the search engine. I've patched most of them, but I see now what a big problem it is. What's a word, anyway? Is it just any string of characters delimited by spaces? What about punctuation (are 'here' and 'here,' the same word?) or HTML (are 'here' and '<i>here</i>' the same word?) It seems like they should be the same, but if so, you need to teach the system a lot more than just that a word is any string of characters delimited by spaces.
- jim 11-03-2000 3:27 pm
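
One way to get 'here', 'here,' and '<i>here</i>' treated as the same word is to run both the counter and the search through a single normalizing step. The sketch below, in Python, is only an illustration under assumptions: `normalize`, `posts_containing`, and the sample posts are hypothetical, and this is not how the site's search engine actually works.

```python
import re
import string

TAG_RE = re.compile(r"<[^>]*>")  # crude: anything between < and >


def normalize(text: str) -> list[str]:
    """Reduce raw text to a list of lowercase comparison keys."""
    text = TAG_RE.sub(" ", text)  # '<i>here</i>' -> ' here '
    # Trim punctuation from both ends of each token: 'here,' -> 'here'
    words = (tok.strip(string.punctuation).lower() for tok in text.split())
    return [w for w in words if w]  # discard tokens that were only punctuation


def posts_containing(word: str, posts: dict[str, str]) -> list[str]:
    """Return ids of posts whose normalized words include the query."""
    wanted = set(normalize(word))
    if not wanted:
        return []
    return [pid for pid, body in posts.items() if wanted <= set(normalize(body))]


# Made-up posts, for illustration only.
posts = {
    "a": "I am here, for now.",
    "b": "Look <i>here</i> first.",
    "c": "Nothing to see.",
}
print(posts_containing("here", posts))
```

Because the query and the post bodies pass through the same function, a count produced this way and a search for the same word should agree on what matches, whatever the definition of a word turns out to be.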


  • Keep in mind that the word_count script only counts word occurrences on the top level, while the search engine finds occurrences on the top level plus any in the discussion pages below. So the search engine might return more hits than word_count. Also, it might well be flaky in other ways.
    - jim 11-03-2000 6:33 pm





