Jim Bassett's Weblog

A memepool for unix geeks: sweetcode.

Sweetcode reports innovative free software. 'Innovative' means that the software reported here isn't just a clone of something else or a minor add-on to something else or a port of something else or yet another implementation of a widely recognized concept.

So, in case you need a half keyboard patch to the Linux kernel, or a program to extract the original data values from a .gif image of a chart, or something else similarly obscure but possibly useful, you know where to go. Interesting stuff.

I'm very interested in people who are making weblogging software. Or is it personal publishing software? Whatever. Joel is one. His new software, citydesk, is almost complete. He has a page explaining it titled what does citydesk do. I'm not knowledgeable or bold enough to make such remarks, but I almost spit coffee out of my nose the other morning when I read Wes Felter's reply to Joel's rhetorical page title: "Looks like it creates URLs with lots of digits in them."

My first version of the system we use here had an even more tortuous URL scheme. Every page was accessed through one script (draw.php3) to which you would pass a colon-separated address for the page. My page was at 0:1:14, and a subpage of mine would be at, say, 0:1:14:28. Comments several levels down would have very long strings for addresses (for example: /draw.php3?global=0:1:14:28:3:2:18). Not long after I started in with this system (late '99 I believe) I went to a Camworld blog bowl event at the local Bowlmor bowling lanes. I was too shy to really interact with anyone, but the one thing Cam said to me was "Oh yeah, digitalmediatree, I looked at that - what's with the funky backend?" I knew he was talking about those weird URLs. During the next rewrite I made it a top priority to get rid of all funky URLs. And I think for the most part I was successful.
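For the curious, here is a rough sketch of how an address like that breaks down. It is written in Python purely for illustration, and the function names (parse_page_address, parent_address) are made up here, not taken from draw.php3. It assumes each number is one level in the page hierarchy, so dropping the last number gives the parent page.

```python
from urllib.parse import urlparse, parse_qs


def parse_page_address(url):
    """Return the integer ids held in the 'global' query parameter."""
    query = parse_qs(urlparse(url).query)
    address = query["global"][0]          # e.g. "0:1:14:28"
    return [int(part) for part in address.split(":")]


def parent_address(ids):
    """Assumption: the parent page is the same path minus its last id."""
    return ids[:-1]


ids = parse_page_address("/draw.php3?global=0:1:14:28:3:2:18")
print(ids)                    # [0, 1, 14, 28, 3, 2, 18]
print(parent_address(ids))    # [0, 1, 14, 28, 3, 2]
```

Every level of nesting adds another number, which is why deeply buried comments ended up with such long addresses.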

Anyway, back to Joel's Citydesk. I was interested to hear him comment on comments.

My own discussion software does not have threading. "Threading" is technical jargon for a discussion feature where different people can branch in different directions by replying to replies. You end up with a tree of conversation. Most forum software has this feature and some people were rather angry that mine doesn't.

I first noticed the value of one-train topics using the echo community software, which is, in all other respects, excruciatingly bad. Something interesting happens sociologically when you don't have threading: the conversation is forced along one train of thought.

I've been thinking a lot about threading too. When I first added it here it was by far the most complex thing I had ever built. I had to use a recursive function, which to a non-programmer like me was a bit akin to finding a powerful magic spell. And it took me so long to discover this incantation that once I got it working I didn't want to take it out. "Hey look at this - threading!" But I think I agree with Joel in preferring non-threaded discussions, although maybe for different reasons.
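The incantation looks something like the sketch below. This is not the code that runs here, just a minimal Python illustration of the idea, and the field names (id, parent, text) and function names are assumptions. Each comment remembers which comment it replies to, and the function calls itself once for every reply to draw the replies underneath.

```python
from collections import defaultdict


def build_children(comments):
    """Map each parent id to the list of its direct replies, in input order."""
    children = defaultdict(list)
    for comment in comments:
        children[comment["parent"]].append(comment)
    return children


def render_thread(children, parent_id=None, depth=0):
    """Recursively collect indented lines for every reply under parent_id."""
    lines = []
    for comment in children.get(parent_id, []):
        lines.append("  " * depth + comment["text"])
        # The recursive step: render this comment's own replies one level deeper.
        lines.extend(render_thread(children, comment["id"], depth + 1))
    return lines


comments = [
    {"id": 1, "parent": None, "text": "first comment"},
    {"id": 2, "parent": 1, "text": "reply to first"},
    {"id": 3, "parent": 2, "text": "reply to the reply"},
    {"id": 4, "parent": None, "text": "second comment"},
]
print("\n".join(render_thread(build_children(comments))))
```

A non-threaded discussion is the same thing with the recursion taken out: every comment has the same parent, so one flat loop does the job.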

In my system every page has an entry in a directory table in the database. The directory table holds information about where a page is in the (virtual) file hierarchy, as well as what kind of page it is. Right now there are 3,535 pages in the directory of this site. But 3,391 of those pages are comment pages, while only 144 are "real" pages. Threading (at least the way I have implemented it) takes a big toll in terms of entries in the database. Every post creates at least one comment page, but then because of threading every comment creates another page as well. Without threading there would only be one additional page for each top-level post. Changing to a non-threaded system would probably cut 80% of the pages out of the directory table. No doubt this would increase system performance. I worry what will happen if I'm still using this software in a few years (despite what I flippantly said the other day about not caring too much about scaling issues due to unpopularity).

But I won't be removing threading. It is helpful sometimes. Most notably it allows others to link directly to a page that contains just one focused part of a very long, deeply threaded discussion. But I think I will make it an option for each page owner to choose straight or threaded discussions. As long as you don't have to see the spaghetti code behind the scenes this probably seems like the best choice. No going back. But it will help us grow if we only use threading where it's really needed.

Sunday morning reading: "I stumbled out from the cabin to my truck, testing just how self conscious it was possible to be. Deeply embarrassed by the trees, so obviously belonging there unlike my stupid interloping self. What was I thinking to have come here, done this? And how would I survive the next eight hours?"

- jim 11-18-2001 4:05 pm [link] [add a comment]

Well we set the alarm for 4:00 am and hauled some blankets up to the roof to watch the show. We did see a bunch of shooting stars including at least two very large burning green ones - quite amazing. The strong ones all tended toward that green color. I'd estimate the rate at about two a minute for what we could see from our rather bright downtown Manhattan perch. Not bad. It must really be something to see this thing at full speed. I heard reports of over 1,000 an hour in some spots.
- jim 11-18-2001 3:39 pm [link] [8 comments]

Someone smashed the front window of the new basement office yesterday. Curiously it seems like nothing is gone, including a small table saw I would have thought ripe for the picking.

And either they don't like wine or they didn't open all the doors down there. Cheers.
- jim 11-16-2001 3:11 pm [link] [1 comment]

I started over the weekend, but only just now finished. I think it was the upcoming movie release that made me do it. I went back to Middle Earth. I wanted to read it again before I saw someone else's version as a movie.

The Lord of the Rings was the first long book I read (well, it's three books, but they are essentially one). Certainly the first I was completely absorbed in, and the one with the most lasting effect. I'm not sure what year it was, but I remember finishing the last book on the bed where I slept at my grandparents' house on Cape Cod. It was a summer afternoon and I could hear everyone sitting in the backyard, down below my window, under a huge old willow tree. I can feel it clear as I sit here. I loved that book. It was the first time I was ever sad upon finishing one, although not the last. I was much puzzled by this result, as I had been desperate to reach the end of the tale. I was quite young, and some of it I didn't understand, but it gave me so much. Hints of a direction when no hints had ever come before.

I haven't read it since. I think I tried once but found it rather tedious, although I can't now remember when that was or why it didn't work out. This time I read with ease and had to force myself to stop when other things needed to be done. I only read the first book, although I plan to read the next two as well. But I wanted to get through The Fellowship of the Ring before the movie comes out. I'll take my time with the rest as other obligations permit.

"What?" cried Gimli, startled out of his silence. "A corslet of Moria-silver? That was a kingly gift!"
- jim 11-15-2001 9:48 pm [link] [add a comment]

Good background on the upcoming Leonids meteor storm.
- jim 11-14-2001 11:12 pm [link] [add a comment]

Always propitious, Tom writes:

If the hot topic of the moment happens to be "Anthrax in violin varnish," then when I type those words, some crawl begins to sniff that thread - first among the bloggers I know and read all the time, then extending out to the great blogging ocean beyond. It does this without my having to tell it to. Then when I want to see what everyone has written about this topic, I click, and a cloud of threads from all the blogs comes captured in a snapshot array, duly attributed with links, inside some page or realm so that it's there, somewhat collated, just as whatever I wrote in my blog on that same topic is sniffable by anyone else.

The thing is, the words "Anthrax in violin varnish" do not constitute a unique identifier. URLs, on the other hand, almost do. That's why daypop and blogdex use URLs as the basis for determining who's talking about the same thing. Words are too fluid. Is "Anthrax on violins" the same as "Anthrax in violin varnish"? Software will be hard pressed to decide. Yet this is what humans do well, and this is why blogs are important: because they harness a multitude of human linguistic processing units (that's you and me) to work on these very un-binary questions of meaning. Go the other way, towards full automation, and you wind up talking about XML and the semantic web. And then the whole thing dies because writing is too tedious if you have to make it machine processable.

People have to do the work. We have to be the filter. That's blogging. You have to do the crawl yourself. "...first among the bloggers I know and read all the time, then extending out to the great blogging ocean beyond..." This is exactly what happens already, without any additional technology, when you're tuned into blogspace. You're the linguistic engine. By keeping up with your own corner of the world wide web (parts of which keep up with other corners which contain parts that keep up with still other corners, etc...) you are doing the crawl. And there is no better machine to do it. Blog on.

I've been thinking that all the hits weblogs get from search engines usually don't result in the searcher connecting with the information sought. In most cases the information a search engine has seen on a weblog will already have been pushed off into the archives by the time the searcher comes along. I wish google would spider my archives and not my main page. Probably I could set up robots.txt to create this outcome. But google ranks results based on an algorithm that pays attention to how many other pages are linked to yours, so having google spider your archives (which it would see as different from your main page, which most people would have links to) would probably hurt your search result positioning.

A different idea I had would be to look at the referring page when a page here is requested. If it's coming from google (or another search engine) you could parse the referring URL, extract the search phrase that was entered into google, and feed that into the search engine here to bring up the requested page, but with only those posts that mention the search phrase. Maybe the top of the page could be a standard explanation like: "I see you are looking for something specific. I've tried to provide you just that information. If you'd like to see this page as it would normally appear, click here."
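A minimal sketch of that idea, in Python just for illustration. It assumes the referring URL carries the search phrase in a "q" query parameter, the way a google results URL does. The SEARCH_PARAMS table, the function names, and the plain substring match standing in for the site's search engine are all hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Assumption: which query parameter each search engine puts the phrase in.
SEARCH_PARAMS = {"google.com": "q"}


def search_phrase(referrer):
    """Return the phrase searched for, or None if the referrer isn't a known search engine."""
    parsed = urlparse(referrer)
    host = (parsed.hostname or "").lower()
    for domain, param in SEARCH_PARAMS.items():
        if host == domain or host.endswith("." + domain):
            values = parse_qs(parsed.query).get(param)
            return values[0] if values else None
    return None


def filter_posts(posts, phrase):
    """Keep only the posts that mention the phrase (case-insensitive substring match)."""
    needle = phrase.lower()
    return [post for post in posts if needle in post.lower()]


referrer = "http://www.google.com/search?q=anthrax+symptoms"
phrase = search_phrase(referrer)
print(phrase)   # anthrax symptoms
print(filter_posts(["Anthrax symptoms in the news", "Leonids tonight"], phrase))
# ['Anthrax symptoms in the news']
```

If search_phrase comes back as None, the page would simply be drawn the normal way.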

At least that way all my "antrax symptons" searchers would find what they are, errr, looking for.

Oh yeah, I know, "that won't scale," but not everybody is trying to scale. Why not take advantage of unpopularity by building in more features than you could for a high traffic site?
- jim 11-13-2001 4:53 pm [link] [5 comments]

You thought it would never happen, but the AppleInsider message boards are back. Biggest waste of time on the apple flavored internet. Must resist...
- jim 11-13-2001 1:49 pm [link] [add a comment]
