Computer programs can be very good at textual pattern matching, but they are very bad at semantic matching. Finding every occurrence of 'rose' is no problem; finding every expression of love is impossible.
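To make the contrast concrete, here's a toy sketch (my own example, nothing to do with how any real search engine works):

```python
import re

text = "A rose by any other name; my love is like a red, red rose."

# Textual pattern matching: trivial for a program.
print([m.start() for m in re.finditer(r"\brose\b", text)])  # finds both occurrences

# Semantic matching: there is no re.finditer for ideas. No pattern
# over characters captures "every expression of love" -- 'rose',
# 'my heart is yours', or a sonnet that uses neither word.
```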

Given this, and since information on the web is largely found by computer programs (like google), will the web exert pressure (realized or not) on writers to standardize (fossilize?) their use of language?

In other words, will our dependence on google as a means of having our writing discovered by people who are looking for just such things exert a pressure on us as authors to use language more uniformly? Or, again, will something like the semantic web emerge, not through marking up our writings with XML tags which specify what we "really mean," but through a general shift towards always using the same word or phrase for a single idea?

You might think of this as the emergent semantic web. Or the bottom-up semantic web. But - and this is the point - you'll have trouble finding all documents on this or any other subject unless we stick to one name or the other.

Will this be good or bad for language? And for humans?

- jim 5-09-2003 1:36 am

I predict that the small minority who write the search engines will adapt their tools to the vast majority who don't understand the inner workings of search technology. Part of the motivation is a counter-measure war between people who run the search sites and people trying to game the search engines.

Now if I could just find that article I read the other day that mentioned the name of a special assistant to Bolton.
- mark 5-09-2003 3:31 am


Well, you're probably right, as it's always dangerous to bet against technological progress.

But I can't help thinking that the problem of writing software to understand meaning (as opposed to just matching patterns of strings) is intractable. And the proposed solution - marking up all content with rigorously pre-defined XML semantic tags - is hopelessly unworkable. So this leads me to wonder if, instead of making computer software more like the human mind (the human, language-using mind), we'll somehow morph our language-use patterns to make things easier for our software. I mean: we'll become more like machines.
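For reference, here's roughly what that kind of markup is supposed to look like - a made-up fragment with invented tag names, though the real proposals have the same flavor:

```python
import xml.etree.ElementTree as ET

# A made-up fragment of "semantic" markup -- the tag vocabulary is
# invented here, but the idea is the same: every meaning spelled out.
doc = """
<statement>
  <subject concept="urn:person:juliet">Juliet</subject>
  <predicate concept="urn:emotion:love">loves</predicate>
  <object concept="urn:person:romeo">Romeo</object>
</statement>
"""

root = ET.fromstring(doc)
# The machine can now "find love" -- but only because a human
# hand-labeled it first. That labeling step is the unworkable part.
print(root.find("predicate").get("concept"))  # urn:emotion:love
```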

I know I already do this a bit, in the sense that I think (a little, whether I mean to or not) about google search queries as I'm composing a post. I'll choose one word rather than another with the thought that that particular word will be more effective in terms of someone finding me through google.
- jim 5-09-2003 7:31 pm


Agreed. Keeping words together rather than spreading them throughout a sentence; checking google to see how certain new terms are being spelled or worded and then going with the "consensus"--these are just a couple of ways writing is affected. I wouldn't say I'm becoming more machine-like; just learning a new syntax with machine-made glitches rather than cultural ones, to reach like-minded people. It's an interesting posthuman hybrid.
- tom moody 5-09-2003 7:49 pm


Yesterday, as I cross-linked a few parallel blogs together via comments, the synaptic nature of blogs struck me. However, unlike most associative memory/knowledge systems, blogs have google, etc. to provide an alternate access method. The alternate method isn't quite orthogonal, as google uses inbound links as part of the rating system in its searches.
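(A back-of-the-envelope sketch of that inbound-link idea - a miniature of the published PageRank recipe, with a toy graph, not google's actual code or data:)

```python
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}  # page -> outbound links
rank = {p: 1.0 / len(links) for p in links}
damping = 0.85

for _ in range(50):  # power iteration until the ratings settle
    new = {}
    for page in links:
        # a page's rating is fed by the ratings of the pages linking to it
        inbound = sum(rank[p] / len(links[p]) for p in links if page in links[p])
        new[page] = (1 - damping) / len(links) + damping * inbound
    rank = new

print(rank)  # "c", with two inbound links, rates highest
```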

The number of viable search methods is likely to grow ...

word/phrase match searches
associative network searches
keyword (xml, etc.) searches

Meaning-based search is a tough nut to crack. Have you used translation software?
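(The first of those - word/phrase match search - is basically an inverted index: a map from each word to the documents containing it. A toy version, for illustration only:)

```python
docs = {
    1: "the semantic web will emerge bottom up",
    2: "google matches words not meanings",
    3: "blogs form an associative network",
}

# Build the inverted index: word -> set of document ids.
index = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

def search(query):
    """Return ids of documents containing every word in the query."""
    hits = [index.get(w, set()) for w in query.split()]
    return set.intersection(*hits) if hits else set()

print(search("semantic web"))  # {1}
```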
- mark 5-09-2003 10:01 pm


Language translation software? Like Babelfish? I've used that one, often with humorous results (although in a pinch it can be very helpful, even with the numerous errors). But I've never seen any source code behind this sort of thing. I'm guessing it's much more brute force than AI. Something like a dictionary, but at a sentence-fragment level instead of a word level. That would mean a huge dictionary, but fast computers are good at such things.
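Something like this, maybe - a cartoon of that fragment-dictionary guess (my speculation, with an invented phrase table, not any real translator's code):

```python
phrase_table = {
    "je t'aime": "i love you",
    "la vie en rose": "life in pink",
    "la vie": "the life",
    "rose": "pink",
}

def translate(sentence):
    """Greedy longest-fragment-first lookup; unknown words pass through."""
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):  # try the longest fragment first
            fragment = " ".join(words[i:j])
            if fragment in phrase_table:
                out.append(phrase_table[fragment])
                i = j
                break
        else:
            out.append(words[i])  # no match: pass the word through
            i += 1
    return " ".join(out)

print(translate("la vie en rose"))  # "life in pink", not "the life en pink"
```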

Anyway, I don't really have any point here. I like Tom's "interesting posthuman hybrid" phrase though.

And, also, I'm wondering if this topic sets up any L(R)J angle? My guess is that the sort of one-word-to-one-meaning mapping I'm setting up here as a possible result of the web and google is not what she has in mind with her (badly paraphrased here, no doubt) idea that words really mean something specific. I guess I'm sort of hoping it might give me a way to connect with (either by agreement, or by being 180 degrees out of phase with) the question she is trying to answer. I'm still amazingly unable to get any traction with her writing, and not merely from lack of effort.
- jim 5-09-2003 10:39 pm


(A long-winded comment on web searching, inspired by my Hizbulla/Hezbolla searches.)

The web is a bit like Star Trek: No matter where you go, everyone speaks English -- and American English at that.

That's not entirely true. By this stage of the 21st century, I'm sure there are rich web cultures in many languages. But cross-linkages between these web cultures are rare. Again, that may be my Anglo-American-centric view. Perhaps the Poles, the Turks and the Danes are involved in a rich tripartite cross-cultural weblog exchange, but I have no access to it.

Meaning-based searches would be nice, but I'm not satisfied with mono-lingual networking. Cross-language web browsing is what I'm talking about. Today, it's found in isolated pockets. A couple of sites translate the Turkish press for an English audience. Lebanon is a rich case, with Arabic, English and French press, and translations of a few key articles. I use English-language sources in Hong Kong, India, Pakistan and Jordan.

But this cross-cultural work is primarily the manual work of skilled (or marginal) human translators, or the work of writers who truly span two cultures. The coverage is spotty. Browsers and translators are confused even by simple transliteration. I want to access AFP articles carried in the French press. I want to access all of Ha'aretz, not just the subset carried in the English edition. I want access to news and opinion from around the world with the same ease with which I can find Britney Spears' life story. <----GOOGLE HERE
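(The transliteration problem in miniature - a crude variant-expansion hack, invented here for illustration, not anything the search sites are known to do:)

```python
# Groups of known spellings for the same name; made up for illustration.
variant_sets = [
    {"hezbollah", "hizbullah", "hizbollah", "hezbolla", "hizbulla"},
]

def expand_query(term):
    """Return every known spelling of a term, or just the term itself."""
    t = term.lower()
    for group in variant_sets:
        if t in group:
            return sorted(group)
    return [t]

print(expand_query("Hizbulla"))  # all five spellings, so no article is missed
```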

The quality of a network is about the quality and density of the interconnect. The web/blog space seems to be fragmented along lines of language, culture, demographic and mind-set, with only sparse, fragile cross-links to tie these disparate networks of homogeneity together.
- mark 5-10-2003 8:57 am


Here's an interesting comp.ai usenet thread Jorn Barger started about the semantic web (and its nonexistence in AI discussions). Most of the responses there are close to what I feel - XML as the semantic web of Tim Berners-Lee isn't going to happen. (Although some of the respondents sound pompous enough to make me want to argue the other way.)

And, from the other direction, here's Ftrain's Paul Ford outlining his new site software. This is a great introduction - from someone actually producing working code! - to what the semantic web might look like (behind the scenes). I love the way he thinks.

It's interesting to me that I don't like reading the people I agree with, but I'm fascinated with what Paul Ford is doing, even though (for myself) I've decided that this direction won't work. Go figure. In any case, I'm keeping my eye on what he builds. If anyone can get XML metadata to work in a humane (for writers) way, it's him.
- jim 5-13-2003 8:29 pm




