Well it looks like Sony has almost done it. Here is some more info on the new Clie PEG-UX50. If this thing were a phone (with GPRS data I guess) we would have a winner. Without the phone link it's really no use to me, but I can still be in awe of this device. Bluetooth and 802.11b for wireless connectivity. Memory Stick Pro (I'd rather have it be Compact Flash, of course, but this is Sony, and at least the Pro slots can take 1 Gig memory sticks.) Touchscreen with Graffiti handwriting recognition, plus a QWERTY thumb board. And a built-in camera. All in that slick little package.

The problem for me is that it wouldn't be connected to the internet all the time. I want the dream device to have GPRS like my Hiptop (basically data over existing cell phone networks,) but to only sign onto that network if there isn't an 802.11b network in range. So if you are in Bryant Park and there is free 802.11b WiFi your device will use that. And if not it will sign onto the cell network (where you'll have to pay somehow for your minutes, but the coverage areas are orders of magnitude bigger.) Maybe someday WiFi hotspots will be omnipresent enough so that we don't need the telephone companies at all, but for now (and probably forever) we need to be able to roam seamlessly between these networks.
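
Something like this, in C, is all the logic I'm asking for (the two probe functions are made-up placeholders, not any real API):

```c
#include <stdbool.h>
#include <stdio.h>

/* The roaming policy in code: free 802.11b first, metered GPRS as
   the fallback. The probe functions are made-up placeholders. */
typedef enum { LINK_WIFI, LINK_GPRS, LINK_NONE } link_t;

static bool wifi_in_range(void)  { return false; } /* placeholder: scan for beacons */
static bool gprs_available(void) { return true;  } /* placeholder: cell registration */

static link_t choose_link(void)
{
    if (wifi_in_range())
        return LINK_WIFI;   /* free and fast, so use it */
    if (gprs_available())
        return LINK_GPRS;   /* pay per use, but vastly bigger coverage */
    return LINK_NONE;
}

int main(void)
{
    printf("link: %d\n", (int)choose_link());
    return 0;
}
```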

It's hard to figure out what Sony is doing. Their cellphone business is all tied up with Ericsson in a way that makes you wonder if the Clie PDAs (from a different part of the sprawling Sony empire) can ever cross over and absorb those features. It will be a shame if they don't because this device is otherwise perfect.
- jim 7-18-2003 9:18 pm

I see it has now been announced for the US market as well. That page has a better picture where you can see the cool clamshell / swivel screen.

$700. Ouch. Although I guess that is what something like this is going to cost. At first.
- jim 7-18-2003 9:24 pm


The list of processors in the Sony gizmo (ARM, DSP and graphics accelerator) reminds me of a conversation I had last night about the fate of special purpose hardware for MPEG (and related functions).

As general purpose processors (Pentium, Sparc, ARM, PowerPC) rise in speed and incorporate special instructions, they absorb things that were previously done with outboard hardware.

Graphics acceleration has proven to be a stable external "integration point". A couple of things contribute to that ... 1) rendering is very hard and doesn't fit very well in a general processor architecture and 2) performance expectations are continually increasing.

The special purpose Media Processor (aka DSP) may prove to be another stable island outside the general purpose processor. But perhaps not.

H.264 is 2-4 times better than MPEG-2 for video compression, but is 10-20 times harder. The "10-20 times harder" argues for special purpose hardware. But the next video compression scheme (H.265?) isn't going to be 4 times better than H.264. And the next one after that isn't going to be 4 times better again.

Will the diminishing gains in compression science result (eventually) in stagnation in the evolution of compression standards? If MPEG-2 reigns for 10 years, will H.264 reign for 20? If so, will the general purpose processor of 2010 be more than enough to handle any sort of video?


- mark 7-19-2003 1:25 am


This is interesting stuff. You know more about it than I do Mark, but I'll try to spit out some comments anyway, at least to help myself think about all this. [Okay, this turned out way too long and rambling, but I'm just going to post it anyway. I sort of sound like I think I know what I'm talking about when obviously I don't really. I would appreciate any corrections to my broad way of framing this though.]

I think there are two ways in which general purpose processors (can we call these CPUs?) might be absorbing "...things that were previously done with outboard hardware."

The first way, as I think you are pointing out, is through the incorporation of special instruction sets. On the PPC side this is Altivec and on the Intel side this is, or was, called MMX (do they still call it that? Also I think AMD has a different approach.) Whatever the names, the idea is broadly similar. As manufacturing process size shrinks, the same circuits can be fit into a smaller and smaller physical area. This means that given the same size CPU chip area, each successive generation of chip design takes up less actual area on the wafer. This free space can then be used to bring other circuitry right onto the CPU itself. You could bring the memory controllers on board, or add more L2 or L3 cache, or add these DSP-like instruction sets. You win here because of integration (every chip you can get rid of through consolidation means lower total system cost,) but perhaps even more so in terms of speed. Bringing this stuff onto the CPU die means that data doesn't have to be moved over the system bus (as it does when you are shuffling it from the CPU to an external DSP.) The G4 is a great example of a monstrously fast CPU that is crippled by a slow interface to the rest of the system. Integrating everything onto the chip really helps in this case.

I think this sort of thing will continue. We'll see CPUs get more and more complex.

But the other way you might approach this problem is with software (although the difference between hardware and software does start to get a little fuzzy.) In general, if your CPU is fast enough, you can just emulate the special DSP circuitry with software running on your general purpose circuitry. Software defined radio is my favorite utopian project in this direction. Instead of integrating 802.11a, b, g and Bluetooth chipsets into one uber wireless chip (where the circuitry for each flavor exists side by side as discrete circuitry on one tightly packed chip,) you just employ a fast enough general purpose processor and then let the software layer above that general hardware emulate, in turn, the specific circuits it needs. I guess Transmeta was going in this direction too with their Code Morphing software layer (so their chip could appear as an x86) although I don't know too much about them.

This second approach has some real theoretical appeal. With a fast enough processor, anything becomes possible, and more importantly, you aren't locked into any particular architecture. Don't want those wireless protocols? Okay, no problem, just download the GRPS software and now your chip appears to be a GRPS unit. Same thing for any future advances - they are all just a download away (again, provided the very big IF of your processor being fast enough.)
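
A loose software analogy for what I mean: the "radio" is just a table of function pointers, and becoming a GPRS unit means swapping in a different implementation. (Everything here is made up for illustration; real software defined radio is vastly harder than this sketch suggests.)

```c
#include <stdio.h>

/* The "circuits as a download" idea, as a toy dispatch table.
   A real SDR stack would be doing signal processing here,
   not printing strings. */
struct radio_proto {
    const char *name;
    void (*transmit)(const char *payload);
};

static void gprs_transmit(const char *payload)
{
    printf("[GPRS] modulating and sending: %s\n", payload);
}

static void wifi_transmit(const char *payload)
{
    printf("[802.11b] modulating and sending: %s\n", payload);
}

static const struct radio_proto gprs = { "GPRS", gprs_transmit };
static const struct radio_proto wifi = { "802.11b", wifi_transmit };

int main(void)
{
    const struct radio_proto *active = &wifi;  /* "download" a protocol */
    active->transmit("hello");
    active = &gprs;                            /* swap in another one */
    active->transmit("hello again");
    return 0;
}
```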

So far, this second approach hasn't fared too well in the market. It's like it's an interesting question for academia, but if you want to make a chip, today, that is really fast at video encoding, you are going to build a dedicated chip for that purpose. Whether it becomes absorbed into the CPU, or stays on its own chip, doesn't really matter too much for how I think about things (the only difference is how fast of a system bus you have, right? With an infinitely fast and wide system bus everything is part of the CPU.)

What would be a big deal is if you could design the encoding hardware (or whatever your task was) using computer tools, and then not ever have to fab the chip. Just download your plan onto the super fast general processor which then runs it as an emulation layer.

This would mean chip makers could get huge volumes on one general purpose design (make it general, and make it really fast) and then smaller players could actually build "circuitry" on top of this using emulation instead of actually producing silicon in multi billion dollar chip fabs.

But I don't really see it happening that neatly. I'm just trying to get my thinking straight about what the different approaches are - at least in the abstract. Real life will probably continue to be a strange mix of all these things.

[Switching gears...]

As for video compression, and the possibility of stagnation due to fast enough processors - perhaps bandwidth will be the new constraint that drives compression technology. In other words, my G4 is very fast at encoding / decoding, but then if I want to stream that video over the web, or over 802.11x, I still need a small stream. So I still need compression. And if I'm a big server of bits (think of the iTunes music store) then I'm really interested in this sort of thing. Wouldn't it be great (from the business point of view) to do a custom encoding of every song downloaded on the fly? This way I could get a UID watermark - maybe a hash of the person's credit card - into the song itself. But they can't do this now because there are too many downloads, coming too fast, for the hardware to do it on the fly.
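
Just to illustrate the kind of thing I mean by a UID watermark, here's a toy sketch in C. (All made up: a real system would use a proper cryptographic hash, and should hash an account ID rather than a raw card number; FNV-1a is just a stand-in to show deriving a per-download ID.)

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy per-download watermark ID using the FNV-1a hash.
   FNV-1a is only a stand-in for a real cryptographic hash. */
static uint64_t fnv1a64(const char *s)
{
    uint64_t h = 1469598103934665603ULL;      /* FNV offset basis */
    for (size_t i = 0; i < strlen(s); i++) {
        h ^= (unsigned char)s[i];
        h *= 1099511628211ULL;                /* FNV prime */
    }
    return h;
}

int main(void)
{
    /* one unique string per purchase -> one unique watermark ID */
    const char *purchase = "account-4321|song-8675309|2003-07-19";
    printf("watermark id: %016llx\n", (unsigned long long)fnv1a64(purchase));
    return 0;
}
```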

- jim 7-19-2003 7:53 pm


I'll dive a little deeper into the CPU vs. DSP vs. special purpose HW question by looking at a specific computing problem. I'm going to focus on motion estimation (which is a critical step in video encoding), but there are other problems of similar complexity (e.g. polygon fills in 3D rendering). This goes into a lot of detail about the motion estimation process, but I think that detail helps illustrate the computational issues.

Video compression is all about elimination of redundancy. There are many types of redundancy, including temporal. What I mean by temporal redundancy is that any one video frame usually looks very similar to those that precede and follow it. What you'd like to do is reuse that information rather than sending it again. If you could tell the decoder "I sent you a picture of a car, please reuse that", then you wouldn't have to send those pixels a second time. But if the car has moved, you'll have to send a "motion vector" along with the code to "re-use those pixels".

To take advantage of the "re-use those pixels" code, the video encoder must analyze video frames by comparing those frames to previously sent frames. The encoder looks for matching blocks of pixels, and computes the relative offset (motion vector) for matching blocks. Turns out this is very compute intensive -- many, many billions of operations per second.

The block size in MPEG motion estimation is 16x16, or 256 pixels. To compute how well a block in the current frame matches a block in a previously sent frame (reference frame), a sum of absolute differences (SAD) is calculated. This means I calculate 256 differences, take the absolute value of each of those differences, and take the sum of those absolute differences -- 512 operations (ignoring the abs. val. operation). Low SAD means I found a good match. But I need to calculate the SADs over a large search range in the reference frame to find the best match.
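
In plain C, one SAD looks something like this (just a sketch; boundary handling simplified):

```c
#include <stdint.h>
#include <stdlib.h>

/* One SAD for a 16x16 (256-pixel) block: 256 differences, 256
   absolute values, 256 accumulations. cur and ref point at the
   top-left pixel of each block; stride is the frame width in bytes. */
int sad_16x16(const uint8_t *cur, const uint8_t *ref, int stride)
{
    int sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sad += abs(cur[y * stride + x] - ref[y * stride + x]);
    return sad;
}
```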

After I compute the first SAD, I move over one pixel in the reference frame and compute another SAD, and move over one more pixel in the reference frame and compute another SAD, and so on and so on. The 256 pixels in the current block in the current frame are used in a whole series of SAD calculations ... hundreds or thousands of SADs. And the whole series is repeated for hundreds or thousands of blocks in the current frame. For full-resolution standard definition TV, it works out to about 10 billion operations per frame just to do motion estimation.
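
And the search loop around it, again just as a sketch (a real encoder clamps the search window to the frame edges and uses smarter search patterns than brute force):

```c
#include <limits.h>
#include <stdint.h>

int sad_16x16(const uint8_t *cur, const uint8_t *ref, int stride); /* sketch above */

struct mv { int dx, dy; };

/* Full search: one SAD per candidate offset in a +/-range window,
   (2*range+1)^2 SADs in all, keeping the offset with the lowest SAD.
   Caller must ensure the window stays inside the reference frame. */
struct mv full_search(const uint8_t *cur, const uint8_t *ref,
                      int stride, int range)
{
    struct mv best = {0, 0};
    int best_sad = INT_MAX;
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int s = sad_16x16(cur, ref + dy * stride + dx, stride);
            if (s < best_sad) {
                best_sad = s;
                best.dx = dx;
                best.dy = dy;
            }
        }
    }
    return best;
}
```

Rough arithmetic: an SD frame has about 1,350 of these 16x16 blocks, and a few thousand candidate offsets per block at 512 operations each gets you into the billions of operations per frame.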

The problem has very particular data flow characteristics that allow special purpose hardware to operate very efficiently. In the series of SAD operations performed on a particular block, most of the operands stay the same. The 256 pixels in the current block are used in a few thousand SAD operations. In each SAD operation in the series, most of the pixels in the reference frame are the same. For each new offset within the reference frame, a few new pixels are introduced and a few old ones are dropped. (Think of scrolling within the reference frame.)

General purpose CPUs have difficulty taking advantage of the fact that the data is mostly the same. They simply don't have 512 registers inside the processor. They do have caching capability and can keep the data "nearby". The MMX approach adds special instructions (such as SAD) to help, but it's not optimal.
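
To make that concrete, here's a rough sketch of one 16x16 SAD using the PSADBW instruction that the MMX/SSE extensions added (via the SSE2 _mm_sad_epu8 intrinsic); alignment handling is glossed over:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* SAD of one 16x16 block using PSADBW (_mm_sad_epu8) -- the kind
   of special SAD instruction mentioned above. cur and ref point at
   the top-left pixel of each block; stride is the frame width. */
static int sad_16x16_sse2(const uint8_t *cur, const uint8_t *ref, int stride)
{
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 16; y++) {
        __m128i c = _mm_loadu_si128((const __m128i *)(cur + y * stride));
        __m128i r = _mm_loadu_si128((const __m128i *)(ref + y * stride));
        /* 16 absolute differences, summed into two 64-bit partial sums */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(c, r));
    }
    /* combine the low and high partial sums */
    return _mm_cvtsi128_si32(acc) + _mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
}
```

Even with the special instruction, every block position still means 32 loads through the cache hierarchy, which is why I say it helps but isn't optimal.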

DSPs are designed to execute algorithms that have large data sets. They tend to have hundreds of registers, and a large number of SIMD instructions. (SIMD = single instruction, multiple data, which means "do this one operation to dozens or hundreds of operands.")

But ultimately, special purpose hardware for motion estimation blows away either the CPU or the DSP approach. I can lay out hundreds of data registers and math units on a die. I can connect data paths to these registers and math units that mimic the precise data flow needs of my algorithm. The resulting chip may be lousy at general problems, but will be a finely crafted machine that will outstrip a Cray when performing MPEG motion estimation. But sadly, a finely tuned MPEG-2 motion estimation machine is useless for H.264 motion estimation.

The SW emulation concept that you describe really doesn't help. The effectiveness of a particular processor to solve particular "hard problems" comes down to the architecture of the register sets, caches, ALUs (arithmetic logic units), data flow, etc.

A form of emulation that would help is programmable hardware. If processors came with standardized programmable logic arrays, then a program could carry the specifications for an "accelerator" to hang off the side of the CPU. This accelerator would have the data pipelines, register sets, ALUs, etc. configured in a manner that would streamline the performance of a specific algorithm.

At present the "programmable HW accelerator" is a concept that has been used in some computing platforms, but it hasn't really caught on widely. Linux Networx has talked about having programmable HW arrays in their system. (They make supercomputers that consist of arrays of Intel/AMD processor boards.) It would allow the system programmer to dump some of the computation off the Pentium and onto programmable HW.

[Switching gears ...]

On the video compression algorithm front, compression efficiency is the driving force. The goals include ... fitting an HD movie onto a single DVD, or sending multiple channels of normal TV over a DSL line, or improving the quality of the postage stamp streaming video sent over dial-up, or sending streaming video to a mobile phone. But it is a science of diminishing returns.

It's all about removing redundancy. MPEG-1 did a decent job of removing redundancy. MPEG-2 did a better job. MPEG-4 did a better job again. But each time, there's less and less redundancy to remove. H.264 does a fabulous job, and it's only 2-3 times better than MPEG-2. If a 2003 model CPU was 2-3x better than a 1993 model CPU, you wouldn't be very impressed. But with video compression, that level of improvement is quite good.
- mark 7-19-2003 10:41 pm


For work I carry (or should carry, I generally forget one or more items) a camera, a cell phone, a GPS, a compass (a GPS can't tell directions if you're standing still), a laser rangefinder, a tape measure (a GPS, at least mine, isn't submeter accurate), various botanical taxonomy keys, and so on. A tricorder sure would be nice, but then I would have to do that much more just to stay a step ahead ....
- jeff 7-21-2003 8:30 am


Hey Jeff. It seems like GPS is coming to consumer electronics in a big way. This Ricoh digital camera has an add-on GPS card that will stamp each picture with the coordinates where it was shot. That's pretty cool. Other manufacturers will be forced to follow.

Bsquare is supposedly developing a GPS unit on an SD card (these are tiny expansion cards - postage stamp size, so much smaller than PC Cards - that can fit into things like PDAs.) If true this will allow manufacturers to add GPS to almost any handheld device.

And Stockholm police are apparently giving GPS-enabled cell phones to women in potential danger (from abusive husbands, etc...) Obviously this sort of location technology for cell phones (whether done through GPS or some sort of triangulation from antenna towers) is very badly wanted by the US government (here's a Wired article or just search Google for E911.)

In a bigger picture mode it may well become important for wireless devices forming ad hoc mobile networks to know where they are on the globe to facilitate routing traffic. So the dream handheld device would probably have to have GPS built in even without considering the other cool uses (like location stamping photos) or potentially uncool uses (Big Brother-like tracking and/or location-based spamming.)
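
I'm imagining something like greedy geographic forwarding: each device hands a packet to whichever neighbor is currently closest to the destination's coordinates. A toy sketch (names made up, and real ad hoc routing protocols have to handle dead ends this ignores):

```c
#include <math.h>

/* Greedy geographic forwarding, as a toy: pick the neighbor nearest
   the packet's destination coordinates. Real protocols need recovery
   strategies for when greedy forwarding hits a dead end. */
struct position { double x, y; };

static double dist2(struct position a, struct position b)
{
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;   /* squared distance is fine for comparing */
}

/* returns the index of the best next hop, or -1 if there are no neighbors */
int next_hop(struct position dest, const struct position *neighbors, int n)
{
    int best = -1;
    double best_d = INFINITY;
    for (int i = 0; i < n; i++) {
        double d = dist2(neighbors[i], dest);
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;
}
```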

What's the difference between a tape measure and a laser rangefinder? Probably better just to have a submeter accurate GPS, although I've heard that the US is thinking about making the present civilian GPS even less accurate. On the other hand the Europeans are talking about launching their own system, so maybe the competition will drive accuracy.

So I think you'll get everything you want except the built in taxonomy keys. But that can probably be delivered with all other information over the web (if I'm understanding what a taxonomy key is.)
- jim 7-21-2003 6:32 pm


Back to the Sony Clie.

Since this thing has Bluetooth, could it connect to the internet through a Bluetooth-equipped cell phone (where the cell phone has some sort of data plan?) Seems like that should work, although it's not nearly as slick as having it built in. And it kind of screws up the "I only want to carry one device" thing. Still, it might be a compromise I could make if it really worked. And really working comes down to whether or not I could just keep the cell in my pocket (or bag) and have the Bluetooth connection "just work" without having to mess with the cell phone each time.

This is the sort of thing Bluetooth was designed for. The "Personal Area Network" (PAN). Like the Local Area Network (LAN) you might have (either using ethernet cables or 802.11x wireless) connecting the computers in your home or office, except a PAN covers much shorter distances. Like just enough to connect all the devices you are carrying on your person. Because of this small coverage area Bluetooth can use very little power, which is its main advantage over 802.11x.

But again (and this is why I don't really like Bluetooth) I don't really want a PAN as much as I want all the functions that might reside on such a network to be integrated into one device. But in a pinch...
- jim 7-21-2003 6:38 pm


So the Clie does have an ARM CPU, but it's been heavily modified by Sony. They are calling the result the Handheld Engine.

The chip combines an ARM926 processor, a digital signal processor and a CXD2230GA graphics accelerator that uses a two-dimensional graphics engine to produce three-dimensional images. Yoshida boasted that he wouldn't be surprised if competing PDA makers asked to use the chip, although he wouldn't say whether Sony would license it to them.

Kort said the Handheld Engine was a technical breakthrough in the PDA market.

"Thirty frames a second running on a 120-MHz processor, that's pretty impressive," Kort said. "They are ahead of everyone else in that regard."

- jim 7-22-2003 5:12 pm


Speaking of convergence, according to the SJ Merc, Wozniak has a new use for GPS ... finding your keys.
- mark 7-23-2003 8:05 am