This will only be interesting to people posting here (unless someone wants to tell me how other blogging software deals with this issue). [Update: sorry, I had an error in the block quote. It's fixed now.]

HTML entities:
Character entity references, or entities for short, provide a method of entering characters that cannot be expressed in the document's character encoding or that cannot easily be entered on a keyboard. Entities are case-sensitive and take the form &name;. Examples of entities include &copy; for the copyright symbol and &Alpha; for the Greek capital letter alpha.
These are useful. Here's a list of Latin-1 entities. Here's a list for symbols and Greek characters. Here's a list for special entities. In those lists, look at the two 'entity' columns: the first is what you put in your post, and the second is the character that will appear when the post is viewed in a web browser.
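As a quick check on those tables, Python's standard library ships the HTML 4 entity table as `html.entities.name2codepoint`. This small sketch looks up a few entity names and then uses `html.unescape` to turn entity text into characters, roughly the way a browser does when it renders a post.

```python
import html
from html.entities import name2codepoint

# Each named entity maps to a Unicode code point.
for name in ("copy", "Alpha", "gt"):
    code_point = name2codepoint[name]
    print(f"&{name};", code_point, chr(code_point))

# html.unescape replaces entity text with the characters it stands for.
print(html.unescape("&copy; 2003 &Alpha;"))  # © 2003 Α
```

Note that the lookup is case-sensitive, matching the quote above: `name2codepoint` has an "Alpha" key distinct from "alpha".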

Great. The problem is that if you use HTML entities in a post and then go back to edit it, the system puts the entity into the editing text box in your browser, and the browser displays the character, not the code for the entity. In other words, if you make a post with &gt;, your post will display the greater-than symbol: >. But when you go back to edit, instead of seeing &gt; in the editing box, you'll just see >, which isn't what you want (because you'd have to change it back to &gt; by hand).
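To make the problem concrete, here is a small Python sketch of the edit round trip. A browser decodes character references inside a `<textarea>`; `html.unescape` stands in for that step here, which is an approximation rather than a full HTML parser, and `textarea_value` is a hypothetical helper. For comparison only, it also shows a common general technique (not what this site does): escaping the ampersand with `html.escape` before writing the text into the edit box.

```python
import html

stored = "1 &gt; 0"  # the post body as saved on the server

def textarea_value(form: str) -> str:
    """Approximate what the browser shows inside <textarea>...</textarea>."""
    inner = form[len("<textarea>"):-len("</textarea>")]
    return html.unescape(inner)  # browsers decode entities in textarea content

# Naive edit form: stored text pasted in verbatim, so the entity is lost.
naive_form = f"<textarea>{stored}</textarea>"
print(textarea_value(naive_form))    # 1 > 0

# Escaping '&' first ('&gt;' becomes '&amp;gt;') makes the browser show '&gt;'.
escaped_form = f"<textarea>{html.escape(stored, quote=False)}</textarea>"
print(textarea_value(escaped_form))  # 1 &gt; 0
```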

I wonder if that's clear. Anyway, the workaround for this is non-standard, but it will work on all pages here: insert an underscore after the ampersand. So instead of typing &copy; to make a copyright symbol, type &_copy;.
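The outbound half of this workaround (transforming the hacked entities back on the way out, as described later in the thread) can be sketched in a few lines of Python. This assumes the stored post keeps the &_name; form, so the edit box shows it literally because "&_" does not begin a valid entity, and that the underscore is removed only when a page is rendered. `restore_entities` and the regular expression are illustrative, not the site's actual code; accepting numeric references such as &_#169; is an extra assumption.

```python
import re

# Hypothetical pattern: '&_' followed by an entity name (or an assumed numeric
# reference like '#169') and a closing ';'.
_HACKED_ENTITY = re.compile(r"&_(#?[A-Za-z0-9]+;)")

def restore_entities(text: str) -> str:
    """Turn '&_copy;' into '&copy;' just before the page is sent to a reader."""
    return _HACKED_ENTITY.sub(r"&\1", text)

stored = "Copyright &_copy; 2003, and 2 &_gt; 1"
print(restore_entities(stored))  # Copyright &copy; 2003, and 2 &gt; 1
```

Because the stored text is never changed, the edit form can keep writing it out as-is; only the display path calls the transform.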

Thanks to Bruno for finding some problems in my first implementation of this.
- jim 4-26-2003 7:38 pm

Very interesting, thanks!
- tom moody 4-26-2003 8:22 pm


i think your workaround is a legit answer. the only way to deal with this is to set up some rules as to what HTML is legal and how the user must edit the HTML. you can then write the inbound and outbound encoding to support those rules. that's how most wiki webs handle the issue.

in an ideal world the user would have a better editor than a text box and wouldn't need to worry about formatting issues.
- drice (guest) 4-26-2003 9:34 pm


Hi Dave. Thanks for the feedback. Like you say, I just transform my hacked HTML entities back into standard ones on the outbound side.

My friend Chris is trying to get a project together to write a full featured text editor in Flash. This would solve the problem, but I just can't get too excited about a Flash project. And in any case, I can't be of any help because I haven't even looked at Flash since it was 1.0.

But I wish someone would do something. Even just a standalone, XML-RPC-enabled text editor that could post via the Blogger API. Back in the day I used to think that "Mozilla as application platform" was going to be the way all these problems got solved. Ha!
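For a sense of how little such an editor needs on the wire, here is a rough Python sketch that builds (but does not send) a Blogger API call with the standard `xmlrpc.client` module. It assumes the Blogger API 1.0 method `blogger.newPost(appkey, blogid, username, password, content, publish)`; the app key, blog ID, credentials, and endpoint URL are placeholders, not real values.

```python
import xmlrpc.client

# Placeholder arguments in the assumed blogger.newPost parameter order.
params = (
    "APPKEY",                 # appkey
    "BLOG_ID",                # blogid
    "username",
    "password",
    "Copyright &copy; 2003",  # post content, entities typed as-is
    True,                     # publish immediately
)

# Build the XML-RPC request body; '&' in the content is escaped automatically.
request_body = xmlrpc.client.dumps(params, methodname="blogger.newPost")
print(request_body)

# Actually posting would look like this (hypothetical endpoint URL):
# server = xmlrpc.client.ServerProxy("http://example.com/xmlrpc")
# post_id = server.blogger.newPost(*params)
```

Since the post text travels inside an XML string rather than through a browser text box, entities like &copy; survive the trip unchanged.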
- jim 4-26-2003 9:41 pm


OK, as usual, there was a problem with my implementation. These modified HTML entities were not displaying correctly for account holders viewing /newcomments. This should be fixed now.
- jim 4-28-2003 5:13 pm
