These can be seen as "exploratory" ideas regarding Izumi; this is not a log of features that are actually being added to Izumi.
$Id: Izumi.izu,v 1.1 2005/11/05 21:55:57 ralf Exp $
Fixed a minor bug in the RSS export: accents were improperly encoded as HTML entities.
The fix is available in version 1.1.1 on CVS. I won't create a package for this; it will go in the next 1.2 version (whenever that is). So if you really need that fix in a .tgz or DEB, simply ask me :-)
Encoding HTML into XML for an RSS feed... The operation feels weird. From a logical point of view, it's a simple exercise, albeit a verbose one, since both XML and HTML use the same encodings for & < > " and '. It still feels stupidly verbose though.
Anyway the real problem is getting the end-user (in this case me!) to do the proper thing. One of my RSS feeds broke since I used an accent and errors accumulated around:
Bottom line: bleh.
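A minimal sketch of the escaping step discussed above, written in Python rather than Izumi's PHP and not taken from Izumi's code. The function names are hypothetical. It shows why an HTML entity such as &eacute; has to be escaped a second time before it can sit inside an XML element.

```python
from xml.sax.saxutils import escape, unescape

# Extra replacements on top of the & < > that escape() handles by default.
_QUOTES = {'"': "&quot;", "'": "&apos;"}


def html_to_rss_text(html_fragment: str) -> str:
    """Escape an HTML fragment so it can be placed inside an XML element.

    An HTML entity such as &eacute; is not predefined in XML, so its
    leading & is itself escaped here (giving &amp;eacute;).
    """
    return escape(html_fragment, _QUOTES)


def rss_text_to_html(xml_text: str) -> str:
    """Undo html_to_rss_text(), as a feed reader would before rendering."""
    return unescape(xml_text, {v: k for k, v in _QUOTES.items()})


if __name__ == "__main__":
    print(html_to_rss_text('<p>caf&eacute; & "bar"</p>'))
```

The verbosity complained about above is visible in the output: every & of the original HTML entity ends up as &amp; in the feed.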
Too obvious, I should have thought of and done that earlier: I need highlighting spec files for Vim, Emacs, UltraEdit, Scintilla, etc. (There are probably more editors out there. I don't plan on providing support for all of them; these are just the ones I can think of whose highlighting can reasonably be implemented easily.)
20050405 Update: Vim syntax file available on sourceforge.
I implemented the blog entry anchor for internal Izumi links. The exact syntax and available variants are:
[title|dir/page#s:date:optional-blog-title]
[title|#s:date:optional-blog-title]
[|dir/page#s:date:optional-blog-title]
[#s:date:blog-title-not-optional-here]
The idea is that after the # you can have exactly what appears in a section tag (i.e. [s:date:title] or [s:date] without the brackets). So creating an anchor is natural (i.e. using #) and easy (i.e. just copy-paste the tag's content). Note that the link title can be optional. The page name can be optional too, in which case it refers to the current page. There are a couple of limitations:
Incidentally this brings Izumi to version 1.1. I'll upload a Debian package as unstable on Sourceforge soon.
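A minimal sketch of how the four link variants listed above could be recognized, in Python rather than Izumi's PHP. The regular expression, the BlogLink type and parse_blog_link() are all hypothetical, and the sketch assumes that neither the date nor the page name contains a colon, pipe or closing bracket.

```python
import re
from typing import NamedTuple, Optional


class BlogLink(NamedTuple):
    title: Optional[str]  # None when there is no "title|" part at all
    page: str             # "" means the current page
    date: str
    blog_title: str       # "" when omitted


_LINK_RE = re.compile(
    r"^\[(?:(?P<title>[^|\]]*)\|)?"     # optional "title|" prefix
    r"(?P<page>[^#|\]]*)"               # optional dir/page
    r"#s:(?P<date>[^:\]]+)"             # "#s:" followed by the date
    r"(?::(?P<blog_title>[^\]]*))?\]$"  # optional ":blog-title"
)


def parse_blog_link(text: str) -> BlogLink:
    """Parse one of the four blog-entry link variants."""
    m = _LINK_RE.match(text)
    if m is None:
        raise ValueError(f"not a blog entry link: {text!r}")
    title, page = m.group("title"), m.group("page")
    blog_title = m.group("blog_title") or ""
    # Fourth variant: without a "|", only "[#s:date:title]" is accepted
    # and the blog title is mandatory.
    if title is None and (page or not blog_title):
        raise ValueError(f"blog title and '|' rules violated: {text!r}")
    return BlogLink(title, page, m.group("date"), blog_title)


if __name__ == "__main__":
    print(parse_blog_link("[My post|dev/izumi#s:20050405:Vim syntax]"))
```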
I need to change the way I link to internal permalinks. Right now I simply copy-paste the full URL of the permalink. The reason I do that is that I can't manually determine the MD5 key of the permalink, so I lazily copy-paste what I see in a browser. This is not optimal as the URL is absolute. And I can't write a link to a permalink before I have viewed it in a browser.
OTOH I support a link syntax based on the pattern:
[title|dir/page#anchor]
So what I could do is extend the link syntax, for example:
[title|dir/page##date:optional-blog-title]
The "date" and "optional-blog-title" would need to be the exact info written in the [s] section of the blog entry. Internally I just need to recompute the same MD5 key and write the link on the fly.
(Partly crossposted from Ideas For Dev)
OK this is really cool:
I might want to use that in Izumi for links using rel=next and rel=prev in permalinks generated from a blog. rel=index would be nice too to point to the master page.
It's a bit tricky to generate the link to the next blog entry as this information needs to be written in the header. Right now the processing is mostly linear (i.e. parse master file and write entries as they appear with an extra step to generate next links at the end of previous blog entries.)
A solution might be to prescan the master file to identify all sections, then read them back (I could keep the file position of each section). This may prove to be a good solution if I also want to do the feature where I create a "what's new" blog page that basically mixes all entries from several sub-blogs. For this feature I was merely thinking of scanning the first N entries of the sub-blogs, ordering them, then generating a blog page on the fly using these entries.
Originally I also wanted a mechanism to remotely update the content base. Well I found the easiest way to go is simply to use CVS and have a cron job on the server that does a CVS update every few hours.
Obvious idea: rename prefs.php to pref-site.php and pref-local.php. Also have a "location-site.php" for the location site paths (i.e. where to find the pref-site namely, as it would be better off in /etc/izumi on Debian with a symlink from /usr/share/izumi/site, and not the reverse.)
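A minimal sketch of the prescan idea, in Python rather than Izumi's PHP. It assumes section tags look like [s:date:title] or [s:date] and start at the beginning of a line; Section, prescan() and neighbours() are hypothetical names. "Previous" and "next" here simply mean the adjacent entries in the master file, which may or may not be the direction wanted for rel=prev and rel=next.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Assumed tag shape: "[s:date]" or "[s:date:title]" at the start of a line.
SECTION_RE = re.compile(
    r"^\[s:(?P<date>[^:\]]+)(?::(?P<title>[^\]]*))?\]", re.MULTILINE
)


class Section(NamedTuple):
    date: str
    title: str
    offset: int  # position of the tag in the master text
    length: int  # up to the next tag, or to the end of the text


def prescan(master: str) -> List[Section]:
    """First pass: locate every section without rendering anything."""
    matches = list(SECTION_RE.finditer(master))
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(master)
        sections.append(
            Section(m.group("date"), m.group("title") or "",
                    m.start(), end - m.start())
        )
    return sections


def neighbours(
    sections: List[Section], i: int
) -> Tuple[Optional[Section], Optional[Section]]:
    """Entries adjacent to entry i in file order (None at either end)."""
    before = sections[i - 1] if i > 0 else None
    after = sections[i + 1] if i + 1 < len(sections) else None
    return before, after
```

With the list built up front, the header of each permalink page can be written in one go, and the same list could feed a "what's new" page by taking the first N sections of several master files.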
A Debian package for Izumi is now available for download :-)
Rewording the Izumi Text Syntax page for packaging.
Made a Debian package for Izumi 0.9.3. As soon as documentation is acceptable, I'll publish this package.
Added missing support of ETag and If-Modified to the RSS page, as well as the master .blog page (it was there but not activated due to RBlog overloading the path from RPage and not checking the cache after altering its path.)
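A minimal sketch of the cache check such support relies on, in Python rather than Izumi's PHP. The way the ETag is derived (an MD5 of path, modification time and size) is an assumption, not Izumi's actual scheme, and both function names are hypothetical. The request headers involved are If-None-Match (paired with ETag) and If-Modified-Since (paired with Last-Modified). Header names are assumed to arrive in exactly this capitalization.

```python
import hashlib
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime
from typing import Dict, Mapping


def make_validators(path: str, mtime: int, size: int) -> Dict[str, str]:
    """Build the ETag and Last-Modified response headers for a page."""
    raw = f"{path}:{mtime}:{size}".encode("utf-8")
    return {
        "ETag": '"%s"' % hashlib.md5(raw).hexdigest(),
        "Last-Modified": formatdate(mtime, usegmt=True),
    }


def is_not_modified(
    request_headers: Mapping[str, str],
    validators: Mapping[str, str],
    mtime: int,
) -> bool:
    """Return True when a 304 Not Modified response is appropriate."""
    inm = request_headers.get("If-None-Match")
    if inm is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        candidates = [tag.strip() for tag in inm.split(",")]
        return "*" in candidates or validators["ETag"] in candidates
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            return False  # unparsable date: serve the full page
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        return mtime <= since.timestamp()
    return False
```

Whatever path the blog class finally settles on is the one that should be passed in, since validators computed from a different path would never match.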
Tip: Use Ethereal to debug request/response headers.
Firefox seems to ignore the ETag/If-Modified of the RSS page. Or maybe it's just me. Need to check with other "real" aggregators such as SharpReader and Rss Bandit.
Related link: HowTo RSS Feed State
Also added simplistic support for tables.
RSS feeds are now available for each Izumi Blog page. An "[RSS]" link is present at the bottom. There's also a <link rel> tag in the HTML page for aggregators to discover the feed.
Implemented basic RSS feed support. When the main page of a blog is created, an RSS feed is also created, which can then be served by adding an s=rss query to the blog page. Currently the RSS feed contains Izumi's raw syntax data, which is not appropriate.
Implementation of the blog is pretty simple: an RBlog class is introduced, which is activated by a given Izumi tag in the Izumi master file. This class is responsible for splitting the master file into blog entries and blog archives, each one a single Izumi file that is then rendered using the standard RPage class.
Izumi suffers from a dual usage: as a blogging tool (short daily entries) and as a web site CMS (Content Management System). This wouldn't be so bad if monolithic text files weren't an inconvenience for blogging.
The way Izumi works, one text file makes one web page. Pretty simple, obvious and manageable.
When it comes to blogging, each file is a category. The problem is obvious: files grow larger with time. For example this current file is 81 kB and it generates an 86 kB HTML file.
The alternative is to have one entry per file, or maybe just one file per day. Categories could then simply be directories. That would work, except it wouldn't be as convenient for me to manage. I like to keep the files open in my text editor and quickly browse thru them to refer to old entries. I would have to create a new file for each entry and add it to the project list (yes, it's a programming editor :-p).
Yet another option would be to keep monolithic files per category for the blog entries, with a clear distinction between blogging master files and "normal" CMS files. Then a script or a daemon on the server could simply split the blog master files into multiple per-entry files.
Of course then I need to update Izumi to provide the expected blogging user experience:
Two approaches to do that:
The latter option is easier to write (typically a Perl script), yet it is harder to manage and is definitely not portable (think IIS+PHP). Having it in PHP means the code is closer to Izumi, easier to keep in sync, and will work with anything supporting PHP. Processing time can be an issue.
In the master file, there should be a specific tag to delimit an entry, typically something like [entry|date|title...] (the date should be mandatory in an entry, the title could be empty). Another useful tag would be to create a reference to another entry, that is the tag should automagically be replaced by a link to the permalink to that entry.
Which brings the problem of how to generate the permalinks and how to reference them painlessly. The permalink could simply be the entry date mixed with the title, either simply concatenated as in "dev_20040608_izumi_howto_blahblah" or, more cryptic but simpler, just a CRC32 or an MD5 computed on it.
Relying solely on the date and the title is a problem. Several entries can have the same date and either the same title or (more frequently) no title at all. What about using the index of the entry in the master file? It's a problem too as I want to be able to put them in any order or move them around, etc. That means the user will need to provide a unique id; typically I'd like to see an anchor-looking tag in the title. The anchor would not be displayed on the result page, only used internally to compute the permalink. If there's no anchor, the title and date must be unique. Otherwise, entries with the same date and title will conflict (they can simply be concatenated together.)
Note that for a second I thought about an MD5 on the content of the entry, but that means the permalink would change if the entry was edited! Not good for a permanent link. OTOH it may be useful to store the MD5 of the content in the split files, in order to regenerate them when the master entry changes. That's because another obvious issue is that only the master file's date/time stamp changes; it's not easy to compute which entry changed without reparsing the whole file...
How to reference the permalinks in Izumi files? One possibility is to simply copy-paste the link from the generated web page. The problem is that it would be a full URL with the server name and such. So the Izumi source would be useless if the server name or location changed. Currently Izumi can use anchor links with the explicit restriction that the anchor name must be all letters. This could be used to say that a permalink always starts with at least one digit and then simply use the referred category and the permalink anchor name. Of course it would be just fun if the permalink looked like "0x42D5A912" :-p Where do I get the permalink from? From the generated web page.
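A minimal sketch of the two permalink flavours discussed above, in Python rather than Izumi's PHP. The exact string that gets hashed, and the rule that a user-supplied anchor replaces the title in that string, are assumptions made for illustration; permalink_slug() and permalink_key() are hypothetical names.

```python
import hashlib
import re


def permalink_slug(category: str, date: str, title: str) -> str:
    """Readable flavour: category, date and title joined by underscores."""
    words = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return "_".join(part for part in (category, date, words) if part)


def permalink_key(date: str, title: str = "", anchor: str = "") -> str:
    """Cryptic flavour: an MD5 over the date plus the anchor or title.

    Assumption: when an anchor id is given it is hashed instead of the
    title, so the title can later be reworded without moving the link.
    """
    basis = f"{date}:{anchor or title}"
    return hashlib.md5(basis.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(permalink_slug("dev", "20040608", "Izumi HowTo: blahblah"))
    print(permalink_key("20040608", "Izumi HowTo: blahblah"))
```

Two untitled entries with the same date and no anchor hash to the same key, which is exactly the conflict described above.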
Two more things to close the subject.
Generated blog entries must be considered temporary files, just as the HTML files generated by Izumi are. A master blog entry would be split into many smaller Izumi files in a directory named after its category. Consequently I'd say why generate temporary Izumi files when I could generate the final HTML directly. Each HTML file should be a <div>, not a <body>. We want to concatenate many of them to generate this week's page, etc.
This brings the issue of the parsing strategy for the master file.
One strategy would be to scan the file for entry tags and keep a list of them with their offset. It is thus trivial to get the length of each entry too. Assume we have an index file for this master blog file. It contains the list of entries with their date, title, offset, length and checksum (CRC32 or MD5 typically). By comparing this list with the one reconstructed from the master file, it is easy to determine which entries have changed, been added, been deleted or been left unchanged. Note that the order should only matter per day, to display items. On disk, it won't matter. Another note: parsing the master file can be done in two phases. First, by getting all the tags and simply comparing the date/title (which as mentioned before is unique) and the length of each entry with the ones from the index, we can quickly determine if any has changed. Then for the others that seem identical, a more thorough checksum is needed. One may argue immediately that the checksum will need to be stored in updated entries anyway...
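A minimal sketch of the two-phase comparison described above, in Python rather than Izumi's PHP. The index layout (a mapping from (date, title) to offset, length and checksum) follows the description; IndexEntry, checksum() and classify() are hypothetical names, and MD5 is picked arbitrarily from the two checksums mentioned.

```python
import hashlib
from typing import Dict, List, NamedTuple, Tuple

Key = Tuple[str, str]  # (date, title), assumed unique per entry


class IndexEntry(NamedTuple):
    offset: int
    length: int
    checksum: str


def checksum(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def classify(
    old_index: Dict[Key, IndexEntry], new_entries: Dict[Key, str]
) -> Dict[str, List[Key]]:
    """Compare the stored index with the entries found in the master file.

    Offsets are ignored on purpose: moving an entry around in the master
    file does not make it "changed".
    """
    result: Dict[str, List[Key]] = {
        "added": [], "deleted": [], "changed": [], "unchanged": []
    }
    for key, text in new_entries.items():
        old = old_index.get(key)
        if old is None:
            result["added"].append(key)
        elif len(text) != old.length:
            result["changed"].append(key)    # phase 1: cheap length check
        elif checksum(text) != old.checksum:
            result["changed"].append(key)    # phase 2: same length, new text
        else:
            result["unchanged"].append(key)
    result["deleted"] = [k for k in old_index if k not in new_entries]
    return result
```

Only entries that pass the length check pay for a checksum, which is the saving the two-phase idea is after.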
Where do I store this index? Three possibilities: in a database (DBM, etc.), in each temporary file (an HTML comment for example), or simply in the temp file name. Once again the file system as the database is the cheapest solution.
BTW once I have such a system, Izumi can really be both a CMS and a blog management system (BMS). And with separate entry files, it is a lot easier to generate RSS streams.
(Extracted from Ideas For Dev)
I've been playing with both LiveJournal and Blogger recently. I mean simply blogging into a free account and examining the various clients (Windows, X11) and protocols.
Which brings two kinds of interesting remarks:
Let's start with the former: Izumi is mostly a simplified web site generator ("web content management system" is the modern name for it) that dynamically renders "rich" HTML web pages from text files (using my definition of rich... mostly formatted text with enumerations, bold, italic and sometimes an image or two... basically HTML 1.0 lynx-compatible :-)). And it uses a wiki syntax. And I can hand edit all my files using any text editor ranging from Notepad to Emacs, UltraEdit or BBEdit. The idea here is that I dislike having to mouse here or mouse there to apply text formatting when I can do it with simple keystrokes à la wiki.
But lately... hmmm no, not lately, actually from the beginning, I've also been using Izumi to maintain what looks like a web log: reverse-date ordered chunks of text, classified by date, exactly like this. I still get the editing comfort I'm looking for, but I don't have the dynamic aspect of a blog web page where I actually expect features like archives, calendar based search, seeing only the last N posts, having permalinks per post, etc.
In this context, traditional web logs are definitely better than Izumi, even for me. Heck, I even like the nifty icons of desktop clients such as w.bloggar or LiveJournal's Windows client. (Of course, I could create such a client myself for Izumi, but since I prefer the text only edit mode, what's the point?)
An idea would be to mix both: consider an Izumi page as a text file storing one blog category with a fixed format (an izu-tag can be used to define a post with title, date and optional attributes). On the edit side, I can still edit the page using a text editor, or I can create a tool that will allow adding and editing a single post. On the server side, I can provide fancy sort/archiving features as I see fit (similar to the idea of automatically creating div/JavaScript sections in an Izumi page). On the server side I can also add blog API support (AtomAPI comes to mind).
Good.
(Extracted from Ideas For Dev)
[...] On a different register, I need to change the What Is Izumi page. It's a mess. It contains the project description with milestones' (lack of) planning. Instead it should just be a generic description page with what it is, why, how and where.
That means I would be left with no place to write about prospective ideas for Izumi. Well as a matter of fact, no, this page is all about "prospective" ideas (read: vaporware).
Which leads me to the purpose of this writing: Automatic section generation in Izumi pages. Let's take the example of the What Is Izumi page. It more or less has sections: what, why, how, when (also called goals, why reinvent the wheel and implementation, milestones).
Now when generating the page, I just need to place every section in a <div>, visible by default. Then using a bit of JavaScript, I can have all but the first div hidden at first. A section list, either on the side or on top, can be used to show one section and hide all others. All it takes is some minor harmless JavaScript and very basic CSS. Non-JavaScript browsers (or those with it disabled) will simply show everything, just as before.
Of course another approach would be to generate the pages with the single visible section on the PHP server page by adding a query argument. It has the advantage that only the visible section is downloaded so it should load faster. The obvious side effect is that clicking on next sections will require a download of them, so it will require one more trip to the server.
The JavaScript/CSS approach has the advantage that switching sections will be as fast as the browser can hide/show a <div>. The obvious side effect is that loading will be slower the first time (but hopefully the browser will start showing the first section right away). One of the advantages I like with this approach is that the page can be saved locally and still have all its sections browsable.
Implementation-wise, I'd probably have to scan the whole document first in order to list all sections, assuming the section header appears at the beginning of the document. Or it could be at the end, with a <div> being forced to appear on the top of the screen -- hmmm nah, that wouldn't be too flexible if later I want to introduce templates and have it, say, on the side. On the other hand, scanning the document to get the section tags might be easy and fast and yield a better end-user response time than blocking all the http output till both the document and its sections have been rendered. We'll see. It's just an idea after all.
As for the section tag, I'm thinking simply [s] and the rest of the line is the section title. It might be a good idea to automatically remove the ---- that I put before sections.
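A minimal sketch of the section scan, in Python rather than Izumi's PHP. It assumes the tag is [s] at the start of a line with the title on the rest of that line, and that the separator to drop is a line of four or more dashes directly before the tag (blank lines in between are ignored); split_sections() is a hypothetical name.

```python
import re
from typing import List, Tuple

SECTION_LINE = re.compile(r"^\[s\]\s*(.*)$")  # "[s] Section title"
RULE_LINE = re.compile(r"^-{4,}\s*$")         # "----" separator line


def split_sections(text: str) -> List[Tuple[str, str]]:
    """Split a page into (title, body) pairs.

    Text before the first [s] tag is returned with an empty title.
    """
    sections: List[Tuple[str, List[str]]] = [("", [])]
    for line in text.splitlines():
        m = SECTION_LINE.match(line)
        if m is None:
            sections[-1][1].append(line)
            continue
        body = sections[-1][1]
        while body and not body[-1].strip():
            body.pop()  # trailing blank lines of the previous section
        if body and RULE_LINE.match(body[-1]):
            body.pop()  # the "----" placed just before the tag
        sections.append((m.group(1).strip(), []))
    return [
        (title, "\n".join(body).strip())
        for title, body in sections
        if title or any(l.strip() for l in body)
    ]
```

The list of titles is enough to write the section header up front; each body could then be rendered into its own <div>.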

This work is licensed by Raphaël Moll under a Creative Commons License.