$Id: RigII.izu,v 1.12 2007-06-04 20:04:20 ralf Exp $
20050122 Warning: This project is indefinitely put on hold.
20050225 Update: The design is not too bad but the project doesn't bring anything really useful to the existing RIG version. As such it is useless and will not be developed.
20050625 Update: Repeat after me: "RIG is not dead."
(Notes are given in reverse chronological order.)
This could almost be labeled as "Specs for RIG 2.0", except quite frankly there's nothing new I haven't thought about before.
So it would go somewhat like this:
Each service has its own UI (one or more web pages for display), its own options, etc. and is solely responsible for what it does and how it does it. A blog for example could be based on data accessible via Izu files -- the two services would show the same data but in radically different ways.
The idea is that all services would share a common architecture provided by the framework, allowing the creation of a new service to only focus on the "what" and not the "how".
The UI should fit on a web page, composed of block elements.
Typically each service or media handler would provide a "master page" that defines the
layout and then different blocks that implement the elements within the master page.
For example a photo listing page would have the following blocks: header, footer, options,
extras, and the main block would be the photo thumbnail table.
Each block would have some kind of id.
Themes would be created that contain alternate representations for the layout page and
the individual blocks.
The layout of a page could be governed either by CSS/div or by tables.
Auxiliary services exist; these are optional add-on services that integrate into one of the existing services. For example:
Some details need to be made explicit:
In regard to the idea of an underlying framework, some generic mechanism for configuring a bunch of variables and storing options should be provided. Then each service can either use the underlying mechanism or override it.
A unified URL scheme could be used with a single entry page, for example:
service.ext/url
or better:
index.html?q=service/arg1/arg2/.../argN
such as:
/blah/th/29/49/travel/00-image.jpg => creates a thumbnail of 29x49 pixels
/blah/dir/travel => directory of "travels"
/blah/blog/travel.izu
/blah/view/travel/00-image.jpg => view a full image
/blah/admin/travel
Note that using Apache's rewrite-url, it should be possible to have this url:
/blah/service/options
automatically be transformed into:
/blah/index.ext?q=service/options
This only requires some configuration attributes in the framework to be able to format the output URLs as desired for the rewrite-URL. The only trouble with that is understanding how to make cookies work.
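A minimal sketch of how the two URL forms could be parsed and generated, written in Python purely for brevity (these notes do not settle on a language for this). The function names, the "/blah" base and the "index.ext" entry page are taken from the examples above or are hypothetical; nothing here is existing RIG code.

```python
from urllib.parse import parse_qs, urlsplit


def parse_route(url, base="/blah"):
    """Return (service, args) for either URL form, or (None, []) if empty."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if "q" in query:
        # Single entry page form: index.ext?q=service/arg1/arg2
        route = query["q"][0]
    elif parts.path.startswith(base):
        # Rewritten form: /blah/service/arg1/arg2
        route = parts.path[len(base):]
    else:
        route = parts.path
    segments = [s for s in route.split("/") if s]
    if not segments:
        return None, []
    return segments[0], segments[1:]


def format_url(service, args, base="/blah", use_rewrite=True, entry="index.ext"):
    """Format an output URL; use_rewrite is the assumed configuration attribute."""
    route = "/".join([service, *args])
    if use_rewrite:
        return f"{base}/{route}"
    return f"{base}/{entry}?q={route}"
```

With this shape the services only ever see a service name and an argument list, and whether the rewrite rule is active becomes a single configuration flag on the output side.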
Speaking of cookies, instead of transmitting a cookie with data to the client browser, it may be easier to simply provide some kind of unique number and then keep the user prefs locally in a cache.
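As a rough illustration of the "unique number plus local cache" idea, assuming an in-memory store (a real one would need persistence and expiry), something like the following could work. PrefStore and its method names are hypothetical.

```python
import secrets


class PrefStore:
    """Server-side user prefs keyed by an opaque token sent to the browser."""

    def __init__(self):
        self._prefs = {}

    def new_visitor(self):
        # Only this random token travels in the cookie, never the prefs.
        token = secrets.token_hex(16)
        self._prefs[token] = {}
        return token

    def get(self, token):
        # Unknown or expired tokens simply fall back to empty (default) prefs.
        return self._prefs.get(token, {})

    def set(self, token, key, value):
        self._prefs.setdefault(token, {})[key] = value
```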
(20050918: Obsolete goal description moved into research notes.)
The purpose of RIG 2.0 would be to actually implement the "modular" framework that I envisioned for RIG 1.0 yet never implemented since I kind of got bored of PHP development in the meantime.
Most of this framework would not add anything dramatic to the end result product. It would only make features easier to develop. Any of the features I haven't put in RIG yet could be simply hacked into the RIG 1.0 source code with little difficulty. The resulting code would probably be a mess and maintenance would become quite boring (yet not impossible, it's a small code base after all.)
RIG has two purposes for me. The first one, the utilitarian one, is as a web gallery tool. The second one is as a learning tool -- with RIG 1.0 I learned PHP development. That was OK for a while yet now I want to evolve. I started small, I did several improvements to the web gallery and now I want something bigger. Whilst PHP would be perfectly suited for the task, I feel like I want to move on to something better -- more object-oriented and with a real editor/debugger IDE tool.
On the side I've been exploring C#/ASP lately. That is nice yet the open source development tools & runtime environments are not quite ready yet (Mono, DotGNU, etc.). If anything, I also understood lately that there's not much difference between C# and Java, and between ASP and JSP. I've always pushed aside Java as something boring yet I realize the tools are there (Eclipse, JBuilder, etc.) and the deployment environments are really there (Tomcat, etc.). There's a maturity here that the open source side of .Net will not reach anytime soon.
So I'm in need of a project to learn more Java/JSP stuff. Ideally I'd like to explore stuff like JSP 2.0 with EL, Struts, JSTL, etc. I've got a good project to do that right here. The more I read tutorials about it, the more I'm convinced there's good usable stuff that can be done. Trying to use all of Struts may not be appropriate for the web gallery, but parts of it seem to fit exactly, such as Tiles, which seems oddly in phase with the idea of a template system that allows me to separate code from layout more clearly.
Now I need to start slow, something like a pure JSP approach with a hand made templating system -- because that's what I know. Then maybe as I get more comfortable with the whole API around JSP I may start integrating them more.
So that's for the implementation detail. Now from a user point of view, what will RIG 2.0 do that the current one does not? Because if my users don't see an improvement, there's no reason for them to switch to something new (meaning new bugs, new installation & configuration to redo from scratch, etc.)
For starters, I'd say a template system that allows users to easily create new layouts is of utmost importance because that's what everybody sees first. Then an add-on extension supporting user feedback. For myself I want access counters and IP logging. Also of some interest is support for downloading zip files. Feature-wise for the image album itself, it's a pretty good set of features as of right now.
(20050918: Obsolete milestones description moved into research notes.)
See Goal section above.
There are a number of Jakarta projects that would really help:
I don't see a need to comply with a Struts application for example -- currently I'd prefer to stick to basic JSP dev (because it is close to what I'm used to with PHP) yet some of these libraries would help lighten the code -- no need to redo myself what others spent time developing and debugging.
Tiles, included in Struts 1.1, will be most helpful in developing the layout system. What I called blocks below are exactly "tiles". So are the template pages.
As for preferences handling, Jakarta Commons has Commons-Digester which can help transform XML files into Java instances.
Some misc note: Defining the exact structure of the files creating the layout of a service should be service dependent. For example if I want PDA support for the "directory" service, one choice is to derive the service and create another service called "dir-pda" that does just that; another is to mix it with the generic directory code and automatically select a template based on the user agent. In this latter case, the user could manually select the PDA template too, and the template will not only influence the raw HTML layout but also the functionality behind it. That is, the service Java code may detect it is using the PDA mode and limit its functionality accordingly, disable some file handlers such as movies, etc.
When creating a layout, one may want to predict the place of some auxiliary tiles (for example a user comment module's tile or IP log tile.) OTOH it may be useful to have a specific tile for "all the other auxiliary modules" (those that are not listed in the page, what if they are listed later? The tag could have an explicit exclusion list.)
Do I need sessions? Or rather, do I want sessions?
Sessions can make some things easy. For example when navigating a site the session object
can store things that are expensive to compute and thus would not need to be recomputed
when changing page (such as the file list -- which if hashed with the URL can be invalidated
when reloading the next file list, or the user preferences, etc.)
The drawback is that sessions require a cookie.
Ideally if I can detect that the session cookie has been refused (to be explored with JSP), then I could work with sessions either disabled or enabled. Basically if a session object does not have, say, the UserPref instance, then I recompute it on the fly. If it does, I reuse it as-is. If session cookies are not enabled, I think JSP has a mechanism to pass an identifier in the URL.
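The "reuse if present, recompute otherwise" rule can be sketched as follows. This is Python standing in for the JSP session object; the session is assumed to behave like a dictionary, None stands for "no session available", and get_user_prefs and compute_prefs are hypothetical names.

```python
def get_user_prefs(session, compute_prefs):
    """Return the user prefs, caching them in the session when there is one."""
    if session is None:
        # Sessions disabled or cookie refused: recompute on every request.
        return compute_prefs()
    if "user_prefs" not in session:
        session["user_prefs"] = compute_prefs()
    return session["user_prefs"]
```

The calling code does not need to know whether sessions are enabled; it only pays the recomputation cost when they are not.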
Terminology: an auxiliary service is a "module". Easier to write ;-)
A tentative workflow:
A typical main layout template for the directory service:
Modules can either be requested by the service in its initialization phase (i.e. explicitly created) or they can be requested by a tile:insert when the template is processed. In this latter case the module will be initialized on the fly just before being inserted or something, as part of the JSP code.
Ideally there would be a module manager and each module would be instantiated once. Unlike what I wanted to do in RIG 1.0, the module manager does not need to parse a list of classes and auto guess the list. There should be a list of "global" modules to always instantiate. This could simply be an XML list that is parsed in the conf directory.
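A small sketch of such a manager, in Python for brevity. It assumes the list of "global" module names has already been read from the configuration (the XML parsing is left out) and that each module is created by a zero-argument factory; both are assumptions made for illustration.

```python
class ModuleManager:
    """Creates each module at most once; "global" modules are created up front."""

    def __init__(self, factories, global_names=()):
        self._factories = dict(factories)   # module name -> zero-argument factory
        self._instances = {}
        for name in global_names:
            self.get(name)

    def get(self, name):
        # On-demand modules (e.g. requested by a tile insert) are created here
        # the first time they are asked for, then reused.
        if name not in self._instances:
            self._instances[name] = self._factories[name]()
        return self._instances[name]
```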
CSS selection and inclusion can be done by a module.
Note that each module or service has access to a current context that includes the current URL and the JSP variables (request, response, session, out, etc.)
Whereas a servlet is a class that is instantiated once per servlet container and that processes many requests, the service or modules used here will be instantiated at the start of the current request. Eventually the base class could make it so some values are automatically saved and restored from the existing session if any, or that can be left as something specific to each class.
Unified URL scheme:
base.jsp?q=<main query>&extra_args=values
/th/w/h/path/image.jpg => returns a thumbnail of w*h size
/dir/path/path/image.jpg
Misc notes for first prototype, probably redundant with previous ones:
The other interesting idea is to manage cache of meta-data by computing an MD5:
Modules should be able to associate meta-data with a given media too.
A tentative class layout:
RIModule RIService RIMedia
I started thinking again about where I want to go with RIG. The project is far from dead -- initially I wanted to rewrite in Java (mostly as an excuse to learn Java & Struts) yet it was not going to add any new feature from an end-user point of view so I kind of gave up once I got the feeling I wanted on the Java dev part.
So here I am back with a list of features I'd like to see. This is a wish list. Not everything will make it, and maybe not this way.
There are actually several lists: Stuff from a user point of view (end-user and admin) and stuff from a technical/implementation point of view (based on the first set.)
For an end-user, the major difference is that I want to integrate blogging and photo albums. I still need some research here but my typical use cases are the following:
From an end-user perspective, the UI should allow for better templates and a more attractive design by default. It should be more responsive with a lot more scripting on the browser side (think AJAX) and a fallback for limited browsers (old browsers or PDAs.) Templates should be user modifiable.
From an administrator perspective, I need the following features:
From a storage perspective, I need to introduce datasources with dissimilar capabilities: file system storage, DBM or sql would be the most obvious. Different modules may use different storage sources.
From an implementation point of view, I still want to have some kind of modular architecture:
I have obviously more to add here but I need to sort it through first:
As for an implementation plan, I envision something like this:
More on all this later.
This is a pot-pourri of misc notes mostly related to design.
With AJAX:
Without AJAX:
Separate the web server from the data server:
It could be beneficial to have more than one rig data server, for example different services can be on different servers, or different sessions can be handled by multiple servers (load balancing.) A web server acting as a proxy for AJAX would provide a good place to manage such load balancing transparently (i.e. transparent as in not visible to the outside world.)
Implementation wise, I'm still ambivalent on what languages I want to use.
The web server part is mostly a pass thru. It should "intercept" URL or AJAX requests and pass them to the data server. In return it should serve the content provided by the server. It should handle sessions.
Ideally the web server's script and the data server can communicate thru a socket with a simple protocol such as SNE 2.
For the data server part, I'm thinking of either pure Java or pure C#.
I still think C#'s syntax and mode of operation and libraries are a bit more
convenient than Java's -- this is very subjective of course.
As far as IDEs are concerned, Eclipse and VS.Net are pretty much on par.
Eclipse has better native jUnit support but this is irrelevant since I created
my own NUnit framework attached to libraries.
Now since this is focusing on the server, C#/.Net and VS.Net is totally irrelevant.
What matters in this case is Mono and this can be a problem.
That means whatever I do with VS.Net I have to double check it is supported and
works correctly with Mono. That also means the build will be done differently
(VS.Net solutions on one hand and NAnt on the other one.) And that in this case
NUnit testing may be crucial in making sure things work as expected.
So for the moment the choices can be summarized as follows:
Terminology:
The main principle of the design is that everything is a module. The core of the data server is basically a module manager.
Ideally it should be possible to update the modules live while the server is running. That may not be the case in the initial design and may be the result of a redesign or refactor later, but that's ultimately the goal. The reason why this won't be possible at first is that I haven't really thought about how to do this from an implementation point of view nor experimented with it at all. It's just an ideal. It's OK if it doesn't happen anyway.
If the implementation was done in PHP, Python or Ruby, changing a module live would be as simple as changing the source directory. But I want something precompiled such as Java or .Net for obvious performance reasons. In this case, it implies that modules are compiled in separate DLLs or JAR files and then loaded and unloaded dynamically. I don't know how to do that yet and it requires some experimentation.
In either case, if I want to swap an existing DLL while the server is running, I can foresee that I will have to unload that DLL first, so any code referenced by this DLL should stop running. This may require the module DLL to persist all its data structures in some way, be unloaded, reloaded and then read its persistence data.
The question remains: Why would I want to do that?
Live update of modules is mostly useless in a development environment or even a hobbyist server, as in my case. I have no trouble stopping my own server just to update it (by server here I mean a web server or web application, not a computer.) On a bigger scale, in a real production environment, it is nice if one can avoid shutting down a complete server just to update minor pieces of software. It is largely possible that once the software is deemed stable minor bug fixes will occur, for example to fix minor security issues, and that some of the minor modules require being shut down to be replaced. It can be assumed that this will mean shutting down one whole service whilst another service can remain untouched -- which remains to be seen.
Modules will be inherently cross-referenced. For example the blog service will refer to the image album service and both of them might use a calendar module. When designing a service, it should be possible to declare weak links and check for a module's availability before using it. This is also an implementation-specific issue where I should learn more before engaging in this direction. It may require late binding and thus may affect performance and complexity of coding.
So it seems like although I like this idea, it may not prove that useful and may result in added complexity that outweighs the benefits.
The bottom line is that live update of modules is an interesting problem but is not the core of this project. If I don't know how to do it or if it proves useless (like having to unload all modules to replace one) or affects performance, then I'd better start without it.
I want configuration to be as easy as possible.
For services and modules, auto-discovery should be enough by simply listing a predetermined directory structure, exactly like Ruby-on-Rails does. Creating a new module is thus as simple as duplicating an existing directory and changing it. A default skeleton module should be provided.
Configuration files should be avoided as much as possible. For example many Java-based web apps have complicated XML configuration files that detail which services are available, on which port, via which transport, etc. This is often done with one configuration file for the whole web app. I should avoid such a generic config file by just fixing that there's a subdirectory "modules/" and every sub dir in there is a separate module.
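Discovery by directory listing is small enough to sketch directly. This is an illustration in Python, not a decision on the implementation language; discover_modules is a hypothetical name.

```python
from pathlib import Path


def discover_modules(root):
    """Every sub directory of the modules directory is one module."""
    root = Path(root)
    if not root.is_dir():
        return []
    # Plain files next to the module directories are ignored.
    return sorted(entry.name for entry in root.iterdir() if entry.is_dir())
```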
Then in a module dir, I'd generally place a configuration file that describes this module, at least so that the loader can deal with it. A different way to do it would be to place a class in there and have it provide the necessary details via properties. The question becomes how this class gets loaded -- and this is implementation specific.
Ideally modules should be dynamically loaded in order to only load stuff that is used (in reality it may be necessary to provide a way to start with all modules statically linked in, for debugging purposes -- another thing that needs to be experimented with.)
Assuming dynamically loaded modules, I could envision a mechanism where module A requires module B. If module A has already been loaded, its main class will inform the module manager that it requires some other modules (it should also indicate whether it merely uses them if available or actually requires them to run and thus would fail if not present.)
The module manager could then simply check that a directory with the module name exists in the module directory (that is "modules/B" exists and is a directory and "modules/B/B.dll" exists and is a file.)
Later on before module A actually uses module B for the first time, it should request it to be loaded. Obviously this call will do nothing if the module is already loaded and should thus be inexpensive.
For debugging purposes, it may be necessary to force all dependent modules to be loaded (i.e. load B when A is loaded) or to force all modules to be loaded (even if nothing depends on them.)
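The declare / check / load-on-first-use sequence described above could look roughly like this. It is a Python sketch under several assumptions: the actual loading of a DLL or JAR is replaced by a placeholder object, the ".dll" extension comes from the example above, and LazyModuleManager, declare and the eager flag are hypothetical names.

```python
from pathlib import Path


class MissingModuleError(Exception):
    """Raised when a required module is not present on disk."""


class LazyModuleManager:
    def __init__(self, root, extension=".dll", eager=False):
        self.root = Path(root)
        self.extension = extension
        self.eager = eager      # debugging aid: load dependencies immediately
        self.loaded = {}        # module name -> loaded module object

    def is_available(self, name):
        # "modules/B" must be a directory and "modules/B/B.dll" a file.
        module_dir = self.root / name
        return module_dir.is_dir() and (module_dir / (name + self.extension)).is_file()

    def declare(self, requires=(), uses=()):
        """Called by a loaded module; returns the optional modules that exist."""
        missing = [n for n in requires if not self.is_available(n)]
        if missing:
            raise MissingModuleError(", ".join(missing))
        usable = [n for n in uses if self.is_available(n)]
        if self.eager:
            for n in list(requires) + usable:
                self.load(n)
        return usable

    def load(self, name):
        # Cheap no-op when the module is already loaded.
        if name not in self.loaded:
            if not self.is_available(name):
                raise MissingModuleError(name)
            # Placeholder: real code would load the DLL or JAR here.
            self.loaded[name] = object()
        return self.loaded[name]
```

Module A would call declare(requires=["B"], uses=["C"]) once when it is loaded, then load("B") just before first use.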
This scenario works if the only thing we need to decide to load a module is its name.
I may want more than that. When thinking about module management for RIG 2 in PHP, I envisioned modules to be PHP files that started with a special comment section describing the module as well as what kind of file extensions they would manage. I now think the idea was moot (it relied on file extensions rather than a notion of services) yet I may want in the future to be able to know more about a module before loading it.
I can see 3 ways to do that:
If this was needed the only portable solution would be the description file. We're not going to go there yet anyway.
Now there still is the issue that if all modules are not automatically loaded, how do we get services loaded at all?
One solution that would come to mind is to have more enhanced schemes. For example:
The first scheme is to have a "services/" directory that contains "top level" modules that define a service. This may be a simple class that gives properties and whose responsibility is to dispatch calls to sub modules. The actual modules would be in "modules/". Everything in "services/" would always be loaded at startup and everything in "modules/" would be loaded on demand.
The second scheme is to have two sub directories in "modules/" with modules that need to be loaded at startup and modules that need to be loaded on demand.
The last scheme is to fix the dll name in a module directory. For example for a module "foo", there would be a directory "modules/foo/" that would contain a file "main.dll" or "startup.dll". If the file is called "startup.dll" it is loaded at startup, otherwise it is loaded on demand. Note that with this scheme a given module directory could contain both kinds of files, although I am not sure this would be beneficial or should be allowed at all.
Ideally there should be no need for complicated configuration files. In an ideal world, the software would work out of the box and be suitable directly.
Ideally the administrator of the server wants to control as much as he wants and the end-user wants to customize every single detail. But it has to be easy, without reading docs and without dealing with obscure configuration files.
And from my point of view I don't want to spend half of my code reading or writing prefs and inserting tons of conditional statements that make the code hard to read.
Yeah right.
So I'll need configuration. Some but not too much. And in a simple and unique format.
(to be continued)
So I may go towards a Java+PHP implementation to start with.
This all depends on what is faster to deploy on the linux side: Java or Mono. And right now I think it's clearly Java -- although the Java VMs are generally non-free (as in the Debian term "non-free" i.e. proprietary Sun license) this may change. In the meantime, Mono is a pain to install (too many packages). Tomcat is widely deployed so any server that somehow has a J2EE component running would likely have Java installed ;-)
Anyway, first I need to experiment with the following:
and the PHP side provides an SNE client.
Some session information should be used on the PHP side to keep track of the associated Java server session.
command line actually... jhead, etc.)
handled by the PHP side and then handling of the file on the Java server side.
Then we can build up on that.
The exact same thing is doable in .Net (VS.Net vs Mono) except I'm less enthusiastic about Mono so that will wait. The Java code should be simple enough and not use too many tricks so it would be imaginable to actually write a script that rewrites the code in C#. To make sure this may be possible, it is mandatory to abstract the language as much as possible. For example instead of directly using Java Vectors all over the code, write a dummy wrapper and use this type instead. Then it will be possible to provide a C# wrapper for ArrayList that does the same. It may be necessary to make sure the wrappers can be ported to C# directly (with NUnit testing to test access on both sides.) The wrappers may need to explicitly provide all usable methods (i.e. defer, not inherit from the native type.)
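To make the "defer, not inherit" point concrete, here is the shape such a wrapper might take. It is shown in Python only to keep the example short; the real wrappers would be written once around Java's Vector and once around C#'s ArrayList with the same method names. RigList and its methods are hypothetical.

```python
class RigList:
    """Portable list type: wraps the native list instead of inheriting from it,
    so only the methods spelled out here are usable by the rest of the code."""

    def __init__(self, items=()):
        self._items = list(items)

    def add(self, item):
        self._items.append(item)

    def get(self, index):
        return self._items[index]

    def remove_at(self, index):
        del self._items[index]

    def size(self):
        return len(self._items)
```

Because nothing is inherited, code written against this type cannot accidentally depend on a native method that has no equivalent on the other platform.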
In the current implementation of RIG and Izumi, there is no online edit capability.
There are two reasons for that.
The first reason is that I mostly use these tools from home. Publishing photos for RIG is just a matter of copying them on the server over Samba, or maybe rsync over SSH when I'm remote. Then I can use the admin web page to select which picture to show & hide. But that's about it. There's no mechanism to upload photos, rename them, etc. Izumi is even "worse" in the sense that I tend to edit Izumi text files directly on the server and more recently I simply push updates to the CVS server and an hourly cron job checks out the latest revision. So there is little incentive for me to have the possibility to actually edit from a web page.
The second reason is that I generally hate web forms. I tend to dislike tools like blog sites or webmail. These are generally really annoying to use:
There are just too many drawbacks!
One way to circumvent this pain is to type whatever you want in a normal editor on your desktop or laptop then copy-paste it all in the browser's submit form when ready. For major weblogs like Blogger or LiveJournal you can also get many desktop clients. So at least you can (hope to) edit offline and submit when ready.
So all in all it's a poor state of affairs and when I created RIG and Izumi I wanted to avoid all this mess.
Now comes AJAX and more and more things are going to be done online. A very smart design on the client-side using AJAX could take care of the problems of interrupted connection. Yet it can't solve the unreliable part of the problem, which is mostly the inability to save locally and recover from a crash.
Ideally there should be some local storage for webapps on the client side. .Net provides such a thing for thin clients (I wouldn't go as far as classify .Net thin clients as "wonderful" but it's at least a good move.) I do not know if IE and Mozilla/Firefox provide such a local storage for JavaScript-based web app. I'd doubt it.
One way to solve the issue might be just to try to use a cookie to save data and restore it later. Or a more brutal approach would be to install some ActiveX or XPI that provides this kind of storage. This would be security-sensitive and should be designed with security considerations in mind.
In any case, a version of RIG 2 should provide some way to allow postings from the web. It must require authentication and must be carefully planned beforehand.
One of the issues I had with the original RIG is that I didn't want to trust the PHP script. From the PHP script point of view, the image repository was readonly and by design no code would write to it. The idea was that I didn't want to lose images due to bugs in my code. Furthermore in the deployment the user running the PHP script (www-data) could only read the image folders. In my initial plan I wanted to allow for posting from the web, with the limitation that all posts were done in a special "upload" folder, separate from the main one. When viewing albums, data from both the normal folder and the upload one would be combined. It's not a bad idea and the implementation could check that both folders are actually the same (by name, or symlinks or hardlinks) or let the site admin configure it.
This idea can be reused. Although I didn't think of it this way at the time,
there is one piece of logic behind this: for one given repository (i.e. upload
repository or local filesystem repository), there is only one method of update.
The local repository is only updated via the filesystem and not from the web UI.
The upload repository is only updated via the web UI and not from the filesystem.
There can be exceptions to this (i.e. a local admin can manipulate all files on
a system) but from a user point of view there's only one way to have write access
to the data. This simplifies implementation and simply makes the whole thing
easier to understand.
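A sketch of the combined, read-only view over the two repositories, in Python for brevity. The function name is hypothetical, and letting the local copy win when both folders contain the same file name is an assumption, not something decided in these notes.

```python
from pathlib import Path


def list_album(album, local_root, upload_root):
    """Return sorted (file name, origin) pairs merged from both repositories."""
    origin_by_name = {}
    # Upload first so that a local file with the same name overrides it
    # (assumed policy for name clashes).
    for origin, root in (("upload", upload_root), ("local", local_root)):
        folder = Path(root) / album
        if not folder.is_dir():
            continue
        for entry in folder.iterdir():
            if entry.is_file():
                origin_by_name[entry.name] = origin
    return sorted(origin_by_name.items())
```

The viewer never writes to either folder here; each repository keeps its single update channel.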
In this light, having the possibility to upload images into the same directory
from both the local filesystem and from the web UI is problematic. Then you want
to have locks, handle duplicates and concurrent updates, etc. It gets ugly.
There are some other possibilities.
One simple way to make all updates go thru the same channel is to store everything in a database and access the data thru the database API.
If you want to have real files on a filesystem, another way is to use some local storage when copying data over the network then have a daemon upload to the server. In the case of RIG for example, there could be a designated "incoming" folder on the server's filesystem. Copy all images in there via regular filesystem commands (hardlinks would work nicely too.) Then either from a cron job or from a manual web UI, have the system merge these new images into the image repository. (Note that this is similar to the way one uploads files to Sourceforge.)
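The merge step of the "incoming" folder approach might be sketched as below (Python, for illustration only). merge_incoming is a hypothetical name, and the duplicate policy, leaving clashing files in the incoming folder for the admin to sort out, is an assumption.

```python
import shutil
from pathlib import Path


def merge_incoming(incoming, repository):
    """Move new files from the incoming folder into the image repository.

    Returns (moved, skipped) lists of file names; a file whose name already
    exists in the repository is left where it is.
    """
    repository = Path(repository)
    repository.mkdir(parents=True, exist_ok=True)
    moved, skipped = [], []
    # sorted() snapshots the listing before any file is moved.
    for src in sorted(Path(incoming).iterdir()):
        if not src.is_file():
            continue
        dest = repository / src.name
        if dest.exists():
            skipped.append(src.name)
        else:
            shutil.move(str(src), str(dest))
            moved.append(src.name)
    return moved, skipped
```

The same function could be called from a cron job or from a manual web UI action.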
Another option is to have a client application on the desktop where the images are that will upload them to the server.
Many of these approaches have issues, typically that one has to deal with duplicates, partially aborted transfers, re-get and co. There are also some security considerations, i.e. authentication is necessary if the system has users with different access privileges.
The system is structured in 3 parts:
[ rig-server ] <> [ Apache + PHP ] <> [ Web Client ]
The rig server, here referred to as the server, manipulates various resources. It serves different applications (blog, image album). These applications get the data from various sources, for example:
- Readonly directory of images on local filesystem (these images are written by the user from the local LAN.)
- Monitored directory of images from web-based uploads (these images are written by the rig server in response to user uploads.)
- Settings & configuration files (some readonly, some read/write as results of online user preferences.)
- Databases of data modifiable online by the user.
The idea is that the server will allow different protocols or media adapters as "modules" (i.e. sort of like plug-ins but hardcoded at first.) Then how these data repositories are used is mostly service-specific.
Ideally all these media adapters should have the same API. This may not be possible in reality.
So currently the specs give us the following implementation:
The features set for the image album service:
The features set for the blog service:
Another service for later is a plan manager.
Web UI:
Now that I think of it, I should split the Rig2j project in two parts. The modular architecture with templates and AJAX stuff should be a framework that is independent of whatever application I do on top of it. RIG, be it just an image viewer and/or a blog or whatnot, would be an application of the framework.
Let's move this later in a separate page once I get a name for this framework.
Here are some specs, to be developed:
What needs to be explored:
Naming conventions: standard.
Rig2r is the latest development of this idea: same goal as rig2j but using Rails for the web framework and Ruby for the server part.
Rails for the web framework part makes a lot of sense: it's a good framework, I've been using it for 5 months now, it's easy to use, setup and maintain and is really adapted to an agile development where things are easy to add, change or modify. It will also allow me to add a more modern web UI design, including Ajax, with little impact.
I still don't want to rely on an SQL data storage. A back end server sounds like a perfect idea. Java or Ruby would both work for this, but since the front end requires Ruby I'd hate to also force Java as a requirement and it makes sense to limit ourselves to one language for uniformity.
Now the interface between the front end and the back end should be clearly defined and using a single channel. It should also be language agnostic so if there is a need to recode the back end in Java or C/C++ later, this should be possible and should not affect the front end.
The architecture is going to be somewhat different from the one for Rig2j. In Rig2j the web server side was just a pass thru that would basically call the back end for rendering and event management. Here we're going to leave this to Rails which just does a good job at it. From Rails' MVC point of view, the view and controllers are going to be in the front end (Rails) and the model part is going to be handled by the back-end. Pseudo-models, with an API similar to but not deriving from ActiveRecord, will connect to the back-end. In this regard, the back-end acts as a database as far as Rails is concerned.
On the back-end side at first I'll make extensive use of YAML and CSV to provide configuration.
The back end and front end should respect the ideas laid out in milestone 2 of Xeres: there's a set of interconnected server nodes that expose media elements organized in a hierarchical way.
The job of the front end is to display this hierarchy, the corresponding elements, and also to help the user navigate. The front end should not have to perform format translation. The back end should provide the means to discover the hierarchy and give the media elements in a format directly usable by the front end. This means resizing images or converting blog posts to HTML or RSS.
The back end being a single entity also allows us to perform additional tasks which are difficult to perform in a normal stateless web application, namely caching recently transformed data and enforcing consistency during updates.
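A minimal sketch of the "cache recently transformed data" part, written in Python for illustration even though this iteration of the back end was meant to be Ruby. The class name `TransformCache`, the `max_items` limit and the least-recently-used eviction policy are all assumptions, not part of the design above.

```python
from collections import OrderedDict


class TransformCache:
    """Hypothetical in-memory cache for transformed media (resized images, rendered posts)."""

    def __init__(self, max_items=128):
        self._items = OrderedDict()
        self._max_items = max_items

    def get(self, key, compute):
        """Return the cached value for key, calling compute() only on a miss."""
        if key in self._items:
            # Mark the entry as recently used.
            self._items.move_to_end(key)
            return self._items[key]
        value = compute()
        self._items[key] = value
        if len(self._items) > self._max_items:
            # Evict the least recently used entry.
            self._items.popitem(last=False)
        return value
```

A key would typically combine the source path and the transformation parameters, for example `("a.jpg", 128)` for a 128-pixel thumbnail.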
The back end will need to be aware of who is currently viewing what, typically by keeping state associated with HTTP sessions. This will allow the back end to notify the front end when the underlying data has changed (assuming the front end can receive callbacks or periodically check for a changed state.)
Now implementation-wise, we'll adopt agile planning with milestones and iterations. The first milestones will take whatever shortcuts are necessary to get most of the essential stuff out of the door as soon as possible, disregarding the long-term goals as needed. This means for example limiting ourselves to one local server and to directories of images and blog posts -- i.e. emulating the current Rig and Izumi, combined together but with a richer web UI.
Initial milestones might happen like this:
Let's see the server part in more detail. Before that, what kind of interaction do we want the user to have? That is, let's build "use cases" and use them to infer the architecture of the server.
First things for the admin & the user:
Notes:
(to be continued)
SNE
rig2serv
rig2rails:
Workflow:
Several notes:
RIG but not RIG... hey why not.
OK here's a very different idea: create a daemon that monitors image folders and generates actual static HTML files and previews on the fly. It should also delete obsolete ones and update only what changed. The generated HTML would contain the selected template style, pagination, etc. Then a web server can serve the static content.
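A minimal sketch of one pass of such a daemon, assuming a flat folder of images and one HTML page per image. The function name `sync_album`, the `IMAGE_EXTENSIONS` set and the default `render` callback are hypothetical; templates, previews and pagination are left out.

```python
from pathlib import Path

# Assumed set of extensions; the real list would come from configuration.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def sync_album(src_dir, out_dir, render=None):
    """Regenerate stale pages and delete obsolete ones; return (written, removed)."""
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    if render is None:
        # Placeholder for the template step.
        render = lambda image: f'<html><body><img src="{image.name}"></body></html>'

    # Map each expected page name to its source image.
    # Note: two images sharing a stem (a.jpg, a.png) would collide in this sketch.
    wanted = {}
    for image in sorted(src_dir.iterdir()):
        if image.suffix.lower() in IMAGE_EXTENSIONS:
            wanted[image.stem + ".html"] = image

    written, removed = [], []
    for page_name, image in wanted.items():
        page = out_dir / page_name
        # Update only what changed: missing page or page older than its image.
        if not page.exists() or page.stat().st_mtime < image.stat().st_mtime:
            page.write_text(render(image), encoding="utf-8")
            written.append(page_name)

    # Delete pages whose source image no longer exists.
    for page in out_dir.glob("*.html"):
        if page.name not in wanted:
            page.unlink()
            removed.append(page.name)
    return written, sorted(removed)
```

The daemon would call this periodically, or from a file-change callback, for each monitored folder.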
The obvious advantage is that you're serving static content. That's also the obvious drawback: it removes features like user-side preferences -- image size, sorting, etc.
You can still have user comments and user rating for example: to update, simply switch to a dynamic site with forms, and post the information to the daemon. The daemon can then update a meta-database and regenerate the appropriate page.
One more advantage is that it makes it trivial to snapshot the site -- simply backup all the generated pages.
How's that different from what I had in mind for RIG 2? Not too much actually. In RIG 2 I already wanted a server that would handle content. Now the thing is that originally I wanted the server to generate the content on the fly following a request for that content and that the server would generate just what is needed by the web front-end to generate the page. Here instead the server generates the page directly in advance and it is served statically.
I'd like to point out that even though a page is static, we can still put some dynamic content in it using some simple AJAX. A typical example would be a MOTD. There are many other dynamic effects we can achieve using JavaScript with no server-side access, such as changing the CSS theme or the layout, or displaying a summary of user comments and expanding them on user demand. All these are "extra" features and auxiliary to the real content of the page.
Basic Architecture
Key elements for a basic architecture:
Details on implementation:
(actually a good idea for localhost debugging or control/stats/monitoring) and access to a lot of plumbing libraries easily.
If we really need the extra performance of C, we can use Python C modules or SWIG for C bindings.
Modules:
specific usage.
Specific modules:
* Generators parse input storage and generate "units" of output, for example an image or full album or one blog entry.
* Entries are timestamped and the default is to sort by timestamp. Generators can override this. Image albums are generally sorted by name.
* Paths: base generated path (output), storage paths (inputs)
* Pictures, Blog/Channels, Latest News, Top 10.
* They handle pagination.
* Izumi2Html
A data pipeline:
* For all mixers:
* For all generators for this route, get entries
* Combine/sort entries
* For all pages:
Generate page html using template
* For all possible subpaths, recurse
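The pipeline above could be sketched roughly as follows for a single mixer and a single path (no recursion into subpaths). The `Entry` class, `run_mixer`, the page template and the "newest first" default order are all assumptions made for the example; generators are modeled as plain callables returning entries.

```python
from dataclasses import dataclass
from html import escape
from string import Template

# Hypothetical page template; real templates would come from the theme.
PAGE_TEMPLATE = Template("<html><body><h1>Page $number</h1><ul>$items</ul></body></html>")


@dataclass
class Entry:
    title: str
    timestamp: float


def newest_first(entries):
    """Default ordering: sort by timestamp (direction is an assumption)."""
    return sorted(entries, key=lambda e: e.timestamp, reverse=True)


def paginate(entries, per_page):
    """Split the entry list into consecutive pages."""
    return [entries[i:i + per_page] for i in range(0, len(entries), per_page)]


def run_mixer(generators, per_page=10, sort_entries=newest_first):
    """Collect entries from every generator, sort them, and render each page."""
    entries = []
    for generator in generators:
        entries.extend(generator())
    entries = sort_entries(entries)

    pages = []
    for number, chunk in enumerate(paginate(entries, per_page), start=1):
        items = "".join(f"<li>{escape(e.title)}</li>" for e in chunk)
        html = PAGE_TEMPLATE.substitute(number=number, items=items)
        pages.append((f"page{number}.html", html))
    return pages
```

A generator that wants albums sorted by name would pass its own `sort_entries`, for example `lambda es: sorted(es, key=lambda e: e.title)`.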
Things we cannot handle:
* Sorting
* Finding and displaying subset
Python libraries of related interest:
* http://pyexif.sourceforge.net
* http://home.cfl.rr.com/genecash/digital_camera/EXIF.py
To be defined:
Storage modules:
Modification times. Callback on file/directory modified. Hides absolute paths.
There are various levels in our implementation, which we should clearly separate even though they all work together in the end:
task is to periodically generate the static content. Being a web server also allows us to use it for web-based configuration, typically on a specific port hidden behind a firewall and not visible from the outside. All content generation is managed via modules.
exactly how's that all going to work.
look like.
The generated content doesn't have to be only limited to pure HTML pages. One
way to factor the content is to use server-side includes (SSI) files (aka
.shtml), which amongst other things allows us to use includes for headers and
footers. Typically the choice to use SSI would belong to the user at the
template level. Server-side includes have a performance hit, are web server
dependent and generally need to be enabled manually in the web server's config
(e.g. Options IncludesNOEXEC in Apache 1.3+).
One interesting idea would be to generate dynamic pages, for example in PHP, instead of pure HTML. We are not going to explore this in this implementation. However, just as for SSI, the end user has the option of using PHP in the templates if that's what works best for the site.
The obvious benefit here is that the end user should be able to generate static content that works for his site. By default we'll provide basic HTML but really some sites may want to generate .shtml or PHP to include headers & footers.
One thing we'll want to explore is to use in-page AJAX to give the end user some control over what is being displayed. One simple idea is to have the ability to view all images from an album or only the top-rated ones, to do sorting and generate pagination on the fly. This would work by having a generic album page configured with a list of images to display and their attributes; JavaScript executed on the client side would then populate a table with the image parts as requested by the user.
In the case of AJAX output, we should generate a static non-AJAX version.
Multi user/multi site: most of the time we expect to have one Rig server on a single server, serving a mix of local and remote files. The server could run as a user task or as a system-wide daemon. Several rig servers can run at the same time by just using different service ports.
In the context of a multi-user server, it would be easier for each user to run their own server.
However, one rig server should be able to handle more than one site. That is the static HTML output may represent more than one site, or variations on a site (public vs private). These will be made available using route mappings.
So let's review the implementation layers:
* Module Manager
* Unit testing facility
* Users & ACLs
* Web server for local configuration
* Cron daemon to execute tasks regularly
* Storage/media access:
* Local filesystem
* Remote SNE filesystem
* Config files
* Parsers:
* Image repositories
* Izumi files
* Generators:
* Photo Blog
* AJAX
* Static
* Latest News
* Top 10
* Templates
* Route/site mapping
* Other modules:
* Ratings
* Description Comments
Workflow:
builds directories and image files entries.
(i.e. the parser module for an izu file will split it in time or hierarchical entries, the storage module takes care of whether the file is local or remote, flat or gzip, etc.)
metadata), and then request access to data. For simple data that can fit in memory, scanning the entry may cache the data directly. However for memory-intensive data (images, very large blogs), we just need enough data for the generator to do its thing and then can get the rest right before outputting the content without temporary caching.
Route example:
route(name = "Public",
      outdir = "/var/www/html/foo",
      modules = { "rig": { "source": "/home/foo/images",
                           "acls": "public, private" },
                  "izu": { "source": "/home/foo/blog" } } )
This is more configuration than originally planned but it's easy enough to manage and we can provide samples to get started.
Top-level modules mentioned here are the generators, which drive the whole process. We might want to add configuration for other kinds of modules (e.g. parser/storage) either globally or per generator/route level.
Namespace:
rig2serv.py -- main, parse opts, run server
base/server.py -- load config, routes, run server, delegates
base/modman.py -- load requested module (once)
base/log.py -- logging facility
module/module.py -- base class for modules (init, term, hup, start, stop)
module/generator/generator.py -- base class for generators
module/generator/photoblog.py -- photo blog, a mix of blog & pix
module/generator/new.py -- latest news/images summary
module/parser/parser.py -- base class for parsers
module/parser/image.py -- image albums
module/parser/izumi.py -- izumi files & blogs
module/parser/rating.py -- user ratings feedback
module/parser/comments.py -- descript.ion and author comment files (not user comments)
module/storage/storage.py -- base class for storage
module/storage/local_fs.py -- local filesystem (pass-thru, .ignore)
module/storage/remote_sne.py -- remote SNE (TBDL)
module/storage/sqlite3.py -- SQLite 3 lib for simple database stuff (ratings)
module/storage/config.py -- config parser dummy
test/test.py -- base class for unit tests
test/(modulename) -- per module unit test
test/data/... -- data for unit tests
Helpers:
base/helper/main_opt.py -- helper for getopt (maybe)
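A minimal sketch of what the module base class and the "load requested module (once)" behaviour of the module manager might look like. The lifecycle method names come from the listing above; everything else (the `ModuleManager` class, the explicit `register` step) is an assumption. A real version would more likely resolve names with `importlib.import_module` instead of a registry.

```python
class Module:
    """Base class for modules; subclasses override the hooks they need."""

    def init(self):
        """Called once, right after the module is loaded."""

    def term(self):
        """Called once, before the server exits."""

    def hup(self):
        """Called when the configuration should be reloaded."""

    def start(self):
        """Called when the module should begin working."""

    def stop(self):
        """Called when the module should pause or stop working."""


class ModuleManager:
    """Hypothetical manager that instantiates each requested module only once."""

    def __init__(self):
        self._registry = {}  # name -> Module subclass
        self._loaded = {}    # name -> Module instance

    def register(self, name, module_class):
        self._registry[name] = module_class

    def load(self, name):
        """Return the single instance for name, creating it on first request."""
        if name not in self._loaded:
            module = self._registry[name]()
            module.init()
            self._loaded[name] = module
        return self._loaded[name]
```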
Missing:
* acls
* prefs
* routes
* conventions, default paths & files (default to sensible choices, overridable)
* SNE
Todo: make extensive use of doc & auto generate html doc. One basic test to automate: all public methods (not underscore-based) should have a doc string.
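The doc string check could be automated along these lines. This is only a sketch: the helper name `missing_docstrings` is made up, and it inspects one class at a time rather than walking the whole package.

```python
import inspect


def missing_docstrings(cls):
    """Return the names of public methods of cls that have no doc string."""
    missing = []
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_"):
            continue  # underscore-based names are not public
        if not (member.__doc__ or "").strip():
            missing.append(name)
    return missing
```

A unit test would then assert that `missing_docstrings(SomeModule)` is empty for every module class.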
Redesigning an architecture has little benefit if not used to improve the user experience.
In this case the desire for redesigning Rig using Python has 2 or 3 goals:
Let's focus on this last one. In one word: Photoblog.
The idea is surely not new. I just have my own vision of it, which may or may not intersect or mirror existing services (I haven't bothered checking what's out there, as this is not a competition, more like a personal research project.)
Anyway, the underlying idea is that there's little difference between a blog and an image album. In both cases, categorized pieces of information are displayed to the user. The most important category is the post date. Another important category is user-defined keywords, also named tags, labels, topics or categories. The obvious purpose is to let the user display information that matches one or more of these categories.
Obviously I don't want to limit the capabilities of the system, meaning I don't want the base framework to impose one kind of display. However the implementation will focus primarily on one type of UI at first and we'll see later if I want to add more.
Blogs generally display a page with the latest N posts. Some may be "frozen" to always display on the first page (sort of important announcements.) A post is generally a text blurb, possibly with images, either HTML or pure text. A calendar view or a date-based menu lets users access older posts. There is generally a synthetic summary view. In essence we have a flat non-threaded view.
Photo albums on the contrary generally focus on a hierarchical view, where photos are grouped by themes in albums and sub-albums. There's often a grid view of albums, then a grid view of contained photos, and then single photos can be seen one at a time within a given album, maybe with a slideshow option or something automated like this.
A photoblog would be a mix between these two types. This is particularly convenient if one uses albums as self-contained entities and most albums are ordered by date rather than topic; in essence an album is equivalent to one blog post.
In this case, a post would contain a description, i.e. what is typically a blog post (descriptive text with a few pictures inlined) and then attached in the same logical group comes the rest of the pictures.
The same structure can be described by having a regular album contain an index text file which references the pictures inside the album. Now this being Rig+Izumi, the index would be in Izu format rather than straight HTML but that's an implementation detail.
Note that a post should not be limited to just text and images. There should be equal support for movies, sound files (both needing to be streamed via an active object) or zip archives.
Note that there's still room for a hierarchical organization and/or a keyword-based approach. In a hierarchy, folders would contain either folders or posts. A top folder with the "latest" posts could be automatically generated. In a keyword-based approach each post is associated with keywords. Rather than rely on a hierarchy, the model is flat and only posts matching search keywords are displayed. Both approaches can be combined, that is we can have a hierarchy and still have keywords to limit display. A regular view would display the hierarchy whereas a search would display the matching posts in a flat view. On the other hand we can use keywords to make a flat view appear hierarchical (i.e. "A/B", etc.).
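To illustrate the last point, here is a small sketch of how slash-separated keywords such as "A/B" could turn a flat list of posts into a tree. The function name `keyword_tree` and the `{"posts": ..., "children": ...}` node layout are assumptions chosen for the example.

```python
def keyword_tree(posts):
    """Build a nested tree from a mapping of post name -> list of "A/B" style keywords."""
    root = {}
    for post, keywords in posts.items():
        for keyword in keywords:
            children = root
            node = None
            # Walk (and create) one tree level per keyword segment.
            for part in (p for p in keyword.split("/") if p):
                node = children.setdefault(part, {"posts": [], "children": {}})
                children = node["children"]
            if node is not None:
                node["posts"].append(post)
    return root
```

For example, a post tagged "travel/japan" ends up under `tree["travel"]["children"]["japan"]["posts"]`, while a post tagged only "travel" is listed one level up.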
Either way, we need a way to present an overview/summary of a page. For example a "latest" page or a search result could display a thumbnail (if available, or a media icon) and a text excerpt.
How do we make the site dynamic when it relies on static generated pages? We don't. We can rely on AJAX, where a page knows all the items it could display and then only displays a selection of them; however this approach makes it difficult to create bookmarks (for example for a pagination system or a keyword approach). If we don't have AJAX, we can simply generate all the interesting combinations, possibly just creating a subset of them.
3 main "use cases" or "really whatever I want to do with this":
To replace RIG:
information and keywords. Image names must/can be used to generate keywords.
* View all of a given album or just best ones (so 2 sets of parallel html files.) Album and images are mixed, sorted by filenames date or filenames index.
* Search-by-keyword option that is partly AJAX based: we just need to retrieve a list of images and keywords from the server and make a flat view of everything that matches.
* There would be no option for changing the sort criteria.
* When displaying albums (one in hierarchical or several in search mode), display all related index files underneath or on top (as per template layout).
* Clicking on a single image displays just that image (static HTML) with N*prev/next images and associated descript.ion underneath.
To replace Izumi:
Meta info is mostly the entry date and name.
To replace Wordpress:
* Use web album view for existing pictures, as usual.
* Need to change image URL into new URL.
* One dir per entry with hand-made date
* Izu for writing the index with self-referenced pictures
* All associated pictures
* Display index then link to full images at bottom
* Image selection: best (default), normal, all (AJAX on the fly)
* Image filter based on file-name rule
* Automated footer/headers for izu post
Namespace:
Routes:
Rig 2 is about creating a framework that allows me to build the kind of application I want. Part of the logic is embedded in the actual generators and parsers, but most of the application-dependent design is done via templates and generated pages.
Phase 1:
Phase 2:
The real "rig 2" photoblog design is the "new wordpress" replacement, that is a per-directory structure that contains text associated with many images.
For a blog, we also need to:
One "easy" way to do that would be to generate each entry as an HTML <div> block and then make indexes that reference one or more of these existing blocks. To reference these blocks, we would need either server-side HTML (aka .shtml) or PHP using includes. PHP sounds like a better choice, being more ubiquitous and providing more power for extensions. If we avoid keep the security should not be too much of an issue as the content will be static.
For this structure we can derive the izu view, which is exactly the same thing without the images and an album view which is the same without the text.
* Generator: pb (Photoblog)
* Source parser: pb_dir + izu + image
* Template: pb
The first time the photoblog generator works, it should create all individual entries, keep references around and then generate the various indexes as needed, i.e. static pages with everything, or separated by pagination, or by date or category.
Next time the photoblog generator works, it could avoid regenerating the entries that have not changed. We can check an entry has not changed by keeping the original item's date around or by keeping some kind of checksum. If each entry is stored in a directory, it's generally enough to keep the directory's modification date around. However to make this more generic we'll simply have each parser be able to return a "signature" (md5 or whatever) as a string and compare them. For the indexes, it may be easier to simply regenerate them completely every time.
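A sketch of the signature idea for a directory-based entry. Hashing the file names, sizes and modification times (rather than the file contents) is a shortcut chosen for the example, and the names `directory_signature` and `needs_regeneration` are hypothetical.

```python
import hashlib
from pathlib import Path


def directory_signature(entry_dir):
    """Return an md5 string summarizing the files of an entry directory."""
    entry_dir = Path(entry_dir)
    digest = hashlib.md5()
    for path in sorted(entry_dir.rglob("*")):
        if path.is_file():
            stat = path.stat()
            relative = path.relative_to(entry_dir).as_posix()
            digest.update(f"{relative}|{stat.st_size}|{stat.st_mtime_ns}\n".encode("utf-8"))
    return digest.hexdigest()


def needs_regeneration(entry_dir, known_signatures):
    """Compare with the stored signature, then remember the new one."""
    key = str(entry_dir)
    signature = directory_signature(entry_dir)
    changed = known_signatures.get(key) != signature
    known_signatures[key] = signature
    return changed
```

The generator only sees opaque strings, so another parser is free to return a checksum of a single file or just a date.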
We can store the signatures either in a separate database, or in the generated file as some kind of comment or simply in the filename of the generated item. The second option may be the easiest -- simply reserve some specific comment in the template for the signature to be stored and read it back. If the file doesn't even exist, we know it must be created in the first place.
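For the second option, storing and reading back the signature could look like this. The comment marker `rig-signature` is invented for the example, and here the comment is simply prepended to the page instead of being placed by the template.

```python
import re

# Hypothetical marker; the real one would be reserved in the template.
SIGNATURE_RE = re.compile(r"<!-- rig-signature: ([0-9a-f]+) -->")


def embed_signature(html, signature):
    """Prefix the generated HTML with a comment carrying the signature."""
    return f"<!-- rig-signature: {signature} -->\n{html}"


def read_signature(path):
    """Return the signature stored in a generated file, or None if absent."""
    try:
        with open(path, encoding="utf-8") as generated:
            match = SIGNATURE_RE.search(generated.read())
    except FileNotFoundError:
        return None  # the file must be created in the first place
    return match.group(1) if match else None


def is_up_to_date(path, signature):
    """True when the generated file exists and carries the same signature."""
    return read_signature(path) == signature
```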
In the first version of the photoblog, we'll want a flat view of all entries. However on disk they may be hierarchical. This can be used later to automatically generate categories or similar.
Workflow:
* Sets up default generator & parsers, templates
* Starts generator
* Get top-level listing from photoblog directory parser
* Returns a list of ParserInfo items (stateless parser reference
and file info -- item hasn't been parsed yet)
* Each entry is a directory that contains a "photoblog entry", i.e.
izu text and images (either one can be missing). Empty entries are ignored.
* For each entry, generate them if missing or obsolete.
* Keep references of entries (new or existing).
* Generate indexes. First simply do the "all", then we'll see about dated or
categories.
* Reads a top directory recursively for directories with certain pattern name.
* Pattern: "YYYY-MM-DD[ _]Entry Name". Allow usual variations on date.
* Extension: Look for a specific file with options (categories, exceptions, etc.)
* Directories that are actually empty are ignored. Other than that, content is not parsed.
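A possible regular expression for the directory name pattern above. Which date "variations" are accepted is a guess: this sketch allows `-`, `_`, `.` or nothing between the date parts. The names `ENTRY_RE` and `parse_entry_dirname` are hypothetical.

```python
import re
from datetime import date

# YYYY-MM-DD, YYYY_MM_DD, YYYY.MM.DD or YYYYMMDD, then a space or underscore, then the name.
ENTRY_RE = re.compile(r"^(\d{4})[-_.]?(\d{2})[-_.]?(\d{2})[ _](.+)$")


def parse_entry_dirname(name):
    """Return (date, entry_name) for a matching directory name, else None."""
    match = ENTRY_RE.match(name)
    if match is None:
        return None
    year, month, day, entry_name = match.groups()
    try:
        return date(int(year), int(month), int(day)), entry_name
    except ValueError:
        return None  # looks like a date but is not a valid one
```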
Pegasus is Photoblog's new code name.
I need more details on the internal structure of Pegasus. So far I have the Pegasus generator and a PegasusDir parser. What does the parser really create and how do I use it?
First, do we want a flat view or a hierarchical view? That is, should the directory parser automatically expand directories and create an internal flat list, or should I keep a list of either directories or files? In the former case, does it make sense to load the sub-directories directly or do some kind of lazy/late expansion?
If we were generating pages on the fly on a web server, the ideal structure would be to minimize reading to respect the locality of what is generated. Here it's a bit different since we want to generate everything up-front. However if we avoid loading everything at once, the memory footprint could potentially be smaller.
All this is a moot point. What matters here is how we want to use this in the final application. Ideally I'd like the page layout to somehow relate to the disk structure:
a paginated view of entries (about one per day at most.)
keyword or date, and the result is paginated.
The default pegasus display could be a mix of those: by default it would reflect the disk structure, with options to get a flat view or filter on keywords or date, in which case the result would be flat. Either way, the end display will be paginated.
So this answers one question: by default the directory parser should keep the structure intact and it's the generator that makes things flat as needed.
As for up-front or lazy/late recursion, I'd say the mix is to parse the directories up-front but generate meta-data where data is loaded late.
In this case I'd say the output of the directory parser should look like this:
contained in that directory. The directory name and the relative path to the root can be present.
filename. The parser can decorate the parser info directory with the actual content when needed.
One extension I had in mind was to decorate directories with "options" files. For example add a ".ignore" file to make sure a directory is skipped. This can be integrated in the parser later. To handle many extensions, a simple local config file could be used instead of using singular per-extension empty files.
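A sketch of the ".ignore" convention using `os.walk`. Whether the marker also hides sub-directories is not specified above; this example assumes it does. The function name `walk_entries` is made up.

```python
import os


def walk_entries(root, ignore_name=".ignore"):
    """Return relative paths of directories under root, skipping ignored ones."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ignore_name in filenames:
            dirnames[:] = []  # prune: do not descend into sub-directories
            continue
        dirnames.sort()       # make the traversal order deterministic
        found.append(os.path.relpath(dirpath, root))
    return found
```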
The options I can think of right now are:
otherwise simply use unix-like symlinks.
Some of these options may be more adequately targeted at the generator. Some of these should be the same you'd expect to find in the top-level or per-site configuration file, but localized with the source media.
One structural dilemma with Python: to deal with a structure such as the "parser info", one choice is to use a straight dictionary and the other choice is to create a class. The dictionary is very lightweight to declare, whereas for the class I'd want a constructor with parameters, assign parameters to internal values, then getters and setters as needed, so it's a lot of boilerplate code for nothing. OTOH there are a lot of possible errors at runtime with a simple dictionary since all keys are strings, whereas with a class the member names are declared in one place, so typos are easier to catch.
There are three alternate choices.
First, I could have some kind of meta-language and a mini-compiler that generates the Python boilerplate code for a structure (think of an IDL compiler). This implies one more tool to generate the structure and two extra files.
Second, I could just use dictionaries but hardcode the key as constants in a module. I get the benefit of a dictionary -- lightweight syntax and dynamic properties -- with checked names.
Third, I could have a mix of these -- that is have a class instance with a getitem overload that calls get attribute so it could behave like a dictionary.
Finally instead I'll start with a different approach: derive from the base dictionary class and assert that keys used in getitem/setitem are part of the known set of keys.
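A minimal sketch of that last approach. The key names are placeholders, and the check raises an explicit `KeyError` rather than using an `assert` statement so that it still runs under `python -O`; that substitution is mine.

```python
class ParserInfo(dict):
    """Dictionary that only accepts a known set of keys (hypothetical key names)."""

    KNOWN_KEYS = frozenset({"path", "name", "date", "parser"})

    def __init__(self, **values):
        super().__init__()
        for key, value in values.items():
            self[key] = value  # goes through the checked __setitem__

    def _check(self, key):
        if key not in self.KNOWN_KEYS:
            raise KeyError(f"unknown ParserInfo key: {key!r}")

    def __getitem__(self, key):
        self._check(key)
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        self._check(key)
        super().__setitem__(key, value)
```

Note that `dict.update()` and `dict.get()` do not go through these overrides, so a stricter version would wrap them too.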
...

This work is licensed by Raphaël Moll under a Creative Commons License.