Izumi Blog: Ralf - Concepts

Concepts lists several conceptual ideas for dev projects.
This page uses reverse-date ordering.
Site License And Disclaimer as well as contact information are available here.
$Id: Concepts.izu,v 1.14 2007-04-15 18:14:31 ralf Exp $

«»  2007/04/14 «» MPLPA  «»

MPLPA stands for Massively Parallel Low-Power Array. The idea is to have tons of machines running some kind of VM/interpreter at the lowest priority possible, all connected together by low-latency RPC. They'd get cycles only when the machines are idle. By tons, I mean hundreds of thousands of cells. The whole would form a grid, or more exactly be aggregated in clusters, meaning that the actual latency between two cells would vary with their distance and also fluctuate with the load of the hosts.

The question then is how can you use this? Overall that's still a lot of available power, but you wouldn't want any single cell to run at full load. For the exercise, we assume each cell may only use a fraction of its host's processing power, to keep a low profile and not disturb existing processes.

Moreover, any cell can go down at any time. There's no guarantee that its state can be restored, nor that it is lost for good: the instance may be dead, or it may come back online later.

Sounds familiar? These are more or less the specs of massively distributed clients such as the RC5 cracking contest and all the clones that followed, such as SETI@home and the DNA mapping project. What these projects have in common is that they are inherently well suited to massive parallelization, with the extra benefit that it's OK if any particular cell dies (its job will get redistributed after a while) or comes back online (we can detect that the job has been finished in the meantime).
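The "redistributed after a while" behaviour can be sketched as a lease-based job pool: each job handed to a cell carries a deadline, and jobs whose lease expires go back into the pool. This is a minimal sketch assuming a single coordinator (unlike the master-less storage discussed next); `JobPool`, `lease_seconds` and the job ids are hypothetical names. A late result from a cell that came back online is accepted only if nobody has finished the job yet.

```python
import time
from collections import deque


class JobPool:
    """Hands out jobs under time-limited leases (hypothetical sketch)."""

    def __init__(self, jobs, lease_seconds=60.0, clock=time.monotonic):
        self.pending = deque(jobs)      # jobs nobody is working on
        self.leased = {}                # job -> lease expiry time
        self.done = {}                  # job -> result
        self.lease_seconds = lease_seconds
        self.clock = clock

    def _reclaim_expired(self):
        # Cells that died (or went silent) lose their jobs once the lease expires.
        now = self.clock()
        for job, expiry in list(self.leased.items()):
            if expiry <= now:
                del self.leased[job]
                self.pending.append(job)

    def acquire(self):
        """Return the next job to run, or None if nothing is available."""
        self._reclaim_expired()
        if not self.pending:
            return None
        job = self.pending.popleft()
        self.leased[job] = self.clock() + self.lease_seconds
        return job

    def complete(self, job, result):
        """Record a result; duplicates from resurrected cells are ignored."""
        if job in self.done:
            return False
        self.done[job] = result
        self.leased.pop(job, None)
        try:
            self.pending.remove(job)    # it may have been re-queued after expiring
        except ValueError:
            pass
        return True
```

With an injected clock, a job whose cell never reports back reappears after its lease: acquire `"a"`, advance the clock past 10 seconds, acquire `"b"`, and the next `acquire()` returns `"a"` again.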

The question is, can you run something else on such an architecture, for example any kind of generic program? Let's assume we have an interpreted language and each job is a VM/interpreter. We could have each cell run a tiny portion of the overall program. On top of that, it would be nice if each cell did not use the local disk for storage (only memory) and if we didn't have a central "master". This means the whole cell network would be responsible for actual storage, with lots of replication.
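One way to spread storage over the cells without a central master is consistent hashing: every cell can compute on its own which cells hold the replicas of a given block. This technique is my choice for illustration, not something the post specifies; `Ring`, the replica count and the cell names are hypothetical. A minimal sketch:

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Map any string to a position on a 160-bit ring.
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class Ring:
    """Consistent-hash ring (sketch): any cell can build it locally, no master."""

    def __init__(self, cells, vnodes=16):
        # Each cell gets several virtual points to even out the load.
        self._points = sorted(
            (_hash(f"{cell}#{i}"), cell) for cell in cells for i in range(vnodes)
        )
        self._keys = [p for p, _ in self._points]

    def replicas(self, block_id: str, count: int = 3):
        """Return up to `count` distinct cells responsible for `block_id`."""
        distinct = len({cell for _, cell in self._points})
        count = min(count, distinct)
        i = bisect.bisect(self._keys, _hash(block_id))
        chosen = []
        while len(chosen) < count:
            # Walk clockwise, skipping cells we already picked.
            cell = self._points[i % len(self._points)][1]
            if cell not in chosen:
                chosen.append(cell)
            i += 1
        return chosen
```

The design point is that losing a cell only moves the blocks that cell held; every other block keeps its replica set, which suits a network where cells vanish and reappear at any time.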



«»  2007/03/30 «» EyeOfRa  «»

EyeOfRa: Now the main question is whether we could replace BitTorrent. Probably not. YouTube was hot because people could swap stuff for free while the big guys wanted it paid for. Keeping YouTube hot whilst legal is a challenge, imho. BitTorrent has the same issue. It's a perfectly valid protocol that works well; it could be improved on, but it's hard to replace the existing sharing networks. At least not with something backed up here.

Anyhow, the idea would be to attach to trackers, ideally hiding behind a fast net proxy and keeping the lowest profile possible. That may not work since the proxy will probably only download HTTP/FTP content. The other issue is that the protocol rewards peer activity, so unless you can masquerade as a perpetually interesting yet choked peer, you'd probably be tossed away fairly quickly, even by trackers. To keep a low profile, if we could rely on the proxy, we could also target non-US IPs as a mitigation.



«»  2007/03/30 «» SlowMotion  «»

Or perpetual backup.

Data retrieval: a chunk-based protocol such as BT would be ideal, maybe on a non-standard port. Or we could use something like SNE3 for retrieval via http connections, maybe https.

Ideally, transferred data would already be encrypted. The data wouldn't compress well to start with, so encryption can't hurt there. We could use something like Helix, which does both MAC and encryption, but iirc that would mean the receiver needs the key to verify the MAC, which is out of the question. So instead the remote client should deal with the encryption (or lack thereof) and the receiver just deals with the MAC. OK, scrap that: use Helix as opaque data and then simply checksum the transfer.

Storage: either BT or simply G files with sequential RIO records. We mostly need two modes: append and update. We'll rarely have deletes. To keep things simple, records could be chunks of a given known size. So basically the storage is a fixed linear list of chunks (file id, chunk index, flags, data), and the chunk id is the index in the master file. Deleting a file is done by marking its chunks as empty, for example by nullifying the file id. File ids are determined by the remote client (e.g. SHA-1 or a fingerprint of the filename; taking care of file paths and moves is the client's implementation detail).

The local server can keep a list of known chunks cached in memory, or scan the storage file for them.

To update, it would be better to first write a new file and then obsolete the old one. The old chunks can be recycled.
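The storage layout above can be sketched as a master file of fixed-size records, each holding (file id, chunk index, flags, data), where a record's position is its chunk id. This minimal sketch assumes a 20-byte SHA-1 file id, a 4 KiB chunk size and an all-zero file id as the "empty" marker; those values, `ChunkStore` and its methods are illustrative, and any seekable binary file (here an `io.BytesIO`) can stand in for the master file.

```python
import hashlib
import io
import struct

CHUNK_SIZE = 4096                     # assumed fixed payload size per record
HEADER = struct.Struct(">20sIIH")     # file id, chunk index, flags, payload length
RECORD_SIZE = HEADER.size + CHUNK_SIZE
EMPTY_ID = bytes(20)                  # a nullified file id marks a free record


def file_id_for(name: str) -> bytes:
    """One option from the post: the SHA-1 of the client-side file name."""
    return hashlib.sha1(name.encode("utf-8")).digest()


class ChunkStore:
    """Hypothetical fixed-record chunk store (sketch)."""

    def __init__(self, f):
        self.f = f                    # the master file (any seekable binary file)
        self.free = []                # chunk ids that can be recycled
        self.index = {}               # (file id, chunk index) -> chunk id
        self._scan()

    def _scan(self):
        # Rebuild the in-memory list of known chunks by walking the master file.
        count = self.f.seek(0, io.SEEK_END) // RECORD_SIZE
        for chunk_id in range(count):
            self.f.seek(chunk_id * RECORD_SIZE)
            file_id, index, _flags, _length = HEADER.unpack(self.f.read(HEADER.size))
            if file_id == EMPTY_ID:
                self.free.append(chunk_id)
            else:
                self.index[(file_id, index)] = chunk_id

    def _write(self, chunk_id, file_id, index, data, flags=0):
        self.f.seek(chunk_id * RECORD_SIZE)
        self.f.write(HEADER.pack(file_id, index, flags, len(data)))
        self.f.write(data.ljust(CHUNK_SIZE, b"\0"))

    def put_file(self, file_id, payload):
        """Store a file, reusing free records before growing the master file."""
        pieces = [payload[i:i + CHUNK_SIZE] for i in range(0, len(payload), CHUNK_SIZE)]
        for index, data in enumerate(pieces or [b""]):
            if self.free:
                chunk_id = self.free.pop()
            else:
                chunk_id = self.f.seek(0, io.SEEK_END) // RECORD_SIZE
            self._write(chunk_id, file_id, index, data)
            self.index[(file_id, index)] = chunk_id

    def get_file(self, file_id):
        parts, index = [], 0
        while (file_id, index) in self.index:
            self.f.seek(self.index[(file_id, index)] * RECORD_SIZE)
            *_, length = HEADER.unpack(self.f.read(HEADER.size))
            parts.append(self.f.read(length))
            index += 1
        return b"".join(parts)

    def delete_file(self, file_id):
        """Mark every chunk of the file as empty by nullifying its file id."""
        for key in [k for k in self.index if k[0] == file_id]:
            chunk_id = self.index.pop(key)
            self._write(chunk_id, EMPTY_ID, 0, b"")
            self.free.append(chunk_id)

    def replace_file(self, old_id, new_id, payload):
        """Update: write the new version first, then obsolete the old one.
        Assumes the client gives the new version a distinct file id."""
        self.put_file(new_id, payload)
        self.delete_file(old_id)
```

Because every record has the same size, a chunk id maps directly to a file offset, and the whole index can be rebuilt by a single scan when the server restarts.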

Transfer: If we use an HTTP connection we can have:

Note that GET requests also need to fetch orders (commands), typically to implement DHD-over-HTTP.



«»  2006/11/24 «» DHD  «»

Simple version (not using Nodes or SNE):

We can combine the relay server and the remote log collection server.

TCP flow:

Remote <--connect-- [ DHD  ] --connect--> [ Relay  ] <--connect-- Local
Server <---data---> [remote] <---data---> [firewall] <---data---> Client
           app                 tunnel                    app

The channel between DHD and Relay can be a single connection and yet multiplex many IP tunnels. Its payload needs to distinguish control actions from tunnel data, and that's where SNE could be used.
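A minimal sketch of such a payload: each frame carries a type (control action or tunnel data), a tunnel id and a length, so one DHD to Relay connection can carry many tunnels. The layout, the type codes and the `encode`/`Decoder` names are hypothetical, not SNE's actual wire format.

```python
import struct

FRAME = struct.Struct(">BHI")   # hypothetical header: type, tunnel id, payload length
ACTION, DATA = 1, 2             # hypothetical frame types


def encode(kind: int, tunnel: int, payload: bytes) -> bytes:
    return FRAME.pack(kind, tunnel, len(payload)) + payload


class Decoder:
    """Reassembles frames from arbitrarily chunked reads (e.g. TCP recv)."""

    def __init__(self):
        self.buffer = bytearray()

    def feed(self, data: bytes):
        self.buffer += data
        frames = []
        while len(self.buffer) >= FRAME.size:
            kind, tunnel, length = FRAME.unpack_from(self.buffer)
            end = FRAME.size + length
            if len(self.buffer) < end:
                break                   # wait for the rest of this frame
            frames.append((kind, tunnel, bytes(self.buffer[FRAME.size:end])))
            del self.buffer[:end]
        return frames
```

The relay would then route `DATA` frames to the tunnel matching their id and interpret `ACTION` frames (open, close, etc.) itself. The length prefix matters because TCP delivers a byte stream, not messages.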



«»  2006/08/28  «»

SNE2

 * Bind
 * Connect
 * Services (add/get/... list)
 * OnConnected > Connection
 * Open (end point ip, service)
 * OnReceive (event delegate)
 * Send
 * SendBegin / SendEnd
 * EndPointInfo
 * OnStateChanged (connected, disconnected, trying to connect)

Design issues for SNE2:

 * UDP means I have to deal with congestion, out-of-order packets, etc. myself.
 * TCP handles all that. It breaks if the network goes out (surprise!)
   This can be easily handled by trying to reconnect in the client.
   The SNE instance should have a flag indicating whether it should abort
   or retry, and maybe how many times/how long.
 * We should notify the client when disconnections/retries happen so that
   it has a chance to update some UI and to cancel retries.
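The retry behaviour described above could look like the following: a wrapper that reconnects on failure, gives up after a configurable number of attempts (or immediately if the abort flag is set), and reports every state change so the UI can update or cancel. This is a minimal sketch; `ReconnectingLink`, the state names and the injected `connect` callable are assumptions for illustration, not the SNE2 API.

```python
import time
from typing import Callable, Optional

# State names mirror the OnStateChanged list above; the constants are hypothetical.
CONNECTED = "connected"
DISCONNECTED = "disconnected"
TRYING = "trying to connect"


class ReconnectingLink:
    def __init__(self, connect: Callable[[], object], max_retries: int = 5,
                 delay: float = 1.0, abort_on_failure: bool = False,
                 on_state_changed: Optional[Callable[[str], None]] = None,
                 sleep: Callable[[float], None] = time.sleep):
        self.connect = connect          # e.g. lambda: socket.create_connection((host, port))
        self.max_retries = max_retries
        self.delay = delay
        self.abort_on_failure = abort_on_failure
        self.on_state_changed = on_state_changed or (lambda state: None)
        self.sleep = sleep
        self.cancelled = False          # the UI can set this to stop retrying
        self.conn = None

    def open(self) -> bool:
        attempts = 1 if self.abort_on_failure else self.max_retries + 1
        for attempt in range(attempts):
            if self.cancelled:
                break
            self.on_state_changed(TRYING)
            try:
                self.conn = self.connect()
            except OSError:
                if attempt + 1 < attempts:
                    self.sleep(self.delay)
                continue
            self.on_state_changed(CONNECTED)
            return True
        self.on_state_changed(DISCONNECTED)
        return False

    def connection_lost(self) -> bool:
        """Call when a read/write fails; tries to re-establish the link."""
        self.conn = None
        self.on_state_changed(DISCONNECTED)
        return self.open()
```

Injecting `connect` and `sleep` keeps the policy testable without a network; a real client would pass a function that opens the TCP socket.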

Nodes:

 * Register (app "service" name)
 * UniqueId
 * OnConnect
 * Interconnect Path



«»  2006/08/22 «» Octopa  «»

I should really name this project Octopus, but Octopa has a nice twist to it.

Anyway, this is a fake project which doesn't exist yet. It's merely a concept so it fits well here in the concepts page. I'll move it to a separate page if anything comes out of it.

A while ago I realized I should unify Xeres, Rivet and Rig: have servers that can serve content (Xeres), desktop viewers (Rivet) and web-based viewers (Rig). They communicate with each other using Nodes, which is built on top of SNE2.

So in essence I'm taking several unrelated projects and unifying them. Octopa is the umbrella project.

That was on 2005/10/23 and not much has happened since. Recently I started rethinking Rig in a totally new direction: a static content generator. I'm basically removing Rig from the Octopa project and setting it aside. Maybe later I'll reintroduce a web-based viewer; right now it doesn't really matter.

Anyway, I feel like I need to revive Octopa and actually implement something. That's because I want to use it, or more exactly a subpart of it, so I'll focus on that part first and then see if the rest follows.

I have two use cases in mind, which seem totally unrelated:

The only thing they have in common is the keyword "remote". I should add that both will run in trusted environments. Security should be built in, but not to the detriment of features.

The first question one should ask is "why?". Because I can. All of this exists in some form or another, but I want to play with these ideas and see if I can build a framework around them.

First I need to rethink some of the basics.

SNE2 remains a very low-level point-to-point exchange protocol. Clients connect to servers which provide "services", and the flow is mostly asynchronous and event-based. However, instead of being strictly connection-based, it should also allow for service interruptions and reconnections. We probably want to masquerade as HTTP to be able to go through HTTP proxies. "Services" is a misnomer and should be renamed to channels. Users of the library are expected to use only one channel and only occasionally use secondary out-of-band channels.

Nodes is an application-level library that offers a peer-to-peer network exchange protocol. The main purpose of Nodes is to allow automatic discovery of application instances. Application instances exchange data packets using a concept of channels similar to the one in SNE2. This allows for roaming, i.e. a connection may be interrupted and re-established later without breaking the link between the application instances. Consequently, applications should be built with intermittent connections in mind.
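Roaming implies that the application-level session must outlive any single transport connection. One common way to get there (my assumption; the post does not say how Nodes would do it) is to number packets, keep the unacknowledged ones, and replay them after a reconnection. `Session` and its methods are hypothetical; the transport is any object with a `send(message)` method.

```python
from collections import OrderedDict


class Session:
    """Keeps a channel alive across transport reconnects (hypothetical sketch)."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = OrderedDict()    # seq -> payload, oldest first
        self.transport = None           # current connection, None while offline
        self.expected = 0               # next sequence number we expect to receive
        self.received = []

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload     # kept until the peer acknowledges it
        if self.transport is not None:
            self.transport.send((seq, payload))
        return seq

    def on_ack(self, seq):
        # The peer has everything up to and including seq.
        for s in [s for s in self.unacked if s <= seq]:
            del self.unacked[s]

    def attach(self, transport):
        """Called on (re)connection: replay everything the peer hasn't acked."""
        self.transport = transport
        for seq, payload in self.unacked.items():
            transport.send((seq, payload))

    def detach(self):
        self.transport = None

    def on_message(self, seq, payload):
        """Deliver in order, dropping duplicates caused by replays; returns the ack."""
        if seq == self.expected:
            self.received.append(payload)
            self.expected += 1
        return self.expected - 1
```

Because replays can deliver a packet twice, the receiver only accepts the next expected sequence number; over a single TCP connection ordering is preserved, so duplicates are the only irregularity to handle.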

(continued 20060830)

Octopa itself is an application that builds on top of nodes. Octopa can be best described as a simple framework that provides common functionality for specialized service modules.

Use case: remote desktop viewer. In this case the remote computer runs Octopa with the "server" part of the module. Viewer clients can connect to this server and receive a stream of images. Clients can provide feedback, that is mouse manipulation (actual implementation is a negligible detail left to the reader :-p.) Typically there would be only one viewer client, however we could conceive of a version with many clients viewing the same server.
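For the remote desktop case, one common way to keep the image stream small (an assumption on my part; the post doesn't pick an encoding) is to split each frame into tiles and only send the tiles that changed since the previous frame. A minimal sketch on frames represented as lists of pixel rows; `changed_tiles` and the tile size are illustrative.

```python
from typing import List, Tuple

Frame = List[List[int]]          # rows of pixel values, all rows the same length


def changed_tiles(prev: Frame, curr: Frame, tile: int = 16) -> List[Tuple[int, int]]:
    """Return the (x, y) origins of tiles whose pixels differ between two frames."""
    height, width = len(curr), len(curr[0])
    dirty = []
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            rows = range(y, min(y + tile, height))
            if any(prev[r][x:x + tile] != curr[r][x:x + tile] for r in rows):
                dirty.append((x, y))
    return dirty
```

The server would then encode and send just those tiles, and an idle desktop costs almost nothing to stream.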

Use case: remote webcam viewer. This is similar to the previous case, except the video source is not a copy of the screen but a webcam device. We could imagine clients being able to interact with the server by requesting the webcam to move (if it is motorized) or to zoom/pan the image before sending it.

These two use cases are very similar in nature, which is why it would be interesting to share most of their functionality. Octopa is meant to be exactly that kind of client: one that lets modules of a similar nature use the node network. Octopa is not an ultra-generic node client; there is no such thing.
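The shared part could be captured by a common frame-source interface that both modules implement, with the framework doing the dispatching. A sketch under that assumption; `FrameSource`, `ScreenSource`, `WebcamSource`, the feedback event shapes and the `Octopa` class are all hypothetical, and the capture calls are stubs.

```python
from abc import ABC, abstractmethod


class FrameSource(ABC):
    """What an Octopa video module might provide (hypothetical interface)."""

    name = "source"

    @abstractmethod
    def grab(self) -> bytes:
        """Return the next frame as encoded image bytes."""

    def feedback(self, event: dict) -> bool:
        """Handle viewer feedback; return False if the event isn't supported."""
        return False


class ScreenSource(FrameSource):
    name = "desktop"

    def grab(self):
        return b"<screen capture>"      # stub: a real module would capture the screen

    def feedback(self, event):
        return event.get("type") in ("mouse_move", "mouse_click")


class WebcamSource(FrameSource):
    name = "webcam"

    def grab(self):
        return b"<webcam frame>"        # stub: a real module would read the device

    def feedback(self, event):
        return event.get("type") in ("pan", "zoom")


class Octopa:
    """The common part (sketch): registers modules and dispatches viewer requests."""

    def __init__(self):
        self.modules = {}

    def register(self, module: FrameSource):
        self.modules[module.name] = module

    def next_frame(self, name: str) -> bytes:
        return self.modules[name].grab()

    def send_feedback(self, name: str, event: dict) -> bool:
        return self.modules[name].feedback(event)
```

Streaming, node discovery and reconnection would live in the common part once, while each module only knows how to grab a frame and react to feedback.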



«»  2006/02/01 «» First-class data  «»

Spreadsheets are programs that mix data and tiny code snippets (i.e. expressions). Moreover, the cells of a column typically share the same expression, so we can see a column as a class and its cells as instances of that class.
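The column/class analogy can be made concrete: a column holds one expression (the "class"), and each cell is that expression evaluated against one row (an "instance"). A toy sketch; `Column` and `evaluate` are hypothetical names.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


class Column:
    """One expression shared by every cell of the column (toy sketch)."""

    def __init__(self, name: str, expr: Callable[[Row], float]):
        self.name = name
        self.expr = expr

    def cell(self, row: Row) -> float:
        # A cell is the column's expression applied to one row's data.
        return self.expr(row)


def evaluate(rows: List[Row], columns: List[Column]) -> List[Row]:
    """Compute derived columns left to right, so later ones can use earlier ones."""
    out = []
    for row in rows:
        row = dict(row)                 # don't mutate the input data
        for col in columns:
            row[col.name] = col.cell(row)
        out.append(row)
    return out
```

For example, a `total` column defined as `price * qty` yields one cell per row, just as a spreadsheet formula filled down a column does.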

However, the real strength of spreadsheets is that data comes first. That's what people see; the expressions are hidden behind the data. This is the reverse of conventional programming languages, where one sees the code, the abstraction, and not the real data. You can see the real data only indirectly, by manipulating inputs and examining outputs, which is sometimes difficult. Even more difficult is examining the intermediate steps of the data transformation.

Visual programming does not solve this issue. Whether algorithms are designed by arranging logic shapes or by stacking action verbs into workflows, we still have the same problem: this describes the abstraction of what we want to do, not what will actually happen to real data.

Obviously the problem here is getting meaningful data in and out. While it's easy for a spreadsheet, data can be a lot more abstract for other domains.

Luckily for us, there's one set of data that is very meaningful when one is writing code, and that's none other than unit test data.

So the bottom line is that tools for writing code should merge unit tests right into the code-writing process. This is obviously not easy to achieve when code is edited in a classical text editor. It is, however, a good extension for visual editors. The code doesn't have to be expressed as flow charts; it can still be full text, with the ability to directly associate input and output data.
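Python's `doctest` module is an existing, text-based approximation of this idea: example inputs and expected outputs sit right next to the code and are executed as tests. It is not the visual tool described here, only a small illustration of code and data living together; `normalize` is a made-up example function.

```python
import doctest


def normalize(scores):
    """Scale a list of numbers so they sum to 1.

    >>> normalize([1, 1, 2])
    [0.25, 0.25, 0.5]
    >>> normalize([5])
    [1.0]
    >>> normalize([0, 0])
    Traceback (most recent call last):
        ...
    ValueError: total must be non-zero
    """
    total = sum(scores)
    if total == 0:
        raise ValueError("total must be non-zero")
    return [s / total for s in scores]


if __name__ == "__main__":
    # Runs every example found in this module's docstrings and reports mismatches.
    doctest.testmod()
```

The examples double as documentation: a reader sees real inputs and outputs before reading the implementation.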

The problem here is coming up with a graphical metaphor that allows data to be associated with inputs and outputs, and defining criteria for meaningful comparison. Think beyond integers and strings: how do you test an algorithm that works on images or a network stream?

The visual programming tool should provide ways to generate mock data sets on the fly with little or no coding required.
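A small taste of mock data generated on the fly for the image case: build random images from a seed and check properties that must hold for any input, instead of hand-writing fixtures. This is a plain-Python sketch of property-style testing, not a specific tool; `random_image`, `invert` and `check_invert` are hypothetical examples.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale pixels, 0..255


def random_image(width: int, height: int, seed: int) -> Image:
    rng = random.Random(seed)           # seeded so a failing case can be replayed
    return [[rng.randrange(256) for _ in range(width)] for _ in range(height)]


def invert(img: Image) -> Image:
    return [[255 - p for p in row] for row in img]


def check_invert(trials: int = 50) -> None:
    for seed in range(trials):
        rng = random.Random(seed)
        img = random_image(rng.randint(1, 32), rng.randint(1, 32), seed)
        out = invert(img)
        # Properties: same shape, values in range, and inverting twice is a no-op.
        assert len(out) == len(img) and all(len(a) == len(b) for a, b in zip(out, img))
        assert all(0 <= p <= 255 for row in out for p in row)
        assert invert(out) == img, f"invert is not an involution for seed {seed}"
```

Seeding every generated image means a failure can be reported as a seed and reproduced exactly, which is what makes throwaway mock data usable in practice.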



«»  2005/09/21 «» FastDev  «»

Buried somewhere in Ideas For Dev is the observation that development is too tedious. Be it C#, Java or, even worse, C/C++, there's too much work involved in expressing one idea and turning it into an application.

Environments like .Net, Java and Python (and even PHP in some ways) help by providing full-featured frameworks, a stable foundation to build on. Still, when programming, a lot of thought has to go into how to organize and how to design (which is good), yet much of it has to do with dealing with the language, discovering the API and, most importantly, actually writing the code and formatting it appropriately.

So in a way, one solution for RAD is to provide a rigid canvas. Things like VB6, which assist the user by providing a UI editor and then focus on adding code to UI elements (rather than the other way around), are pretty good. "Professional" programmers disdain this as too simple, yet it has proven to work. OTOH, since these environments are UI based, they clearly limit their domain of application, sort of. You can make almost anything with VB6, at the expense of speed, but that's not the issue.

So is this the only solution? UI-centric RADs? Probably not. Another approach is to consider an environment such as Squeak or Self: live introspection of live objects and, in the case of Squeak, the ability to do some of the programming by clicking around to explore objects and select them. Note that in this mode the program is alive.

One of the issues in this case is discovery: it should be fairly easy to get an overview of the system (either running or static). So in a way the system is first and foremost a live database of objects that interact with each other. It just so happens that people are good at manipulating databases and inventories. The key is that most of us need to see it visually. IDEs for serious languages tend to go this way, such as Eclipse or NetBeans displaying lists of files with lists of methods, yet with hundreds of files in a project and hundreds of methods lying around, it quickly becomes overwhelming.
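Python's standard `inspect` module gives a small taste of this "live database of objects" view: it can list the classes of a loaded module and their methods at runtime. A sketch; the `overview` helper and the throwaway `demo` module with its classes are hypothetical.

```python
import inspect
import types


def overview(module: types.ModuleType) -> dict:
    """Map each class in `module` to its own public method names."""
    result = {}
    for name, cls in inspect.getmembers(module, inspect.isclass):
        methods = [
            m for m, member in vars(cls).items()   # own members only, not inherited
            if inspect.isfunction(member) and not m.startswith("_")
        ]
        result[name] = sorted(methods)
    return result


# A throwaway module built at runtime, standing in for part of a real project.
demo = types.ModuleType("demo")
exec(
    "class Account:\n"
    "    def deposit(self, amount): ...\n"
    "    def withdraw(self, amount): ...\n"
    "class Ledger:\n"
    "    def record(self, entry): ...\n"
    "    def _rebuild(self): ...\n",
    demo.__dict__,
)

for cls_name, methods in overview(demo).items():
    print(cls_name, methods)
```

This is only the raw material: the hard part the post points at is presenting such data for a project with hundreds of classes without overwhelming the reader.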

So that's the part that none of this addresses, in my opinion. A VB6 application with hundreds of windows, each with a myriad of functions attached to UI elements, would be rather hard to maintain, not to mention that a newcomer would have no way to get an overview of the system.

An application with a thousand C++ files is not any easier to understand. Some files may form an underlying kit encapsulating simple concepts (like lists, containers, etc.), other files contain just auxiliary methods, a core set of files makes up the heart of the application, and another huge bunch of files implements the UI. Separating all these into modules helps somewhat, but not much.

The reality is that no amount of magic is going to make a complex system simple. An application with thousands of files is by definition a complex system with myriads of possibly intricate relationships between objects. Any attempt to make a complex system look simple is doomed; such an attempt generally just results in a rigid framework whose only purpose is to limit the complexity of the relationships that can be created. For a limited domain of application this is beneficial: understanding the rigid framework means understanding any application that can be written with it. This is exactly what VB6 or any database-related RAD does.

In the context of a generic framework, it seems more appropriate to help the process of discovery, for example by providing an overview of classes and objects, live introspection for debugging, and editing tools that reduce the amount of code one has to write. This last part includes completion and refactoring (which are now common) and should also include class skeletons for known design patterns.

Current IDEs could and should help with design patterns in a couple of ways. First, wizards could generate the initial classes that form a known design pattern and then help add classes to the pattern. Second, they could show an abstract overview of a project, with patterns shown as modules or black boxes rather than as their individual classes.
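A toy version of such a wizard: given a few names, emit the skeleton classes of an Observer pattern. The template and the `observer_skeleton` function are hypothetical; a real IDE would insert the result into the project and keep track of the pattern as one unit for the overview.

```python
from string import Template

# Hypothetical wizard template for the Observer pattern.
OBSERVER = Template('''\
class ${subject}:
    """Subject: keeps a list of observers and notifies them."""

    def __init__(self):
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def detach(self, observer):
        self._observers.remove(observer)

    def notify(self, event):
        for observer in list(self._observers):
            observer.${callback}(event)


class ${observer}:
    """Observer: override ${callback} to react to events."""

    def ${callback}(self, event):
        raise NotImplementedError
''')


def observer_skeleton(subject: str, observer: str, callback: str = "update") -> str:
    """Generate the source for a Subject/Observer pair with the given names."""
    return OBSERVER.substitute(subject=subject, observer=observer, callback=callback)


print(observer_skeleton("Inventory", "StockWatcher", callback="on_change"))
```

The generated classes are plain code the user then edits; what the wizard adds is knowing which classes belong together.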



«»  2005/07/06 «» PhotoBlog  «»

Let's describe a prototype web interface for managing both photo albums and blogs. The web UI should be CSS and AJAX driven.

From a user perspective, I want to:

Image albums don't have to be that different from blog posts. In fact I'd argue that an image is just a regular blog post without text.
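If an image is just a post without text, albums and blogs can share a single data model. A minimal sketch of that idea; the `Post` and `Collection` classes and their fields are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Post:
    """One entry: a blog post, or an image (a post with no text). Hypothetical model."""
    title: str
    posted: date
    text: str = ""
    images: List[str] = field(default_factory=list)   # image URLs or paths

    @property
    def is_image(self) -> bool:
        return not self.text and len(self.images) == 1


@dataclass
class Collection:
    """A blog or a photo album: both are just lists of posts."""
    name: str
    posts: List[Post] = field(default_factory=list)

    def newest_first(self) -> List[Post]:
        # Reverse-date ordering, as blogs usually use; albums get it for free.
        return sorted(self.posts, key=lambda p: p.posted, reverse=True)
```

With one model, the same listing, paging and permalink code can serve both albums and blogs.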





Site License

Creative Commons License
This work is licensed by Raphaël Moll under a Creative Commons License.




