Saturday, April 04, 2009
Sunday, June 15, 2008
Distributed OSGi — tilting at windmills
In a recent eWeek article, though, we see there is a movement afoot to do precisely that:
Distributed OSGi Effort Progresses
This excerpt makes it quite clear what this initiative is all about:
The Distributed OSGi effort also seeks to enable "a service running in one OSGi framework to invoke a service running in another, potentially remote, OSGi framework (meaning a framework in a JVM)," Newcomer wrote. As the current OSGi standard only defines how services talk to each other only within a single JVM, "extensions are needed to allow services to talk with each other across multiple JVMs — thus the requirements for distributed OSGi on which the design is based," he said.
I see such an effort as yet another attempt to reinvent what has already been tried several times before (distributed object computing) by the likes of DCOM, CORBA, and even JEE/EJB (à la Session beans and RMI). What do these technologies all have in common? They have fallen into disuse after befuddling and frustrating many a programmer (I personally fell on the DCOM sword back in the mid-90s).
Interface Calling Behavior Semantics
I have a by-line that I sometimes sign my blog postings with:
Let me get right to it by lobbing some grenades: I recognize two arch evils in the software universe – synchronous RPC and remoted interfaces of distributed objects.
The sentiment being expressed here is that it is wrong-headed to try to make synchronous method invocation transparent over the network. Yet that is the grand illusion that all these distributed object solutions strive to accomplish.
The problem is that an object interface that may be perfectly usable in a local JVM context — where call chain invocation can be sub-millisecond — will not have the same behavior semantics when invoked over a network connection. The method invocation may take 15 to 50 milliseconds on a good day; or may fail due to low-level transport errors (which never existed when it was being invoked in a local JVM context); or just time out without completing the call; or even never return/complete at all in any acknowledged fashion.
The consuming software code that used a method in a local JVM context has to now be designed to anticipate a wide range of different situations, as the calling contract of the method is radically different depending on the context in which it is being invoked. The advocates of distributed object computing, however, want us to believe in a grand illusion: that modules that were written and proved out for use in a local JVM context can be shifted to be consumed in a distributed context — and where consuming code, presumably, doesn't have to be any the wiser (or at least not very much wiser).
Of course, some may recall JEE/EJB Session beans where methods invoked via RMI had the potential to raise exceptions related to i/o transport errors. The upshot is that one had to design software from the outset to be a consumer of a "distributed" object interface vs. just a consumer of its local interface. Also, it was not long before EJB developers discovered that an object interface that made sense for a local JVM calling context would become a performance liability in a distributed computing context.
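To make the contract difference concrete, here is a minimal Java sketch of the same lookup expressed twice: once as a plain local interface and once in the RMI style, where every method has to declare `java.rmi.RemoteException`. The names `PriceLookup`, `RemotePriceLookup`, and `describePrice` are made up for illustration, and the "remote" implementations are just lambdas standing in for a real stub.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

class RemoteContractSketch {

    // Local contract: callers assume the call completes, and quickly.
    interface PriceLookup {
        double priceOf(String sku);
    }

    // Remote contract in the RMI style: transport failure is part of the signature.
    interface RemotePriceLookup extends Remote {
        double priceOf(String sku) throws RemoteException;
    }

    // Hypothetical consumer: it must decide up front what a transport failure means.
    static String describePrice(RemotePriceLookup lookup, String sku) {
        try {
            return sku + " costs " + lookup.priceOf(sku);
        } catch (RemoteException e) {
            return sku + " price unavailable (" + e.getMessage() + ")";
        }
    }

    public static void main(String[] args) {
        // Lambdas stand in for a real remote stub (an assumption of this sketch).
        RemotePriceLookup healthy = sku -> 9.5;
        RemotePriceLookup broken = sku -> {
            throw new RemoteException("connection refused");
        };
        System.out.println(describePrice(healthy, "A-1"));
        System.out.println(describePrice(broken, "A-1"));
    }
}
```

Code written against `PriceLookup` has no catch block and no fallback path, which is exactly why it cannot simply be pointed at the remote version later.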
Using the interface as designed gave rise to chatty round trips over the network, where, due to the latency and unreliability, the software became visibly sluggish. It is most dismaying to see all the enterprise software systems that have sluggish and problematic user interface behavior due to the application being written on a foundation of synchronous use of distributed object interfaces. That distributed object Kool-Aid that folks drank proved to be spiked heavily with toxic radiator fluid.
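The cost of chattiness is simple arithmetic. The sketch below uses the 15 to 50 millisecond per-call range mentioned earlier; the figure of 20 fine-grained calls to populate one screen is an assumption picked purely for illustration.

```java
class RoundTripCostSketch {

    // Total network wait when each call is a separate synchronous round trip.
    static long totalLatencyMillis(int calls, long perCallMillis) {
        return calls * perCallMillis;
    }

    public static void main(String[] args) {
        int fineGrainedCalls = 20;   // assumed: 20 individual getter calls to fill one screen
        int coarseGrainedCalls = 1;  // one call that returns all the data at once
        for (long latency : new long[] {15, 50}) {
            System.out.println(latency + " ms per call: chatty = "
                    + totalLatencyMillis(fineGrainedCalls, latency) + " ms, coarse = "
                    + totalLatencyMillis(coarseGrainedCalls, latency) + " ms");
        }
    }
}
```

Under those assumptions the chatty interface spends between 300 ms and a full second just waiting on the wire, before any failure or retry is considered.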
In essence, EJB developers found out that an object-oriented approach to software design could not be transparently shifted to a distributed computing context. The OOP software systems that tried to make that leap devolved into a quagmire of issues that had to be battled.
Interface Version Management
On the basis of this one failing alone, distributed object computing has been one of the most colossal architectural mistakes of the last 15 years in the IT industry. Yet the failings don't stop there: the other equally perplexing obstacle to this undertaking is object interface version management.
I tend to think that the versioning dilemma is perhaps even more insidious than the synchronous distributed method call semantics problem. One encounters the issues of call semantics fairly early on; the interface versioning dilemma, however, arises gradually over time, then mounts up and becomes one of the greatest headaches that one battles in trying to keep deployed distributed software systems coherent.
One of the popular agile OOP developer practices of recent years is frequent re-factoring of code. Indeed, all of the popular IDEs in use are adept in assisting the developer with re-factoring. Re-factoring may well be a good thing in a context where the development team gets to control all the deployment pieces that are impacted by such change. However, in a distributed computing context, which is usually heterogeneous, re-factoring would just be asking for misery.
Making changes to how distributed software systems interact, where multiple development teams and/or companies are involved, is a process akin to wisdom tooth extraction (the difficult kind where the dentist has to work for a few hours to break the tooth apart and bring it out in pieces). The simplest of changes can be tedious to negotiate, politics of some variety often intrudes, and it is often challenging to schedule releases to synchronize well with one another so that deployment can occur.
As such, the notion of versioning of distributed object interfaces has been proffered as the means for coping with this. One team can come out with a new and improved interface to an existing object and deploy it unilaterally. Other parties that devise software that consumes the interface of said object, can catch up to the new interface as they are able. In the meantime the older interface remains in place so that existing deployed software keeps working.
On paper, versioning looks like a workable approach, and the significant distributed object solutions have all made provision for versioning interfaces. In practice it can even be done a few times. However, for large, complex enterprise software systems, maintaining interface versions gets to be burdensome. One of the reasons is that by their very nature object interfaces are very brittle. The information state that is exchanged tends to be very explicitly coupled to the interface signature. It can be hard to significantly (or meaningfully) evolve the software implementation of an object without affecting the object's interface. Once that happens, the interface has to be versioned, and a sense of dread then sets in.
OOP Does Not Work Well In Distributed Context
As to distributed object computing, quagmire is the operative word here. Quite simply OOP does not really work in a distributed computing context — there are too many difficult issues that entangle for it to be worth the while.
It is fascinating to see new generations of software engineers getting lured into re-inventing distributed object computing over and over and over again. And a lot of the same computer industry corporate players get right behind these endeavors every time. These kinds of systems become complex and grandiose, and thus they seem to be excellent sticky fly traps for luring in developers. Think of the legions of developers (and their companies) that have floundered on DCOM, CORBA, JEE/EJB, WS-* (death star), and now let's add Distributed OSGi. Distributed object computing is our industry's Don Quixote tilting at windmills endeavor.
Asynchronous Messaging and Loose-Coupling Message Format Techniques
So what is the alternative?
When it comes to distributed computing interactions, try following these guidelines:
- Design the software from the outset around asynchronous interactions. (Retrofitting synchronous software designs to a distributed context is a doomed undertaking that will yield pathetic/problematic results.)
- Prefer messaging to RPC or RMI style interface invocation.
- Attempt to use messaging formats that are intrinsically non-brittle. If designed with forethought, messaging formats can later be enhanced without impacting existing deployed software systems. (The entire matter of versioning interfaces can be dodged.)
- Build in robust handling (and even auto-recovery) of transport related failure situations.
- Never let the user interface become unresponsive due to transport sluggishness or failure situations. A user interface needs to remain responsive even when over-the-wire operations are coughing and puking. (So distributed interaction I/O always needs to be done asynchronously to the application's GUI thread.)
- Keep transport error handling orthogonal to the semantics of messaging interactions. (Don't handle transport error and recovery at every place in the application where a messaging interaction is being done. Centralize that transport error handling code to one place and do it very well just one time.)
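A rough Java sketch of a few of these guidelines taken together: the send happens off the caller's thread, and transport failures are retried in exactly one place so that calling code only ever deals with the reply. `Transport` and `Messenger` are hypothetical names invented for this sketch, the flaky transport is simulated, and a real system would add back-off, logging, and a proper message broker.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class AsyncMessagingSketch {

    // Hypothetical transport: delivers a message and returns the reply, or fails with an I/O error.
    interface Transport {
        String deliver(String message) throws IOException;
    }

    // The one place that knows about transport failures and how to recover from them.
    static final class Messenger {
        private final Transport transport;
        private final ExecutorService ioPool;
        private final int maxAttempts;

        Messenger(Transport transport, ExecutorService ioPool, int maxAttempts) {
            if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
            this.transport = transport;
            this.ioPool = ioPool;
            this.maxAttempts = maxAttempts;
        }

        // Runs on the I/O pool, never on the caller's thread; retries transport errors.
        CompletableFuture<String> send(String message) {
            return CompletableFuture.supplyAsync(() -> {
                IOException last = null;
                for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                    try {
                        return transport.deliver(message);
                    } catch (IOException e) {
                        last = e; // a real system would log and back off here
                    }
                }
                throw new UncheckedIOException("gave up after " + maxAttempts + " attempts", last);
            }, ioPool);
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated flaky transport: the first two deliveries fail, the third succeeds.
        Transport flaky = msg -> {
            if (calls.incrementAndGet() < 3) throw new IOException("connection reset");
            return "ack:" + msg;
        };
        ExecutorService ioPool = Executors.newSingleThreadExecutor();
        try {
            Messenger messenger = new Messenger(flaky, ioPool, 5);
            // Business logic only sees the reply; there is no transport handling at the call site.
            String reply = messenger.send("order-created").thenApply(String::toUpperCase).join();
            System.out.println(reply);
        } finally {
            ioPool.shutdown();
        }
    }
}
```

The `join()` in `main` is only there so the demo can print something; GUI code would attach a continuation instead of blocking.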
AJAX and Flex Remoting
A messaging approach is a sound basis for designing distributed application architecture — especially when one does not control all the end-points. More recently I have been designing architecture for Flex-based web RIA applications. In these apps, the client uses Flex asynchronous remote method invocation to access services on the server. Adobe's BlazeDS is embedded in the server to facilitate: remoting calls, marshaling objects between ActionScript3 and Java, message push to the client, and bridging to a JMS message broker.
You may think that I'm not exactly following my own advice. However, there are special circumstances at play:
- Flex I/O classes support asynchronous invocation, so the operation does not block the main GUI thread of the app.
- Flex I/O classes invoke closures to process return results; also, a fault closure can be supplied to handle transport related errors. Consequently a programmer can write one very robust fault handling closure and reuse it in all I/O operations. Thus Flex does an excellent job of segregating business logic processing from transport-related error handling.
- Flex client .swf files are bundled together with their Java services into the same .war deployment archive. Consequently, the client-tier and the server-tier are always delivered together and thus will not drift out of version compliance.
If the service call interfaces are refactored, then the client can be refactored at the same time. Typically this is even being done within the same IDE (such as Eclipse with the Flex Builder plugin). The Flex code and the Java code each have refactoring support. Flex unit tests could then be used within the development context to verify call interface validity.
Google GWT applications have a similar characteristic, in that asynchronous method invocation is supported for invoking services on the server tier. Client tier Java code and services Java code are developed jointly and can be packaged into a single deployment unit.
AJAX web applications may be another case where the client tier and the server tier are often deployed together.
So the takeaways from this discussion are:
- If you can't control both end-points, stick with messaging and loosely coupled message format design. Be very mindful of the versioning dilemma from the outset and plan with forethought. The best outcome is to be able to evolve message formats without breaking end-points that have not yet been upgraded. Try very hard to dodge the burden of version management of installed end-points.
- If you can deliver both end-points from a common deployment unit, then method invocation of remote object interfaces can be okay. However, stick with the technologies that support asynchronous I/O. Separation of the concerns of business logic processing on return results from transport fault handling is the ideal.
Building Effective Enterprise Distributed Software Systems
XML data "duck typing"
Flex Async I/O vs Java and C# Explicit Threading
Sunday, May 18, 2008
Explicit Threading for Async Behaviors (Java/C#)
Java and C# .NET developers are used to doing asynchronous operations on threads they explicitly create and manage, and then using the utility mechanisms that Swing and .NET each provide to marshal any data to the main GUI thread.
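For readers who have not written this style of code, here is a minimal Java sketch of the explicit-threading pattern. To keep it runnable without a GUI, a single-thread executor named `UI_THREAD` stands in for the GUI event thread; in a Swing app that hand-off would be done with `SwingUtilities.invokeLater`. The `fetchGreeting` method and its 50 ms delay are invented for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

class ExplicitThreadingSketch {

    // Stand-in for the GUI event thread (an assumption of this sketch).
    static final ExecutorService UI_THREAD = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "ui-thread");
        t.setDaemon(true); // do not keep the JVM alive in this sketch
        return t;
    });

    // Simulated slow I/O; a real app would call a server here.
    static String fetchGreeting() throws InterruptedException {
        Thread.sleep(50);
        return "hello from the server";
    }

    // Starts a worker thread explicitly, then marshals the result back to the UI thread.
    static void loadGreeting(Consumer<String> updateUi) {
        Thread worker = new Thread(() -> {
            String text;
            try {
                text = fetchGreeting();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                text = "interrupted";
            }
            final String result = text;
            UI_THREAD.execute(() -> updateUi.accept(result));
        }, "io-worker");
        worker.start();
    }

    // Runs the whole round trip and reports which thread applied the UI update.
    static String runDemo() {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<String> shown = new AtomicReference<>("nothing shown");
        loadGreeting(text -> {
            shown.set(Thread.currentThread().getName() + " shows: " + text);
            done.countDown();
        });
        try {
            done.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return shown.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Note how much of the code is thread plumbing rather than application logic; that is the overhead this post is comparing against.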
Flex Async I/O
With ActionScript3/MXML based Flex apps, asynchronous operations revolve around doing I/O service calls or messaging interactions with the server-side.
So all the various I/O classes of the Flex SDK can operate in an asynchronous manner. For instance, the HTTPService class has a send() method. Invoke send() and it will execute asynchronously to the calling thread (which for Flex, will always be the main GUI thread of the app).
ActionScript3 Closures coupled with Flex Async I/O
To handle the result (or any error condition), you can use the closure feature of the ActionScript3 language to supply a closure block that will process any result that is later asynchronously returned. The Flex architecture ensures that your closure code that is invoked to process the result will actually execute in the context of the main GUI thread. Hence that code can safely interact with the state of Flex GUI objects.
A closure can be supplied to handle a successful business logic result or a business logic failure condition that is perhaps returned to the client, and a separate closure can be supplied to handle faults. Faults would occur because of, say, transport-related failures, such as not being able to connect to the destination server, I/O timeout, underlying socket errors, blah, blah, blah.
Closures are a very natural and intuitive means to program asynchronous class APIs.
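The result-closure plus fault-closure shape can be approximated in Java with lambdas. This is only an analogy, not Flex's API: the `send` helper below is invented for this sketch, and unlike Flex it invokes the handlers on a pool thread rather than on a GUI thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

class ResultFaultCallbackSketch {

    // Runs the call off the caller's thread, then routes the outcome to exactly one handler.
    static <T> CompletableFuture<Void> send(Callable<T> call,
                                            Consumer<T> onResult,
                                            Consumer<Exception> onFault) {
        return CompletableFuture.runAsync(() -> {
            T value;
            try {
                value = call.call();
            } catch (Exception e) {
                onFault.accept(e);   // transport-level problem
                return;
            }
            onResult.accept(value);  // business result
        });
    }

    static List<String> runDemo() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        // One fault handler, written once and reused for every call.
        Consumer<Exception> sharedFaultHandler = e -> log.add("fault: " + e.getMessage());

        // A call that succeeds: only the result handler runs.
        send(() -> "42 items", r -> log.add("result: " + r), sharedFaultHandler).join();

        // A call that fails in transport: only the shared fault handler runs.
        ResultFaultCallbackSketch.<String>send(
                () -> { throw new IOException("I/O timeout"); },
                r -> log.add("result: " + r),
                sharedFaultHandler).join();
        return log;
    }

    public static void main(String[] args) {
        runDemo().forEach(System.out::println);
    }
}
```

The `join()` calls exist only to make the demo's output order deterministic; the point is that the result handler contains no transport code at all.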
Flex approach is a simpler programming model
The net result is that the Flex architecture presents a simpler single-threaded programming model to the developer, while asynchronous I/O programming, coupled with the language feature of closures, provides a means to create apps that are just as fluid and responsive to the user as are Java Swing or C# .NET WinForms apps. The user can continue to interact with the UI of the app, a progress indicator can be presented, and a cancel button that actually works can be presented to the user.
So some of the same things can be accomplished in a Flex app but with a better and easier programming model than the explicit thread programming of Java or C# .NET.
Client-side MVC Pattern ala Flex
Just to put a fine point on this subject matter of Flex asynchronous style of programming, what a real Flex application will typically do is implement some variation of the MVC pattern (on the Flex client tier - not the server-side tier).
So the closure that processes the result of some asynchronous I/O service call operation (or a message pushed from the server-side via the Comet Pattern - ala BlazeDS), will typically use the result data to go and modify the state of an object that resides in the client. That object would be some aspect of the data state that constitutes the Model from the MVC pattern.
Properties, Events, Declarative Data Binding
Views onto the Model can utilize the AS3/MXML language features of properties, events, and declarative data-binding to couple interactions with the Model.
A Model object can expose its state as properties. When any I/O closure code modifies the state of Model objects, events on their properties will fire. Views can bind to these Model object properties such that Views will update themselves whenever the underlying Model objects change due to said asynchronous I/O activity.
Views can use a Controller to deal with feeding view changes (which are usually due to user interactions) into the underlying Model objects. Data validation will likely tend to reside in the View code. So View/Controller interaction will concern itself more with feeding user-driven changes in View state (that have been validated) back into underlying Model objects.
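Flex gets this wiring from language-level properties, events, and declarative binding. As a rough illustration of the same flow in plain Java, the sketch below hand-rolls a Model with one observable property and binds two Views to it; `OrderModel`, `StatusLabel`, and `bindStatus` are hypothetical names, and real Flex binding involves none of this boilerplate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ModelBindingSketch {

    // Model object: exposes state as a property and fires an event when it changes.
    static final class OrderModel {
        private final List<Consumer<String>> statusListeners = new ArrayList<>();
        private String status = "unknown";

        String getStatus() {
            return status;
        }

        void setStatus(String newStatus) {
            status = newStatus;
            for (Consumer<String> listener : statusListeners) {
                listener.accept(newStatus);
            }
        }

        // "Binding": the listener gets the current value now and every later change.
        void bindStatus(Consumer<String> listener) {
            statusListeners.add(listener);
            listener.accept(status);
        }
    }

    // View: only renders what it is told; it knows nothing about I/O.
    static final class StatusLabel {
        String text = "";

        void render(String status) {
            text = "Order status: " + status;
        }
    }

    static String[] runDemo() {
        OrderModel model = new OrderModel();
        StatusLabel header = new StatusLabel();
        StatusLabel sidebar = new StatusLabel();
        model.bindStatus(header::render);
        model.bindStatus(sidebar::render);

        // This is what an I/O result handler would do when data arrives.
        model.setStatus("shipped");
        return new String[] {header.text, sidebar.text};
    }

    public static void main(String[] args) {
        for (String line : runDemo()) {
            System.out.println(line);
        }
    }
}
```

The I/O handler touches only the Model, and both Views update without knowing where the change came from.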
Multiple Views onto single Model
One upshot of the MVC pattern is that it now becomes straightforward to implement multiple Views onto the same underlying Model objects. Also, View code gets completely decoupled from I/O plumbing code, as that becomes partitioned into a separate aspect of the client app architecture.
Additional UI Considerations
Views may still need to be designed to say, disable a submit button after it's been pressed (so the user can't click it by accident or intent multiple times). Events could be used to signal to the View when the submit button can be safely enabled again. Likewise, there is the View programming involved for such things as showing progress indicators and/or cancel buttons when asynchronous I/O operations are actively being processed.
Sunday, May 11, 2008
Is Flex RIA suitable for content-heavy web sites?
Often it's believed that a web RIA approach is perhaps not so optimal for content-heavy web sites (the blogging and social web sites, etc).
However, there are two aspects of Flex that could be leveraged to integrate content and yield a more dynamic UI.
- First is that a Flex app can be built like a portal. A Flex form can dynamically load other Flex forms in a nested manner. The sub forms loaded can be determined by logic that executes at runtime. The loadable modules can be built as totally standalone components. (Via OSGi, it is even possible to bundle these Flex form modules with their own respective Java service layer beans.) They can bind into the hosting portal environment using Flex properties, events, and declarative data binding features. So the coupling layer can be well abstracted so as to not be overly brittle.
- Thirdly, Adobe AIR has the WebKit HTML engine built in, so an AIR app can also readily make use of HTML content in an integrated fashion.
- Fourthly, there are already mechanisms that can be hosted in Apache httpd and IIS that will compile Flex forms from source code on the fly. So some aspects of even the Flex RIA could be generated dynamically at runtime. Currently these mechanisms are intended for development phase purposes. The compilation is not considered fast enough for production sites. However, in time this will likely improve.
(Sorry for going a couple of bullet points over.)
In the meantime, my preferred approach is Flex form modularization and Flex control iFrame for embedding HTML content.
Friday, January 25, 2008
An Introduction to OSGi on the Server Side
No-Fluff-Just-Stuff founder, Jay Zimmerman, likes to refer to what he calls the WOO factor - window of opportunity - when talking about the dynamics of our software development industry.
For me, Adobe Flex is a case in point of the WOO factor that he talks about. It's a technology that was ready and in position at the right time to become a dominant RIA solution. In contrast Java was caught with nothing to show in the RIA space. It missed the WOO.
Right now, I am engaged in a server-side project that will be built on OSGi. OSGi is here today, has a proven track record (Eclipse), and solves the problems of project size scalability that we need to grapple with over the product lifetime.
Once again, just as EJB3 missed the WOO and has been eclipsed by Spring Framework, whatever standard Sun ultimately backs via the JSR process will miss the WOO.
Why am I writing this as though I'm lamenting a sad development? I guess that's a good question. There's really nothing to be downcast about. Great software solutions on the Java platform exist outside of JSR land. Indeed, maybe the very best software used in Java circles has been the non JSR stuff.
In coming to the Java community I've never had a problem sidestepping the nonsense being peddled and instead adopting what made sense. With JEE I completely ignored session and entity beans and just used JMS MDBs and Hibernate. So in 2003/2004/2005 I built a distributed software system on principles that the SOA dudes are now all crowing about.
Building Effective Enterprise Distributed Software Systems
data-driven communication vs behavior-driven communication
More recently I've completely ignored MVC web frameworks on the server-side and instead gone with RIA + SOA, where MVC is completely done on the client-side. (Doing MVC on the server-side in defiance of the Fallacies of Distributed Computing was wrong-headed from the get go.)
Web frameworks peaking toward obsolescence
So here we stand today poised to fish or cut bait on this matter of large scale SOA-based development on the server-side - and we badly need a modularity solution for Java.
This is the moment of the WOO and here stands OSGi.
Saturday, September 22, 2007
Dynamic proxy vs byte code enhancement
ActiveObjects is interesting but could be improved by being implemented using byte code enhancement tools instead of Java dynamic proxies. Ruby, et al, are fine with taking the hit of dynamic binding overhead (people expect such overhead with a scripting approach, as it's a conscious trade-off with respect to the benefit of rapid development). Java is a compiled language, though, where performance is emphasized (and prized) for the purpose of building industrial strength solutions for enterprise class applications.
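For anyone who has not used them, here is roughly what the dynamic proxy approach looks like, using the standard `java.lang.reflect.Proxy` API. This is a generic sketch and not ActiveObjects' actual implementation: the `Person` interface is hypothetical and the backing `Map` stands in for a database row.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

class DynamicProxySketch {

    // Hypothetical entity interface, in the style of an interface-first ORM.
    interface Person {
        String getName();
        void setName(String name);
    }

    // Creates an implementation at runtime; every call is dispatched reflectively.
    static Person newPerson() {
        Map<String, Object> columns = new HashMap<>(); // stands in for a database row
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            if (name.startsWith("set") && args != null && args.length == 1) {
                columns.put(name.substring(3), args[0]);
                return null;
            }
            if (name.startsWith("get") && args == null) { // args is null for no-arg methods
                return columns.get(name.substring(3));
            }
            throw new UnsupportedOperationException(name);
        };
        return (Person) Proxy.newProxyInstance(
                Person.class.getClassLoader(), new Class<?>[] {Person.class}, handler);
    }

    public static void main(String[] args) {
        Person p = newPerson();
        p.setName("Ada");
        System.out.println(p.getName());
    }
}
```

Every getter and setter goes through the `InvocationHandler` with string matching on the method name, which is the per-call overhead being objected to here; byte code enhancement would generate direct field access instead.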
The author's phobia regarding byte code enhancement only serves to undercut his credibility. Byte code enhancement is well proven as many systems have been in wide use for several years now. In years of working with such tools, libraries, frameworks, languages (AspectJ), etc, byte code enhancement has never once been the cause of any issue or point of grief. Others have like experience or else the forums would be full of complaints.
Convention over Configuration
Convention over configuration is a utopian concept given that in practice enterprise development has to give sway to schema that is legacy and/or dictated by DBAs (and for good reason as relational database schemas need to be designed to perform well - not accommodate idealized object-oriented design).
In the end, for ActiveObjects to be realistically useful, there will need to be a way to use things like annotations in order to map the object perspective to the underlying database perspective. Or else the ActiveObjects entity interfaces could always be strictly generated by a tool that takes the database schema as its input (but even then there will be things that need to be customized via mapping overrides, etc.).
Creating Java interfaces for entities as the starting point only works out for sample apps, the simple Ruby-esque contacts CRUD app, or the occasional (but highly infrequent) green-field situation. Yet even there, a serious app of some complexity can get in trouble if the database schema is directly based on idealized object-oriented design. There are a lot of special things one does in the database to make operations perform well that would never have come from generating off of the object-oriented design of entities. Plus a pure, idealized object-oriented design could lead to pathological constructs in the database - from a relational database perspective.
When in Rome, do as the Romans. When living in the RDBMS, do as the RDBMS does things and life will be more pleasant.
Lazy-loading seems like a fine idea from a naive (inexperienced) Java programmer's perspective, but in practice one has to work to minimize discrete trips to the database - or else one will get thrashed on performance. Trying to always default to a simple heuristic, such as lazy-loading, is another concept that falls down in practice.
Business Logic in Entity Interfaces
Placing business logic methods into entities is pretty much a bad idea.
Often times it is desirable to convey entities over the wire (90% to 98% of the time) - that is, between the middle-tier and client-tier of 3-tier or n-tier distributed applications.
As such, entities need to be kept stripped down to pure data only, ala DTO (data transfer objects).
One of the very worst sins that one can commit is to pollute a distributed application with the anti-pattern of "distributed object interfaces". Attempting object-oriented programming that transparently spans across the network is discomfiting as the network is neither reliable nor deterministic.
Sticking with a DTO approach to entities and then creating services that operate on or with DTOs avoids such a morass.
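As a small illustration of that split, here is a data-only entity alongside a service that holds the behaviour. `AccountDto` and `AccountService` are made-up names for this sketch, not anything from ActiveObjects.

```java
import java.io.Serializable;

class DtoServiceSketch {

    // Pure data: safe to serialize and send between tiers. No behaviour, no remote references.
    static final class AccountDto implements Serializable {
        private static final long serialVersionUID = 1L;
        final String accountId;
        final long balanceCents;

        AccountDto(String accountId, long balanceCents) {
            this.accountId = accountId;
            this.balanceCents = balanceCents;
        }
    }

    // Business logic lives in a service that takes DTOs in and hands DTOs back.
    static final class AccountService {
        AccountDto deposit(AccountDto account, long amountCents) {
            if (amountCents <= 0) {
                throw new IllegalArgumentException("deposit must be positive");
            }
            return new AccountDto(account.accountId, account.balanceCents + amountCents);
        }
    }

    public static void main(String[] args) {
        AccountService service = new AccountService();
        AccountDto before = new AccountDto("A-1", 1000);
        AccountDto after = service.deposit(before, 250);
        System.out.println(after.accountId + " balance: " + after.balanceCents);
    }
}
```

Because the DTO carries no methods that reach back to a server, a client holding one can never trigger a hidden network call by accident.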
By tossing objects with behaviour and going strictly data-only, there are then some great enterprise messaging patterns that become possible. Those patterns will more than make up for the abandonment of OOP over the network.
The problem with OOP and the network is that OOP as we understand it today is built on synchronous assumptions of behaviour interactions. Synchronous operations and the network mix like oil and water - bad combination. The second (or is it the first?) big problem with OOP over the network is tight coupling. Tight coupling is bad news for distributed applications. Either one of these issues by itself is sufficient to doom this concept.
ActiveObjects needs to have the perspective of another five to seven years experience developing real-world enterprise applications.
The sheer ability to rapidly write working code, ala Ruby and RoR, focuses on a very narrow part of the overall problem spectrum. When you concentrate so much on this part of the equation and neglect/ignore the others, then the end result is something of limited applicability.
The serious-minded tools/solutions that exist out there have endeavoured to address the wider range of issues that professional developers (at least those in the enterprise arena) confront.
Wednesday, December 27, 2006
XML data "duck typing"
One of the most paramount concerns in working with enterprise distributed software systems is the issue of tight coupling between nodes. This particular issue is one reason I prefer messaging vs invoking methods on a remote interface. Interface binding leads to very tight coupling. Tight coupling leads to difficulties in attempting to evolve a distributed system over time. In practice it is seldom feasible to rev tightly coupled nodes in lock step with one another. And the practice of versioning interfaces eventually becomes too burdensome if attempted pervasively.
When messaging is applied with certain tactics it can bring about some significant relief on this front.
Now as I was first starting out with my JMS distributed system development, I was conveying relatively complex XML documents around. These were being described in XSD and then tools like JAXB for Java and XsdObjectGen for .NET were used to generate suitable marshaling code. The documents could then be marshaled into .NET or Java object graphs with ease. This works pretty well (though Ted Neward can speak to some significant limitations and caveats) and the nature of XML Schema, as described by XSD, makes it possible to evolve document schema over time while not disrupting existing code that binds intimately to the resulting object graphs. (Yet one may sometimes still need to regenerate the code when the XSD changes which can require updating deployed code.)
However, I eventually drifted away from this practice and now use an approach that I dub XML data duck typing - a play on the term duck typing that Dave Thomas is well known for using in his Ruby talks.
Basically I was finding that processing nodes tended to have interest in subset information as gleaned from XML documents. This was just a natural evolutionary outcome of organic expansion/enhancement of the distributed software systems. Indeed it was very advantageous to encourage this tendency (a single message could have multiple nuances of effect based on what node processes it).
So I now write components such that they just use XPath to glean the information that they care about from XML documents. I have a technique and a few helper methods that make this a very easy and systematic approach. The upshot of this, though, is that a given node in the distributed system couldn't care less how radically an XML document is evolving over time so long as the info it cares about can continue to be successfully gleaned. If it looks like a duck and quacks like a duck then...
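My own helper methods are not shown here, but the idea can be sketched with the standard `javax.xml.xpath` API. The `glean` helper and the sample order documents below are invented for illustration; the second document adds attributes and elements that the consuming node never asked about.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XmlDuckTypingSketch {

    // Gleans one value from a document; returns "" when the node is absent.
    static String glean(String xml, String xpathExpression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(xpathExpression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("cannot evaluate " + xpathExpression, e);
        }
    }

    public static void main(String[] args) {
        String v1 = "<order><id>42</id><customer><email>ann@example.com</email></customer></order>";
        // A later revision adds attributes and elements this node does not care about.
        String v2 = "<order rev=\"2\"><id>42</id><priority>high</priority>"
                + "<customer tier=\"gold\"><name>Ann</name><email>ann@example.com</email></customer>"
                + "<items><item sku=\"A-1\"/></items></order>";
        String emailPath = "/order/customer/email";
        System.out.println(glean(v1, emailPath));
        System.out.println(glean(v2, emailPath));
    }
}
```

The same XPath expression works against both revisions with no regenerated binding code, which is the loose coupling being described. It only breaks if the path to the data the node cares about itself changes.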
Dropping the code generation tools and just going with XPath-based XML data duck typing has been a breath of fresh air as it breathed a new level of loose coupling into the distributed systems.
BTW, Ted Neward is all over this entire subject matter in a talk he is giving at developer conferences that he dubs XML Services. You can catch him speaking on this at both Java and .NET conferences.
In my distributed application architecture, there isn't really much OOP-think that goes on. It is mostly topology, data flow, and event flow, with a lot of emphasis on filtering, routing, bridging, and transformation. OOP is pretty much just relegated to the internal design of distributed components.