Tuesday, 18 January 2005

Business Case for LMS and scalability design considerations

This post has two points to make: a business case observation about how universities choose an LMS, and some scalability design considerations.

This afternoon, I had lunch with two of my best friends, from two different Australian universities. Both are middle managers, and both are responsible for making a recommendation to senior management for a campus-wide course delivery system. Eventually, they will both be responsible for the implementation and maintenance of the system. Both are non-technical and come from academic backgrounds. After some wine and food, we started talking about LMS.

I was in favour of open source and had been pushing them both hard to adapt/adopt the MIT/Stanford open courseware and Sakai approach. However, both rejected the idea based on the perceived views of their respective higher management. While one of the universities has already made its choice public, the other's choice is still officially private information. But I know that they ended up making different choices between WebCT and Blackboard. The two universities, owing to their different backgrounds, chose different commercial LMSs. One of the strongest reasons for each choice was the perceived support offered by the vendor and the "additional" cooperation on offer. Both were offered positions on the product's development committee, so that their implementation experience and needs could be addressed in the product. Another interesting result of the negotiations was that they both got a 5-year license with a cap on the license fee.

It is true that the LMS has become tier-1 IT infrastructure that Australian universities rely upon. They rank the LMS alongside the email system. They are happy to find a solution for 80% of the academics and let the innovative 20% find their own way.

A content management system was considered together with the LMS. One of the universities chose one vendor because the other vendor configured its system wrongly during the pilot. It seems that, if configured correctly, the losing vendor could have provided exactly the same features. Because its CMS and LMS were offered for testing from two different servers, some of the functions the university wanted failed. Had the vendor run the two systems on the same machine, it would have met all the requirements and remained a sound contender!

If this is the case, I see a fundamental fault in the architecture. I know where the problem is and I have a solution, but I am not disclosing it here. If any LMS/CMS vendor is reading this post, expect to pay good money for me to reveal it. :-) OK, some nice words and good food may do the trick too.

I would like to add scalability and fault tolerance to this complex mix of factors. Scalability can be obtained vertically and/or horizontally.

By vertical scalability, I mean obtaining more throughput by using more powerful hardware. Typically, if a system is not providing sufficient throughput, the institution replaces the hardware with a faster, bigger machine. This is not a bad approach, given that Moore's law keeps increasing computing power per unit cost, but it is a wasteful one: in many cases, the decommissioned hardware ends up serving second-tier or less demanding services.

The other kind is horizontal scalability, or "Google scalability". Instead of a single (or a few) powerful machines, Google uses thousands of off-the-shelf PCs running in parallel. If more capacity is needed, Google adds more PCs. (One of the problems facing Google is the tremendous amount of power required to run this army of PCs for the same computational throughput. I am not sure about this remark, but I am inclined to believe it is true!)
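
To make the contrast concrete, here is a rough sketch in Python of the horizontal idea. The node names and the round-robin dispatcher are my own invention for illustration, not anyone's actual design: capacity grows by plugging in another cheap node, not by replacing the machine.

    import itertools

    class NodePool:
        # A toy round-robin dispatcher over a pool of identical, cheap nodes.
        def __init__(self, nodes):
            self.nodes = list(nodes)
            self._cycle = itertools.cycle(self.nodes)

        def add_node(self, node):
            # Horizontal scaling: more throughput means more nodes,
            # not a bigger machine.
            self.nodes.append(node)
            self._cycle = itertools.cycle(self.nodes)

        def dispatch(self, request):
            node = next(self._cycle)
            return "%s handles %s" % (node, request)

    pool = NodePool(["pc-01", "pc-02", "pc-03"])
    pool.add_node("pc-04")  # need more capacity? add another PC
    print(pool.dispatch("GET /course/42"))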

Inherent to this scalability issue is fault tolerance. Redundancy is the best approach to making critical systems fault tolerant, but running multiple expensive big servers in parallel just to provide fault tolerance is costly. Off-the-shelf PCs, on the other hand, are inherently error-prone, yet they are relatively easy and cheap to replace.
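
As a toy illustration of that redundancy (again my own sketch in Python, with made-up node names, not any vendor's code): when a cheap node dies, the request simply moves on to the next replica, and the dead PC is swapped out at leisure.

    NODES = ["pc-01", "pc-02", "pc-03"]  # identical replicas
    DOWN = {"pc-01"}                     # pretend this one has died

    def call(node, request):
        # Stand-in for a real network call; any node may fail at any time.
        if node in DOWN:
            raise ConnectionError("%s is down" % node)
        return "%s answered %s" % (node, request)

    def fault_tolerant_call(request):
        # Redundancy as fault tolerance: try each replica until one responds.
        for node in NODES:
            try:
                return call(node, request)
            except ConnectionError:
                continue  # skip the dead PC; replace it later, cheaply
        raise RuntimeError("all replicas failed")

    print(fault_tolerant_call("GET /lesson/7"))  # pc-02 answers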

I am more in favour of Google's approach.

This implies a different approach to the software architecture of the application. Stating the obvious again: developing an application that runs on many nodes, expects any node to fail at any time, and yet continues to provide non-stop service is quite different from developing the same application based on a single never-to-fail computing node.

OK, I cannot avoid coming back to Fablusi. For Fablusi, I have built into the software architecture the capability to run an army of PCs in parallel as servers. First, the computationally intensive part is shifted to the client side; the server is basically a huge store, serving whatever the client requires. Second, each connection to the server is "stateless", meaning that any responding server has sufficient information from the client request alone to execute the needed operation. My hope for Fablusi's future is a single unified platform serving ONE pedagogy to anyone connected to the Internet - the "Google" of role play. I understand that role play simulation is just ONE of many powerful pedagogies. I just hope that it serves the purpose we built it for, and serves it well.
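
A hedged sketch of what I mean by "stateless", in Python. The field names here are invented for illustration and are not Fablusi's actual protocol; the point is that the request itself carries everything needed, so whichever server picks it up can answer without any session memory of its own.

    def handle(request):
        # Any node in the pool can run this: the request carries the full
        # context (user, simulation, action), so no server-side session
        # state is required.
        user = request["user_id"]
        sim = request["sim_id"]
        action = request["action"]
        # The server is basically storage: look up, apply, store, reply.
        return {"status": "ok", "echo": "%s/%s: %s" % (user, sim, action)}

    # The same request can go to pc-01 today and pc-07 tomorrow; the answer
    # is identical because no node holds hidden state.
    print(handle({"user_id": "alice", "sim_id": "rp-101", "action": "post_message"}))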

The problem I mentioned earlier about integration between the LMS and CMS is very real, and it faces Fablusi too. If Fablusi is to integrate with other pedagogical approaches in a course, it must be able to function across server boundaries and domain boundaries. The solution has already been developed and is in the public domain. It is just a matter of identifying the problem and applying the solution.
