Merce

Distributed applications

Spread the good app

  • Distributed applications are any applications where the code executes on multiple computers. If just the infrastructure components are installed on multiple computers, but all the application code executes on one computer, then it is generally not referred to as a distributed system. For instance, if there are three computers, acting as (i) Web server, (ii) application server, and (iii) database server, and all the code is on the application server, then this is not considered a distributed system, even though three computers are involved in serving each HTTP request. (Of course, if part of the business logic is in stored procedures on the database system, does this quality as a distributed system? Grey areas galore exist.)

    HPC (High Performance Compute) clusters are one example of distributed applications. So are most modern e-commerce systems, because they do parts of their processing on different computers. Some servers store data and process searches, some others handle payment processing at credit-card checkout, others handle the Web interface, etc.

  • Three million HTML pages Since we come from the Unix and TCP/IP background, we often look at distributed software designs whenever a single computer may be overloaded. One common situation where we use a distributed design is when there is a need to process very large amounts of data in batch mode. One customer approached us with a requirement where there were tax records of 3.3 million tax assessees in a few thousand text files and we needed to generate one HTML page per assessee with his tax details. The customer asked one team (from one of India's largest software companies) to estimate the time needed to create these 3.3 million files and they gave an estimate of about 12 months, with one server running 24x7. Their approach involved Java code and a relational database.

    We took the data, split it manually into a few separate chunks, and wrote Perl code to upload all data into Berkeley DB files. Our code then searched through the DB files and aggregated all the rows for each user, writing out this data into HTML. The random access speed and light overheads of Berkeley DB made it the right choice for this application. We used 3-4 desktop computers for the job and completed 3.3 million files in about three weeks, all computers running 24x7.

  • PRISM A more complex distributed system involved a high-performance cluster system which delivered a certain degree of fault tolerance and supercomputer-level performance. This was used for real-time risk analysis of traders and investors in a stock exchange, and was called PRISM. This was an industrial-strength HPC application with C code for all the core modules and MPI for low-latency communication.

  • Communication between the parts of a distributed application does not always require sophisticated message passing or high-speed network protocols. We have built distributed systems which use scp to push out batch files from one computer to another. A queue system with files as messages is often fast enough in systems where messages are passed infrequently, e.g. once a minute. It has the advantages of simplicity, debugging ease, fast prototyping, and reliability. We find it amusing when we see engineers recommending ActiveMQ or ZeroMQ for every distributed system design.

RELATED READING