Debate: Why are most large-scale websites not written in Java?

Nati Shalom of GigaSpaces recently asked why most large-scale websites were written in languages other than Java. This question touched off a large debate in the Java community, and InfoQ took the opportunity to learn more about the major viewpoints surrounding this issue.

In his post, Shalom noted that many of the sites that he knew of used a LAMP (Linux, Apache, MySQL, PHP/Perl) stack, and that several have developed custom filesystems like Google's GFS or utilized caches like memcached. Shalom noted similarities in the scalability solutions developed for both large-scale web applications and large-scale financial applications:

On the Data Tier we see the following:
  1. Adding a caching layer to take advantage of memory resources availability and reduce I/O overhead
  2. Moving from a database-centric approach to partitioning, aka shards
On the Business Logic Tier:
  1. Adding parallelization semantics to the application tier (e.g., MapReduce)
  2. Moving to scale-out application models to achieve linear scalability
  3. Moving away from the classic two-phase commit and XA for transaction processing (See: Lessons from Pat Helland: Life Beyond Distributed Transactions)
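
To make the two data-tier points concrete, here is a minimal Java sketch of a partitioned (sharded) store with an in-memory cache in front of it. It is an illustration only, not code from Shalom's post or from any of the sites discussed: the class name `ShardedStore`, the hash-modulo routing, and the use of in-process maps as stand-ins for separate database instances and for a cache such as memcached are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class: a sharded key-value store with a read cache.
class ShardedStore {
    // Each map stands in for a separate database instance (one shard).
    private final List<Map<String, String>> shards = new ArrayList<>();
    // Small LRU cache standing in for a caching layer such as memcached.
    private final Map<String, String> cache;
    // Counts reads that had to go to a shard (i.e. cache misses).
    private int backendReads = 0;

    ShardedStore(int shardCount, final int cacheCapacity) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > cacheCapacity;
            }
        };
    }

    // Route a key to a shard by hash; floorMod keeps the index non-negative.
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    void put(String key, String value) {
        shards.get(shardFor(key)).put(key, value);
        cache.remove(key); // invalidate any stale cached copy
    }

    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached; // served from memory, no shard I/O
        }
        backendReads++;
        String value = shards.get(shardFor(key)).get(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }

    int backendReads() {
        return backendReads;
    }

    public static void main(String[] args) {
        ShardedStore store = new ShardedStore(4, 100);
        store.put("user:42", "alice");
        store.get("user:42"); // miss: read from the owning shard
        store.get("user:42"); // hit: read from the cache
        System.out.println("backend reads: " + store.backendReads());
    }
}
```

Simple hash-modulo routing is used here for brevity; it reshuffles most keys when the shard count changes, which is one reason real deployments often prefer schemes such as consistent hashing.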

Shalom then questioned how such similar solutions could be built on such different application stacks. One possible reason, which Shalom noted, was put forward by Todd Hoff: the LAMP stack is both powerful and free, and Java is used, but as an ancillary component rather than as the core.

Some other opinions:

  • Justin Sher was quick to point out that eBay, Gmail, Amazon, hi5.com and Google AdWords are built on top of Java
  • Shane Isbell pointed to cultural differences, questioning whether the stereotypical web developer is more interested in social networking sites and 'eye candy' than the stereotypical Java developer, and also commented that financial companies had greater budgets and tended to scale with hardware, whereas web companies tended to scale with software.
  • Another person suggested that the prevalence of Java solutions in financial applications had to do with partnerships between large Java EE vendors and financial institutions
  • Angelo Andreetto, who referred to several years of experience with financial companies, believes that a conservative approach to potential risk leads to the selection of Java-based solutions over heterogeneous software stacks
  • Someone else commented that the consequences of downtime for financial institutions were generally larger than for web companies
  • George Coller said that the question was misstated, and that it should really be why Java EE isn't used more

Mickey Ohayon of GigaSpaces had a more detailed response:

In a technical perspective:
  • developing in Php / Perl is very fast and simple whereas JEE is more complex
  • historically speaking the knowledge, hosting services and developers are more available
  • LAMP proved to be stable and common whereas JEE was more of an infrastructure
  • JEE requires application servers that sometimes are overkill for a web system
  • The light web languages (Php/Perl) are more flexible to changes in the short run (as part of poor architecture that is based on Non-MVC, of course in the long run the cost of any change is dramatically higher)
  • The deployment and testing of java application is far slower and requires relatively strong machines
In financial perspective
  • JEE developers are far more expensive than Perl / Php
  • The learning curve and time to market are longer
  • Hosting of JEE application servers is more expensive

Jilles Van Gurp of Nokia commented that Java EE is optimized for the enterprise domain, which tends to have a different set of needs than a large-scale consumer-oriented website:

These websites have relatively simple data base structures; relaxed requirements for things like transactions and persistence layers (mysql + non-transactional & ACID backend is good enough in most cases); virtually no requirements for heavy duty web service stacks; etc. Basically all the stuff J2EE is excellent for is just mostly overkill for implementing consumer oriented websites. You don't need the fancy IDEs; uber-flexible messaging buses; outrageously complicated transactional logic; etc.

Instead the focus is on extreme scalability; memory usage; cpu usage; caching; etc. Those things can be addressed with off the shelf components like squid, apache, distributed linux filesystems etc. They can also be addressed with Java components too but it requires that you have some J2EE experts around to integrate them. These are not exactly easy to recruit due to current scarcity on the job market and tendency of these people to end up in extremely well paid enterprise type jobs.

Van Gurp also believes that Java is well positioned for the future:

Finally, I think all this is changing. Running the Java implementation of ruby or php can give a nice security, performance, scalability and manageability boost to your php or rails application. You'd be a fool not to try this if you are operating large scale deployments of these systems. This is still relatively unknown to php and ruby developers and quite many simply don't care about performance enough to do anything about it, instead preferring to invest in hardware. But once they make the shift to deploying on php or ruby on Java application servers, they'll discover that there is a world of additional components that can further enhance their applications. Arguably Google's web development tool chain (partially open sourced) is the state of the art in extremely large scale & rapid prototyping web development. And writing the application logic is done 100% in Java from the web developer point of view. To the best of my knowledge, Google has no large scale deployment of php or similar architectures in their web UI layer (I'd be interested to learn if this is not true).

After watching the debate unfold, Shalom described his agreement with Michael O'Keefe's opinion, which encompassed several of the viewpoints described above. Shalom also mentioned that there appeared to be a convergence trend in the market, with tools such as Spring on Rails and Caucho's Java-based PHP implementation, and that the challenge of developing a scalable site would bring LAMP stacks and Java closer together in the future.

What do you think?
