The Brazilian National Healthcare System has been called the largest Enterprise Java application ever built, with over 2M lines of code and a domain model of 350 classes. The application models all of the domain concepts one could imagine in a country-wide health care system and is bringing a level of automation that is creating enormous value for the public healthcare system as well as for the people of Brazil. This case study, the only one of its kind, takes a detailed look at the architecture, interesting solutions, lessons learned, and future directions for the project.
Problem Domain
Brazil is one of the few nations in the world with a completely free public healthcare system. Like any public service of that magnitude, it has many operational problems. Parts of the system were largely paper based, and there was very little integration between the IT systems of the government and those of local health care providers. For example:
- No scheduling system existed that could allow health care providers to book patients into appointments with each other. For example, a patient requiring a visit to a heart specialist would need to find an available specialist on their own, often by waiting in long lines at multiple specialists and booking duplicate appointments until an early enough opening could be found.
- Lack of information about patient history, such as past operations or medical conditions, at any point of care.
- The same patient was often recorded multiple times in a number of separate databases.
- Inability for the government to do any medical resource capacity planning or react to shortages since important statistics such as births, deaths, medical operations and disease diagnoses were stored in un-integrated paper-based systems.
- Inability to detect and prevent abuse of the medical system, since the IT systems that modeled health legislation and policy were not integrated with point-of-care systems.
- Many local doctors had no information systems at all and patients were required to wait all day long due to a lack of scheduling.
With these problems in mind, the building of a health care automation system called Siga Saude was commissioned. The system is designed to handle all aspects one could imagine in a public healthcare information system including scheduling and inventory management of doctors and equipment, billing, disease tracking, reporting/auditing, regulatory compliance, and security access control.
The system was first deployed to automate all health care units in the city of Sao Paulo, the largest city in Brazil and the 4th largest in the world, with over 20 million people. Today the application is in production in Sao Paulo and 20 other cities, and work is in progress to expand the system to other cities in Brazil, as well as to other countries interested in automating their national healthcare systems, such as the Portuguese-speaking countries of Angola and Mozambique.
Solution Overview
The application is based on EJB 2.1 + Struts and uses a well-defined layered architecture built on established EJB design patterns, including data transfer objects, session facade, service locator, and business delegates. Development was done in Eclipse, and testing and deployment were done on the JBoss application server. A rules engine (Drools) was used in certain areas. The application is currently deployed non-clustered on a Dual Xeon 3.1 server with 4 GB of RAM, on Linux, running JBoss 3.2.7.
From conception through to requirements and development, over 150 people were involved in defining what the system should do, including doctors and health specialists. This system was broken down by the development team into the following modules:
The line-of-code counts for the modules above include lines in the services layer and domain model specific to those modules. In addition, 100K lines of code in the services layer and 570K lines of code in the domain model are shared across modules. While the sheer number of lines of code may be daunting, 58% of the line count is actually generated code, as discussed in the drill down on the use of annotations below.
Each of the modules above consists of JSP pages, Struts actions, and business delegates that call into the services layer. The services layer consists of EJB 2.1 session beans with generated session facades. The domain model consists of POJOs with annotations used to generate entity beans. The next drill down goes into detail on the Scheduling system, from URL to SQL. The overall layering in this module is consistent with the rest and should impart a good understanding of how the system is built.
Drill down: Schedule an appointment with a specialist use case
One of the most important use cases in the system, from both a business and a technical perspective, is the ability for a receptionist at a small local doctor's office to schedule an appointment for a patient with a specialist. The use case is implemented within the Scheduling module but touches many of the other modules in the system. The scheduling appointment use case looks like this:
First, the receptionist looks for free slots in the schedule. They can query by a specific doctor, a specialty, a type of medical equipment, a specific medical procedure, and so on (click here to see the input screen, and see "getSlots" in the sequence diagram below). When they find a free slot (chooseSlot() on the sequence diagram), they locate the profile of the patient using the National Health Card module, either by scanning the barcode on the patient's health card or, if the patient forgot the card, by looking them up by name, birth date or any other contact info (see searchPatient()). Then, on the same screen, they enter the type of procedure needed (see second input screen, and saveAppointment() on the sequence diagram). If the patient needs a procedure that is not available at the clinic, either because there are no free slots or because the clinic does not provide that type of service, the receptionist or the doctor can look for free slots in other clinics.
This sequence diagram shows how a typical use case is executed in the system. First, a Struts Action class (ScheduleAction) processes the user requests and deals with the presentation logic. When data needs to be retrieved or a business rule needs to be invoked, the Action class calls methods on the POJO that implements the Business Delegate design pattern (ScheduleCF). This class is automatically generated from the Session Façade class (ScheduleSF) and makes the web layer independent of the technology used for the services layer. The Session Beans that implement the Session Façade design pattern (ScheduleSF) delegate the execution of the business rules to other classes. For example, the class AppointmentService implements the methods lockSlot() and searchForSlots(). When the use case needs to persist data, the setAppointmentVO() method is called on an Entity Bean. Queries are not performed using Entity Bean mechanisms, but using an optimized query component called Searcher.
The Session Façade objects are responsible for providing transaction services, database connections and security services to the rest of the application. Because the service objects are plain POJOs, the web layer cannot use them to reach Entity Beans directly, so a session bean layer was needed in between.
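The layering described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class names ScheduleAction, ScheduleCF, ScheduleSF and AppointmentService and the method names getSlots and searchForSlots come from the text, but the method signatures, the in-memory slot list, and the returned strings are hypothetical, and no Struts or EJB types are used.

```java
import java.util.ArrayList;
import java.util.List;

// Service layer: a plain object holding the business logic.
// The signature and the in-memory data are assumptions for illustration.
class AppointmentService {
    private final List<String> freeSlots =
            List.of("cardiology 09:00", "cardiology 09:30", "x-ray 10:00");

    List<String> searchForSlots(String specialty) {
        List<String> found = new ArrayList<>();
        for (String slot : freeSlots) {
            if (slot.startsWith(specialty + " ")) {
                found.add(slot);
            }
        }
        return found;
    }
}

// Stand-in for the session façade. In the real system this is an EJB 2.1
// session bean, and the container supplies transactions and security here.
class ScheduleSF {
    private final AppointmentService service = new AppointmentService();

    List<String> getSlots(String specialty) {
        return service.searchForSlots(specialty);
    }
}

// Business delegate: hides from the web layer how the façade is obtained.
// A real delegate would look the façade up through a service locator.
class ScheduleCF {
    private final ScheduleSF facade = new ScheduleSF();

    List<String> getSlots(String specialty) {
        return facade.getSlots(specialty);
    }
}

// Web layer: presentation logic only, no knowledge of EJB.
class ScheduleAction {
    private final ScheduleCF delegate = new ScheduleCF();

    // Returns a hypothetical forward name for the next screen.
    String execute(String specialtyParam) {
        List<String> slots = delegate.getSlots(specialtyParam);
        return slots.isEmpty() ? "noSlots" : "showSlots:" + slots.size();
    }
}

class LayeringDemo {
    public static void main(String[] args) {
        System.out.println(new ScheduleAction().execute("cardiology"));
    }
}
```

Because the action only ever talks to the delegate, the technology behind ScheduleCF can change without touching the web tier, which is the property the text attributes to the generated business delegates.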
When a slot is locked, a timestamp and a lock id are stored in the slots database table. Both values are written under a SELECT FOR UPDATE command to avoid race conditions. The timestamp is used to determine whether a lock has expired. The lock id is used to prevent an object from saving with an expired lock. When the object that requested the lock decides to save the appointment, it compares the lock id it received with the current lock id stored in the table. If they are different, the new information cannot be saved. This way, if the lock expired between requesting the lock and saving the appointment, the application can detect this and warn the user. This use case implements a logical lock. Neither database locks nor EJB locks are held for long periods, so the business rules can be implemented more cleanly.
The (simplified) class diagram for the domain objects involved in this use case is:
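A minimal in-memory sketch of this locking scheme follows. It is not the project's code: the class SlotLockTable, its method signatures, and the use of synchronized as a stand-in for the row lock taken by SELECT FOR UPDATE are all assumptions made for illustration. As in the description, saving compares only the lock id; the timestamp is used only to decide whether an existing lock may be taken over.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the slots table: one row per slot,
// holding the current lock id and the time the lock was taken.
class SlotLockTable {
    private static final class Row {
        long lockId;          // 0 means the slot was never locked
        long lockedAtMillis;
        boolean booked;
    }

    private final Map<String, Row> rows = new HashMap<>();
    private final long lockTimeoutMillis;
    private long nextLockId = 1;

    SlotLockTable(long lockTimeoutMillis) {
        this.lockTimeoutMillis = lockTimeoutMillis;
    }

    synchronized void addSlot(String slotId) {
        rows.put(slotId, new Row());
    }

    // Returns a new lock id, or -1 if the slot is booked or held by an
    // unexpired lock. 'synchronized' plays the role of SELECT FOR UPDATE.
    synchronized long lockSlot(String slotId, long nowMillis) {
        Row row = rows.get(slotId);
        if (row == null || row.booked) {
            return -1;
        }
        boolean held = row.lockId != 0
                && nowMillis - row.lockedAtMillis < lockTimeoutMillis;
        if (held) {
            return -1;
        }
        row.lockId = nextLockId++;
        row.lockedAtMillis = nowMillis;
        return row.lockId;
    }

    // Saves only if the caller's lock id is still the one stored in the row.
    // A mismatch means the lock expired and another user took the slot over.
    synchronized boolean saveAppointment(String slotId, long lockId) {
        Row row = rows.get(slotId);
        if (row == null || row.booked || row.lockId != lockId) {
            return false;
        }
        row.booked = true;
        return true;
    }
}

class LogicalLockDemo {
    public static void main(String[] args) {
        SlotLockTable table = new SlotLockTable(1000);
        table.addSlot("s1");
        long first = table.lockSlot("s1", 0);      // receptionist A locks
        long second = table.lockSlot("s1", 2000);  // A's lock expired, B takes over
        System.out.println("A can save: " + table.saveAppointment("s1", first));
        System.out.println("B can save: " + table.saveAppointment("s1", second));
    }
}
```

The point of the design is that the short critical sections (lockSlot and saveAppointment) are the only places where a real lock is held; the time the receptionist spends filling in the form is covered by the lock id and timestamp alone.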
Drill down: Use of Annotations for Code Generation
When the project started, XDoclet 1 was used to generate deployment descriptors, value objects, queries, session façades, Struts forms and actions, data validations and local and remote EJB interfaces. Code generation was a key strategy in the project, since the project's short time frame called for techniques that would make developers more productive and reduce the number of bugs. To that end, the original XDoclet templates were changed to generate "expert's code".
The XDoclet approach worked well for the first 10 months of the project and helped the developers achieve the productivity they needed. As the code base grew, however, generation time also grew, to the point where productivity was being seriously affected. The team then decided to move to another code generation strategy using Java 5 Annotations.
Several XDoclet drawbacks could be solved using the Annotations strategy:
- Annotations allow incremental generation of files such as deployment descriptors: with XDoclet, when one class changes, all the classes are read again to generate a new deployment descriptor. Using Annotations, it is possible to regenerate only the part of the deployment descriptor affected by the changed classes. Since developers usually change only a few classes at a time, this saves a lot of time.
- XDoclet 1 requires access to the source code of subcomponents at generation time, increasing coupling and build complexity. All the developers needed access to the whole source code, and not only to compiled jars as would be expected. With annotations, this can be avoided.
- Annotations can be processed at runtime, so sometimes it pays off to use this feature instead of generating code.
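To illustrate the last point, here is a small sketch of handling an annotation at runtime through reflection rather than generating validation code for it. The @Required annotation, the ClinicForm class, and the RequiredValidator helper are hypothetical names invented for this example; the case study does not show the project's custom annotations.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical annotation; RUNTIME retention keeps it visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Required {
}

// Hypothetical value object with one mandatory and one optional field.
class ClinicForm {
    @Required
    String code;
    String maintainerCode;

    ClinicForm(String code, String maintainerCode) {
        this.code = code;
        this.maintainerCode = maintainerCode;
    }
}

class RequiredValidator {
    // Returns the names of @Required fields that are null or empty.
    static List<String> missingFields(Object bean) {
        List<String> missing = new ArrayList<>();
        for (Field field : bean.getClass().getDeclaredFields()) {
            if (!field.isAnnotationPresent(Required.class)) {
                continue;
            }
            field.setAccessible(true);
            try {
                Object value = field.get(bean);
                if (value == null || "".equals(value)) {
                    missing.add(field.getName());
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return missing;
    }
}

class RuntimeAnnotationDemo {
    public static void main(String[] args) {
        System.out.println(RequiredValidator.missingFields(new ClinicForm(null, "42")));
    }
}
```

The trade-off is the usual one: nothing has to be regenerated when a class changes, at the cost of doing the reflective work on every call.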
The team developed processors for EJB 3.0 annotations and for other custom annotations that would generate code similar to that generated by XDoclet. Although EJB 3 annotations were used, the generated code was EJB 2.1 compliant.
The templates used by the Annotation processors were Velocity templates. The biggest challenge when moving from XDoclet to writing your own templates was to build a set of templates and helper classes as comprehensive as the one XDoclet 1 already has.
Using Annotations, the generation time decreased a lot. Using XDoclet, if one class was changed it took 1 minute and 50 seconds to generate code from 400 classes. Using APT, the same process took 10 seconds, roughly eleven times faster.
Drill down: How a Rules Engine simplified business logic
Some parts of the system have to deal with business rules that change over time, or from city to city, due to government legislation. For example, if a clinic wants to provide X-Ray services, it has to comply with several rules, such as having an X-Ray machine, having an authorized radiologist, etc. Since these rules change often, it is better to keep them outside the code, so they can be changed without changing the code.
In SIGA-Saúde, this problem was solved using the Drools rules engine, which implements JSR-94, the Java Rule Engine API. The rules processing is done by a SIGA-Saúde component called Decision. This component works with a rule set and a working memory. The rule set works with a set of messages, where each rule has one message that is described by a class that implements the br.com.vidatis.common.decision.message.RuleMessage interface. When a rule or a subset of rules is satisfied, the related messages are removed from the queue. This way, it is possible to track the rules processing. If there are no messages in the queue at the end, all the rules were satisfied. A rule can trigger several different actions, including a new set of rules. The rules are described in an XML file and processed by the Drools engine.
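The message bookkeeping described above can be sketched without any rules engine. The following is an assumption-laden illustration, not the Decision component itself and not the Drools API: the classes PendingMessages and Decision, and the use of java.util.function.Predicate as a rule body, are invented here. Only the method name markRuleAsValid is taken from the rule file shown in this article. For simplicity the sketch pairs each rule with exactly one message.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical message queue: starts with one pending message per rule.
class PendingMessages {
    private final Set<String> pending = new LinkedHashSet<>();

    void expect(String ruleName) {
        pending.add(ruleName);
    }

    // Called by a rule's consequence once the rule is satisfied.
    void markRuleAsValid(String ruleName) {
        pending.remove(ruleName);
    }

    boolean allSatisfied() {
        return pending.isEmpty();
    }

    // Messages still in the queue identify the rules that did not pass.
    Set<String> unsatisfied() {
        return new LinkedHashSet<>(pending);
    }
}

// Hypothetical, much simplified stand-in for the rule evaluation step.
class Decision<T> {
    private final Map<String, Predicate<T>> rules = new LinkedHashMap<>();

    void addRule(String name, Predicate<T> check) {
        rules.put(name, check);
    }

    PendingMessages evaluate(T fact) {
        PendingMessages messages = new PendingMessages();
        for (String name : rules.keySet()) {
            messages.expect(name);
        }
        for (Map.Entry<String, Predicate<T>> rule : rules.entrySet()) {
            if (rule.getValue().test(fact)) {
                messages.markRuleAsValid(rule.getKey());
            }
        }
        return messages;
    }
}

class DecisionDemo {
    public static void main(String[] args) {
        Decision<String> decision = new Decision<>();
        decision.addRule("code_present", c -> c != null && !c.isEmpty());
        decision.addRule("code_numeric", c -> c != null && c.matches("\\d+"));
        System.out.println(decision.evaluate("12a45").unsatisfied());
    }
}
```

Tracking what is left in the queue, rather than what failed, means a rule that never fires is still reported, which is useful when rules are edited outside the code.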
Example of the XML file:
<rule name="if_clinic_code_then_checksum_digit_valid">
<parameter identifier="clinic">
<java:class>br.atech.smssp.domain.clinic.vo.ClinicVO</java:class>
</parameter>
<java:condition>clinic.getCode() != null</java:condition>
<java:condition>!clinic.getCode().equals("")</java:condition>
<java:consequence>
//clinic code should be mandatory and valid
if(br.com.vidatis.common.decision.rule.CodeRule.isValidCode(clinic.getCode())){
ruleMessage.markRuleAsValid("if_clinic_code_then_checksum_digit_valid");
}
</java:consequence>
</rule>
<rule name="if_maintainer_code_then_checksum_digit_valid">
<parameter identifier="clinic">
<java:class>br.atech.smssp.domain.clinic.vo.ClinicVO</java:class>
</parameter>
<java:condition>clinic.getMaintainerCode() != null</java:condition>
<java:condition>!clinic.getMaintainerCode().equals("")</java:condition>
<java:consequence>
//Maintainer code is optional, but should be valid if it is provided.
if(br.com.vidatis.common.decision.rule.CodeRule.isValidCode(clinic.getMaintainerCode())){
ruleMessage.markRuleAsValid("if_maintainer_code_then_checksum_digit_valid");
}
</java:consequence>
</rule>
<rule name="if_maintainer_code_then_checksum_digit_valid_OPTIONAL_EMPTY">
<parameter identifier="clinic">
<java:class>br.atech.smssp.domain.clinic.vo.ClinicVO</java:class>
</parameter>
<java:condition>clinic.getMaintainerCode() != null</java:condition>
<java:condition>clinic.getMaintainerCode().equals("")</java:condition>
<java:consequence>
//Maintainer code is optional and can be empty
ruleMessage.markRuleAsValid("if_maintainer_code_then_checksum_digit_valid");
</java:consequence>
</rule>
<rule name="if_maintainer_code_then_checksum_digit_valid_OPTIONAL_NULL">
<parameter identifier="clinic">
<java:class>br.atech.smssp.domain.clinic.vo.ClinicVO</java:class>
</parameter>
<java:condition>clinic.getMaintainerCode() == null</java:condition>
<java:consequence>
//Maintainer code is optional and can be NULL
ruleMessage.markRuleAsValid("if_maintainer_code_then_checksum_digit_valid");
</java:consequence>
</rule>
In this example, when data about a clinic is about to be saved, the service class calls the Drools rule engine and asks it to apply the validation rules. If all the rules pass, the data can be safely saved. These validation rules change as government regulation changes. Changes to the rules are frequent, so it is important to separate validation rules from the code, so that maintenance can be done without restarting the application.
Lessons Learned
Code generation was a key success factor on this project. It made the developers more productive and made the code homogeneous. Having homogeneous code is very important when you have 50 developers working in independent teams, all of them sharing the same basic components. Using XDoclet, however, had an unwanted side effect after a while, since generation took too long. Moving to annotations allowed the team to decrease generation time and still keep the benefits of code generation.
Good communication is essential. In a project as huge as this one, there are different "tribes" that sometimes seem to speak different languages. Making them understand each other's point of view is a hard and sometimes frustrating task. In particular, communication among these tribes is troublesome:
- Developers vs. Architects - You have to eat your own dog food. When architects participate actively in the development activities they can propose better solutions and increase productivity. This also helps to increase the "level of happiness" of the team, which influences productivity as well.
- Requirements analysts/customer vs. developers/architects - failures in communication between these two teams often cause delays and frustration. A good specification is essential, but understanding the motivations of both groups and making them act as a team cannot be achieved with a good use case specification alone.
- Extraordinary developers vs. other extraordinary developers - the project had an extraordinary team of developers, each with a mind full of good ideas for the software. In the short time frame the team had, however, it was impossible to test all the revolutionary ideas that popped up every day. This caused some frustration on the team. In a project like this, it is important to keep the goal well defined and make everyone understand why each decision was made. One thing that helped in this respect was that the team held periodic refactoring weeks. During these weeks, the team discussed the problems and what could be changed to solve them, and refactored the system accordingly. A week devoted to refactoring, without any new code development, helps to put the new ideas to use and improves productivity.
If you have a well-defined architecture, with rules, code generation, design patterns, and components, it is easier to make small or medium refactorings on the software. Big refactorings are always hard. However, strict rules also make it harder to implement the 20% of the application where the architecture does not apply very well. It is important to know how far you should go in enforcing the architecture standards. Sometimes, during the project, the developers adapted the solution to the architecture when this was not the best way to do it. On the other hand, in the parts of the system where the rules were not very strict, such as the user interface, each developer chose their own solution, and this had some undesired consequences, such as poor quality in parts of the system and code that is difficult to maintain.
In a huge application, with lots of dependencies and code, the development cycle tends to become long. Sometimes, adding a simple field to a screen means changing many parts of the system, which takes time. This is frustrating to both developers and customers. The tools become too heavy and everything seems to take more time than it should. The application was broken into several components, but this is not enough to solve the problem completely. Although J2EE has advantages, it also brings more complexity to the code, and this has a price in terms of productivity, which is tough to explain to management and customers.
When the team started this project, many people said that it was impossible to do it in that timeframe. Many unsuccessful attempts to build a public healthcare information system had already been made in other countries. So, a last lesson learned from this experience: when people say something is impossible, don't let that stop you from making it possible.
Future Directions
Enabling testing outside the container
It really bothers the team that they can't do testing outside of the container. The reason they couldn't do it initially was that they were forced to start with some legacy code. The Government rules and regulations module was a massive EJB 1.0 system. The new parts of the application are increasingly POJO-based, with code generation being used to produce all the entity bean and other container artifacts. However, deploying to the container, even for an incremental change, takes more time than the developers are willing to accept.
Moving to a POJO based architecture
The current system clearly has a lot of container dependencies and EJB plumbing code (even though most of that is hidden by generated business delegates and session facades). However, the architecture they are planning to move towards is completely POJO based, including business rules. The team would like to refactor all of their session beans into one single session bean interceptor, which will then delegate calls to the services layer. The purpose of the session bean interceptor is to easily provide middleware services such as transactions and thread management. They've thought of using Spring, but given their existing architecture the change was deemed too expensive. Since their existing session beans are all generated, it is very easy to change the code generation logic to go through the session bean interceptor instead of moving off EJB entirely. Another factor that makes this refactoring possible is the existence of a business delegate layer, which will hide these massive changes from the web tier.
Why don't they rid themselves of EJB altogether? The reason is that eventually they will be required to expose their application to other groups in the health ministry that will need to access the systems, and they were planning on simply handing those people (from the city of Sao Paulo) the business delegates, which they can begin using. Since the business delegates hide all aspects of remoting, and they know that their partners will be working in Java, the integration is expected to be fairly painless.
How will they account for different transaction types? They will have different methods with different transaction demarcations, and a mapping of target methods to transaction types, so they know which call method to execute. Incidentally, this session bean interceptor and transaction strategy is described fairly similarly in Floyd Marinescu's EJB Design Patterns book. :)
Finally, they would also like eventually to get rid of entity beans, perhaps even just persist value objects directly with Hibernate. The biggest cost they are facing with this potential change is testing. They have to test things thoroughly, and unit tests simply can't capture everything.
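As a rough illustration of the planned design, the sketch below shows a single entry point that looks up the transaction type for a target method and routes the call through a matching call method before delegating to a POJO service by reflection. Everything here is hypothetical: the names ServiceInterceptor, SlotService, TxType, callRequired and callRequiresNew are invented, and no EJB API is used. In a real session bean, each call method would carry its own container-managed transaction attribute; here the chosen route is only recorded in a log.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical transaction types; a real bean would declare these per method.
enum TxType { REQUIRED, REQUIRES_NEW }

// Hypothetical POJO service with no container dependencies.
class SlotService {
    public String searchForSlots(String specialty) {
        return "slots for " + specialty;
    }

    public String lockSlot(String slotId) {
        return "locked " + slotId;
    }
}

class ServiceInterceptor {
    private final Map<String, Object> services = new HashMap<>();
    private final Map<String, TxType> txByMethod = new HashMap<>();
    final List<String> log = new ArrayList<>();

    void register(String serviceName, Object service) {
        services.put(serviceName, service);
    }

    // Mapping of target methods to transaction types.
    void demarcate(String serviceName, String methodName, TxType type) {
        txByMethod.put(serviceName + "." + methodName, type);
    }

    // Single entry point: picks the call method matching the mapped type.
    Object invoke(String serviceName, String methodName, Object... args) {
        String key = serviceName + "." + methodName;
        TxType type = txByMethod.getOrDefault(key, TxType.REQUIRED);
        return type == TxType.REQUIRES_NEW
                ? callRequiresNew(serviceName, methodName, args)
                : callRequired(serviceName, methodName, args);
    }

    private Object callRequired(String serviceName, String methodName, Object[] args) {
        log.add("REQUIRED " + serviceName + "." + methodName);
        return dispatch(serviceName, methodName, args);
    }

    private Object callRequiresNew(String serviceName, String methodName, Object[] args) {
        log.add("REQUIRES_NEW " + serviceName + "." + methodName);
        return dispatch(serviceName, methodName, args);
    }

    // Finds a public method by name and argument count and invokes it.
    private Object dispatch(String serviceName, String methodName, Object[] args) {
        Object target = services.get(serviceName);
        if (target == null) {
            throw new IllegalArgumentException("unknown service " + serviceName);
        }
        for (Method method : target.getClass().getMethods()) {
            if (method.getName().equals(methodName)
                    && method.getParameterCount() == args.length) {
                try {
                    return method.invoke(target, args);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        throw new IllegalArgumentException("unknown method " + methodName);
    }
}

class InterceptorDemo {
    public static void main(String[] args) {
        ServiceInterceptor interceptor = new ServiceInterceptor();
        interceptor.register("slots", new SlotService());
        interceptor.demarcate("slots", "lockSlot", TxType.REQUIRES_NEW);
        System.out.println(interceptor.invoke("slots", "lockSlot", "s1"));
        System.out.println(interceptor.log);
    }
}
```

Since the business delegates are generated, they could be regenerated to call invoke(...) on the one interceptor, which is why the text expects the change to be invisible to the web tier.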
AJAX to simplify web UIs
AJAX was identified as a tool to simplify the UI workflows for non-technical end users. For example, in the scheduling use case, auto-completion could help with typing doctors' names, procedure names, and specialty names, so there is no need to open a popup, run a query, and then pick from a long list. Also, on long forms, it would be better to use AJAX to persist the values of form fields into the HTTPSession step by step, in case someone navigates away from the page by mistake and would otherwise lose their work.
AJAX has already been deployed recently. AJAX has been used to navigate in a huge procedures tree, while retrieving the elements as needed. This enabled us to deliver a much better interface to the user and decrease data entry time. Click here to see the screenshot.