Tuesday, October 06, 2009

Developing Libraries - Give Minimum Privileges


A library should behave the same way regardless of who uses it or how they use it. The consistency of its data and its contract should always be maintained.

For example, if a library is developed for reading and writing data in a given format, it should not allow any user to do more than that. If the main API calls two public classes, one for validation and another for reading and writing data, then the library can be misused. A user can call the public read/write class directly and bypass all the validations, which can lead to many inconsistencies in the data. To prevent that, every class and method other than the main API should be given the minimum privileges it needs, so that code outside the library cannot access it.
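As a minimal sketch of that idea, assume a hypothetical `RecordStore` library class whose only public operation is `save`. The names `RecordStore`, `Validator`, and `Writer` are invented for illustration. The validation and writing helpers are private nested classes, so a caller has no way to reach the writer without going through validation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical library entry point; the names here are illustrative only.
final class RecordStore {
    private final List<String> records = new ArrayList<>();

    // The only way in: validation always runs before the write.
    public void save(String record) {
        Validator.check(record);
        Writer.write(records, record);
    }

    // Read-only snapshot, so callers cannot modify the stored data directly.
    public List<String> contents() {
        return Collections.unmodifiableList(new ArrayList<>(records));
    }

    // Private: not reachable from outside RecordStore.
    private static final class Validator {
        static void check(String record) {
            if (record == null || record.trim().isEmpty()) {
                throw new IllegalArgumentException("record must not be blank");
            }
        }
    }

    // Private: nobody can write while skipping the validator.
    private static final class Writer {
        static void write(List<String> sink, String record) {
            sink.add(record);
        }
    }
}
```

In a larger library, the same effect could be had with package-private classes in the library's own package; the point is only that the helpers are not part of the public surface.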

Suppose the developer of the library changes one internal method from private to public. It could then be used by many developers. Writing comments and clearly mentioning it in the release notes will not stop developers from using that public method. If the method does its processing without performing certain validations, it may leave the application or its data in an inconsistent state. If the code is sufficiently complex, debugging becomes very difficult. Most of the time, developers debug on the assumption that the problem is in their own code, and they do not try to debug the underlying libraries. If the inconsistency was caused by the library, the user will spend significant time debugging.

Some non-technical people say that developers should read the documentation and implement accordingly. They should understand that if a developer wastes significant time on a problem in a library that could have been avoided in the first place, that developer will never support the library again.

If you are supposed to expose an API for certain functionality, and you decide to develop two libraries, where the first is exposed to the user and the second is used by the first, then understand that you are making a mistake (or maybe a blunder), unless your requirement is one of the rare scenarios that occur in less than 0.01% of cases.

If you develop two libraries and the first uses the second, then most of the time one of the following happens.

Since there are two different libraries, both will have a few public APIs, and any user can call them unless some convoluted security restriction is applied. But applying that kind of restriction makes the application unclean and unmaintainable.

If you do something in the first library and then call the second library to finish it, the user can call the second library directly and finish the work while skipping what the first library does. If the first library contains only validations or similar operations, the user will not see any problem initially, but later the entire application will be in an inconsistent state.
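The bypass described above can be sketched as follows. `OrderLib` and `StorageLib` are hypothetical stand-ins for the first and second library. They share one file here only to keep the example short; in the real situation they would be separate modules, which is exactly why `StorageLib.write` has to be public.

```java
import java.util.ArrayList;
import java.util.List;

// Stands in for the second library. Its write method must be public,
// because the first library lives in a different module.
final class StorageLib {
    static final List<Integer> STORED = new ArrayList<>();

    public static void write(int quantity) {
        STORED.add(quantity); // no validation at this level
    }
}

// Stands in for the first library, the one users are supposed to call.
final class OrderLib {
    public static void saveQuantity(int quantity) {
        if (quantity <= 0) {
            throw new IllegalArgumentException("quantity must be positive");
        }
        StorageLib.write(quantity);
    }
}
```

A caller that invokes `StorageLib.write(-5)` directly compiles and runs without complaint, and the negative quantity is stored. Nothing fails until some later code assumes every stored quantity is positive.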

If you do the final operation in the first library and call the second library for validations beforehand, the user can replace the second library altogether and do everything without the validations. That also leaves the application in an inconsistent state.

If you implement a few operations completely in the first library and call the second library for the others, you should ask yourself why you need to do that. Instead, separate them completely: tell users to use the first library for the first set of operations and the second library for the rest.

In another case, your first library may be just a wrapper that simply calls the APIs of the second library. Then the first library serves no purpose and can safely be removed.

If the first library holds only business logic and is not aware of the database, the second library handles the DB operations, and the operations are complex, then this split can be a disadvantage. An API in the first library that is not aware of the DB may have to handle synchronization, locking, and atomicity itself in that layer. If that logic is moved to the DB layer, the database can handle all of those concerns, and the developer does not need to handle them.
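To illustrate the extra burden, here is a small sketch of a business-logic class that has to guard its own check-then-update step. `SeatBooking` is a hypothetical example, and the in-memory map merely stands in for whatever storage the second library would provide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical business-logic class that knows nothing about a database.
final class SeatBooking {
    private final Map<String, Integer> seatsLeft = new HashMap<>();

    SeatBooking(String show, int seats) {
        seatsLeft.put(show, seats);
    }

    // The check and the update must happen as one atomic step.
    // With no database transaction available, this layer has to lock.
    public synchronized boolean reserve(String show) {
        int left = seatsLeft.getOrDefault(show, 0);
        if (left == 0) {
            return false; // sold out
        }
        seatsLeft.put(show, left - 1);
        return true;
    }

    public synchronized int remaining(String show) {
        return seatsLeft.getOrDefault(show, 0);
    }
}
```

The `synchronized` keyword only protects callers inside one process. If the same check-then-update lived in the DB layer, it could presumably be done inside a transaction and the database would take care of the locking.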

If you are using two libraries for essentially a single purpose, think many times before going ahead. Of course, this does not apply if the two libraries run in different security contexts. For example, if you are exposing a web service that uses another library internally, you don't need to worry about that library, because the user cannot call it directly. The advice applies only to libraries that share the same security context.

Having said that, I can see one very good advantage in having multiple libraries. We can write extraordinary design docs with many flowcharts, component diagrams, and class diagrams. Those who do not write code full time will be overwhelmed by them and appreciate them very much, and you may get a promotion immediately.
