Mar 30, 2010

The Problem of Incomplete Javadocs

Good and comprehensive documentation is crucial for the success of open source software. But creating such documentation takes time and energy, is boring and has almost no immediate rewards. Consequently, documentation of open source frameworks is (too) often incomplete or outdated.

However, whenever there are users of a framework there is example code that uses the framework's API. And if there is example code, the question arises whether information about how to use the framework's API can be extracted directly from example code.

We think so, and have therefore started to study how existing documentation can be complemented with automatically mined documentation. So far we have concentrated on mining the documentation needed by developers who plan to extend a given base class, and we generate what we call "subclassing directives" from code.

In a nutshell, subclassing directives are generalizations of frequently made observations in code, such as "Subclasses of Wizard always override its method addPages()" or "Reimplementors of Dialog.createContents() may call its super implementation." Our findings are summarized in our paper "Mining Subclassing Directives", published at the 7th Working Conference on Mining Software Repositories 2010, which takes place in May 2010. The Extended Javadoc View presented here is a result of this research work.

This post describes the basic concepts behind the Extended Javadoc View, gives some examples of how mined documentation could be integrated non-intrusively into Eclipse, and explains how others may extend the view with their own documentation providers. Please note that this project is still a work in progress: much more work is ongoing (see the Sketchbook page about the proposal), and we appreciate your feedback.

The Extended Javadoc View

The Extended Javadoc View is essentially an aggregator of different information sources for a single code element such as a class, method, field or parameter. It is designed as a replacement for the existing Eclipse Javadoc View and provides basically the same functionality. Let's walk through the existing documentation providers.

Javadoc Tab

The screenshot below shows the view displaying the standard Javadoc information of the JFace Dialog class.

But replacing one view with another is not a big deal. The interesting part comes with the other tabs in the view: Subclassing Directives and Subclassing Patterns. These tabs contain mined information about how developers typically extend the selected code element. Let's look at the Subclassing Directives tab in more detail now.

Subclassing Directives Tab

As said above, subclassing directives are generalizations of frequently made observations in example code, such as "Subclasses of Wizard always override its method addPages()" or "Reimplementors of Dialog.createContents() may call its super implementation". The screenshots below give two examples of how these mined directives are presented to a user.
The first screenshot gives a quick summary of which methods are typically overridden by subclasses of JFace Wizard. The second screenshot shows a detailed look at Wizard's addPages() method and tells a developer which methods are frequently called within the control flow of addPages(), namely Wizard.addPage() and Wizard.addPages(). For both methods, the percentage shows how frequently they have actually been called, so that developers can decide whether these methods are relevant for them and their task at hand.
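
To make the idea of an override directive more concrete, here is a minimal sketch of how such percentages could be computed. It is not the actual mining implementation: the class name `OverrideDirectives`, the method `overrideFrequencies`, and the sample data are all invented for illustration, and it assumes that each observed subclass has already been reduced to the set of method names it overrides.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch, not the real mining engine.
public class OverrideDirectives {

    /**
     * For each method name, returns the share of observed subclasses
     * that override it (a value between 0.0 and 1.0).
     */
    public static Map<String, Double> overrideFrequencies(List<Set<String>> observedSubclasses) {
        // Count in how many subclasses each method is overridden.
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> overridden : observedSubclasses) {
            for (String method : overridden) {
                counts.merge(method, 1, Integer::sum);
            }
        }
        // Turn the counts into relative frequencies.
        Map<String, Double> frequencies = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            frequencies.put(e.getKey(), e.getValue() / (double) observedSubclasses.size());
        }
        return frequencies;
    }

    public static void main(String[] args) {
        // Invented sample: four Wizard subclasses and the methods they override.
        List<Set<String>> wizards = List.of(
            Set.of("addPages", "performFinish"),
            Set.of("addPages", "performFinish", "dispose"),
            Set.of("addPages", "performFinish"),
            Set.of("addPages", "performCancel"));

        overrideFrequencies(wizards).forEach((method, share) ->
            System.out.printf("%s: %.0f%%%n", method, share * 100));
    }
}
```

A method overridden by every observed subclass would then be reported along the lines of "always override", while rarely overridden methods would get a weaker wording. The same counting could be applied to the methods called within the control flow of an overriding method.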

Such subclassing directives are currently mined for almost all Eclipse 3.5 classes for which extensions could be found in our example code base.
However, displaying which methods to override and to call is just one thing you can do with an extended documentation provider. Let's look at the Subclassing Patterns tab in more detail.

Subclassing Patterns Tab

Subclassing patterns group the observed extensions of a base class into typical extension patterns, i.e., they cluster subclasses by similarity to find patterns in the data. To illustrate the results, look at the screenshots below. The first picture shows the frequent subclassing patterns found for the JFace ViewerComparator class. It states that typically either the method ViewerComparator.compare() or ViewerComparator.category() is overridden, but typically not both at the same time (even though this is possible). It also states that extenders mostly stick with the first pattern (~82%) and follow the second pattern in only about 19% of the cases.
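
As a rough illustration of how such pattern shares can arise, the sketch below groups subclasses by the exact set of methods they override and reports how often each group occurs. This is a deliberate simplification and an assumption on our side for this example: the view clusters subclasses by similarity, whereas this code only merges identical override sets. The class name `SubclassingPatterns`, the method `patternShares`, and the sample data are invented.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: exact-match grouping instead of real similarity clustering.
public class SubclassingPatterns {

    /**
     * Groups subclasses that override exactly the same methods and returns,
     * for each such pattern, its share among all observed subclasses.
     */
    public static Map<Set<String>, Double> patternShares(List<Set<String>> observedSubclasses) {
        // Count how many subclasses show each override set.
        Map<Set<String>, Integer> counts = new LinkedHashMap<>();
        for (Set<String> overridden : observedSubclasses) {
            // TreeSet gives a stable, sorted representation of the pattern.
            counts.merge(new TreeSet<>(overridden), 1, Integer::sum);
        }
        // Convert the counts into shares of the whole sample.
        Map<Set<String>, Double> shares = new LinkedHashMap<>();
        counts.forEach((pattern, n) ->
            shares.put(pattern, n / (double) observedSubclasses.size()));
        return shares;
    }

    public static void main(String[] args) {
        // Invented sample: five ViewerComparator subclasses.
        List<Set<String>> comparators = List.of(
            Set.of("compare"),
            Set.of("compare"),
            Set.of("category"),
            Set.of("compare"),
            Set.of("compare"));

        patternShares(comparators).forEach((pattern, share) ->
            System.out.printf("%s: %.0f%%%n", pattern, share * 100));
    }
}
```

With exact-match grouping, a subclass that overrides one extra helper method ends up in a pattern of its own, which is why grouping by similarity rather than by identity is the more useful approach on real code.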

Some patterns can also be found for the JFace Dialog class. Here developers typically override the createDialogArea() method and often the methods okPressed() and configureShell(). However, other patterns also exist that respond directly to buttonPressed events.

For JFace Wizard two patterns can be found: The standard pattern (overriding performFinish and addPages) and a mixture of several other ways of extending Wizard.


Venturing a Look at the Future

In my opinion, the current Extended Javadoc View is an interesting approach that shows what can be found in client code. Much more can be found in code that might be used to enrich existing documentation, but to make this come true many more aspects need to be considered. We are currently working on a draft for a task-oriented, crowd-sourced API documentation that is based (at least partially) on mined documentation. How do you feel about that? Does this sound interesting for Eclipse? We appreciate your comments!

10 comments:

  1. Nice view, however, "should not override dispose" is clearly a wrong suggestion. Whenever they allocate some resources in the wizard, they *have to* dispose of them.

  2. You're right. The initial idea was to use RFC 2119 terms MUST, SHOULD, MAY, SHOULD NOT, and MUST NOT.

    Then SHOULD NOT means "that there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label."

    But your example shows that SHOULD NOT is misleading and should be replaced by something more meaningful like "in special cases".

    Thanks for pointing this out :)

  3. Very nice view ! Thanks for the presentation and the link.

    "Does this sound interesting for Eclipse?"

    Of course!

    I'm working with Eclipse 3.3, so I can't test it. I'm wondering whether the view presents the information quickly. I read that a database is needed. Are there any performance issues?

  4. I added a little section "How it Works Under the Hood" which hopefully answers your performance and implementation question:

    http://code.google.com/p/code-recommenders/wiki/ExtendedJavadoc#How_it_Works_Under_the_Hood


    Eclipse 3.3... I wonder how many people are still using 3.3 and 3.4. Are there any statistics about that somewhere? If a significant number of people are still using previous versions of Eclipse, should I reconsider my packaging and make things available for 3.3 too?

    Thanks.

  5. Very interesting! I think generating documentation is a great approach in general to solve the documentation problem (being outdated and all). Your approach of learning from usage patterns is particularly interesting and I could imagine it also being useful in other settings than what you describe (large frameworks), like for generating documentation from few canonical examples or tests.

  6. Great view! Will use it in the future.

  7. Nice work and screenshots.
    Is it possible to separate the engine which collects the statistics (patterns) from the UI?
    I was thinking of integrating it with another IDE (Netbeans?).
    Thanks

  8. Thierry,
    integrating with Netbeans sounds great! The analysis, however, is just a small part of the work. The setup – especially getting the code examples in an easy consumable way for the analysis – is the major challenge. Is there an equivalent to p2 update sites - like the one of Yoxos - for Netbeans?

    Anyway, we are currently investigating concepts that allow everyone to contribute to the view directly from within his or her IDE, i.e., there will be a light-weight client-side analysis engine that works on the projects in your workspace. And your help with developing Netbeans integrations or client-side engines in general is highly appreciated!

    If you would like to start a larger discussion, just join the code-recommenders mailing list and discuss your ideas with us there: http://groups.google.com/group/code-recommenders

    All the best,
    Marcel

  9. The problem with "must" and "should" is that they're prescriptive statements. Since the Subclassing Directives tab only describes actual usage tendencies, I'd suggest changing the terms to "always", "usually", "sometimes", "rarely" and "never".

    That way, when users are dealing with a special case or trying to remove an anti-pattern from a large code base, they won't be put off or misled by the prescriptive statements; and the rest of the time, they'll infer them.

  10. @SeaHen, excellent point. Would you open an issue at https://bugs.eclipse.org/bugs/enter_bug.cgi?product=Recommenders to track progress on this?
