Jul 4, 2012

Code Recommenders 1.0 - Code Completion on Steroids for Eclipse Juno


With Eclipse Juno, Code Recommenders shed its incubation status and released v1.0.0 to the public! In this post, we give a quick overview of all the shiny new code completion engines we've built for Eclipse in the past months, and we introduce those who haven't heard of Code Recommenders yet to the idea of mining source code (and more) to build intelligent IDEs. This article is a copy of the Java Tech Journal article written for the Eclipse Juno Special Edition. Get a copy to read more about other great innovations in Eclipse 4.2!

Teams and technologies change. Embrace it.

Developers and CTOs know it: frequently changing teams and the “liberal” use of the latest and greatest technologies can make your burndown charts and project cost estimates burst within just a few weeks. To compensate for the brain drain and to lower the barrier to entry, team leads reserve a fair amount of their team’s time for documenting software, doing reviews, and pair programming in front of one screen, so that knowledge about how to use APIs spreads evenly across the entire team, and to newcomers in particular. If done right, this is the best a team lead can do to make the team effective in the long run. But Code Recommenders thinks there is more a team lead can do...

Leveraging the hidden gems in your source code

API designers typically have some expectations as to how developers should use their APIs, i.e. they expect their clients to call certain methods at certain points in time and in a particular order. The challenge to all newcomers is to learn about these implicit expectations and API usage rules fast enough to be of help to the team they joined.

Sure, the API documentation may contain the necessary information. Somewhere. But between you and me: How often do you read the API documentation of, say, JButton with its 381 methods to figure out how to use it? Wouldn’t you rather use Google to find a code snippet that does what you need? Or, if Google can’t help because you are programming against an in-house library, wouldn’t you rather look at your existing code base to see how your colleagues successfully used that API before? And sometimes you just scroll through the code completion pop-up to see which proposal sounds best for what you are trying to achieve, right?

There is nothing wrong with that; it’s just horribly ineffective and costly. And it suggests that API documentation, as it exists today, is of only limited use to developers.

This is where Eclipse Code Recommenders comes in. Code Recommenders is an extension to Eclipse’s Java Development Tools that analyzes the code of existing applications, extracts common patterns of how other developers have used and extended certain APIs, and feeds this knowledge back into your IDE in the form of (i) intelligent code completion, (ii) extended API documentation, (iii) sophisticated example code search, and even (iv) bug detection tools, all powered by the implicit knowledge of the programming masses. If you like, you can think of Code Recommenders as bringing the idea of Web 2.0 into your IDE, or, as we sometimes call it: Code Recommenders is about creating the IDE 2.0.

The remainder of this article gives a short overview of the Code Recommenders completion engines that ship with Eclipse Juno as part of the Eclipse Java Developer package and the Eclipse RCP package. If you start with any other package, you can install Code Recommenders from the Juno Release Train update site.

Recommenders’ Code Completion Engines
Code Recommenders 1.0 adds five new code completion engines to Eclipse:

  • Intelligent Call Completion
  • Intelligent Code Snippet Completion
  • Intelligent Overrides Completion
  • Call Chain Completion
  • Subwords Completion


Intelligent Call Completion

The Intelligent Call Completion engine probably illustrates the idea of Code Recommenders best. Framework APIs are frequently complex. For illustration, consider the public API of javax.swing.JButton, which consists of 381 (!) public methods. This is a huge API of which a developer typically needs to know only a small subset. The remaining, say, 360 methods unnecessarily bloat the API (from an API user’s viewpoint) and thus increase the burden of learning and using it.

All potential completions on JButton

This is where Code Recommenders’ Intelligent Call Completion comes in. It assists developers by recommending only those methods that are actually relevant to the task at hand. For instance, the fact that a developer just created a text widget tells Code Recommenders which methods that developer will most likely want to call next, even if the developer doesn’t know it (yet):



Recommenders' intelligent Completion on SWT Text after calling 'new Text()'
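To make the principle concrete, here is a minimal sketch of frequency-based call ranking. It is not how Code Recommenders computes its recommendations; the real models are learned from the Eclipse code base and are more sophisticated. The sketch assumes a handful of hypothetical mined usages of an SWT Text, each recorded as the set of methods called on one instance after construction, and proposes every method that was called in at least a given fraction of those usages.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CallRecommender {

    /**
     * Ranks methods by the fraction of observed usages that call them and keeps
     * only those at or above minProbability. Each usage is the set of method
     * names invoked on one object after it was created.
     */
    static List<Map.Entry<String, Double>> recommend(List<Set<String>> usages, double minProbability) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> usage : usages) {
            for (String method : usage) {
                counts.merge(method, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Double>> ranked = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            double probability = (double) e.getValue() / usages.size();
            if (probability >= minProbability) {
                ranked.add(Map.entry(e.getKey(), probability));
            }
        }
        // Most likely calls first; ties broken alphabetically so the order is stable.
        ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Double>comparingByKey()));
        return ranked;
    }

    public static void main(String[] args) {
        // Hypothetical usages of an SWT Text widget observed right after 'new Text(...)'.
        List<Set<String>> usages = List.of(
                Set.of("setText", "setLayoutData"),
                Set.of("setText", "addModifyListener"),
                Set.of("setText", "setLayoutData", "addModifyListener"),
                Set.of("setLayoutData"));
        for (Map.Entry<String, Double> e : recommend(usages, 0.5)) {
            System.out.printf("%-18s %3.0f%%%n", e.getKey(), e.getValue() * 100);
        }
    }
}
```

With a threshold of 0.5, setLayoutData and setText (75% each) rank ahead of addModifyListener (50%); raising the threshold to 0.6 hides the rarer call.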

As of this writing (1.0.0), Code Recommenders partially supports the Java Standard Library, namely the main packages under java.* and some packages under javax.*. Because the recommendation models are generated only from the Eclipse Juno Release Train code base, packages like java.awt or javax.swing are not yet supported: no usage data for them was available at generation time.


Intelligent Code Templates

Code templates are helpful when code is needed to iterate over an array of objects or when creating a getter for a property of a class. But code templates really shine when developers have to use APIs they are not familiar with. Code templates then serve as additional documentation that quickly shows how to use an API, and thus can save developers a lot of time that would otherwise be needed for reading API documentation.

Eclipse maintains more than 70 such Java code templates, ranging from simple loops to complex API usage patterns like creating an SWT Button or Composite. Unfortunately, developing templates for API usage patterns is an extremely costly and tedious job; consequently, only a few templates exist for complex APIs like JFace, the Eclipse UI, or even the Java Standard Library.

This is where Code Recommenders comes in again. In the previous section we showed how to recommend single methods to invoke on an object. Code Recommenders’ Templates Completion takes this to the next level by recommending not only single methods but complete sets of methods:

Intelligent template completion on JDT ASTParser
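As a rough illustration of the step from single calls to complete call sets, the following sketch counts identical call sets across hypothetical mined usages of ASTParser and renders the most frequent one as template text. The counting scheme, the usage data, and the "(...)" argument placeholders are assumptions for illustration only; Code Recommenders' own pattern mining is not shown here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TemplateRecommender {

    /** Counts identical call sets and returns the most frequent one (a larger set wins a tie). */
    static Set<String> mostFrequentPattern(List<Set<String>> usages) {
        Map<Set<String>, Integer> counts = new HashMap<>();
        for (Set<String> usage : usages) {
            counts.merge(usage, 1, Integer::sum);
        }
        Set<String> best = Set.of();
        int bestCount = 0;
        for (Map.Entry<Set<String>, Integer> e : counts.entrySet()) {
            int count = e.getValue();
            if (count > bestCount || (count == bestCount && e.getKey().size() > best.size())) {
                best = e.getKey();
                bestCount = count;
            }
        }
        return best;
    }

    /** Renders a call set as template text; calls are sorted by name, NOT in execution order. */
    static String render(String variable, Set<String> pattern) {
        StringBuilder template = new StringBuilder();
        for (String method : new TreeSet<>(pattern)) {
            template.append(variable).append('.').append(method).append("(...);\n");
        }
        return template.toString();
    }

    public static void main(String[] args) {
        // Hypothetical mined usages of org.eclipse.jdt.core.dom.ASTParser.
        List<Set<String>> usages = List.of(
                Set.of("setKind", "setSource", "createAST"),
                Set.of("setKind", "setSource", "setResolveBindings", "createAST"),
                Set.of("setKind", "setSource", "createAST"),
                Set.of("setSource", "createAST"));
        System.out.print(render("parser", mostFrequentPattern(usages)));
    }
}
```

Because a set carries no order, the sketch sorts the calls alphabetically and therefore places createAST before the parser is configured, which is the kind of ordering problem the warning further down describes.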


As you have probably noticed in the example above, Recommenders' template completion can be applied to existing usages, but it also works on type names, as the example below shows:

Intelligent Template Completion on JFace TableViewer type name

After applying the template, the final result looks like this:
Resulting code snippet for JFace TableViewer


Warning: The generated template proposals do not reflect method execution ordering constraints, i.e. you may have to reorder the proposed method calls manually after insertion.


Intelligent Overrides Completion

Similar to recommending method calls, one can also recommend which methods a developer should typically override. This is what Recommenders’ Intelligent Overrides Completion does. There is much more to say about how classes can be extended, but we’ll save this for another article about Code Recommenders’ Extended Documentation Platform – a platform that extracts valuable (extension) patterns in code and enriches existing API documentation with these patterns.

Recommenders' intelligent overrides completion on JFace Dialog
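One simple way to model this, sketched below under our own assumptions, is to look only at subclasses that override the methods the developer has already overridden and to report how often each remaining method is overridden among them. The method names are methods available on org.eclipse.jface.dialogs.Dialog, but the subclass data is made up, not mined from any code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class OverridesRecommender {

    /**
     * Estimates, for each method not yet overridden, how often it is overridden by
     * subclasses that also override everything in alreadyOverridden.
     */
    static Map<String, Double> recommend(List<Set<String>> subclasses, Set<String> alreadyOverridden) {
        // Only subclasses consistent with the developer's current code count as evidence.
        List<Set<String>> matching = new ArrayList<>();
        for (Set<String> overrides : subclasses) {
            if (overrides.containsAll(alreadyOverridden)) {
                matching.add(overrides);
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> overrides : matching) {
            for (String method : overrides) {
                if (!alreadyOverridden.contains(method)) {
                    counts.merge(method, 1, Integer::sum);
                }
            }
        }
        // Counts are empty when nothing matches, so no division by zero can happen here.
        Map<String, Double> likelihood = new TreeMap<>();
        counts.forEach((method, count) -> likelihood.put(method, (double) count / matching.size()));
        return likelihood;
    }

    public static void main(String[] args) {
        // Hypothetical overrides found in four subclasses of org.eclipse.jface.dialogs.Dialog.
        List<Set<String>> subclasses = List.of(
                Set.of("createDialogArea", "configureShell", "okPressed"),
                Set.of("createDialogArea", "okPressed"),
                Set.of("createDialogArea", "createButtonsForButtonBar"),
                Set.of("configureShell"));
        // The developer has already overridden createDialogArea.
        System.out.println(recommend(subclasses, Set.of("createDialogArea")));
    }
}
```

Conditioning on what is already overridden is the design choice here: once createDialogArea is in place, okPressed (two out of three matching subclasses) becomes the strongest proposal.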


Subwords Completion

Another noteworthy completion engine is Code Recommenders' Subwords Completion. As an experienced Eclipse user, you probably know JDT’s CamelCaseCompletion. This engine is nice but requires you to remember the exact uppercase letters of the proposal you want to insert.

Subwords makes this more convenient. The idea is simple: you do not have to type a name from its beginning to find a match in the content assist pop-up. This helps when you do not know whether you have to “find” or “get” an element:
Recommenders' Subwords completion on JDT's CompilationUnit

It is even sophisticated enough to understand rough shorthands, e.g. 'dclr' for 'declaration', or combinations of words such as 'ty + dclr', which finds all proposals containing the words 'type' and 'declaration':

Recommenders' Subwords completion on JDT's AST
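To give an idea of how such matching can work, here is a minimal sketch that treats the typed text as a case-insensitive subsequence of the proposal name. This is our simplification, not Subwords' actual matching and ranking strategy, and it assumes the 'ty + dclr' combination is typed as a single token, tydclr. Pure subsequence matching is deliberately permissive, so a real engine would also need a ranking step to put the best matches first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SubwordsMatcher {

    /** True if every character of 'typed' occurs in 'candidate' in the same order, ignoring case. */
    static boolean matches(String typed, String candidate) {
        String t = typed.toLowerCase(Locale.ROOT);
        String c = candidate.toLowerCase(Locale.ROOT);
        int from = 0;
        for (int i = 0; i < t.length(); i++) {
            int found = c.indexOf(t.charAt(i), from);
            if (found < 0) {
                return false;
            }
            from = found + 1; // keep searching after the matched character
        }
        return true;
    }

    /** Keeps the proposals that match, preserving their original order. */
    static List<String> filter(String typed, List<String> proposals) {
        List<String> result = new ArrayList<>();
        for (String proposal : proposals) {
            if (matches(typed, proposal)) {
                result.add(proposal);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A few factory method names of org.eclipse.jdt.core.dom.AST.
        List<String> proposals = List.of("newTypeDeclaration", "newTypeDeclarationStatement",
                "newVariableDeclarationFragment", "newSimpleName");
        System.out.println(filter("dclr", proposals));
        System.out.println(filter("tydclr", proposals));
    }
}
```

On these four names, "dclr" keeps the three declaration factories, while "tydclr" narrows the list to the two type declaration factories.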


Note: Subwords does not fall into the group of intelligent completion engines, i.e. it does not need any training data and thus works out of the box with any framework or API.


Also note that Subwords completion replaces JDT's standard Java content assist and cannot be used together with JDT's proposal computer. Visit the Subwords preference page to enable Subwords in your installation.


See Deepak Azad's original blog post for a more detailed description. Thanks to Deepak for the permission to reuse it.


Chain Completion

The last engine I’d like to introduce is Recommenders’ Chain Completion. Sometimes you need to access objects that can be reached by invoking several method calls in a row – so-called call chains. Usually you have to find these call chains yourself by traversing the API call graph manually, and evaluating whether each potential chain may return an instance of the required type.

Code Recommenders’ call chain completion automates this for you. It quickly traverses the whole API call graph and finds all possible paths through the API that may return an appropriate object:

Recommenders' Chain completion on IStatusLineManager in a ViewPart
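The search itself can be pictured as a bounded depth-first traversal over a graph whose nodes are types and whose edges are methods returning another type. The sketch below runs on a tiny, hand-written excerpt of the Eclipse workbench API (inheritance and parameters are ignored); the real engine works on JDT's type information, which is not modelled here. Chains are capped at a maximum length, and types already on the current path are skipped so that cycles in the API cannot cause endless recursion.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ChainFinder {

    /** One edge of the API graph: calling 'method' on the owning type returns 'returnType'. */
    static final class Call {
        final String method;
        final String returnType;

        Call(String method, String returnType) {
            this.method = method;
            this.returnType = returnType;
        }
    }

    /** Finds every call chain of at most maxDepth calls that leads from 'from' to 'target'. */
    static List<List<String>> findChains(Map<String, List<Call>> api, String from, String target, int maxDepth) {
        List<List<String>> chains = new ArrayList<>();
        Set<String> onPath = new HashSet<>();
        onPath.add(from);
        search(api, from, target, maxDepth, new ArrayDeque<>(), onPath, chains);
        return chains;
    }

    private static void search(Map<String, List<Call>> api, String type, String target, int depthLeft,
                               Deque<String> path, Set<String> onPath, List<List<String>> chains) {
        if (depthLeft == 0) {
            return;
        }
        for (Call call : api.getOrDefault(type, List.of())) {
            path.addLast(call.method + "()");
            if (call.returnType.equals(target)) {
                chains.add(new ArrayList<>(path));
            } else if (onPath.add(call.returnType)) { // skip types already on this path (cycles)
                search(api, call.returnType, target, depthLeft - 1, path, onPath, chains);
                onPath.remove(call.returnType);
            }
            path.removeLast();
        }
    }

    /** A tiny, simplified excerpt of the Eclipse workbench API (inheritance and parameters ignored). */
    static Map<String, List<Call>> workbenchExcerpt() {
        return Map.of(
                "ViewPart", List.of(new Call("getViewSite", "IViewSite"), new Call("getSite", "IWorkbenchPartSite")),
                "IViewSite", List.of(new Call("getActionBars", "IActionBars"),
                        new Call("getWorkbenchWindow", "IWorkbenchWindow")),
                "IWorkbenchPartSite", List.of(new Call("getWorkbenchWindow", "IWorkbenchWindow")),
                "IActionBars", List.of(new Call("getStatusLineManager", "IStatusLineManager")));
    }

    public static void main(String[] args) {
        for (List<String> chain : findChains(workbenchExcerpt(), "ViewPart", "IStatusLineManager", 3)) {
            System.out.println(String.join(".", chain));
        }
    }
}
```

On this excerpt, the sketch finds the single chain getViewSite().getActionBars().getStatusLineManager().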


Note: Template and chain completion do not run on the default content assist list; they deactivate themselves if they detect that they have been placed there. Put them on the second or third content assist list instead, and press Ctrl+Space twice (or more) to cycle to the subsequent lists.


What’s coming next?

The Code Recommenders completion engines we introduced in this post are just a teaser. Many more exciting features are coming, such as an extended (mined) documentation platform, personalized code search engines, a code snippet miner, stack trace search engines, and many more tools. Keep an eye on this project; there are exciting things in the pipeline. Promised.


Thoughts and new ideas for Code Recommenders?

We are curious to hear how you like these new completion engines and what features you'd like to see for the next release. The Recommenders forum is a good place to discuss new ideas - or Twitter for really short ones.

Mar 28, 2012

BOF on Eclipse Code Recommenders at ECON

I managed to get the last open space for a BOF about Code Recommenders. It will take place from 8:30 to 9:30pm in Reston Suites B. I plan to give a somewhat broader introduction to how Code Recommenders works and to the other tools we are working on but could not demo in the session.

I'm planning this for 20 minutes, which will leave a fair amount of time for discussing existing features, future plans, and new ideas.

Looking forward to meeting you in Reston Suites B tonight.

P.S.: A slightly different but extended version of the slides of the Econ Code Recommenders talk is available here.

Thanks Eclipse Community!

The Code Recommenders team says thank you! to the Eclipse community for this shiny award:


Origin: Anne Jacko EclipseCon 2012 Flickr photo series

We are very proud of this award, and we promise that we won't stop pushing the limits of applying crowd-sourcing and machine learning inside our favorite IDE to make developers' lives easier. To all EclipseCon attendees: If you haven't seen Code Recommenders in action yet, now is the time. Marcel Bruch demos Code Recommenders tomorrow, Wednesday, at 13:30 in the Grand Ballroom BC and gives a preview of what will be in Eclipse Juno.

Don't miss that event!


The Code Recommenders team, Hugin, and Munin say thank you! again







Feb 21, 2012

Eclipse Code Recommenders 0.5 released!


We are happy to announce our first official release (don't call it a 'drop' anymore) of Eclipse Code Recommenders. After a silent v0.4 release drop, we dropped, pardon, released Recommenders v0.5 today with a huge set of changes. Enjoy the release!


Bugzilla Summaries


New Feature Namespaces
We outgrew our initial naming scheme. For v0.5 we changed the namespace of our features and plug-ins completely. As a result, updating through the update manager won't work as expected: you have to uninstall previous versions and install Code Recommenders from one of our update sites.

No Project Builders!
This is probably the most important change in this release. Earlier versions of Code Recommenders required you to activate Recommenders on a per-project basis. On activation, a builder was added to the project that analyzed file changes on save, or whenever Eclipse decided to run the builders (workspace refresh etc.). The results of these analyses were stored in a ${project.home}/.recommenders/data folder. With v0.5 this is history.
Code Recommenders now analyzes all compilation units on the fly whenever code completion is triggered. Developers can now use Code Recommenders in any project without changing any project configuration file. You can configure the completion engines' behavior in "Preferences » Java » Editor » Content Assist » Advanced".

Important:
Please note that currently only Eclipse APIs like JFace, SWT, JDT, Workbench etc. are supported. Adding support for other frameworks is planned for v0.6 and v0.7.

Completion Engines solely based on JDT
Related to the previous change, we switched from the WALA bytecode analysis toolkit to Eclipse JDT as the primary analysis tool for code completion. WALA will, however, remain the analysis toolkit of choice for sophisticated code analysis and data exports; there is not yet a better toolkit available that suits our needs.

Call Chain Completion solely based on JDT
With v0.5, Code Recommenders ships a new version of its chain completion engine completely based on JDT. Thanks to Marko Martin, who contributed this to Code Recommenders. Since this code is a major rewrite of the existing engine, please report any issues you experience to Bugzilla.

Extended Documentation Platform
Based on Stefan Henss' work in last year's Google Summer of Code project, this release ships with an updated version of Stefan's Extdoc View. Internally, we changed the code so that you can easily extend it. Check the Extended Documentation Platform ISV Documentation for more details.

Improved Subwords Completion
We've made a lot of small changes to Subwords. The most significant one is that you can use it as a replacement for the Java completion engine (go to Preferences » Java » Editor » Content Assist » Advanced to activate and deactivate the Subwords and Java proposal engines).


[HEAD] For assignments, subwords completion now takes into account the variable name of the assignment.


The exact implementation of the matching and ranking strategy is subject to change. If you have ideas on how to improve the current strategy, look at the code and send your ideas to the forum.

Temporary Deactivation of Template Completion
In v0.2 we added a new template completion engine to Code Recommenders. Due to the large internal refactorings, this engine was left out of this release but will be added again in v0.6.

What else is in the pipe? Code examples search!
This time Code Recommenders ships without official code search support. However, there is some work going on in this area. Previous updates offered a first draft of a code search client in the experimental section of the update site. For v0.5 we removed this feature from the update sites, as it is currently being developed by Tobias Boehm, a student at Darmstadt University of Technology, as part of his Master's thesis. For the curious, the sources can be found in Code Recommenders' Labs repository, and the latest prototype can be installed from this update site.

Just a collection of screenshots to see what Tobias is working on and what's coming soon...


Code Examples Search: Select a local variable in your code, and the "code examples" extdoc provider shows you all code snippets that use a variable of the same type as the selected one, used or defined in a way similar to your code.

Free-style code search query window: predefined queries are not good enough? Query for methods, types, or try-catch blocks that use an arbitrary number of certain types, methods, or literals, as you like... there are almost no limits.

Enjoy it, but keep in mind it's an early prototype. For instance, it currently consumes quite a lot of disk space due to some verbose indexing settings. Please send all your comments and requests regarding local code search to the forum. Aaron, not all your queries can be expressed yet. But we have them in mind ;)


Enjoy this release!

Dec 2, 2011

How should code search work?

Hey all. As you probably know, this is the Code Recommenders project blog, which we typically use to announce new features, releases, or other noteworthy topics related to Code Recommenders. This blog post is a bit different from the others: It's our first "guest blog post", written by Tobias Boehm, a master's student writing his master's thesis in the scope of the Code Recommenders project.
This post is basically a brain dump of how he thinks code search engines like Google Code Search, Krugle, or Koders *should* work and how we should be able to use them. He introduces an early prototype of a code search query language (which he will implement using Xtext) and a client tightly integrated into the Eclipse IDE. His work builds on the code search engine and Eclipse client we already blogged about a few months ago in "Why is Google Codesearch not google for code search?"


This blog post is a "heads-up! Your feedback is wanted!" post. So please do not hesitate to ask tough questions or provide any other kind of feedback. All kinds of feedback are appreciated and will help Tobias catch idea bugs early! If you are interested in joining the work on code search engines, get in contact via the Code Recommenders forum. We are looking forward to your feedback.


Thanks,
Marcel



How should code search work?
We all know that learning an API is hard. But we do it day by day by day... When learning a new framework, we often ask things like "Which classes should I extend?", "Which methods should I invoke on this object?", "How can I create an instance of this particular type?", or "What does other people's code that is similar to mine look like?". Hopefully, some documentation is available that answers these kinds of questions. But how many frameworks do you know that have such excellent documentation? If you know a few: How many do you know that don't?

Let's assume that there is some documentation somewhere. Who wants to work through heaps of documents just to learn how to obtain an instance of the famous IStatusLineManager? In my opinion, the best possible documentation already exists: existing and tested code, available in masses in code repositories where these APIs are actually used, the classes extended, the methods called, and the objects instantiated. The code is there; it just has to be found!

But searching for source code is still a tedious task. Although many search engines exist, and even Google has a product targeted at code search, they are not very useful in certain situations. First, most of them seem to treat source code the way web search engines treat the websites they index: as plain text. While that might make sense for some code search use cases, it is not enough for most others. Moreover, they are too generic to be useful. Most code search engines seem to identify the programming language the code is written in, yet they do not use the language-specific semantics that lie underneath.

And then there is availability. For developers to be able to search source code, that code has to be indexed by the search engine. So the searchable code is always restricted to open source code publicly available on the web, while all personal and company repositories go unused.

Lastly, there is no IDE integration: every time we want to issue a query, we have to shift focus away from our IDE to a website.

In this blog post, I describe my plans for implementing a code search engine with Apache Lucene. I'll go through a set of sample queries and explain what gets indexed and how developers can query the index to solve common day-to-day tasks.

If you are interested in code search and are maybe looking for a good alternative to the soon-to-be-closed Google Code Search, or want to build a code search engine for your own company, please continue reading and don't hesitate to ask questions here or in the Code Recommenders forum.


Query
As said above, the query capabilities of today's code search engines are somewhat limited. Code Recommenders might come to the rescue here. The heart of the prototype currently in development is a novel query language. This query language must be very simple to use; we want to create queries very easily. Yet it must be powerful enough to express all the requests we might have for a code base. What might these requests be? Before we dig into the search criteria, let's take a step back and think about what it is we would like to find. Are we interested in source files? Probably not. Java developers don't think in source files. At the lowest level, we think in classes, methods, and maybe even smaller blocks of code. These, then, are probably the units of code we want to find. Now, what questions might a developer have? She might, for example, want to find methods that have a certain name. The query looks like the following:

METHODS WHERE Name IS "set.*"

This query will return those methods with a name starting with "set". Let's say we are interested in methods that add something to a java.util.Set.

METHODS WHERE CalledMethods CONTAINS {java.util.Set.add}

By combining multiple search criteria and using negation the developer is able to refine the query to get exactly what she needs.

METHODS WHERE ReturnType IS {org.eclipse.jface.action.IStatusLineManager}
AND +IsPublic AND !IsStatic

This query searches for public, non-static methods that should return an instance of IStatusLineManager. With the prefixes "+" and "!", the developer can explicitly mark criteria as mandatory or prohibited. If we omit the prefix, the criterion is optional, and the results we get might not meet it. In this particular example, we want the query to look like this:

METHODS WHERE ReturnType IS {+org.eclipse.jface.action.IStatusLineManager}
AND +IsPublic AND !IsStatic
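The semantics of the prefixes can be illustrated with a small in-memory evaluator. In the sketch below, "+" maps to MUST, no prefix to SHOULD, and "!" to MUST_NOT. These mirror the occurrence types of Lucene's boolean queries, so they would be a natural fit for a Lucene-based engine, but that mapping is our assumption. Methods violating a MUST or MUST_NOT criterion are discarded, and the rest are ranked by how many optional criteria they satisfy. The index entries are invented placeholders, not real API methods.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class QuerySketch {

    /** "+" prefix = MUST, no prefix = SHOULD, "!" prefix = MUST_NOT. */
    enum Occur { MUST, SHOULD, MUST_NOT }

    /** A hypothetical, minimal index entry for one method. */
    static final class MethodInfo {
        final String name;
        final String returnType;
        final boolean isPublic;
        final boolean isStatic;

        MethodInfo(String name, String returnType, boolean isPublic, boolean isStatic) {
            this.name = name;
            this.returnType = returnType;
            this.isPublic = isPublic;
            this.isStatic = isStatic;
        }
    }

    static final class Criterion {
        final Occur occur;
        final Predicate<MethodInfo> condition;

        Criterion(Occur occur, Predicate<MethodInfo> condition) {
            this.occur = occur;
            this.condition = condition;
        }
    }

    /** Drops methods violating a MUST or MUST_NOT criterion, then ranks the rest by satisfied SHOULD criteria. */
    static List<String> search(List<MethodInfo> index, List<Criterion> query) {
        List<MethodInfo> hits = new ArrayList<>();
        Map<MethodInfo, Integer> scores = new HashMap<>();
        candidates:
        for (MethodInfo method : index) {
            int score = 0;
            for (Criterion criterion : query) {
                boolean satisfied = criterion.condition.test(method);
                if (criterion.occur == Occur.MUST && !satisfied) continue candidates;
                if (criterion.occur == Occur.MUST_NOT && satisfied) continue candidates;
                if (criterion.occur == Occur.SHOULD && satisfied) score++;
            }
            hits.add(method);
            scores.put(method, score);
        }
        // Stable sort: equally scored methods keep their index order.
        hits.sort((a, b) -> Integer.compare(scores.get(b), scores.get(a)));
        List<String> names = new ArrayList<>();
        for (MethodInfo method : hits) {
            names.add(method.name);
        }
        return names;
    }

    static final String STATUS_LINE_MANAGER = "org.eclipse.jface.action.IStatusLineManager";

    /** ReturnType IS {[+]IStatusLineManager} AND +IsPublic AND !IsStatic */
    static List<Criterion> statusLineQuery(Occur returnTypeOccur) {
        return List.of(
                new Criterion(returnTypeOccur, m -> m.returnType.equals(STATUS_LINE_MANAGER)),
                new Criterion(Occur.MUST, m -> m.isPublic),
                new Criterion(Occur.MUST_NOT, m -> m.isStatic));
    }

    /** Made-up methods; none of these names refer to a real API. */
    static List<MethodInfo> sampleIndex() {
        return List.of(
                new MethodInfo("lookupActionBars", "org.eclipse.ui.IActionBars", true, false),
                new MethodInfo("lookupStatusLine", STATUS_LINE_MANAGER, true, false),
                new MethodInfo("sharedStatusLine", STATUS_LINE_MANAGER, true, true),
                new MethodInfo("cachedStatusLine", STATUS_LINE_MANAGER, false, false));
    }

    public static void main(String[] args) {
        System.out.println(search(sampleIndex(), statusLineQuery(Occur.SHOULD))); // optional return type
        System.out.println(search(sampleIndex(), statusLineQuery(Occur.MUST)));   // "+" on the return type
    }
}
```

With the optional return type, lookupActionBars still shows up, just ranked below lookupStatusLine; with the "+" prefix, only lookupStatusLine remains.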

This way we can be sure that the methods we find will return IStatusLineManager. What if we - for whatever reason - are interested in public methods that use an IStatusLineManager, are annotated with SuppressWarnings and that should be constructors? Here's the query for that.

METHODS WHERE IsConstructor
AND UsedTypes CONTAINS {+org.eclipse.jface.action.IStatusLineManager}
AND +IsPublic
AND ANNOTATED WITH {+java.lang.SuppressWarnings}

This is really just supposed to be an example of how detailed we can get. There are many more criteria available and they are not bound to just methods. Many of the criteria applicable to methods can be applied to classes too. We might as well search for classes with a certain name.

CLASSES WHERE Name IS "set.*"

With a more complex query, for example, we can search for abstract classes that implement the interface IResourceVisitor, preferably using the type java.util.Set.

CLASSES WHERE +IsAbstract
AND ImplementedInterfaces CONTAINS {+org.eclipse.core.resources.IResourceVisitor}
AND UsedTypes CONTAINS {java.util.Set}

Or how about classes that contain deprecated methods?

CLASSES WHERE
CONTAINS METHOD WHERE ANNOTATED WITH {java.lang.Deprecated}

The list goes on. And we don't stop at classes and methods. Sometimes we would like to bring in more context information. For instance a typical question a developer sometimes asks is how other developers handled a certain exception. Did they close the stream afterwards or did others just log the exception?

CATCHBLOCKS WHERE CaughtType IS {+java.io.IOException}

What we can express here is a question many of us ask ourselves over and over again: What have other developers done in my situation?

Repositories
The quality of the results depends mainly on the quality and the volume of the search index. The public index will consist of many open source types (http://eclipse.org/recommenders/documentation/completion/) that we can offer code examples from. A dilemma arises when we think about the precious code that lies dormant in hundreds of types and thousands of methods in a developer's company repository. These sources are most likely not open source and hence can't be put into a public index. The solution is a private search index that is built and stored inside the company infrastructure. Queries can then be run against the public as well as the private search index.
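Conceptually, running one query against both indexes and merging the results might look like the sketch below. Each index is modelled as a plain function from query text to hits, an assumption that hides all of the actual indexing, and the hit names are hypothetical. The private index is consulted first, so if the same code has been indexed in both places (for example, an open source library mirrored in the company repository), it is reported only once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class FederatedSearch {

    /**
     * Sends the same query to every index (in map iteration order) and merges the hits,
     * labelling each with its origin. A hit already returned by an earlier index is skipped.
     */
    static List<String> searchAll(String query, Map<String, Function<String, List<String>>> indexes) {
        Set<String> seen = new HashSet<>();
        List<String> merged = new ArrayList<>();
        for (Map.Entry<String, Function<String, List<String>>> index : indexes.entrySet()) {
            for (String hit : index.getValue().apply(query)) {
                if (seen.add(hit)) {
                    merged.add(index.getKey() + ": " + hit);
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins; a real index would evaluate the query instead of ignoring it.
        Map<String, Function<String, List<String>>> indexes = new LinkedHashMap<>();
        indexes.put("private", q -> List.of("com.example.billing.InvoiceReader.read",
                "org.example.oss.IoUtil.closeQuietly"));
        indexes.put("public", q -> List.of("org.example.oss.IoUtil.closeQuietly",
                "org.example.oss.Streams.copy"));
        System.out.println(searchAll("CATCHBLOCKS WHERE CaughtType IS {+java.io.IOException}", indexes));
    }
}
```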

IDE integration
Queries of this kind can already be used easily from inside the Eclipse IDE. The editor assists with things such as the query grammar and type resolution. The goal is to make it as efficient as possible to express a query that reflects what the developer is searching for. While a query of this kind can easily be written by hand, that alone does not exploit its full potential. In many use cases where the developer would like to find code examples, the query consists of information that reflects the user's current code context: the interfaces the current class implements, the overridden method we are in, or the uninitialized type we desperately need an instance of. Most of this information could be provided by the IDE. So why shouldn't it? By abstracting further and bundling the search into a few easy-to-understand search types, the query complexity can be hidden underneath plain proposals in the Eclipse content assist popup window. Let's consider the following situation:

IStatusLineManager slm = null;
|<^Space>

We just declared an IStatusLineManager and set it to null. One of the proposals would probably be a search for code that initializes this type in some way; it might find code that, for example, creates an actual instance of IStatusLineManager, or just a method that returns one. Do we now copy the whole method or even the whole class? Or do we just reuse an already existing method that we now know exists? It's up to us, the developers, how to use the code examples that are found.
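Deriving such a query from the editor context could start as simply as the sketch below, which recognizes a declaration initialized to null and emits a query in the proposed syntax asking for methods that return the declared type. Matching the line with a regular expression and resolving the simple type name through a map of imports are simplifications for illustration (generic and array types are not handled); an IDE integration would presumably use the compiler's resolved type information instead. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContextQuery {

    // A declaration initialized to null, e.g. "IStatusLineManager slm = null;" (no generics, no arrays).
    private static final Pattern NULL_DECLARATION =
            Pattern.compile("\\s*([A-Za-z_$][\\w$]*)\\s+[A-Za-z_$][\\w$]*\\s*=\\s*null\\s*;\\s*");

    /**
     * Builds a query for methods returning the declared type, resolving the simple name
     * through the given imports; returns empty if the line is not such a declaration.
     */
    static Optional<String> fromDeclaration(String line, Map<String, String> imports) {
        Matcher matcher = NULL_DECLARATION.matcher(line);
        if (!matcher.matches()) {
            return Optional.empty();
        }
        String simpleName = matcher.group(1);
        String qualifiedName = imports.getOrDefault(simpleName, simpleName);
        return Optional.of("METHODS WHERE ReturnType IS {+" + qualifiedName + "}");
    }

    public static void main(String[] args) {
        Map<String, String> imports = Map.of("IStatusLineManager", "org.eclipse.jface.action.IStatusLineManager");
        fromDeclaration("IStatusLineManager slm = null;", imports).ifPresent(System.out::println);
    }
}
```

For the declaration above, it yields METHODS WHERE ReturnType IS {+org.eclipse.jface.action.IStatusLineManager}.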

How does that sound? Like something that could make your life easier? Let us know.