Problem background
Today almost all software is built on existing frameworks and APIs. Most framework libraries provide a large number of classes and methods, and libraries from different companies and organizations follow different styles. Learning new APIs for each new project has therefore always been a burden for developers. Often the best documentation for a framework is existing code that uses it. When working on a new project, developers sometimes get stuck on a particular piece of code, and even though they are fairly certain that other developers have faced (and solved) the same problem before, it is hard to traverse the entire source code to find the relevant pieces.
Full-text search over source code does not help much in these situations, because the goal is to find code snippets that have a particular set of structural characteristics. For example, a developer may need to search for a code snippet in which an object of a particular data type (or several objects) is used and a particular method (or several methods) is called on that object.
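To make the difference from full-text search concrete, here is a minimal sketch of such a structural match. The `Snippet` and `StructuralSearch` classes are hypothetical names invented for this illustration and are not part of any existing tool; the sketch assumes each snippet has already been reduced to the set of types it uses and the set of methods it calls.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified model of an indexed snippet (an assumption for illustration).
final class Snippet {
    final String name;
    final Set<String> usedTypes;
    final Set<String> usedMethods;

    Snippet(String name, Set<String> usedTypes, Set<String> usedMethods) {
        this.name = name;
        this.usedTypes = usedTypes;
        this.usedMethods = usedMethods;
    }
}

final class StructuralSearch {
    // Returns the names of snippets that use every requested type
    // and call every requested method.
    static List<String> find(List<Snippet> snippets, Set<String> types, Set<String> methods) {
        return snippets.stream()
                .filter(s -> s.usedTypes.containsAll(types) && s.usedMethods.containsAll(methods))
                .map(s -> s.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Snippet> snippets = List.of(
                new Snippet("ReportWriter.save",
                        Set.of("java.io.BufferedWriter"),
                        Set.of("BufferedWriter.write", "BufferedWriter.close")),
                new Snippet("LogReader.load",
                        Set.of("java.io.BufferedReader"),
                        Set.of("BufferedReader.readLine")));

        // Only the first snippet uses a BufferedWriter and calls write on it.
        System.out.println(find(snippets,
                Set.of("java.io.BufferedWriter"),
                Set.of("BufferedWriter.write")));
    }
}
```

A plain text search for "write" would also hit comments, string literals and unrelated identifiers; matching on extracted structure avoids that.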
A brief description of the solution
An IDE extension that helps developers easily find code snippets written in a context similar to the one they are currently working in can increase the speed at which they learn a framework. Since full-text search is not enough, the search engine should be able to capture the structural information of the source code.
Furthermore, it has to be fast enough to provide a comfortable experience to the developer. There should also be a good scoring system for the results, so that the most relevant snippets appear at the top of the list.
A brief description of available code search tools for the Eclipse IDE
As discussed by (Böhm, 2012), a few tools are available for the Eclipse IDE that provide search over source code based on structural information, namely Strathcona, Java Tools Language, Java Development Tools (JDT), JQuery (not the JavaScript library) and CodeGenie. With all of them except Strathcona, the developer has to query manually for data types, method signatures, etc. Some of them also have disadvantages: for example, the index is not updated frequently, or the index is hosted online and someone has to update it manually.
Considering the information captured and the search options offered, JDT provides excellent tools for searching Java source code. Its search index is stored on the client side and is updated whenever a file is saved. The table below shows the information captured by the JDT search index. Source: (Böhm, 2012)
| Entity | Information captured |
| --- | --- |
| Type | Name, Modifiers, Implemented Types, Declaring Type, Type Arguments, Extended Types, Annotations, Fields written, Fields read, Used Types, Used Methods |
| Method | Name, Modifiers, Declaring Type, Return Type, Parameter Types, Checked Exception Types, Annotations, Fields written, Fields read, Used Types, Used Methods |
| Field | Name, Modifiers, Declaring Type, Type |
Advantages of using the proposed solution over available tools
The proposed solution will use indexed storage on the client side, as JDT does, so retrieving results will be fast. Compared to JDT indexing, the proposed search engine will be able to capture considerably more structural information about the source code, such as try/catch blocks, variable usage, etc. (Please refer to the section “Prototype search engine Implementation” for the full list of information captured.)
The proposed solution will detect the developer’s current context automatically and provide a list of similar code snippets in condensed form. The search query will thus be created automatically, rather than requiring the developer to write and execute queries manually. This is similar to the approach used in Strathcona, but the plugin captures considerably more information than Strathcona does.
Project Description
The objective of the project is to implement a plugin for the Eclipse IDE that suggests existing source code snippets to the developer, depending on the context the developer is currently working in.
Scope
Since the way structural information is captured differs from language to language, this plugin will only target projects written in Java. Furthermore, it will provide example code snippets only from existing source code available in the local workspace. Hence it will be most suitable for projects that already have a sufficient amount of existing source code.
Features
Users will be able to enable the plugin for a particular project via the IDE preferences window, so that source code indexing is only enabled on demand and unnecessary performance costs are avoided. The preferences window will also provide additional options for the plugin.
Users will be able to get a list of suggested code snippets in condensed form, so that they can quickly decide which snippets are worth a closer look. By clicking on a result, users will be able to open the source file that contains the relevant code snippet in a new editor view.
Prototype search engine Implementation
Please refer to this git repository of the prototype implementation. A prototype of a code search engine has already been developed by Tobias Böhm as a Master's thesis. It is implemented using Apache Lucene. The main objective of his project was to allow developers to execute search queries using a query language called Lucene QL, which is built upon Lucene Query Syntax.
The indexing mechanism is the strongest part of the prototype. It captures a large amount of structural information under five main entities: type, method, field, try/catch block and variable usage. The full set of structural information captured by the prototype's indexers is listed below. Source: (Böhm, 2012)
| Entity | Information captured |
| --- | --- |
| Type | Type, Handle, FriendlyName, AllDeclaredMethodNames, DeclaredMethodNames, DeclaredFieldNames, AllDeclaredFieldNames, FullText, FieldsRead, Annotations, FullyQualifiedName, ImplementedTypes, ExtendedTypes, UsedTypes, InstanceofTypes, AllImplementedTypes, AllExtendedTypes, DeclaredFieldTypes, UsedMethods, OverriddenMethods, DeclaredMethods, ResourcePath, Timestamp, Modifiers, ProjectName |
| Method | Type, Handle, FriendlyName, ReturnVariableExpressions, DeclaredFieldNames, AllDeclaredFieldNames, FullText, FieldsRead, FieldsWritten, ParameterTypesStructural, Annotations, FullyQualifiedName, UsedTypes, ParameterTypes, ReturnType, InstanceofTypes, DeclaredFieldTypes, DeclaringType, CheckedExceptions, UsedMethods, ResourcePath, ParameterCount, Timestamp, Modifiers, ProjectName |
| Field | Type, Handle, FriendlyName, FullText, FullyQualifiedName, UsedTypes, FieldType, DeclaringType, ResourcePath, Timestamp, Modifiers, ProjectName |
| Try Catch | Type, DeclaredFieldNames, AllDeclaredFieldNames, FullText, FieldsRead, FieldsWritten, UsedFieldsInFinally, UsedFieldsInTry, FullyQualifiedName, UsedTypes, UsedTypesInTry, UsedTypesInFinally, InstanceofTypes, CaughtType, DeclaredFieldTypes, DeclaringType, UsedMethods, UsedMethodsInTry, UsedMethodsInFinally, ResourcePath, Timestamp, ProjectName |
| Variable usage | Type, Handle, VariableName, VariableType, DeclaringMethod, UsedAsParameterInMethods, UsedAsTargetForMethods, VariableDefinition |
There is also an extdoc provider named local code samples, implemented using this search engine prototype. It can currently suggest code snippets based on variable type and called methods.
For example, when a developer selects a variable name or a method call in the editor, a list of code snippets in which a variable of a similar type or a similar method call is used will be displayed in the extdoc provider view, provided that indexing of the code base has completed.
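As a rough sketch of how a selection could be turned into a query automatically, the snippet below builds a query string in Lucene's classic query syntax (`field:"phrase"` clauses joined with `AND`). The field names `VariableType` and `UsedAsTargetForMethods` are taken from the variable usage fields listed above; the `ContextQueryBuilder` class is hypothetical, and whether the prototype's provider builds its queries in this way is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the prototype: turns an editor selection
// (a variable type plus the methods called on it) into a query string.
final class ContextQueryBuilder {

    // Wraps a value in double quotes, escaping backslashes and double quotes.
    static String phrase(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a conjunctive query: the variable type must match and
    // every called method must appear as a target method.
    static String build(String variableType, List<String> calledMethods) {
        List<String> clauses = new ArrayList<>();
        clauses.add("VariableType:" + phrase(variableType));
        for (String method : calledMethods) {
            clauses.add("UsedAsTargetForMethods:" + phrase(method));
        }
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        // Prints: VariableType:"java.io.BufferedWriter" AND UsedAsTargetForMethods:"write" AND UsedAsTargetForMethods:"close"
        System.out.println(build("java.io.BufferedWriter", List.of("write", "close")));
    }
}
```

The developer never sees this query; it would be derived from the current selection and passed to the searcher.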
Implementation of proposed solution
The prototype uses Apache Lucene to implement the search engine. Since Lucene stores its index as an inverted index, querying is fast even though insertion may take some time. Since a list of condensed code snippets has to be displayed within a short time, this suits the local code search plugin well.
Given that the search engine mentioned above does a good job of indexing and contains a good amount of code worth keeping, it makes sense to improve it and use it for the local code search plugin rather than build a search engine from scratch.
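To illustrate why an inverted index makes querying cheap, here is a toy version in plain Java. It is only a sketch of the general idea (a map from each term to the documents containing it); Lucene's actual index format is far more elaborate, and the `InvertedIndex` class is a name made up for this example.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each term to the ids of the documents that contain it.
final class InvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Indexing touches one posting list per term of the document.
    void add(int docId, Collection<String> terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the ids of documents that contain every given term,
    // by intersecting the posting lists instead of scanning all documents.
    Set<Integer> search(Collection<String> terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        if (result == null) {
            return new TreeSet<>();
        }
        return result;
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add(1, List.of("BufferedWriter", "write"));
        index.add(2, List.of("BufferedWriter", "close"));
        index.add(3, List.of("BufferedReader", "readLine"));
        System.out.println(index.search(List.of("BufferedWriter", "write")));
    }
}
```

The cost of a query depends on the length of the posting lists involved rather than on the size of the whole code base, which is the property the plugin relies on.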
Target Deliverables for GSoC
As mentioned by Marcel (refer to this comment), the searcher and indexer do not work together very well: accessing the search index sometimes fails. This needs to be investigated and fixed, since it is the heart of everything. It will be a main task of the project.
The implemented extdoc provider already triggers a search and displays results when a field name or method call is selected. It can be extended to trigger a search upon selection of many more structural elements in the editor. Search could be triggered upon selecting:
1. Data Types (upon selecting Extended, Implemented or Declared Types)
2. Return Types (find methods which return similar types)
3. Overridden Methods
4. Annotations
5. Checked Exceptions
The representation of code snippet results in condensed form needs improvement. Currently it only displays a simple code summary of each snippet. A screenshot of a result list is shown below.
This can be improved to provide a more expressive summary, so that developers can quickly decide which snippets are worth looking at. Below is a list of things that could be added to the summary view to make it more expressive.
1. Containing Package
2. The class name/method name
3. Whether it is an abstract class/interface
4. Whether it is an abstract method/overridden method, whether overloads are available, etc.
5. Possibly two or three surrounding code lines, displayed in muted text
Further possible improvements to prototype
Please refer to the Bugzilla issue for more information.
As discussed by Tobias, several improvements can be made to the indexing.
- Structural information such as class casts and thrown exceptions is not captured by the current indexers.
- Indexing of variable usage can also be improved, as it currently only captures variables that are declared inside the same block in which they are used.
For example, if a variable is declared as a class field and used inside a method, that usage isn’t considered when the method is indexed.
- The current implementation doesn’t provide a custom ranking mechanism for search results.
Scoring is left entirely to default Lucene scoring, which uses a Vector Space Model (VSM). It is possible to influence Lucene's default scoring by implementing our own query classes. This would help produce search results in which the most relevant code snippets are at the top of the list.
These improvements need further discussion, especially the implementation of a scoring system. Some of them may be achieved during GSoC if time permits.
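The idea behind a custom ranking can be sketched without any Lucene code: give each index field a weight and rank a snippet by the summed weights of the fields on which it matched the query. The `WeightedRanker` class and the weights below are hypothetical and only illustrate the principle; this is not Lucene's scoring model or API, and how such weights would be wired into Lucene queries is left open here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical field-weighted ranker (an illustration, not the prototype's code).
final class WeightedRanker {
    private final Map<String, Double> fieldWeights;

    WeightedRanker(Map<String, Double> fieldWeights) {
        this.fieldWeights = fieldWeights;
    }

    // Sums the weights of the matched fields; fields without a weight count as 1.0.
    double score(Set<String> matchedFields) {
        double total = 0.0;
        for (String field : matchedFields) {
            total += fieldWeights.getOrDefault(field, 1.0);
        }
        return total;
    }

    // Orders snippet names by descending score, breaking ties alphabetically.
    List<String> rank(Map<String, Set<String>> matchedFieldsBySnippet) {
        List<String> names = new ArrayList<>(matchedFieldsBySnippet.keySet());
        names.sort((a, b) -> {
            int byScore = Double.compare(score(matchedFieldsBySnippet.get(b)),
                                         score(matchedFieldsBySnippet.get(a)));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return names;
    }

    public static void main(String[] args) {
        // Assumed weights: a match on the called method counts most.
        WeightedRanker ranker = new WeightedRanker(
                Map.of("UsedAsTargetForMethods", 3.0, "VariableType", 2.0));
        Map<String, Set<String>> matches = Map.of(
                "ReportWriter.save", Set.of("VariableType", "UsedAsTargetForMethods"),
                "LogReader.load", Set.of("FullText"));
        System.out.println(ranker.rank(matches));
    }
}
```

With weights like these, a snippet that matches on both the variable type and the called method outranks one that only matches on full text.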