May 15, 2011

A Subword Matching Completion Engine for Eclipse?

A few hours ago stumbled across a post at stackoverflow.com titled  "Eclipse: Fulltext autocompletion for Java?" where a programmer is essentially asking whether Eclipse' code completion supports partial proposal matching.


I copied the question here to make this post a bit more self-contained:

================================================



Hi, can I have fulltext autocompletion for Java @ Eclipse? Let's demonstrate:
Final piece of code:



getVariants().add(new Variant(MediaType.TEXT_XML));
How do I code now:
getv[ctrl+space].a[Enter]new V[ctrl+space, down arrow, Enter]M[Ctrl+Space, Enter].text_x
Basically, Eclipse completes word "TEXT_XML" when I provide letters "TEXT_X".
How would I like to code:
getv[ctrl+space].a[Enter]new V[ctrl+space, down arrow, Enter]M[Ctrl+Space, Enter].xml
and Eclipse should realise I meant "TEXT_XML" (fulltext autocompletion).

================================================

Well, thanks to JDT's quite comprehensive APIs this is not too complex to implement, and I wrote down a simple version of this completion engine (10 locs).

This is how it looks like in action now:





But now I wonder:

  1. How often do you just remember a few letters but don't know the complete name of a method or member?
  2. Would you like to see such a completion engine in Eclipse?
  3. What else would be missing?

I would be glad if you could send your thoughts to the code recommenders forum.


Best,
Marcel

Stay in touch. Follow me on Twitter.


Update #1: Neat research on this topic.




I just found a paper (almost) exactly on this topic. Could you check out these screencasts?
Maybe it's even more than you/we would like to see :-) But I think that's pretty cool stuff and having a similar completion engine for Eclipse could be very, very cool. Internally it's based on Hidden Markov Models which learn how people abbreviated certain identifiers (like  stx --> 'setText') in the past. The nice thing about it is that we don't have to come up with our own matching algorithm but just use what the programmers used before... I like the Han's work :)

Update #2: Prototype available.



After a quick and dirty monday afternoon hack a first prototype is ready. In a nutshell, it takes your prefix entered and creates a regular expression from it. For example, when entering button.stx|<^Space> the engine only presents those proposals that match this regular expression .*s.*t.*x.* which finally may match "setText(text)". This is roughly what Maxime proposed in his comment below. 

In addition it now respects upper case letters as suggested by Deepak. For example button.sT<^Space> now evaluates to  '.*s.*T.*' whereas button.st would evaluate to .*s.*[tT].*.

Currently, it does not reduce the proposal score as recommended by another comment nor does is give precedence to those words that match the prefix - but we may do that if you find this more useful. Just drop a comment in the forum mentioned below.


Please find the installation instructions here:
    http://www.eclipse.org/forums/index.php/mv/msg/209269/671088/#msg_671088

Please send your comments to the forum here:
    http://www.eclipse.org/forums/index.php/f/211/



Stay in touch. Follow me on Twitter.


12 comments:

  1. I guess the point is not just to find a method whose name you don't completely remember, it's also to spare some keys to type. For example, we could just skip some unnecessary letters. Example: "nam" => "setName", "getName". Or, even "gnam" => "getName"

    ReplyDelete
  2. I just found a paper (almost) exactly on what you are saying. Could you check out these screencasts? http://vimeo.com/11664433 http://vimeo.com/19369928

    It's what you would like to see, right ? Maybe it's even more than you/we would like to see :-) But I think that's pretty cool stuff and having a similar completion engine for Eclipse could be very, very cool... I'll study the paper...

    Could someone imagine to use this completion engine? Someone who cant'? :)

    ReplyDelete
  3. I tend to use Cntr+O to get the quick outline and use this to search for method names if I don't remember them completely. I think having pattern matching on method names for code completion would be super cool.

    ReplyDelete
  4. I often wanted exactly this feature when I didn't remember a method name exactly.

    But the important point is: These completions should only be "second class" compared to the current completions, which look for methods beginning with the given input. That is because in most cases the developer _does_ remember the method and starts with the beginning of the name, not somewhere in the middle.

    So if these completions were added at the bottom of the suggestion list, that might be great!

    ReplyDelete
  5. Ok, prototype here you go:

    Please find the installation instructions here:
    http://www.eclipse.org/forums/index.php/mv/msg/209269/671088/#msg_671088

    Please send your comments to the forum here:
    http://www.eclipse.org/forums/index.php/f/211/

    ReplyDelete
  6. Thanks for your responses Lars, Maxime, and Anonymous! It's always great to get your response on such kind of thoughts. Thanks!

    ReplyDelete
  7. - I would like exact matches to be shown first e.g in the screenshot above all 'layout' methods should be above 'getLayout'. A similar problem exists in JDT open type dialog, try searching for 'SWT' :)
    - Content assist already supports camel case. e.g. parent.gLay<^space> would match 'getLayout'. Probably we could support parent.Lay<^space> to also match 'getLayout'. Note that 'L' is upper case here, I am not so sure if ignoring the case would give predictable results.

    ReplyDelete
  8. Great. After your update it works now also for super.S-> onSearch.

    As said in the forum it would be cool if override methods could use the same stuff.

    ReplyDelete
  9. Do plan on also chaining this to the Recommendation Engine? You already support prefix matching, so would this not be the next step?
    Also i think, as some said before me, that matching prefixes should be shown first to be able to completely replace the default Content Assist.

    ReplyDelete
  10. Actually, now after working with this for about 5 minutes, i am wondering: Is there some way to suppress duplicate proposals when using multiple proposal engines?

    ReplyDelete
  11. Hi Michael!

    Regarding subwords prefix matching:
    Currently there is no appropriate ranking mechanism for proposals that match the regex. One way to get around the prefix matching issue might be using string distance measures like JaroWinkler to rank proposals higher that match some parts of the text a user enters. Please check http://www.eclipse.org/forums/index.php/t/210080/ for details. Your comments on this - and further ideas - are welcome. Just add them to the thread.

    Regarding filtering of duplicates:
    Eclipse doesn't permit filtering of duplicates - for several reasons. Sometimes it's just not a wanted behavior. However, I see that this is annoying if lots of duplicates are shown. Ranking mechanisms, for example based on JaroWinkler might help here a bit. Do you have any idea how Eclipse could be changed to match your needs?

    ReplyDelete
  12. Maybe an off topic comment :)

    The IntelliJ Idea feature I really missed in Eclipse IDE is language injection in string literals.
    You could take a closer look at http://blogs.jetbrains.com/idea/tag/language-injection/. Indeed it offers almost all the language support, not only Code Completion.

    Regards

    ReplyDelete