AutoRefactor: Eclipse plugin to automatically refactor Java code bases

pron · on June 7, 2015

See also Jackpot[1][2] (James Gosling himself was involved with the project) which is now integrated with NetBeans.

For example, given the rule:

       $1.isDirectory() :: $1 instanceof java.io.File
    => !$1.isFile()
    ;;

It will rewrite this:

    java.io.File testFile = new java.io.File("/tmp");
    if (testFile.isDirectory()) {
        System.err.println("/tmp is a directory");
    }

into that:

    java.io.File testFile = new java.io.File("/tmp");
    if (!testFile.isFile()) {
        System.err.println("/tmp is a directory");
    }

Another cool feature is that simply placing the rule files in a library's META-INF/upgrade directory, will activate them in the IDE, offering automatic refactoring to users of the library (i.e. if the library is in your project's classpath).

Jackpot is used extensively by NetBeans.

[1]: https://bitbucket.org/jlahoda/jackpot30/wiki/Home

[2]: http://www.devoxx.com/display/DV11/Jackpot+3.0+-+Large+scale...

JnRouvignac · on June 9, 2015

I am not a fan of Jackpot. Such DSLs are hard to build to keep expressiveness, and yet, they remain hard to use for the user. I am a lot more enthusiastic about the approach taken by Refaster.

JnRouvignac · on June 7, 2015

Thanks for the reference. I'll have a look.

delecti · on June 7, 2015

After reading through about half of those examples, I can't go on.

The only one I saw that was potentially useful was the multi-catch. Half of the rest were things I had never even seen, and couldn't imagine letting slip by a code review (who uses control flow without brackets?), and the other half were so trivial that I wouldn't even waste the time to run the tool for them (why is the non-null case even preferable to do first? most people I know prefer the input validation step first anyway).

JnRouvignac · on June 7, 2015

I am the author of AutoRefactor. If you only need multi-catch refactorings and have never seen the other pieces of code, then you are the luckiest Java developer I have heard off. You are probably working on a green field project with a small team of exceptionally talented people. For people not in that case, chances are they are inheriting a very large code base originally written with Java 1.4 or (shudder) even earlier. They are working with some people that never manage to get the simplest indentations right, let alone the code patterns you saw. The size and age of the code base made for numerous failed attempts at running checks type. Finally, the team is so big that no human can review all the code changes pre or post commit. Believe me, this exist and I think this is probably most of the code written.

Concerning the not null case, I agree this one is very subjective. I hesitated a lot before adding it. It is not as broad as what you think it is. It is only touching if statements when the then and else clauses are similar.

Say if (o == null) return null; else return o.toString(); I am arguing that the cognitive load is less important when the if statement is inverted. i.e. if (o != null) return o.toString(); else return null; Or with ternary operator: return o != null ? o.toString() : null;

You can disagree and then not run this particular rule.

delecti · on June 7, 2015

Admittedly, none of the code I work with on a regular basis is older than ~4 years, but some of that has changed hands several times (hot potato project nobody wants to own), and actually still runs on Java 1.4.

I suppose I'm willing to accept that I've been lucky though, and god help anyone whose code has the sorts of problems I wrote off as being unacceptable.

ExpiredLink · on June 7, 2015

Yep, the examples are not convincing. Some even make to code worse.

JnRouvignac · on June 7, 2015

Which ones?

adam-f · on June 7, 2015

You've got a change under "// should not be touched"

http://imgur.com/CSHnOuP

JnRouvignac · on June 9, 2015

Thanks for reporting.

Actually the comment is wrong in that case using BigDecimal.valueOf() is better since it could reuse the immutable object, so that is better for memory. I'll rework the samples to make it more explicit.

JnRouvignac · on June 9, 2015

Fixed.

ExpiredLink · on June 7, 2015

The IfElseIfSample.java converts a clear if-else to a if-elseif without else. As if the programmer had forgotten the else.

JnRouvignac · on June 7, 2015

I am trying to understand your point but I fail.

Why should there always be a else to an if-elseif?

Secondly, in the original if-else, the if in the else clause does not have a else. Why is it not a problem for you? It seems contradictory...

ExpiredLink · on June 7, 2015

> Why should there always be a else to an if-elseif?

Because that's good style!

> in the original if-else, the if in the else clause does not have a else

An if can live without an else. BTW, there are many books and online resources about programing style.

JnRouvignac · on June 7, 2015

Thanks, I am reading many things about good code and I have never seen such advice.

"Good style", "best practice", all of this comes from an observation made by somebody who can justify it is good practice by some real evidence. If you were using if-elseif like a switch statement, then maybe there should be an else. If the if conditions are not related, then I don't see why there should necessarily be one.

Please enlighten me by pointing to the relevant online resources.

wtetzner · on June 7, 2015

What should go in the else? A comment saying "// Do nothing here"?

ExpiredLink · on June 7, 2015

> What should go in the else?

What's left. What else should go in else?

BTW, the original example not disimproved by AutoRefactor was okay.

manyxcxi · on June 7, 2015

I'm viewing the code from my phone, so I have not looked through the entire repository to understand the styles that you can define and apply (if that's possible) but the sample ins and outs are definitely the type of things we document in our team best practices. I've generally fallen into the IntelliJ IDEA crowd the last view years and have found it to be VERY customizable and able to do a lot of the refactoring I see in the examples. As well as new line cleanup, max width, spaces/new lines for brackets, parens, and braces, etc. etc. The question I have (and the reason for me explaining that I haven't had a chance to properly review the posted repo) does the library go beyond that or simply add this missing functionality to Eclipse? Kudos though for his development though, I found the ability to define language style guides via configuration and pass them out to my team and set the formatters to run on save has cleaned up our code bases mightily. This is absolutely useful work.

Off topic: I love what Eclipse (and the many great people behind it) has given to me over the years and I still use Eclipse based derivatives (Android development) but a couple years with IDEA and I can't imagine going back.

UPDATE: I just saw a link to the samples HTML and this appears to go beyond what IntelliJ offers in some cases (as far as I can tell).

overgard · on June 7, 2015

Why? (I don't mean that sarcastically: what's the idea?)

lnanek2 · on June 7, 2015

Just looking at the sample input, a lot of things are inarguably better removed: https://github.com/JnRouvignac/AutoRefactor/tree/master/samp...

For example, replacing multiple catch blocks with multi-catch is definitely going to make source code shorter and clearer, easier to maintain, and less likely to develop bugs as people work on it. Especially since the catch blocks usually all do the same exact thing, they just had to be written out. Modern Java just allows you to catch multiple at once: catch( Exception1|Exception2 e ) {

Instead of: catch( Exception1 e) { ... } catch ( Exception2 e) { ... }

Where the ... is probably duplicate code in danger of falling out of sync.

alok-g · on June 7, 2015

I have seen many developers who write the type of code that this tool fixes. I thereby love it. (Note: I am not associated with this in any way).

In case your question is why bother to fix to begin with, even such simple simplifications make the code easier to maintain and to apply higher-level refactoring.

RMarcus · on June 7, 2015

I think some examples would go a long way.

mullr · on June 7, 2015

http://autorefactor.org/html/samples.html

JnRouvignac · on June 7, 2015

That's a good question.

I can explain for me: why do I fix the code like this? I spend a lot of time reading code to understand it, fix bugs, add features. Sometimes I read code and realize it is unnecessarily complicated, written in a not very concise way. Which increases the burden of reading code. This time that I spent, I do not want to have to spend it again, so I put back in the code the knowledge I gained from it. Martin Fowler argues for this too: http://martinfowler.com/articles/preparatory-refactoring-exa...

That my selfish reason for doing it. It's also the boy scout rule: http://programmer.97things.oreilly.com/wiki/index.php/The_Bo...

Finally another quote from Donald Knuth that I am summing up as "code for human beings, not for machines": https://en.m.wikiquote.org/wiki/Donald_Knuth (see the quote from "literate programming")

Somebody commented such refactoring rules are part of their coding rules. I argue such rules are rarely expressed anywhere. Furthermore, if we were to write them down, I think there would be too many of them for anyone to remember. Putting them all in a tool relieves the team as a whole: I get tired to repeat the same things over all code reviews. Sometimes, I am even so tired that I forget to comment on some of them. Sometimes I even violate the rules I put in the tool because I was thinking so much of the problem I was trying to solve!

Another problem are inherited code base where the amount of technical debt is so overwhelming that it's impractical to fix it by hand.

Finally, a last reason could be to upgrade automatically to new language features that make code more concise or more readable. Think about the introduction of generics in Java 5, project Coin features in Java 7 or lambdas in Java 8.

You can also think about it as good code hygiene, pretty much like a good code indentation and formatting is a requirement for green field projects today.

I have a few other reasons outlined here: http://autorefactor.org/

I nearly forgot one use case! I know a company transcompiling code from COBOL to Java. The resulting code is so huge (hundreds of thousands of LOC) and dirty (you would not believe it) that they are considering using AutoRefactor to clean up the mess.

reagency · on June 7, 2015

Google recently published a paper about their extensive work in automated code cleanup:

http://research.google.com/pubs/pub43322.html

And their open source refactoring system: https://github.com/google/Refaster

JnRouvignac · on June 7, 2015

Thanks! I will take a look.

JanecekPetr · on June 7, 2015

This is great! In a bigger team and/or with an old application, I tend to do these changes manually, all the time, for every piece of code I touch. And while in the long run things get better, it's inevitably a losing battle if the others don't care. Thank you for putting your time into this project!

JnRouvignac · on June 7, 2015

I am happy if it helps you too

TheLoneWolfling · on June 7, 2015

I wish these things were more language-agnostic.

There are all sorts of fancy refactoring tools out there, and most (all?) of them seem to be intimately connected with the language in question.

I mean: take compilers for example. GCC isn't C-only. Or rather there are lots of compilers built around the same framework. Why can't it be the same for refactoring tools? They both are doing effectively the same thing, after all. Turing one form of code into another form that's better, for some definition of better. The distinction being that compilers tend to not have the input language be the same as the output language.

JnRouvignac · on June 7, 2015

Interesting question. What would prevent this to happen indeed?

One reason I can think of is that for GCC all input languages are converted to the same intermediate representation (IR). From there, it's easy to apply the same optimisations, generated code, etc.

For refactoring tools, I have never heard of such IR. The benefits of going for an IR are not obvious either. The drawbacks are huge: a lot more code, difficulty to build an IR rich enough to represent the semantics of all input languages, and there's an extra mapping to go back to the original code by keeping indentation and formatting.

While it is more obvious for GCC: you need another representation than the AST to compile to assembly, optimisations are run on the IR, there is no need to take care of the original source formatting.

Languages also have very subtle differences in meaning/behaviour making it harder to reuse refactorings across languages.

Maybe it's possible to go with your idea, but I think it would require a big team working on it. Who's willing to spend the money or the time when the current approach seems satisfactory?

TheLoneWolfling · on June 7, 2015

I always thought that refactoring tools should be compilers.

That way you can reuse all of the parsing code / DCE code / flow analysis / etc / etc.

The extra mapping is something a "normal" (i.e. single-language) refactorer has to do anyways. Assuming its a half-decent refactoring tool its not just doing a find-replace.

The subtle differences / drawbacks you mention a compiler also has to deal with. Actually, everything you mention a compiler also has to deal with. You could say the same about compilers (there's no actual need for an IR with a compiler, for instance. You can operate on the AST directly. And you don't need to build an IR rich enough to keep semantics in all languages unless you're supporting multiple languages. Etc. Etc.) And yet we have GCC.

JnRouvignac · on June 7, 2015

I agree the drawbacks equally apply to GCC. I agree a simple compiler does not need an IR. Some compiler optimisations cannot exist without an IR.

However I was wondering where a refactoring tool would need an IR. I found one: a CFG. For this, I need a three-address code representation. As a hypothesis, let's suppose I can build one that is independent from the input language. Once I have found how to refactor it, I then need to describe which code must be rewritten to what, in the input langage. This is the part where I find it very hard to do. I have the impression it will fall short here.

That would be an interesting experiment :)

wtetzner · on June 7, 2015

Does anyone know if this plays nicely with Lombok?

orangecheese · on June 7, 2015

I guess it might depend which features of lombok you use. Lombok vals don't comply with the Java language spec so you might have issues there (as so many other tools e.g. IntelliJ). Friends don't let friends use Lombok...

wtetzner · on June 7, 2015

Instead friends let friends write and maintain enormous amounts of boilerplate when they could have just used @Value?

orangecheese · on June 7, 2015

The fact that you are resorting to DIY language extensions rather than picking a more appropriate language for the job should tell you everything you need to know.

wtetzner · on June 8, 2015

Haha, I completely agree with you. Unfortunately I didn't get to pick the language.

JnRouvignac · on June 7, 2015

Do you expect AutoRefactor to work with code using Lombok?

I did not try, but I don't expect so. Why? Because AFAIK, Lombok enabled code is not valid java code. So I don't expect that Eclipse JDT (basis for AutoRefactor) will be able to parse it. The only way to make this work would be to work on the code generated by Lombok, then do the round trip back to the original Lombok source. Patch welcome!

If someone tries it out, I would be happy to read the feedback.

BTW I think AutoValue could work better thanks to the use of standard annotation pre processor.

wtetzner · on June 7, 2015

> Do you expect AutoRefactor to work with code using Lombok?

No, I don't expect it to, but that doesn't mean I wouldn't like it to :)

> AFAIK, Lombok enabled code is not valid java code.

Lombok enabled code is valid Java syntax, as it is implemented as an annotation processor. However, Lombok enabled code won't type-check (or run properly) without actually running Lombok.

orangecheese · on June 7, 2015

Not all Lombok code is valid Java code due to the way Lombok hijacks the annotation processor using implementation specific apis to perform operations not supported by the annotations spec.

JnRouvignac · on June 7, 2015

Ok that could work then. Do you mind trying Lombok + AutoRefactor please?

wtetzner · on June 7, 2015

Yeah, I'll let you know how it works.