Code Refactoring, Optimization, and Pareto’s Principle
Any successful programmer will say that they would love to write the cleanest and most optimized code around. Unfortunately, bugs arise, code is ambiguous, or you don’t have the time to rethink an algorithm to optimize its performance. This leads me to the three points in the title:
Code Refactoring, simply put, is rewriting already working code “to make it amenable to change, improve its readability, or simplify its structure, while preserving its existing functionality.” [Wikipedia] In a lot of cases this is a good idea, especially if others need to review your code, maintain its functionality, or add new features in the future. It starts to become a problem when you try to do too much at one time. I’ve noticed that a lot of people lump refactoring and optimization into one large category of chaos. In my experience, it is best to optimize first and then refactor, or at least to take your time optimizing so that less refactoring is needed. That way you create more efficient working code first and then make it readable to others. Doing both at the same time will confuse you, your reviewers, and the compiler (error, seg fault, …ah).
Example [Wrong way]:
Coder: Wow, I could really use a HashSet instead of an ArrayList to optimize this code. [Start rewriting...] Function xyz is really messy, I’ll clean it up then come back to the HashSet structure. Function 123 is messy too …
Example [Right way]:
Coder: Wow, I could really use a HashSet instead of an ArrayList to optimize this code. [Start rewriting ... Finished with change in structure ... unit tests were successful ...] Function xyz is really messy, I’ll look to see if it’s still needed and then try to clean it up.
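To make the "right way" concrete, here is a minimal sketch of that kind of isolated structural change. The class and method names (MembershipDemo, countKnownWithList, countKnownWithSet) and the sample data are made up for illustration. The point is that only the data structure changes, and the old and new versions are checked against each other before anything else gets touched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MembershipDemo {

    // Original version: List.contains scans the list, so each lookup is O(n).
    static int countKnownWithList(List<String> known, List<String> queries) {
        int hits = 0;
        for (String q : queries) {
            if (known.contains(q)) {
                hits++;
            }
        }
        return hits;
    }

    // Optimized version: copy the data into a HashSet once, then each
    // lookup is expected O(1). Nothing else about the method changes.
    static int countKnownWithSet(List<String> known, List<String> queries) {
        Set<String> knownSet = new HashSet<>(known);
        int hits = 0;
        for (String q : queries) {
            if (knownSet.contains(q)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, standing in for a real unit test.
        List<String> known = new ArrayList<>(Arrays.asList("a", "b", "c"));
        List<String> queries = Arrays.asList("a", "x", "c", "c");

        int before = countKnownWithList(known, queries);
        int after = countKnownWithSet(known, queries);

        // Only move on to cleanup work once both versions agree.
        if (before != after) {
            throw new AssertionError("optimization changed behavior");
        }
        System.out.println("hits = " + after);
    }
}
```

Whether the HashSet actually wins depends on your data: for a handful of elements the one-time copy may cost more than it saves, so it is worth measuring before and after.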
Unsuccessful coders will trade inefficient working code for efficient non-working code. This is where optimization and Pareto’s Principle play a role. Pareto’s Principle, or the 80-20 rule, says that roughly 80% of effects come from 20% of causes; applied to code, about 20% of your code will be responsible for about 80% of the running time. Naturally, one should focus on optimizing that 20% to get the most speedup. Optimization should be the result of taking working code and making it more efficient. Planning and thought have to go into this process. Some go blindly into a project hoping to rewrite whole classes to gain a speedup, only to find that the project no longer compiles, works, or passes all of its unit tests. Focus on refining small parts at a time and keep testing as you go. Optimized code should be your way of putting icing on the cake; you take already existing code and you give it that extra touch of efficiency.
As an example, suppose a project takes ~1 hour to complete its task. It would be nearly worthless to spend hours optimizing or refactoring a piece of code that takes only 1-2 seconds to execute: even eliminating it entirely would cut the total runtime by less than 0.1%. Ideally, yes, you should make the code as good as you can, but you must focus on the largest piece first. Successful coders notice this and attack the code with the largest runtime.
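One rough way to find that largest piece is to time each stage of the program and compare their shares of the total. The sketch below is a crude illustration, not a substitute for a real profiler. The StageTimer class, its methods, and the stage names are all hypothetical, and the recorded numbers simply mirror the 1 hour versus 2 seconds example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StageTimer {
    // Accumulated wall-clock time per named stage, in nanoseconds.
    private final Map<String, Long> nanosByStage = new LinkedHashMap<>();

    // Add a measured duration to a stage's running total.
    public void record(String stage, long nanos) {
        nanosByStage.merge(stage, nanos, Long::sum);
    }

    // Run a piece of work and record how long it took.
    public void time(String stage, Runnable work) {
        long start = System.nanoTime();
        work.run();
        record(stage, System.nanoTime() - start);
    }

    // Name of the stage with the largest total, or null if nothing was recorded.
    public String slowest() {
        String worst = null;
        long max = -1;
        for (Map.Entry<String, Long> e : nanosByStage.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                worst = e.getKey();
            }
        }
        return worst;
    }

    // Fraction of the total recorded time spent in the given stage (0.0 to 1.0).
    public double share(String stage) {
        long total = 0;
        for (long n : nanosByStage.values()) {
            total += n;
        }
        long part = nanosByStage.getOrDefault(stage, 0L);
        return total == 0 ? 0.0 : (double) part / total;
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();

        // Assumed split of a one-hour run: 2 seconds of parsing, the rest computing.
        timer.record("parse", 2_000_000_000L);
        timer.record("compute", 3_598_000_000_000L);

        // time() measures real work; this small loop is only a placeholder.
        timer.time("report", () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1000; i++) {
                sb.append(i);
            }
        });

        String hot = timer.slowest();
        System.out.printf("optimize '%s' first: %.2f%% of the runtime%n",
                hot, 100.0 * timer.share(hot));
        System.out.printf("'parse' is only %.3f%% of the runtime%n",
                100.0 * timer.share("parse"));
    }
}
```

Wall-clock timing like this is noisy, so treat the numbers as a guide to where to look rather than as exact measurements.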
What I have found extremely helpful when optimizing is taking enough time to think, design, and implement. This lets you write cleaner optimized code that may not have to go through the refactoring process (or at least not as much of it). Also start with basic speed improvements, like changing a data structure or protocol. These often carry different pros and cons, which you can weigh against your specific data. Overall, take the time to optimize correctly, and the people who maintain your code in the future will thank you!