> Adding intermediate variables like "newline_delimited", "lowercased", "sorted", etc... is just pure noise.
I don't fully agree. Even when each step is fully descriptive, it greatly helps to see the intermediate result, and names can serve that purpose as well. It is true that some intermediate names are not required, partly because some steps are best understood in conjunction with neighboring steps (e.g. `sort | uniq -c` is a very common idiom and splitting it would do more harm than good), but a healthy dose of names would help in general.
I would say that there are three major steps in this particular pipeline: normalization (`tr -cs A-Za-z '\n' | tr A-Z a-z`), frequency calculation (`sort | uniq -c`), and extraction of the ${1} largest entries (`sort -rn | sed ${1}q`). So it is reasonable to have two additional names between them. Or you can name each step with a function so that you don't need intermediate results to understand it (`norm-words | calc-freqs | keep-largest ${1}`).
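A rough sketch of what the function-per-step version might look like. The function names are only illustrative (I use underscores instead of the hyphens above, since hyphenated function names are not guaranteed to work in a plain POSIX `sh`), and `wordfreq` is a made-up wrapper name:

```shell
#!/bin/sh
# Hypothetical helper names, one per major step of the pipeline.

# Normalization: emit one lowercase word per line.
norm_words() {
    tr -cs 'A-Za-z' '\n' | tr 'A-Z' 'a-z'
}

# Frequency calculation: prefix each distinct word with its count.
calc_freqs() {
    sort | uniq -c
}

# Extraction: keep the $1 entries with the highest counts.
keep_largest() {
    sort -rn | sed "${1}q"
}

# Hypothetical wrapper; usage: wordfreq N < input.txt
wordfreq() {
    norm_words | calc_freqs | keep_largest "$1"
}
```

Something like `wordfreq 10 < input.txt` would then read as the three steps rather than six commands, while `sort | uniq -c` stays together as the idiom it is.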
> It's the equivalent of a newb programmer putting a comment over each line of simple code explaining in English what that code does, despite it being clear already.
That criticism is really about comments that merely restate what the code does without describing any intent. Comments about intent and clarifications are absolutely fine. For example:
counter += 1; // increment the counter by one (bad)
counter += 1; // increment the global counter, don't need synchronization here (better)
g_counter.incr_without_sync(); // (even better, but not always possible)