
On a much slower computer...

  time -p wc big10.txt
  1284570 10956950 64886660 big10.txt

  real         2.76
  user         2.68
  sys          0.08
Trying this as a novice with k3.

Being a novice, I got 2 out of 3 counts wrong, and this is probably not the fastest approach.

The "word" total in the example was simply AWK's NF. But looking at big10.txt, there are anomalies such as words separated by "--" instead of a space.

Here I matched a non-space character followed by a space. Far from accurate, but not too far off.

  1.k: 
  w:0:"big10.txt";v:,/$w
  m:v _ss "[^ ] " / "word": char followed by space
  #w   / lines
  1+#m / words
  #v   / characters

  time -p k 1

  1284570
  10019630
  63602090

  real         2.70
  user         2.40
  sys          0.28
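The same "non-space followed by space" heuristic can be sketched in plain shell (a rough equivalent for illustration, not the k code above):

```shell
# Count every non-space character followed by a space, then add 1
# for the final word on the line. Like the k version, this treats
# "two--three" as a single word.
printf 'one two--three four\n' |
  grep -o '[^ ] ' | wc -l | awk '{print $1 + 1}'
```

This prints 3 for the sample line, since "--" joins two words into one, just as the k script's count falls short of the true word total.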

Counting lines with sed

  time -p wc -l big10.txt
  1284570 big10.txt

  real         0.13
  user         0.06
  sys          0.07

  sed -n '$!d;=' big10.txt
  1284570

  real         0.29
  user         0.19
  sys          0.09
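For reference, the sed one-liner counts lines by deleting every line except the last (`$!d`) and then printing the line number of the sole survivor (`=`); a small demo:

```shell
# `$!d` deletes all lines but the last; `=` prints the line number
# of the remaining line, which equals the total line count.
printf 'a\nb\nc\n' | sed -n '$!d;='
```

This prints 3 for the three-line input.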


That is a slow computer; mine is a pre-Haswell i3.

  $ time -p sed -n '$!d;=' big10.txt
  1284570
  real 0.07
  user 0.06
  sys 0.00

  time -p mawk 'END {print NR}' big10.txt
  1284570
  real 0.04
  user 0.03
  sys 0.00

  $ time -p gawk 'END {print NR}' big10.txt
  1284570
  real 0.14
  user 0.13
  sys 0.00

  $ time -p wc -l big10.txt
  1284570 big10.txt
  real 0.02
  user 0.02
  sys 0.00


Revised 1.k.

  w:0:"big10.txt";v:{" ",x}'w;u:{#v[x] _ss " [^ ]"}'!#v;t:{#w[x]}'!#w

  #w / lines
  +/u / words
  +/t / chars
The word and character counts are closer but still short, due to my inexperience with k.
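For cross-checking the k counts, wc's semantics (lines, whitespace-separated words, bytes) can be approximated with a short awk sketch, assuming ASCII input with a trailing newline:

```shell
# wc-style line/word/char counts in awk: NF counts whitespace-separated
# words; length($0)+1 adds back the newline that awk strips per record.
printf 'one two--three four\nfive\n' |
  awk '{w += NF; c += length($0) + 1} END {print NR, w, c}'
```

This prints `2 4 25`, matching `wc` on the same input.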

But the script now appears to be faster than wc.

  time -p wc big10.txt

  1284570 10956950 64886660 big10.txt
  real         2.78
  user         2.66
  sys          0.12

  time -p k 1

  1284570
  10956830
  63602090

  real         2.57
  user         2.42
  sys          0.14



