I usually run 'w' first when troubleshooting unknown machines

spudlyo · on March 27, 2018

I too instinctively run `w` whenever I log into a machine, and that instinct helped me land my current job.

One hour-long component of a (now deprecated) SRE interview loop was for the candidate to SSH into a series of EC2 instances and debug issues, which got progressively harder as the interview wore on.

I had wasted a substantial amount of time on the first and easiest problem by really overthinking it, and not trying the simplest of debugging techniques first. By the time I got to the final, hardest problem, I had just over 5 minutes remaining. The interviewer gave me a pass to skip it, but I was having fun, and really wanted to take a crack at it.

The final problem was to try to figure out why logging into a particular machine with SSH was slow. While I sat waiting for a prompt, I had a number of thoughts. Is a reverse DNS lookup timing out? Is there a huge i/o load on the machine? Am I going to have to wire up strace to `sshd` and log in again?

When I finally get to a shell prompt, I instinctively run `w` and it just hangs. I hit ^C, `strace` it, and discover that it's blocking on:

    fcntl64(5, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}) ...

I look up a bit, and discover that file descriptor 5 is /var/run/utmp. So `w` is trying to get an advisory lock on utmp and failing. Then it hits me, `sshd` is likely also trying to acquire a lock on utmp, failing, and then eventually timing out.

A little bit later, I've found and killed the rogue program that held the lock, and SSH logins were fast again. Solving that last problem so quickly really boosted my spirits, and gave me the energy to push through the harder interviews that came later in the day.

Thanks w!
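
For anyone who wants to repeat the "which file is descriptor 5?" step without scrolling back through strace output: on Linux the same answer is usually sitting under /proc. This is only a minimal sketch, assuming a Linux-style /proc; the function name `fd_target` and the PID in the usage comment are made up for illustration.

```shell
# fd_target PID FD: print the file that descriptor FD of process PID refers to.
# Assumes a Linux-style /proc; on other systems lsof or fstat is the usual route.
fd_target() {
    readlink "/proc/$1/fd/$2"
}

# Usage (1234 is a hypothetical PID for the hung process):
#   fd_target 1234 5
```

Reading another user's process this way generally needs root, the same as attaching strace to it.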

ahh · on March 27, 2018

That actually seems like a great idea for a work sample test.

MichaelRenor · on March 27, 2018

In my opinion, a bit too much trivia to be a valuable indicator of SRE success. It would be really great if a candidate (who rated themselves well in systems debugging) could walk you through the strace output of a command like that.

spudlyo · on March 27, 2018

I really like that idea. It might be fun to be given some `strace` output (with the initial `execve` and writes to stdout/stderr redacted) and then be asked to determine which UNIX command it was, or more broadly what it was doing.

da02 · on March 27, 2018

How did you find the rogue program?

eric_the_read · on March 27, 2018

I would probably use lsof(8)

metaobject · on March 27, 2018

Or maybe fuser
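
If neither lsof nor fuser is installed on the box, a rough stand-in is to walk /proc yourself. The sketch below assumes Linux, and `holders_of` is a name invented for this example. It only shows who has the file open, not who holds the lock, so treat the result as a shortlist of suspects.

```shell
# holders_of FILE: print the PID of every process that has FILE open.
# Assumes Linux /proc; without root only your own processes are visible.
holders_of() {
    target=$(readlink -f "$1") || return 1
    for fd in /proc/[0-9]*/fd/*; do
        # Each fd entry is a symlink to the file that descriptor refers to.
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            pid=${fd#/proc/}
            echo "${pid%%/*}"
        fi
    done | sort -un
}

# Usage: holders_of /var/run/utmp
```

On Linux the kernel also lists currently held locks in /proc/locks, which can narrow the shortlist further.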

hitekker · on March 27, 2018

Good memory and good story.

lazyant · on March 27, 2018

My go-to for initial troubleshooting of a server is:

    uptime # uptime and CPU stress
    w # or better yet:last |head # who is/has been in
    netstat -tlpn # find server role
    df -h # out of disk space?
    grep kill /var/log/messages # out of memory?
    ps auxf # what's running
    htop # stressed? , look out for D (waiting on I/O typically) processes
    history # what has changed recently
    tail /var/log/application.log # anything interesting logged?
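
A list like this is easy to wrap in a small script so that a missing tool doesn't stop the run. What follows is just a sketch: `run_step` and `triage` are names made up here, and it only covers the non-interactive checks from the list.

```shell
# run_step LABEL COMMAND [ARGS...]: print a header, then run COMMAND if it is
# installed; otherwise say so and carry on with the next step.
run_step() {
    label=$1
    shift
    printf '== %s ==\n' "$label"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        printf 'skipped: %s not found\n' "$1"
    fi
}

# Hypothetical wrapper around some of the read-only checks listed above.
triage() {
    run_step "load"       uptime
    run_step "logins"     w
    run_step "listeners"  netstat -tlpn
    run_step "disk space" df -h
    run_step "processes"  ps auxf
}
```

Running `triage | less` then gives one scrollable report instead of a handful of separate commands.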

fapjacks · on March 27, 2018

I've wasted time not checking for inode availability, so I'd add a check for that to this list:

df -hi
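
If you'd rather have that flagged than eyeball the table, something like the following can sit in a check script. It is a sketch that assumes the six-column layout of `df -Pi` and mount points without spaces; `inode_hogs` is a made-up name.

```shell
# inode_hogs [THRESHOLD]: read `df -Pi` output on stdin and print the mount
# points whose inode usage is at or above THRESHOLD percent (default 90).
inode_hogs() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)    # "97%" becomes "97"; "-" stays non-numeric
            if (use ~ /^[0-9]+$/ && use + 0 >= limit + 0) print $6, $5
        }'
}

# Usage: df -Pi | inode_hogs 80
```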

JadeNB · on March 28, 2018

> w # or better yet:last |head # who is/has been in

The post argues that 'last' will not always give the desired information:

> Obviously, I could also use 'last' to see who's been on the box recently, but this isn't the whole story. It's totally possible to "ssh root@box /path/to/command" and never start a login shell, which then leaves no trace in the lastlog, but then goes on to break something on the box. The syslog is how you'd find this.
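
To make the syslog route concrete, here is a small filter for the "Accepted ..." lines that sshd writes on a successful authentication. It is only a sketch: the function name `accepted_logins` is invented, and where the log lives is an assumption that varies by distro (commonly /var/log/auth.log on Debian-family systems and /var/log/secure on Red Hat-family ones).

```shell
# accepted_logins: read syslog-style lines on stdin and print "user address"
# for every sshd "Accepted ..." message, whether or not a login shell followed.
accepted_logins() {
    awk '/sshd\[[0-9]+\]: Accepted / {
        for (i = 1; i < NF; i++)
            if ($i == "for") { print $(i + 1), $(i + 3); break }
    }'
}

# Usage: accepted_logins < /var/log/auth.log | sort | uniq -c
```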

jillesvangurp · on March 27, 2018

I can recommend setting up Auditbeat & Kibana or similar. Auditbeat is a recent addition to the Elastic Beats agents and it sends audit logs for a lot of things, including ssh logins and system calls, and you can monitor changes to files/directories as well. So, you can flag boxes where people are poking around in /etc, and see which boxes are being accessed via ssh by which user.

We have this and a few other elastic beats on most of our vms in amazon baked into the amis we use. So anything deployed by us starts sending lots of data to our logging cluster for metrics, auditing, internal application events, stacktraces from our docker infrastructure, syslogs, etc.

checkyoursudo · on March 27, 2018

I run my own web server and email server for my law firm. I'm the only one with credentials to log in (ash), and probably the only one who would even know how to do it, and possibly the only one who knows we have servers.

And I still run w first thing almost every time out of habit.

What am I looking for? A session I accidentally left open somewhere else? Unauthorized access? A friend? Dunno...

RobAley · on March 27, 2018

You're leaving your firm incredibly vulnerable to the "checkyoursudo gets hit by a bus" scenario, aren't you? At least let the rest of your firm know you have servers and give some credentials (for another account with sudo permissions) to e.g. a managing partner or similar (with instructions to never use them unless you die).

HenryBemis · on March 27, 2018

My thought exactly! For the company of checkyoursudo, he's what is called a Key Person Risk.

But in my examples, I never use the "hit-by-a-bus". Sysadmins tend to frown upon that comment.

I use the "win-the-lottery-go-to-Fiji-and-never-look-back". It always makes them smile :)

AnIdiotOnTheNet · on March 27, 2018

The problem with the "win-the-lottery-go-to-Fiji-and-never-look-back" as an example is it doesn't quite get the point across because they'll have time to gracefully transfer knowledge and after that they're just somewhere else in the world so I can still theoretically fly over there and beat them with a wrench until I get what I need.

Death is a real thing that really happens to people, and from an organizational perspective it is valuable to keep in mind that no one in your company is immune to that.

wccrawford · on March 27, 2018

Yeah, it's not the same at all. I had an ex-employer that fired me multiple times, and came back to me to get forgotten passwords multiple times.

After a little soul-searching, and deciding I didn't want to harbor anger, I helped him with the ones I remembered each time.

Had I actually been hit by a bus, that wouldn't have been possible at all.

I'm sure that if I'd hit the lottery and gone to Fiji, I'd have been even more likely to help him with those passwords.

In the end, not-burning-that-bridge did help me earn more money as he hired me back several times, and I demanded more money each time until I was asking almost as much per hour as he was getting from his customers and he simply couldn't afford me.

checkyoursudo · on March 27, 2018

I actually basically had this happen to me before, too. With two different employers.

Aside from one of them being super annoying, over and over again, helping them out was the right thing to do.

It's not like it cost me anything other than a couple of minutes of time.

If someone already screwed you over, then... But otherwise, why not

pertymcpert · on March 28, 2018

How do you get fired by someone multiple times?

frandroid · on April 2, 2018

Not learning the lesson multiple times. :)

dorgo · on March 27, 2018

The "hit-by-a-bus" scenario for me is expressed quite often where I work. I have no problem with it, if it is not repeated a dozen times. Then it starts to sound like a threat...

dragonwriter · on March 27, 2018

“depart without notice and without looking back” can be just as much of a threat, though, especially if the person raising the scenario is a Key Person themselves.

makeset · on March 27, 2018

Nice, and apparently you're not the only one:

https://en.wikipedia.org/wiki/Bus_factor

"The bus factor is a measurement of the risk resulting from information and capabilities not being shared among team members, from the phrase 'in case they get hit by a bus'. It is also known as the lottery factor, ..."

fapjacks · on March 27, 2018

I'm curious if I could get a chuckle by saying "Hit by an Uber"...

jethro_tell · on March 27, 2018

I'm laughing. Might be too early but I am.

kozak · on March 27, 2018

I use "go on a vacation" as a euphemism for that.

twodave · on March 27, 2018

I use "get hit by a bus" as a euphemism for vacation

mitjak · on March 27, 2018

A bus filled with vodka and zen

d0lph · on March 27, 2018

I thought it was death, getting fired, or quitting.

galdosdi · on March 28, 2018

I guess I have worked with different types of sysadmins.

A good sysadmin has, or at least expresses traits of in their work, pessimism and realism. We have to constantly viscerally feel and know that any component can fail at any time. Remembering your own mortality goes along well with that.

Being easily disturbed by this is not a trait I would like to see in someone with these kinds of responsibilities.

fapjacks · on March 28, 2018

So what you're saying is that Stoicism is not only a philosophy for soldiers and prisoners, but for sysadmins as well.

galdosdi · on March 28, 2018

Yes, very well put.

rthille · on March 28, 2018

I never worry about being hit by a bus, I assume it'll be quick :-)

tokenizerrr · on March 27, 2018

Reboot into single user mode, reset root password, and done.

ilikepi · on March 27, 2018

The issue is the new admin has no way to know whether all the processes running before the reboot are configured to come up automatically, and no sense of what external dependencies the server has. Further, the admin is forced to deal with problems reactively at boot time, rather than having the opportunity to gain an understanding of the server setup in advance.

arca_vorago · on March 27, 2018

Not if grub is encrypted, better even behind FDE.

justaj · on March 27, 2018

How does booting for remote login in such a setup work though? You'd have to be physically present in order to enter the passphrase.

arca_vorago · on March 28, 2018

For that I do an ssh shim in the initramfs (with port knocking) for key entry pre-boot.

amdavidson · on March 28, 2018

I would love to read a tutorial on that if you know of one.

rthille · on March 28, 2018

https://hamy.io/post/0005/remote-unlocking-of-luks-encrypted...

jlg23 · on March 27, 2018

One can also assume that any competent replacement for the "checkyoursudo who was hit by a bus" is able to log into a machine after a reboot.

I rather rely on beginner level sysop skills than on management understanding the implications of "this is my access to your mail server, don't handle with care, don't handle at all unless I am run over by a bus".

walshemj · on March 27, 2018

One way we did it at BT was that all the root passwords were placed in a separate envelope per machine, and kept in the fire safe.

rsync · on March 27, 2018

I do that same thing, but I have scripted it in .login ...

  echo ""
  /usr/bin/netstat -f inet | /usr/bin/grep --color -C 100 "MYHOSTNAME.ssh"
  echo ""
  echo ""
  w
  echo ""

(note, the netstat command arguments are for FreeBSD ...)

So, when I log in, I see all current network connections, and any current SSH connections are highlighted in red.

Then double carriage return and then the output of 'w'.

rhizome · on March 27, 2018

You don't need the quoted empty string with echo to generate a newline, just `echo` is fine.

checkyoursudo · on March 27, 2018

In case anyone didn't already figure it out, (ash) was an autocorrect error from (ssh).

gowld · on March 27, 2018

Why? I'm sure it's fun for you, but it's terribly unprofessional for the lawyers to use that setup.

checkyoursudo · on March 27, 2018

Unprofessional?

No. There aren't any ethics rules implicated in the way we do this.

No client files are stored on these servers.

ktpsns · on March 27, 2018

Interesting that nobody mentioned htop so far (https://en.wikipedia.org/wiki/Htop). It is my favourite command to get a quick glance at the computational facilities of a computer (memory, cores) and what it is doing (load, fancy ps/top). htop is not installed everywhere, but it is easy to make a static build and scp it to the host in question.

Another very handy command is

  sudo netstat -atpn

which shows you the processes and owners of open TCP/UDP ports. The argument combination is as weird as "ps aux", so I just memorized it.

jaipilot747 · on March 27, 2018

I remember it by imagining netstat wearing pants. There is plain ole netstat and there's netstat wearing pants.

thriftwy · on March 27, 2018

Then there's grep -whor which is also super useful.

ajuc · on March 27, 2018

Thanks! I didn't know about -o.

My most common grep phrase is grep -iIRn - good for searching in code (when IDE can't find something). You can remove -i if you care about case, but I mostly used it for plsql code, and that's not case-sensitive.

blowski · on March 27, 2018

What does that do?

thriftwy · on March 27, 2018

Pinches whole-words (by regex usually, e.g. '[A-Z][A-Za-z]*Exception') out of files recursively. Then throw in sort and uniq -c.

jakeogh · on March 27, 2018

example? I tried a few things, clearly since -r it's not a single file grep, but throwing in -h with -r is confusing. I thought I was pretty good at grepping...

-r everyone needs, but usually -R is a better default since it does not ignore symlinks

-h I practically never use since I'm almost always using grep to find what file I need to further investigate and letting the stuff down the | line deal with it

-o makes sense, kinda, I want to do that in the next stage of the pipe, maybe I have used it once? cant remember

but -w... I feel like I'm missing something there.

thriftwy · on March 27, 2018

I think you have dramatically different usage patterns than me. I use -o all the time. I've only known about -w for a short time, but it's one of my favorite options since you can filter out a lot of garbage. You could replace it with grep -P '\bpattern\b', I guess.

I think what you are doing here is search, but -o and -w shine at data extraction and data manipulation. Things you would use Excel for, except on multiple GB files.

jakeogh · on March 27, 2018

Very interesting. I'm still floundering for a use case... can you give an outline if it's difficult to make a specific example?

If you are not using search, but are using a recursive (-r) grep, then you are feeding a bunch of data to your pipe but don't care what file it came from. I get that... I mirrored the SEC's ftp back when it was ftp... and it's interesting for correlation and name stuff... but that's not going to work because it's freeform, and it's millions of small files... so you are using (large) files with a known format? Something like spectral datasets where the fields are passed with the data prefixed with its important tags? Maybe logs?

thriftwy · on March 27, 2018

As I have already said,

    grep -whor '[A-Z][A-Za-z]*Exception' * | sort | uniq -c

for a histogram of exceptions inside logs downloaded from a bunch of nodes.
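
Wrapped up as a function, for anyone who wants to try it on their own logs. This is a sketch: the name `exception_histogram` is made up, and the trailing `sort -rn` and `awk` are additions to the one-liner so the busiest exception comes first and uniq's padding is dropped.

```shell
# exception_histogram DIR: count whole-word *Exception names in files under DIR.
#   -w  match whole words only      -h  don't prefix matches with file names
#   -o  print only the matched text -r  recurse into DIR
exception_histogram() {
    grep -whor '[A-Z][A-Za-z]*Exception' "$1" |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'    # normalise uniq's leading padding
}

# Usage: exception_histogram ./logs-from-nodes
```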

As for -w, my favourite case is https://unix.stackexchange.com/questions/110645/select-lines...

As for -o,

    grep -o ........-....-....-....-............

It may look like Morse code, but what it does is greps UUIDs out.
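
The dots match any character, so anything with dashes in the right places gets through. If false positives matter, a stricter variant restricts each group to hex digits. A sketch, with `extract_uuids` as a made-up name:

```shell
# extract_uuids: print every 8-4-4-4-12 hex UUID found on stdin, one per line.
extract_uuids() {
    grep -oE '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}'
}

# Usage: extract_uuids < app.log | sort | uniq -c
```

The dotted version is quicker to type at a prompt; this one is easier to trust inside a script.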

jakeogh · on March 27, 2018

Ahhh. TIL!

djKianoosh · on March 27, 2018

cool stuff, i was messing around with this. To then take that list of exceptions and find them again in the actual files, tack on at the end: `| grep -Fwf - *`

y4mi · on March 27, 2018

> which shows you the processes and owners of open TCP/UDP ports

naw, that would be `netstat -lntup` (listening, numeric, tcp, udp, program)

your mentioned command shows essentially all currently active tcp network streams. https://explainshell.com/explain?cmd=sudo+netstat+-atpn
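
To see the difference on a box where you already have the -a output in front of you, you can filter it down to the listening sockets after the fact. A sketch, assuming the usual Linux net-tools column layout where the state is the sixth field; `listening_only` is a made-up name.

```shell
# listening_only: keep only the rows in state LISTEN from `netstat -atn` style
# output on stdin, i.e. roughly the TCP rows that -l would have shown.
listening_only() {
    awk '$6 == "LISTEN"'
}

# Usage: sudo netstat -atpn | listening_only
```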

btw, glances is even more complete than htop

https://github.com/nicolargo/glances

SpeciesInvader · on March 27, 2018

I've been using glances for about a month and am still getting the hang of the different UI layouts. It's great for almost any box I'm checking on.

BuildTheRobots · on March 27, 2018

I use both htop and atop, usually htop first as it's more graphical, but atop gives nice IO, wait and NET stats too which is extremely useful.

jakeogh · on March 27, 2018

atop is sweet for an overview, but for IO I reach for iotop

v_lisivka · on March 27, 2018

nmon is also good. Check it out.

mdekkers · on March 27, 2018

also, dstat

poooogles · on March 27, 2018

netstat is deprecated and has been for a while, you should be using ss from iproute2. Most of the commands should map directly so it's pretty drop in.

kchoudhu · on March 27, 2018

There are more kinds of UNIX than Linux, and not all of them have deprecated perfectly functional tools.

mmillin · on March 27, 2018

Only on certain Linux systems is it deprecated. On the BSDs netstat is still the preferred tool.

bloopernova · on March 27, 2018

I like ss if only for the mnemonic "ss 4chan" which sort of but not quite maps to "ss -4tan" ( https://explainshell.com/explain?cmd=ss+-4tan )

y4mi · on March 27, 2018

That's glorious, gotta remember that one.

Chan is a Japanese name suffix for children; various communities have created several anime mascots over the years. They're generally suffixed with -tan, a cute mispronunciation of chan.

https://en.m.wikipedia.org/wiki/OS-tan

Btw, 4tan should be that 4chan mascot. I've seen it previously on sankaku complex, but it's been too many years. Can't find it right now.

/Edit: I probably should mention that you shouldn't visit sankaku complex at work. It's very... questionable, with nudity.

mappu · on March 27, 2018

> that 4chan mascot

https://en.wikipedia.org/wiki/Yotsuba%26! (SFW)

telchar · on March 27, 2018

This thread has been worth it just to learn about explainshell. I'm so happy that exists.

dod9er · on March 27, 2018

Uh ok, but ss -atpn isn't very readable on my terminal due to line wrap. Anyone with suggestions?

defen · on March 27, 2018

    ss -atpn | column -t

Or pipe into `cat`

sridca · on March 27, 2018

What is `ss`-equivalent of `netstat -atpn`?

liveoneggs · on March 27, 2018

I just use ss -nape most of the time

philamonster · on March 27, 2018

Prefer netstat -tulpen.

TCP, UDP, listening ports, user, PID and process name, extended detail.

gowld · on March 27, 2018

netstat -penult (as in penultimate)

What's the ultimate netstat?

gjvc · on March 28, 2018

sudo netstat -tuple

:-)

sateesh · on March 27, 2018

I run netstat -tunapl, easy to remember

mdekkers · on March 27, 2018

> nobody mentioned htop

htop is my trusted standby. Fell in love with netdata though. https://github.com/firehol/netdata - since we all do our best to stay away from getting shells on the servers, netdata is awesome

bloopernova · on March 28, 2018

netdata is very interesting, thank you for sharing it.

It seems to do something very similar to Cockpit, but netdata has way more stuff out of the box.

discreditable · on March 27, 2018

If you're ssh-ing random boxes they may not have htop.

mdekkers · on March 27, 2018

...my cracking days are long behind me, I no longer ssh into random boxes :)

rhizome · on March 27, 2018

To be fair, nobody mentions it until someone mentions it.

comboy · on March 27, 2018

Can anybody explain to me why this command can return instantly but "lsof -i" sometimes takes ages to complete?

jakeogh · on March 27, 2018

you need -n, otherwise it's trying to RDNS the IPs. It's a dumb default... chatting on the network without you asking while at the same time censoring the data the kernel was using... similarly "ls" might take forever (which matters if it's a big folder and you are piping its output or just want |head) because it wants to sort (use -f to fix). The ls is a sensible default, often it's for human consumption, and the sort is way faster than a bunch of DNS lookups that could take an arb long time.

lsof has a good case to join coreutils.
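
For the lsof side that's just `lsof -n -i` (adding -P skips the port-name lookups too). For the ls side, here is a sketch of the "I only want the first few names" case; `first_entries` is a made-up name. Note that -f also turns on -a, so dotfiles show up and `.` and `..` have to be filtered out.

```shell
# first_entries DIR [N]: print up to N names from DIR in directory order,
# skipping the sort that plain ls does before printing. N defaults to 10.
first_entries() {
    ls -f "$1" | grep -vxE '\.\.?' | head -n "${2:-10}"
}

# Usage: first_entries /some/huge/dir 20
```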

ktpsns · on March 27, 2018

A similar issue as with the "-n" flag for netcat exists for "ls": Frequently "ls" is aliased by default in users home directories:

   $ type ls
   ls is aliased to `ls --color=auto'

On slow file systems (for instance, think of shared machines with large directories, NFS and heavy load) this can take ages to give output. Instead, if you just call /bin/ls, this call only calls http://man7.org/linux/man-pages/man3/readdir.3.html and thus gives output promptly.

kbenson · on March 27, 2018

I've seen \ls used to get the unaliased system version of ls. Likely useful for those commands where you don't want the aliased version and aren't sure if they live in /bin or /usr/bin.

ssebastianj · on March 27, 2018

Nice, didn't know about "\ls"! I wonder what's the difference between "\ls" and "command ls". Yes, it's a bit longer to type but the output is the same, I'm asking about what happens behind the scenes.
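
As far as I understand it, the difference is in which lookup steps get skipped. The backslash only stops alias expansion, so a shell function named ls would still be called; `command ls` skips functions as well and goes to the builtin or the binary on PATH. A small sketch that shows it with a function (the name `demo_ls_lookup` is made up; run it as a script, since in an interactive shell that already aliases ls the inner definition line would itself get alias-expanded):

```shell
# demo_ls_lookup: show that \ls still finds a shell function named ls,
# while `command ls` bypasses it. The ( ) body runs in a subshell, so the
# override does not leak into the calling shell.
demo_ls_lookup() (
    ls() { echo "shell function ls"; }
    printf 'backslash -> %s\n' "$(\ls /)"
    printf 'command -> %s\n' "$(command ls -d /)"
)

demo_ls_lookup
```

The first line reports the function's message and the second reports the real ls output for `/`.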

comboy · on March 27, 2018

thanks!

EngineerBetter · on March 27, 2018

Genuine question: how many HN readers log on to boxes with user accounts that belong to humans, where some state may have been mutated?

My experience of the last five years is so heavily weighted to (effectively) immutable infrastructure that checking to see who had been on a box hadn't even crossed my mind.

mrweasel · on March 27, 2018

I do that daily. Most of the services we run consist of 1 - 8 application servers (often closer to 2 than 8), so things like Docker don't make much sense. Even though we of course try to automate as much as possible, using things like Ansible, we often log in to servers directly to verify changes. Database servers are normally manually managed, to some extent, so we'll always log in directly.

When I'm on-call and get an alarm on a server, checking that colleagues aren't logged in is normally the first thing you do. If a service fails it's more often than not someone doing maintenance, and they just forgot to tell you, or disable monitoring. And as someone else pointed out, also check disk usage.

Hacker News can be a little blind to the fact that most software projects are rather small and modern servers are really powerful. It's actually pretty rare that someone manages enough infrastructure for a single service that logging in isn't a viable option.

technion · on March 27, 2018

Even the word "small" is quite relative. One of my clients is big enough that you've probably heard of them, and their website/app (which is the company's only thing) is one LAMP server. That was only last year upgraded from a Windows 2003 WAMP server on a desktop. I tell friends what I did there and they say "wow you must be really good at Puppet at scale" based on assumptions.

thunfischbrot · on March 27, 2018

Most people have yet to discover what a tiny little machine can do if you spend some time addressing bottlenecks. I have fond memories of running websites on the cheapest VPSs I could find, and getting very respectable performance from them with little more than cutting down on resource-hungry services, going static html whenever possible and being vigilant about any resources loaded by the browser.

See also https://lowendbox.com/ and https://hackernoon.com/10-things-i-learned-making-the-fastes...

ryandrake · on March 27, 2018

I run a couple of very small scale (mostly static content) web sites and E-mail serving and a few other services on a single "Cheapest VPS" and have never gotten interested in Puppet or Ansible or any of the other things mentioned in this thread. It just seems like adding abstraction and automation on top of things that are not worth abstracting or automating. To me, these things are Yet Another Software That Could Fail. When I need to change a configuration file, I ssh in, sudo vi /etc/blahblah.conf and get on with my life.

jethro_tell · on March 27, 2018

I just use versioned Makefiles for my very simple personal projects with my artifacts and configs in git. Pull the repo to the box, and run the make file. I commit config file changes then pull and make as needed. Everything is simple, it's manual automation so there is nothing running, but it's repeatable.

I put the bootstrap into an OS package and my server images point to my personal repo so I can configure run and build time dependencies as needed. At the end of the day, I can get a new server, run update, install my package and walk away and everything should be running, but it takes almost no additional overhead over doing it all by hand the first time.

theonealtair · on March 27, 2018

I was never interested in Puppet either. I was only interested in Ansible, and I only use it as a sort of automated checklist when setting up my server, so that if I need to rebuild the server, the tedious steps are automated. But I've never used it for multiple server management.

kelnos · on March 27, 2018

So what happens when the hard drive fails and you lose all the data and configuration on the box?

nickpsecurity · on March 27, 2018

Avoiding the huge, fast-changing software for configuration management doesn't mean you can't have CM or high availability on cheap boxes. The OpenVMS approach (see Section VI) combined good filesystem, clustering, optional OS-level virtualization, and distributed lock manager to have clustered systems that ran for years without downtime. Record was 17.

http://h41379.www4.hpe.com/openvms/whitepapers/high_avail.ht...

Those few building blocks done in a highly-robust way on one of the stable, Linux platforms could probably achieve the same thing. Just RAID 0, a good filesystem, clustering itself, and backups would get far. I know there's Linux-oriented products out there but I don't know if their availability or failover time have caught up yet.

ryandrake · on March 27, 2018

My VPS host fixes it and I restore from backup.

teddyh · on March 27, 2018

That’s what RAID and daily backups are for.

telchar · on March 27, 2018

Yep. I got to say to my boss recently "you know those 2-4 servers I requested? You can forget about that now, I optimized some things and we have plenty of capacity now on the one machine we've been using." That's an easy conversation to have.

bauerd · on March 27, 2018

I share your viewpoint and am currently thinking through a similar small-scale deployment. How do you handle logging? For single boxes I'd just use logwatch and friends, but for aggregating log output (both system/application logs) from a small-ish Consul cluster I feel the options are complete overkill. I've had a look at ELK, time-series backed stuff (e.g. Prometheus), but all I really want is log aggregation with search capability and optionally Regexp-based alerting.

It seems to me Logstash provides what I want, but I sure as hell won't run 3 JVMs and a Redis instance to aggregate logs.

tl;dr How to handle log aggregation within a small-scale cluster without losing your sanity?

ibotty · on March 27, 2018

Monitoring and logging are best separated. Prometheus is great even on a small scale. For log aggregation there is the relatively new oklog.

I have been running oklog for a customer successfully. You ought to deploy it like a regular (non-cloud-native) service though: when a server crashes, restore the oklog instance from backups rather than spinning up a new one.

hn_user2 · on March 27, 2018

For small stuff I still like the simple syslog. It’s usually built in and supports aggregating to a single server.

In the past I have created a server just with syslog. Adjust all your servers' syslog configs to point to that one. Then log in and use grep.

Not fancy. But gets the job done with a single line config change in your syslog configs.
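
For reference, with rsyslog that single line is a forwarding rule in the legacy syntax: one @ before the host means UDP, two mean TCP. The helper below just prints the line. It is a sketch, and `forward_rule`, the host name and the file name in the usage comment are all placeholders.

```shell
# forward_rule HOST [PORT] [udp|tcp]: print a legacy-syntax rsyslog rule that
# forwards every facility and priority to HOST. Defaults: port 514, UDP.
forward_rule() {
    case ${3:-udp} in
        tcp) at='@@' ;;
        udp) at='@' ;;
        *)   echo "forward_rule: protocol must be udp or tcp" >&2; return 1 ;;
    esac
    printf '*.* %s%s:%s\n' "$at" "$1" "${2:-514}"
}

# Usage (hypothetical host and file name):
#   forward_rule loghost.example.com 514 tcp | sudo tee /etc/rsyslog.d/90-forward.conf
```

The receiving server also has to be told to listen for remote messages, which is a separate setting on that one box.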

arca_vorago · on March 27, 2018

Senior sysadmin here, and I agree this is one of the most common methods. I have also had great success tying in the ossec or pam module notifications into syslog messages that all go to the central syslog (nsyslog/rsyslog etc) server. The problem with the elk and similar stacks imho is lack of security and speed. Things this old school setup doesn't have a problem with. The problem is that too many managers don't like not having gui dashboards... which is why splunk et al have really taken off.

This reinforces my idea that a purely terminal based business dashboard might be a cool product with a fairly large market. I've been eyeballing some of the go/ncurses work for this.

devonkim · on March 27, 2018

rsyslog or syslog-ng are preferred these days for better security and performance, I thought? Unless you’re using something like stunnel to wrap the syslog messages, that is.

devonkim · on March 27, 2018

If you’re not violently opposed to MongoDB, Graylog is very popular. I haven’t heard bad things about it so far besides the MongoDB portion, but log aggregation is one of those systems that may be ok with some hiccups from time to time, as opposed to an actual primary business data store.

mrweasel · on March 27, 2018

We're kinda big on Splunk, everything that remotely looks like log data gets shipped off to Splunk.

I've worked with ELK before and I'm not a fan, the usability is very low compared to Splunk. It's much cheaper though. At a previous employer we switched from Splunk to ELK and the number of searches we did on a daily basis dropped to almost zero. Before that almost every support "ticket" would start with a Splunk search.

You could also try out https://www.humio.com. I've only seen it deployed by one customer, but it looks nice.

jsmeaton · on March 27, 2018

We use Sumologic and it’s really nice. Much better than ELK. I’ve only used Splunk in an evaluation context a long time ago, so I’m not sure how they’d compare. Splunk is just so expensive.

kondor6c · on March 27, 2018

I found Sumologic to lack features that ELK has, such as Packetbeat. But you still have to manage the Sumologic forwarding agent and keep under your ingest limit. Additionally, my mind never exactly flowed with their query syntax. They refreshed the UI recently and I liked it (aside from compounding my overutilization of tabs haha); you can now more easily/graphically select time series from the graph. I can see how it is nice to not have to worry about re-indexing and what type of device the underlying data rests on. I have not tried the ELK hosted solution so some of my criticisms could apply to it as well.

Carasso · on April 3, 2018

We use Humio alongside Splunk and they have an office in SF. Nice people, price is nice too :)

AzMoo_ · on March 27, 2018

I just have a server dedicated to syslog and have rsyslog running on everything else.

kaitnieks · on March 27, 2018

For simple, non-critical stuff this is the simplest solution I've come up with - if you don't need real-time aggregated logs (i.e. you only want to archive them) and you have log rotation configured, you can simply use cron with "aws s3 sync" or rsync for the logs folder.

greenleafjacob · on March 27, 2018

Check out papertrail.

Fradow · on March 27, 2018

On my personal dedicated server, I don't have any kind of "modern" things, I do everything the old way, so I do log in manually. It's a feature, it's a way for me to learn old-school sysadmin (and I have so few things on there that automation isn't needed anyway).

For my startup, I often run bash on Heroku because I do migrations manually (again, it's a feature, I'm too inexperienced to have automated migrations that work every time, I prefer to be already on it if it breaks). Sometimes when something breaks I'll also poke around the filesystem (which is a copy, so no fear of breaking anything).

Basically, I'd say the smaller your team, your uptime requirements and your traffic are, the less you need automation, and the more likely you are to log in directly to a box (I combine all of that: team of 1, no uptime requirement, not enough traffic to even max out the most basic server).

majewsky · on March 27, 2018

Counterpoint: Even on my private VPS, I have all the configuration as code. It gives me peace of mind knowing that when a server comes crashing down for whatever reason, I can reinstall it and bring all services back online with not more than one hour of time invested. Time is precious.

(Also, when someone asks me "how do you configure X", I can just link them to the corresponding place in my system configuration repo on Github.)

zacmps · on March 27, 2018

I'd be interested in seeing the config if it's public?

Crontab · on March 27, 2018

Probably https://github.com/majewsky/system-configuration

majewsky · on March 27, 2018

Exactly. The Readme is horribly outdated, but the basic remarks are still true.

eythian · on March 27, 2018

I have a small handful of VPSes for personal stuff that, more and more, I'm moving away from manual configuration to using ansible.

The reasons for this are several, but the most pressing are that if one falls over I can spin it up again quickly somewhere else, and I can have a lot of commonality in the configurations between them, so they all have backups configured the same way, or setting up letsencrypt requires editing things in only one place.

It also means I don't have to dig through a bunch of config when I want to work out how stuff is set up three years later, I can just look in one version-controlled directory in a central location.

raverbashing · on March 27, 2018

Not to mention that while configuration management "magic" has taken over (Puppet/Chef - or even something higher level) you still need to set up those and you still need to know where to look when things go wrong.

lsc · on March 27, 2018

>My experience of the last five years is so heavily weighted to (effectively) immutable infrastructure that checking to see who had been on a box hadn't even crossed my mind.

In web scale production, you mostly have to worry about your fellow sysadmins troubleshooting the problem, and they mostly won't be too mad about you clearing state.

But not everything is web scale production. Productionizing an app is a lot of work. Yes, yes, "Cloud is reliable!" and that's kinda true if you write your application to deal with any failure at any time. The reality is that the hardware can and will sometimes go away without warning, and you can't call up your oncall and tell them to head to the datacenter to fix it. All that recovery is done at the application layer; you had better hope you managed to make the data redundant enough. The whole idea behind the "cloud" is to take most of the lower level sysadmin work and make developers do it; and that's a fine way to do some things, but... not so fine when it comes to other things.

(This is the value proposition of a VPS system rather than a "cloud" instance. When the hardware a VPS is on goes down, some poor bastard's pager goes off, and they are expected to wake up, drag themselves to the datacenter and fix it. The gamble is "is being down for a few hours now and again, when you can reasonably expect to be brought back up in the same configuration, cheaper than writing your software such that you can always start over on a new node?" The VPS is for the former, the "Cloud" for the latter.)

Add to that, well, a lot of businesses need to run proprietary software. I personally, well, let us just say that for the last three years, the systems I am dealing with at work involve FlexLM.

So corp and smaller sites are littered with one-off systems... systems where if they break, a pager goes off, and someone fixes the problem. Sometimes that works better than the "web scale" stuff... like I've yet to see a "web scale" posix-ish filesystem that meets the user expectations the way NFS does, and I've never seen an NFS server that didn't require someone on pager.

closeparen · on March 27, 2018

Our containers are immutable, but the bare metal hosts they run on aren’t.

movedx · on March 27, 2018

No, but the "bare metal hosts" should be configured through configuration management and all logs sent "off site". I've been in the same boat and never had to log into a bare metal box. K8s and Ansible will do that for ya.

freehunter · on March 27, 2018

I work on those "off site" machines that collect your logs and I'm sitting on one as root right now.

Ansible doesn't get you everywhere.

movedx · on March 28, 2018

Yikes! We use time series databases, so we simply export the data from them to analytical software or our local machines.

InfluxDB is a simple command to extract what I need and then work with it from there.

closeparen · on March 27, 2018

Puppet deploys a variety of host-level agents like L7 routing, dynamic config updaters, the scheduler agent (of course), metrics, logging, and tracing collection. Usually they work, but it’s necessary more often than you’d think to find out why one has fallen over and fix it, and to run diagnostic tools to investigate performance anomalies.

avip · on March 27, 2018

The age of immutable inf. did not completely free us from occasional hotfixing production issues, because immutable deployment takes time, and if it's downtime you're accountable (as sysadmin/devops/whatever).

Rowern · on March 27, 2018

So I am not the only one to have a slow deployment because re-building the AMI and re-provisioning everything takes quite some time.

I also had a difficult time to explain that a prod deploy of 30min (image creation, deploy with blue green) is normal for this kind of inf... Did you face the same thing?

scrollaway · on March 27, 2018

Rebuilding AMIs is not a thing you should be doing every deployment. Sounds like you are on AWS so use proper containers on ECS or EBS. Docker itself caches pretty aggressively. Decompose your projects as well so that the independent parts build and deploy without rebuilding everything else in the project that hasn't changed.

At the end of the day, if you're on continuous deployment, a commit should be rebuilding only what it touches. We have 4 min long deployments + 1.5 min tests and I definitely don't think we're optimizing aggressively.

devonkim · on March 27, 2018

Things get awkward when versioning standards require all application components to have the same version because several version numbers running around is cognitive overhead that engineers can’t afford in many situations. With more than about 10 components I’ve usually seen it turn into “deploy 10 services that have 10 changes, 9 of which have one commit that bumps a version number up.”

Many places still keep producing very stateful software (sometimes even very much by choice) that is better off managed through Puppet / Chef rather than an immutable containerized approach. If your software needs to take an hour and a half to shutdown, for example, you have to get a bit creative with your deployment strategies.

gpmcadam · on March 27, 2018

A stitch in time, saves `kill -9`

gowld · on March 27, 2018

Sounds like you are cutting corners on redundancy. Saves $$$, but risky.

shirro · on March 28, 2018

When you have a small number of systems, a small team (possibly one person) and a relatively simple configuration it can be hard to justify spending time and money on a lot of deployment and automation. It isn't that deploying systems as you suggest isn't a better way of doing things. It is just the benefits are far greater at scale. At the scale I typically work it would be over-engineering most of the time.

markwillis82 · on March 27, 2018

We have an immutable hosting platform with ansible, but some clients are hosted on their own boxes so we still end up logging into their hosting to sort out their issues (normally they have more problems than we do).

But they aren't interested in investing in more infrastructure when a single LNMP server does the job.

subway · on March 27, 2018

even on immutable hosts it isn't surprising to see somebody has ssh'd in for debugging purposes. they might be attaching a debugger, taking a core dump, or doing a number of other things that could cause a drastic perf hit while not actually mutating the host's state otherwise.

jameshart · on March 27, 2018

Why take the risk of continuing to run a tainted host, though, if you can just tear it down and spin up a new, clean, untouched one?

I think there’s another level we need beyond treating servers as pets or cattle, which is treating servers as wild animals; after you’ve captured one and interacted with it, you’ve doomed it because now it has the scent of humans on it.

subway · on March 27, 2018

Sure, but that does nothing to prevent the person who invoked the debugger from inadvertently triggering prod alerts, possibly encouraging more folks to ssh in to see what's going on.

gowld · on March 27, 2018

Good hiring and training practices prevent people from invoking the debugger on live prod machines

subway · on March 27, 2018

There are times when it has to happen. It certainly shouldn't be an every day occurrence, but even in the best environments, there are times when SHTF only under a production workload, and you need an understanding of why yesterday.

drinchev · on March 27, 2018

Mostly `root` with ssh-keys for authentication. Some provisioning scripts ( ansible ) that run under root and pretty much that's it.

I haven't seen human users in passwd since virtualisation kicked in a couple of years back.

chillidoor · on March 27, 2018

I do this daily, mainly because most of our infrastructure isn't immutable yet :(

cup-of-tea · on March 27, 2018

I do. In science we share clusters/supercomputers.

sneak · on March 27, 2018

You are at the right hand side of the curve. Professionals do it that way, yes. Many smaller or less-well-managed organizations have a lot more warts.

gyrgtyn · on March 27, 2018

How, after like 20 years goofing around on linux, have I never heard of `w` ?

spudlyo · on March 27, 2018

Do you know about `comm`? Given two sorted files, it will show you three columns:

    * lines unique to file1
    * lines unique to file2
    * lines common to both files

You can pass it various options to suppress any of the three columns.
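
For anyone who hasn't used it, the three-column idea can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not how `comm` itself is implemented; the function name `comm_columns` is made up, and it assumes both inputs are already sorted the same way.

```python
def comm_columns(lines1, lines2):
    """Split two sorted sequences into (only_in_1, only_in_2, in_both)."""
    only1, only2, both = [], [], []
    i = j = 0
    # Walk both inputs in lockstep, like the merge step of a merge sort.
    while i < len(lines1) and j < len(lines2):
        if lines1[i] == lines2[j]:
            both.append(lines1[i])
            i += 1
            j += 1
        elif lines1[i] < lines2[j]:
            only1.append(lines1[i])
            i += 1
        else:
            only2.append(lines2[j])
            j += 1
    # Whatever is left over in either input is unique to that input.
    only1.extend(lines1[i:])
    only2.extend(lines2[j:])
    return only1, only2, both
```

Ignoring one of the three returned lists is the equivalent of suppressing a column with `-1`, `-2` or `-3`.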

I once had an interview at Facebook where one of the problems was easily solved by `comm`. I found it funny that the interviewer, or anyone who reviewed the interview question, had never heard of it. I was a good sport about it though, I ended up writing a janky Perl script that roughly implemented `comm` to solve the problem, which (modulo Perl) was what they wanted me to do.

theoh · on March 27, 2018

In that vein, I was asked about implementing n-way merge of logfiles in an interview once. The key insight/recollection they wanted was: use a heap to implement a priority queue. Not sure it was a great question as jumping to employ a data structure like that might suggest premature optimization.
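
For reference, the heap-based answer they were fishing for looks roughly like this in Python. It is a sketch under two assumptions that were not part of the question as I remember it: each logfile is already sorted, and lines start with a timestamp that sorts correctly as plain text (ISO 8601, say). The name `merge_logs` is made up.

```python
import heapq

def merge_logs(sources):
    """Yield lines from several individually sorted iterables in global order."""
    iterators = [iter(s) for s in sources]
    heap = []
    # Seed the heap with the first line of every non-empty source.
    for idx, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            heap.append((first, idx))
    heapq.heapify(heap)
    while heap:
        # The smallest pending line across all sources is at the top.
        line, idx = heapq.heappop(heap)
        yield line
        # Refill from the source we just consumed from, if it has more.
        nxt = next(iterators[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt, idx))
```

Only one pending line per file is held in memory at a time, which is the point of the exercise. The standard library's `heapq.merge` does the same job if you don't need to show the mechanics.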

rthille · on March 28, 2018

yeah, I'd guess `cat *.log | sort`

theoh · on March 29, 2018

This was for Zeus, the high-performance web server guys. So I guess it was a thing for them to optimize in general: a USP.

If you specify that the files are way too big to fit in memory, well, that's a different story.

gyrgtyn · on March 27, 2018

`comm` at least sounds vaguely familiar. Thanks for reminding me; looks useful.

mrb · on March 27, 2018

One time I briefly looked at the man pages of all the binaries under /{bin,sbin} /usr/{bin,sbin} that I didn't know about. A basic Debian install has only 555 binaries (https://pastebin.com/raw/VnrqDdq0). Maybe 100-200 are new to you and are worth checking out. Do it.

js2 · on March 27, 2018

This is a good strategy. I grew up on SunOS before Linux. It had (has?) a fantastic set of man pages. You could start with "man intro" and go from there. Looks like it still leads you to the "1M System Administration" section:

https://docs.oracle.com/cd/E19683-01/816-0211/6m6nc66m6/inde...

The Linux intro man pages are still pretty useless sadly.

drinchev · on March 27, 2018

I remember I used `w` to check if someone is in the box and then use `talk` to open a chat with that user.

sireat · on March 27, 2018

I remember 'w' from reading Cuckoo's Egg some 25+ years ago as a teenager. https://en.wikipedia.org/wiki/The_Cuckoo%27s_Egg

I still do it instinctively on my own boxes.

scott_s · on March 27, 2018

I thought the same thing. I usually do `who`, `top` and `uptime`. But `w` is far better than `who`.

sah2ed · on March 27, 2018

Same here, although I'm only 14 years in.

icedchai · on March 27, 2018

You're both newbs. ;) My first Linux box ran kernel 0.99.10.

dvh · on March 27, 2018

I run df first, very often something stopped working because the disk space ran out

aepiepaey · on March 27, 2018

...and then df gets stuck in uninterruptible sleep (due to a file system hang), and ^C and ^Z do nothing.

jandrese · on March 27, 2018

Then you know it is most likely an NFS/CIFS issue or hard disk failure.

Downside is you need to log in with a new shell and tread lightly because it's easy for anything to get stuck in that state. Checking the syslog for NFS errors is a good place to start, or inspecting the fstab to see what is supposed to be mounted.

executesorder66 · on March 27, 2018

How would you avoid that, and still find out the disk usage?

zbentley · on March 27, 2018

You wouldn't. If the filesystem is hung, or if some common path is never yielding out of blocking-no-matter-what-calls (e.g. stat()), then the presence of the hang itself would indicate an issue. The isolation process for me would probably be something like:

1. df -h; notice that it hangs.

2. Log in to a new shell, 'strace' the old process or a new one doing the same thing, see what path it was choking on.

3. If the breakage is on an external/network filesystem, reboot the host in almost every case. Unless it was happily completing day 364/365 of some incredibly important task elsewhere, it's just not worth my time to remount a dead share and go clean up everything that was broken trying to talk to the old one. I've had database servers lose some random NFS share that the DB process wasn't using, then crash months later due to PID exhaustion because some monitoring script in cron kept trying to talk to a somehow-corrupted mountpoint and hanging forever. Yes, timeouts and client programs should be able to handle these failures perfectly in theory. Given my experience, I have very little faith in theory matching up with reality.

4. If it's on an internal drive, check dmesg/syslog (if I can) for any smoking guns. Reboot and see if the problem goes away. If it does, unless I can find something blindingly obvious indicating that the issue was transient and unlikely to reoccur, I'm probably reprovisioning the system after a hardware diagnostic. Even if the server isn't critical and just serves a cat blog or whatever, it's not worth my time and repeated head-scratching to deal with issues like this more than once per host.

5. If I need data off of the questionable filesystem, I'll get it exclusively via a recovery environment; not worth the risk in the case of a flaky/failing drive otherwise (this applies even if the server itself was virtualized). I hope the server had some sort of LOM console set up so I can do that, otherwise someone's getting travel expenses for my trip onsite.
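
If you would rather not sacrifice a shell to step 1, the "run it and notice that it hangs" part can be wrapped in a deadline. A minimal Python sketch, with a made-up helper name `run_with_deadline`; the one deliberate choice is that on timeout it does not wait for the child to exit, since a process in uninterruptible sleep won't die until the underlying I/O returns.

```python
import subprocess

def run_with_deadline(cmd, seconds=5.0):
    """Run cmd and return its stdout, or None if it did not finish in time."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    try:
        out, _ = proc.communicate(timeout=seconds)
        return out
    except subprocess.TimeoutExpired:
        # Best effort only: a child stuck in D state ignores the signal,
        # so we do not block here waiting for it to be reaped.
        proc.kill()
        return None
```

Something like `run_with_deadline(["df", "-h"])` returning `None` is then the cue to move on to step 2.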

Edits: grammar.

insanejudge · on March 27, 2018

Interesting (well, interesting to me) note on the nfs case, on modern linux, `umount -l` should be able to unmount pretty much anything. You'll often still be left with a pile of processes stuck in uninterruptible sleep depending on the scope of the 'random share', but at the very least it can staunch the bleeding and let you move around.

TBH I get rather claustrophobic when I can't `w` with aplomb.

sevagh · on March 28, 2018

I had a similar incident (I was able to ctrl-c out of the hanging df -h though). Luckily dmesg gave me something super clear (like `nfs host <blah> unreachable`), so I did a `umount -l` (lazy) on `/mnt/net/<blah>` and things were OK.

emmelaich · on March 27, 2018

Simple ..

   df -h &

cat199 · on March 27, 2018

# df &

thereyago.

Symbiote · on March 27, 2018

That's one thing I don't need to do. If the disk gets to 80%, I get an email from the monitoring system. 90% and I get a text.
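
The monitoring system does the real work, but the threshold logic itself is tiny. A rough Python sketch, where the function names and the 'email'/'text' labels are made up for illustration and are not taken from any particular monitoring product:

```python
import shutil

def classify(percent, warn=80.0, crit=90.0):
    """Map a usage percentage to 'ok', 'email' or 'text'."""
    if percent >= crit:
        return "text"
    if percent >= warn:
        return "email"
    return "ok"

def disk_alert_level(path="/", warn=80.0, crit=90.0):
    """Return (level, percent_used) for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    # Measure against used + free rather than total, so that space an
    # unprivileged process cannot write to is not counted as headroom.
    denominator = usage.used + usage.free
    percent = 100.0 * usage.used / denominator if denominator else 100.0
    return classify(percent, warn, crit), percent
```

Run from cron or an agent, the returned level decides which notification channel to use.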

jiri · on March 27, 2018

Anecdote: our sysadmin set a similar thing up (at 90%) on one of my boxes, but the conflicting configuration of the Debian OS also reserved 15% of capacity for the OS, so the system became completely unresponsive at 85% disk usage without ever emailing a message about free space. For our purposes, 85% was effectively a 100% full disk.

Still don't know what lesson I should learn from this story.

more-coffee · on March 27, 2018

Reserved capacity as in `sudo tune2fs -l <volume> | grep 'Reserved block count'`? Those should already be excluded from available diskspace, so that is kind of interesting.

jiri · on March 27, 2018

I am not sure how he was getting free space for the email warning, but in the final state, running df showed plenty of space on disk, although writing to files failed with a "no disk space" error.

db48x · on March 27, 2018

Reserved capacity is reserved for use by root, so when a process running as root runs df (or any similar command or syscall), then it sees that capacity as available. Kinda useless these days, but was very useful back when you might have server daemons and a bunch of normal users with shell accounts on the same machine.

teddyh · on March 27, 2018

Huh? I have never seen that behavior, and I can’t reproduce it as described. Did it ever work that way? Do you have any reference for this?

db48x · on March 28, 2018

Well, there's man tune2fs:

-m reserved-blocks-percentage Set the percentage of the filesystem which may only be allocated by privileged processes. Reserving some number of filesystem blocks for use by privileged processes is done to avoid filesystem fragmentation, and to allow system daemons, such as syslogd(8), to continue to function correctly after non-privileged processes are prevented from writing to the filesystem. Normally, the default percentage of reserved blocks is 5%.

-g group Set the group which can use the reserved filesystem blocks. The group parameter can be a numerical gid or a group name. If a group name is given, it is converted to a numerical gid before it is stored in the superblock.

teddyh · on April 1, 2018

I meant this bit:

> when a process running as root runs df (or any similar command or syscall), then it sees that capacity as available.

I have never seen that. I have only ever seen the “real” available space being shown by tune2fs, not df or anything similar, which have always shown the space available after subtracting the reserved space.
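
One way to check this on a given box is to look at what statvfs() reports, since it exposes both numbers: `f_bfree` counts all free blocks and `f_bavail` counts the free blocks available to unprivileged callers. A small Python sketch (Unix only; the function name `space_report` is made up). It shows the gap between the two figures, and does not by itself settle which one any particular tool prints.

```python
import os

def space_report(path="/"):
    """Return free, unprivileged-available and remaining reserved bytes."""
    st = os.statvfs(path)
    free_bytes = st.f_bfree * st.f_frsize     # all free blocks, reserved ones included
    avail_bytes = st.f_bavail * st.f_frsize   # free blocks an unprivileged process may use
    return {
        "free": free_bytes,
        "available": avail_bytes,
        "reserved_remaining": free_bytes - avail_bytes,
    }
```

Running it as root and as a normal user on the same filesystem and comparing the output is a quick way to test the claim.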

letientai299 · on March 27, 2018

Could you please share how you set this up to get such notifications?

LilBytes · on March 27, 2018

Not OP, but we use Nagios XI for such alerting, it provides SMS and email monitoring for infrastructure (disk space as an example) and service monitoring. [1]

Their open source NCPA (Nagios Cross Platform agent) plugin for Windows and Linux hosts is pretty great, though we were stung when we found out it didn't natively support SPARC as of yet unless you compiled your own client from the source (I did). [2]

Plenty of other monitoring services equivalent to Nagios offer equivalent services. Nagios was just the flavour I suggested to my current employer because of my familiarity with the product over my last few employments.

If you don't want to fork out the licensing for the supported version (XI), their open source version is free and relatively easy to deploy using Ansible if you don't mind writing in PHP. [3]

Edit: words, grammar.

[1] https://www.nagios.com/products/nagios-xi/

[2] https://github.com/NagiosEnterprises/ncpa

[3] https://hobo.house/2016/06/24/automate-nagios-deployment-wit...

more-coffee · on March 27, 2018

I will give a simplified example of the only product I've used for this: Zabbix. Disclaimer: I'm not a sysadmin but a developer, who happened to work somewhere they used Zabbix for monitoring all kinds of network devices, databases, services, etc. Perhaps there are better solutions available, I welcome any discussion on that.

Deploying Zabbix roughly consists of setting up one (or more) Zabbix Master servers, and installing a 'zabbix-agent' process on each device you want to monitor. The agent process extracts a variety of statistics about the system and makes them available to the Monitoring server, either in push ("active") or pull ("passive") fashion.

The Master logs all of these statistics over time. You can then define 'Triggers' that apply logical tests to statistics, such as "in the last 5 minutes, was the free diskspace < 10GB". When this happens, it triggers an event.
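
To make the trigger idea concrete, here is a toy version in plain Python. It is not Zabbix's trigger expression syntax, just the same logic written out; the class name `FreeSpaceTrigger` and its methods are invented for this example, and it assumes samples arrive in timestamp order.

```python
from collections import deque

class FreeSpaceTrigger:
    """Fire when any sample inside the time window is below the threshold."""

    def __init__(self, threshold_bytes=10 * 1024**3, window_seconds=300):
        self.threshold_bytes = threshold_bytes
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp, free_bytes), oldest first

    def add_sample(self, timestamp, free_bytes):
        self.samples.append((timestamp, free_bytes))
        # Forget samples that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def fired(self):
        return any(free < self.threshold_bytes for _, free in self.samples)
```

The defaults correspond to the "last 5 minutes, free diskspace < 10GB" example; a real setup would feed it from the agent's statistics and hand `fired()` to the notification rules.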

Then elsewhere you have defined your Notification rules that act upon generated events, to send out an email or text message. https://www.zabbix.com/features#notification

This was obviously really simplified, and there is so much more you can do, but I hope this at least gave you a basic picture.

mdekkers · on March 27, 2018

Have a look at netdata: https://github.com/firehol/netdata

navinsylvester · on March 27, 2018

It's very straightforward to set up. I have used Prometheus for the same and so far the experience has been much better compared to Zabbix.

    * node_exporter on the client to get metrics
    * prometheus on the server to pull the metrics from node_exporter
    * alertmanager for raising alerts
    * grafana on top of it for pretty graphs (optional)

vk23 · on March 27, 2018

Have a look at nagios. It's a popular open source server/application monitoring tool. You can define custom alerts and monitor basically everything.

softawre · on March 27, 2018

Outside of nagios, you can also use pricier but arguably more fully featured setups like New Relic + PagerDuty.

recentdarkness · on March 27, 2018

I don’t; I'd rather run top to see some stats of the system. Mostly I am checking for slowdowns, and top shows me load, memory consumption, etc. Disk space is rarely the cause.

Most of the time some database connection is laggy

dspillett · on March 27, 2018

Or, where available, htop.

linedash · on March 27, 2018

atop is best top. One of the few bits of software I evangelise.

It writes to a replayable binary logfile with 10-minute system-state snapshots and uses process accounting to ensure it sees every process during that time.

Gives more metrics than any other top; including network, disk and all the counters that you have to check the man page to know what they refer to.

Its counterpart atopsar lets you replay the data for specific stats in an easily viewable format; e.g. `atopsar -m` shows the memory stats for today's logfile in 10-minute increments.

It goes on every single server I manage without exception. With atop you can actually see why it died instead of guessing from old log entries.

Screenshot of atop : https://www.atoptool.nl/images/screenshots/genericw.png

Example output of atopsar:

  # atopsar -m
  
  *snipped*  2.6.32-896.16.1.lve1.4.51.el6.x86_64  #1 SMP Wed Jan 17 13:19:23 EST 2018  x86_64  2018/03/27
  
  -------------------------- analysis date: 2018/03/27 --------------------------
  
  00:00:02  memtotal memfree buffers cached dirty slabmem  swptotal swpfree _mem_
  00:10:02    14027M    928M    839M  6335M    1M   4645M        0M      0M
  00:20:02    14027M    963M    842M  6393M    1M   4641M        0M      0M
  00:30:02    14027M    756M    873M  6617M    1M   4638M        0M      0M
  00:40:02    14027M    576M    871M  6596M    3M   4634M        0M      0M

https://www.atoptool.nl/index.php

teddyh · on March 27, 2018

I sometimes hear about “atop”, wonder “Why don’t I have this installed?”, install it, discover that it starts (and requires) two additional daemon processes, at which point I remember, and promptly uninstall it again.

linedash · on March 28, 2018

Yes; the bit that manages the process accounting and the other bit for writing the log files...

Personally I consider two processes and 40MB of ram to be negligible for the benefits it brings.

You can indeed use it as a standalone top without either of these processes too. You're just giving up one of the main benefits (replayable logs) outside of the extra stats.

In short; you're moaning about what exactly?

avip · on March 27, 2018

If you have a tiny fleet of instances, datadog free plan would solve that for you (not affiliated, it's a good product).

Symbiote · on March 27, 2018

> up 23 days

> Right there, I can see that okay, the box hasn't been rebooted recently

Is the implication of "recently" that it should have been rebooted every couple of weeks or something?

My desktop is at 54 days (since I moved desk, I think) and picking a random Hadoop node:

> 10:26:47 up 303 days, 11:56, 1 user, load average: 44.12, 50.56, 47.48

It's private, so kernel updates aren't a security issue.

(Keeping this on-topic, "history" tells me the most frequent thing I do on this node is "sudo iftop"; we've been doubting the accuracy of our monitoring system's network utilization graphs.)

RobAley · on March 27, 2018

I think her thinking was "the problem started recently (~days), did the box get rebooted recently (~days) which might indicate when the problem started", rather than it should have been rebooted recently.

subway · on March 27, 2018

w; df; dmesg; top

usually there's a loud sob somewhere in between.

agumonkey · on March 27, 2018

reading w df in my head made me laugh

rurban · on March 27, 2018

> If you want to impress me, set up a system at your company that will reimage a box within 48 hours of someone logging in as root and/or doing something privileged with sudo (or its local equivalent). If you can do that and make it stick, it will keep randos from leaving experiments on boxes which persist for months (or years...) and make things unnecessarily interesting for others down the road.

Ferrari does this. They do a lot of experimentation on their most expensive internal test equipment, and every now and then the whole box is re-imaged automatically, even if it's completely locked down from outside. It's their internal staff who is corrupting/improving the system. Only if it's a really good and well-tested improvement they will make it stick.

softawre · on March 27, 2018

Anybody running Chaos Monkey (from Netflix) does this, at least for their stateless services.

olefoo · on March 27, 2018

I do that too. I probably wouldn't if I hadn't learned on a shell server where that was how you found out who else was up.

SubiculumCode · on March 27, 2018

I did not know `w` was a command. I've used the `last` command: https://www.cyberciti.biz/faq/linux-unix-last-command-exampl...

chris_wot · on March 27, 2018

I never even knew about that command...

Freak_NL · on March 27, 2018

Me neither. I just asked two colleagues (one on MacOS, one on Ubuntu) and got the same “huh, I never knew?” response I had.

Sure there are tons of unknown commands on any OS, but a one-letter command you've never heard of somehow amplifies the amazement.

jlgaddis · on March 27, 2018

In a (normal/typical) shell, hit "<Tab><Tab>y" and be amazed.