2017-06-26 data

Learn just enough Linux to get things done

Different operating systems have long catered to different audiences: Windows for the business professional, Mac for the creative professional and Linux for the software developer. For OS providers, this sort of market segmentation greatly simplified product vision, technical requirements, user experience and marketing direction. However, it also reinforced workplace norms which bucket individuals into narrow, non-overlapping domains: business people can offer no insight into the creative process, and developers no insight into business problems.

In reality, knowledge and skill are fluid, spanning multiple disciplines and fields. The notion that “you can only be good at one thing” is not a roadmap to mastery but rather a prescription for premature optimization. You can only know what you’re good at once you’ve sampled a lot of things - and you may just find that you’re good at a lot of them.

For modern business analysts, bridging the gap between business and software development is especially important. Business analysts must be “dual platform,” able to leverage command-line tools available only on Linux (or OS X) yet still benefit from the power of Microsoft Office on Windows. Understandably, the world of Linux is intimidating for those with a business degree. Fortunately, as with most things, you only need to learn 20% of the information to accomplish 80% of the work. Here is my 20%.

Why modern business analysts should know Linux #

Due to its open source roots, Linux benefited from the contributions of thousands of developers over time. They built programs and utilities not only to make their jobs easier, but also the jobs of programmers who followed them. As a result, open source development created a network effect: the more developers built utilities on the platform, the more other developers could leverage those utilities to write their programs right away.

What resulted was an expansive suite of programs and utilities (collectively, software) that were written in Linux, for Linux - much of which was never ported to Windows. One example of this is the popular version control system (VCS) called git. Developers could have written this software to work on Windows, but they didn’t. They wrote it to work on the command line for Linux because it was the ecosystem which already had all the tools they needed.

Concretely, development on Windows runs into two main problems:

Basic tasks, like file parsing, job scheduling, and text search are more involved than running a command-line utility
Programming languages (eg. Python, C++) and their associated code libraries will throw errors because they are expecting certain Linux parameters or file system locations

Together, this means more time spent rewriting basic tools already available in Linux and troubleshooting OS compatibility errors. This is not a surprise - the Windows ecosystem simply wasn’t designed with software development in mind.

With the case made for Linux development, let’s begin with the basics.

The fundamental unit of Linux: the “shell” #

The shell (also known as the terminal, console or command line) is a text-based user interface through which commands are sent to the machine. On Linux, the shell’s default language is called bash. Unlike Windows users who primarily point-and-click inside of windows, Linux developers stick to their keyboard and type commands into the shell. While this transition is at first unnatural for those without a programming background, the benefits of developing in Linux easily outweigh the initial learning investment.

Learning the few important concepts #

Compared to a full-fledged programming language, bash only has a few major concepts that need to be learned. Once these are covered, the rest of bash is just memorization. I’ll restate for clarity: being good at bash is simply memorizing about 20-30 commands and their most common arguments.

Linux seems impenetrable to non-developers because of the way that developers seem to effortlessly regurgitate esoteric terminal commands at will. The reality is that they committed only a few dozen commands to memory - for anything more complicated, they too (like all mere mortals) consult Google.

With that out of the way, here are the main concepts in bash.

Command syntax #

Commands are case-sensitive and follow the syntax of: {command} {arguments...}

For example, in ‘grep -inr’, grep is the command (to search for a string of text) and -inr are flags/arguments which change what grep does by default. The only way to learn what these mean is to look them up through Google or by typing ‘man grep’. I recommend learning the commands and their most common arguments together; it’s too burdensome otherwise to remember what each and every flag does.
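
To make the command/argument split concrete, here is a minimal sketch that runs grep with those three flags. The file names and contents are made up for illustration, and everything happens in a throwaway directory.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Two small sample files (hypothetical names and contents).
printf 'Revenue up\nCosts flat\n' > q1.txt
printf 'costs down\nrevenue flat\n' > q2.txt

# Command: grep
# Flags:   -i ignore case, -n show line numbers, -r search directories recursively
# Then the string to search for and the directory to search in.
matches=$(grep -inr 'revenue' . | sort)
echo "$matches"
```

Each line of output has the form file:line number:matching text, so one run shows both where the string occurs and what surrounds it.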

Directory aliases #

The present directory (ie. where am I?): .
The parent directory of the present directory: ..
The user’s home directory: ~
The file system root (or the parent of all parents): /

For example, to change from the current directory to the parent directory, one would type: cd ..

Similarly, to copy a file located at “/path/to/file.txt” into the present directory, one would enter cp /path/to/file.txt . (note the period at the end of the command). Since these are no more than aliases, the actual path name could be used in their place instead.
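
The aliases are easiest to absorb by using them. This is a small sketch in a temporary directory; the folder names (reports, 2017) are hypothetical.

```shell
# Build a tiny directory tree to move around in.
workdir=$(mktemp -d)
mkdir -p "$workdir/reports/2017"
echo 'hello' > "$workdir/reports/file.txt"

cd "$workdir/reports/2017"

# "." is the present directory: copy the file from the parent into here.
cp ../file.txt .

# ".." is the parent directory: move up one level.
cd ..
here=$(basename "$(pwd)")
echo "$here"

# "~" expands to your home directory; "/" is the file system root.
echo ~
ls / > /dev/null
```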

STDIN / STDOUT #

Standard input (STDIN) is the stream a program reads its input from. By default, that is anything you type into the terminal and submit (via ENTER).

Anything that a program prints back out to the terminal (eg. text from within a file) is called standard output (STDOUT).

Piping #

**|**

A pipe takes the STDOUT of the command to the left of the pipe and makes it the STDIN of the command to the right of the pipe.
example: echo 'test text' | wc -l

**>**

A greater-than sign takes the STDOUT of the command on the left and writes it to the file on the right, creating the file if it does not exist and overwriting it if it does.
example: ls > tmp.txt

**>>**

Two greater-than signs take the STDOUT of the command on the left and append it to a new or existing file on the right.
example: date >> tmp.txt
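
The three operators can be tried side by side. This sketch uses a temporary directory and a scratch file called tmp.txt, so it is safe to run as-is.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# |  : the STDOUT of echo becomes the STDIN of wc, which counts lines.
line_count=$(echo 'test text' | wc -l)
echo "$line_count"

# >  : write STDOUT to a file, replacing whatever was there before.
echo 'first' > tmp.txt
echo 'second' > tmp.txt     # tmp.txt now holds only "second"

# >> : append STDOUT to the end of the file instead of replacing it.
echo 'third' >> tmp.txt
cat tmp.txt
```

The difference between > and >> is the one to watch: a single > silently discards the previous contents of the file.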

Wildcards #

You can think of this like SQL’s % symbol - for example, you might write “WHERE first_name LIKE ‘John%’” to catch any first name starting with John.

In bash, you would write “John*”. If you want to list all of the files ending with “.json” in a folder, you would write: “ls *.json”
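
Here is a quick sketch of both patterns, run against a handful of empty files whose names are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Create some empty files to match against.
touch a.json b.json notes.txt John.csv Johnny.csv

# "*.json" matches every name ending in .json.
json_files=$(ls *.json)
echo "$json_files"

# "John*" matches every name starting with John.
john_files=$(ls John*)
echo "$john_files"
```

The shell expands the wildcard before ls ever runs, so ls simply receives the list of matching file names.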

Tab completion #

Bash will often finish off commands intelligently for you if you start typing a command and hit your TAB key.

That being said, it is hard to remember commands and all their parameters, so consider a shell like zsh or fish for autocomplete: these tools can suggest completions based on your command history!

Quitting #

Sometimes you’ll get stuck in some program and can’t get out. This happens often to beginners in Linux and it is extremely demotivating. Often, quitting has something to do with q. It’s good to memorize the following and try them all when you’re trapped.

Bash: CTRL+c, q, exit
Python: quit(), CTRL+d
Nano: CTRL+x
Vim: ESC, then :q!

My memorized list of bash commands #

Here are the commands I use most frequently in Linux (sorted from most to least frequently used). As I mentioned before, knowing just a handful of commands will accomplish the vast majority of programmable tasks you need to perform.

cd {directory}
change directory
ls -lha
list directory (verbose)
vim or nano
command line editor
touch {file}
create a new empty file
cp -R {original_name} {new_name}
copy a file or directory (and all of its contents)
mv {original_name} {new_name}
move or rename a file
rm {file}
delete a file
rm -rf {file/folder}
permanently delete a file or folder [use with caution!]
pwd
print the present working directory
cat or less or tail or head -n10 {file}
print the contents of a file to STDOUT (less pages through it; tail and head show only the last or first lines)
mkdir {directory}
make an empty directory
grep -inr {string}
find a string in any files in this directory or child directories
column -s, -t <delimited_file>
display a comma-delimited file in columnar format
ssh {username}@{hostname}
connect to a remote machine
tree -LhaC 3
show directory structure 3 levels down (with file sizes and including hidden directories)
htop (or top)
task manager
pip install --user {pip_package}
Python package manager; --user installs the package under ~/.local (command-line scripts go to ~/.local/bin)
pushd . ; popd ; dirs; cd -
push/pop/view directories onto the stack + change back to last directory
sed -i "s/{find}/{replace}/g" {file}
replace a string in a file
find . -type f -name '*.txt' -exec sed -i "s/{find}/{replace}/g" {} \;
replace a string for each file in this and child folders with a name like *.txt
tmux new -s session, tmux attach -t session
create another terminal session without creating a new window [advanced]
wget {link}
download a webpage or web resource
curl -X POST -d "{key: value}" http://www.google.com
send an HTTP request to a web server
find <directory>
list all directory contents and their children, recursively
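
Several of these commands tend to show up together in one sitting. The following is a rough sketch of such a session, not a recipe: the directory and file names are hypothetical, and sed -i is written the way GNU sed on Linux expects it (BSD/macOS sed needs an extra backup-suffix argument after -i).

```shell
workdir=$(mktemp -d)
cd "$workdir"

mkdir project                              # make an empty directory
touch project/notes.txt                    # create a new empty file
echo 'status: draft' > project/notes.txt
echo 'status: draft' > project/readme.md

cp -R project backup                       # copy the directory and its contents
mv project/readme.md project/README.md     # rename a file

# Replace "draft" with "final" in every .txt file in this and child folders.
find . -type f -name '*.txt' -exec sed -i "s/draft/final/g" {} \;

# Confirm the change by searching recursively for the new string.
final_hits=$(grep -nr 'final' . | sort)
echo "$final_hits"

rm -rf backup                              # permanently delete the copy
find .                                     # list what is left, recursively
```

Note that the find/sed step only touched the .txt files; README.md still says "draft" because it does not match the name pattern.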

Advanced and infrequently used commands #

I find it’s good to keep a list of commands that are useful in certain situations (eg. which process is blocking a certain network port), even though those situations don’t happen very often. These are some uncommon commands I keep nearby:

lsof -i :8080
list open files that are using network port 8080 (the -i flag selects Internet sockets)
netstat | head -n20
list currently open Internet/UNIX sockets and related information
dstat -a
stream current disk, network, CPU activity & more
nslookup <IP address>
find hostname for a remote IP address
strace -f -e <syscall> <cmd>
trace system calls of a program (-e flag to filter for certain system calls)
ps aux | head -n20
print currently active processes
file <file>
check what a file type is (eg. executable, binary, ASCII-text file)
uname -a
kernel information
lsb_release -a
OS information
hostname -f
check the fully qualified hostname of your machine (ie. the name so other computers can reach you)
pstree
visualize process forks
time <cmd>
execute a command and report statistics about how long it took
CTRL + z ; bg; jobs; fg
send a process in current tty into background and back to foreground
cat file.txt | xargs -n1 | sort | uniq -c
count how many times each distinct word appears in a file
wc -l <file>
line count in a file
du -ha
show size on disk for directories and their contents
zcat <file.gz>
display contents of a gzip-compressed text file
scp <user@remote_host>:<remote_path> <local_path>
copy a file from a remote machine to the local one (swap the two arguments to copy the other way)
man {command}
show manual (ie. documentation) for a command, but you’re probably better off using Google
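
The word-count pipeline in this list is a good example of how small commands chain together. Below is a minimal sketch on a made-up two-line file, so the stages are easy to follow.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A tiny sample file (contents invented for the example).
printf 'apple banana apple\ncherry banana apple\n' > file.txt

# xargs -n1 puts each word on its own line, sort groups identical words
# together, and uniq -c prefixes each distinct word with its count.
counts=$(cat file.txt | xargs -n1 | sort | uniq -c)
echo "$counts"

# Line count of the same file, for comparison.
wc -l file.txt
```

Running the stages one at a time (first cat file.txt | xargs -n1, then adding | sort, and so on) is a handy way to see what each step contributes.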