Structured Query Language (SQL - pronounced either “es cue el” or “sequel”) is a programming language used to manage data that is stored in a database. It is not a program itself but it is used in many programs, especially relational database management systems (RDBMSs).
While the syntax and semantics of the SQL language is defined in an official standard (maintained by ISO/IEC) it is common for RDBMSs to have their own dialects. These dialects are extensions to the standard, meaning that the core elements of the language will always be there but with some additions and quirks that depend on the exact RDBMS you are using. This page will use commands tested in SQLite, one of the most accessible and easy-to-use of the RDBMSs that uses SQL.
SQL is most useful when:
SQL is less useful when:
In these cases, working with the data directly in Microsoft Excel, LibreOffice Calc or Google Sheets might be a better option if the project is small. For in-between sized projects, importing into Python and using pandas for analysis is usually best.
SQLite is a database engine (it allows users to change the data in a database) that uses the SQL language (or rather its particular dialect of the language). It is not itself a program but rather a library – software that is embedded into a fully-fledged program. The database being interacted with gets stored in a single file on your local machine and SQLite can be run, offline, from your terminal. This makes it very simple for small projects and for beginners.
Install SQLite from the terminal with:
$ sudo apt install sqlite3
You can then open it with the following (but beware that you will be working in ‘memory’ as opposed to having what you do be saved to a file):
$ sqlite3
Usually you will want to have your work be saved to a file. To launch SQLite and create a new database file (or open an existing database file if you specify the name of one that already exists) the command is as follows:
$ sqlite3 database_name.db
Here is a cheat sheet that covers most of the common functionality of SQL:
CREATE DATABASE database_name;
SHOW DATABASES;
DROP DATABASE database_name;
USE database_name;
.save database_name.db
CREATE TABLE table_name(column_1 data_type, column_2 data_type, column_3 data_type);
INTEGER
: a whole numberREAL
: a decimal numberTEXT
: a string of text characters. Quotation marks are
needed.DATE
: a date (in YYYY-MM-DD format). Quotation marks
are needed. Possible data types that can be used in PostgreSQL:integer
: a whole numberdecimal
: a decimal numbermoney
: a decimal number with 2 decimal placesboolean
: a logical value of TRUE or FALSEchar(n)
: a fixed-length string removes trailing
blanksvarchar(n)
: a variable-length stringtext
: an unlimited-length stringCREATE TABLE table_name(column_1 data_type NOT NULL);
CREATE TABLE table_name(column_1 data_type UNIQUE);
CREATE TABLE table_name(column_1 data_type DEFAULT 'default_value');
SHOW TABLES;
.tables
DROP TABLE table_name;
Create a table with keys:
CREATE TABLE table_name(column_1 data_type PRIMARY KEY);
CREATE TABLE table_name(column_1 data_type, column_2 data_type PRIMARY KEY (column_1, column_1));
CREATE TABLE table_name (othertable_id data_type, FOREIGN KEY (othertable_id) REFERENCES othertable(id));
CREATE TABLE table_name (othertable_id data_type, FOREIGN KEY (othertable_id) REFERENCES othertable(id) UNIQUE);
INSERT INTO table_name(column_name) VALUES(value);
INSERT INTO table_name(column_1, column_2, column_3) VALUES(value_1, value_2, value_3);
UPDATE table_name SET column_1 = new_value WHERE column_2 = value;
DELETE FROM table_name WHERE column_name = value;
ALTER TABLE table_name ADD COLUMN new_column data_type;
ALTER TABLE table_name RENAME COLUMN old_name TO new_name;
.mode mode_type
.mode column
.mode line
.mode html
.mode insert
.mode tabs
.mode csv
.mode tcl
.headers on
SELECT * FROM table_name;
SELECT column_name FROM table_name;
SELECT column_1, column_2, column_3 FROM table_name;
SELECT * FROM table_name LIMIT 10;
SELECT column_name AS 'alias' FROM table_name;
SELECT DISTINCT column_name FROM table_name;
SELECT * FROM table_name WHERE column_name IS NOT NULL;
SELECT * FROM table_name WHERE column_name > value;
=
: equal to!=
: not equal to>
: greater than<
: less than>=
: greater than or equal to<=
: less than or equal toSELECT * FROM table_name WHERE column_name BETWEEN value_1 AND value_2;
SELECT * FROM table_name WHERE column_1 > value AND column_2 < value;
SELECT * FROM table_name WHERE column_1 BETWEEN value_1 AND value_2 AND column_2 > value_3;
SELECT * FROM table_name WHERE column_1 BETWEEN value_1 AND value_2 OR column_2 > value_3;
AND
OR
SELECT * FROM table_name WHERE column_name LIKE 'te_t';
SELECT * FROM table_name WHERE column_name LIKE '%substring%';
SELECT * FROM table_name ORDER BY column_name;
SELECT * FROM table_name ORDER BY column_name DESC;
SELECT * FROM table_name WHERE column_1 > value ORDER BY column_2 DESC;
SELECT column_1, ROUND(column_2, num_decimal_places) FROM table_name;
SELECT COUNT(*) FROM table_name;
SELECT COUNT(*) FROM table_name WHERE column_name = value;
SELECT SUM(column_name) FROM table_name;
SELECT SUM(column_1) FROM table_name WHERE column_2 = value;
SELECT MAX(column_name) FROM table_name;
SELECT MAX(column_1) FROM table_name WHERE column_2 = value;
SELECT MIN(column_name) FROM table_name;
SELECT MIN(column_1) FROM table_name WHERE column_2 = value;
SELECT AVG(column_name) FROM table_name;
SELECT AVG(column_1) FROM table_name WHERE column_2 = value;
SELECT * FROM table_name GROUP BY column_name;
SELECT column_1, AVG(column_2) FROM table_name GROUP BY column_1 ORDER BY column_1;
SELECT column_1, AVG(column_2) FROM table_name GROUP BY column_1 ORDER BY column_1;
1
is the
first column selected, 2
is the second, etc):SELECT column_1, AVG(column_2) FROM table_name GROUP BY 1 ORDER BY 1;
SELECT column_1, AVG(column_2) FROM table_name WHERE column_3 > value GROUP BY column_1 ORDER BY column_1;
SELECT column_1, COUNT(column_2) FROM table_name GROUP BY column_1 HAVING COUNT(column_2) > value;
SELECT * FROM table_1 JOIN table_2 ON table_1.column_1 = table_2.column_1;
SELECT table_1.column_2, table_2.column_3 FROM table_1 JOIN table_2 ON table_1.column_1 = table_2.column_1;
SELECT * FROM table_1 JOIN table_2 ON table_1.column_1 = table_2.column_1 WHERE column_2 = value;
SELECT * FROM table_1 LEFT JOIN table_2 ON table_1.column_1 = table_2.column_1;
ON
statement is
required):
SELECT * FROM table_1 CROSS JOIN table_2;
SELECT table_1.column_1, table_2.column_2 FROM table_1 CROSS JOIN table_2;
SELECT * FROM table_1 UNION SELECT * FROM table_2;
SELECT column_1
CASE
WHEN column_2 = 5 THEN 'Great'
WHEN column_2 >= 4 THEN 'Good'
WHEN column_2 >= 3 THEN 'Average'
WHEN column_2 >= 2 THEN 'Ok'
ELSE 'Poor'
END AS 'new_column_name'
FROM table_name;
WITH previous_results AS (
SELECT column_1 AS 'alias' FROM table_1 UNION SELECT * FROM table_2
)
SELECT * FROM previous_results JOIN table_3 ON table_3.column_2 = previous_query.alias;
.open dabase.db
.schema
.schema table_name
.show
.nullvalue 'NULL'
.width 15 20
.dump
.exit
Export to text file (can then open with a text editor):
.output path.sql
.dump --or `select * from table_name;`
.output stdout
Export table to CSV:
.mode csv
.headers on
.output path.csv
select * from table_name;
.output stdout
Export to HTML:
.mode html
.output path.html
select * from table_name;
.output stdout
Import table:
.read path.sql
Import from CSV:
.mode csv
.import path.csv