Ruben Gamboa
Professor
Actually, all of these are used in databases
Note: Sorted files and B-trees can handle comparisons with <, <=, >, >=, and =
But hash tables can only handle equality checks
And (rule of thumb) they are more efficient than the others
Instead of reading each tuple at a time, we can read many tuples at the same time, since they are on the same disk page
Disk I/Os is what matters, so we'll stop worrying about seconds
After all, disks get faster!
Since \(|S| \approx 14000\) or 140 pages, the Block Nested Loop join method takes up \(140+140^2 =\) 1.974 × 104 disk I/Os
That's a lot better, but it's still quadratic!
A good index will bring that up to 3.514 × 104 disk I/Os
But it's linear
You'd see an improvement in a larger school
E.g., with 100,000 students, it becomes
Here's a similar argument with selections
Suppose we need to compute \(\sigma(S)\), where \(S\) is the Students table at UW
Suppose each disk page can handle 100 students
Since \(|S| \approx 14000\) or 140 pages, without an index, we can execute the selection with \(140\) disk I/Os
SELECT title, year
FROM Movies
WHERE year = 2000
SELECT title, year, name
FROM Movies JOIN MovieExecs ON producerC# = cert#
WHERE year = 2000
CREATE INDEX
CREATE INDEX MovieYearIdx ON Movies(year)
CREATE INDEX StudentNameIdx ON Students(first_name, last_name)
DROP INDEX MovieYearIdx
SELECT title, year
FROM Movies
WHERE year = 2000
SELECT title, year
FROM Movies
WHERE year = 2000
SELECT movieTitle, movieYear
FROM StarsIn
WHERE starName = ?
SELECT starName
FROM StarsIn
WHERE movieTitle = ? AND movieYear = ?
INSERT INTO StarsIn VALUES(?,?,?)
Query | No Index | starName Index | movieTitle Index | Both Indexes |
---|---|---|---|---|
Q1 | 10 | 4.5 | 10 | 4.5 |
Q2 | 10 | 10 | 4.5 | 4.5 |
Q3 | 2 | 4.5 | 4.5 | 7 |
Total | 9.2 | 6.15 | 7.8 | 4.75 |
SELECT movieTitle, movieYear
FROM StarsIn
WHERE starName = ?
SELECT starName
FROM StarsIn
WHERE movieTitle = ? AND movieYear = ?
INSERT INTO StarsIn VALUES(?,?,?)
Query | No Index | starName Index | movieTitle Index | Both Indexes |
---|---|---|---|---|
Q1 | 10 | 4.5 | 10 | 4.5 |
Q2 | 10 | 10 | 4.5 | 4.5 |
Q3 | 2 | 4.5 | 4.5 | 7 |
Total | 6 | 5.6 | 6.15 | 5.75 |
SELECT movieTitle, movieYear
FROM StarsIn
WHERE starName = ?
SELECT starName
FROM StarsIn
WHERE movieTitle = ? AND movieYear = ?
INSERT INTO StarsIn VALUES(?,?,?)
date < 5/1/19 AND voter_id=5 AND precinct_id=3
Here are two different query plans:
SELECT *
FROM MovieStars
WHERE name < 'C'
SELECT *
FROM MovieStars
WHERE name < 'C'
\[R(x,y) \bowtie S(y,z)\]
foreach tuple r in R do:
foreach tuple s in S where r.y=s.y do:
add <r,s> to result
\[R(x,y) \bowtie S(y,z)\]
foreach tuple r in R do:
foreach tuple s in S where r.y=s.y do:
add <r,s> to result
\[R(x,y) \bowtie S(y,z)\]
foreach tuple s in S do:
foreach tuple r in R where r.y=s.y do:
add <r,s> to result
\[R(x,y) \bowtie S(y,z)\]
Cost: Sort(R) + Sort(S) + #Pages(R) + #Pages(S)
Cost of sorting depends on how much data we can load into memory
External sorts require a number of passes
For these tables, 2 passes should be enough
Each pass reads and writes data, so the total cost is
Cost: \(2\cdot2\cdot1,000 + 2\cdot2\cdot500 + 1,000 + 500 = 7,500\) I/O operations
Join Type | Cost |
---|---|
Index Join | 221,000 I/O ops |
Sort-Merge Join | 7,500 I/O ops |
Query plan space:
Estimate cost of each operation in query plan
Must also estimate size of result for each operation in tree!
Quality of optimizer is empirical: Does it find good query plans for typical queries?
SELECT attributes
FROM relations
WHERE cond1 AND cond2 AND ... AND condk
Cardinality of result = Max #tuples * RF1 * RF2 * ... * RFk
Assumes conditions are independent
Condition | Reduction Factor |
---|---|
col = value | \(1 / NKeys(I)\), for some index \(I\) on col |
col1 = col2 | \(1 / \max(NKeys(I_1),NKeys(I_2))\), for indexes \(I_1\) and \(I_2\) on col1 and col2 |
col1 > value | \(\frac{High(I)-value}{High(I)-Low(I)}\), for some index \(I\) on col |
StarsIn (name, title, year)
MovieStar (name, address, genre, birthdate, rating)
Similar to old schema, but with rating added to MovieStar
StarsIn
MovieStar
SELECT MS.name
FROM StarsIn AS SI, MovieStars AS MS
WHERE SI.name = MS.name AND SI.title='Star Wars' AND MS.rating>=4
SELECT MS.name
FROM StarsIn AS SI, MovieStars AS MS
WHERE SI.name = MS.name AND SI.title='Star Wars' AND MS.rating>=4
SELECT MS.name
FROM StarsIn AS SI, MovieStars AS MS
WHERE SI.name = MS.name AND SI.title='Star Wars' AND MS.rating>=4
SELECT MS.name
FROM StarsIn AS SI, MovieStars AS MS
WHERE SI.name = MS.name AND SI.title='Star Wars' AND MS.rating>=4
SELECT MS.name
FROM StarsIn AS SI, MovieStars AS MS
WHERE SI.name = MS.name AND SI.title='Star Wars' AND MS.rating>=4
SELECT MS.name
FROM StarsIn AS SI, MovieStars AS MS
WHERE SI.name = MS.name AND SI.title='Star Wars' AND MS.rating>=4
Method | Cost |
---|---|
Nested loop | 500,500 |
Push selections | 3,104 |
Push selections, BNL join | 1,902 |
Push selections & projections, BNL join | ~1,700 |
Clustered index on StarsIn | 15 |
SELECT count(distinct courses.crn), sum(courses.credits)
FROM enrolled, students, courses
WHERE enrolled.wnumber = students.wnumber
AND enrolled.crn = courses.crn
AND courses.subject = 'COSC'
SELECT count(distinct courses.crn), sum(courses.credits)
FROM enrolled, students, courses
WHERE enrolled.wnumber = students.wnumber
AND enrolled.crn = courses.crn
AND courses.subject = 'COSC'
EXPLAIN
SELECT count(distinct courses.crn), sum(courses.credits)
FROM enrolled, students, courses
WHERE enrolled.wnumber = students.wnumber
AND enrolled.crn = courses.crn
AND courses.subject = 'COSC'
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | students | index | PRIMARY | PRIMARY | 12 | 10473 | Using index | |
1 | SIMPLE | enrolled | ref | PRIMARY,wnumber | wnumber | 12 | students.wnumber | 2 | Using index |
1 | SIMPLE | courses | eq_ref | PRIMARY | PRIMARY | 7 | enrolled.crn | 1 | Using where |
EXPLAIN
SELECT count(distinct courses.crn), sum(courses.credits)
FROM enrolled, students, courses
WHERE enrolled.wnumber = students.wnumber
AND enrolled.crn = courses.crn
AND courses.subject = 'COSC'
EXPLAIN
SELECT count(distinct courses.crn), sum(courses.credits)
FROM enrolled, students, courses
WHERE enrolled.wnumber = students.wnumber
AND enrolled.crn = courses.crn
AND courses.subject = 'COSC'
CREATE INDEX courses_subject_idx
USING HASH
ON courses (subject)
EXPLAIN
SELECT count(distinct courses.crn), sum(courses.credits)
FROM enrolled, students, courses
WHERE enrolled.wnumber = students.wnumber
AND enrolled.crn = courses.crn
AND courses.subject = 'COSC'
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | courses | ref | PRIMARY, courses_subject_idx |
courses_subject_idx | 7 | const | 58 | Using where |
1 | SIMPLE | enrolled | ref | PRIMARY,wnumber | PRIMARY | 7 | courses.crn | 9 | Using index |
1 | SIMPLE | students | eq_ref | PRIMARY | PRIMARY | 12 | enrolled.wnumber | 1 | Using index |
EXPLAIN
SELECT count(distinct courses.crn), sum(courses.credits)
FROM enrolled, students, courses
WHERE enrolled.wnumber = students.wnumber
AND enrolled.crn = courses.crn
AND courses.subject = 'COSC'
CREATE INDEX enrolled_crn_idx
USING HASH
ON enrolled (crn)
EXPLAIN
SELECT count(distinct courses.crn), sum(courses.credits)
FROM enrolled, students, courses
WHERE enrolled.wnumber = students.wnumber
AND enrolled.crn = courses.crn
AND courses.subject = 'COSC'
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | courses | ref | PRIMARY, courses_subject_idx |
courses_subject_idx | 7 | const | 58 | Using where |
1 | SIMPLE | enrolled | ref | PRIMARY,wnumber, enrolled_crn_idx |
enrolled_crn_idx | 7 | courses.crn | 6 | Using index |
1 | SIMPLE | students | eq_ref | PRIMARY | PRIMARY | 12 | enrolled.wnumber | 1 | Using index |
SELECT count(distinct courses.crn), sum(courses.credits)
FROM enrolled, students, courses
WHERE enrolled.wnumber = students.wnumber
AND enrolled.crn = courses.crn
AND courses.subject = 'COSC'
Index | Cost |
---|---|
S(w#), E(crn,wnumber) | 20,946 |
C(subj), E(crn,wnumber), S(w#) | 522 |
C(subj), E(crn), S(w#) | 348 |