Schedule

Note that the schedule is subject to change. Please check the course website frequently for the latest version.

For your reference: How to read & review a paper? How to give a talk?

Week  Date   Topics                                               References                  Notes
1     09/02  Introduction                                         lecture
2     09/09  Partition-based Method for String Similarity Join    lecture
3     09/16  Partition-based Method for Set Similarity Join       lecture
4     09/23  Heap-based Method for Overlap Set Similarity Join    lecture
5     09/30  Heap-based Method for Approximate Entity Extraction  lecture
6     10/07  Prefix Filtering for Set Similarity Search           lecture
7     10/14  Prefix Filtering for String Similarity Search        lecture
8     10/21  Product Quantization for Nearest Neighbor Search
9     10/28  Proximity Graph for Nearest Neighbor Search          e.g., Pandas
10    11/04  LSH and R-Tree for Nearest Neighbor Search           e.g., Trifacta, OpenRefine
11    11/11  Near-duplicate Passage Detection                     e.g., Magellan, BigGorilla
12    11/18  Program Synthesis for Data Wrangling
13    11/25  Thanksgiving - No Class
14    12/02  Data Cleaning                                                                    Guest Lecture
15    12/09  Final Exam                                                                       Final Exam