Data Mining
Streaming data mining
- DataSketches - a streaming data analysis library originally developed at Yahoo (now an Apache project); supports approximate counting, sampling, and distribution metrics on streaming data.
- Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches - a suggested book for understanding the topic.
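
To give a flavor of the approximate counting mentioned above, here is a minimal Count-Min sketch in plain Python. It is only an illustration of the general technique, not the DataSketches API; the class name, the `width`/`depth` defaults, and the SHA-256-based hashing are all assumptions chosen for readability rather than speed.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter; estimates never undercount."""

    def __init__(self, width=1000, depth=5):
        # width and depth are illustrative defaults, not tuned values.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive one hash per row by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate counters, so the minimum over rows
        # is the tightest available upper bound on the true count.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


stream = ["a", "b", "a", "c", "a", "b"]
cms = CountMinSketch()
for x in stream:
    cms.add(x)
```

Memory use is fixed at `width * depth` counters regardless of how many distinct items appear in the stream, which is the trade-off that makes sketches attractive for streaming data.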
Frequent pattern mining
- Jiawei Han's Coursera course
- SPMF (Sequential Pattern Mining Framework) - an open-source data mining library in Java
- PrefixSpan python - a very simplified implementation of the PrefixSpan algorithm
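
For orientation, the core idea of PrefixSpan can be sketched in a few lines. This is a hedged illustration, not the code from the project listed above: it assumes every sequence is a flat list of single items (no itemsets), and the function name `prefixspan` and the `min_support` parameter are chosen here for the example.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return (pattern, support) pairs for sequences of single items."""
    results = []

    def mine(prefix, projected):
        # projected holds one (sequence index, start position) pair per
        # sequence that contains the current prefix.
        occurrences = defaultdict(list)
        for seq_id, start in projected:
            seen = set()
            for pos in range(start, len(sequences[seq_id])):
                item = sequences[seq_id][pos]
                if item not in seen:
                    # Keep only the first occurrence after the prefix,
                    # so each sequence is counted once per item.
                    seen.add(item)
                    occurrences[item].append((seq_id, pos + 1))
        for item in sorted(occurrences):
            new_projected = occurrences[item]
            if len(new_projected) >= min_support:
                pattern = prefix + [item]
                results.append((pattern, len(new_projected)))
                # Recurse into the database projected on the longer prefix.
                mine(pattern, new_projected)

    mine([], [(i, 0) for i in range(len(sequences))])
    return results


db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
patterns = prefixspan(db, min_support=2)
```

With this toy database and `min_support=2`, the frequent patterns are `a`, `a c`, `b`, `b c`, and `c`. Full implementations such as SPMF also handle itemsets within sequence elements and many algorithm variants.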
References and courses
Record Linking
- Python Record Linkage Toolkit, a pandas-based record linkage library - https://github.com/J535D165/recordlinkage
Evaluation
- A Comparison of Extrinsic Clustering Evaluation Metrics Based on Formal Constraints - discusses the B-cubed F1 score.
- Evaluation of coreference resolution systems - http://www.cs.cmu.edu/~yimengz/papers/Coreference_survey.pdf
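
As a quick illustration of the B-cubed F1 score mentioned above, the following sketch computes B-cubed precision, recall, and F1 for a hard (non-overlapping) clustering. The function name `bcubed` and the dict-based input format are assumptions made for this example; the overlapping-cluster extension is not covered.

```python
from collections import Counter


def bcubed(predicted, gold):
    """Both arguments map item -> cluster label and share the same keys."""
    items = list(gold)
    pred_sizes = Counter(predicted[i] for i in items)
    gold_sizes = Counter(gold[i] for i in items)
    # Number of items sharing both the predicted and the gold label.
    overlap = Counter((predicted[i], gold[i]) for i in items)

    # Per-item precision: fraction of the item's predicted cluster that
    # belongs to the item's gold cluster. Averaged over all items.
    precision = sum(
        overlap[(predicted[i], gold[i])] / pred_sizes[predicted[i]]
        for i in items
    ) / len(items)
    # Per-item recall: fraction of the item's gold cluster that landed in
    # the item's predicted cluster. Averaged over all items.
    recall = sum(
        overlap[(predicted[i], gold[i])] / gold_sizes[gold[i]]
        for i in items
    ) / len(items)

    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = {"a": 1, "b": 1, "c": 1, "d": 2}
predicted = {"a": "x", "b": "x", "c": "y", "d": "y"}
p, r, f1 = bcubed(predicted, gold)
```

In this toy case, precision is 3/4, recall is 2/3, and F1 is 12/17 (about 0.706). Because scores are averaged per item rather than per cluster, large clusters do not dominate the metric.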