Dataset underlying the paper "PreprintMatch: a tool for preprint publication detection applied to analyze global inequities in scientific publishing." The file preprint-paper-matches.csv lists all matches found by our algorithm between bioRxiv/medRxiv and PubMed, and preprint_affiliations.csv lists all affiliations extracted from bioRxiv/medRxiv. All preprint data come from the Rxivist data dump (https://zenodo.org/record/4738007), and the scripts to download PubMed data are available in our GitHub repository, https://github.com/PeterEckmann1/preprint-match. The full database dump, with all data used in the study, is available on Google Drive at https://drive.google.com/file/d/1ZoafhYUP-DO4Hd_4A_v7mbQLjN3JPzJv/view?usp=sharing. The PostgreSQL dat...