This data set contains anonymized data collected from Reddit (via the Pushshift API) and StackOverflow (from Kaggle's dataset). Each folder includes the data split by trimester. The schema of StackOverflow and Reddit-related files follows:

Fields from StackOverflow
  question_id
  answer_id
  creation_date - answer creation_date
  score - score of the question/answer
  tags - all tags flagged for a question
  answer_count - number of answers for a question
  start_question - question's time of creation
  last_activity_date - last update on the question
  new_id - hashed id of the answerer
  q_new_id - hashed id of the questioner

Fields from Reddit
  comment_id
  submission_id
  score - score of the question/submission
  subredd...