Python | BERT | Pandas | sklearn

GitHub | Google Colab

Our goal this summer was to analyze a data set in depth and draw meaningful conclusions, so we chose one that was very personal: our text messages. We used BERT, Google’s natural language processing model, as our main analysis tool. First, we used BERT’s word embeddings as input features for sklearn’s logistic regression, training a linear model to classify the author of a text message. Then, we used BERT’s word embeddings and t-SNE to plot our texts from three periods: before the shelter-in-place, after the shelter-in-place but before school ended, and once the summer began.
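To make the two steps concrete, here is a minimal sketch of the pipeline, not our exact notebook code. It assumes each message has already been turned into a fixed-length BERT embedding (768 dimensions for BERT-base). To keep the example runnable without downloading a model, the `embeddings` array is filled with synthetic placeholder vectors, and the names `embeddings`, `authors`, and `points_2d` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split

# PLACEHOLDER DATA (assumption): in the real project each row would be the
# BERT embedding of one text message. Here we draw synthetic vectors for two
# hypothetical authors so the sketch runs on its own.
rng = np.random.default_rng(0)
n_per_author, dim = 100, 768
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(n_per_author, dim)),
    rng.normal(loc=0.5, scale=1.0, size=(n_per_author, dim)),
])
authors = np.array([0] * n_per_author + [1] * n_per_author)

# Step 1: train a linear classifier that predicts the author from the embedding.
X_train, X_test, y_train, y_test = train_test_split(
    embeddings, authors, test_size=0.25, random_state=0, stratify=authors
)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # fraction of held-out messages classified correctly

# Step 2: project the embeddings to two dimensions with t-SNE for plotting.
points_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
```

Each row of `points_2d` can then be drawn as one point in a scatter plot, colored by author or by time period. Any accuracy measured on the placeholder vectors says nothing about real text messages; only the shape of the pipeline carries over.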