Back2School with Vectors, Cosine Similarity, and Word2Vec

Tomorrow, I’ll be making a return visit to the high school where I spent a decade in the mathematics department as a teacher. I’ve got the chance to speak to ten classes over the course of six class periods and tell them a little bit about what I do as a data scientist.

Since many of the students will be familiar with concepts like vectors and trigonometry, I’ve decided to do an activity involving the Python gensim package and Word2Vec. Specifically, each student was asked to submit a “Tweet” about the most interesting thing they’ve done in the last couple of weeks. I was given those Tweets last week and have prepared a short talk and code walkthrough showing how we can use Word2Vec to identify similar Tweets: we transform the unstructured text into vectors with word embeddings, then compare those vectors using cosine similarity.
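To make the idea concrete, here is a minimal sketch of the comparison step in plain Python. It is not the code from the repo. The tiny three-dimensional `TOY_VECTORS` table is invented for illustration (real vectors would come from a trained Word2Vec model and have far more dimensions), and averaging the word vectors of a Tweet is just one common way to get a single vector per Tweet, assumed here for simplicity. The helper names `tweet_vector` and `cosine_similarity` are hypothetical as well.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only.
# In practice these would be looked up from a trained Word2Vec model.
TOY_VECTORS = {
    "played": [0.9, 0.1, 0.0],
    "soccer": [0.8, 0.2, 0.1],
    "basketball": [0.7, 0.3, 0.1],
    "baked": [0.0, 0.9, 0.2],
    "cookies": [0.1, 0.8, 0.3],
}


def tweet_vector(text, vectors):
    """Average the vectors of the words in `text` that have an embedding.

    Returns None when no word in the text is in the vocabulary.
    """
    words = [w for w in text.lower().split() if w in vectors]
    if not words:
        return None
    dim = len(vectors[words[0]])
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: (u . v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero-length vectors
    return dot / (norm_u * norm_v)


sports_a = tweet_vector("I played soccer", TOY_VECTORS)
sports_b = tweet_vector("We played basketball", TOY_VECTORS)
baking = tweet_vector("I baked cookies", TOY_VECTORS)

sim_sports = cosine_similarity(sports_a, sports_b)
sim_mixed = cosine_similarity(sports_a, baking)

print(f"sports vs. sports: {sim_sports:.3f}")
print(f"sports vs. baking: {sim_mixed:.3f}")
```

With these made-up numbers, the two sports Tweets point in nearly the same direction and score close to 1, while the sports and baking Tweets score much lower. That is the trigonometry connection: the score is the cosine of the angle between the two vectors, so it depends on direction rather than length.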

I’ve decided to go ahead and share the code in a GitHub repo. If you’re interested in word embeddings, I hope you’ll find it helpful. I’m also posting the presentation I’m giving tomorrow below, but some formatting (indents, margins, etc.) was lost in the process of wrapping it in an iframe, so if you want to see it in the best possible form, check it out here.

 

Lucas Allen

For more than a decade, Lucas Allen was a high school math teacher and math team coach in Illinois. His 2012 Morton High School math team won the Illinois state championship. Recently, he made the jump from public education to the corporate world and is now working as a data scientist. He is interested in just about all forms of technology, including the TI-Nspire, Nexus devices, R, MOOCs, and more. You can follow him, and if you are nice, he will probably follow you back.
