Category Archives: Featured

Spark Dataframes and MLlib

A couple of weeks ago, I got my first experience with Apache Spark. While I have yet to apply it to any meaningful problems, in my experience, just getting one’s feet wet with a new tool or technology can be crucial to getting a learning snowball rolling. Although Spark is primarily used for “big data” problems on clusters, I have been experimenting with a very “small data” problem: a simple linear regression on California home prices. You can find the data set here. I’ve decided to put the resulting tutorial up on Tech Powered Math. Although there is nothing earth-shattering in this post, I think some people will find it helpful for the following reasons.

  1. I worked with the data using dataframes. Dataframes are a relatively new paradigm in Spark; they have only been available since Spark 1.3, released in March 2015.
  2. I am using the Python API. While I suspect that PySpark is going to grow rapidly in popularity, there seem to be more resources for Scala at this time.
  3. I could find very few tutorials or even significant Q&A threads about using PySpark syntax and dataframes on Stack Overflow. That gives me cause to believe that even this simple tutorial about reading a CSV into Spark, doing some trivial data wrangling with dataframes, and performing a linear regression could be helpful to some individuals.

The code in the Jupyter Spark notebook below was completed with Spark 1.3.1.

TI-84 Plus CE Week

It’s TI-84 Plus CE week here at Tech Powered Math. I’ve had a couple of weeks to interact with the new TI-84 Plus CE. I also recently had a nice informational chat with Texas Instruments’ reps about some TI-84 Plus CE news and how the TI-84 product line is evolving. All that added up to more than I… Continue Reading

Texas Instruments Launches STEM Behind Health

It recently came to my attention that Texas Instruments launched a new initiative called STEM Behind Health. STEM Behind Health is designed to get students excited about health related careers. Activities from the initiative include a TI-Nspire document file as well as student worksheets and teacher notes. They were developed in conjunction with health care… Continue Reading

Favorite Podcasts for Data Scientists

One of my favorite learning methods is via podcasts. They allow me to multitask–exercising, driving, or doing chores–while listening to experts on a particular topic. Some of the podcasts I listen to are purely for entertainment (think Serial or StartUp) but many others are for educational purposes. As I’ve been trying to build up my… Continue Reading

My MOOC Study Strategies

If you’ve looked into MOOCs (Massive Open Online Courses) at all, you have probably wondered how successful students are at completing them compared to traditional courses. The short answer? Not very. I’ve seen various numbers floating around in a variety of studies, citing completion rates as low as 4% and as high as 8%, but… Continue Reading

Johns Hopkins Data Science Specialization Review

It’s been a couple of weeks since Johns Hopkins issued final certificates for their Data Science Specialization on Coursera. I’m glad to say that I am now among the first crop of “alums” of the program. According to the last email we students received from our Johns Hopkins professors, about 2.3 million students have attempted… Continue Reading

TI-84 Plus CE Announcement Tomorrow

It’s been a couple of years since Texas Instruments announced a significant change to one of their major graphing calculator product lines, the most recent being the addition of color to the TI-84 line with the TI-84 Plus C. Tomorrow morning, TI will officially break their silence on a big update to that same extremely popular line… Continue Reading

Data Science Capstone Review

Overview of the Data Science Capstone Project and Approach

The Johns Hopkins Data Science Capstone project concluded around Christmas last month. It was an interesting experience, and very different from the other classes. The project, a partnership with smartphone app maker SwiftKey, required students to create a predictive text web app that worked much… Continue Reading