
New course aims to train next generation of data scientists

Computer science professor Mitsunori Ogihara and graduate student Jerry Bonnell designed and taught a pilot class last fall in data science that aims to fill a widening gap of expertise in the field.
Jerry Bonnell, left, a Ph.D. candidate, and computer science professor Mitsunori Ogihara piloted the undergraduate class “Data Science for the World” last fall. Photo: Mike Montero/University of Miami

The volume of data collected today is growing astronomically as our computers, phones, and smart devices track our every move, purchase, and desire.

Yet, the number of people who can sift through this data to find useful information remains a meager percentage of the workforce.

The need for more data analysts and data scientists is simply outpacing the supply. So, colleges and universities need to help close the widening knowledge gap.

This is the crux of an upcoming article in the Institute of Electrical and Electronics Engineers Computer Journal, written by University of Miami computer science professors Yelena Yesha and Mitsunori Ogihara, and graduate student Jerry Bonnell. It was the impetus for the Master of Science in Data Science program, now offered through the Graduate School. And it is why Ogihara and Bonnell designed and piloted the undergraduate class “Data Science for the World” during the Fall 2021 semester for students interested in the field. It will also be offered next fall.

“Many disciplines today including science, medicine, social sciences, and even humanities disciplines use data for discoveries or exploration of ideas,” Ogihara said. “So, a student of any reasonable undergraduate program today should have some exposure to data science.”

Nick Tsinoremas, founding director of the Institute for Data Science and Computing, and the University’s vice provost for research computing and data, agreed.

“We want all of the students at the University to have more of a data science education and to be more data aware because this is our future,” he said. “To make decisions in general today, one needs to be data aware. So, this course is part of our effort as a University to expose our undergraduates to data science.”

It comes at a time when many colleges and universities are trying to educate students in the language of data. However, unlike other Data Science 101 classes, Ogihara and Bonnell tailored theirs so that students with little to no knowledge of statistics or computer programming could still benefit from it.

“We tried to make it accessible, so we don’t assume that students have a background in math, programming, or statistics,” said Ogihara.

The two even wrote an online textbook for the course, which opens with a list of real-world examples of data science in practice. These include the facts that monitoring patient data can help doctors more accurately diagnose diseases, and that tracking social media posts can help data scientists explain a shift in public opinion. The resource is now being edited for publication and is unique because the textbook uses “R,” the favored programming language of many statisticians.

Ogihara and Bonnell chose to use R because it is attuned to statistical analysis and, by incorporating an increasingly popular collection of tools in R called tidyverse, students can easily learn how to process, wrangle, transform, and model data on their own, too.

“From the beginning to the end of the course, students were touching real data with their assignments,” Bonnell said. “So, they could always see the big picture and knew they were doing something important.”

For example, the class’s 20 students investigated the 2015 accusation that the New England Patriots deflated footballs during the AFC championship game because it was easier for quarterback Tom Brady to throw them in the cold. They tested whether the average drop in ball pressure could be explained by chance alone and concluded it was plausible that something other than chance caused the observed drops. That was one of first-year student Eddie Hanlon’s favorite assignments.
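One common way to ask whether a difference in averages could be due to chance is a permutation test. The article does not say which method the class used, and the class worked in R, so the Python sketch below is only an illustration of the general idea. The pressure-drop numbers, the group names, and the `permutation_test` helper are all hypothetical and invented for this example.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """Return the observed gap in means and the share of random
    relabelings whose gap is at least as large (a two-sided p-value)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Shuffle the pooled values, then split them into two groups
        # of the original sizes, as if team labels were assigned at random.
        rng.shuffle(pooled)
        gap = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(gap) >= abs(observed):
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_shuffles


# Hypothetical pressure drops in psi, made up for illustration only.
team_a_drops = [1.2, 1.4, 1.1, 1.5, 1.3, 1.6, 1.2, 1.4]
team_b_drops = [0.4, 0.6, 0.5, 0.3]

observed, p_value = permutation_test(team_a_drops, team_b_drops)
print(f"observed gap in mean drop: {observed:.4f} psi")
print(f"share of shuffles at least as extreme: {p_value:.4f}")
```

A small share means random relabeling rarely produces a gap as large as the observed one, which is the sense in which a cause other than chance becomes plausible.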

Hanlon said he has always been interested in numbers, but the course helped him learn some computer programming that can further his analysis. It also taught him some new statistical strategies.

“I’ve never done any programming and had zero experience with the software R,” said Hanlon, a finance major and computer science minor. “But by the end of the semester, I felt pretty proficient in R.”

Hanlon was so empowered by the course that he spent part of his winter break learning Python, another programming language widely used by data scientists.

“I wasn’t initially interested in data science. I just didn’t know enough about the field,” he said. “But I am now. Data science is extremely applicable in so many different fields, so I definitely see it as a career possibility.”

Caroline Hall, a senior and math major, took the class to improve her skills in R for future job opportunities. She had already learned the programming language Java. But the course helped Hall feel so comfortable with R that she has since learned two other tools, SQL and Tableau, which help transform and visualize datasets.

“I feel confident now in being able to transform datasets, which means to organize the data and extract the most useful information from it,” said Hall, who also has minors in computer science and psychology.

The class also piqued her interest in a career in data science.

“I want to start off as a data analyst. But I know they work with data scientists, so I may want to transition into that,” she said.