Don’t cheat on that test

Cengiz Zopluoglu, an assistant professor in the University of Miami School of Education and Human Development Research, Measurement, and Evaluation Program. Photo: Evan F. Garcia/News@TheU

By Barbara Gutierrez

A University of Miami professor has created software to detect fraud in standardized tests.

Cengiz Zopluoglu, an assistant professor in the University of Miami School of Education and Human Development, has spent more than a decade trying to detect and eliminate cheating on standardized tests. Now, he’s helping other individuals and institutions do the same.

News@TheU asked Zopluoglu about his research:

News@TheU: Tell us a bit about the research you have been conducting with testing and why it is important?

Zopluoglu: My research is on statistical detection of fraudulent behavior in large-scale testing. Today, students take more tests, and the test scores carry higher stakes for students, teachers, and schools. The increasing reliance on testing in American education and for licensure and certification has been accompanied by an escalation in testing fraud at all levels. With high stakes attached, some people may have incentives to engage in fraudulent behavior in different ways. For instance, a student can copy answers from a nearby test taker. Teachers or proctors may give test takers clues about the correct response at a particular test location or even directly manipulate answer sheets to increase test scores (see the Atlanta cheating scandal).

Or, the items may be leaked before the actual test administration, and some individuals may have advance knowledge of test items before taking the test (see recent speculation about the 2018 SAT).

Fraud in testing is a serious threat to the validity of test scores, and the integrity of test scores has to be assured for a fair and just society. In other words, test scores must reflect the true proficiency level of candidates taking the tests and should not provide an unfair advantage to anybody. A fair result is the goal of all high-quality tests. Those who pass a medical licensure exam should do so because they indeed possess the knowledge and skill to perform a surgery, not because they had inappropriate prior access to the actual questions that would appear on the licensure examination. As a researcher with methodological training in educational measurement and statistics, I study quantitative methods to screen test response data for potential anomalies and unusual item response behavior. That way, we can ensure that every test score is a reliable and valid measure of the skills, proficiency, or knowledge of those who are taking the test.

News@TheU: What do you actually look at when you are assessing whether there was fraud? Response data?

Zopluoglu: It depends on the type of fraud you are looking for. If we are looking for test takers copying from each other, we look at the similarity between response vectors of two students and compute the likelihood of observing the number of matching and non-matching responses using some advanced measurement and statistical models.
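As a rough illustration of the answer-copying screen described above, the sketch below counts matching responses between two test takers and computes how likely that many matches would be by chance. It is a deliberate simplification and not the method used in CopyDetect or in any published index: it assumes a single constant per-item match probability (`p_match`, a made-up value), whereas the models Zopluoglu refers to are more advanced. The response strings and function names are hypothetical.

```python
from math import comb


def count_matches(responses_a, responses_b):
    """Count the items on which two test takers gave the same response."""
    if len(responses_a) != len(responses_b):
        raise ValueError("response vectors must have the same length")
    return sum(a == b for a, b in zip(responses_a, responses_b))


def binomial_tail(n_items, n_matches, p_match):
    """P(X >= n_matches) for X ~ Binomial(n_items, p_match)."""
    return sum(
        comb(n_items, i) * p_match**i * (1 - p_match) ** (n_items - i)
        for i in range(n_matches, n_items + 1)
    )


# Hypothetical 20-item multiple-choice responses for two students.
student_a = list("ABCDABCDABCDABCDABCD")
student_b = list("ABCDABCDABCDABCDABCA")

matches = count_matches(student_a, student_b)

# ASSUMPTION: a constant chance of 0.4 that two independent test takers
# match on any item. This value is illustrative only.
p_match = 0.4
p_value = binomial_tail(len(student_a), matches, p_match)

print(f"{matches} of {len(student_a)} responses match; "
      f"chance probability under the assumed model: {p_value:.2e}")
```

A very small probability would only flag the pair for closer review; it is not proof of copying on its own.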

If we are looking for the direct or indirect effect of teachers/proctors on student responses, we look at the erasure behavior of students. Today, using advanced test scoring machines, test companies are not just counting the number of correct and incorrect responses on an answer sheet, but are also able to identify how many times a student changed his/her responses and whether the student changed a response from incorrect to correct or from correct to incorrect.

Using this data, we can similarly use measurement and statistical models to identify whether or not a student’s erasure behavior is too extreme or not typical. If we are looking for advance knowledge of items, we also have statistical and measurement models that use response time data along with item response data to flag suspicious cases. We look at all the evidence people may leave behind that is available to us and then try to model that information to screen for suspicious item response behavior in the dataset.
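The idea of flagging erasure behavior that is "too extreme" can be sketched with a simple standardized score. This is only an illustration under stated assumptions: the student IDs, the wrong-to-right erasure counts, and the cutoff of 3 standard deviations are all invented, and a plain z-score stands in for the measurement and statistical models mentioned in the interview.

```python
from statistics import mean, pstdev


def flag_extreme_erasures(wrong_to_right, threshold=3.0):
    """Return IDs whose wrong-to-right erasure count lies more than
    `threshold` standard deviations above the group mean."""
    counts = list(wrong_to_right.values())
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return []  # everyone has the same count, so nothing stands out
    return [
        student_id
        for student_id, count in wrong_to_right.items()
        if (count - mu) / sigma > threshold
    ]


# Hypothetical wrong-to-right erasure counts for one classroom.
erasures = {
    "s01": 0, "s02": 1, "s03": 1, "s04": 2, "s05": 0, "s06": 1,
    "s07": 2, "s08": 1, "s09": 0, "s10": 1, "s11": 1, "s12": 15,
}

print(flag_extreme_erasures(erasures))
```

In this made-up classroom only the student with 15 wrong-to-right erasures exceeds the cutoff. As with response similarity, a flag like this is a reason to look more closely, not a finding of fraud.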

News@TheU: Who is interested in this type of work and why?

Zopluoglu: Mostly, companies that administer educational tests, or licensure and certification exams, are very interested in and in need of these methods. A few years ago, very few people were working on it, but it has become a hot topic in recent years. I believe most test companies now have “test security” departments.

These companies screen data after every test administration, look for unusual behavior using the quantitative methods available in the academic literature, and take appropriate action if they find something. In 2012, I remember New York and Missouri were pushing legislative efforts to require their state education departments to create test security units and regularly screen for potential fraud in standardized tests.

News@TheU: Is fraud in educational testing a big problem?

Zopluoglu: If we believe students, yes, it is a problem. My favorite response to this question is the self-reported numbers from the Ethics of American Youth survey, administered by the Josephson Institute of Ethics every other year. They ask more than 20,000 middle and high school students how many times they cheated on a test in the past year.

The last time they administered it, in 2012, 24 percent of the students reported they had cheated at least once, and about 28 percent of the students said they had cheated two or more times on tests. The numbers in previous years were very similar. They have not done it since 2012, but I have seen the advertisement on their website for 2018. I look forward to seeing what students report this year on that survey.

News@TheU: Tell us about the CopyDetect package; what is it and who can use it?  

Zopluoglu: It is a software package I developed for computing response similarity indices published in the academic literature. It is based on R, a specific programming language and environment for statistical computing. To my knowledge, it is currently the only open-source and non-commercial tool available to researchers and practitioners interested in this topic.

When I began reading this literature and working on it ten years ago, there was no available computational tool, and it was quite frustrating. The only way to research these topics was to write your own statistical programs, and I did. After some time, I just decided to make them publicly available and compiled all my code into an open-source R package, which can be downloaded and used by anyone anywhere in the world.

I have received emails from many people around the world who use it, and people from different testing companies in the U.S. approach me at conferences to tell me that they use it. I believe developing these computational tools is essential for researchers and practitioners, and I devote some of my time to improving them.

The CopyDetect package is available online: