Institutions of higher education vary widely in how they define, collect, and store their students’ data, making the collection of student-level data across institutions a challenging task.

Since September 2015, Ithaka S+R has served as the independent evaluator of the Monitoring Advising Analytics to Promote Success (MAAPS) study, an intensive, proactive, and technology-enhanced advisement intervention for first-time low-income and/or first-generation students across the 11 four-year public universities that make up the University Innovation Alliance.

We recently completed the first of three years of collecting student outcomes data, including GPA, credit accumulation, and progress-to-degree, as well as data on treatment group students’ interactions with program advisors, for more than 10,000 students randomly assigned to intervention and control groups. We’re also conducting an implementation study, documenting how each institution undertook the key components of the intervention.

Reflecting on the first year, we’ve identified a few strategies and practices that we have found to increase the likelihood of a successful multi-institutional data collection effort:

Designate a senior data lead and dedicated data analyst

  • At the outset of the project, we required that each institution’s project lead assign a senior data lead who could make decisions about priorities, had significant experience working with the institution’s student information system, and had the ear of senior administrators. In addition, we asked data leads to designate a half-time data analyst dedicated to the project (and paid for by the grant) who would be responsible for the day-to-day data collection and analyses that the project required. Not all projects will have the funds to secure so much time from a data analyst, but having dedicated institutional representatives should be a priority – these consistent relationships have improved and streamlined the data collection process in the MAAPS project.

Facilitate communication between all parties

  • We’ve used a listserv to share updates, send documents, and ask questions, and have encouraged institutional representatives to use it to direct questions to one another and share practices among themselves. In addition, all guideline documents are sent to the listserv and uploaded to a shared Google folder, enabling data leads and analysts to access all documents from a single location.
  • We have designated two clear points of contact at Ithaka S+R (Rayane Alamuddin and me); we respond quickly to both personal and listserv emails and always make ourselves available for phone calls.

Create a detailed codebook for each data file

  • We developed several codebooks listing the requested variables and their key details, including their format (e.g., binary), their definition, their scope (e.g., which students should be included), when and how they should be submitted, and other relevant notes. When asked what has been most helpful in creating the data files, almost every data lead has pointed to the codebook. (A minimal illustrative sketch of one codebook entry follows below.)
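To make the codebook idea concrete, here is a minimal sketch of what a single entry might look like, expressed as a Python dictionary. The variable name, values, and deadlines are hypothetical illustrations, not the actual MAAPS specifications.

```python
# Illustrative sketch of one codebook entry. The variable name, format,
# scope, and deadline below are hypothetical, not the MAAPS codebook itself.
codebook_entry = {
    "variable": "enrolled_fall",      # column name expected in the data file
    "format": "binary (0/1)",         # e.g., binary, numeric, date (YYYY-MM-DD)
    "definition": (
        "Student was enrolled in at least one credit-bearing course "
        "as of the fall census date."
    ),
    "scope": "All students in the study sample, including withdrawals.",
    "submission": "Include in the end-of-term file, due two weeks after census.",
    "notes": "Report 0 (not missing) for students who did not enroll.",
}
```

Spelling out each variable at this level of detail leaves little room for institutions to interpret a field differently, which is what makes the submitted files comparable across campuses.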

Consistently solicit feedback from institutions, especially for the codebook

  • We facilitated two 90-minute webinars and two live trainings with the data leads and analysts from the 11 institutions, in addition to holding multiple phone calls, to gather their feedback and make revisions. In keeping with the principles of human-centered design, we iterated on this process until we finalized definitions and processes that made sense to every institution’s data lead. Throughout this process, we sought advice and support from our technical consultants, and engaged with administrators and staff across the institutions, including those in the registrar’s office, the financial aid office, and the Institutional Review Board.
  • We regularly solicit feedback from representatives at a subset of institutions, approaching it like a virtual focus group, to help us diagnose issues and test proposals before introducing an issue or instituting a change with the whole group. The subset of institutions alternates based on known interest, capacity, and specific skill sets.

Develop a series of standard data checks to run on each submitted data file

  • As part of a systematic review of each data file, we run a host of checks soon after submission, enabling us to diagnose and fix issues quickly, sometimes proactively. If changes need to be made on the institution’s end, we promptly explain the issue and the fix and request a revised file. (An illustrative sketch of these kinds of checks appears below.)
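As an example of what such checks might look like in practice, here is a minimal sketch in Python using pandas. The column names, value ranges, and file name are hypothetical placeholders, not the actual MAAPS file specification.

```python
# Minimal sketch of standard checks run on a submitted data file.
# Column names, value ranges, and the file name are illustrative only.
import pandas as pd

def run_standard_checks(path: str) -> list[str]:
    """Return a list of human-readable issues found in a submitted file."""
    df = pd.read_csv(path)
    issues = []

    # Every record needs a unique, non-missing student ID.
    if df["student_id"].isna().any():
        issues.append("Some records are missing a student ID.")
    if df["student_id"].duplicated().any():
        issues.append("Some student IDs appear more than once.")

    # Values must fall within the ranges defined in the codebook.
    if not df["term_gpa"].between(0.0, 4.0).all():
        issues.append("GPA values fall outside the 0.0-4.0 range.")
    if not df["enrolled_fall"].isin([0, 1]).all():
        issues.append("enrolled_fall must be coded 0/1 per the codebook.")

    return issues

# Example usage: relay any issues back to the institution's data analyst.
# for issue in run_standard_checks("institution_a_fall_term.csv"):
#     print(issue)
```

Running the same battery of checks on every file keeps the feedback we send institutions consistent and makes it easy to confirm that a revised file actually resolves the issue.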

Be flexible and transparent

  • We try to be flexible and accommodating whenever possible, such as by developing timelines with institutions’ preferences and calendars in mind.
  • Despite months of preparation and rounds of revisions, unforeseen issues arose, which should not come as a surprise. It’s best to communicate these issues to institutions immediately, revise the relevant documents, and help institutions make changes. Fortunately, we are working with 11 institutions whose leadership and staff are committed to this project and have been consistently responsive and supportive, mirroring the collaborative environment fostered and encouraged by the University Innovation Alliance.

While this list is certainly not exhaustive, these practices are relatively easy to implement and can make the cross-institutional data collection process significantly more efficient. Overall, we perceive and treat institutions and their data teams as partners, rather than as mere sources of data.