NADDI 2015

Research Data Management: Enhancing Discoverability with Open Metadata Standards

NADDI 2015 Program Abstracts & Descriptions

(in alphabetical order by title)

See also the NADDI 2015 Program.

 


A Next Generation, Reusable, Web-based Data Catalog

Jeremy Iverson, Colectica; and Barry Radler, Institute on Aging, UW-Madison

The new Colectica Portal showcases how DDI 3.2 enables multiple organizations to use the same software to let researchers search, browse, compare, and download complex datasets. In late 2013, NIH funded MIDUS to create a DDI-based, harmonized data extraction system. MIDUS is a decades-long national longitudinal study of 10,000+ Americans that blends social, health, and biomarker data. This project builds on Colectica to enable something unprecedented: customized cross-project downloads of harmonized MIDUS data. This capability supports more efficient and effective public use of these large longitudinal, multi-disciplinary datasets. In 2014, the UK ESRC and MRC funded the CLOSER project to maximize the use, value, and impact of nine UK longitudinal studies. The scale and detail of the metadata included will make it among the largest such repositories in the world. The discovery platform is a customization of Colectica Portal, building on previous developments, including the work with MIDUS. The new Colectica Portal offers cutting-edge search technologies and user-friendly ways to discover, navigate, and display the metadata. This presentation describes metadata creation and harmonization, and showcases the innovative search and discovery interfaces that are critical to researchers' understanding and leveraging of massive data resources.

PDF of full presentation
Return to the Program.

An Open Source, DDI-Based Data Curation System for Social Science Data

Jeremy Iverson, Colectica

The Institution for Social and Policy Studies (ISPS) at Yale University and Innovations for Poverty Action (IPA) partnered to develop a repository for research data from randomized controlled trials in the social sciences. The repository is an expansion, and a major upgrade, of the existing ISPS Data Archive. Together with Colectica, the partners have developed a software platform that leverages DDI Lifecycle, the standard for data documentation. The software structures the curation workflow, which also includes checking data for confidentiality and completeness, creating preservation formats, and reviewing and verifying code. The software provides a seamless framework for collecting, processing, archiving, and publishing data. This data curation system combines several off-the-shelf components with a new, open source web application that integrates the existing components into a flexible data pipeline. The software helps automate parts of the pipeline and unifies the workflow for staff. Default components include Fedora Commons, Colectica Repository, and Drupal, but the software is designed so that each of these can be swapped for alternatives. This session will include a live demonstration of the data curation software.

PDF of full presentation
Return to the Program.

Connecting the Dots with DDI

Michelle Edwards, Cornell Institute for Social and Economic Research (CISER)

The Cornell Institute for Social and Economic Research (CISER) data archive has been actively accepting Cornell social science and economic research data since 1981. Holdings range from US Census data to New York-centric studies to international demographic studies, and many more. Researchers currently search the archive using a basic search across a limited number of study-level and file-level descriptor tags. To enhance discoverability, CED2AR will be implemented to add variable-level and enhanced study-level metadata. CED2AR uses the DDI 2.5 metadata standard for documenting the holdings, along with schema.org for microdata markup, allowing search engines to parse the semantic information in the DDI metadata. New data deposits, whether researcher data or new archive collections, will be added using an online data deposit form that creates study-level and file-level metadata and provides upload capabilities for the data and program files. An API will pass metadata gathered from the data deposit form to both the current archive structure and the CED2AR database, ensuring the integrity of both systems. Three processes (an online data deposit form, the archive holdings, and CED2AR), all linked through DDI 2.5, will create a new workflow for the CISER data archive. By connecting the dots with DDI, we will enhance the discoverability and usability of the CISER data holdings.
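
As a minimal sketch of the markup idea described above, the Python snippet below renders hypothetical DDI-derived study and variable metadata as schema.org JSON-LD of the kind a search engine can parse. The field names and the variableMeasured mapping are illustrative assumptions, not code or conventions from CED2AR.

    import json

    # Hypothetical DDI-derived study metadata (illustrative field names,
    # not CED2AR's actual data model).
    study = {
        "title": "Example CISER Study",
        "abstract": "A New York-centric demographic study.",
        "variables": [
            {"name": "AGE", "label": "Respondent age in years"},
            {"name": "INCWAGE", "label": "Wage and salary income"},
        ],
    }

    # Map the study onto a schema.org Dataset. The variableMeasured
    # property is one plausible home for variable-level metadata; the
    # mapping here is an assumption, not CED2AR's actual crosswalk.
    jsonld = {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "name": study["title"],
        "description": study["abstract"],
        "variableMeasured": [
            {"@type": "PropertyValue", "name": v["name"], "description": v["label"]}
            for v in study["variables"]
        ],
    }

    # The result would be embedded in the catalog page inside a
    # <script type="application/ld+json"> element.
    print(json.dumps(jsonld, indent=2))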

PDF of full presentation
Return to the Program.

Crowdsourcing DDI Development: New Features from the CED2AR Project

Benjamin Perry, Venkata Kambhampaty, Kyle Brumsted, Lars Vilhuber, & William C. Block; Cornell Institute for Social & Economic Research

Recent years have shown the power of user-sourced information, as evidenced by the success of Wikipedia and its many emulators. This sort of unstructured contribution is currently not feasible within otherwise successful metadata repositories. Creating and augmenting metadata is a labor-intensive endeavor, and harnessing the collective knowledge of actual data users can supplement officially generated metadata. As part of our Comprehensive Extensible Data Documentation and Access Repository (CED2AR) infrastructure, we demonstrate a prototype of crowdsourced DDI, using DDI-C and supplemental XML. The system allows any number of network-connected instances (web or desktop deployments) of the CED2AR DDI editor to concurrently create and modify metadata. The backend transparently handles changes, and the frontend can separate official edits (by designated curators of the data and the metadata) from crowd-sourced content. We also briefly discuss offline edit contributions. CED2AR uses DDI-C and supplemental XML together with Git for a very portable and lightweight implementation. This distributed implementation allows large-scale metadata curation without a hardware-intensive computing environment and can leverage existing cloud services such as GitHub or Bitbucket.
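
To make the Git-backed design concrete, here is a minimal Python sketch of how a crowd-sourced edit to a DDI-C document might be recorded on its own branch for curator review. The repository path, branch name, and helper function are hypothetical, not CED2AR's actual conventions.

    import subprocess
    from pathlib import Path

    # Hypothetical location of a Git-backed metadata repository.
    REPO = Path("/srv/metadata-repo")

    def commit_crowd_edit(ddi_file: str, new_xml: str, contributor: str) -> None:
        """Record a crowd-sourced edit to a DDI-C document on a review branch."""
        # Keep crowd contributions on their own branch so curators can
        # review them before merging into the official metadata.
        subprocess.run(["git", "checkout", "-B", "crowd-edits"], cwd=REPO, check=True)
        (REPO / ddi_file).write_text(new_xml, encoding="utf-8")
        subprocess.run(["git", "add", ddi_file], cwd=REPO, check=True)
        subprocess.run(
            ["git", "commit", "-m", f"Crowd edit to {ddi_file} by {contributor}"],
            cwd=REPO, check=True,
        )
        # Pushing to a hosted remote (e.g. GitHub or Bitbucket) lets any
        # connected editor instance pull the change.
        subprocess.run(["git", "push", "origin", "crowd-edits"], cwd=REPO, check=True)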

PDF of full presentation
Return to the Program.

Data Capture: Tracking Data from Source to Results

Wendy L. Thomas, Minnesota Population Center

Over the years the focus of DDI has expanded from rendering a completed codebook in a machine-processable format to recording data capture and production processes. However, much research involves the identification, restructuring, and reuse of data from existing sources. How well does DDI capture this chain of events and the provenance of individual data objects? Can we create tools or processes to help researchers document this activity accurately and easily? A recent research project involving 45 indicators and 100-plus data sources covering a 23-year period serves as a case study for examining the following issues: citing source data; capturing selection criteria; citing online extraction tools; capturing recoding, recalculating, transposing, and reformatting processes; linking the final research data set to its sources at the cell level; and problems with the original metadata. The presentation will focus on coverage gaps, practices for using existing DDI structures, and recommendations for tools or procedures to help researchers capture this information and incorporate it into their final metadata documents.

PDF of full presentation
Return to the Program.

Data Discoverability in Public Health

Arofan Gregory, Tito Castillo, Samuel Moore, Brian Hole, Christiana McMahon, Spiros Denaxas, Veerle Van Den Eyden, Hervé L'Hours, Lucy Bell, Jack Kneeshaw, Matthew Woollard, Chifundo Kanjala, Gareth Knight, & Basia Zaba; Open Data Foundation

The Wellcome Trust recently funded a project analyzing the discoverability and reusability of data for research in the public health and epidemiology sector, on behalf of a group of more than 20 international funders. Public health is a cross-cutting domain, using data from many different sources. The project examined the current state of play and considered several different models for improving data discoverability. DDI and similar types of rich metadata were seen as exemplary and necessary tools for improving the state of data discoverability in this domain.

PDF of full presentation
Return to the Program.

Data Management Module: a New Extension for the Rogatus System

Ingo Barkow, DIPF - German Institute for International Educational Research

Rogatus is an open source survey and metadata repository solution based on the DDI 3.2 standard, using the Generic Longitudinal Business Process Model (GLBPM) to specify its tool chain. The project is currently supported by DIPF, IAB, and GESIS and is attracting growing interest, especially from NSIs and data collection agencies. This presentation gives an update on developments since last year's NADDI 2014, such as coding support for ISCED, improvements to the case management system, and compatibility with other platforms like Colectica and MMIC, plus an outlook on the mobile client "Aitema". It will focus mainly on the new Data Management Module extension, which was developed to support contractual processes between data producers and research data centres, giving staff in such facilities an overview of the legal and technical requirements derived from the contracts, along with links to the corresponding documents (e.g. security levels for data, embargo periods).

PDF of full presentation
Return to the Program.

DDI Moving Forward: Update, Feedback, and Suggestions for the Project

The next generation of DDI will use the Unified Modeling Language (UML) to simplify the DDI standard, broaden its focus to new research domains, and make it expressible in technologies beyond XML.

PDF of full presentation
Return to the Program.

Documenting Spreadsheets with Colectica for Excel

Dan Smith, Colectica

Colectica for Microsoft Excel is a free tool that allows researchers to document statistical data directly in Microsoft Excel using open standards. This talk will show how variables, code lists, and datasets can be described in a standard format and globally identified. Data can also be directly imported from SPSS and Stata files and documented. The standardized metadata is stored within the Excel files, so it remains available to anyone receiving the documented dataset. Code books can also be customized and generated by the tool, with output in PDF, Word, HTML, and XSL-FO formats. The new version 5 improves integration with SPSS and adds support for DDI Lifecycle 3.2, the premier open standard for data documentation.

PDF of full presentation
Return to the Program.

Documenting the Vietnam Era Twin Study of Aging

Carol Franz, University of California-San Diego

Data documentation is, above all, an act of communication. In an ideal world, programmers and researchers toil together to provide potential data users with enough information to become familiar with a study and data set, discern whether the data are appropriate for their research question, write an adequately detailed methods section for a publishable paper, and know how to access the data. Although DDI works well for survey data, many studies have types of data (e.g., cognitive, physiological, imaging, genetic) and samples (e.g., complex longitudinal samples with subsamples) that don't fit easily into DDI. In addition, unlike survey data, these types of data involve complex protocols and processing/scoring that require in-depth knowledge about a study, knowledge that must be communicated for the data to be used successfully. Many projects have little time, training, or incentive to carefully document data for public use. In this session I will present the types of complex data collected in the longitudinal Vietnam Era Twin Study of Aging and the challenges they pose for DDI documentation.

PDF of full presentation
Return to the Program.

First Results from the Survey on Metadata Management in the Educational Sciences

Ingo Barkow, DIPF - German Institute for International Educational Research

Representing educational content in DDI, such as cognitive items from computer-based assessment, is difficult. Although DDI 3.2 includes improvements to response domains and workflow, only items that are close to questionnaires (that is, items using simple stimuli) can be represented. To identify these white spaces in the standard, the author conducted a survey on metadata management in the educational sciences in November and December 2014 as part of a doctoral thesis. This talk will present first results from the survey, which may influence the further development of DDI Lifecycle 3.x and DDI4 in the domain of the educational sciences.

PDF of full presentation
Return to the Program.

International Clinical Research Collaborations using DDI Lifecycle: CHARM's Growing Pains

David K. Johnson, University of Kansas

The Center for Hispanic American Research Methods (CHARM) is a cooperative of US and Latin American research laboratories interested in coordinating biobehavioral research. CHARM's first goal is to create a multilingual applied clinical research library (to date, over 400 unique instruments in 850 different applications) that can be shared widely by investigators throughout the US and Latin America to facilitate high-quality biobehavioral research on medical issues germane to Hispanic Americans. It uses state-of-the-art data standards that specify a research lifecycle (the Data Documentation Initiative, version 3; DDI-3). By applying this international data standard to the clinical research instrument library, CHARM offers participating investigators a database of well-described clinical instruments and code libraries that bootstrap the investigative process. An investigator assembles a neurocognitive battery using a flexible assessment battery approach. The selected battery can be implemented using Computer Assisted Testing (CATI), REDCap (online data entry or email surveys), LimeSurvey and its associated Optical Character Recognition (OCR) software (queXF), or traditional paper-and-pencil via PDFs. The DDI standard creates the database frame so that investigators can move quickly to collect data, and open source tools are offered to facilitate data entry and verification. Finally, these shared data standards provide a rational heuristic for pooling data across sites, increasing power to detect meaningful differences while distributing research costs and subject burden. As long as a cooperative of multisite investigators uses similar DDI standards (instrumentation, question phrasing, and collection methods, all specified by the CHARM library), their data can be pooled to answer a shared research question. This framework promotes a coordinated, interdisciplinary approach to research while allaying some of the administrative burden of deploying a research project for a (usually) over-encumbered investigator. Although we use validated and published translations wherever possible, many clinical instruments (about two-thirds of the library) still need translation by trained clinicians. We are establishing an online referee process for these translations as well as coordinating the translation assignments.

PDF of full presentation
Return to the Program.

Keynote Address: From 'Data Discoverability' to 'Data Navigability'

Dr. Tito Castillo, Founder & Managing Director, Xperimint Ltd

The notion that data need to be made more discoverable is widespread; indeed, it is an important theme of this conference. It conjures up an image of the bold explorer embarking on a romantic quest, akin to Christopher Columbus. Is this the right paradigm? Dr Castillo will argue that we need to shift to a more deliberative mind-set that considers the challenge of navigating increasingly complex and inter-related data (and metadata) systems. This is not simply a matter of semantics, since the existing data discovery paradigm underpins our strategic approach to data management. He argues that the data discovery view places the responsibility on the bold, explorative scientist to find gems of data and glosses over the wider governance and community requirements. Once the discourse is shifted to that of 'data navigability', we begin to approach a metaphor that emphasises the continuum of data that exists, the obstacles that we may meet along the way, and the cultural shift that is needed.

PDF of full presentation
Return to the Program.

Marketing and Generating Partnerships Group Discussion

A conversation with the DDI Marketing and Partnerships group about tapping into user networks and developing a more formal mechanism to gauge user needs, frustrations, and challenges.

PDF of full presentation
Return to the Program.

Overviewing the Translating Research in Elder Care Monitoring System (TMS) Data Platform

James M. Doiron, Health Research Data Repository, University of Alberta; Andrew DeCarlo, Metadata Technology North America, Inc.; Shane McChesney, Nooro Online Research

The initial Translating Research in Elder Care (TREC 1.0) (http://www.trecresearch.ca) program was a 5-year (2008-2012), $4.7 million (CAD) CIHR-funded research program examining the effects of context on resident and care provider outcomes in the Canadian long-term care sector. A second phase of the project, TREC 2.0, commenced in 2014. The TREC Monitoring System (TMS) Data Infrastructure Platform project (CFI funded; $1 million CAD; 2012-2016) is a collaborative effort between TREC, the University of Alberta's Health Research Data Repository (HRDR), Metadata Technology North America (MTNA), and Nooro Online Research. It focuses on applying standardized, DDI-friendly metadata to support the automated collection/ingestion, quality assurance, harmonization, and merging of TREC 2.0 data, as well as the timely delivery of reports/outputs and real-time dashboards based on these data. This session will offer an overview of the Data Infrastructure Platform project, including TREC and its data types/sources, the HRDR virtual research environment that supports the project, challenges encountered, and a demonstration of tools. It will also show how the project employs metadata standards that will serve as a 'proof of principle' for a transferable metadata-driven management framework, including the ability to output DDI, which greatly increases the capacity for application within future KUSP- and HRDR-housed research activities and beyond.

PDF of full presentation
Return to the Program.

 


Plenary Session: Conference Wrap-Up and Future of NADDI

Return to the Program.

Plenary Session: Discovering Standards

Dorothea Salo, Faculty Associate in the School of Library & Information Studies at UW-Madison

We joke about standards proliferation - standards are like toothbrushes, "there are 14, no, 15 competing standards," and so on - but our laughter is rueful at best. With so many standards fighting for use, it's hard to zero in on the ones we need, hard to know what to teach ourselves and our colleagues and students, hard to avoid standards-related decisions that turn out not to be ideal. Whose problem is this to solve, and how solvable is it?

PDF of full presentation
Return to the Program.

Research Data Services at the University of Wisconsin-Madison

Brianna Marshall, Trisha Adamus, & Elliott Shuppy; UW-Madison

This presentation will describe the data management landscape at the University of Wisconsin-Madison (UW), with emphasis on the history, organizational structure, and services offered by UW's interdisciplinary Research Data Services (RDS) group. RDS is responding to changes in the field by adapting to new needs around data management plan consultations, as well as the growing demand for assistance with other aspects of data management, such as understanding and creating metadata. The presenters will discuss how metadata standards figure into the RDS mission, as well as the opportunities and challenges that DDI presents.

PDF of full presentation
Return to the Program.

Starting in the Middle: The Wisconsin Longitudinal Study & DDI

Carol Lynn Roan, UW-Madison

The Wisconsin Longitudinal Study (WLS) has collected six rounds of data from a panel of 10,317 people who graduated from Wisconsin high schools in 1957, one of their siblings, and, for one round, their spouses. In addition to survey data, the WLS includes a wide variety of administrative data and biological markers. When it came time to process the fifth round of data collection in 2004, WLS data managers used a DDI-compliant XML format to document the data. Ahead of a potential seventh round of data collection, the WLS is transitioning to DDI Lifecycle. We will talk about the reasons for the transition, the complexity of working with longitudinal measures, and the benefits of taking on such a large challenge. Those benefits include the opportunity to work with and learn from those who have already made a similar transition, and the potential to "do it right" in the next round by starting with a DDI Lifecycle framework. If time allows, we will present our early work on moving to the latest standard.

PDF of full presentation
Return to the Program.

Supporting Extended Citations in DDI4

Larry Hoyle and Mary Vardigan, Institute for Policy & Social Research and ICPSR

In October 2014 a group with participants from the DDI, CDISC, and the Dublin Core Metadata Initiative communities met as part of the Dagstuhl 2014 DDI Moving Forward Sprint to explore ways in which DDI4 could support more nuanced representations of contributorship to the creation of datasets and related intellectual objects. This presentation will outline the results of that meeting, including details on the resulting implemented and proposed objects in the DDI4 model.

PDF of full presentation
Return to the Program.

Training Session: Discover the Power of DDI Metadata

Instructor: Jane Fry, Carleton University, Ottawa, Ontario

This half-day workshop is appropriate for anyone who is responsible for managing microdata about individuals or organizations and who has wondered how metadata could streamline their research. A brief background on the Data Documentation Initiative (DDI) metadata standard will be followed by examples of using DDI and of tools that help produce and exploit DDI metadata. Participants will be shown different applications running on DDI metadata that showcase the discovery of and access to research data. Producing DDI metadata from two popular formats (SPSS and Excel) will be demonstrated, along with the tools that subsequently exploit these metadata (Nesstar and Colectica for Excel). Finally, participants will be presented with a data lifecycle workflow example that illustrates how metadata production and research data management can be integrated. As this is an introduction to DDI, no previous knowledge of the standard is required. There will be plenty of time for questions.

PDF of full presentation
Return to the Program.

Training Session: Open Data and Metadata Management

Instructors: Andrew De Carlo & Arofan Gregory, Metadata Technology North America, Inc.

This half-day workshop focuses on the practical packaging of open data and metadata, in particular around statistical datasets. Participants will learn about the benefits of combining ASCII text data files with DDI-XML for the publication, sharing, and long-term preservation of data, and how, when combined with relevant scripts and programs, this provides a powerful packaging system for the effective delivery of open data. To turn these principles into practice, participants will be introduced to SledgeHammer (http://goo.gl/7PB5B8) and other standards-based tools enabling the production of open data packages, including the preparation of DDI files and the conversion of data and metadata to and from statistical packages, databases, and other platforms. This will be followed by several real-life use cases illustrating the versatility and benefits of the approach (see agenda for details), including in-depth analysis and demos around the use cases of highest interest to the workshop audience. Participants will also be encouraged to present their own challenges. Finally, a hands-on session will give everyone the opportunity to try the tools using their own data files or provided datasets. Each participant will receive a free 3-month license for SledgeHammer Pro.
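
As a toy illustration of the data-plus-DDI packaging principle (not SledgeHammer's actual output), the Python sketch below writes an ASCII data file alongside a simplified DDI Codebook-style XML description of its variables. A real package would be schema-valid DDI and far richer; the file names and variable list here are invented for the example.

    import csv
    import xml.etree.ElementTree as ET

    # A small ASCII data file, the portable half of the package.
    rows = [("id", "age", "income"), (1, 34, 52000), (2, 51, 61000)]
    with open("study.csv", "w", newline="") as f:
        csv.writer(f).writerows(rows)

    # A companion DDI Codebook-style description of its variables.
    # The XML is schematically simplified, not schema-valid DDI.
    codebook = ET.Element("codeBook")
    data_dscr = ET.SubElement(codebook, "dataDscr")
    for name, label in [("id", "Respondent identifier"),
                        ("age", "Age in years"),
                        ("income", "Annual income, USD")]:
        var = ET.SubElement(data_dscr, "var", name=name)
        ET.SubElement(var, "labl").text = label

    ET.ElementTree(codebook).write("study.ddi.xml", encoding="utf-8",
                                   xml_declaration=True)

Shipping the two files together means any statistical package, or a tool reading the DDI, can reconstruct the variable-level meaning of the plain-text data.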

Return to the Program.

University Data Policies and Data Services

Kristin Briney, UW-Milwaukee

DDI's principal goal is making research metadata machine-actionable, but what role are universities playing with respect to research data? And how do universities actually help researchers across the data lifecycle? Our research examines the landscape of university data policies and library data services to understand how research universities support data management. We reviewed the websites of 206 institutions with a Carnegie Classification of Institutions of Higher Education research level of either "High" or "Very High" as of July 2014. We examined the content of the policies and asked several questions, including: Does the institution have a publicly accessible data sharing or management policy? What does the policy cover? Who owns the policy (e.g., Office of Research, Information Technology, Libraries)? What happens to the ownership of the data if a researcher leaves the institution? What types of universities are more likely to have data policies and/or data services? Our goal is to better understand how universities support data management through university policy and library services.

PDF of full presentation
Return to the Program.