Data Curation in the University: Libraries, Research, and Learning

Workshop date: March 25, 2013

Venue: School of Social Sciences Auditorium (SSS-I), JNU, New Delhi

Registration for this event is now closed.

This one-day, invitational workshop will be held at Jawaharlal Nehru University on March 25, 2013, from 9:00 a.m. to 5:30 p.m. Registration begins at 8:30 a.m.

Cyberinfrastructure and “Big Data” represent a paradigm shift in science that is transforming the research process to focus on the computation, analysis, curation, and reuse of digital data on a massive scale.  What are the implications for universities in terms of their libraries, research, and teaching?

This workshop brings together a select group of senior librarians, research and data center managers, cyberinfrastructure researchers, and educators to explore these issues and begin to integrate data curation into their practice and services.

Participants will be provided with:

Thanks to support from Jawaharlal Nehru University and the Office of Engagement at Purdue University, there is no charge to participate for invited attendees. Lunch and tea are provided.

Program Overview

Welcome

Welcome to Jawaharlal Nehru University and introduction to the workshop from Prof. Sudha Pai, Rector, and Prof. Ramesh C. Gaur, University Librarian.

Opening Address by Dr. BK Gairola, Ministry of ICT, Government of India

The opening address will be given by Dr. BK Gairola, Mission Director, Ministry of Information and Communication Technology, Government of India.

Keynote Address by Dr. Jagdish Arora, INFLIBNET

Dr. Jagdish Arora has been the Director of the Information and Library Network (INFLIBNET) Centre since August 2007. Prior to his present assignment, Dr. Arora worked as the Librarian and Deputy Librarian at the Indian Institute of Technology Delhi and as the National Coordinator of the INDEST-AICTE Consortium. His awards include a Fulbright Professional Fellowship in Library and Information Science (1997-98), SIS Fellow (1999), Young Librarian of the Year (2001, SATKAL), Librarian of the Year (IASLIC, 1999), ILA-Kuala Best Librarian Award (2004), and Master Motilal Sanghi Best Librarian Award (2009). He delivered the 7th Dr. S.R. Ranganathan Memorial Lecture, organized by the Delhi Library Association, in 2009. He was awarded the "SATKAL Life Time Achievement Award" by SATKAL in 2011.

Dr. Arora was the Principal Investigator for several projects sponsored by agencies such as AICTE, Dept. of Biotechnology (DBT), Ministry of Information Technology (MIT), Ministry of Human Resource Development (MHRD), the National Highway Authority of India, and the University Grants Commission. He is the Principal Investigator of the project entitled "National Library and Information Infrastructure for Scholarly Content (N-LIST)", funded by the Ministry of Human Resource Development (MHRD) under the National Mission of Education through ICT. The project provides access to selected e-resources for more than 6,000 colleges in India.

He has been a member of the International Library Advisory Boards of IEEE and IEE. Dr. Arora has written over 70 research articles in journals, book chapters, and conference papers. His current research interests include consortia-based subscription to e-resources, digital libraries, digitization of old and fragile documents and their storage, database-driven Web interfaces and Web-based library services, Web-based learning and education, access management technologies, and scientometric analysis.

Institutional Repositories and the Research Data Landscape

Sharing research data is an important ethic for many scientific disciplines, and repositories play a key role in this scholarly exchange. In addition to published papers, having access to the underlying data (e.g., software source code, spreadsheets, transcripts, field notebooks, observation logs, images and video, and sensor and instrument data) is critical to validating reported findings. Data can also be reused to advance the original research or new lines of inquiry. Moreover, preserving and sharing existing datasets in repositories avoids the cost of generating new data from scratch. In the case of government-sponsored research, such repositories make research data available to the taxpayers who funded the research as well as to citizen-scientists, students, and other researchers.

Academic and research libraries are taking a more active role in data management, applying library science principles to help address the data deluge. This includes a wide range of activities such as helping researchers formulate funder-required data management plans, adapting library practice to help organize and describe research datasets, developing data collections and data repositories, taking responsibility for digital preservation, and encouraging data literacy.

This session will provide an overview of the research data lifecycle and explore new roles and practices for libraries in data management in the context of institutional repository services. One example of such a service at Purdue University (PURR) will be presented for demonstration and discussion.

Michael Witt, Assistant Professor of Library Science, Purdue University, USA
Ramesh C. Gaur, University Librarian, Jawaharlal Nehru University, India

Cataloging Research Data Repositories with Databib

For centuries, librarians and other information professionals have organized information into catalogs and finding aids to make collections of information more accessible to patrons.  In that same tradition, an international effort named Databib, http://databib.org, is underway to identify and catalog the world’s repositories of research data.  Users and librarians create and curate records that describe data repositories.  Users can search for data repositories using a basic keyword or advanced search of metadata such as the title of the repository, its URL, who maintains the repository, its description, the subject areas it covers, and policies regarding who can access, reuse, and deposit data in the repository. In addition to the Databib website, all of the information in Databib is exposed using a variety of technologies to make it easy to integrate with other tools or platforms. These include RSS (Really Simple Syndication) feeds, OpenSearch, Linked Data (Semantic Web), RDF/XML, and support for over 300 social networks such as Twitter, Google, and Facebook. In the short time since its launch in May 2012, Databib has achieved wide recognition and adoption in the research community with over 20,000 users from 84 different countries. Its editorial and advisory boards include broad, global representation of institutions involved in research data management in the United States, India, China, Ecuador, South Africa, United Kingdom, Australia, Portugal, Egypt, Hungary, Japan, Spain, and Germany.

Michael Witt, Assistant Professor of Library Science, Purdue University, USA

Lunch and Breakout Sessions

What is the landscape of research data management in India? Workshop participants will break into smaller groups for lunch (provided) with facilitators who will lead a discussion to begin to survey research data infrastructure and identify stakeholders and approaches for collaboration.  The groups will reconvene to report out after lunch.

Big Data in Education: Improving Teaching and Learning Through Data Mining and Analytics

What does “big data” in education look like? How can we use “big data”, or data in general, to influence educational practices? This session will discuss the role of big data in education and suggest how educators can use big data to improve educational practice. Specifically, we will explore how big data can make it possible for teachers to get insights into student performance and learning. As Darrell West (Director of Governance Studies) pointed out, “By focusing on data analytics, teachers can study learning in far more nuanced ways”. For example, “big data” can inform teachers about what works in the classroom, how long students study, how much time they need to master key concepts, etc. Hence, we will discuss how data can be used to inform pedagogical practices in the classroom and become one of the most important tools for teachers and learners. However, as discussed in the U.S. Department of Education report Enhancing teaching and learning through educational data mining and learning analytics, there are potential barriers to adopting educational data mining and learning analytics, including technical challenges, institutional capacity, and legal and ethical issues. The session will also examine these barriers in detail.

Aman Yadav, Associate Professor of Educational Studies, Purdue University, USA

Learning and Open Data Initiatives in India

Presentations and panel discussion by:

Prof. Uma Kanjilal, Indira Gandhi National Open University, INDIA
Dr. Sanjaya Mishra, Director, Commonwealth Educational Media Centre for Asia, INDIA
Dr. Usha Mujoo Munshi, Librarian, Indian Institute of Public Administration, INDIA

Developing a Global Data Cyberinfrastructure for Science and Engineering

Science and engineering in the past century were based on the triad of theory, experiment, and computational simulation. In this century, tremendous increases in computational capabilities and the proliferation of digital sensors and data sources, all linked by a high-speed network, have created the “fourth paradigm” of science in which data and the knowledge that can be distilled from these data are now an integral part of science and engineering. In contrast to the physical assets of the past, digital assets in the form of software and data can be collected, archived, and widely disseminated across the world. Examples of this new model of research include the Large Hadron Collider project at CERN, which involves thousands of physicists across the world in the successful search for evidence of the Higgs boson. Another example is the National Science Foundation George E. Brown, Jr. Network for Earthquake Engineering Simulation (NEES). NEES operates a cyberinfrastructure named the NEEShub that provides a digital repository containing decades of earthquake engineering data collected from experiments conducted at 14 NEES laboratory sites.

Just as laboratories, textbooks, and manuscripts were the required tools for science and engineering in the past century, a robust and distributed cyberinfrastructure is now an essential element needed to support research in this century. Work over the past decade in developing cyberinfrastructure to support research has resulted in several successful systems, such as nanoHUB, NEEShub, iPlant, and the Compact Muon Solenoid distributed cyberinfrastructure. Experience gained from the development and use of these cyberinfrastructure systems has pointed to a common set of emerging problems that will need to be solved to improve the power and usefulness of the cyberinfrastructure tools needed for modern science and engineering. The areas of need include:
 
1) Software tools, algorithms, and systems used to support digital data collection, curation, and dissemination.
2) Techniques to assure the long-term viability and accessibility of digital data.
3) Science gateways and portals to support discipline-specific science and engineering as well as general digital scholarly activity.
4) The use of cloud computing and high performance computing systems to support data preservation activities.

In this session, we will explore the current state of the art of cyberinfrastructure technologies used to support science and engineering today, and discuss these areas of emerging need. The goal of this session is to develop an understanding of the needs of the science and engineering communities in India and globally. Specifically, we will focus on the technical, sociological, and economic elements involved in the development, deployment, and use of data-centric cyberinfrastructure for global science and engineering communities.

Thomas J. Hacker, Associate Professor of Computer and Information Technology, Purdue University, USA

Cyberinfrastructure Panel

Presentations and panel discussion by:

Prof. T.V. Vijay Kumar, Assistant Professor, School of Computer and System Sciences, Jawaharlal Nehru University, INDIA
Mr. Deepak Soni, Tata Technologies, INDIA
Prof. D.P. Vidyarthi, Associate Professor, School of Computer and System Sciences, Jawaharlal Nehru University, INDIA

After the conclusion of the workshop, an informal roundtable discussion will be held to identify opportunities for collaboration.

A collaboration between Purdue and Jawaharlal Nehru Universities

Organizing Committee

Prof. Ramesh C. Gaur, Jawaharlal Nehru University, India
Prof. Thomas J. Hacker, Purdue University, USA
Prof. Michael Witt, Purdue University, USA
Prof. Aman Yadav, Purdue University, USA

Directions


Please contact jnupurdue@gmail.com with questions or inquiries.