| Omerzu | Relationships among archives and the social science research community - The case of successful relationship |
|
Six years of existence have given ADP an important insight. Researchers use archives only (if at all) once they have finished the fieldwork, as a place to store the raw data. Often not even then: archives must apply for the data once the first findings are published. Past experience shows that cooperation between researchers and archives before, during and after setting up the framework for a new study is simplified by constant two-way communication, and that the desired results are achieved. Archives provide data from existing related studies, from both domestic and international sources, improving the possibility of general comparison in advance. Besides providing raw data, archives gather metadata and other important information, which serves as an additional reference in the making of the study. Researchers often want to replicate past studies, but find themselves stranded in the data graveyard. It is therefore our duty to seek out all available information from which codebooks can be produced. If we encourage research groups to keep up everyday communication while still in the field, this constant and close cooperation normally gives archives high-quality codebooks. The successful cooperation between ADP (the Slovenian social data archive) and the research project RIS (Research on Internet in Slovenia) will be presented.
| |
| van Gelder | The Dutch Question Bank |
|
The Steinmetz Archive has two current projects concerning databases of questions. In the first, the archive has built a back-office system for the Social and Cultural Planning Office in the Netherlands for its longitudinal study "Cultural Changes in the Netherlands". In each wave of this survey, a number of questions are taken from the same pool of questions. The other project is the Dutch Question Bank, in which an Internet interface is being built that lets researchers search a database of questions from major studies in the Netherlands. The aims of this project are:
- to make the question wordings and frequencies available;
- to facilitate comparison between studies. | |
Advancing Research and Data Literacy: empowering users
| Corti | Exploiting UK survey data sources for teaching political science: experiences from the classroom |
|
This paper reports on the progress of a small-scale project that aims to show how real life data
resources can be exploited in the classroom. A resource will be created that aims to transfer key skills
(at higher and post-16 educational level) relating to the understanding and appreciation of statistical
data and analysis as they relate to substantive issues.
In view of the recognition in the UK that the skills shortage for quantitative analysis is now critical, introducing the concepts that enable working with real life data sources early on in post-16 education is one way to redress this shortage. Not only does practical knowledge about survey methods and secondary analysis teach students how research is actually conducted, it also provides them with a tangible and marketable skill that they can use in future employment. Through a number of strategic investments by both the JISC and the ESRC, the UK academic community has access to a unique and expansive range of digital data resources. Whilst individual datasets are used extensively in academic research, they are significantly under-used in learning and teaching programmes within Higher Education, and rarely used in Further Education. As a national data provider, the UK Data Archive is in a strong position to offer its resources to the learning and teaching communities for developing more 'packaged' resources. Fundamental to understanding how best to re-purpose and apply the content is the need to seek advice and input from instructors in the classroom. This project will draw together a small team of academics, teachers and data archivists/disseminators in order to create and pilot such new resources. The small-scale set of online learning objects and teaching materials being developed, aimed at political science students, is based upon British Election Study data, which provide a unique source of information for students wishing to explore a range of contemporary political issues in Britain. It is hoped that this pilot study will improve knowledge of how discrete 'chunks' of learning materials can be preserved and disseminated in a wider national context, so that other teachers can exploit them in a flexible manner.
| |
| Shrimplin | Focusing in on Student Learning Outcomes: How SDA Helped Us Build a Learning Community for Data Users |
|
Learning and its assessment have become a focus of attention at many universities and colleges. In
response to this new environment, academic libraries are struggling to demonstrate the ways in which they
contribute to higher education. To meet this challenge, libraries have begun to expand their
responsibilities and to take a more active role in the learning process.
This presentation will focus on how we have collaborated with faculty at Miami University to create a shared view on learning outcomes and on how that shared view involves using SDA -- a set of programs for Survey Documentation and Analysis -- as a tool to develop and incorporate web-based data analysis for both undergraduate and graduate courses. Special attention will be given to three ongoing projects, all of which seek to make their courses more analytical while promoting active learning for students.
| |
| Neidert & Eisenhauer | Public Data and a Thriving Democracy: Threats and Opportunities |
|
The Association of Public Data Users (APDU) provides advocacy on the content of and access to public
data. APDU corresponds with administrators and officials directly involved with the collection and
dissemination of public data on behalf of its members. The Taskforce on Confidentiality is charged with
educating public data users on the related issues of confidentiality and access. This includes the
legislative and statutory environment as well as technical issues related to the appropriate use of
aggregate and local area statistics and public microdata files. The Taskforce is also charged with
communicating the needs of public data users to those government agencies that collect and disseminate
data to the public.
For purposes of user education, the Taskforce has developed a series of White Papers describing relevant statutes and pending legislation (e.g., FOIA, Privacy Act, Patriot Act, E-Commerce), and a Primer on Data Confidentiality containing a non-technical explanation of the confidentiality/access debate and of statistical and technical issues related to disclosure limitation. Unlike other publications on these issues, the primer is designed to be accessible to the traditional data librarian or archivist or to the data user with limited formal training in statistics. In addition to meeting the objective of educating public data users about technical and legal issues surrounding the confidentiality/access debate, these materials serve as a jumping off point for the Taskforce as it seeks to communicate the needs of public data users (in terms of data content, data quality issues associated with confidentiality edits, and access conditions) to data producers in a constructive way. The Taskforce is a vehicle for public data users to inform the legislative process and policy debate concerning privacy, confidentiality, and data access. For purposes of this presentation, the Taskforce's objective is to communicate users' concerns about continued and improved access to high quality survey and administrative data at the aggregate, local area, and microdata levels. In the paper, the Taskforce proposes technical, non-technical, and legislative reforms that can protect public access to data generated by public agencies while maintaining respondent confidentiality and expanding public awareness of the relevant issues; and describes alternative mechanisms by which those reforms might be enacted.
| |
| Temilola | Research Data and Issues of Confidentiality |
|
This paper focuses on issues around protecting information about human subjects and related data sent via the internet. It considers the three concepts necessary to any discussion about data security in any given social milieu: privacy, confidentiality and consent. Worldwide access through the internet raises many questions, including who owns digital information, who has the right to profit from others' work and who has responsibility for guaranteeing or regulating access to valuable information. Researchers in the social and behavioral sciences are expected to be proactive in designing and performing research to ensure that the dignity and privacy of individuals are protected. The purpose of this paper is to show that confidentiality issues need to be recognised and considered at every stage of the research process, including the initial study design; identification, recruitment and consent processes for the study population; security, analysis and final disposition of data; and publication or dissemination of data and results.
| |
| Fink and Hansen | Data Processing in Danish Data Archives |
|
The Danish Data Archives was established in 1973 as a national data bank for quantitative research
carried out primarily in the social sciences but also in medical science and history. In 1993 the DDA
became an independent unit in the Danish State Archives. Due to severe cutbacks in staff in 2002 the
archive has at present 13 full-time employees.
The DDA collects, preserves and disseminates machine-readable research data. When the data archive
receives data material, the data and documentation are converted to an archival format, which secures
technical preservation for the future. The data materials are then processed according to a priority list
based on frequency of usage. Data processing involves standardising and checking the material as well
as creating correct linkage between data and documentation. This part of the data processing ensures that
the documentation of a data set is preserved as completely as possible and that the data stay
comprehensible for future usage. The DDA uses the DDI standard for creating data documentation.
The paper will give a detailed description of the data processing procedures employed by the DDA,
developed step by step since the archive was established, often in cooperation with the data archive
network the DDA is part of. Compared to those of several fellow archives, these procedures are quite
extensive, and they are the most resource-demanding activity in the DDA. The data processing performed by
the DDA means that the data are perfectly suited for elaborate search engines such as NESSTAR and that
assistance to users after they have received material is unnecessary.
The output of processing a specific data material is a data documentation protocol (DDP) consisting of a
study description and a codebook. As some data archives have done already, the DDA wants to make DDPs
available to users on the Internet. At the moment we are considering how to publish these documents as
well-suited tools for research and educational purposes. The paper will comment on these
considerations.
Since 1993 the DDA has been allowed to preserve data with sensitive content and personal identifiers,
which primarily involves storage of unique personal identification numbers (cpr). It was necessary for
the DDA to implement new security standards and changes in both hardware and data processing procedures
to make this possible. A presentation of the challenges in archiving and processing this kind of data
will be included in the paper.
| |
| Kleemola and Keckman-Koivuniemi | Data Processing in FSD: challenges in a new archive |
|
The Finnish Social Science Data Archive (FSD) is a national resource centre for social science research
and teaching. It started to operate in 1999 as a separate unit of the University of Tampere. Its primary
goal is to increase the use of existing machine-readable social science research data in Finland. FSD is
funded by the Ministry of Education.
At the moment FSD processes all studies intensively, which is very time-consuming. Data materials are checked with special attention to, for example, 'anonymization' and filter variables. Data materials are transferred to the databases at variable level. FSD uses the SPSS program in data processing and preserves data in SPSS portable format. The archive uses the DDI standard for creating data documentation. Study descriptions and codebooks are available on the Internet. The paper will describe current FSD data processing procedures in detail and take a look at the future. The paper will also include a list of challenges that our recently founded archive has faced.
| |
| Crockett | Data Processing in the UK Data Archive |
|
In recent years, data archives have devoted more attention to metadata than data. Yet, for all the apparent increase in ease of converting data between formats (via menu driven import and export filters), accurate translation of data between data formats is arguably more difficult than before. This is because software is increasingly designed to provide a view of the data that is divorced from the software's internal (i.e. underlying) representation of the data and its "internal metadata" (variable descriptions, code labels, variable formats, missing values, etc.). Further, this internal metadata has increased in volume and complexity, leaving delimited text as only a partial representation of the full data file, with no de facto standard for storing data and internal metadata in one file (though SPSS portable format comes closest). This situation makes error free data format conversion a critical building block for the dual purposes of most data archives: data sharing and long-term preservation. The major problems that affect data format conversions are:
The paper will illustrate the UK Data Archive's recent solutions to these problems. These centre on the development of Visual Basic scripts which automate, standardize, and remove known sources of error when performing common conversions - typically from SPSS to STATA and tab-delimited text (with customized data dictionary in rtf format) - as well as code to remove undesirable embedded characters from MS Access databases. These tools also allow automation of file and directory naming, removing many of the sources of human error to which repetitive and mundane conversion tasks are prone, leaving data processors more time to check and validate the actual data. | |
| Donakowski | Data Processing at ICPSR |
|
Established in 1962, the Inter-university Consortium for Political and Social Research (ICPSR) maintains and provides access to a vast archive of social science data for research and instruction. To ensure that data resources are available to future generations of scholars, ICPSR preserves data, migrating them to new storage media as changes in technology warrant. This paper will address the ICPSR experience in preserving and processing data. Topics will include past and current procedures for archiving and distributing the data, as well as a discussion on how the required level of processing is determined. Challenges, such as those that stem from the tension between the research community's desire for immediate access and the amount and cost of processing, will also be presented. This paper will also address attempts to simplify data download procedures through ICPSR Direct, as well as our efforts to meet the demand for access to a variety of data resources through online analysis systems.
| |
| Dale | Access to microdata in the UK: the case of the Samples of Anonymized Records from the 2001 Census |
|
The paper will discuss the process of negotiating the 2001 SARs, the confidentiality issues that have
arisen and the implications these have for research. It will then place this in the context of the broader
changes taking place in the UK and more widely. In particular it will ask whether technological
innovations (e.g. the GRID) increase the risks of disclosure or offer opportunities to provide a virtual
safe setting.
| |
| Severt | Licensed to Bill: Single Fare vs. Bus Pass |
|
This paper will explore some of the issues involved in the licensing of data
products whether they be single-use formats, multi-use CDs, or unlimited online
subscriptions. For librarians, this usually means weighing limited use of an
affordable product against widely distributed use of an expensive product, in
short, buying a single-ride bus fare vs. buying a pass for the whole month.
Other issues have to do with interpretation: is the product licensed for 15
simultaneous users, or 15 specific users? And most problematic of all, how to
accommodate the faculty member who has their own private subscription/limousine
which the rest of the campus wants to use/drive?
| |
| Hamilton & Humphrey | Meeting the Challenge: the National Population Health Survey and Data Access |
|
The Canadian National Population Health Survey (NPHS) began as a longitudinal survey in 1994/95, collecting information every two years from a panel of representative Canadians and their households. In response to needs identified by the National Health Information Council, survey methodologists were challenged with the task of producing cross-sectional and longitudinal estimates from this survey while ensuring the confidentiality of survey respondents.
The first three waves of the NPHS produced both cross-sectional public use microdata files and longitudinal files, the latter restricted to use by researchers approved by Statistics Canada. The NPHS cross-sectional Household public use microdata file was one of the first products to be added to a new channel of scholarly access for Statistics Canada surveys, the Data Liberation Initiative (DLI). The DLI provided licensed access to the public use microdata files from the first three cycles of the NPHS for researchers and students in 66 post-secondary member institutions across Canada. Were the efforts to overcome confidentiality concerns worth the investment of time and energy? This paper will examine the extraordinary research outcomes that resulted from critical decisions relating to access and product dissemination for the NPHS within the context of confidentiality and information access concerns by partners across a multitude of jurisdictions. | |
| Anderson | Managing Data in a Distributed World |
|
The UK is fortunate in having a number of national services that support the
research community by acquiring, managing, preserving and providing access
to important data collections. The Arts and Humanities Data Service (AHDS)
is one such service, supporting arts and humanities research and teaching
communities by accessioning, managing and presenting digital research and
teaching resources. At the same time higher education institutions are
forging ahead by providing on-line services and access to their
institutional collections. Within this creative distributed environment, a
range of complementary skills and expertise are required, tools and
applications are developed and implemented, and best practice and standards
established and agreed upon. One of the key challenges facing those of us
working in this environment is how to reconcile local and national
initiatives and services, and how to ensure that we collaborate rather than
duplicate.
This paper will present the model currently being developed by the Arts and Humanities Data Service. The AHDS is a distributed organisation with a Managing Executive and five subject-focused Service Providers providing services for archaeology, history, literature, languages, linguistics, the performing arts, and the visual arts. As such it is a microcosm of the wider distributed environment. The paper will describe the AHDS working model and discuss how this model might be extended to provide a bridge between the local and the national. | |
| Humphrey | Models of Data Archiving Services: the results of an international survey |
|
Between September 2001 and June 2002, a consultation was conducted on behalf of the National Archives of Canada and the Social Sciences and Humanities Research Council into possible institutional models for a Canadian national data archive. One activity of this investigation consisted of an international survey of existing institutions providing data archiving services. This paper will review the content of this survey, present a summary of results from this survey, and discuss three generalized models arising from these findings.
| |
| Wright | Medical Research Data and Models for Sharing and Preserving Data: The case of the UK Medical Research Council |
|
Over the decades the U.K. Medical Research Council has funded the construction of a large
number of population-based studies. These include several longitudinal studies and a
significant number of cross sectional studies and clinical trials. But until now, the MRC has
not had a formal policy with regard to the archiving and provision for secondary analysis of
these data. To that end, the MRC Data Archiving and Access Project was established in 2001 to
gather information, consult widely, and at the end of the Project, to make recommendations to
Council concerning data archiving and access policy.
The project has been a staged one, with Phase 1 conducting a broad but general survey and convening a Working Group of interested experts. Phase 2 on the one hand tightened the focus by commissioning a series of site visits to collect in-depth information on the conduct of population-based data creation and management, and on the other broadened the focus by convening a "Horizons" workshop which attempted to locate the current inquiry in the broader context of developments in e-science generally. The UKDA played a key role in both phases, first consulting informally with the MRC as they were formulating their broad survey; then sitting on the Data Archiving Working Group which considered the results of the first survey; and finally being hired formally as consultants to help conduct the Phase 2 site visits. This paper will present the Project findings and discuss the proposed policy and models of service provision, looking at issues impacting on them such as consent and confidentiality, promoting a culture of data sharing, infrastructure architectures and costs, and researcher attitudes. | |
| Eaton | The Development of the Electronic Records Archives Program at the U.S. National Archives |
|
The mission of the National Archives and Records Administration is to ensure, for the citizen and the public servant, for the President and for the Congress and the courts, ready access to essential evidence that documents the rights of citizens, the actions of Federal officials, and the national experience. Increasingly, records are created and maintained in electronic formats. The National Archives is responding to the challenge posed by the diversity, complexity, and enormous volume of electronic records being created today and the rapidly changing nature of the systems that are used to create them with the creation of the Electronic Records Archives Program. The Electronic Records Archives (ERA) is envisioned to be a comprehensive, systematic, and dynamic means for preserving any kind of electronic record, free from dependence on any specific hardware or software. ERA, when operational, will make it easy for the National Archives customers to find records they want, and easy for the National Archives to deliver those records in formats suited to customers' needs. This session will discuss the creation of ERA, the current challenges and future plans for this program.
| |
| Grimes | The Emerging Data Web |
|
The Data Web is an emerging network of distributed statistical computing resources encompassing data sets
and analytical servers. It is enabled by new Internet technologies -- by grid-computing toolkits,
Web-services and specialized communications protocols, portals, and visualization tools -- and promoted
by governmental and consortium efforts such as FedStats, the European DataGrid, and the Global Grid
Forum. The diversity of approaches and projects will bring both challenges and opportunities to data
providers, researchers, and the public.
This presentation will review Data Web evolution, touch on technology underpinnings, and focus on notable efforts to provide distributed analytical and statistical power on the Internet. It will cover key development efforts, discuss how to locate and exploit services, and suggest how data and service providers can join. This presentation will conclude by examining research and development directions and trends. | |
| Nelson | Constituent Mail Analysis Project (CMAP) |
|
In 1978, the US Senate began to automate the handling of constituent correspondence. Automated
constituent correspondence system files are well suited for aggregate, quantitative research. The
correspondence management system records provided in electronic form by the Senate Computer Center are an
important access tool, a source of significant information, and the only index to Senatorial constituent
correspondence. Unlike the correspondence itself, they can be easily purged of confidential
information and therefore more quickly opened for research. Perhaps most significantly, the Senate
staffers have already coded demographic and topical information into the computer files, providing a
database that can be readily adapted for use with statistical database software. Through a collaborative
venture between the Digital Archives and Data Center of Emory's Woodruff Library, the Constituent Mail
Analysis Project is building a web access point for the correspondence files of Senator Sam Nunn (Emory
is the official repository of Senator Nunn's papers). The project will segregate out metadata describing
constituent correspondence written in response to significant events (e.g., the Gulf War and Gays in the
Military legislation) and then provide a series of access tools that allow researchers to determine
regional and temporal differences in the opinions expressed.
| |
| Stratford | Digital Library Collections: Fostering Collaboration |
|
The Digital Library Collections Task Force at the University of California, Davis has been asked to make
recommendations for the future development of digital collections for the UC Davis Libraries. The goals
of the UCD Digital Library Collections Program are to:
- Increase the number and range of digital collections and resources available for faculty and students;
- Offer digital library collections that are sustainable, scalable, and compatible with the UCD and California Digital Library's technology infrastructure, and interoperable with national and international digital library collections and initiatives;
- Promote and support the scholarly creation and use of digital content by students and faculty at UCD;
- Collaborate with the CDL and other research libraries in the development of digital library collections, technical infrastructure, and basic user access mechanisms;
- Identify, evaluate, and pursue funding for library digital collections projects. | |
| Becht | Easy access to secondary data for scientific research |
|
The Dutch Scientific Statistical Agency stimulates the secondary use of data for social sciences by
improving the availability and accessibility of these data. A database has been set up that contains
metadata of relevant data following DDI standards (presented at the IASSIST conference of 2001). Since
then, an electronic catalogue has been developed that can extract metadata from several databases using XML.
This catalogue is freely accessible via the Internet and functions as a virtual library. It enables
(potential) users to easily search and quickly screen the available data, ideally from several suppliers
in one search. Furthermore, it provides ways for obtaining more detailed information and an opportunity
to download or order data. At this conference, the design of the electronic catalogue will be presented,
together with a view on its usefulness, its advantages and its pitfalls.
| |
| Oymyr | MADIERA: a European Infrastructure for Web-Based Data Dissemination |
|
MADIERA (Multilingual Access to Data Infrastructures of the European Research Area) is an EU-funded project that started in December 2002. Its main objective is to establish a web portal for social science data based on the DDI and extensions to the existing Nesstar technology. The project will develop tools for multilingual support, logic for identifying comparable datasets, a system for geo-referencing datasets, and options for researchers to add materials relevant to certain datasets, and will thus build a cumulative knowledge base for social science data.
By November 2005 a distributed web portal with access to data from archives all over Europe will be established. Furthermore, the aim is to extend the portal to a broader range of data providers. Partners in the MADIERA consortium are the national social science data archives in Norway, UK, Denmark, Finland, Switzerland, Greece and Germany, plus Nesstar Limited. (Website: www.madiera.net) The presentation will give a full overview of the MADIERA project.
| |
| Schulz, Brockfeld, Kelpin, Parnitzke & Wagner | Clearinghouse for Transport Data and Transport Models - Concept and Implementation |
|
The paper presents both concept and current implementation status of the new "Clearinghouse for Transport
Data and Transport Models", run by the Institute for Transport Research at the German Aerospace Center.
Although transport-related research is highly dependent on reliable data, many relevant empirical
studies, statistical data or modelling approaches are known only to a small number of well-informed
users. To address that problem, the internet-based clearinghouse will facilitate easy access to
metadata as well as datasets and models for a broader public. The information available on the website
includes a wide range of detailed metadata, related material such as scanned questionnaires, code
lists, publication lists, or supplementary hyperlinks. Using XML technology, the documentation of
metadata is based on the DDI Document Type Definition "codebook.dtd". Search for datasets or models is
supported by thematic catalogues and a site-specific search-engine based on a thesaurus. To display both
data and metadata of statistical data the NESSTAR system is used.
| |
| Block, Davis, & Peterson | The future of the Integrated Public Use Microdata Series: IPUMS International and IPUMS Redesign |
|
This presentation will describe two major data integration projects
underway at the Minnesota Population Center. The first is IPUMS
International, a project dedicated to collecting and distributing census
data from around the world. Its goals are to collect and preserve data and
documentation, harmonize data, and disseminate the data free of charge.
Data is currently available for Colombia, France, Kenya, Mexico, the
United States, and Vietnam. Other countries will follow, including a .1%
sample from China. Our second major integration is a redesign of the
Integrated Public Use Microdata Series (IPUMS). This project will create
two large parallel series of historical U.S. census microdata. The first
is a revamped IPUMS that, among other improvements, incorporates Census
2000 and American Community Surveys. The second is a restricted-use
microdata archive containing 1.4 billion records from the censuses of 1940
to 2000. The two series will be developed simultaneously using the same
software, methodology, and documentation. This will enable researchers to
design their analyses with publicly accessible data and limit expensive
time in a Research Data Center. Public-use test datasets will be
developed to mimic the unique aspects of the restricted files, allowing
researchers to test research designs, demonstrate their feasibility, and
minimize research costs.
| |
| Schreven | Providing access to the Dutch population census of 1971 |
|
Although the first Dutch population census was held in 1795 by the occupying French, it wasn't until 1829 that the Dutch picked up on the idea and institutionalised the concept. From then on, there was a decennial census until 1930. The 1940 census was cancelled due to World War II, but soon thereafter the thread was picked up again, resulting in general population censuses in 1947 and 1960 and a housing census in 1956. The late sixties and seventies showed an increasing public concern with the protection of privacy. This led to a limited public ban on the 1971 census; only some .18 percent actually refused to cooperate. The 1981 census, on the other hand, was first postponed and later cancelled because of an average non-response of 26 percent during census trials.
Since 1997 the Netherlands Institute for Scientific Information Services (NIWI) and Statistics Netherlands (CBS) have been working on several projects to digitise the Dutch population censuses. The first results, consisting of two sets of CD-ROMs and a website (www.volkstellingen.nl), were presented in 1999. Through these CD-ROMs and the website, images of the census publications from 1795 to 1971 were presented. In addition, some 10,000 pages of published data were manually converted for the 1899 census; these are available through the CBS Statline system (http://www.cbs.nl/en/statline/index.htm). More recently NIWI and CBS have been cooperating in a project that aims to do the same for all the other censuses. What's more, the individual data of the last two censuses (1960 and 1971) will become available for research as well. My paper will deal with some of the problems encountered while examining the 1971 individual data, as well as the actions taken to ensure that individual citizens cannot be identified within the data. Furthermore, I will present the ongoing project as it currently stands. | |
| Gey | Rescuing Historical Censuses at UC Data |
|
Between 1972 and 1988 the Lawrence Berkeley Laboratory
of the University of California acquired most known population
counts in machine readable form from the 1970 and 1980
decennial censuses at levels of geography down to the
census enumeration district and block group, as well as
other auxiliary files from the Bureau and other sources
such as 1947-1977 consolidated county and city data book
and mortality detail files for 1965-1985 from NCHS. Included
in this data are unique files which don't seem to be found
at ICPSR such as 1960 population by county (1000 items) and
1970 census second count (single years of age down to census
tract level of geography). The data were converted into a
unique compressed format and stored on tapes on a CDC-7600
supercomputer and later on DEC VAX clusters.
Before the last running computer containing this unique database failed in 2000, a complete dump of this data was made by the Census Bureau and sent to UC DATA on DLT tape (34 gigabytes). This presentation will discuss the project we are undertaking to rescue this data by decoding it from ancient tape archiving formats and decompressing the highly compressed data (the final decompressed archive should exceed 100 gigabytes of historical data). | |
| Block & Wozniak | The National Historical Geographic Information System: An Update |
|
The National Historical Geographic Information System is a 5-year
NSF-funded project to create and freely disseminate all available
aggregate census information for the United States between 1790 and 2000,
as well as incorporate these data into a Geographic Information Systems
framework. NHGIS is now nearing the end of its second year of development
and we have made significant progress in data and metadata development, an
online data access system, and the creation of historical boundary files.
This presentation will describe our progress to date, including challenges
and solutions in creating a truly large yet generalizable data access
system based on DDI compliant metadata.
| |
| Thomas | Topic - Time - Geography: Navigating the Triangle of Social Science Data |
|
The National Historical Geographic Information System (NHGIS) project at the Minnesota Population Center (funded by the National Science Foundation) encompasses more than a collection of statistical data, shape files, and metadata. Integral to the successful completion of this project is the development of a search system that allows the user to approach the data from a variety of directions, discover the full range of topics
related to their query, explore the geography, and accurately tie data to the appropriate geography over time. This presentation will describe the approach used by the NHGIS project to solve the problem of linking two centuries of data for a rapidly expanding geographic area, using the information contained in the DDI metadata and inherent within the data files themselves. Built as a stand-alone module, the core of this system can be shared by other data collections providing data for U.S. geographies over time.
| |
| Southall | Great Britain Historical GIS: A new architecture for web dissemination |
|
The GBH GIS was originally developed as a research tool for the historical demography community,
combining a large body of census statistics held in Oracle with digitised boundaries held in ArcInfo.
These two components are available for web download but only separately, via EDINA and the UK History
Data Service, for further analysis. New funding from the UK national lottery requires us to make our
data web accessible as a genuinely national resource to the wider public, especially for local history
studies. Content has been extended to include scanned historic maps and text from 19th century
gazetteers. A new architecture based on Oracle Spatial software and several distinct middleware servers
has been developed to support this. The core database closely links different information about the same area, including
boundary polygons. Locational and thematic maps are generated via two separate web map servers meeting
Open GIS Consortium standards.
| |
| Maynard | Implementation of the DDI at the Roper Center |
|
This paper will report on the results of a Roper Center project aimed at integrating the Data
Documentation Initiative (DDI) specification with its existing meta-data repositories (catalog database
and iPOLL).
Integration of the DDI specification and Center data resources requires review of meta-data database
structure and semantics, evaluation of relationships among meta-data resources, review and selection of
appropriate DDI elements, identification of meta-data deficiencies and mapping of existing fields to the
DDI. While the main purpose of the project is to develop a scheme for integrating various systems into a
DDI-compliant structure, we also plan to generate XML documents for a limited but diverse collection of
studies, in hopes of developing a base of experience by which to evaluate further implementation of the
DDI.
| |
| Miller | 1-2-3, That's How Elementary It's Gotta Be - Managing DDI |
|
This paper will describe the trials and tribulations experienced during the UK Data Archive's conversion
of metadata records held in a Unix Ingres database, which reflected the structure of the CESSDA Standard
Study Description, to a Microsoft SQL Server database geared to the DDI XML standard. It will also describe how the lessons
learned have contributed, and will contribute, to the EU projects Metadater, Madiera and the development of NESSTAR and
DDI version 2.
The main areas covered will be the management issues of input, consistency, dataset series, controlled vocabularies, performance figures and legacy systems. It will also cover Web manipulation of DDI xml, and in particular the new UK Data Archive's catalogue replacement of BIRON. This application will also be the foundation of the resource discovery tool used in the portal for the new Economic and Social Data Service (ESDS), which came into operation January 2003. | |
| Moschner & Watteler | From Metadata Conversion to MetaDater Management |
|
MetaDater (Metadata Management and Production System for surveys in
Empirical Socio-economic Research) is a European Union funded project
started in January 2003. Its main objectives are:
The resulting standards and tools will support technical harmonization and integration of survey data and contribute to best practice in survey data resource sharing. Partners in the MetaDater project are social science data archives in Denmark, Germany, Greece, the Netherlands, Sweden, Switzerland, Norway and the United Kingdom (www.metadater.org). | |
| Altman | Persistent Identifiers for Data |
|
The replicability of quantitative social science research has long been impaired by the fragility and
coarseness of citations to data. Emerging systems for creating persistent identifiers offer the promise
of revolutionary change. In this paper I aim to develop a roadmap for the use of such identifiers with data.
I proceed in three stages. First, I discuss the roles that identifiers play in the citation,
preservation and discovery of data resources, and I derive requirements that any system for identifying data should
fulfill. Second, I introduce the leading frameworks used for persistent identification of intellectual
property and describe their functional characteristics, including assignment, resolution, actionability, scope, and
granularity, and I evaluate each framework with respect to the requirements. Finally, I suggest how
particular systems can be applied to data, and effectively embedded within existing metadata such as DDI
and MARC.
| |
| Anderson, Brent, Slusarz & Branton | From Question to Query: An Intelligent Strategy for Making Complex Data Accessible to Novice Users |
|
For inexperienced users of complex data sets, constructing an acceptable query can be a frustrating task.
They have to find out what kinds of variables are included and learn their specific names along with
the syntax for specifying queries. This paper describes an intelligent system for converting diverse
questions submitted by novice users into the often much narrower range of queries suitable for generating
tables from complex data sets. The paper illustrates the process of conversion from naïve free-form
questions to structured queries. Natural language strategies are used to parse the initial user request
to narrow its possible meanings. Those are then mapped onto the range of possible questions the data can
answer based on a detailed semantic network describing the data. Users are then shown the program's
restatement of what they have asked along with relevant alternative questions ordered by their
likelihood. Those alternatives include both analytically generated queries and example queries having
similar properties. The program acts as an intelligent agent, permitting users to issue broad queries
while delegating the details to the agent. Case-based reasoning guides the user to relevant examples.
Machine learning permits successful queries to be added to the program's expanding knowledge base for
help with future queries. The paper outlines the broad strategies and then illustrates how the system
performs for a sample of user questions. The program is implemented for the PDQ-Explore system for
providing rapid, intelligent access to the IPUMS (Integrated Public Use Microdata Series) dataset.
| |
| Grim | The role of metadata for integrating data and documentation; the OSA Labour Supply Panel, a case study |
|
Metadata are relevant in different areas, such as the definition of standards, summarizing information on
data quality, and modelling of data and processes (Kent & Schuerhoff, 1997). Metadata for panel data are
often limited to comprehensive and well-specified documentation for each wave on what the data are about
and how they have been produced. When metadata is also used for the modelling of data and processes
(Kent et al., 2000), however, this has implications for data management both from a technical (database)
and from a statistical viewpoint. Focussing explicitly on panel data structures, as a starting point for
organizing the metadata, offers new possibilities to integrate data and panel data documentation. These
possibilities also make it easier for end-users to access the disseminated results
of panel data research. This paper describes how the metadata are implemented at the Institute for Labour
Studies for the OSA-Labour Supply Panel. It shows from a data-management and end-user viewpoint how
metadata modelling for panel data can enhance information on data quality. The paper also shows how XML
can be used to integrate data and documentation elements.
| |
| Burnhill | Getting to Know the Score: Using the First 20 Years to Plan the Next |
|
It was twenty years ago today, that IASSIST taught this band to play.
Set up in 1983, our first gig was in Amsterdam, to tell the IASSIST Annual
Conference how the University of Edinburgh came to set up its Data
Library. Having started small, we have grown in numbers, and now deliver
words, pictures and sounds to staff and students across the whole of UK
higher and further education in our role as national data centre. A brief
history of our time may be found at
http://www.ucs.ed.ac.uk/bits/2003/january_2003/ and you can open a window
onto our present at http://edina.ac.uk and
http://datalib.ed.ac.uk. In our
paper, we intend to count the changes and chart the future, hopefully to
go from strength to strength.
| |
| Olsen | Recovering and Preserving Data from a Large Long-term Data Collection Project |
|
The Utah Colleges Exit Poll is a continuing state-wide voter survey which has been conducted every two years from 1982 to 2002. This collaboration between the political science and statistics departments provides students with experience in a wide range of research activities, from sample design and
instrument development to statistical estimation and data analysis. Data was gathered each election day
from 5,000 to 7,000 voters using 5 separate questionnaire forms. Researchers "call" election races and provide analysis in an election night television broadcast. This archiving project is designed to identify, locate, and organize the data, documentation, and other materials associated with the exit poll over 20 years and 11 election cycles, creating an on-line system for data extraction and subsetting, questionnaire and codebook retrieval and searching, and access to publications, video, etc. Advice and hints are provided.
| |
| Lenhardt | Privacy and Confidentiality Issues with Spatial Data |
|
Social scientists are well acquainted with protecting respondent privacy and confidentiality in survey research. Increasingly, data with a spatial component are being used in social science research as GIS technology and spatial analytic techniques take hold among researchers. These types of data may include locations of households, remote sensing images, or maps of clusters of disease incidences. As spatially based data become available in higher resolutions, the possibility of eroding privacy and confidentiality is increased. Similarly, merging spatial data from different data sources, any one of which may not reveal information about individuals, may allow respondents to be identified.
While a number of methods for protecting respondent confidentiality are used for survey research, at this time little work has been done to address these issues for spatial data. In our presentation we will describe the privacy and confidentiality issues generated by spatial data and make some preliminary suggestions of what researchers can do to address these issues. | |
| Linden | FGDC, Meet the DDI: Adding Geospatial Metadata to a Numeric Data Catalog |
|
StatCat, Yale's statistical data finder, was initially designed as a database of selected DDI elements. Now StatCat is being redesigned to include metadata for geospatial data. In the process of writing a crosswalk from the FGDC metadata standard to the DDI elements in StatCat, we examined the compatibilities and incompatibilities between both standards and determined how to reconcile differences for the purpose of data discovery within a database structure. This project also entailed choosing an optimal subset of metadata elements from the two standards, designing a search interface for numeric and/or geospatial data, importing metadata from various sources, and planning for future developments in the delivery and presentation of numeric and geospatial data and metadata.
| |
| van der Steen | Taking down barriers around GIS-data for Dutch universities |
|
Dutch scientists have difficulties accessing Dutch Geographical Information System (GIS) data. The prices asked by governmental institutions and private companies for the use of GIS-data are far too high for universities, even though the need for GIS-data is substantial. Governmental institutions that want to
share their data for scientific use for free are accused of unfair competition. In the Netherlands private
companies have made huge investments to acquire GIS-data and want to protect their investments.
Despite this barrier the Scientific Statistical Agency (WSA) of the Netherlands Organization for Scientific Research (NWO) has succeeded in acquiring GIS-data for Dutch universities. Another problem with Dutch GIS-data is the lack of preservation of data in a central GIS-data archive. A system will be developed to allow controlled access via the Internet. We also hope to start systematically archiving the data we have acquired within a few months. Unfortunately, foreign researchers aren't allowed to use this data. Perhaps in two or three years the government will allow the creation of a public GIS-data infrastructure. | |
| Reed, Blunsdon, McNeil & McEachern | Integrating public domain data to construct community profiles |
|
This paper describes a project that aims to collate and integrate publicly available data from a wide variety of sources, in order to construct community profiles based on economic, social, political and cultural differences. Recently, there has been a growing recognition that individual behaviour needs to be understood in terms of the context in which people live their daily lives, such as the neighbourhood, school, community or region. There is a great deal of data available in the public domain, but these are collected for a wide range of purposes, and with great variation in the units of analysis. Our project integrates data at the level of local government areas in Victoria, Australia. It takes the years 1996 and 2001 as its base, because these are years in which there was both a national census and a national election, and so community level data is available for demographic variables and voting behaviour for these years. From this base, we draw on a variety of sources to include data on businesses, crime and suicide, licensed premises, information about religious institutions, schools, recreational facilities, services (law courts, drug advice and so on), cultural organisations and government and community services.
| |
| Soltes | e-Europe and e-Europe+ Projects |
|
According to the so-called Lisbon strategy, one of the main strategic goals of the European Union for the first decade of the third millennium has been to make the European Union the most competitive and dynamic knowledge-based economy, with improved employment and social cohesion, by 2010. One of the main means of achieving this ambitious goal has been the e-Europe Project, whose main aim is to make fuller and better use of current information and communication technologies so that, by 2005, the united Europe will have modern on-line services, e-government, e-learning services, e-health services, a dynamic e-business environment, etc. As the EU is to be enlarged on 1 May 2004 by another ten new member states from the CEEC, the e-Europe project has been extended to these future member states in the form of the so-called e-Europe+ Project, whose main aim is to prepare these countries for their inclusion in the above challenges of the e-Europe strategy. As this author has been coordinating the e-Europe+ Project for the Slovak Republic, the paper will present some first preliminary results achieved so far within this project, especially as far as the first five selected sectors (telecommunications, education, work/skills/employment, social inclusion and e-government) are concerned.
| |
| A panel discussion chaired by Michael R. Carlson, U.S. National Archives and Records Administration (NARA) | |
| Panelists: Margaret O. Adams, NARA; Melanie F. Wright, UK Data Archive, and Janet Vavra, ICPSR | |
|
The social science data community, as well as academics generally, government officials, journalists, lawyers, and the general public increasingly expect direct online access to "born digital" archival holdings. This session will be devoted to a discussion of the specific experiences of three major data archives that have implemented online access to all or portions of their holdings, focussing on the manner in which offering this form of access has altered (or not) the services they support. Each of the panelists will describe the types of online service offered by her institution and attempt to sort out the realities from the assumptions about offering online access to archival materials. The panelists will also address and encourage audience commentary on: how each institution decided what archival materials and services to offer online; what happens when users have remote direct access to archival materials; whether offering online services has changed internal administrative operations in the archives; whether online services caused a change in the interaction between researchers and archivists; whether the online services have led to more or less use of the archival materials; and finally, whether users seem satisfied with the services offered online.
|
| Brown | The U. S. Decennial Census of Population and Housing from an Archival Perspective |
|
In 1789, the framers of the Constitution of the United States mandated that an "actual Enumeration shall be made within three Years after the first Meeting of the Congress of the United States, and within every subsequent Term of ten Years, in such Manner as they shall be by Law directed." Since then, the United States Government has conducted an actual enumeration each decade. Some of the results of each of those enumerations have come to reside in the National Archives of the United States. This paper will explore the history of the process of determining archival value at the U.S. National Archives and the history of the transfer of the census materials to the archives. Finally, the paper will discuss the history of providing access to those materials to researchers.
| |
| Downey | The Census of Canada from an Archival Perspective |
|
Since 1666, when Intendant Jean Talon enumerated the 3 215 inhabitants of what was then New France, the census has been a wonderful source of information about Canadians. However, is the fact that these documents are a 'wonderful source of information' enough to give them archival value? What is archival value anyway, and why is it important? Shouldn't archives just acquire what researchers want? This paper will address these multi-faceted issues, including the history of census taking in Canada, the process of determining archival value at the National Archives of Canada (or NA), the history of the transfer of the census to NA custody and current census initiatives - including digitization at Statistics Canada and the NA.
| |
| Crowe | From destruction to digitisation: a short history of census records in Ireland |
|
This paper will take a gallop through the different fates of census records from 1821 to 1911, with reasons for those fates, and an outline of current plans to make surviving census records more widely accessible. The non-statistical value of these records - their uses for local, social and family history - will also be explored.
| |
| Corti | New Directions for the UK Qualitative Data Service - 2003 and beyond |
|
The Economic and Social Data Service (ESDS) is a new national data archiving and dissemination service which came into operation in January 2003. The service has initially secured funding for just under five years, and is a jointly-funded initiative sponsored by the Economic and Social Research Council (ESRC) and the Joint Information Systems Committee (JISC). The ESDS has been established as a distributed service, based on a collaboration between four key centres of expertise: the UKDA and the Institute for Social and Economic Research (ISER), both based at the University of Essex; and MIMAS and the Cathie Marsh Centre for Census and Survey Research (CCSR), both located at the University of Manchester.
The specialist service for qualitative data, Qualidata, hosted by the UKDA, will provide access to and support for a range of qualitative datasets. The work builds on Qualidata's expertise and international reputation in this area, developed over the past eight years. A strategy of data enhancements, identified in consultation with key stakeholders, will be developed including:
In addition to providing enhanced access to user-friendly qualitative data, great emphasis is also being placed on user support and training. Training will focus on generalist introductory and more focused workshops on detailed areas of research interest and methodology; and will be supplemented by 'data confrontation' workshops aiming to enhance the methodological and substantive understanding, and secondary analytical potential, of archived qualitative data sources. Finally, the service will provide a programme of support for creators and depositors of qualitative data to ensure that high quality, well-documented and ethically conformant data are acquired and distributed. | |
| Hill | From manuscripts to metadata: collaborative working in the Archives Hub |
|
This paper will describe the growth and the future development of the Archives Hub service, one part of the emerging UK National Archive Network. The service provides free public access to descriptions of over 10,000 archive collections held in more than 60 universities and colleges in the United Kingdom. The archives cover a broad range of subjects, relevant to many areas of research, and date from medieval times to the present day. Collaboration has been a key part of the service at every level, from its funding to its management and its day-to-day operation. The archival descriptions in the Hub are created using international standards by archivists and librarians in the institutions which hold the collections. Currently the descriptions are held centrally, but this paper will describe the progress that has been made on developing a distributed service that will allow institutions to maintain their metadata locally while continuing to allow access to it through the main Archives Hub website at
http://www.archiveshub.ac.uk/.
| |
| Teixeira | Advances in Data Preservation: The Roper Center Archive Approach |
|
During the past several years the Roper Center Archives have undergone a significant shift in emphasis - refocusing effort and resources on upgrading data preservation and access capabilities. A multi-track approach was developed to achieve this objective, concentrating on 1.) improving the cataloged record for each study, 2.) scanning paper documentation to PDF files, 3.) addressing concerns about data confidentiality and responsible use and 4.) converting older data format files (primarily column binary) to more standard formats. Based on this approach, a pilot project on the Roper/Fortune survey data series (75 studies, 1938-1949) was conducted in Spring 2000. The success of this project was encouraging and led to the development of a more aggressive plan to transform the archives. Much more work has been completed using this approach and, while far from complete, the archives stand in a much better place today than in recent history. This paper examines the progress made through these efforts and the challenges encountered, then reports on the current state of the Roper Center Archives and provides a look ahead to future developments.
| |
| Dekker | "To be or not to be", that shouldn't be the question! |
|
Several societal, technological and scientific trends, like
digitalisation and internationalisation of research, urge(d) social
science data archives to adapt to their new environment. Most archives
(especially the successful ones) have indeed changed their mission,
strategy and organisation recently.
In the Netherlands, however, we got trapped in our "polder model" (that is, find a consensus solution for every problem). Consequently, data services are scattered, lack coordination and are hardly able to adapt to the trends above. Moreover, being small and old-fashioned makes it hard to keep up with international developments. Instead of setting up a strategy for the renewal of our social science data services, we are currently discussing whether data archiving should be continued: the Steinmetz Archive will be evaluated on its viability because its holding organisation (NIWI) will probably cease to exist! This threat to the Steinmetz Archive accelerates the discussion of how to adapt the data services to meet the new trends and to remove the current disadvantages. In my presentation I will present some ideas for establishing new data services. Using this blueprint I would like to have a discussion about these ideas and how to bring the Netherlands back into the international data services arena. | |
| Timms-Ferrara | Public Opinion Matters: A New Roper Center Program Designed to Promote Classroom Use of Public Opinion Data |
|
As survey data become more and more a part of our daily lives, there is increased value in training students of all levels and academic disciplines how to locate and use this information with confidence. This paper explores a new program developed at the Roper Center that provides an introduction to the various tools for locating and analyzing data and promotes the use of public opinion information in the classroom. Using a variety of sources, Public Opinion Matters offers different topical modules and presents options for the research process that appeal to both the novice and veteran researcher. Integrated into this process are the following:
Public Opinion Matters brings most or all of these tools together and enables users to easily review a set of available data on a given topic. This discussion will include full definitions and examples of the metadata currently utilized at the Roper Center. Included in this presentation will be a demonstration of iPOLL, to acquaint users with the broad range of coverage available beyond the selection of materials included in Public Opinion Matters. | |
| Gray | Data & Statistical Literacy for Librarians |
|
Librarians have long provided support for statistical information which they could treat as any other print material, with the expectation that the reader would be responsible for how the information was used. Data archivists and data librarians often worked within institutes that included statisticians or survey methodologists who could provide expert advice for researchers undertaking statistical analysis of social or economic data files. As data products were introduced to the library, first on CD-ROM and more recently via online tools, statisticians, data archivists and data librarians were concerned about user support for these primary sources. At the same time, the use of statistical information and its popularity in the mass media has made statisticians aware of the need for statistical literacy for the general public. This paper proposes that there also exists the need for statistical literacy for librarians as well as an increased awareness of issues of how one evaluates data quality, because librarians are becoming the new intermediaries to statistical resources. Recently the National Science Foundation funded a research project aimed at making U.S. government statistics available over the Internet more accessible and understandable by the general public. (See http://www.ils.unc.edu/govstat). One of the principal investigators has been quoted as saying "This project will help people without specialized training use the Internet to find, and understand, the statistical data they need." At the same time, statistical science is moving into more fields with increased diversity and specialization. A beginning might be to have statistical scientists look at coping with data analysis and librarians examine the way they handle data and statistical inquiries.
| |
| Schield | Statistical Literacy Survey: Reading Tables and Graphs |
|
An international survey on reading tables and graphs of rates and percentages has been conducted by the W. M. Keck Statistical Literacy project. The 250+ respondents included statistical educators in the US and internationally, school teachers in the US and in South Africa, professionals at the US Bureau of the Census, college faculty in non-statistical areas, and college students. The primary focus was on the use of informal statistics: rates and percentages. The results of this survey are presented and analyzed.
| |
| Gillman | Is Your Data Facility ISO Compliant?: Progress Towards Harmonizing the DDI and ISO 11179 |
|
ISO 11179 is a formally balloted international standard for the description and registration of data elements. It is being used as a basis for metadata standardization, management and publication to the web in a growing number of large data-producing organizations, including Statistics Canada, the U.S. Bureau of the Census, and the U.S. Bureau of Labor Statistics. The DDI and ISO standards are complementary. The ISO standard would strengthen the administrative side of social science data operations, as well as the ability of archives to capture, describe and manage the concepts and classifications underlying data collections. The DDI enables direct user access, thereby offering increased return on investment (ROI) for data described in ISO-compliant registries. The paper outlines progress towards harmonization resulting from the DAIS|nesstar project. It introduces a data model for the DDI, describes an extension to incorporate ISO 11179, and discusses the implications of ISO compliance for social science data facilities.
| |
| Bradley | DAIS|nesstar - An Update |
|
Health Canada and Nesstar Ltd. have been working to combine the DDMS/DAIS data access system with Nesstar's web-based client server and data publishing technology. Enabled by an extensive set of Canadian DDI-compliant metadata, the first version of DAIS|nesstar is now being used in Health Canada for disseminating core health, socioeconomic and public opinion polling data, along with associated tables, reports and indicators. The presentation provides an overview of the project, products, on-going development work, as well as the current service. Issues of providing a common service orientation across servers in different organizations are highlighted and discussed.
| |
| Stukel | The UNESCO Institute for Statistics: how we decide what data to collect |
|
The UNESCO Institute for Statistics (UIS) was formed in 1999 to support all the statistical activities of UNESCO in its domains of competence: education, science and technology, culture and communications. It serves its 189 Member States through the following main lines of action: the collection and dissemination of cross-nationally comparable data in its fields of competence; the methodological, technical and conceptual development of statistical systems and international classification systems; the analysis and interpretation of the international data; and statistical capacity building within Member States (particularly for developing countries). To fulfill its role as an international organization, it is paramount that the UIS have an effective strategy on what data it should collect and by what mechanism. This talk will discuss the processes the UIS follows to achieve these ends: the consultation efforts with countries to solicit their inputs and discuss possible content and data sources, as well as the workshops and meetings held to launch new survey instruments and to encourage response to our questionnaires.
| |
| Buffett | Serving up Statistics to an International Community |
|
One challenge in disseminating information of any form is determining who the target audience is, what they are looking for, and how best to deliver it. In trying to answer these questions, it is important to be able to measure whether the dissemination activity is successful and is meeting the needs of both the organization and the user community.
Is it most important for the technology to be implemented so that any country in the world, regardless of its place on the technology adoption curve, can reference the statistics in a timely manner? Is it most important to implement the latest technology to provide a seamless integration of online databases so that data from multiple international organizations can be accessed simultaneously? The tradeoffs are endless ... but so is the potential. | |
| Zhang | Comparability of cross-national data: How does the UNESCO Institute for Statistics approach the issue? |
|
Cross-national data provide the global/regional picture for advocacy, for resource mobilization and for the accountability of governments. Based on such data, countries can compare themselves against others to learn from one another and to benchmark important policy goals. There are two broad ways in which the UIS uses cross-national data for comparative purposes. The first is to conduct secondary analysis of a variety of data sets, which include country-level data that the UIS collects annually, and student- and household-level microdata that are collected by other organizations. The second is to develop a core set of indicators that can raise the profile of important issues. Comparisons among countries can be problematic when data are spurious or their reliability is suspect. There are incentives for data distortion when too much emphasis is put on comparisons. We make every effort to ensure data comparability. This includes involving countries in interpreting data results and determining indicators. It also includes contextualising data findings.
| |