IASSIST 2003 Program

A Leadership Role of Data Archives in the Social Sciences

Omerzu Relationships among archives and the social science research community - The case of successful relationship
Six years of existence have given ADP an important insight. Researchers use archives only (if at all) for storing the raw data once they are done with the fieldwork. Often not even then: archives must ask for the data when the first results are published. Past experience shows that cooperation between researchers and archives before, during, and after the design of a new study is simplified by constant two-way communication, and that the desired results are achieved. Archives provide data from existing related studies, from both domestic and international sources, which improves the prospects for comparison in advance. Besides providing raw data, archives gather metadata and other important information, which serves as an additional reference in designing the study. Researchers often want to replicate past studies, but they find themselves stranded in the data graveyard. Our duty is therefore to seek out all available information from which codebooks can be produced. If we encourage research groups to keep up everyday communication while still in the field, constant and close cooperation normally yields high-quality codebooks. The successful cooperation between ADP (the Slovenian social data archive) and the research project RIS (Research on Internet in Slovenia) will be presented.

van Gelder The Dutch Question Bank
Steinmetz Archive has two current projects concerning databases of questions. The first is a back office system built for the Social and Cultural Planning Office in the Netherlands for their longitudinal study "Cultural Changes in the Netherlands". In each wave of this survey, a number of questions are taken from the same pool of questions. The other project is the Dutch Question Bank. In this project an internet interface is being built in which researchers can search a database of questions from major studies in the Netherlands. The aims of this project are:
- to make available the question wordings and frequencies
- to facilitate comparison between studies.

Advancing Research and Data Literacy: empowering users

Corti Exploiting UK survey data sources for teaching political science: experiences from the classroom
This paper reports on the progress of a small-scale project that aims to show how real-life data resources can be exploited in the classroom. A resource will be created that aims to transfer key skills (at higher and post-16 educational level) relating to the understanding and appreciation of statistical data and analysis as they relate to substantive issues.

In view of the recognition in the UK that the shortage of quantitative analysis skills is now critical, introducing the concepts that enable work with real-life data sources early on in post-16 education is one way to redress this shortage. Not only does practical knowledge about survey methods and secondary analysis teach students how research is actually conducted, it also provides them with a tangible and marketable skill that they can use in future employment.

Through a number of strategic investments both by the JISC and the ESRC, the UK academic community has access to a unique and expansive range of digital data resources. Whilst individual datasets are used extensively in academic research they are significantly under-used in learning and teaching programmes within Higher Education, and rarely used in Further Education. As a national data provider the UK Data Archive is in a strong position to offer its resources to the learning and teaching communities for developing more 'packaged' resources. Fundamental to understanding how best to re-purpose and apply the content is the need to seek advice and input from instructors in the classroom. This project will draw together a small team of academics, teachers and data archivists/disseminators in order to create and pilot such new resources.

The small-scale set of online learning objects and teaching materials being developed is aimed at political science students and is based upon British Election Study data, which provide a unique source of information for students wishing to explore a range of contemporary political issues in Britain. It is hoped that this pilot study will improve knowledge of how discrete 'chunks' of learning materials can be preserved and disseminated in a wider national context, so that other teachers can exploit them in a flexible manner.


Shrimplin Focusing in on Student Learning Outcomes: How SDA Helped Us Build a Learning Community for Data Users
Learning and its assessment have become a focus of attention at many universities and colleges. In response to this new environment, academic libraries are struggling to demonstrate the ways in which they contribute to higher education. To meet this challenge, libraries have begun to expand their responsibilities and to take a more active role in the learning process.

This presentation will focus on how we have collaborated with faculty at Miami University to create a shared view on learning outcomes and on how that shared view involves using SDA -- a set of programs for Survey Documentation and Analysis -- as a tool to develop and incorporate web-based data analysis for both undergraduate and graduate courses. Special attention will be given to three ongoing projects, all of which seek to make their courses more analytical while promoting active learning for students.


Balancing the Strength of Numbers with Confidentiality

Neidert & Eisenhauer Public Data and a Thriving Democracy: Threats and Opportunities
The Association of Public Data Users (APDU) provides advocacy on the content of and access to public data. APDU corresponds with administrators and officials directly involved with the collection and dissemination of public data on behalf of its members. The Taskforce on Confidentiality is charged with educating public data users on the related issues of confidentiality and access. This includes the legislative and statutory environment as well as technical issues related to the appropriate use of aggregate and local area statistics and public microdata files. The Taskforce is also charged with communicating the needs of public data users to those government agencies that collect and disseminate data to the public.

For purposes of user education, the Taskforce has developed a series of White Papers describing relevant statutes and pending legislation (e.g., FOIA, Privacy Act, Patriot Act, E-Commerce), and a Primer on Data Confidentiality containing a non-technical explanation of the confidentiality/access debate and of statistical and technical issues related to disclosure limitation. Unlike other publications on these issues, the primer is designed to be accessible to the traditional data librarian or archivist or to the data user with limited formal training in statistics. In addition to meeting the objective of educating public data users about technical and legal issues surrounding the confidentiality/access debate, these materials serve as a jumping off point for the Taskforce as it seeks to communicate the needs of public data users (in terms of data content, data quality issues associated with confidentiality edits, and access conditions) to data producers in a constructive way. The Taskforce is a vehicle for public data users to inform the legislative process and policy debate concerning privacy, confidentiality, and data access.

For purposes of this presentation, the Taskforce's objective is to communicate users' concerns about continued and improved access to high quality survey and administrative data at the aggregate, local area, and microdata levels. In the paper, the Taskforce proposes technical, non-technical, and legislative reforms that can protect public access to data generated by public agencies while maintaining respondent confidentiality and expanding public awareness of the relevant issues; and describes alternative mechanisms by which those reforms might be enacted.


Temilola Research Data and Issues of Confidentiality
This paper focuses on issues around protecting information about human subjects and related data sent via the internet. It considers the three concepts necessary to any discussion about data security in any given social milieu: privacy, confidentiality and consent. Worldwide access through the internet raises many questions, including who owns digital information, who has the right to profit from others' work, and who has responsibility for guaranteeing or regulating access to valuable information. Researchers in the social and behavioral sciences are expected to be proactive in designing and performing research to ensure that the dignity and privacy of individuals are preserved. The purpose of this paper is to argue that confidentiality issues need to be recognised and considered at every stage of the research process, including the initial study design; identification, recruitment and consent processes for the study population; security, analysis and final disposition of data; and publication or dissemination of data and results.

Changes in the Way Data Archives Process Data

Fink and Hansen Data Processing in Danish Data Archives
The Danish Data Archives (DDA) was established in 1973 as a national data bank for quantitative research carried out primarily in the social sciences but also in medical science and history. In 1993 the DDA became an independent unit in the Danish State Archives. Due to severe cutbacks in staff in 2002 the archive has at present 13 full-time employees. The DDA collects, preserves and disseminates machine-readable research data. When the data archive receives data material, the data and documentation are converted to an archival format, which secures technical preservation for the future. The data materials are then processed according to a priority list based on frequency of usage. Data processing involves standardising and checking the material as well as creating correct linkage between data and documentation. This part of the data processing ensures that the documentation of a data set is preserved as completely as possible and that the data stay comprehensible for future usage. The DDA uses the DDI standard for creating data documentation. The paper will give a detailed description of the data processing procedures employed by the DDA, developed step by step since the archive was established, often in cooperation with the data archive network the DDA is part of. Compared to those of several fellow archives, these procedures are quite extensive, and they are the most resource-demanding activity in the DDA. The data processing performed by the DDA means that the data are perfectly suited for elaborate search engines such as NESSTAR and that assistance to users after they have received material is unnecessary. The output of processing a specific data material is a data documentation protocol (DDP) consisting of a study description and a codebook. As some data archives have done already, the DDA wants to make DDPs available to users on the Internet. At the moment we are considering how to publish these documents as well-suited tools for research and educational purposes.
The paper will comment on our considerations in this regard. Since 1993 the DDA has been allowed to preserve data with sensitive content and personal identifiers, which primarily involves storage of unique personal identification numbers (CPR). To make this possible, the DDA had to implement new security standards and changes in both hardware and data processing procedures. A presentation of the challenges in archiving and processing this kind of data will be included in the paper.

Kleemola and Keckman-Koivuniemi Data Processing in FSD: challenges in a new archive
The Finnish Social Science Data Archive (FSD) is a national resource centre for social science research and teaching. It started to operate in 1999 as a separate unit of the University of Tampere. Its primary goal is to increase the use of existing machine-readable social science research data in Finland. FSD is funded by the Ministry of Education.

At the moment FSD processes all studies intensively, which is very time-consuming. Data materials are checked with special attention to, for example, anonymization and filter variables. Data materials are transferred to the databases at the variable level. FSD uses SPSS for data processing and preserves data in SPSS portable format. The archive uses the DDI standard for creating data documentation. Study descriptions and codebooks are available on the Internet.

The paper will describe current FSD data processing procedures in detail and take a look at the future. The paper will also include a list of challenges that our recently founded archive has faced.


Crockett Data Processing in the UK Data Archive
In recent years, data archives have devoted more attention to metadata than data. Yet, for all the apparent increase in ease of converting data between formats (via menu driven import and export filters), accurate translation of data between data formats is arguably more difficult than before. This is because software is increasingly designed to provide a view of the data that is divorced from the software's internal (i.e. underlying) representation of the data and its "internal metadata" (variable descriptions, code labels, variable formats, missing values, etc.). Further, this internal metadata has increased in volume and complexity, leaving delimited text as only a partial representation of the full data file, with no de facto standard for storing data and internal metadata in one file (though SPSS portable format comes closest). This situation makes error free data format conversion a critical building block for the dual purposes of most data archives: data sharing and long-term preservation. The major problems that affect data format conversions are:

  • Rounding/truncation of numeric data
  • Truncation of textual data
  • Differences in handling internal metadata (differential label lengths, missing value handling, etc.)
  • Corruption of "specially" formatted variables (especially date/time variables)
  • Embedded special characters
For example, the SPSS "print format" is often used to perform data typing, yet it seldom matches the actual data, which can lead to catastrophic coarsening of data upon conversion. Similarly, one of the ill-documented features of MS Access is that the export precision can be controlled by the number of decimal places in the Regional Options of the Windows Control Panel. Lastly, as an example of the fifth point above, embedded characters are an issue in MS Access, wherein fields may contain characters like "tabs" or "carriage returns". Unless these characters are stripped out prior to conversion to delimited text, the data will lose its rectangular structure.

The paper will illustrate the UK Data Archive's recent solutions to these problems. These centre on the development of Visual Basic scripts which automate, standardize, and remove known sources of error when performing common conversions - typically from SPSS to STATA and tab-delimited text (with customized data dictionary in rtf format) - as well as code to remove undesirable embedded characters from MS Access databases. These tools also allow automation of file and directory naming, removing many of the sources of human error to which repetitive and mundane conversion tasks are prone, leaving data processors more time to check and validate the actual data.
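
The embedded-character problem in this abstract can be illustrated with a minimal sketch. It is written in Python rather than the Visual Basic the abstract mentions, and it is not the UK Data Archive's code: the function names `clean_field` and `to_tab_delimited` and the sample rows are hypothetical, chosen only to show why tabs and line breaks must be removed before export to delimited text.

```python
import re

# Characters that would break a one-record-per-line, tab-delimited layout.
# (Assumption: tabs, carriage returns and line feeds are the only offenders.)
_EMBEDDED = re.compile(r"[\t\r\n]+")


def clean_field(value):
    """Replace runs of embedded tabs and line breaks in text with one space."""
    if isinstance(value, str):
        return _EMBEDDED.sub(" ", value).strip()
    return value  # numbers and other non-text values pass through unchanged


def to_tab_delimited(rows):
    """Return rows as tab-delimited text, cleaning every field first."""
    lines = ["\t".join(str(clean_field(v)) for v in row) for row in rows]
    return "\n".join(lines) + "\n"


# Hypothetical records with a line break and a tab hidden inside text fields.
rows = [
    ("id", "comment"),
    (1, "line one\r\nline two"),
    (2, "a\tb"),
]
text = to_tab_delimited(rows)
```

Without the cleaning step, the second record would spill onto two lines and the third would gain an extra column, so the file would no longer be rectangular. With it, every output line has the same number of fields.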


Donakowski Data Processing at ICPSR
Established in 1962, the Inter-university Consortium for Political and Social Research (ICPSR) maintains and provides access to a vast archive of social science data for research and instruction. To ensure that data resources are available to future generations of scholars, ICPSR preserves data, migrating them to new storage media as changes in technology warrant. This paper will address the ICPSR experience in preserving and processing data. Topics will include past and current procedures for archiving and distributing the data, as well as a discussion on how the required level of processing is determined. Challenges, such as those that stem from the tension between the research community's desire for immediate access and the amount and cost of processing, will also be presented. This paper will also address attempts to simplify data download procedures through ICPSR Direct, as well as our efforts to meet the demand for access to a variety of data resources through online analysis systems.

Constraints on Data Access: licenses and confidentiality

Dale Access to microdata in the UK: the case of the Samples of Anonymized Records from the 2001 Census
The paper will discuss the process of negotiating the 2001 SARs, the confidentiality issues that have arisen and the implications these have for research. It will then place this in the context of the broader changes taking place in the UK and more widely. In particular it will ask whether technological innovations (e.g. the GRID) increase the risks of disclosure or offer opportunities to provide a virtual safe setting.

Severt Licensed to Bill: Single Fare vs. Bus Pass
This paper will explore some of the issues involved in the licensing of data products whether they be single-use formats, multi-use CDs, or unlimited online subscriptions. For librarians, this usually means weighing limited use of an affordable product against widely distributed use of an expensive product, in short, buying a single-ride bus fare vs. buying a pass for the whole month. Other issues have to do with interpretation: is the product licensed for 15 simultaneous users, or 15 specific users? And most problematic of all, how to accommodate the faculty member who has their own private subscription/limousine which the rest of the campus wants to use/drive?

Hamilton & Humphrey Meeting the Challenge: the National Population Health Survey and Data Access
The Canadian National Population Health Survey (NPHS) began as a longitudinal survey in 1994/95, collecting information every two years from a panel of representative Canadians and their households. In response to needs identified by the National Health Information Council, survey methodologists were challenged with the task of producing cross-sectional and longitudinal estimates from this survey while ensuring the confidentiality of survey respondents.

The first three waves of the NPHS produced both cross-sectional public use microdata files (PUMFs) and longitudinal files, the latter restricted to use by Statistics Canada approved researchers. The NPHS cross-sectional Household public use microdata file was one of the first products to be added to a new channel of scholarly access for Statistics Canada surveys, the Data Liberation Initiative (DLI). The DLI provided licensed access to the first three cycles of the NPHS PUMFs for researchers and students in 66 post-secondary member institutions across Canada. Were the efforts to overcome confidentiality concerns worth the investment of time and energy? This paper will examine the extraordinary research outcomes that resulted from critical decisions relating to access and product dissemination for the NPHS within the context of confidentiality and information access concerns by partners across a multitude of jurisdictions.


Data Archive Models and Infrastructure: international comparisons

Anderson Managing Data in a Distributed World
The UK is fortunate in having a number of national services that support the research community by acquiring, managing, preserving and providing access to important data collections. The Arts and Humanities Data Service (AHDS) is one such service, supporting arts and humanities research and teaching communities by accessioning, managing and presenting digital research and teaching resources. At the same time higher education institutions are forging ahead by providing on-line services and access to their institutional collections. Within this creative distributed environment, a range of complementary skills and expertise are required, tools and applications are developed and implemented, and best practice and standards established and agreed upon. One of the key challenges facing those of us working in this environment is how to reconcile local and national initiatives and services, and how to ensure that we collaborate rather than duplicate.

This paper will present the model currently being developed by the Arts and Humanities Data Service. The AHDS is a distributed organisation with a Managing Executive and five subject-focused Service Providers providing services for archaeology, history, literature, languages, linguistics, the performing arts, and the visual arts. As such it is a microcosm of the wider distributed environment. The paper will describe the AHDS working model and discuss how this model might be extended to provide a bridge between the local and the national.


Humphrey Models of Data Archiving Services: the results of an international survey
Between September 2001 and June 2002, a consultation was conducted on behalf of the National Archives of Canada and the Social Sciences and Humanities Research Council into possible institutional models for a Canadian national data archive. One activity of this investigation consisted of an international survey of existing institutions providing data archiving services. This paper will review the content of this survey, present a summary of results from this survey, and discuss three generalized models arising from these findings.

Wright Medical Research Data and Models for Sharing and Preserving Data: The case of the UK Medical Research Council
Over the decades the U.K. Medical Research Council has funded the construction of a large number of population-based studies. These include several longitudinal studies and a significant number of cross sectional studies and clinical trials. But until now, the MRC has not had a formal policy with regard to the archiving and provision for secondary analysis of these data. To that end, the MRC Data Archiving and Access Project was established in 2001 to gather information, consult widely, and at the end of the Project, to make recommendations to Council concerning data archiving and access policy.

The project has been a staged one, with Phase 1 conducting a broad but general survey and convening a Working Group of interested experts. Phase 2 on the one hand tightened the focus by commissioning a series of site visits to collect in-depth information on the conduct of population-based data creation and management; and on the other broadened the focus by convening a "Horizons" workshop which attempted to locate the current inquiry in the broader context of developments in e-science generally.

The UKDA played a key role in both phases, first consulting informally with the MRC as they were formulating their broad survey; then sitting on the Data Archiving Working Group which considered the results of the first survey; and finally being hired formally as consultants to help conduct the Phase 2 site visits.

This paper will present the Project findings and discuss the proposed policy and models of service provision, looking at issues impacting on them such as consent and confidentiality, promoting a culture of data sharing, infrastructure architectures and costs, and researcher attitudes.


Data, Digitization, Electronic Archives and the Web: at a crossroads

Eaton The Development of the Electronic Records Archives Program at the U.S. National Archives
The mission of the National Archives and Records Administration is to ensure, for the citizen and the public servant, for the President and for the Congress and the courts, ready access to essential evidence that documents the rights of citizens, the actions of Federal officials, and the national experience. Increasingly, records are created and maintained in electronic formats. The National Archives is responding to the challenge posed by the diversity, complexity, and enormous volume of electronic records being created today and the rapidly changing nature of the systems that are used to create them with the creation of the Electronic Records Archives Program. The Electronic Records Archives (ERA) is envisioned to be a comprehensive, systematic, and dynamic means for preserving any kind of electronic record, free from dependence on any specific hardware or software. ERA, when operational, will make it easy for the National Archives customers to find records they want, and easy for the National Archives to deliver those records in formats suited to customers' needs. This session will discuss the creation of ERA, the current challenges and future plans for this program.

Grimes The Emerging Data Web
The Data Web is an emerging network of distributed statistical computing resources encompassing data sets and analytical servers. It is enabled by new Internet technologies -- by grid-computing toolkits, Web-services and specialized communications protocols, portals, and visualization tools -- and promoted by governmental and consortium efforts such as FedStats, the European DataGrid, and the Global Grid Forum. The diversity of approaches and projects will bring both challenges and opportunities to data providers, researchers, and the public.

This presentation will review Data Web evolution, touch on technology underpinnings, and focus on notable efforts to provide distributed analytical and statistical power on the Internet. It will cover key development efforts, discuss how to locate and exploit services, and suggest how data and service providers can join. This presentation will conclude by examining research and development directions and trends.


Nelson Constituent Mail Analysis Project (CMAP)
In 1978, the US Senate began to automate the handling of constituent correspondence. Automated constituent correspondence system files are well suited for aggregate, quantitative research. The correspondence management system records provided in electronic form by the Senate Computer Center are an important access tool, a source of significant information, and the only index to Senatorial constituent correspondence. Unlike the correspondence itself, they can be easily purged of confidential information and therefore more quickly opened for research. Perhaps most significantly, the Senate staffers have already coded demographic and topical information into the computer files, providing a database that can be readily adapted for use with statistical database software. Through a collaborative venture between the Digital Archives and Data Center of Emory's Woodruff Library, the Constituent Mail Analysis Project is building a web access point for the correspondence files of Senator Sam Nunn (Emory is the official repository of Senator Nunn's papers). The project will segregate out metadata describing constituent correspondence written in response to significant events (e.g., the Gulf War and Gays in the Military legislation) and then provide a series of access tools that allow researchers to determine regional and temporal differences in the opinions expressed.

Stratford Digital Library Collections: Fostering Collaboration
The Digital Library Collections Task Force at the University of California, Davis has been asked to make recommendations for the future development of digital collections for the UC Davis Libraries. The goals of the UCD Digital Library Collections Program are to:

- Increase the number and range of digital collections and resources available for faculty and students;

- Offer digital library collections that are sustainable, scaleable, and compatible with the UCD and California Digital Library's technology infrastructure, and interoperable with national and international digital library collections and initiatives;

- Promote and support the scholarly creation and use of digital content by students and faculty at UCD;

- Collaborate with the CDL and other research libraries in the development of digital library collections, technical infrastructure, and basic user access mechanisms;

- And identify, evaluate, and pursue funding for library digital collections projects.


Empowered by Metadata: building systems of access

Becht Easy access to secondary data for scientific research
The Dutch Scientific Statistical Agency stimulates the secondary use of data for social sciences by improving the availability and accessibility of these data. A database has been set up that contains metadata of relevant data following DDI standards (presented at the IASSIST conference of 2001). Since then, an electronic catalogue has been developed that can extract metadata from several databases using XML. This catalogue is freely accessible via the Internet and functions as a virtual library. It enables (potential) users to easily search and quickly screen the available data, ideally from several suppliers in one search. Furthermore, it provides ways of obtaining more detailed information and an opportunity to download or order data. At this conference, the design of the electronic catalogue will be presented, together with a view on its usefulness, its advantages and its pitfalls.

Oymyr MADIERA: a European Infrastructure for Web-Based Data Dissemination
MADIERA (Multilingual Access to Data Infrastructures of the European Research Area) is an EU-funded project that started in December 2002. Its main objective is to establish a web portal for social science data based on the DDI and extensions to the existing Nesstar technology. The project will develop tools for multilingual support, logics for identifying comparable datasets, a system for geo-referencing of datasets, and options for researchers to add materials relevant to certain datasets, and will thus build a cumulative knowledge base for social science data.

By November 2005 a distributed web portal with access to data from archives all over Europe will be established. Furthermore, the aim is to extend the portal to a broader range of data providers. Partners in the MADIERA consortium are the national social science data archives in Norway, UK, Denmark, Finland, Switzerland, Greece and Germany, plus Nesstar Limited. (Website: www.madiera.net)

The presentation will give a full overview of the MADIERA project.


Schulz, Brockfeld, Kelpin, Parnitzke & Wagner Clearinghouse for Transport Data and Transport Models - Concept and Implementation
The paper presents both the concept and the current implementation status of the new "Clearinghouse for Transport Data and Transport Models", run by the Institute for Transport Research at the German Aerospace Center. Although transport-related research is highly dependent on reliable data, many relevant empirical studies, statistical data and modelling approaches are known only to a small number of well-informed users. To address that problem, the internet-based clearinghouse will facilitate easy access to metadata as well as datasets and models for a broader public. The information available on the website includes a wide range of detailed metadata and related material such as scanned questionnaires, code lists, publication lists, and supplementary hyperlinks. Using XML technology, the documentation of metadata is based on the DDI Document Type Definition "codebook.dtd". Search for datasets or models is supported by thematic catalogues and a site-specific search engine based on a thesaurus. The NESSTAR system is used to display both data and metadata of statistical data.

Historical Censuses: numbers from the past

Block, Davis, & Peterson The future of the Integrated Public Use Microdata Series: IPUMS International and IPUMS Redesign
This presentation will describe two major data integration projects underway at the Minnesota Population Center. The first is IPUMS International, a project dedicated to collecting and distributing census data from around the world. Its goals are to collect and preserve data and documentation, harmonize data, and disseminate the data free of charge. Data is currently available for Colombia, France, Kenya, Mexico, the United States, and Vietnam. Other countries will follow, including a 0.1% sample from China. Our second major integration project is a redesign of the Integrated Public Use Microdata Series (IPUMS). This project will create two large parallel series of historical U.S. census microdata. The first is a revamped IPUMS that, among other improvements, incorporates Census 2000 and American Community Surveys. The second is a restricted-use microdata archive containing 1.4 billion records from the censuses of 1940 to 2000. The two series will be developed simultaneously using the same software, methodology, and documentation. This will enable researchers to design their analyses with publicly accessible data and limit expensive time in a Research Data Center. Public-use test datasets will be developed to mimic the unique aspects of the restricted files, allowing researchers to test research designs, demonstrate their feasibility, and minimize research costs.

Schreven Providing access to the Dutch population census of 1971
Although the first Dutch population census was held in 1795 by the occupying French, it wasn't until 1829 that the Dutch picked up on the idea and institutionalised the concept. From then on, there was a decennial census until 1930. The 1940 census was cancelled due to World War II, but soon thereafter the thread was picked up again, resulting in general population censuses in 1947 and 1960 and a housing census in 1956. The late sixties and seventies showed an increasing public concern with the protection of privacy. This led to a limited public ban on the 1971 census; only some 0.18 percent actually refused to cooperate. The 1981 census, on the other hand, was first postponed and later cancelled because of an average non-response of 26 percent during census trials.

Since 1997 the Netherlands Institute for Scientific Information Services (NIWI) and Statistics Netherlands (CBS) have been working on several projects to digitise the Dutch population censuses. The first results, consisting of two sets of CD-ROMs and a website (www.volkstellingen.nl), were presented in 1999. These CD-ROMs and the website presented images of the census publications from 1795 to 1971. In addition, some 10,000 pages of published data were manually converted for the 1899 census; these are available through the CBS Statline system (http://www.cbs.nl/en/statline/index.htm). More recently, NIWI and CBS have been cooperating in a project that aims to do the same for all the other censuses. What's more, the individual data of the last two censuses (1960 and 1971) will become available for research as well.

My paper will deal with some of the problems encountered while examining the 1971 individual data, as well as the actions taken to ensure that individual citizens cannot be identified within the data. Furthermore, I will present the current state of the ongoing project.


Gey Rescuing Historical Censuses at UC Data
Between 1972 and 1988 the Lawrence Berkeley Laboratory of the University of California acquired most known population counts in machine-readable form from the 1970 and 1980 decennial censuses at levels of geography down to the census enumeration district and block group, as well as other auxiliary files from the Bureau and other sources, such as the 1947-1977 consolidated county and city data book and mortality detail files for 1965-1985 from NCHS. Included in this data are unique files which don't seem to be found at ICPSR, such as 1960 population by county (1000 items) and 1970 census second count (single years of age down to census tract level of geography). The data were converted into a unique compressed format and stored on tapes on a CDC-7600 supercomputer and later on DEC VAX clusters.

Before the last running computer containing this unique database failed in 2000, a complete dump of this data was made by the Census Bureau and sent to UC DATA on DLT tape (34 gigabytes). This presentation will discuss the project we are undertaking to rescue this data by decoding it from ancient tape archiving formats and decompressing the highly compressed data (the final decompressed archive should exceed 100 gigabytes of historical data).


Historical Spatial Data: enriching GIS

Block & Wozniak The National Historical Geographic Information System: An Update
The National Historical Geographic Information System is a 5-year NSF-funded project to create and freely disseminate all available aggregate census information for the United States between 1790 and 2000, as well as incorporate these data into a Geographic Information Systems framework. NHGIS is now nearing the end of its second year of development and we have made significant progress in data and metadata development, an online data access system, and the creation of historical boundary files. This presentation will describe our progress to date, including challenges and solutions in creating a truly large yet generalizable data access system based on DDI compliant metadata.

Thomas Topic - Time - Geography: Navigating the Triangle of Social Science Data
The National Historical Geographic Information System (NHGIS) project at the Minnesota Population Center (funded by the National Science Foundation) encompasses more than a collection of statistical data, shape files, and metadata. Integral to the successful completion of this project is the development of a search system that allows the user to approach the data from a variety of directions, discover the full range of topics related to their query, explore the geography, and accurately tie data to the appropriate geography over time. This presentation will describe the approach used by the NHGIS project to solve the problem of linking two centuries of data for a rapidly expanding geographic area, using the information contained in the DDI metadata and inherent within the data files themselves. Built as a stand-alone module, the core of this system can be shared by other data collections providing data for U.S. geographies over time.

Southall Great Britain Historical GIS: A new architecture for web dissemination
The GBH GIS was originally developed as a research tool for the historical demography community, combining a large body of census statistics held in Oracle with digitised boundaries held in ArcInfo. These two components are available for web download but only separately, via EDINA and the UK History Data Service, for further analysis. New funding from the UK national lottery requires us to make our data web accessible as a genuinely national resource to the wider public, especially for local history studies. Content has been extended to include scanned historic maps and text from 19th century gazetteers. A new architecture based on Oracle Spatial software and several distinct middleware servers has been developed to support this. The core database closely links different information about the same area, including boundary polygons. Locational and thematic maps are generated via two separate web map servers meeting Open GIS Consortium standards.

Metadata Conversions: migration experiences

Maynard Implementation of the DDI at the Roper Center
This paper will report on the results of a Roper Center project aimed at integrating the Data Documentation Initiative (DDI) specification with its existing meta-data repositories (catalog database and iPOLL). Integration of the DDI specification and Center data resources requires review of meta-data database structure and semantics, evaluation of relationships among meta-data resources, review and selection of appropriate DDI elements, identification of meta-data deficiencies and mapping of existing fields to the DDI. While the main purpose of the project is to develop a scheme for integrating various systems into a DDI compliant structure, we also plan to generate XML documents for a limited but diverse collection of studies, in hopes of developing a base of experience by which to evaluate further implementation of the DDI.

Miller 1-2-3, That's How Elementary It's Gotta Be - Managing DDI
This paper will describe the trials and tribulations experienced during the UK Data Archive's conversion of metadata records held in a Unix Ingres database, which reflected the structure of the CESSDA Standard Study Description, to a Microsoft SQL Server database geared up to the DDI XML standard. It will also consider how the lessons learned have contributed, and will contribute, to the EU projects Metadater and Madiera and to the development of NESSTAR and DDI version 2.

The main areas covered will be the management issues of input, consistency, dataset series, controlled vocabularies, performance figures and legacy systems. It will also cover Web manipulation of DDI XML, and in particular the UK Data Archive's new catalogue, which replaces BIRON. This application will also be the foundation of the resource discovery tool used in the portal for the new Economic and Social Data Service (ESDS), which came into operation in January 2003.


Moschner & Watteler From Metadata Conversion to MetaDater Management
MetaDater (Metadata Management and Production System for surveys in Empirical Socio-economic Research) is a European Union funded project started in January 2003. Its main objectives are:

  • developing standards to describe large scale comparative surveys over space and time;
  • developing a comprehensive data model and tools for survey metadata creation and management.

    The resulting standards and tools will support technical harmonization and integration of survey data and contribute to best practice in survey data resource sharing. Partners in the MetaDater project are social science data archives in Denmark, Germany, Greece, the Netherlands, Sweden, Switzerland, Norway and the United Kingdom (www.metadater.org).


    Metadata Power Tools: persistent identifiers, intelligent agents, and metadata modeling

    Altman Persistent Identifiers for Data
    The replicability of quantitative social science research has long been impaired by the fragility and coarseness of citations to data. Emerging systems for creating persistent identifiers offer the promise of revolutionary change. In this paper I aim to develop a roadmap for the use of such identifiers with data. I proceed in three stages. First, I discuss the roles that identifiers play in the citation, preservation and discovery of data resources, and I derive requirements that any system for identifying data should fulfill. Second, I introduce the leading frameworks used for persistent identification of intellectual property, describe their functional characteristics (including assignment, resolution, actionability, scope, and granularity), and evaluate each framework with respect to the requirements. Finally, I suggest how particular systems can be applied to data and effectively embedded within existing metadata such as DDI and MARC.

    Anderson, Brent, Slusarz & Branton From Question to Query: An Intelligent Strategy for Making Complex Data Accessible to Novice Users
    For inexperienced users of complex data sets, constructing an acceptable query can be a frustrating task. They have to find out what kinds of variables are included and learn their specific names along with the syntax for specifying queries. This paper describes an intelligent system for converting diverse questions submitted by novice users into the often much narrower range of queries suitable for generating tables from complex data sets. The paper illustrates the process of conversion from naive free-form questions to structured queries. Natural language strategies are used to parse the initial user request to narrow its possible meanings. Those are then mapped onto the range of possible questions the data can answer based on a detailed semantic network describing the data. Users are then shown the program's restatement of what they have asked along with relevant alternative questions ordered by their likelihood. Those alternatives include both analytically generated queries and example queries having similar properties. The program acts as an intelligent agent, permitting users to issue broad queries while delegating the details to the agent. Case-based reasoning guides the user to relevant examples. Machine learning permits successful queries to be added to the program's expanding knowledge base for help with future queries. The paper outlines the broad strategies and then illustrates how the system performs for a sample of user questions. The program is implemented for the PDQ-Explore system for providing rapid, intelligent access to the IPUMS (Integrated Public Use Microdata Series) dataset.

    Grim The role of metadata for integrating data and documentation; the OSA Labour Supply Panel, a case study
    Metadata are relevant in different areas, such as the definition of standards, summarizing information on data quality, and modelling of data and processes (Kent & Schuerhoff, 1997). Metadata for panel data are often limited to comprehensive and well-specified documentation for each wave on what the data are about and how they have been produced. When metadata are also used for the modelling of data and processes (Kent et al., 2000), however, this has implications for data management from both a technical (database) and a statistical viewpoint. Focusing explicitly on panel data structures as a starting point for organizing the metadata offers new possibilities to integrate data and panel data documentation. These possibilities also help end-users access the disseminated results of panel data research. This paper describes how the metadata are implemented at the Institute for Labour Studies for the OSA Labour Supply Panel. It shows from a data-management and end-user viewpoint how metadata modelling for panel data can enhance information on data quality. The paper also shows how XML can be used to integrate data and documentation elements.

    Data Services: a mix of the new and the old

    Burnhill Getting to Know the Score: Using the First 20 Years to Plan the Next
    It was twenty years ago today, that IASSIST taught this band to play. Set up in 1983, our first gig was in Amsterdam, to tell the IASSIST Annual Conference how the University of Edinburgh came to set up its Data Library. Having started small, we have grown in numbers, and now deliver words, pictures and sounds to staff and students across the whole of UK higher and further education in our role as national data centre. A brief history of our time may be found at http://www.ucs.ed.ac.uk/bits/2003/january_2003/ and you can open a window onto our present at http://edina.ac.uk and http://datalib.ed.ac.uk. In our paper, we intend to count the changes and chart the future, hopefully to go from strength to strength.

    Olsen Recovering and Preserving Data from a Large Long-term Data Collection Project
    The Utah Colleges Exit Poll is a continuing state-wide voter survey which has been conducted every two years from 1982 to 2002. This collaboration between the political science and statistics departments provides students with experience in a wide range of research activities, from sample design and instrument development to statistical estimation and data analysis. Data was gathered each election day from 5,000 to 7,000 voters using 5 separate questionnaire forms. Researchers "call" election races and provide analysis in an election night television broadcast. This archiving project is designed to identify, locate, and organize the data, documentation, and other materials associated with the exit poll over 20 years and 11 election cycles, creating an on-line system for data extraction and subsetting, questionnaire and codebook retrieval and searching, and access to publications, video, etc. Advice and hints are provided.

    Spatial Data Access Issues: enabling GIS

    Lenhardt Privacy and Confidentiality Issues with Spatial Data
    Social scientists are well acquainted with protecting respondent privacy and confidentiality in survey research. Increasingly, data with a spatial component are being used in social science research as GIS technology and spatial analytic techniques take hold among researchers. These types of data may include locations of households, remote sensing images, or maps of clusters of disease incidences. As spatially based data become available in higher resolutions, the possibility of eroding privacy and confidentiality is increased. Similarly, merging spatial data from different data sources, any one of which may not reveal information about individuals, may allow respondents to be identified.

    While a number of methods for protecting respondent confidentiality are used for survey research, at this time little work has been done to address these issues for spatial data.

    In our presentation we will describe the privacy and confidentiality issues generated by spatial data and make some preliminary suggestions of what researchers can do to address these issues.


    Linden FGDC, Meet the DDI: Adding Geospatial Metadata to a Numeric Data Catalog
    StatCat, Yale's statistical data finder, was initially designed as a database of selected DDI elements. Now StatCat is being redesigned to include metadata for geospatial data. In the process of writing a crosswalk from the FGDC metadata standard to the DDI elements in StatCat, we examined the compatibilities and incompatibilities between both standards and determined how to reconcile differences for the purpose of data discovery within a database structure. This project also entailed choosing an optimal subset of metadata elements from the two standards, designing a search interface for numeric and/or geospatial data, importing metadata from various sources, and planning for future developments in the delivery and presentation of numeric and geospatial data and metadata.
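    The kind of crosswalk described above can be pictured as a lookup table from FGDC short tags to DDI Codebook element names. The following is only a minimal sketch under stated assumptions: the pairing of tags, the `FGDC_TO_DDI` table and the `crosswalk` function are hypothetical illustrations, not the mapping actually used in StatCat. Tags without a counterpart are kept separately, since reconciling such incompatibilities is the part that needs human judgment.

```python
# Hypothetical crosswalk table: FGDC CSDGM short tag -> DDI Codebook element.
# The pairing is an illustrative assumption, not StatCat's actual mapping.
FGDC_TO_DDI = {
    "title": "titl",
    "origin": "AuthEnty",
    "abstract": "abstract",
    "themekey": "keyword",
    "placekey": "geogCover",
    "westbc": "westBL",
    "eastbc": "eastBL",
    "northbc": "northBL",
    "southbc": "southBL",
}


def crosswalk(fgdc_record):
    """Split a flat FGDC record (tag -> value) into mapped DDI fields and leftovers."""
    mapped, unmapped = {}, {}
    for tag, value in fgdc_record.items():
        ddi_name = FGDC_TO_DDI.get(tag)
        if ddi_name is None:
            # No DDI counterpart in the table: keep for manual review.
            unmapped[tag] = value
        else:
            mapped[ddi_name] = value
    return mapped, unmapped


# Example with made-up values; "geoform" has no entry in the table above.
sample = {"title": "County boundaries", "westbc": "-73.7", "geoform": "vector digital data"}
ddi_fields, leftovers = crosswalk(sample)
```

    Keeping the unmapped tags rather than discarding them makes it easy to see which FGDC elements would still need a home in a combined catalog.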

    van der Steen Taking down barriers around GIS-data for Dutch universities
    Dutch scientists have difficulties accessing Dutch Geographical Information System (GIS) data. The prices asked by governmental institutions and private companies for the use of GIS-data are far too high for universities, even though the need for GIS-data is substantial. Governmental institutions that want to share their data for scientific use for free are accused of unfair competition. In the Netherlands private companies have made huge investments to acquire GIS-data and want to protect their investments.

    Despite this barrier, the Scientific Statistical Agency (WSA) of the Netherlands Organization for Scientific Research (NWO) has succeeded in acquiring GIS-data for Dutch universities. Another problem with Dutch GIS-data is the lack of preservation of data in a central GIS-data archive.

    A system will be developed to allow controlled access via the Internet. We hope to start archiving the data we have acquired systematically in a few months as well.

    Unfortunately, foreign researchers aren't allowed to use this data. Perhaps in two or three years the government will allow the creation of a public GIS-data infrastructure.


    Strengthening Numbers: measurement, aggregation and policy

    Reed, Blunsdon, McNeil & McEachern Integrating public domain data to construct community profiles
    This paper describes a project that aims to collate and integrate publicly available data from a wide variety of sources, in order to construct community profiles based on economic, social, political and cultural differences. Recently, there has been a growing recognition that individual behaviour needs to be understood in terms of the context in which people live their daily lives, such as the neighbourhood, school, community or region. There is a great deal of data available in the public domain, but these are collected for a wide range of purposes, and with great variation in the units of analysis. Our project integrates data at the level of local government areas in Victoria, Australia. It takes the years 1996 and 2001 as its base, because these are years in which there was both a national census and a national election, and so community level data is available for demographic variables and voting behaviour for these years. From this base, we draw on a variety of sources to include data on businesses, crime and suicide, licensed premises, information about religious institutions, schools, recreational facilities, services (law courts, drug advice and so on), cultural organisations and government and community services.

    Soltes e-Europe and e-Europe+ Projects
    One of the main strategic goals of the European Union for the first decade of the third millennium has been, according to the so-called Lisbon strategy, to make the European Union the most competitive and dynamic knowledge-based economy, with improved employment and social cohesion, by 2010. One of the main means of achieving this ambitious goal has been the e-Europe Project, which aims to make full use of current information and communication technologies so that by 2005 a united Europe will have modern online services, e-government, e-learning services, e-health services, a dynamic e-business environment, etc. As the EU is to be enlarged on 1 May 2004 by ten new member states from the CEEC, the e-Europe project has been extended to these future member states in the form of the so-called e-Europe+ Project, which aims to prepare these countries for inclusion in the challenges of the e-Europe strategy. As the author has been coordinating the e-Europe+ Project for the Slovak Republic, the paper will present some preliminary results achieved so far within this project, especially for the first five selected sectors (telecommunications, education, work/skills/employment, social inclusion and e-government).

    The Invisible Force: supporting users of online services

    A panel discussion chaired by Michael R. Carlson, U.S. National Archives and Records Administration (NARA)
    Panelists: Margaret O. Adams, NARA; Melanie F. Wright, UK Data Archive, and Janet Vavra, ICPSR
    The social science data community, as well as academics generally, government officials, journalists, lawyers, and the general public increasingly expect direct online access to "born digital" archival holdings. This session will be devoted to a discussion of the specific experiences of three major data archives that have implemented online access to all or portions of their holdings, focussing on the manner in which offering this form of access has altered (or not) the services they support. Each of the panelists will describe the types of online service offered by her institution and attempt to sort out the realities from the assumptions about offering online access to archival materials. The panelists will also address and encourage audience commentary on: how each institution decided what archival materials and services to offer online; what happens when users have remote direct access to archival materials; whether offering online services has changed internal administrative operations in the archives; whether online services caused a change in the interaction between researchers and archivists; whether the online services have led to more or less use of the archival materials; and finally, whether users seem satisfied with the services offered online.

    The Roots of Historical Censuses: the archivist's perspective

    Brown The U.S. Decennial Census of Population and Housing from an Archival Perspective
    In 1789, the framers of the Constitution of the United States mandated that an "actual Enumeration shall be made within three Years after the first Meeting of the Congress of the United States, and within every subsequent Term of ten Years, in such Manner as they shall be by Law directed." Since then, the United States Government has conducted an actual enumeration each decade. Some of the results of each of those enumerations have come to reside in the National Archives of the United States. This paper will explore the history of the process of determining the archival value at the U.S. National Archives and the history of the transfer of the census materials to the archives. Finally, the paper will discuss the history of providing access to those materials to researchers.

    Downey The Census of Canada from an Archival Perspective
    Since 1666, when Intendant Jean Talon enumerated the 3 215 inhabitants of what was then New France, the census has been a wonderful source of information about Canadians. However, is the fact that these documents are a 'wonderful source of information' enough to give them archival value? What is archival value anyway, and why is it important? Shouldn't archives just acquire what researchers want? This paper will address these multi-faceted issues, including the history of census taking in Canada, the process of determining archival value at the National Archives of Canada (or NA), the history of the transfer of the census to NA custody and current census initiatives - including digitization at Statistics Canada and the NA.

    Crowe From destruction to digitisation: a short history of census records in Ireland
    This paper will take a gallop through the different fates of census records from 1821 to 1911, with reasons for those fates, and an outline of current plans to make surviving census records more widely accessible. The non-statistical value of these records - their uses for local, social and family history - will also be explored.

    Transforming Social Science Data Archives

    Corti New Directions for the UK Qualitative Data Service - 2003 and beyond
    The Economic and Social Data Service (ESDS) is a new national data archiving and dissemination service which came into operation in January 2003. The service has initially secured funding for just under five years, and is a jointly-funded initiative sponsored by the Economic and Social Research Council (ESRC) and the Joint Information Systems Committee (JISC). The ESDS has been established as a distributed service, based on a collaboration between four key centres of expertise: the UKDA and the Institute for Social and Economic Research (ISER), both based at the University of Essex; and MIMAS and the Cathie Marsh Centre for Census and Survey Research (CCSR), both located at the University of Manchester.

    The specialist service for qualitative data, Qualidata, hosted by the UKDA, will provide access to and support for a range of qualitative datasets. The work builds on Qualidata's expertise and international reputation in this area, developed over the past eight years. A strategy of data enhancements, identified in consultation with key stakeholders, will be developed, including:

  • the creation of web-based samplers aiming to provide 'edited highlights' of key qualitative materials to illustrate the potential of the collection for research and teaching;
  • the creation of thematic resources whereby interviews relating to a particular theme and time period will be combined into a single resource, for example, crime and social order in the late twentieth century;
  • data enhancement through value-added processing, ensuring that data are: fully anonymised; in an appropriate digital format; and have an enhanced finding aid, a dedicated user guide and associated webpages;
  • data enhancement via web delivery of marked-up primary data, such as interview transcripts, using XML standards and tools to facilitate rapid and flexible retrieval of information. The Edwardians Online Project represents the first phase of this plan of work;
  • enhanced access to key qualitative collections held elsewhere, in partnership with the hosting archive, to facilitate use in research and teaching. Non-digital collections will be evaluated for their suitability for value-added enhancement work and possible acquisition by the UKDA.

    In addition to providing enhanced access to user-friendly qualitative data, great emphasis is also being placed on user support and training. Training will focus on generalist introductory and more focused workshops on detailed areas of research interest and methodology; and will be supplemented by 'data confrontation' workshops aiming to enhance the methodological and substantive understanding, and secondary analytical potential, of archived qualitative data sources.

    Finally, the service will provide a programme of support for creators and depositors of qualitative data to ensure that high quality, well-documented and ethically conformant data are acquired and distributed.


    Hill From manuscripts to metadata: collaborative working in the Archives Hub
    This paper will describe the growth and the future development of the Archives Hub service, one part of the emerging UK National Archive Network. The service provides free public access to descriptions of over 10,000 archive collections held in more than 60 universities and colleges in the United Kingdom. The archives cover a broad range of subjects, relevant to many areas of research, and date from medieval times to the present day. Collaboration has been a key part of the service at every level, from its funding to its management and its day-to-day operation. The archival descriptions in the Hub are created using international standards by archivists and librarians in the institutions which hold the collections. Currently the descriptions are held centrally, but this paper will describe the progress that has been made on developing a distributed service that will allow institutions to maintain their metadata locally while continuing to allow access to it through the main Archives Hub website at http://www.archiveshub.ac.uk/.

    Teixeira Advances in Data Preservation: The Roper Center Archive Approach
    During the past several years the Roper Center Archives have undergone a significant shift in emphasis, refocusing effort and resources on upgrading data preservation and access capabilities. A multi-track approach was developed to achieve this objective, concentrating on 1) improving the cataloged record for each study, 2) scanning paper documentation to PDF files, 3) addressing concerns about data confidentiality and responsible use and 4) converting older data format files (primarily column binary) to more standard formats. Based on this approach, a pilot project on the Roper/Fortune survey data series (75 studies, 1938-1949) was conducted in Spring 2000. The success of this project was encouraging and led to the development of a more aggressive plan to transform the archives. Much more work has been completed using this approach and, while far from complete, the archives stand in a much better place today than in recent history. This paper examines the progress made through these efforts and the challenges encountered, then reports on the current state of the Roper Center Archives and provides a look ahead to future developments.

    Dekker "To be or not to be", that shouldn't be the question!
    Several societal, technological and scientific trends, like digitalisation and internationalisation of research, urge(d) social science data archives to adapt to their new environment. Most archives (especially the successful ones) have indeed changed their mission, strategy and organisation recently.

    In the Netherlands, however, we got trapped in our "polder model" (that is, find a consensus solution for every problem). Consequently, data services are scattered, lack coordination and are hardly able to adapt to the trends above. Moreover, being small and old-fashioned makes it hard to keep up with international developments.

    Instead of setting up a strategy for the renewal of our social science data services, we are currently discussing whether data archiving should be continued: the Steinmetz Archive will be evaluated on its viability because its holding organisation (NIWI) will probably cease to exist! This threat to the Steinmetz Archive accelerates the discussion of how to adapt the data services to meet the new trends and remove the current disadvantages.

    In my presentation I will present some ideas for establishing new data services. Using this blueprint, I would like to have a discussion about these ideas and how to bring the Netherlands back into the international data services arena.


    Understanding the Strength of Numbers: statistical literacy

    Timms-Ferrara Public Opinion Matters: A New Roper Center Program Designed to Promote Classroom Use of Public Opinion Data
    As survey data become more and more a part of our daily lives, there is increased value in training students of all levels and academic disciplines how to locate and use this information with confidence. This paper explores a new program developed at the Roper Center that provides an introduction to the various tools for locating and analyzing data and promotes the use of public opinion information in the classroom. Using a variety of sources, Public Opinion Matters offers different topical modules and presents options for the research process that appeal to both the novice and veteran researcher. Integrated into this process are the following:

  • The iPOLL database contains nearly 400,000 questions with topline results collected over the last 65+ years in the United States;
  • Links to more than 1,000 Public Perspective articles;
  • Some 14,000 datasets in the archives; links to the catalogue exist for studies rich in material on the highlighted subject matter;
  • Analytical Summaries are prepared for some topics;
  • Online Secondary Analysis Tool;
  • Links to other relevant sites.

    Public Opinion Matters brings most or all of these tools together and enables users to easily review a set of available data on a given topic. This discussion will include full definitions and examples of the metadata currently utilized at the Roper Center. Included in this presentation will be a demonstration of iPOLL, to acquaint users with the broad range of coverage available beyond the selection of materials included in Public Opinion Matters.


    Gray Data & Statistical Literacy for Librarians
    Librarians have long provided support for statistical information, which they could treat as any other print material, with the expectation that the reader would be responsible for how the information was used. Data archivists and data librarians often worked within institutes that included statisticians or survey methodologists who could provide expert advice for researchers undertaking statistical analysis of social or economic data files. As data products were introduced to the library, first on CD-ROM and more recently via online tools, statisticians, data archivists and data librarians were concerned about user support for these primary sources. At the same time, the use of statistical information and its popularity in the mass media has made statisticians aware of the need for statistical literacy for the general public. This paper proposes that there also exists the need for statistical literacy for librarians, as well as an increased awareness of how one evaluates data quality, because librarians are becoming the new intermediaries to statistical resources. Recently the National Science Foundation funded a research project aimed at making U.S. government statistics available over the Internet more accessible and understandable by the general public. (See http://www.ils.unc.edu/govstat). One of the principal investigators has been quoted as saying "This project will help people without specialized training use the Internet to find, and understand, the statistical data they need." At the same time, statistical science is moving into more fields with increased diversity and specialization. A beginning might be to have statistical scientists look at coping with data analysis and librarians examine the way they handle data and statistical inquiries.

    Schield Statistical Literacy Survey: Reading Tables and Graphs
    An international survey on reading tables and graphs of rates and percentages has been conducted by the W. M. Keck Statistical Literacy project. The 250+ respondents included statistical educators in the US and internationally, school teachers in the US and in South Africa, professionals at the US Bureau of the Census, college faculty in non-statistical areas, and college students. The primary focus was on the use of informal statistics: rates and percentages. The results of this survey are presented and analyzed.

    DDI, Nesstar, and ISO: pushing the boundaries

    Gillman Is Your Data Facility ISO Compliant?: Progress Towards Harmonizing the DDI and ISO 11179
    ISO 11179 is a formally balloted international standard for the description and registration of data elements. It is being used as a basis for metadata standardization, management and publication to the web in a growing number of large data-producing organizations, including Statistics Canada, the U.S. Bureau of the Census, and the U.S. Bureau of Labor Statistics. The DDI and ISO standards are complementary. The ISO standard would strengthen the administrative side of social science data operations, as well as the ability of archives to capture, describe and manage the concepts and classifications underlying data collections. The DDI enables direct user access, thereby offering increased return on investment (ROI) for data described in ISO-compliant registries. The paper outlines progress towards harmonization resulting from the DAIS|nesstar project. It introduces a data model for the DDI, describes an extension to incorporate ISO 11179, and discusses the implications of ISO compliance for social science data facilities.

    Bradley DAIS|nesstar - An Update
    Health Canada and Nesstar Ltd. have been working to combine the DDMS/DAIS data access system with Nesstar's web-based client server and data publishing technology. Enabled by an extensive set of Canadian DDI-compliant metadata, the first version of DAIS|nesstar is now being used in Health Canada for disseminating core health, socioeconomic and public opinion polling data, along with associated tables, reports and indicators. The presentation provides an overview of the project, products, on-going development work, as well as the current service. Issues of providing a common service orientation across servers in different organizations are highlighted and discussed.

    International Statistics: strength through collaboration

    Stukel The UNESCO Institute for Statistics: how we decide what data to collect
    The UNESCO Institute for Statistics (UIS) was formed in 1999 to support all the statistical activities of UNESCO in its domains of competence: education, science and technology, culture and communications. It serves its 189 Member States through the following main lines of action: the collection and dissemination of cross-nationally comparable data in its fields of competence; the methodological, technical and conceptual development of statistical systems and international classification systems; the analysis and interpretation of the international data; and statistical capacity building within Member States (particularly for developing countries). In order to fulfill its role as an international organization, it is paramount that the UIS have an effective strategy on what data it should collect and by what mechanism. This talk will discuss the processes the UIS follows in order to achieve these ends: the consultation efforts with countries in order to solicit their inputs and discuss possible content and data sources, as well as the workshops and meetings held to launch new survey instruments and to encourage response to its questionnaires.

    Buffett Serving up Statistics to an International Community
    One challenge in disseminating information of any form is determining who the target audience is, what they are looking for, and how best to deliver it. In trying to answer these questions, it is important to determine how to measure whether the dissemination activity is successful and is meeting the needs of both the organization and the user community.

    Is it most important for the technology to be implemented so that any country in the world, regardless of its place on the technology adoption curve, can reference the statistics in a timely manner?

    Is it most important to implement the latest technology to provide a seamless integration of online databases so that data from multiple international organizations can be accessed simultaneously?

    The tradeoffs are endless ... but so is the potential.


    Zhang Comparability of cross-national data: How does the UNESCO Institute for Statistics approach the issue?
    Cross-national data provide the global/regional picture for advocacy, for resource mobilization and for the accountability of governments. Based on such data, countries can compare themselves against others to learn from one another and to benchmark important policy goals. There are two broad ways in which the UIS uses cross-national data for comparative purposes. The first is to conduct secondary analysis of a variety of data sets, which include country-level data that the UIS collects annually, and student- and household-level microdata that are collected by other organizations. The second is to develop a core set of indicators that can raise the profile of important issues. Comparisons among countries can be problematic when data are spurious and their reliability is suspect. There are incentives for data distortion when too much emphasis is put on comparisons. We make every effort to ensure data comparability. This includes involving countries in interpreting data results and determining indicators. It also includes contextualising data findings.