Christopher Davis
The Internet provides a global infrastructure for data and information publishing that has the potential to revolutionize how data and information are accessed and used. A variety of methods exist for taking advantage of the capabilities of the Internet for this purpose, but two problems with data and information access have been exacerbated by an explosion of tools and resource servers. First, resources are distributed across many locations and may often be difficult to locate. Second, different tools and approaches each have individual advantages, leading to the use of different methods at different locations. CIESIN's information system approach is designed to solve both of these problems by providing a method for locating and integrating heterogeneous, distributed data and information resource servers. Internet-based technologies, such as those utilized by CIESIN, could potentially transform the organization of how data and information are provided and provide users and providers alike with new services and capabilities.
Julie A. Fore
The World Wide Web is a very rich information source, but getting the full benefits from it can be difficult. The software used to access the Web, browsers, can be confusing to use and difficult to configure. The intent of this paper is to perform a review and evaluation of the most common World Wide Web browsers available for the DOS, Windows and Macintosh operating systems. A summary of the benefits and drawbacks of the browsers will be given. Also included will be recommendations for the minimum and the optimum hardware configurations needed to run each browser and suggestions for companion software needed to maximize the usefulness of each browser.
Juri Stratford
At the UC Davis Government Documents Department, we have integrated a number of different strategies for providing access to depository datafiles. These include providing public use microcomputers, extracting data from the CDs, loaning CDs, and providing access both to CDs and extracted data via the WWW and anonymous FTP.
Patrick T. Collins
For six years, the National Center on Child Abuse and Neglect (NCCAN) has funded the National Data Archive on Child Abuse and Neglect (NDACAN), whose primary mission has been to acquire, process, preserve, and disseminate high quality datasets relevant to the study of child maltreatment. Unfortunately, early efforts by NDACAN to acquire data were hampered by uncooperative investigators, poor documentation, and a lack of resources to reimburse investigators for costs associated with preparing data and documentation. From the outset, NDACAN advocated for mandatory data archiving for NCCAN research grantees. In addition, the Archive published a manual which set forth standards for the preparation and dissemination of research datasets and their associated documentation. In 1992, NCCAN began requiring their research grantees to prepare their datasets and documentation according to these standards. The Archive's lobbying efforts came to fruition when, in their 1994 RFP, NCCAN set forth the requirement that applicants include in their proposal plans to archive their data with NDACAN. For the first time, NDACAN will work with investigators from the beginning of their projects to ensure that data and documentation are prepared properly. While it will be several years before the first of these grantees are required to submit data, the Archive will provide technical support throughout their grant. Grantees will be provided a clear set of deliverables which must be prepared according to the standards set forth in the technical standards manual. All investigators will have a two-year "grace period" after the termination of the grant which will allow them to publish the results of their study before the data are made available to the public.
In this paper, I will discuss the process that led to NCCAN's policy change, the details of how the policy was implemented, and how it has changed the Archive's relationships with investigators. I will contrast NCCAN's archiving policy with that of other federal agencies that require archiving of research data (e.g., NIJ, NSF). Finally, I will discuss the deliverables that NCCAN research grantees will be expected to provide to the Archive (e.g., codebook, data files) and the standards that we have established for these materials.
Karsten Rasmussen
A report from an enquete. Data professionals have given their views to questions on a future codebook format ("should it be SGML?", "should it be supported by vendors?", etc.). On the other hand, archives have described their holdings with regard to levels of machine-readable documentation. The paper presents the key figures from the questionnaires sent out by "the IASSIST codebook action group". Maybe we are moving in a direction demanding less knowledge about special formats from our users?
Luc Albert
This presentation will describe the challenges involved in producing census microdata files. First it will review the main achievements and difficulties of the 1991 Public Use Microdata Files Program (PUMF). Secondly, it will propose a series of objectives to improve the timeliness and foster a wider utilization of the proposed 1996 Census PUMFs. This presentation will provide the data producer's perspective. It will discuss how the program is funded and the content of the files is determined; present challenges encountered in ensuring the confidentiality of the files and propose a dissemination strategy for the 1996 PUMF files.
Monica Boyd
This presentation focuses on alternative conceptualizations of census public use data files in which databases are viewed as specialized products or as mass market items. Implications for consumer access, use, and loyalty are discussed along with implications for research training and public policy inputs. Where relevant, comparisons are made between Canadian and United States approaches to the marketing and accessibility of census public use micro data files.
Angela Dale
The decision of the UK Census Offices to allow the release of samples of anonymised records from the 1991 Census was an important landmark in data dissemination in the UK. The decision was informed by academic research coordinated by a sub-group of a Working Party on the 1991 Census, set up by the Economic and Social Research Council (ESRC) and chaired by the late Cathie Marsh. Two non-overlapping samples were released, a 2% sample of individuals and a 1% sample of households and all individuals in the households.
ESRC purchased the SARs with sole distribution rights for the first five years and set up an End User Licence Agreement. The Census Microdata Unit at Manchester University was funded by the ESRC to take responsibility for the licensing and distribution of the data. The SARs are available without charge to UK academics whose work is funded by the Higher Education Funding Council. Unlike other data distribution agreements which are between the individual user and the supplier (usually the ESRC Data Archive), that for the SARs placed legal responsibility for the data with the University.
The ESRC allow the SARs to be sold to non-academic users in the UK. These may be commercial organisations, central government, local authorities, or academics doing work funded by an external agency. Although ESRC's purchase agreement with OPCS gave world-wide distribution rights, the ESRC are not, at the moment, prepared to allow the data to be sold overseas. The reasons for this include concerns about the ability to enforce sanctions over misuse of the data and concerns that the data may be passed to unregistered users. This conflicts with ESRC's wish to maximise the academic value of the data, particularly in international comparative studies, and also to maximise the returns from sales of the data. Discussions are currently underway in an attempt to resolve this problem.
There is little doubt that the Census Offices will be willing to draw similar samples from the 2001 Census. However, many users will want an increased sample size, and the addition of further variables to the samples. There will also be a need to consider whether the existing licensing arrangements are the most appropriate and, if not, how they should be changed. Finally, arrangements for purchasing the data and for deciding what costs should be passed on to users will also be the subject of debate in the coming years.
Donald Morse
The Sample of Anonymised Records produced by the UK Census Offices for the 1991 Census undoubtedly represents a major step forward in the flexibility of census data available to researchers. Use of the SARs allows, for example, the construction of alternative household classifications and multiple deprivation indices for non-tabular analysis. But is the quality or utility of these new data any better than that of the usual tabular output? As elsewhere, the use of individual and household level data throws up problems - of sampling error, for instance; and it is arguable that the coarse areal grain made available makes spatial analysis less useful than that possible with the Small Area Statistics. What users want, of course, is a larger sample and a finer geographical grain. But the hurdle that must be jumped, that of ensuring confidentiality, is probably too high. It is up to users to find the solution.
Paul F. Bergen
This presentation will focus on interactive dissemination and collection of numeric and geo-spatial data on the World Wide Web (WWW).
Using forms-compliant HTML interfaces developed by the author at the University of Virginia and Harvard University, this presentation will discuss the potential of the WWW for data librarians and others who work directly to support research and instruction in university environments.
Demonstrations will include an interactive on-line atlas which uses TIGER/Line data from the U.S. Census Bureau and serves as a WWW to ArcInfo gateway. Interfaces for interactive analysis of numeric data from the 1990 Public Use Micro Data Files and the Regional Economic Information System (both from the U.S. Census Bureau) will also be shown. Methods for data collection on the WWW will be discussed using an on-line survey being programmed for the Harvard University Department of Psychology.
Louise Corti
In 1991 Professor Paul Thompson, based at the University of Essex, conducted a survey of social scientists who had received ESRC funding for research projects which generated qualitative data. The purpose of the survey was to ascertain the level of support for implementing an archive policy in respect of such material and to assess the quantity of data potentially available. The results of the survey showed that a vast quantity of reusable research material, comprising principally in-depth interviews, diaries, field notes and open-ended questionnaires, had been retained by researchers who were principally in favour of an archival initiative.
Following the survey report, ESRC has now funded a Resource Centre, QUALIDATA, at the University of Essex. The aims of the Centre are to locate, assess and arrange for the deposit of qualitative materials in suitable archive repositories in Britain. A key function of the Centre will be to maintain an information database about the extent and availability of qualitative research material in general. This will provide a major new resource in the field to complement the existing ESRC Data Archive for machine-readable data sets also based at Essex. In addition the Centre will promote and encourage the secondary use of the data it processes, monitor its use by researchers and develop guidelines for researchers on preparing their data for archival deposit.
The presentation will describe the initial six months work in establishing the Centre, the projects which have been assessed, the possibilities for re-use of qualitative data, and the concerns that researchers have expressed around the issues of confidentiality and copyright.
Ann Gerken Green
The United Nations Scholars' Workstation, a networked information resource for students, researchers, and faculty at Yale University, aims to provide customized access to information about the United Nations as well as the tools necessary to perform numeric data analysis. The Workstation is a virtual repository of electronically retrievable bibliographic citations, digitized texts, images, data sets, maps, and pointers to other sources of information currently available on the Internet.
This prototype workstation is being developed by the University Library and the Social Science Statistical Laboratory. It will directly support the curriculum in United Nations studies as well as a Ford Foundation project called "The United Nations in its Second Half-Century," with headquarters at Yale and international participation.
The funding for the UN Scholars' Workstation development includes:
1) information and finding guides to local library resources and Internet resources
2) the production of special tools (maps, charts, etc.)
3) improved access to international statistical resources
This IASSIST presentation will focus upon the work done and that forthcoming in the area of improving access to statistical information. The following activities will be discussed and reviewed:
1) acquisition of ICPSR, IMF, World Bank, and other data files
2) transfer of data from mainframe storage to the local area network
3) production of descriptive information: abstracts, codebooks, guides on a gopher (including comments about the lack of standardized formats and machine-readable codebooks)
4) conversion of data to standard formats (e.g. SPSS, SAS, dbank)
5) conversion of the gopher to HTML and integration into the workstation homepage
6) work on a data extractor using the homepage as a front end
John Blodgett
The Consortium for International Earth Sciences Information Network (CIESIN) in conjunction with the Urban Information Center at the University of Missouri St. Louis has established a public archive of United States census data. The data are available via the Internet using FTP and/or a WWW browser. Currently the archive contains map boundary files for all common geographic units used in the census (including census blocks, block groups, tracts, counties, etc.) in a standard ASCII (BNA) format. Data extracted from the 1990 decennial census Summary Tape File 3 (STF3) are provided in a format that makes it easy to link with the boundary files to create thematic maps with widely available GIS software such as Atlas*GIS, ArcView and MapInfo. These data files are organized by state and geographic level. Using the archive, researchers should be able to readily retrieve files that would allow them to analyze and/or map data for areas as small as blocks or block groups for anywhere in the U.S. This paper discusses the content of the archive and describes some of the details of how and why it was constructed.
Renata G. Coates
The Lijphart Elections Archive, housed at the University of California, San Diego campus, is a research collection of district-level election results for twenty-seven countries. Until 1994, the collection focused on post-World War II democracies in Western Europe, but also included the United States, Canada, India, Israel, Japan, Australia and New Zealand. Recently, Costa Rica and the European Union were added. Future plans call for the expansion of the Archive to more than 70 countries--including many new democracies from Central and Eastern Europe, Latin America and Africa.
Until 1994, the preferred format for building this library collection had been the original hard copy. During the last year, prompted by requests from graduate students, efforts began to be made to acquire data in machine-readable form. There is now a concerted effort to bring this collection to the Internet. A web homepage was created in March of 1995.
Plans are underway to make the Lijphart Elections Archive a world-wide resource for scholars studying democratic elections. Currently, the web site provides election data through a variety of menus. Linkages to other institutions are also being developed. With the use of on-line utilities, facilitating multi-country research is one of its ultimate goals.
We hope the international academic community will find our efforts meritorious, and join with us in the future development of this very worthwhile resource.
Jackie Shieh
The focus of cataloguing the Inter-university Consortium for Political and Social Research (ICPSR) online codebooks is to provide users with adequate bibliographic information, in a timely fashion, on VIRGO, the University of Virginia Library's computerized library system. Many cataloguers today are cataloguing materials that cannot be held in hand. Gathering bibliographic information for electronic formats can be a bewildering and monstrous experience. The author shares her experience on how the fear of working with computer files was reduced to a minimum with the help of the computer support department, and the sense of triumph and accomplishment she felt when patrons successfully retrieved what they needed through the online catalog!
Vince Grey
After a discussion of the Network Data Library System, we will look at who uses the system to retrieve data. The need for straightforward, easily understandable documentation is clear. Codebooks produced for CATI surveys are often jumbled and unclear, particularly to people who have neither the time nor the inclination to delve deeply into them (e.g., undergraduates).
Sharon Neary
This paper examines recent critical tensions shaping the use and expectations of users of Canadian data files for academic teaching and research. Using the University of Calgary as an example, factors treated include demographic and skill diversification among the traditional quantitative research elite and the shifting emphasis from research to teaching in setting the contemporary political agenda for the Academy. An amiable, integrated interface to both institutional and virtual data collections is essential to meet the expectations of users for both enhanced access and ease of use.
Doug Norris
In recent years Statistics Canada has produced and released 15-20 public use microdata files each year. These files are a rich source of data for research and teaching purposes. It is argued that in the past many files have not been widely utilized. This paper discusses a number of issues data producers face in making microdata files available. Possible reasons for the underutilization are discussed. Finally, suggestions are made about ways that data producers and data librarians might work together to increase the use made of microdata files for research and teaching purposes.
Mary Anne Webber
SLID is one of several new longitudinal surveys being developed by Statistics Canada. The survey raises many interesting questions for data dissemination. Because the data are collected using computer-assisted interviewing, there is no paper questionnaire to serve as a starting point for user documentation. There are challenges with respect to confidentiality. As large and complex data sets, longitudinal surveys often require a significant investment on the part of users to become acquainted with data structures and analytical potential. The analytical techniques can also be complex and not as widely known as those used with cross-sectional data. These issues will be discussed, along with some of the directions being taken or examined for SLID.
Yvette Hackett
Over the last 5 to 10 years, computers have rapidly infiltrated all aspects of records creation. From the introduction of automation as a tool to assist in the generation of paper copies, the technology is now rapidly displacing paper. Along with the information explosion which has characterized much of the second half of the 20th century, these changes have required an almost on-going review and refinement of the principles and practices governing the entire spectrum of activities performed by archives.
To acquire, preserve and provide access to types of information that Archives have traditionally held requires an extensive electronic capability which takes into account the functionality of the many hardware and software products which have appeared in the marketplace.
While the National Archives continues to monitor all aspects of electronic records creation, record-keeping and dissemination, this paper will address three areas of particular interest to the Government Archives Division:
- the identification of a physical format to be used for long-term preservation of electronic holdings;
- the identification of a logical storage format which could somehow replace the multitude of proprietary software packages currently in use without losing significant aspects of the record;
- the interpretation and application of archival concepts such as original order and arrangement to the electronic products of office information systems, and its integration into descriptive standards and practices.
Ken Hannigan
The National Archives of Ireland was established by an Act of 1986 which formally amalgamated two existing repositories, the Public Record Office of Ireland and the State Paper Office, which had existed respectively for over one hundred and two hundred and fifty years. The Act also provided for the preservation of records of government departments and agencies and for their transfer to the National Archives under a 30 year rule of access. The fact that, increasingly, such records are being generated and stored in digital form has major implications for how a "traditional" archives, like the Irish National Archives, interfaces with the agencies for whose records it is ultimately responsible.
Ireland's geo-political situation as a small country on the periphery of Europe, but part of the European Union, has had a profound influence on archival developments there. In recent years moves at European level towards greater political integration and harmonisation over a wide range of areas, including communications, heritage management, freedom of information and data protection, have begun to shape archival policy, while developments in information technology and data exchange are beginning to bring Irish archivists into closer contact with their peers abroad, beyond the narrow forum within which they formerly exchanged ideas. However, the archival profession, both in Ireland and more widely in Europe, generally attracting its recruits from a liberal arts background, has found itself ill-equipped to adapt to these changes. As yet the great debates which have been taking place in North America on the custody and management of digital data have had only faint echoes in Europe. While most archives have begun applying the new technologies to their traditional archival tasks, and some have begun to provide on-line access to their services, few have yet come to terms with the problem of what to do about the records generated by these technologies.
This paper will present a perspective from one who, as a "traditional" archivist, is attempting to survive a personal odyssey in search of the information highway.
Hilary Beedham
This paper will consider the variety of types of documentation which can be associated with a single dataset and will examine the issues surrounding the provision of a coherent set of documentation when it comprises differing qualities of paper, ASCII files and software-dependent metadata. These issues will be discussed using the UK Family Expenditure Survey as a case study. Over the 33 years of the survey, the content of the documentation has remained remarkably stable but the way in which it has been presented has changed considerably. I will discuss how technological change has affected the provision of the documentation to users and will then look to the future and the ideal of a complete set of machine-readable documentation.
Patrick Curran
The INCORE server was set up by the United Nations University and the University of Ulster to act as a central resource for academics, policy-makers and others concerned with conflict resolution and ethnicity across the globe. The advantage of placing information on the Internet is that it is available to anyone, anywhere on the globe, who is connected. The Internet is a very flexible and open resource but it is also a way for other people to get into your system. In order to take advantage of the Internet you must be a part of it. However, this article demonstrates that by doing this you put your computer at risk, so you need to protect it.
Every day computer networks and hosts are being broken into with varying levels of sophistication. While it's generally believed that most break-ins succeed due to weak passwords, there are advanced and sophisticated techniques that are more difficult to detect. The article looks firstly at the shortcomings of the password mechanism (concentrating on the Unix system) and then discusses the more sophisticated techniques available to intruders, including some examples of the misuse of commonly used Internet resources, e.g., ftp, sendmail, telnet, rlogin, rsh, etc. It also analyses the range of responses to network intrusion techniques, from software policing solutions like Kerberos and COPS to the hardware solution of firewall installation.
While the article raises the reader's awareness of the importance of security in a network situation, one shouldn't get paranoid. It's worth remembering that the nature of Unix and the Internet helped to defeat the Internet worm as well as spread it. The sensible approach is to secure a system according to its needs, keeping danger at a manageable level. In other words, don't stop travelling but wear a seat belt.
Irene Fournier Mearelli and Annick Kieffer
There is a specific situation in France. There is no survey data produced by the academic community. There is a quasi-monopoly by INSEE and organisations in charge of carrying out administrative surveys (Ministry of Education, Statistical Office of the Ministry of Labour, CEREQ, INED). The problem has thus arisen of access by the academic community to this data (acquisition, archives, access). Conventions were signed (after 20 years of careful reflection...) between these institutions and the CNRS. LASMAS was to acquire this data and make it available to CNRS researchers. Little by little, LASMAS has become a clearinghouse as well as a resource centre on surveys of interest to social scientists. It continues to develop its activities in that direction, particularly by investing in longitudinal datasets and their analyses. LASMAS-IDL is not a simple centre for archives and distribution. It has become a research laboratory analyzing the surveys within the framework of its own research programs. It has thus become a centre where competence is accumulated not only on technical aspects of surveys, but also on scientific and substantive issues addressed by these surveys.
Judith Bradford, Mark Bunster, Mary Ellen Rives and Phyllis C. Self
While we may initially wonder if researchers face a deficiency of HIV/AIDS data, we can also choose to look more closely at what we include in our definition of data. To do this, we must also address issues of data quality, access and integration, for these are the areas that enable researchers to make good use of the data that ARE available.
Initially, we must expand what may be conventionally considered "research data," to include many types of information being collected by social scientists. Attitude and knowledge surveys, program evaluations, and client services data all contribute to the AIDS resources available, with as much relevance and value as the more traditional clinical trial, epidemiological, and patient care data resources. Further, it is not enough to simply collect these data; a conscientious effort must be made to ensure accuracy, appropriateness and timeliness, so that conclusions and policy decisions are not swayed by numbers that are incorrect, irrelevant or out of date.
Even reliable, valid and timely data are of little use unless they are efficiently integrated with the purpose of applying this new knowledge towards the control and treatment of AIDS. A collection of agencies, grant projects, and committed individuals from Virginia have worked to bring data providers and users from all areas together to review, discuss and use available data. The creation of the Medical College of Virginia/Virginia Commonwealth University's (MCV/VCU) HIV/AIDS Center has enabled this process. Our presentation will focus on the different AIDS data resources, and talk about the HIV/AIDS Center as a model for applying integrated data towards efforts against AIDS.
The purpose of the Center is to coordinate all HIV/AIDS efforts within the University and Hospital systems, and to encourage increased collaboration between the University and the community at large. It was thus designed to be a conduit for information, pulling resources from a variety of efforts, and applying the resources to community needs.
Two areas affiliated with the Center have been integral in its development by providing a strong foundation grounded in HIV/AIDS data. Each area affiliated with the Center supplies a different type of research information, filling data gaps that might otherwise appear if gathered from fewer sources. VCU's Survey Research Laboratory (SRL) specializes in the primary collection of HIV/AIDS attitudes and knowledge data, as well as coordination of surveillance and training data. The University Library Services (ULS) provide data storage and data resource information, for easy access to a wide range of data. The ULS is also instrumental in improving data access. In addition, the MCV School of Medicine and MCV Hospital are potentially valuable resources for patient-level care information, both inpatient and outpatient. Taken together, each of these resource agencies provides a key link in the application of HIV/AIDS research to patient care and community outreach, preventing a serious "data deficiency" when it comes to policy and program development for Virginia's HIV/AIDS population.
Dale S. Sherman
Research strategies used to investigate the Human Immunodeficiency Virus (HIV) and Acquired Immunodeficiency Syndrome (AIDS) are challenged by a broad array of influences and design considerations. The variability of disease course, social dimension(s), and ethical issues related to HIV require particular sensitivity to methodological precision. A study's conceptual foundation can, for instance, dictate the research question examined and ultimately distort the prevalence of the illness. Obtaining a representative sample of the population also presents several methodological challenges. Although the scope of individuals infected encompasses persons of all ages, cultural groups, and lifestyles, many of these are from special populations difficult to reach. Moreover, once recruited, some respondents are hesitant to reveal their sexual activity and under-report risk behaviours while others exaggerate their activities and over-report sexual practices. Do the data collected accurately reflect our design questions or portray a deficiency in HIV-related research? Methodological precision should be addressed in experimental formulation and extend through the theoretical foundation, experimental design, measuring instruments, and sampling techniques used. Improving accessibility to data would enhance collection efforts and clarify the results obtained from other studies.
Jon Stiles, Fredric Gey, and Ilona Einowski
With the release of the 1990 Census TIGER files there has been excitement among data librarians about the possibilities of visual displays of social science data within a geographic information system framework. While UC DATA has mounted most of the U.S. 1990 census data (including the entire TIGER digitized boundary and feature files) on a CD-ROM information system available via the Internet, developing a mapping capability has proved to be a slow process of hand sewing-together of threads of data, map boundary files, and commercial mapping software. In this paper we describe what we have accomplished (see the attached map) and what will need to be done to make such a process seamless to the data client (our user community).
Hendrik Meij and Robert Chen
Geo-referenced population data provide a critical link between data on the natural environment and data on human behaviour and welfare. Past and potential policy-oriented uses of such data include natural resource management, famine early warning and vulnerability assessment, damage assessment and associated disaster response, public health and medical service applications, urban and regional planning, and estimation of pollution emissions and land use change. Population dynamics and distribution have been consistently identified as key elements in understanding human interactions with the environment and in considering possible responses to environmental change.
CIESIN, in collaboration with several different organizations, has been supporting the development and dissemination of a number of different geo-referenced population datasets which will be described. Ongoing activities include efforts to link land cover characterization data, derived from satellite based sensors, to demographic data. These integrated datasets will be made available through interactive services over the Internet.
Donald J. Morse
Edinburgh University Data Library (EUDL) is the home of UKBORDERS, an ESRC-funded research project which has grown into a UK national on-line service for the extraction of digitised boundary data (DBD). UKBORDERS has been designed to incorporate links to a wide range of geo-referenced data, such as Digital Map Data (DMD) and the population census small area statistics (SAS), but the principal data set to date, and the one which UKBORDERS was initially funded to house and give access to, is the 1991 population census DBD (purchased on behalf of the UK Higher Education Community by the ESRC/JISC). This data was supplied to EUDL for all 132,080 postcode units in Scotland and all 113,196 England & Wales enumeration districts. The paper describes and discusses the role, trials and tribulations of, and implications for EUDL in designing the UKBORDERS facility from scratch; in building the facility and constructing the datasets (including the census, administrative and postal higher-order aggregates); and in implementing the service over JANET. The paper then gives an example of a project in which EUDL is involved which uses the 1991 DBD and other digital mapping, population census and postcode-referenced data generated from a client database.
David Barber and Jan Zauha
When a library successfully provides access to numeric social science data, scientists, science librarians, and others are likely to ask whether scientific data could not be made available in a similar manner. This invitation to extend data services to scientific data needs to be carefully considered since scientific data presents many unique challenges. This paper will outline the similarities and differences between the management of social science data and of scientific data. It will also describe what must be done to provide access to both forms of data.
Kenneth Thibodeau
Over the past fifteen years in the United States, there have been several collaborative studies of the archival value of scientific records. These projects shared a common concern with the record of research and development activities as a resource for historical research.
Since 1990, the National Archives and Records Administration (NARA) of the United States has sponsored two major studies to obtain the advice of subject matter experts on the retention of data. The first, undertaken by the National Academy of Public Administration (NAPA), focused on major federal databases used in support of mission activities. Scientific data were the focus of the second study. This project, inaugurated in 1992, was undertaken by the National Academy of Sciences/National Research Council (NRC).
The NRC study was completed in March 1995. It recommends "A new strategy for archiving the Nation's scientific information resources." Under this strategy, scientific data from all observational programs sponsored by the U.S. Government would be preserved; data from experimental research and engineering projects would be preserved if the cost, difficulty or improbability of repeating the work would make it unlikely that the same or comparable data would ever be available from other sources; and the scientific community would assume the responsibility for data management, preservation, and access throughout the life-cycle of the data.
Bob Kochtubajda, Chuck Humphrey and Mark Johnson
A valuable meteorological data archive collected by the Alberta Research Council over the course of the Hail Studies Project (1956-1985) in central Alberta is in jeopardy of becoming unusable as the digital data stored on magnetic tape degrade over time, and expertise in the data collection, calibration, and interpretation becomes scarce. The overall goal of this project was to preserve the digital radar, aircraft, upper air and surface precipitation data along with supporting calibrations and documentation; to transfer this archive to the University of Alberta; and to make this archive available to the scientific community. Three distinct operations were carried out to ensure the long-term preservation of the archive: retrieval of the digital data and all supporting (secondary) data sources; the transfer of digital data from magnetic tape to compact disk; and the collection and preparation of relevant documentation describing the data. The archive will provide researchers with a documented dataset to support further research in radar meteorology, climate change, hydrology, cloud physics, mesoscale meteorology and severe weather phenomena.
Jim Henderson
During February 1995, I distributed a brief survey, "WWW: What do Researchers Want?", to approximately thirty history and other listserves. Others were approached, but not all allow non-subscribers to use their lists. A single "reminder" e-mailing was sent in March. Each distribution generated just over 20 responses, for a total of 46. All responses were received electronically.
In brief, researchers want clear guides to collections, supplementary information about the institution and its mission, and access information: rules for copying; mail, phone, e-mail information. They are far less interested in "cute" sample images (the olde map or photo) or sample text of selected collections. Parochial items such as organizational structure or exhibits and upcoming events are clearly the lowest priorities among those listed.
Researchers most highly value "subject oriented keywords pointing to related collections," and "detailed descriptions" and "finding aids" for major collections. Next, they want to know the ways and means of access: rules about the cost and availability of copies, and contact information, both traditional (mail, phone) and e-mail.
After the basics, and to get a view of the institution's possibilities, researchers want 1) listings of collections by genre, 2) lists of guides, pamphlets, and other publications, supported by 3) reference room hours and procedures, and 4) a general description of the institution's holdings and mission. Following closely are interactive needs: the ability to leave messages for the staff and to find out "What's new?" While given "some importance," image databases of photos and maps were deemed slightly less useful than the proposed textual databases, which also were not highly sought after. Selected sample items, by both typical content and format, were viewed unfavourably by one-third of the respondents. Internal and local items characterized by "organizational structure" and "upcoming events" received rather negative reviews. Current research listserve members find little interest in genealogical holdings, but this may say more about the respondents and the current availability of technology than about the potential broad interest in this information.
Bridget Winstanley
BIRON (Bibliographic Information Retrieval ONline) is the ESRC Data Archive's online catalogue and index. It has been available to external users since 1986, the latest version of the interface having been implemented in January 1994. Since this latter date users have been asked to provide email addresses on logging in and these addresses, numbering approximately 4000, were used to circulate a questionnaire designed to elicit users' views on the functionality and interface of BIRON. Their replies will be used to design the next version of BIRON. The paper will discuss the responses received, as well as the means by which the replies were delivered (WWW, email, post) and comment on users' expectations of the service offered by BIRON.
Ann S. Gray
Librarians have always been concerned with their image, status, and role in society. The technological advances of the past decade have also made them concerned with their future. Data archivists should have similar concerns. Caught between researchers whose appetite appears to be focused on instant gratification and technologists whose gratification threatens to instantly consume all available resources, the librarian risks further marginalization and a diminished capacity to obtain the resources needed to fund a data library. In essence, not only is the profession under siege, the institutions are also. This presentation examines the cultural, economic, and technological changes that could undermine the institutions and professions that foster data access and dissemination.
"Armed only with a hammer, everything looks like a nail." Most discussions which address the future of librarians see technological change as the most important revolution taking place in data access and distribution. While there are valid reasons to believe that electronic publishing and the proliferation of end-user systems will render central services unnecessary encumbrances to social science research, cultural and economic factors may present a more real danger.
Laura A. Guy
Technological change has had a tremendous impact on how we do our jobs. It has affected not only how we organize and provide access to information, but also how our users conduct their research. This change has created new challenges for our profession, not the least of which is wondering if it will make us obsolete by replacing us with knowledge-based systems. The nature of these changes is discussed and we fantasize a bit about the
Alison McCleery, Heather Ewington, Peter Burnhill and Emma Forster
This paper documents progress on setting up an experimental Scottish Household Migration Monitor. Hosted and managed by Edinburgh University Data Library, such a Monitor will be available to the academic community, and public and private sector housing and planning agencies.
An innovative feature of the Monitor is the inclusion of information on migration motivation as well as the patterns themselves. Better informed decision-making is feasible as a result of documenting and understanding the decision to migrate and how it varies in space and time. For example, urban and regional planning authorities would be able to estimate housing demand and future housing land requirements more accurately.
This paper traces the path of a learning curve which operated from the inception of what came to be known as the Migration and Housing Choice Survey, Scotland through to the processing, analysis and interpretation of the data it generated.
Vyacheslav Shipilov
Amid radical political and economic change, most East European social science data archives see a way forward in the establishment of an East European Data Bank. The formation of this new community involves several interrelated processes: closer integration with the Western data community and multilateral agreements between East European data archives. This of course presupposes a common system of preferences, financial support, technological changes in computing, the free flow of data and information exchange, training and educational programs, and the newest communication networks. The proposed paper seeks to improve the organization, cooperation, technology, accessibility and future role of the East European Data Bank. The data archives in the newly independent states of Eastern Europe are now trying to overcome many difficulties on their way to 'Europe XXI'.