Introduction
Data entry is a major part of the North American Volcanic and Igneous Rock Database (NAVDAT).
Data are populated from a simple, standardized, tab-delimited file format that can be created using a spreadsheet.
The NAVDAT server is able to read these data files and populate the NAVDAT database.
One advantage of this approach is that populated data can be deleted and replaced with a new data file if changes are made.
New samples can be added to a data file, and typos, misspellings, and other mistakes are easily corrected.
The data files that are currently populated in the NAVDAT database are located here.
Creating a data file is simple.
Data from published papers are either manually entered into a spreadsheet or scanned using OCR (Optical Character Recognition) software.
Next, the data file is formatted into major sections that contain reference, geochemical, age, location, and method data.
The spreadsheet is saved in a tab-delimited format and then uploaded to the database via a web-based form. The original data file is stored on the server.
After the file is uploaded, the NAVDAT server parses the data file and places the submitted data into the correct tables within the relational NAVDAT database.
Once the NAVDAT server has finished parsing the data file, the data are available for the public to search and download.
The NAVDAT search interface also queries other databases, such as the USGS DDS-14 geochronology database,
the New Mexico geochronology database, and PETROS.
These databases contain thousands of samples in a well-documented data structure, and are integrated for this reason, but still
remain in their original format.
To learn more about these databases, click here.
If you have or know of a large database that would benefit the community by being integrated into the NAVDAT search interface,
contact Doug Walker.
Data File Format Description
Introduction:
Three factors were taken into consideration when the data file format was created. First, data entry had to
be simple to do and easy to understand for someone using it for the first time. Second, the data format had to be flexible enough to handle new types of data.
Finally, the data format had to be standardized so that current and future data files could be parsed by the same program without a need for modifications by a programmer.
Saving and Naming data files for submission:
The data files are tab-delimited text files named using the convention "GeoRef number_Last name of the first author.txt".
An example of this naming convention would be "2000-073550_Manley.txt". The tab-delimited file should not have quote marks around any of the data.
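The rule about quote marks is easy to break when exporting from a spreadsheet, so it can help to generate the text directly. The following is a minimal sketch, not part of NAVDAT itself: the function names `format_row` and `to_tab_delimited` are hypothetical, and the sketch assumes that a cell may not contain a tab or a line break.

```python
def format_row(values):
    """Join one row of cell values with tabs, adding no quote marks."""
    cells = [str(value) for value in values]
    for cell in cells:
        # Assumption: a tab or line break inside a cell would corrupt the layout.
        if "\t" in cell or "\n" in cell or "\r" in cell:
            raise ValueError(f"cell contains a tab or line break: {cell!r}")
    return "\t".join(cells)


def to_tab_delimited(rows):
    """Return the rows as tab-delimited text, one row per line."""
    return "\n".join(format_row(row) for row in rows) + "\n"


if __name__ == "__main__":
    rows = [["SAMPLE NAME", "LatDD", "LongDD", "STATE"],
            ["S-1", 36.5, -118.25, "CA"]]
    print(to_tab_delimited(rows), end="")
```

Writing the text by hand rather than through a CSV library keeps quoting rules out of the picture entirely.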
Data File Examples:
Examples of actual submitted data files can be found HERE.
Major Headings:
The data file is divided into sections where different parts of the data are recorded. There is a set vocabulary for the section headings, and
a list of these heading names is given below. Heading names indicate where one type of data starts and another ends. The parsing program looks for these heading names
and directs the data under each heading to the correct table in the database. If a heading name is misspelled or missing, it will cause problems when parsing the data
and usually results in the data not entering the database. Contact Jason Ash if there are problems with populating data.
Most data population problems are a result of typos, misspellings, or omission of heading names.
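Because most population problems come from heading names, a data file can be checked before it is uploaded. The sketch below is an illustration only and is not the NAVDAT parser: the heading vocabulary is taken from the list on this page, while the function name `check_headings`, the assumption that a heading sits in the first tab-separated cell of its line, and the similarity cutoff of 0.85 are all hypothetical choices.

```python
import difflib

# Heading name -> whether this page marks it as required.
HEADINGS = {
    "GEOREF DATA": True,
    "TABLE HEADING": False,
    "SAMPLE DATA": True,
    "METHOD DATA": True,
    "STANDARD DATA": False,
    "FRACTIONATION DATA": False,
    "REFERENCE DATA": False,
    "INCLUSION DATA": False,
    "MINERAL DATA": False,
    "NORMALIZATION DATA": False,
    "ALIAS DATA": False,
}


def check_headings(text):
    """Return (found, missing_required, suspected_misspellings) for a data file."""
    found = []
    suspects = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        first_cell = line.split("\t", 1)[0].strip().upper()
        if not first_cell:
            continue
        if first_cell in HEADINGS:
            found.append(first_cell)
            continue
        # A near match to a known heading is probably a typo.
        close = difflib.get_close_matches(first_cell, list(HEADINGS), n=1, cutoff=0.85)
        if close:
            suspects.append((line_number, first_cell, close[0]))
    missing = [name for name, required in HEADINGS.items()
               if required and name not in found]
    return found, missing, suspects
```

A file that reports a suspected misspelling will usually also report the correctly spelled heading as missing, which points directly at the line to fix.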
Major Heading Vocabulary
(Click on the heading name below to learn more)
GEOREF DATA (REQUIRED):
The GEOREF DATA heading contains the reference information for that data file.
All files are required to have the GEOREF DATA section even if the data came from a thesis that was never published.
To learn more on how to submit unpublished data, click here.
GEOREF EXAMPLE
(View Original Data File 2000-073550_Manley.txt)
GEOREF DATA | 2000-073550 |
TI | Timing of volcanism in the Sierra Nevada of California; evidence for Pliocene delamination of the batholithic root? |
AU | Manley-Curtis-R; Glazner-Allen-F; Farmer-G-Lang |
AF | University of North Carolina at Chapel Hill, Department of Geological Sciences, Chapel Hill, NC, United States |
SO | Geology(Boulder). 28; 9, Pages 811-814. 2000. |
PB | Geological Society of America (GSA). Boulder, CO, United States. 2000. |
CP | United-States |
PY | 2000 |
LA | English |
AB | Recent seismic experiments across the southern Sierra Nevada, California, show that the range lacks a thick crustal root. Xenolith studies indicate that delamination and loss of much of the lower crust may have occurred between 10 and 3 Ma. We estimate that delamination occurred ca. 3.5 Ma on the basis of a sudden pulse of mafic potassic magmatism within and just east of the Sierra Nevada from 4 to 3 Ma.... |
DE | absolute-age; Ar-Ar; batholiths-; California-; Cenozoic-; continental-crust; crust-; dates-; delamination-; igneous-rocks... |
CC | 05A-Igneous-and-metamorphic-petrology; 03-Geochronology |
DT | Serial |
BL | Analytic |
NN | With GSA Data Repository Item 200085. |
IL | Refs |
RF | GeoRef, Copyright 2002, American Geological Institute. Reference includes data supplied by the Geological Society of America, Boulder, CO, United States |
IS | 0091-7613 |
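The example above pairs a short field code (TI, AU, PY, and so on) with a value on each row. As a rough illustration of how such a section could be read, the sketch below collects the rows into a dictionary. It assumes that in the tab-delimited file the code is in the first cell and the value in the second, and the function names `parse_georef_section` and `split_authors` are hypothetical, not part of NAVDAT.

```python
def parse_georef_section(lines):
    """Collect the GEOREF DATA heading row and its field rows into a dict."""
    record = {}
    for line in lines:
        cells = [cell.strip() for cell in line.rstrip("\n").split("\t")]
        if len(cells) < 2 or not cells[0]:
            continue  # skip blank lines and lines with no value cell
        code, value = cells[0], cells[1]
        if code == "GEOREF DATA":
            record["GEOREF"] = value  # the GeoRef number follows the heading
        else:
            record[code] = value
    return record


def split_authors(au_value):
    """Split an AU value such as 'Manley-Curtis-R; Glazner-Allen-F' into names."""
    return [name.strip() for name in au_value.split(";") if name.strip()]
```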
TABLE HEADING (OPTIONAL):
The TABLE HEADING section contains the name of the table or
figure within a document where the data were reported.
Table heading data are usually found just below the GEOREF DATA section;
examples of actual submitted data files can be found HERE. Even though
this section is not required, it helps others look up data within a document for correction or reference.
SAMPLE DATA (REQUIRED):
This section contains the majority of the data in the file. It has a strict format that must be followed, but data
are easily added as long as a few guidelines are met. The types of data available for a sample vary, but some data are required
in this section; without them, the data cannot be added to the database. The required data are listed in the table below.
Table of required data for the "SAMPLE DATA" section
SAMPLE NAME |
Name of the sample as reported in the document. This field should be in the first column. If more than one sample name exists (this sometimes
occurs when the field geologist used one set of names and the lab used another), use
the heading ALT_SAMPLE_NAME for the additional sample names. |
STATE |
This field is required to be the last column. The state, province, or territory should be recorded in this category. This field is used for
quality control as well as meta-data searches. The column was added to make sure that sample locations have been checked: if a
sample is reported in one state but the latitude and longitude place it in another, the data are rechecked and corrected. |
LatDD and LongDD |
Sample locations must be reported in decimal degrees for both latitude and longitude. Location data in other formats, such as
township and range or a description of the location (e.g., 400 ft north of I-70 mile marker 257), can also be reported under different headings.
If you are having problems converting locations reported in township and range or other formats, please contact
Doug Walker or Jason Ash for help. This section is required, but
if a data file is ready except for location conversion, please submit the data anyway and the locations will be converted. |
LOC_PREC |
Location precision is used in searches and mapping of the data to show how accurately the sample location was reported. |
AGE or MAX_AGE and MIN_AGE or GEOLOGICAL_AGE |
AGE, or MAX_AGE and MIN_AGE, or GEOLOGICAL_AGE must be reported. It helps to have as many columns describing the age of the sample as possible. If
only the min and max age are reported, then a random age is calculated by the server for the age field that is used for plotting the data. The more
types of age data reported, the easier the data are to find and the more likely a user will trust the reported age. |
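The requirements in the table above can be checked against the column names of a SAMPLE DATA section before submission. This is a minimal sketch under stated assumptions: the function name `check_sample_columns` is hypothetical, the column names are assumed to be spelled exactly as in the table, and whether the NAVDAT parser itself is case-sensitive is not known.

```python
def check_sample_columns(columns):
    """Return a list of problems found in the SAMPLE DATA column names."""
    names = [name.strip() for name in columns]
    problems = []

    if not names or names[0] != "SAMPLE NAME":
        problems.append("SAMPLE NAME must be the first column")
    if not names or names[-1] != "STATE":
        problems.append("STATE must be the last column")

    for required in ("LatDD", "LongDD", "LOC_PREC"):
        if required not in names:
            problems.append(f"missing required column {required}")

    # One of: AGE, or both MAX_AGE and MIN_AGE, or GEOLOGICAL_AGE.
    has_age = (
        "AGE" in names
        or "GEOLOGICAL_AGE" in names
        or ("MAX_AGE" in names and "MIN_AGE" in names)
    )
    if not has_age:
        problems.append("report AGE, or MAX_AGE and MIN_AGE, or GEOLOGICAL_AGE")

    return problems
```

An empty list means the column names satisfy the requirements listed in the table; it says nothing about whether the values in those columns are valid.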
METHOD DATA (REQUIRED):
A detailed description of this section will be added soon.
STANDARD DATA (OPTIONAL):
A detailed description of this section will be added soon.
FRACTIONATION DATA (OPTIONAL):
A detailed description of this section will be added soon.
REFERENCE DATA (OPTIONAL):
A detailed description of this section will be added soon.
INCLUSION DATA (OPTIONAL):
A detailed description of this section will be added soon.
MINERAL DATA (OPTIONAL):
A detailed description of this section will be added soon.
NORMALIZATION DATA (OPTIONAL):
A detailed description of this section will be added soon.
ALIAS DATA (OPTIONAL):
When a sample has been used in more than one publication or database, an alias must be established between them.
An example of a case for alias data is when one paper publishes the major element data and another paper publishes the trace element data.
In some cases the sample name of the rock is changed between publications. To learn more about the ALIAS DATA section,
click here.
How do I name the data file?
When a data file is submitted, it should have a file name starting with the GeoRef number, followed by an underscore and the last name of the first author, and ending with the ".txt" extension. The naming convention looks like "GeoRef number_Last name of the first author.txt", for example "2000-073550_Manley.txt".
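The convention can be expressed as a short pattern check. The sketch below is illustrative only: the function names are hypothetical, and the pattern assumes that a GeoRef number is always four digits, a hyphen, and six digits, which matches the examples on this page but is not stated as a rule.

```python
import re

# Assumption: GeoRef numbers look like "2000-073550" (4 digits, hyphen, 6 digits).
FILE_NAME_PATTERN = re.compile(r"^(\d{4}-\d{6})_(.+)\.txt$")


def data_file_name(georef_number, first_author_last_name):
    """Build a file name such as '2000-073550_Manley.txt'."""
    return f"{georef_number}_{first_author_last_name}.txt"


def split_data_file_name(file_name):
    """Return (georef_number, last_name), or None if the name does not fit."""
    match = FILE_NAME_PATTERN.match(file_name)
    return match.groups() if match else None
```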
If one sample shows up in two references, what do I do?
It is common for one sample to be referred to in more than one reference, and sometimes the name of the
sample is not exactly the same from one reference to another. This often occurs when the major element chemistry was reported in one reference
and the trace element chemistry was reported in another reference. This link can be established when submitting the data by using the
"ALIAS DATA" heading in the data file submitted for each reference. The "ALIAS DATA" heading must be used in every data file involved, not just one,
to ensure the alias is correct and can be verified again at a later date if needed.
The "ALIAS DATA" section should follow the format described below. Because a sample can be referenced in more than two
locations, the local sample name can be entered more than once to allow an alias to be made to numerous references. In the table below, the local "Sample 2" is
located once in "2001-234567" with the sample name "Sample 57" and again in "2000-643857" with the sample name "2".
ALIAS DATA | | |
Internal Sample Name | External GeoRef | External Sample Name |
Sample 1 | 2000-111111 | Sample One |
Sample 2 | 2001-234567 | Sample 57 |
Sample 2 | 2000-643857 | 2 |
Sample 3 | 1999-101010 | Third Sample |
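To make the repeated "Sample 2" rows concrete, the rows of the ALIAS DATA table can be grouped by internal sample name. This is a hedged sketch rather than the way NAVDAT stores aliases: the function name `build_alias_index` is hypothetical, and each row is assumed to hold exactly the three cells shown in the table.

```python
from collections import defaultdict


def build_alias_index(rows):
    """Group (internal name, external GeoRef, external name) rows by internal name."""
    index = defaultdict(list)
    for internal_name, external_georef, external_name in rows:
        index[internal_name.strip()].append(
            (external_georef.strip(), external_name.strip())
        )
    return dict(index)


if __name__ == "__main__":
    rows = [
        ("Sample 1", "2000-111111", "Sample One"),
        ("Sample 2", "2001-234567", "Sample 57"),
        ("Sample 2", "2000-643857", "2"),
        ("Sample 3", "1999-101010", "Third Sample"),
    ]
    for name, aliases in build_alias_index(rows).items():
        print(name, aliases)
```

Each internal sample name maps to a list, so a sample that appears in several references keeps all of its aliases.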
How do I submit unpublished data?
Unpublished data will be accepted into the database as long as the basic meta-data, such as sample name, age, location, and analysis
methods, are also submitted in the data file. Even though the data are not published, the "Reference Data" section should be completed. If the data meet these
requirements, they are submitted just like published data.
I already have a large database of samples, how do I submit it?
If you already have a well-documented database containing hundreds of samples, it is possible for
that data to be loaded into the database as is. Other databases such as DDS-14, PETROS, and the NEW MEXICO geochronological database
have been added to the NAVDAT search interface. The data are required to have basic information such as sample names, ages, and locations
before they can be added to the search interface. Contact Doug Walker or Jason Ash
if you would like to contribute a database or know of a database that should be integrated with the NAVDAT search interface.
My samples do not have their latitude and longitude reported in decimal degrees, is that OK?
NAVDAT data submission files expect latitude and longitude to be reported in decimal degrees.
If you are unable to convert sample locations from a different format, such as township and range, or are having problems doing so, contact
Doug Walker or Jason Ash.
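For locations reported in degrees, minutes, and seconds, the conversion is simple arithmetic: decimal degrees = degrees + minutes/60 + seconds/3600. The sketch below shows this under two assumptions that are not stated on this page: the function name `dms_to_decimal_degrees` is hypothetical, and southern latitudes and western longitudes are written as negative values, which is the common convention. Township and range locations cannot be converted by a formula like this and need a land survey reference.

```python
def dms_to_decimal_degrees(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees, minutes, seconds, and a hemisphere letter to decimal degrees."""
    hemisphere = hemisphere.upper()
    if hemisphere not in {"N", "S", "E", "W"}:
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in the range 0 to 60")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    # Assumption: south and west are reported as negative decimal degrees.
    return -value if hemisphere in {"S", "W"} else value


if __name__ == "__main__":
    print(dms_to_decimal_degrees(36, 30, 0, "N"))
    print(dms_to_decimal_degrees(118, 15, 0, "W"))
```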
What is an "Institution Number" and how do I find it?
The institution number identifies the name and location of the lab where the analysis of the sample took place.
The institution number is referenced in the METHOD DATA section of submitted data files. Click here for more information
about the METHOD DATA section of data files. Institution
numbers are used to establish a fixed vocabulary so that the institution can be used in meta-data searches. If
you need to get an institution number click here, or if you are unable to find an
institution in the database, please contact Doug Walker or Jason Ash.
Last update 3-28-2006