Geospatial Data
How The Dataverse Software Ingests Shapefiles
A shapefile is a set of files, often uploaded/transferred in .zip format. This set may contain up to fifteen files. A minimum of three specific files (.shp, .shx, .dbf) are needed to be a valid shapefile and a fourth file (.prj) is required for some applications – or any type of meaningful visualization.
For ingest, four files are the minimum required:
- .shp- shape format; the feature geometry itself
- .shx- shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly
- .dbf- attribute format; columnar attributes for each shape, in dBase IV format
- .prj- projection format; the coordinate system and projection information, a plain text file describing the projection using well-known text format
Ingest
When uploaded to a Dataverse installation, the .zip is unpacked (same as all .zip files). Shapefile sets are recognized by the same base name and specific extensions. These individual files constitute a shapefile set. The first four are the minimum required (.shp, .shx, .dbf, .prj)
For example:
- bicycles.shp (required extension) 
- bicycles.shx (required extension) 
- bicycles.prj (required extension) 
- bicycles.dbf (required extension) 
- bicycles.sbx (NOT required extension) 
- bicycles.sbn (NOT required extension) 
Upon recognition of the four required files, the Dataverse installation will group them as well as any other relevant files into a shapefile set. Files with these extensions will be included in the shapefile set:
- Required: - .shp,- .shx,- .dbf,- .prj
- Optional: - .sbn,- .sbx,- .fbn,- .fbx,- .ain,- .aih,- .ixs,- .mxs,- .atx,- .cpg,- .qpj,- .qmd,- shp.xml
Then the Dataverse installation creates a new .zip with mimetype as a shapefile. The shapefile set will persist as this new .zip.
Example
1a. Original .zip contents:
A file named bikes_and_subways.zip is uploaded to the Dataverse installation. This .zip contains the following files.
- bicycles.shp(shapefile set #1)
- bicycles.shx(shapefile set #1)
- bicycles.prj(shapefile set #1)
- bicycles.dbf(shapefile set #1)
- bicycles.sbx(shapefile set #1)
- bicycles.sbn(shapefile set #1)
- bicycles.txt
- the_bikes.md
- readme.txt
- subway_line.shp(shapefile set #2)
- subway_line.shx(shapefile set #2)
- subway_line.prj(shapefile set #2)
- subway_line.dbf(shapefile set #2)
1b. The Dataverse installation unzips and re-zips files:
Upon ingest, the Dataverse installation unpacks the file bikes_and_subways.zip. Upon recognizing the shapefile sets, it groups those files together into new .zip files:
- files making up the “bicycles” shapefile become a new - .zip
- files making up the “subway_line” shapefile become a new - .zip
- remaining files will stay as they are 
To ensure that a shapefile set remains intact, individual files such as bicycles.sbn are kept in the set – even though they are not used for mapping.
1c. The Dataverse installation final file listing:
- bicycles.zip(contains shapefile set #1:- bicycles.shp,- bicycles.shx,- bicycles.prj,- bicycles.dbf,- bicycles.sbx,- bicycles.sbn)
- bicycles.txt(separate, not part of a shapefile set)
- the_bikes.md(separate, not part of a shapefile set)
- readme.txt(separate, not part of a shapefile set)
- subway_line.zip(contains shapefile set #2:- subway_line.shp,- subway_line.shx,- subway_line.prj,- subway_line.dbf)
For two “final” shapefile sets, bicycles.zip and subway_line.zip, a new mimetype is used:
- Mimetype: - application/zipped-shapefile
- Mimetype Label: “Shapefile as ZIP Archive”