Models

Data Model Specification: https://github.com/compilerla/los-angeles-data-sources

Referenced spreadsheet for modeling: https://docs.google.com/spreadsheets/d/1uNtA4GbBwky8PPdNUvmXXZCI1GLtH5cGF-Q0FqD90w0/edit#gid=612549376

Also see:

  • https://project-open-data.cio.gov/v1.1/schema/
  • https://www.w3.org/TR/vocab-dcat/

Notes:

The spreadsheet and the DCAT standard are not one-to-one, but the differences can be detected and adapted. The major discrepancy is the flattening of distributions into a URL and a descriptor. The system may be configured to use the vendor field of the dataset's portal to load a driver. A driver is responsible for populating distribution instances. The default is a direct link driver.
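
The vendor-to-driver idea above can be sketched as a small registry. This is only an illustration under stated assumptions: `DistributionStub`, `DRIVERS`, `register_driver`, `load_driver` and the "example-vendor" driver are hypothetical names, not part of `apps.datasets`, and the two-format behaviour of the example driver is invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DistributionStub:
    """Hypothetical stand-in for a Distribution row (not the Django model)."""
    access_url: str
    format: str = ""


Driver = Callable[[str], List[DistributionStub]]

# Hypothetical registry: portal vendor value -> driver callable.
DRIVERS: Dict[str, Driver] = {}


def register_driver(vendor: str) -> Callable[[Driver], Driver]:
    """Decorator that records a driver under a vendor name."""
    def decorator(func: Driver) -> Driver:
        DRIVERS[vendor] = func
        return func
    return decorator


def direct_link_driver(url: str) -> List[DistributionStub]:
    """Default driver: the URL itself is the only distribution."""
    return [DistributionStub(access_url=url)]


@register_driver("example-vendor")
def example_vendor_driver(url: str) -> List[DistributionStub]:
    # Invented behaviour for illustration: one resource, two formats.
    return [
        DistributionStub(access_url=f"{url}.csv", format="csv"),
        DistributionStub(access_url=f"{url}.json", format="json"),
    ]


def load_driver(vendor: Optional[str]) -> Driver:
    """Pick the driver for a vendor, falling back to the direct link driver."""
    return DRIVERS.get(vendor or "", direct_link_driver)
```

With this shape, a portal whose vendor has no registered driver still yields one distribution pointing at the original link.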

class apps.datasets.models.CatalogRecord(*args, **kwargs)

Populated from the SCDC data requirements doc. Additional context comes from fields scraped from the spreadsheet Datasets.

https://asset1.basecamp.com/2345560/projects/13740930/attachments/275923420/44a8dc7c4f5133ba3e56252eb2e4400f0010/original/SCDC%20Presentation%20+%20data%20requirements%202-27-2017.pdf

This is the data maintained by humans. It represents a linkage to an external dataset.

dcat:CatalogRecord

Parameters:
  • id (AutoField) – Id
  • title (CharField) – A name given to the dataset.
  • state (FSMField) – State
  • curated_collection (CharField) – Collections were developed for CCF, following requested thematic topics and existing focus areas. Note: At a future date this field will meet requirements for optional “theme” adopted by USPRO and DCAT.
  • description (TextField) – Free-text account of the dataset.
  • keyword (TextField) – A keyword or tag describing the dataset.
  • modified (DateTimeField) – Most recent date on which the dataset was changed, updated or modified.
  • publisher (ForeignKey to Publisher) – An entity responsible for making the dataset available (may not be responsible for collecting the data).
  • contact_point (TextField) – All relevant contact information (including name and email) for the person(s) to whom questions about the dataset should be sent.
  • identifier (CharField) – A unique identifier of the dataset.
  • access_level (CharField) – The degree to which this dataset could be made publicly-available, regardless of whether it has been made available. Choices: public (Data asset is or could be made publicly available to all without restrictions), restricted public (Data asset is available under certain use restrictions), or non-public (Data asset is not available to members of the public).
  • license (TextField) – This links to the license document under which the distribution is made available.
  • rights (CharField) – Information about rights held in and over the distribution.
  • spatial (TextField) – Spatial coverage of the dataset.
  • spatial_granularity (TextField) – Sub field of spatial coverage, required where applicable.
  • spatial_entity (ForeignKey to SpatialEntity) – Spatial entity
  • spatial_geometry (GeometryField) – Spatial geometry describing the coverage of the dataset.
  • temporal (TextField) – The temporal period that the dataset covers.
  • sync_strategy (CharField) – Plugin for automatically syncing metadata
  • sync_url (CharField) – Detected sync strategy url
  • distribution (CharField) – Available distributions, or specific data formats (ex: csv, Socrata API); type(s) of format(s)
  • distribution_fields (CharField) – URL of most commonly accessed distribution
  • accrual_periodicity (TextField) – The frequency at which dataset is published.
  • reports_to (TextField) – All legislation that requires or informs the collection and reporting of these data points.
  • collection_protocol (TextField) – Description of the frequency and mode of data collection (different from periodicity of dataset publication). Links to original data collection plan or proposals may also be added here.
  • conforms_to (TextField) – Data standard dataset meets.
  • described_by (CharField) – Machine readable documentation (typically used for APIs)
  • described_by_type (CharField) – Machine readable documentation type (typically used for APIs)
  • is_part_of (TextField) – The collection of which the dataset is a subset
  • issued (DateTimeField) – Date of formal issuance (e.g., publication) of the dataset.
  • language (CharField) – Language
  • landing_page (URLField) – A Web page that can be navigated to in a Web browser to gain access to the dataset, its distributions and/or additional information.
  • funded_by (TextField) – All groups and/or individuals that financially support the collection of this dataset. These entities may be the same or different from the dataset’s publisher
  • notes (TextField) – For use by SCDC project only
  • created_at (DateTimeField) – Created at
  • updated_at (DateTimeField) – Updated at
  • submitted_by (ForeignKey to User) – Submitted by
  • approved_by (ForeignKey to User) – Approved by
  • _percentage_complete (FloatField) – % complete
  • concepts (ManyToManyField) – The main category of the dataset. A dataset can have multiple themes.
  • tags (TaggableManager) – A comma-separated list of tags.
  • tagged_items (GenericRelation) – Tagged items
  • actor_actions (GenericRelation) – Actor actions
  • target_actions (GenericRelation) – Target actions
  • action_object_actions (GenericRelation) – Action object actions
lookup(local_name, dset_name=None)

Look up a single value across the catalog record and its dataset. The dataset provides any defaults the catalog record does not.
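
A minimal sketch of this fallback, written as a free function over plain objects rather than the real model method. It assumes that the dataset is reachable as `record.dataset`, that `dset_name` names the dataset attribute when it differs from `local_name`, and that empty strings count as missing; none of these details are confirmed by the source.

```python
from types import SimpleNamespace


def lookup(record, local_name, dset_name=None):
    """Return the record's own value, else the dataset's value as a default."""
    value = getattr(record, local_name, None)
    if value not in (None, ""):
        return value
    # Assumption: the one-to-one dataset hangs off the record as `dataset`.
    dataset = getattr(record, "dataset", None)
    if dataset is None:
        return None
    return getattr(dataset, dset_name or local_name, None)


# Usage with stand-in objects instead of Django model instances.
dataset = SimpleNamespace(title="Title from the dataset", theme="Health")
record = SimpleNamespace(title="", dataset=dataset)
fallback_title = lookup(record, "title")
fallback_theme = lookup(record, "concepts", dset_name="theme")
```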

match_concepts

Concepts that should match during search. Does not select child concepts.

Expands concepts in the following order:

  • alternative ancestors (ancestors, ancestors alt parents)
  • search matched

Search expansion ordering is limited by performance.
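
One possible reading of the ancestor expansion, sketched over plain dictionaries instead of the concept models. The `PARENT` and `ALT_PARENTS` mappings, the sample concept names, and the choice to list the tagged concept first are all assumptions made for illustration; the search-matched step is left out.

```python
# Hypothetical concept hierarchy: primary parent and alternative parents.
PARENT = {
    "Water Quality": "Environment",
    "Environment": None,
    "Public Health": "Health",
    "Health": None,
}
ALT_PARENTS = {"Water Quality": ["Public Health"]}


def ancestors(concept):
    """Walk primary parents upward; never descends into children."""
    chain = []
    current = PARENT.get(concept)
    while current is not None:
        chain.append(current)
        current = PARENT.get(current)
    return chain


def match_concepts(tagged):
    """Return tagged concepts, then ancestors, then alternative parents."""
    ordered = []

    def add(concept):
        if concept not in ordered:
            ordered.append(concept)

    for concept in tagged:
        lineage = [concept] + ancestors(concept)
        for node in lineage:
            add(node)
        # Alternative parents of the concept and of each ancestor.
        for node in lineage:
            for alt in ALT_PARENTS.get(node, []):
                add(alt)
    return ordered
```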

related_concepts

The intent is to query all related concepts, but “related” seems to work as the inverse of search. Right now we query descendant concepts instead of ancestors, meaning a dataset tagged with “Health and Human Services” will relate to a story about water. In all likelihood we may want to query both descendants and ancestors for related concepts but prefer a particular direction.
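
The descendant direction described above can be illustrated with a breadth-first walk over a toy hierarchy. The `CHILDREN` mapping and the concept names below are invented for the example and do not reflect the project's actual concept tree.

```python
from collections import deque

# Hypothetical hierarchy: concept -> direct children.
CHILDREN = {
    "Health and Human Services": ["Public Health"],
    "Public Health": ["Water Quality"],
}


def descendants(concept):
    """Collect all concepts below `concept`, nearest levels first."""
    found = []
    queue = deque(CHILDREN.get(concept, []))
    while queue:
        node = queue.popleft()
        if node not in found:
            found.append(node)
            queue.extend(CHILDREN.get(node, []))
    return found
```

In this toy tree a record tagged with the broad concept reaches the water-related concept, while a record tagged only with the narrow concept relates to nothing further down.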

run_sync_strategy(sync_strategy=None, url=None)

Load & run the associated sync strategy

Simply returns if none is defined.
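
A rough sketch of the load-and-run behaviour, assuming the `sync_strategy` field stores a dotted import path to a callable that takes the record and a URL. The storage format and the callable's signature are assumptions; the real plugin mechanism may differ.

```python
import importlib


def run_sync_strategy(record, sync_strategy=None, url=None):
    """Load the strategy by dotted path and run it; return None if unset."""
    strategy_path = sync_strategy or getattr(record, "sync_strategy", None)
    if not strategy_path:
        # No strategy defined: simply return.
        return None
    # Assumption: the path looks like "package.module.callable".
    module_name, _, func_name = strategy_path.rpartition(".")
    strategy = getattr(importlib.import_module(module_name), func_name)
    return strategy(record, url or getattr(record, "sync_url", None))
```

Explicit `sync_strategy` and `url` arguments take precedence over the values stored on the record, which matches the optional parameters in the signature.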

class apps.datasets.models.DataPortal(*args, **kwargs)

Fields scraped from spreadsheet Data Portals

dcat:Catalog

Typically, a web-based data catalog is represented as a single instance of this class.

Parameters:
  • id (AutoField) – Id
  • status (CharField) – Status
  • publisher (ForeignKey to Publisher) – Publisher
  • url (URLField) – Url
  • title (CharField) – Title
  • vendor (CharField) – Vendor
  • datasets_estimate (IntegerField) – Datasets estimate
  • license (TextField) – License
  • spatial_entity (ForeignKey to SpatialEntity) – Spatial entity
  • notes (TextField) – Notes
class apps.datasets.models.Dataset(*args, **kwargs)

This class represents the actual dataset as published by the dataset publisher. In cases where a distinction between the actual dataset and its entry in the catalog is necessary (because metadata such as modification date and maintainer might differ), the catalog record class can be used for the latter.

dcat:Dataset + api meta data

https://www.w3.org/TR/2013/WD-vocab-dcat-20130312/

This is where we sync metadata with external APIs

Parameters:
  • id (AutoField) – Id
  • catalog_record (OneToOneField to CatalogRecord) – Catalog record
  • title (CharField) – A name given to the dataset.
  • description (TextField) – Free-text account of the dataset.
  • issued (DateTimeField) – Date of formal issuance (e.g., publication) of the dataset.
  • modified (DateTimeField) – Most recent date on which the dataset was changed, updated or modified.
  • identifier (CharField) – A unique identifier of the dataset.
  • keyword (TextField) – A keyword or tag describing the dataset.
  • language (CharField) – The language of the dataset.
  • temporal (TextField) – The temporal period that the dataset covers.
  • spatial (TextField) – Spatial coverage of the dataset.
  • accrual_periodicity (TextField) – The frequency at which dataset is published.
  • landing_page (URLField) – A Web page that can be navigated to in a Web browser to gain access to the dataset, its distributions and/or additional information.
  • theme (TextField) – The main category of the dataset. A dataset can have multiple themes.
  • publisher (TextField) – An entity responsible for making the dataset available.
  • contact_point (TextField) – Contact point
  • created_at (DateTimeField) – Created at
  • updated_at (DateTimeField) – Updated at
  • last_sync (DateTimeField) – Last time the data automatically synced
  • sourced_meta_data (JSONField) – Sourced meta data
class apps.datasets.models.DatasetURL(*args, **kwargs)

Keeps track of dataset urls in the system.

  • associate datasets not yet registered in the system
  • track old urls
  • search & dedupe catalog records
Parameters:
  • id (AutoField) – Id
  • catalog_record (ForeignKey to CatalogRecord) – Catalog record
  • url (URLField) – Url
attempt_catalog_record_sync()

Attempts to create a catalog record for this URL. The record will be saved if the sync is successful. None will be returned if no successful sync took place.
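
The save-on-success, None-on-failure contract might look like the sketch below. `RecordStub` and the list of `detectors` are hypothetical stand-ins for the catalog record model and the sync strategies; how the real method detects a strategy is not stated in the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class RecordStub:
    """Hypothetical stand-in for CatalogRecord."""
    title: str
    sync_url: str
    saved: bool = False

    def save(self) -> None:
        self.saved = True


# A detector returns metadata for a URL it understands, else None.
Detector = Callable[[str], Optional[Dict[str, str]]]


def attempt_catalog_record_sync(url: str, detectors: Iterable[Detector]) -> Optional[RecordStub]:
    """Return a saved record from the first detector that succeeds, else None."""
    for detect in detectors:
        metadata = detect(url)
        if metadata:
            record = RecordStub(title=metadata["title"], sync_url=url)
            record.save()  # Only persisted after a successful sync.
            return record
    return None
```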

class apps.datasets.models.DatasetsCustomPluginModel(id, path, depth, numchild, placeholder, parent, position, language, plugin_type, creation_date, changed_date, cmsplugin_ptr, title, show_title)
Parameters:
  • id (AutoField) – Id
  • path (CharField) – Path
  • depth (PositiveIntegerField) – Depth
  • numchild (PositiveIntegerField) – Numchild
  • placeholder (ForeignKey to Placeholder) – Placeholder
  • parent (ForeignKey to CMSPlugin) – Parent
  • position (PositiveSmallIntegerField) – Position
  • language (CharField) – Language
  • plugin_type (CharField) – Plugin_name
  • creation_date (DateTimeField) – Creation date
  • changed_date (DateTimeField) – Changed date
  • cmsplugin_ptr (OneToOneField to CMSPlugin) – Cmsplugin ptr
  • title (CharField) – Title
  • show_title (BooleanField) – Show title
  • datasets (ManyToManyField) – Datasets
class apps.datasets.models.DatasetsGroupPluginModel(id, path, depth, numchild, placeholder, parent, position, language, plugin_type, creation_date, changed_date, cmsplugin_ptr, title, link, show_title)
Parameters:
  • id (AutoField) – Id
  • path (CharField) – Path
  • depth (PositiveIntegerField) – Depth
  • numchild (PositiveIntegerField) – Numchild
  • placeholder (ForeignKey to Placeholder) – Placeholder
  • parent (ForeignKey to CMSPlugin) – Parent
  • position (PositiveSmallIntegerField) – Position
  • language (CharField) – Language
  • plugin_type (CharField) – Plugin_name
  • creation_date (DateTimeField) – Creation date
  • changed_date (DateTimeField) – Changed date
  • cmsplugin_ptr (OneToOneField to CMSPlugin) – Cmsplugin ptr
  • title (CharField) – Title
  • link (CharField) – Link
  • show_title (BooleanField) – Show title
  • datasets (ManyToManyField) – Datasets
class apps.datasets.models.DatasetsPluginModel(id, path, depth, numchild, placeholder, parent, position, language, plugin_type, creation_date, changed_date, cmsplugin_ptr)
Parameters:
  • id (AutoField) – Id
  • path (CharField) – Path
  • depth (PositiveIntegerField) – Depth
  • numchild (PositiveIntegerField) – Numchild
  • placeholder (ForeignKey to Placeholder) – Placeholder
  • parent (ForeignKey to CMSPlugin) – Parent
  • position (PositiveSmallIntegerField) – Position
  • language (CharField) – Language
  • plugin_type (CharField) – Plugin_name
  • creation_date (DateTimeField) – Creation date
  • changed_date (DateTimeField) – Changed date
  • cmsplugin_ptr (OneToOneField to CMSPlugin) – Cmsplugin ptr
  • datasets (ManyToManyField) – Datasets
class apps.datasets.models.DatasourceSuggestion(id, state, submission, submitted_by)
Parameters:
  • id (AutoField) – Id
  • state (FSMField) – State
  • submission (TextField) – Submission
  • submitted_by (ForeignKey to User) – Submitted by
class apps.datasets.models.Distribution(*args, **kwargs)

Fields defined from: https://www.w3.org/TR/vocab-dcat/#class-distribution

This model should be autopopulated by a sync task

dcat:Distribution

Parameters:
  • id (AutoField) – Id
  • title (CharField) – Title
  • description (TextField) – Description
  • issued (DateTimeField) – Date of formal issuance (e.g., publication) of the distribution.
  • modified (DateTimeField) – Most recent date on which the distribution was changed, updated or modified.
  • license (TextField) – This links to the license document under which the distribution is made available.
  • rights (TextField) – Information about rights held in and over the distribution.
  • access_url (URLField) – A landing page, feed, SPARQL endpoint or other type of resource that gives access to the distribution of the dataset
  • download_url (URLField) – A file that contains the distribution of the dataset in a given format
  • byte_size (PositiveIntegerField) – The size of a distribution in bytes.
  • media_type (CharField) – The media type of the distribution as defined by IANA.
  • format (CharField) – The file format of the distribution.
  • dataset (ForeignKey to Dataset) – Dataset
class apps.datasets.models.Publisher(*args, **kwargs)

Fields scraped from spreadsheet Publishers

Also known as Content Contributor. Shows up as dct:publisher. Must be able to export: http://xmlns.com/foaf/spec/#term_Person

Parameters:
  • id (AutoField) – Id
  • path (CharField) – Path
  • depth (PositiveIntegerField) – Depth
  • numchild (PositiveIntegerField) – Numchild
  • name (CharField) – Name
  • slug (SlugField) – appears in the url
  • agency_type (CharField) – Agency type
  • agency_url (URLField) – Agency url
  • primary_data_portal (URLField) – Primary data portal
  • body (RichTextUploadingField) – Body
  • description (TextField) – Description
set_sub_organization_of(name)

Sets the organizational parent by name

class apps.datasets.models.RecordColumn(*args, **kwargs)

Describes a column belonging to a dataset

Parameters:
  • id (AutoField) – Id
  • catalog_record (ForeignKey to CatalogRecord) – Catalog record
  • field_name (CharField) – Field name
  • label (CharField) – Label
  • description (TextField) – Description
  • data_type (CharField) – Data type
  • render_type (CharField) – Render type
  • concept (ForeignKey to Concept) – Concept
  • _order (OrderWrt) – order
class apps.datasets.models.SpatialEntity(id, name, geometry, granularity, data)
Parameters:
  • id (AutoField) – Id
  • name (CharField) – Name
  • geometry (GeometryField) – Geometry
  • granularity (TextField) – Granularity
  • data (HStoreField) – Data