Models
Data Model Specification: https://github.com/compilerla/los-angeles-data-sources
Referenced spreadsheet for modeling: https://docs.google.com/spreadsheets/d/1uNtA4GbBwky8PPdNUvmXXZCI1GLtH5cGF-Q0FqD90w0/edit#gid=612549376
Also see:
- https://project-open-data.cio.gov/v1.1/schema/
- https://www.w3.org/TR/vocab-dcat/
Notes:
The spreadsheet and the DCAT standard do not map one to one, but the differences can be adapted and detected. The major discrepancy is the spreadsheet’s flattening of distributions into a URL and a descriptor. The system may be configured to use the vendor field of the dataset’s portal to load a driver. A driver would be responsible for populating Distribution instances. The default is a direct link driver.
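A minimal sketch of the driver selection described in this note, assuming drivers are plain callables kept in a registry keyed by the portal's vendor string. The names `DistributionLink`, `Driver`, `DRIVERS`, `direct_link_driver`, and `get_driver` are hypothetical and do not come from the codebase; storing the direct link as `access_url` is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DistributionLink:
    """Hypothetical stand-in for a Distribution row (title and URL only)."""
    title: str
    access_url: str


# Hypothetical driver signature: the flattened spreadsheet values
# (descriptor, URL) go in, a list of distributions comes out.
Driver = Callable[[str, str], List[DistributionLink]]


def direct_link_driver(descriptor: str, url: str) -> List[DistributionLink]:
    """Default driver: the spreadsheet URL becomes a single distribution."""
    return [DistributionLink(title=descriptor, access_url=url)]


# Registry keyed by a normalized DataPortal.vendor value; empty until
# vendor-specific drivers are registered.
DRIVERS: Dict[str, Driver] = {}


def get_driver(vendor: Optional[str]) -> Driver:
    """Choose a driver from the portal's vendor, falling back to direct links."""
    return DRIVERS.get((vendor or "").strip().lower(), direct_link_driver)
```

Registering a vendor-specific driver would then be a single assignment such as `DRIVERS["socrata"] = some_driver` (both names illustrative); an unknown or empty vendor keeps the direct-link default.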
class apps.datasets.models.CatalogRecord(*args, **kwargs)
Populated from the SCDC data requirements doc, with additional context from fields scraped from the spreadsheet’s Datasets sheet.
This is the data maintained by humans. It represents a linkage to an external dataset.
dcat:CatalogRecord
Parameters:
- id (AutoField) – Id
- title (CharField) – A name given to the dataset.
- state (FSMField) – State
- curated_collection (CharField) – Collections were developed for CCF, following requested thematic topics and existing focus areas. Note: At a future date this field will meet the requirements for the optional “theme” field adopted by USPRO and DCAT.
- description (TextField) – A free-text account of the dataset.
- keyword (TextField) – A keyword or tag describing the dataset.
- modified (DateTimeField) – Most recent date on which the dataset was changed, updated or modified.
- publisher (ForeignKey to Publisher) – An entity responsible for making the dataset available (may not be responsible for collecting the data).
- contact_point (TextField) – All relevant contact information (including name and email) for the person(s) to whom questions about the dataset should be sent.
- identifier (CharField) – A unique identifier of the dataset.
- access_level (CharField) – The degree to which this dataset could be made publicly available, regardless of whether it has been made available. Choices: public (Data asset is or could be made publicly available to all without restrictions), restricted public (Data asset is available under certain use restrictions), or non-public (Data asset is not available to members of the public).
- license (TextField) – This links to the license document under which the distribution is made available.
- rights (CharField) – Information about rights held in and over the distribution.
- spatial (TextField) – Spatial coverage of the dataset.
- spatial_granularity (TextField) – Sub field of spatial coverage, required where applicable.
- spatial_entity (ForeignKey to SpatialEntity) – Spatial entity
- spatial_geometry (GeometryField) – Spatial geometry describing the coverage of the dataset.
- temporal (TextField) – The temporal period that the dataset covers.
- sync_strategy (CharField) – Plugin for automatically syncing metadata
- sync_url (CharField) – Detected sync strategy url
- distribution (CharField) – Available distributions, or specific data formats (e.g., csv, Socrata API); type(s) of format(s)
- distribution_fields (CharField) – URL of most commonly accessed distribution
- accrual_periodicity (TextField) – The frequency at which the dataset is published.
- reports_to (TextField) – All legislation that requires or informs the collection and reporting of these data points.
- collection_protocol (TextField) – Description of the frequency and mode of data collection (different from periodicity of dataset publication). Links to original data collection plan or proposals may also be added here.
- conforms_to (TextField) – The data standard the dataset meets.
- described_by (CharField) – Machine readable documentation (typically used for APIs)
- described_by_type (CharField) – Machine readable documentation type (typically used for APIs)
- is_part_of (TextField) – The collection of which the dataset is a subset
- issued (DateTimeField) – Date of formal issuance (e.g., publication) of the dataset.
- language (CharField) – Language
- landing_page (URLField) – A Web page that can be navigated to in a Web browser to gain access to the dataset, its distributions and/or additional information.
- funded_by (TextField) – All groups and/or individuals that financially support the collection of this dataset. These entities may be the same or different from the dataset’s publisher
- notes (TextField) – For use by SCDC project only
- created_at (DateTimeField) – Created at
- updated_at (DateTimeField) – Updated at
- submitted_by (ForeignKey to User) – Submitted by
- approved_by (ForeignKey to User) – Approved by
- _percentage_complete (FloatField) – % complete
- concepts (ManyToManyField) – The main category of the dataset. A dataset can have multiple themes.
- tags (TaggableManager) – A comma-separated list of tags.
- tagged_items (GenericRelation) – Tagged items
- actor_actions (GenericRelation) – Actor actions
- target_actions (GenericRelation) – Target actions
- action_object_actions (GenericRelation) – Action object actions
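If the `access_level` choices are stored as the literal strings listed above (an assumption; the actual stored values are not shown here), free-text spreadsheet input could be normalized against them roughly as follows. `AccessLevel`, `ACCESS_LEVEL_CHOICES`, and `parse_access_level` are illustrative names only.

```python
from enum import Enum


class AccessLevel(str, Enum):
    """Hypothetical enum mirroring the three documented access_level choices."""

    PUBLIC = "public"
    RESTRICTED_PUBLIC = "restricted public"
    NON_PUBLIC = "non-public"


# Django-style (value, label) pairs, e.g. for a CharField's `choices`.
ACCESS_LEVEL_CHOICES = [(level.value, level.value) for level in AccessLevel]


def parse_access_level(raw: str) -> AccessLevel:
    """Normalize case and extra whitespace, then match a documented choice.

    Raises ValueError for anything outside the three documented values.
    """
    normalized = " ".join(raw.strip().lower().split())
    return AccessLevel(normalized)
```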
lookup(local_name, dset_name=None)
Look up a single value across the catalog record and its dataset. The dataset provides any defaults the catalog record does not.
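The fallback behavior of `lookup` can be sketched in plain Python. This is not the actual implementation: it assumes the dataset is reachable as `record.dataset` (the reverse accessor Django creates by default for the `catalog_record` one-to-one field on `Dataset`), that `dset_name` defaults to `local_name`, and that `None` or an empty string means the catalog record has no value.

```python
from types import SimpleNamespace
from typing import Any, Optional


def lookup(record: Any, local_name: str, dset_name: Optional[str] = None) -> Any:
    """Return the catalog record's value, or fall back to its dataset's value.

    Assumptions: the dataset is reachable as `record.dataset`, `dset_name`
    defaults to `local_name`, and None or "" mean "not provided".
    """
    value = getattr(record, local_name, None)
    if value not in (None, ""):
        return value
    dataset = getattr(record, "dataset", None)
    if dataset is None:
        return value
    return getattr(dataset, dset_name or local_name, None)


# Example: the curated title wins; the empty description falls back to the dataset.
record = SimpleNamespace(
    title="Curated title",
    description="",
    dataset=SimpleNamespace(title="Portal title", description="From the portal API"),
)
```

`dset_name` covers fields that are named differently on the two models; in this sketch, `lookup(record, "concepts", dset_name="theme")` would read `theme` from the dataset.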
match_concepts
Concepts that should match during search. Does not select child concepts.
Expands concepts in the following order:
- alternative ancestors (ancestors, ancestors alt parents)
- search matched
Search expansion ordering is limited by performance.
The intent is to query all related concepts, but “related” seems to work in the inverse direction of search. Right now we query descendant concepts instead of ancestors, meaning a dataset tagged with “Health and Human Services” will relate to a story about water. In all likelihood we want to query both descendants and ancestors for related concepts, but prefer a particular direction.
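The two expansion directions discussed above can be illustrated with a small, hypothetical in-memory concept graph. This sketch only covers the ancestor step (following alternative parents at every level); the "search matched" step and the real Concept model are not modeled, and all names and concepts are illustrative.

```python
from collections import deque
from typing import Dict, Iterable, List


def walk_concepts(start: Iterable[str], edges: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk from `start` along `edges`; returns concepts in discovery order."""
    seen = set()
    order: List[str] = []
    queue = deque(start)
    while queue:
        concept = queue.popleft()
        if concept in seen:
            continue
        seen.add(concept)
        order.append(concept)
        queue.extend(edges.get(concept, []))
    return order


def invert(parents: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Turn a child -> parents map into a parent -> children map."""
    children: Dict[str, List[str]] = {}
    for child, parent_list in parents.items():
        for parent in parent_list:
            children.setdefault(parent, []).append(child)
    return children


# Hypothetical concept tree: each concept lists its primary parent first,
# then any alternative parents.
PARENTS = {
    "Drinking Water": ["Water", "Public Health"],
    "Water": ["Environment"],
    "Public Health": ["Health and Human Services"],
}

# Ancestor direction (what match_concepts describes): walk up, never down.
ancestors = walk_concepts(["Drinking Water"], PARENTS)
# Descendant direction (what the note says "related" currently does).
descendants = walk_concepts(["Health and Human Services"], invert(PARENTS))
```

In this toy graph the descendant walk from "Health and Human Services" reaches "Drinking Water", which reproduces the water example from the note.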
run_sync_strategy(sync_strategy=None, url=None)
Load and run the associated sync strategy.
Simply returns if none is defined.
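A rough sketch of `run_sync_strategy`, assuming strategies are looked up by name in an in-memory registry and that explicit arguments override the record's `sync_strategy` and `sync_url` fields. `SYNC_STRATEGIES`, `register_strategy`, `RecordStub`, and the `direct_link` strategy are hypothetical; the real plugin-loading mechanism is not described here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical registry: strategy name -> callable that takes a URL and
# returns a dict of synced metadata.
SYNC_STRATEGIES: Dict[str, Callable[[Optional[str]], dict]] = {}


def register_strategy(name: str):
    """Decorator that adds a strategy function to the registry under `name`."""
    def decorator(fn):
        SYNC_STRATEGIES[name] = fn
        return fn
    return decorator


@dataclass
class RecordStub:
    """Stand-in for the sync-related CatalogRecord fields."""
    sync_strategy: Optional[str] = None
    sync_url: Optional[str] = None


def run_sync_strategy(record: RecordStub, sync_strategy: Optional[str] = None,
                      url: Optional[str] = None) -> Optional[dict]:
    """Load and run a sync strategy; return None when none is defined.

    Explicit arguments override the values stored on the record.
    An unregistered strategy name raises KeyError in this sketch.
    """
    name = sync_strategy or record.sync_strategy
    if not name:
        return None
    strategy = SYNC_STRATEGIES[name]
    return strategy(url or record.sync_url)


@register_strategy("direct_link")
def direct_link(url: Optional[str]) -> dict:
    # Illustrative strategy: treat the URL itself as the landing page.
    return {"landing_page": url}
```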
class apps.datasets.models.DataPortal(*args, **kwargs)
Fields scraped from the spreadsheet’s Data Portals sheet.
dcat:Catalog
Typically, a web-based data catalog is represented as a single instance of this class.
Parameters:
- id (AutoField) – Id
- status (CharField) – Status
- publisher (ForeignKey to Publisher) – Publisher
- url (URLField) – Url
- title (CharField) – Title
- vendor (CharField) – Vendor
- datasets_estimate (IntegerField) – Datasets estimate
- license (TextField) – License
- spatial_entity (ForeignKey to SpatialEntity) – Spatial entity
- notes (TextField) – Notes
class apps.datasets.models.Dataset(*args, **kwargs)
This class represents the actual dataset as published by the dataset publisher. In cases where a distinction between the actual dataset and its entry in the catalog is necessary (because metadata such as modification date and maintainer might differ), the catalog record class can be used for the latter.
dcat:Dataset + API metadata
https://www.w3.org/TR/2013/WD-vocab-dcat-20130312/
This is where we sync metadata with external APIs.
Parameters:
- id (AutoField) – Id
- catalog_record (OneToOneField to CatalogRecord) – Catalog record
- title (CharField) – A name given to the dataset.
- description (TextField) – A free-text account of the dataset.
- issued (DateTimeField) – Date of formal issuance (e.g., publication) of the dataset.
- modified (DateTimeField) – Most recent date on which the dataset was changed, updated or modified.
- identifier (CharField) – A unique identifier of the dataset.
- keyword (TextField) – A keyword or tag describing the dataset.
- language (CharField) – The language of the dataset.
- temporal (TextField) – The temporal period that the dataset covers.
- spatial (TextField) – Spatial coverage of the dataset.
- accrual_periodicity (TextField) – The frequency at which the dataset is published.
- landing_page (URLField) – A Web page that can be navigated to in a Web browser to gain access to the dataset, its distributions and/or additional information.
- theme (TextField) – The main category of the dataset. A dataset can have multiple themes.
- publisher (TextField) – An entity responsible for making the dataset available.
- contact_point (TextField) – Contact point
- created_at (DateTimeField) – Created at
- updated_at (DateTimeField) – Updated at
- last_sync (DateTimeField) – Last time the data automatically synced
- sourced_meta_data (JSONField) – Sourced meta data
class apps.datasets.models.DatasetURL(*args, **kwargs)
Keeps track of dataset URLs in the system, in order to:
- associate datasets not yet registered in the system
- track old urls
- search & dedupe catalog records
Parameters:
- id (AutoField) – Id
- catalog_record (ForeignKey to CatalogRecord) – Catalog record
- url (URLField) – Url
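Deduplicating catalog records by URL needs some comparison key. The normalization rules below (lowercase scheme and network location, drop the fragment and any trailing slash on the path, keep the query string) are an assumption for illustration, not the rules `DatasetURL` actually applies; `normalize_dataset_url` and `group_by_normalized_url` are hypothetical names.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlsplit, urlunsplit


def normalize_dataset_url(url: str) -> str:
    """Hypothetical comparison key for deduplicating dataset URLs."""
    parts = urlsplit(url.strip())
    # Strip trailing slashes, but keep "/" for a bare host.
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and network location; drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def group_by_normalized_url(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Group raw URLs under their normalized key, preserving first-seen order."""
    groups: Dict[str, List[str]] = {}
    for url in urls:
        groups.setdefault(normalize_dataset_url(url), []).append(url)
    return groups
```

Any group holding more than one raw URL is a candidate duplicate; the older URLs in it could be kept as extra `DatasetURL` rows pointing at the same catalog record.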
attempt_catalog_record_sync()
Attempts to create a catalog record for this URL. The record is saved if the sync is successful; None is returned if no successful sync took place.
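A sketch of the save-only-on-success contract of `attempt_catalog_record_sync`. Unlike the real method, which takes no arguments, this version receives the URL and a hypothetical `sync` callable explicitly, and `CatalogRecordStub` stands in for the model. Treating a raised exception as a failed sync is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CatalogRecordStub:
    """Hypothetical stand-in for CatalogRecord; `saved` mimics a database write."""
    landing_page: str
    title: str = ""
    saved: bool = False

    def save(self) -> None:
        self.saved = True


def attempt_catalog_record_sync(
    url: str, sync: Callable[[str], Optional[dict]]
) -> Optional[CatalogRecordStub]:
    """Create and save a record for `url` only if `sync` succeeds; otherwise return None."""
    try:
        metadata = sync(url)
    except Exception:
        # Assumption: any error raised by the sync counts as "no successful sync".
        return None
    if not metadata:
        return None
    record = CatalogRecordStub(landing_page=url, title=metadata.get("title", ""))
    record.save()
    return record
```

Because nothing is saved before the sync succeeds, a failed attempt leaves no half-populated record behind.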
class apps.datasets.models.DatasetsCustomPluginModel(id, path, depth, numchild, placeholder, parent, position, language, plugin_type, creation_date, changed_date, cmsplugin_ptr, title, show_title)
Parameters:
- id (AutoField) – Id
- path (CharField) – Path
- depth (PositiveIntegerField) – Depth
- numchild (PositiveIntegerField) – Numchild
- placeholder (ForeignKey to Placeholder) – Placeholder
- parent (ForeignKey to CMSPlugin) – Parent
- position (PositiveSmallIntegerField) – Position
- language (CharField) – Language
- plugin_type (CharField) – Plugin_name
- creation_date (DateTimeField) – Creation date
- changed_date (DateTimeField) – Changed date
- cmsplugin_ptr (OneToOneField to CMSPlugin) – Cmsplugin ptr
- title (CharField) – Title
- show_title (BooleanField) – Show title
- datasets (ManyToManyField) – Datasets
class apps.datasets.models.DatasetsGroupPluginModel(id, path, depth, numchild, placeholder, parent, position, language, plugin_type, creation_date, changed_date, cmsplugin_ptr, title, link, show_title)
Parameters:
- id (AutoField) – Id
- path (CharField) – Path
- depth (PositiveIntegerField) – Depth
- numchild (PositiveIntegerField) – Numchild
- placeholder (ForeignKey to Placeholder) – Placeholder
- parent (ForeignKey to CMSPlugin) – Parent
- position (PositiveSmallIntegerField) – Position
- language (CharField) – Language
- plugin_type (CharField) – Plugin_name
- creation_date (DateTimeField) – Creation date
- changed_date (DateTimeField) – Changed date
- cmsplugin_ptr (OneToOneField to CMSPlugin) – Cmsplugin ptr
- title (CharField) – Title
- link (CharField) – Link
- show_title (BooleanField) – Show title
- datasets (ManyToManyField) – Datasets
class apps.datasets.models.DatasetsPluginModel(id, path, depth, numchild, placeholder, parent, position, language, plugin_type, creation_date, changed_date, cmsplugin_ptr)
Parameters:
- id (AutoField) – Id
- path (CharField) – Path
- depth (PositiveIntegerField) – Depth
- numchild (PositiveIntegerField) – Numchild
- placeholder (ForeignKey to Placeholder) – Placeholder
- parent (ForeignKey to CMSPlugin) – Parent
- position (PositiveSmallIntegerField) – Position
- language (CharField) – Language
- plugin_type (CharField) – Plugin_name
- creation_date (DateTimeField) – Creation date
- changed_date (DateTimeField) – Changed date
- cmsplugin_ptr (OneToOneField to CMSPlugin) – Cmsplugin ptr
- datasets (ManyToManyField) – Datasets
class apps.datasets.models.DatasourceSuggestion(id, state, submission, submitted_by)
Parameters:
- id (AutoField) – Id
- state (FSMField) – State
- submission (TextField) – Submission
- submitted_by (ForeignKey to User) – Submitted by
class apps.datasets.models.Distribution(*args, **kwargs)
Fields defined from: https://www.w3.org/TR/vocab-dcat/#class-distribution
This model should be auto-populated by a sync task.
dcat:Distribution
Parameters:
- id (AutoField) – Id
- title (CharField) – Title
- description (TextField) – Description
- issued (DateTimeField) – Date of formal issuance (e.g., publication) of the distribution.
- modified (DateTimeField) – Most recent date on which the distribution was changed, updated or modified.
- license (TextField) – This links to the license document under which the distribution is made available.
- rights (TextField) – Information about rights held in and over the distribution.
- access_url (URLField) – A landing page, feed, SPARQL endpoint or other type of resource that gives access to the distribution of the dataset
- download_url (URLField) – A file that contains the distribution of the dataset in a given format
- byte_size (PositiveIntegerField) – The size of a distribution in bytes.
- media_type (CharField) – The media type of the distribution as defined by IANA.
- format (CharField) – The file format of the distribution.
- dataset (ForeignKey to Dataset) – Dataset
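Because the Distribution fields follow DCAT, exporting one to the Project Open Data v1.1 `distribution` shape is largely a key rename. The sketch below uses a hypothetical `DistributionStub` dataclass instead of the Django model and omits `byte_size`, `license`, `rights`, and the date fields; dropping empty values is a design choice of the sketch, not a documented requirement.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class DistributionStub:
    """Hypothetical plain-Python mirror of a subset of the Distribution fields."""
    title: str = ""
    description: str = ""
    access_url: Optional[str] = None
    download_url: Optional[str] = None
    media_type: Optional[str] = None
    format: Optional[str] = None


def to_pod_distribution(d: DistributionStub) -> Dict[str, Any]:
    """Map the fields to Project Open Data v1.1 distribution keys, omitting empty ones."""
    pod = {
        "@type": "dcat:Distribution",
        "title": d.title,
        "description": d.description,
        "accessURL": d.access_url,
        "downloadURL": d.download_url,
        "mediaType": d.media_type,
        "format": d.format,
    }
    return {key: value for key, value in pod.items() if value}
```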
class apps.datasets.models.Publisher(*args, **kwargs)
Fields scraped from the spreadsheet’s Publishers sheet.
Also known as Content Contributor. Shows up as dct:publisher and must be able to export as: http://xmlns.com/foaf/spec/#term_Person
Parameters:
- id (AutoField) – Id
- path (CharField) – Path
- depth (PositiveIntegerField) – Depth
- numchild (PositiveIntegerField) – Numchild
- name (CharField) – Name
- slug (SlugField) – Appears in the URL.
- agency_type (CharField) – Agency type
- agency_url (URLField) – Agency url
- primary_data_portal (URLField) – Primary data portal
- body (RichTextUploadingField) – Body
- description (TextField) – Description
set_sub_organization_of(name)
Sets the organizational parent by name.
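Publisher appears to be stored as a tree (note the `path`, `depth`, and `numchild` fields). A plain-Python sketch of setting the parent by name and exporting the chain as nested `subOrganizationOf` objects, following the Project Open Data v1.1 publisher shape, might look like this. `PublisherStub`, the `known` list, and `to_pod_publisher` are hypothetical; the sketch raises `LookupError` when no publisher matches, whereas the real method's behavior in that case is not documented here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class PublisherStub:
    """Hypothetical in-memory stand-in for Publisher."""
    name: str
    parent: Optional["PublisherStub"] = None


def set_sub_organization_of(publisher: PublisherStub, name: str,
                            known: List[PublisherStub]) -> None:
    """Set the organizational parent by exact name; raise LookupError if none matches."""
    for candidate in known:
        if candidate.name == name:
            publisher.parent = candidate
            return
    raise LookupError(f"No publisher named {name!r}")


def to_pod_publisher(publisher: PublisherStub) -> Dict[str, Any]:
    """Nest each parent under subOrganizationOf, recursing up the chain."""
    pod: Dict[str, Any] = {"@type": "org:Organization", "name": publisher.name}
    if publisher.parent is not None:
        pod["subOrganizationOf"] = to_pod_publisher(publisher.parent)
    return pod
```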
class apps.datasets.models.RecordColumn(*args, **kwargs)
Describes a column belonging to a dataset.
Parameters:
- id (AutoField) – Id
- catalog_record (ForeignKey to CatalogRecord) – Catalog record
- field_name (CharField) – Field name
- label (CharField) – Label
- description (TextField) – Description
- data_type (CharField) – Data type
- render_type (CharField) – Render type
- concept (ForeignKey to Concept) – Concept
- _order (OrderWrt) – order
class apps.datasets.models.SpatialEntity(id, name, geometry, granularity, data)
Parameters:
- id (AutoField) – Id
- name (CharField) – Name
- geometry (GeometryField) – Geometry
- granularity (TextField) – Granularity
- data (HStoreField) – Data