impuls.tasks

class impuls.tasks.AddEntity(entity: Entity, task_name: str = 'AddEntity')

Bases: Task

AddEntity is a simple task that adds the provided entity to the DB.

execute(r: TaskRuntime) None

execute processes the data in the runtime environment.

As of now, Tasks are guaranteed to run in a single thread with a single runtime, but execute may be called multiple times with different runtimes. Thus, it is safe for Task implementations to hold some execute-related state, but that state should be reset on entry to execute.
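
The state-reset advice above can be illustrated with a minimal, self-contained sketch. FakeRuntime and CountRows are hypothetical stand-ins invented for this example and are not part of impuls; a real task would subclass Task and receive a TaskRuntime.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRuntime:
    """Hypothetical stand-in for TaskRuntime; it only carries a list of rows."""

    rows: list[str] = field(default_factory=list)


class CountRows:
    """Hypothetical task-like class that keeps execute-related state."""

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def execute(self, r: FakeRuntime) -> None:
        # Reset the state first, so that a second call with another runtime
        # does not see leftovers from the previous call.
        self.seen.clear()
        for row in r.rows:
            self.seen.add(row)


task = CountRows()
task.execute(FakeRuntime(rows=["a", "b"]))
task.execute(FakeRuntime(rows=["c"]))
print(task.seen)
```

Without the clear() call, the second run would also report the rows collected during the first one.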

entity: Entity
class impuls.tasks.ExecuteSQL(task_name: str, statement: str)

Bases: Task

ExecuteSQL task simply executes the provided statement.

execute(r: TaskRuntime) None

execute processes the data in the runtime environment.

As of now, Tasks are guaranteed to run in a single thread with a single runtime, but execute may be called multiple times with different runtimes. Thus, it is safe for Task implementations to hold some execute-related state, but that state should be reset on entry to execute.

statement: str
class impuls.tasks.GenerateTripHeadsign(name: str | None = None)

Bases: Task

GenerateTripHeadsign is a task which fills the trip_headsign field for all Trips which don’t already have a headsign.

The generated headsign is the name of the last stop of the trip. This step will break if there are trips without any stops.
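
The rule described above (use the name of the last stop when no headsign is set) can be sketched with plain sqlite3. The table and column names below are simplified assumptions made for this example and do not necessarily match the schema used by impuls.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE stops (stop_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE trips (trip_id TEXT PRIMARY KEY, headsign TEXT);
    CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);
    INSERT INTO stops VALUES ('A', 'Alpha'), ('B', 'Beta');
    INSERT INTO trips VALUES ('T1', NULL), ('T2', 'Custom');
    INSERT INTO stop_times VALUES ('T1', 'A', 0), ('T1', 'B', 1),
                                  ('T2', 'B', 0), ('T2', 'A', 1);
    """
)

# Only trips without a headsign are touched; the correlated subquery picks
# the stop with the highest stop_sequence of each such trip.
db.execute(
    """
    UPDATE trips SET headsign = (
        SELECT s.name FROM stop_times st
        JOIN stops s ON s.stop_id = st.stop_id
        WHERE st.trip_id = trips.trip_id
        ORDER BY st.stop_sequence DESC LIMIT 1
    )
    WHERE headsign IS NULL OR headsign = ''
    """
)

headsigns = dict(db.execute("SELECT trip_id, headsign FROM trips"))
print(headsigns)
```

In this sketch a trip without any stop times would simply keep a NULL headsign, which may differ from how the actual task reacts to such trips.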

execute(r: TaskRuntime) None

execute processes the data in the runtime environment.

As of now, Tasks are guaranteed to run in a single thread with a single runtime, but execute may be called multiple times with different runtimes. Thus, it is safe for Task implementations to hold some execute-related state, but that state should be reset on entry to execute.

class impuls.tasks.LoadBusManMDB(resource: str, agency_id: str, ignore_route_id: bool = False, ignore_stop_id: bool = False)

Bases: Task

LoadBusManMDB loads data into the database from a BusMan MDB database.

Only the following entities are loaded: Route, Stop, Calendar, Trip and StopTime.

Agency has to be manually curated beforehand (e.g. with AddEntity task).

The imported Calendar entities will be empty after the import. Providing dates for calendars must be done manually afterwards.

Most MDB databases seen in the wild have no stop positions. This step will set the latitude and longitude to 0. Further curation is usually necessary.

Parameters:

  • resource: name of the resource with MDB file

  • agency_id: ID of the manually curated Agency

  • ignore_route_id: use route_short_name as the ID, instead of the BusMan internal ID

  • ignore_stop_id: use stop_code as the ID, instead of the BusMan internal ID

This task additionally requires mdbtools to be installed. This package is available in most package managers.

execute(r: TaskRuntime) None

execute processes the data in the runtime environment.

As of now, Tasks are guaranteed to run in a single thread with a single runtime, but execute may be called multiple times with different runtimes. Thus, it is safe for Task implementations to hold some execute-related state, but that state should be reset on entry to execute.

load_calendars(mdb_path: Path, db: DBConnection) None
load_routes(mdb_path: Path, db: DBConnection) None
load_stop_times(mdb_path: Path, db: DBConnection) None
load_stops(mdb_path: Path, db: DBConnection) None
load_trips(mdb_path: Path, db: DBConnection) None
agency_id: str
ignore_route_id: bool
ignore_stop_id: bool
resource: str
class impuls.tasks.LoadGTFS(resource: str)

Bases: Task

LoadGTFS attempts to load GTFS data from a ZIP archive.

The loader only supports a subset of the GTFS schema. Due to implementation details, some invalid values may be accepted and some valid values may be rejected. In particular:

  • stops.txt location_types 3, 4, and 5 will cause an error,

  • parent_station may only refer to stop_ids defined in earlier lines,

  • agency_id in fare_attributes.txt is required if it’s present in agency.txt, even if there’s only one agency defined in the dataset.
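
Because of the parent_station ordering restriction listed above, it may help to check a stops.txt file before loading it. The following is a minimal sketch using only the standard library; the sample data and the function name find_forward_parent_references are invented for this example and are not part of impuls.

```python
import csv
import io

# Invented sample: S1 refers to STATION before STATION is defined.
STOPS_TXT = """stop_id,stop_name,location_type,parent_station
S1,Platform 1,0,STATION
STATION,Main Station,1,
S2,Platform 2,0,STATION
"""


def find_forward_parent_references(stops_txt: str) -> list[str]:
    """Return the stop_ids whose parent_station is not defined on an earlier line."""
    seen: set[str] = set()
    offenders: list[str] = []
    for row in csv.DictReader(io.StringIO(stops_txt)):
        parent = row.get("parent_station") or ""
        if parent and parent not in seen:
            offenders.append(row["stop_id"])
        seen.add(row["stop_id"])
    return offenders


offenders = find_forward_parent_references(STOPS_TXT)
print(offenders)
```

Moving station rows ahead of the stops that refer to them would make such a file satisfy the restriction.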

execute(r: TaskRuntime) None

execute processes the data in the runtime environment.

As of now, Tasks are guaranteed to run in a single thread with a single runtime, but execute may be called multiple times with different runtimes. Thus, it is safe for Task implementations to hold some execute-related state, but that state should be reset on entry to execute.

extract_gtfs_to(zip_path: str | PathLike[str], dir_path: str | PathLike[str]) None
resource: str
class impuls.tasks.ModifyRoutesFromCSV(resource: str, must_curate_all: bool = False, silent: bool = False)

Bases: ModifyFromCSV

ModifyRoutesFromCSV implements the ModifyFromCSV step for Routes.

The CSV file pointed to by the provided resource must have a header row and must have a route_id field.

The following fields may be present, and will be used to update the metadata of the matching Route:

  • route_short_name

  • route_long_name

  • route_type

  • route_color

  • route_text_color

  • route_sort_order

Parameters:
  • resource (str) – name of the resource with data (in CSV).

  • must_curate_all (bool) – if True, then this task will fail if some entities weren’t curated. Defaults to False.

  • silent (bool) – if True, doesn’t warn every time an entity from CSV isn’t found in the DB.
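
The curation flow described by these parameters can be sketched with csv and sqlite3. This is not the ModifyFromCSV implementation: the table layout, the COLUMNS mapping and the curate function are assumptions made purely for illustration.

```python
import csv
import io
import sqlite3

# Invented curation file: R9 does not exist in the database.
CURATION_CSV = """route_id,route_short_name,route_color
R1,10,ff0000
R9,99,00ff00
"""

# Hypothetical mapping: CSV column -> (database column, converter from str).
COLUMNS = {
    "route_short_name": ("short_name", str),
    "route_color": ("color", str.upper),
}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE routes (route_id TEXT PRIMARY KEY, short_name TEXT, color TEXT)")
db.executemany("INSERT INTO routes VALUES (?, ?, ?)", [("R1", "1", ""), ("R2", "2", "")])


def curate(must_curate_all: bool = False) -> tuple[set[str], set[str]]:
    """Apply CURATION_CSV to routes; return (curated IDs, IDs missing from the DB)."""
    known = {row[0] for row in db.execute("SELECT route_id FROM routes")}
    curated: set[str] = set()
    missing: set[str] = set()
    for row in csv.DictReader(io.StringIO(CURATION_CSV)):
        route_id = row["route_id"]
        if route_id not in known:
            missing.add(route_id)  # a non-silent task would warn here
            continue
        for csv_column, (db_column, convert) in COLUMNS.items():
            if csv_column in row:
                # db_column comes from the fixed COLUMNS mapping, never from the CSV.
                db.execute(
                    f"UPDATE routes SET {db_column} = ? WHERE route_id = ?",
                    (convert(row[csv_column]), route_id),
                )
        curated.add(route_id)
    if must_curate_all and known - curated:
        raise ValueError(f"not curated: {sorted(known - curated)}")
    return curated, missing


curated, missing = curate()
print(curated, missing)
```

Calling curate(must_curate_all=True) on this data raises ValueError, because R2 is in the database but absent from the CSV.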

static csv_column_mapping() Mapping[str, CSVFieldData]

csv_column_mapping returns the mapping from a CSV column name to metadata about the column: the corresponding entity field and a converter from string to a value of an appropriate type.

static model_class() Type[Entity]

model_class returns the type from impuls.model whose entities are going to be modified

static primary_key_csv_column() str

primary_key_csv_column returns the name of the CSV column which contains the primary key.

static query_for_all_ids() str

query_for_all_ids returns an SQL query string which returns all the known IDs of all entities of given type.

class impuls.tasks.ModifyStopsFromCSV(resource: str, must_curate_all: bool = False, silent: bool = False)

Bases: ModifyFromCSV

ModifyStopsFromCSV implements the ModifyFromCSV step for Stops.

The CSV file pointed to by the provided resource must have a header row and must have a stop_id field.

The following fields may be present, and will be used to update the metadata of the matching Stop:

  • stop_name

  • stop_code

  • stop_lat

  • stop_lon

  • zone_id

  • wheelchair_boarding

  • platform_code

Parameters:
  • resource (str) – name of the resource with data (in CSV).

  • must_curate_all (bool) – if True, then this task will fail if some entities weren’t curated. Defaults to False.

  • silent (bool) – if True, doesn’t warn every time an entity from CSV isn’t found in the DB.

static csv_column_mapping() Mapping[str, CSVFieldData]

csv_column_mapping returns the mapping from a CSV column name to metadata about the column: the corresponding entity field and a converter from string to a value of an appropriate type.

static model_class() Type[Entity]

model_class returns the type from impuls.model whose entities are going to be modified

static primary_key_csv_column() str

primary_key_csv_column returns the name of the CSV column which contains the primary key.

static query_for_all_ids() str

query_for_all_ids returns an SQL query string which returns all the known IDs of all entities of given type.

class impuls.tasks.RemoveUnusedEntities

Bases: Task

RemoveUnusedEntities removes entities from the database which serve no purpose. The kinds of entities removed correspond to the drop_* methods listed below.

drop_agencies_without_routes(db: DBConnection) None
drop_calendars_without_dates(db: DBConnection) None
drop_calendars_without_trips(db: DBConnection) None
drop_routes_without_trips(db: DBConnection) None
drop_stations_without_stops(db: DBConnection) None
drop_stops_without_stop_times(db: DBConnection) None
drop_trips_with_at_most_one_stop(db: DBConnection) None
execute(r: TaskRuntime) None

execute processes the data in the runtime environment.

As of now, Tasks are guaranteed to run in a single thread with a single runtime, but execute may be called multiple times with different runtimes. Thus, it is safe for Task implementations to hold some execute-related state, but that state should be reset on entry to execute.
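
The kind of cleanup performed by RemoveUnusedEntities can be pictured with two plain SQL statements. The sketch below uses sqlite3 with a simplified, assumed schema; it mirrors only what the names drop_routes_without_trips and drop_agencies_without_routes suggest and is not the task's actual code.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE agencies (agency_id TEXT PRIMARY KEY);
    CREATE TABLE routes (route_id TEXT PRIMARY KEY, agency_id TEXT);
    CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT);
    INSERT INTO agencies VALUES ('A1'), ('A2');
    INSERT INTO routes VALUES ('R1', 'A1'), ('R2', 'A2');
    INSERT INTO trips VALUES ('T1', 'R1');
    """
)

# Routes first: removing R2 is what leaves agency A2 without any routes.
db.execute(
    "DELETE FROM routes WHERE NOT EXISTS "
    "(SELECT 1 FROM trips WHERE trips.route_id = routes.route_id)"
)
db.execute(
    "DELETE FROM agencies WHERE NOT EXISTS "
    "(SELECT 1 FROM routes WHERE routes.agency_id = agencies.agency_id)"
)

routes = [row[0] for row in db.execute("SELECT route_id FROM routes ORDER BY route_id")]
agencies = [row[0] for row in db.execute("SELECT agency_id FROM agencies ORDER BY agency_id")]
print(routes, agencies)
```

The order of the statements matters in this sketch: running the agency cleanup first would keep A2, since R2 would still exist at that point.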

class impuls.tasks.SaveDB(to: str | PathLike[str])

Bases: Task

SaveDB saves the contained data as-is to a database at a provided path.

execute(r: TaskRuntime) None

execute processes the data in the runtime environment.

As of now, Tasks are guaranteed to run in a single thread with a single runtime, but execute may be called multiple times with different runtimes. Thus, it is safe for Task implementations to hold some execute-related state, but that state should be reset on entry to execute.

to: Path
class impuls.tasks.SaveGTFS(headers: Mapping[str, Sequence[str]], target: str | PathLike[str], emit_empty_calendars: bool = False)

Bases: Task

SaveGTFS exports the contained data as a GTFS zip file at the provided path.

headers is a mapping from a GTFS table (excluding the .txt extension) to a sequence of column names. SaveGTFS doesn’t validate the provided mapping, so the caller must ensure all required columns and files are provided.

When emit_empty_calendars is set to True (default is False), empty calendars will still be generated in the calendar.txt file.
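
To make the shape of the headers argument concrete, here is a small standard-library sketch that writes one .txt file per mapping key into a zip archive. The rows and the write_gtfs helper are hypothetical and only illustrate the mapping; they say nothing about how SaveGTFS is implemented.

```python
import csv
import io
import zipfile
from collections.abc import Mapping, Sequence

# Table name (without ".txt") -> column names, in output order.
headers: Mapping[str, Sequence[str]] = {
    "agency": ("agency_id", "agency_name", "agency_url", "agency_timezone"),
    "routes": ("route_id", "agency_id", "route_short_name", "route_long_name", "route_type"),
}

# Invented sample rows, keyed by table name.
rows = {
    "agency": [
        {"agency_id": "0", "agency_name": "Example", "agency_url": "https://example.com",
         "agency_timezone": "Europe/Warsaw"},
    ],
    "routes": [
        {"route_id": "R1", "agency_id": "0", "route_short_name": "1",
         "route_long_name": "", "route_type": "3", "internal_note": "dropped"},
    ],
}


def write_gtfs(target, headers, rows) -> None:
    """Write one CSV file per entry of headers into a zip archive at target."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for table, columns in headers.items():
            buffer = io.StringIO(newline="")
            # Keys not listed in columns are ignored, so only requested columns appear.
            writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(rows.get(table, []))
            archive.writestr(f"{table}.txt", buffer.getvalue())


target = io.BytesIO()
write_gtfs(target, headers, rows)

with zipfile.ZipFile(target) as archive:
    names = archive.namelist()
    routes_lines = archive.read("routes.txt").decode("utf-8").splitlines()
print(names, routes_lines)
```

As with SaveGTFS, nothing in this sketch checks that the mapping covers every file and column required by GTFS; that remains the caller's responsibility.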

create_zip(dir: str | PathLike[str]) None
dump_tables(gtfs_dir: str | PathLike[str], db: DBConnection) None
execute(r: TaskRuntime) None

execute processes the data in the runtime environment.

As of now, Tasks are guaranteed to run in a single thread with a single runtime, but execute may be called multiple times with different runtimes. Thus, it is safe for Task implementations to hold some execute-related state, but that state should be reset on entry to execute.

emit_empty_calendars: bool
headers: Mapping[str, Sequence[str]]
target: Path
class impuls.tasks.TruncateCalendars(target: EmptyDateRange | InfiniteDateRange | LeftUnboundedDateRange | RightUnboundedDateRange | BoundedDateRange, fail_on_empty: bool = True)

Bases: Task

TruncateCalendars removes any services beyond the provided range.

For simplicity, all Calendars are converted to exception-based (all active dates represented by CalendarException).
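
The idea of converting a calendar to explicit dates and then cutting it to a range can be sketched with the datetime module alone. The functions active_dates and truncate are hypothetical helpers written for this example; they do not use the date range classes from impuls.

```python
from datetime import date, timedelta


def active_dates(
    start: date,
    end: date,
    weekdays: set[int],
    added: set[date],
    removed: set[date],
) -> set[date]:
    """Expand a weekly pattern plus exceptions into an explicit set of dates."""
    days: set[date] = set()
    day = start
    while day <= end:
        if day.weekday() in weekdays:  # Monday is 0, Sunday is 6
            days.add(day)
        day += timedelta(days=1)
    return (days | added) - removed


def truncate(days: set[date], range_start: date, range_end: date) -> set[date]:
    """Keep only the dates inside the inclusive range."""
    return {d for d in days if range_start <= d <= range_end}


# Mondays and Wednesdays in the first two weeks of January 2024,
# with one extra Saturday and one cancelled Wednesday.
all_days = active_dates(
    start=date(2024, 1, 1),
    end=date(2024, 1, 14),
    weekdays={0, 2},
    added={date(2024, 1, 6)},
    removed={date(2024, 1, 3)},
)
kept = truncate(all_days, date(2024, 1, 5), date(2024, 1, 10))
print(sorted(kept))
```

Once every calendar is a plain set of dates, truncation reduces to a set filter, which is presumably why the exception-based representation is described as the simpler one.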

apply_changes(db: DBConnection) None
check_if_empty() None
clear_state() None
compute_changes(db: DBConnection) None
compute_truncated_days_of(calendar: Calendar, db: DBConnection) set[Date]
drop_calendars(db: DBConnection) None
execute(r: TaskRuntime) None

execute processes the data in the runtime environment.

As of now, Tasks are guaranteed to run in a single thread with a single runtime, but execute may be called multiple times with different runtimes. Thus, it is safe for Task implementations to hold some execute-related state, but that state should be reset on entry to execute.

make_all_calendars_use_exceptions(db: DBConnection) None
set_exceptions_on_calendars(db: DBConnection) None
fail_on_empty: bool
target: EmptyDateRange | InfiniteDateRange | LeftUnboundedDateRange | RightUnboundedDateRange | BoundedDateRange