impuls.resource

class impuls.resource.Resource

See impuls.Resource.

class impuls.resource.LocalResource

See impuls.LocalResource.

class impuls.resource.HTTPResource

See impuls.HTTPResource.

class impuls.resource.ConcreteResource

Bases: Resource

ConcreteResource is an abstract Resource implementation which stores the last_modified and fetch_time properties. fetch() still needs to be implemented.

super().__init__() must be called by implementing classes in their __init__ methods.

property fetch_time: datetime

fetch_time contains the timestamp of the last successful call to fetch. Only available after a call to fetch. Must be an aware datetime instance.

property last_modified: datetime

last_modified contains the last update time of the resource. Only available after a call to fetch. Must be an aware datetime instance.

class impuls.resource.ManagedResource(stored_at: Path, last_modified: datetime = datetime.datetime(1, 1, 1, 0, 0, tzinfo=datetime.timezone.utc), fetch_time: datetime = datetime.datetime(1, 1, 1, 0, 0, tzinfo=datetime.timezone.utc))

Bases: object

ManagedResource represents a resource which has been cached† by a Pipeline.

The name may be confusing: ManagedResource does not implement the Resource protocol. It is not an input to the Pipeline; rather, a ManagedResource is the output of a Pipeline.

ManagedResources should not be modified. However, if they are modified:

  • all tasks following the modifying task will receive the modified ManagedResource.

  • all tasks preceding and including the modifying task may receive a modified ManagedResource or an unmodified fresh copy of the Resource.

† - LocalResources are not cached; in this case stored_at is the same as the original LocalResource.path.

bytes() bytes

bytes reads the content of the file into a bytes object.

csv(encoding: str | None = None, errors: str | None = None, **csv_dict_reader_kwargs: Any) Iterator[dict[str, str]]

csv reads CSV records from the resource.

File is opened in “r” mode, and if encoding and errors are not defined, system settings are used.

Any other keyword arguments are passed to csv.DictReader constructor.

json(encoding: str | None = None, errors: str | None = None, **json_load_kwargs: Any) Any

json deserializes resource content using JSON.

File is opened in “r” mode, and if encoding and errors are not defined, system settings are used.

Any other keyword arguments are passed to json.load.

open_binary(buffering: int = -1) BinaryIO

open_binary opens the cached file in “rb” mode, with the provided arguments.

open_text(buffering: int = -1, encoding: str | None = None, errors: str | None = None, newline: str | None = None) TextIO

open_text opens the cached file in “r” mode, with the provided arguments.

size() int

size returns the size of the file in bytes.

stat() stat_result

stat returns the stat result of the cached file holding the Resource content.

text(encoding: str | None = None, errors: str | None = None) str

text reads the content of the file into a string. If encoding and errors are not defined, system settings are used.

yaml(encoding: str | None = None, errors: str | None = None) Any

yaml deserializes resource content using YAML, using yaml.safe_load.

File is opened in “r” mode, and if encoding and errors are not defined, system settings are used.

fetch_time: datetime = datetime.datetime(1, 1, 1, 0, 0, tzinfo=datetime.timezone.utc)

fetch_time is the time when the original Resource was downloaded.

Unavailable if the original resource was a LocalResource (will be the same as last_modified).

last_modified: datetime = datetime.datetime(1, 1, 1, 0, 0, tzinfo=datetime.timezone.utc)

last_modified is the last_modified time of the original Resource.

Note that this is different from stat().st_mtime, which is the modification time of the cached file and is unrelated to the original Resource.

stored_at: Path

stored_at is the Path to the cached resource.

class impuls.resource.TimeLimitedResource(r: Resource, minimal_time_between: timedelta)

Bases: WrappedResource

TimeLimitedResource wraps a Resource and ensures the time between conditional fetches is at least minimal_time_between.

TimeLimitedResource can be used to cache constantly-changing resources or to prevent bothering an external server.
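The time-limiting idea can be sketched without the library as follows. Everything here is a hypothetical stand-in: TimeLimitedSketch, the local InputNotModified class, and the injectable clock are illustration only. The sketch also assumes that a conditional fetch made too early is rejected by raising InputNotModified, which the description above does not state explicitly.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterator


class InputNotModified(Exception):
    """Hypothetical stand-in for the exception of the same name."""


class TimeLimitedSketch:
    """Illustrative wrapper; not the implementation of TimeLimitedResource."""

    def __init__(
        self,
        fetch_upstream: Callable[[bool], Iterator[bytes]],
        minimal_time_between: timedelta,
        clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
    ) -> None:
        self.fetch_upstream = fetch_upstream
        self.minimal_time_between = minimal_time_between
        self.clock = clock
        # Aware minimum, so that the very first fetch is always allowed.
        self.fetch_time = datetime.min.replace(tzinfo=timezone.utc)

    def fetch(self, conditional: bool) -> Iterator[bytes]:
        now = self.clock()
        # Assumption: too-early conditional fetches never reach the server.
        if conditional and now - self.fetch_time < self.minimal_time_between:
            raise InputNotModified()
        chunks = self.fetch_upstream(conditional)
        self.fetch_time = now
        return chunks


calls: list[bool] = []


def upstream(conditional: bool) -> Iterator[bytes]:
    calls.append(conditional)
    return iter([b"data"])


fake_now = [datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)]
limited = TimeLimitedSketch(upstream, timedelta(minutes=10), clock=lambda: fake_now[0])

first = b"".join(limited.fetch(conditional=True))  # reaches upstream

fake_now[0] += timedelta(minutes=1)  # only 1 minute later
try:
    limited.fetch(conditional=True)
    skipped = False
except InputNotModified:
    skipped = True  # upstream was not contacted

fake_now[0] += timedelta(minutes=10)  # 11 minutes after the first fetch
third = b"".join(limited.fetch(conditional=True))  # reaches upstream again
```

Injecting the clock keeps the sketch deterministic; a real wrapper would more likely read the current time directly.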

fetch(conditional: bool) Iterator[bytes]

fetch returns the content of the resource; preferably in chunks of FETCH_CHUNK_SIZE length.

The last_modified and fetch_time attributes of the Resource should be updated right before the first chunk is returned.

If the conditional is set, the Resource must raise InputNotModified if the resource was not modified since last_modified. In this case, last_modified and fetch_time must not be updated.

minimal_time_between: timedelta

minimal_time_between is the minimal time that must pass between fetches to the external server.

class impuls.resource.WrappedResource(r: Resource)

Bases: Resource

WrappedResource is a helper abstract class for implementing modifications to existing Resource instances using the decorator pattern.

WrappedResource proxies the last_modified and fetch_time properties to the wrapped resource, but leaves fetch() unimplemented.

super().__init__() must be called by implementing classes in their __init__ methods.

property fetch_time: datetime

fetch_time contains the timestamp of the last successful call to fetch. Only available after a call to fetch. Must be an aware datetime instance.

property last_modified: datetime

last_modified contains the last update time of the resource. Only available after a call to fetch. Must be an aware datetime instance.

r: Resource

class impuls.resource.ZippedResource(r: Resource, file_name_in_zip: str | None = None, save_zip_in_memory: bool = True)

Bases: WrappedResource

ZippedResource wraps a Resource pointing to a zip archive, creating a Resource which reads the content of one file from that archive.

  • file_name_in_zip dictates which file to extract from the archive.

    It defaults to None, in which case the archive is first checked to contain a single file, and that file is extracted.

  • save_zip_in_memory dictates whether the zipfile can be saved in memory.

    It defaults to True, but if the archive itself is huge this option may be set to False, causing the zip file to be written to a temporary file.

fetch(conditional: bool) Iterator[bytes]

fetch returns the content of the resource; preferably in chunks of FETCH_CHUNK_SIZE length.

The last_modified and fetch_time attributes of the Resource should be updated right before the first chunk is returned.

If the conditional is set, the Resource must raise InputNotModified if the resource was not modified since last_modified. In this case, last_modified and fetch_time must not be updated.

fetch_zip(conditional: bool) ContextManager[BinaryIO]

Fetches the bytes of the zip file, depending on the save_zip_in_memory setting.

fetch_zip_to_memory(conditional: bool) BytesIO

Fetches the zipfile to a BytesIO and returns it.

fetch_zip_to_temp_file(conditional: bool) Generator[BinaryIO, None, None]

Fetches the zipfile to a TemporaryFile and returns it.

pick_file(in_: ZipFile) ZipInfo

Picks the file to decompress.

file_name_in_zip: str | None

save_zip_in_memory: bool

impuls.resource.DATETIME_MAX_UTC = datetime.datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=datetime.timezone.utc)

Helper constant with an aware version of datetime.max

impuls.resource.DATETIME_MIN_UTC = datetime.datetime(1, 1, 1, 0, 0, tzinfo=datetime.timezone.utc)

Helper constant with an aware version of datetime.min
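A short sketch of why aware versions of datetime.min and datetime.max are useful: Python refuses to order a naive datetime against an aware one, so sentinel values compared with aware timestamps such as last_modified need a timezone too. The constants below are rebuilt locally with the values shown above; how impuls constructs them internally is not stated here.

```python
from datetime import datetime, timezone

# Local reconstructions with the same values as the documented constants.
DATETIME_MIN_UTC = datetime.min.replace(tzinfo=timezone.utc)
DATETIME_MAX_UTC = datetime.max.replace(tzinfo=timezone.utc)

now = datetime.now(timezone.utc)

# Aware sentinels order correctly against any aware timestamp.
is_between = DATETIME_MIN_UTC < now < DATETIME_MAX_UTC

# The naive datetime.min cannot be ordered against an aware datetime.
try:
    datetime.min < now
    naive_comparable = True
except TypeError:
    naive_comparable = False
```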

impuls.resource.FETCH_CHUNK_SIZE = 131072

Preferred size of chunks returned by fetch()
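A minimal sketch of producing chunks of the preferred size from any binary stream follows. The helper iter_chunks is hypothetical and not part of impuls, and the constant is redefined locally with the value shown above so the example runs on its own.

```python
import io
from typing import BinaryIO, Iterator

FETCH_CHUNK_SIZE = 131072  # same value as the documented constant (128 KiB)


def iter_chunks(stream: BinaryIO, chunk_size: int = FETCH_CHUNK_SIZE) -> Iterator[bytes]:
    # Read until the stream returns an empty bytes object (end of stream).
    while chunk := stream.read(chunk_size):
        yield chunk


payload = bytes(300_000)  # 300,000 zero bytes
chunk_lengths = [len(chunk) for chunk in iter_chunks(io.BytesIO(payload))]
print(chunk_lengths)  # → [131072, 131072, 37856]
```

The last chunk is shorter than the others, which is consistent with the size being a preference rather than a guarantee.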