impuls.resource¶
- class impuls.resource.Resource¶
See impuls.Resource.
- class impuls.resource.LocalResource¶
See impuls.LocalResource.
- class impuls.resource.HTTPResource¶
See impuls.HTTPResource.
- class impuls.resource.ConcreteResource¶
Bases:
Resource
ConcreteResource is an abstract Resource implementation which stores the last_modified and fetch_time properties. fetch() still needs to be implemented. super().__init__() must be called by implementing classes in their __init__ methods.
- property fetch_time: datetime¶
fetch_time contains the timestamp of the last successful call to fetch. Only available after a call to fetch. Must be an aware datetime instance.
- property last_modified: datetime¶
last_modified contains the last update time of the resource. Only available after a call to fetch. Must be an aware datetime instance.
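The contract described for ConcreteResource can be illustrated with a small self-contained sketch. It does not import impuls: `SketchConcreteResource` and `StaticResource` are hypothetical stand-ins, plain attributes are used instead of properties, and initializing both timestamps to an aware `datetime.min` is an assumption made for this example.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timezone
from typing import Iterator

# Aware placeholder for "not fetched yet" (an assumption of this sketch).
NOT_FETCHED = datetime.min.replace(tzinfo=timezone.utc)


class SketchConcreteResource(ABC):
    """Hypothetical stand-in: stores last_modified and fetch_time."""

    def __init__(self) -> None:
        self.last_modified = NOT_FETCHED
        self.fetch_time = NOT_FETCHED

    @abstractmethod
    def fetch(self, conditional: bool) -> Iterator[bytes]:
        """Subclasses still have to provide the actual content."""


class StaticResource(SketchConcreteResource):
    """Hypothetical subclass serving a fixed bytes payload."""

    def __init__(self, payload: bytes) -> None:
        super().__init__()  # without this call the timestamps are never set up
        self.payload = payload

    def fetch(self, conditional: bool) -> Iterator[bytes]:
        # Update both timestamps right before the first chunk is returned.
        now = datetime.now(timezone.utc)
        self.last_modified = now
        self.fetch_time = now
        yield self.payload


resource = StaticResource(b"hello")
content = b"".join(resource.fetch(conditional=False))
```

Both timestamps are aware datetimes, as the property descriptions require, and they only carry meaningful values once fetch has been consumed.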
- class impuls.resource.ManagedResource(stored_at: Path, last_modified: datetime = datetime.datetime(1, 1, 1, 0, 0, tzinfo=datetime.timezone.utc), fetch_time: datetime = datetime.datetime(1, 1, 1, 0, 0, tzinfo=datetime.timezone.utc))¶
Bases:
object
ManagedResource represents a resource which has been cached† by a Pipeline.
The name may be confusing: it does not implement the Resource protocol. It is not an input to the Pipeline; rather, a ManagedResource is the output of a Pipeline.
ManagedResources should not be modified. However, if they are modified:
- all tasks following the modifying task will receive the modified ManagedResource.
- all tasks preceding and including the modifying task may receive a modified ManagedResource or an unmodified fresh copy of the Resource.
† - LocalResources are not cached; in this case stored_at is the same as the original LocalResource.path.
- bytes() → bytes¶
bytes reads the content of the file into a bytes object.
- csv(encoding: str | None = None, errors: str | None = None, **csv_dict_reader_kwargs: Any) → Iterator[dict[str, str]]¶
csv reads CSV records from the resource.
The file is opened in “r” mode; if encoding and errors are not given, system settings are used.
Any other keyword arguments are passed to the csv.DictReader constructor.
- json(encoding: str | None = None, errors: str | None = None, **json_load_kwargs: Any) → Any¶
json deserializes resource content using JSON.
The file is opened in “r” mode; if encoding and errors are not given, system settings are used.
Any other keyword arguments are passed to json.load.
- open_binary(buffering: int = -1) → BinaryIO¶
open_binary opens the cached file in “rb” mode, with the provided arguments.
- open_text(buffering: int = -1, encoding: str | None = None, errors: str | None = None, newline: str | None = None) → TextIO¶
open_text opens the cached file in “r” mode, with the provided arguments.
- size() → int¶
size returns the size of the file in bytes.
- stat() → stat_result¶
stat returns the stat result of the cached file with the Resource content.
- text(encoding: str | None = None, errors: str | None = None) → str¶
text reads the content of the file into a string. If encoding and errors are not given, system settings are used.
- yaml(encoding: str | None = None, errors: str | None = None) → Any¶
yaml deserializes resource content using YAML, using yaml.safe_load.
The file is opened in “r” mode; if encoding and errors are not given, system settings are used.
- fetch_time: datetime = datetime.datetime(1, 1, 1, 0, 0, tzinfo=datetime.timezone.utc)¶
fetch_time is the time when the original Resource was downloaded.
Not available if the original resource was a LocalResource; in that case it is the same as last_modified.
- last_modified: datetime = datetime.datetime(1, 1, 1, 0, 0, tzinfo=datetime.timezone.utc)¶
last_modified is the last_modified time of the original Resource.
Note that this is different from stat().st_mtime, which is unrelated to the original Resource.
- stored_at: Path¶
stored_at is the Path to the cached resource.
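The csv and json helpers are described as opening the cached file in text mode and forwarding extra keyword arguments. A rough standard-library equivalent is sketched below, assuming nothing about the real implementation. `read_csv_records` and `read_json` are hypothetical names that operate on a plain path rather than on a ManagedResource, and passing `newline=""` to the CSV reader is a choice made for this sketch.

```python
from __future__ import annotations

import csv
import json
import tempfile
from pathlib import Path
from typing import Any, Iterator


def read_csv_records(
    stored_at: Path,
    encoding: str | None = None,
    errors: str | None = None,
    **csv_dict_reader_kwargs: Any,
) -> Iterator[dict[str, str]]:
    # Text mode; newline="" is what the csv module recommends for its inputs.
    with stored_at.open("r", encoding=encoding, errors=errors, newline="") as f:
        yield from csv.DictReader(f, **csv_dict_reader_kwargs)


def read_json(
    stored_at: Path,
    encoding: str | None = None,
    errors: str | None = None,
    **json_load_kwargs: Any,
) -> Any:
    with stored_at.open("r", encoding=encoding, errors=errors) as f:
        return json.load(f, **json_load_kwargs)


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "stops.csv"
    csv_path.write_text("stop_id;stop_name\n1;Central\n2;Harbour\n", encoding="utf-8")
    # delimiter is forwarded untouched to csv.DictReader
    rows = list(read_csv_records(csv_path, encoding="utf-8", delimiter=";"))

    json_path = Path(tmp) / "feed.json"
    json_path.write_text('{"version": 1.5}', encoding="utf-8")
    # parse_float is forwarded untouched to json.load
    data = read_json(json_path, encoding="utf-8", parse_float=str)
```

Passing the encoding explicitly, as done here, avoids depending on the system default that applies when encoding is left as None.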
- class impuls.resource.TimeLimitedResource(r: Resource, minimal_time_between: timedelta)¶
Bases:
WrappedResource
TimeLimitedResource wraps a Resource and ensures the time between conditional fetches is at least minimal_time_between.
TimeLimitedResource can be used to cache constantly-changing resources or to avoid bothering an external server.
- fetch(conditional: bool) → Iterator[bytes]¶
fetch returns the content of the resource, preferably in chunks of FETCH_CHUNK_SIZE length. The last_modified and fetch_time attributes of the resource should be updated right before the first chunk is returned.
If conditional is set, the Resource must raise InputNotModified if the resource was not modified since last_modified. In this case, last_modified and fetch_time must not be updated.
- minimal_time_between: timedelta¶
Minimal time which must pass between fetches to the external server.
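One way to picture the throttling described for TimeLimitedResource is the sketch below. It is not the library's implementation: `SketchTimeLimited`, `CountingResource` and the local `InputNotModified` class are hypothetical stand-ins, the injected `now` callable exists only to keep the example deterministic, and short-circuiting by raising InputNotModified before contacting the wrapped resource is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterator


class InputNotModified(Exception):
    """Stand-in for the exception named in the fetch() contract."""


class CountingResource:
    """Hypothetical upstream resource that counts how often it is hit."""

    def __init__(self, now: Callable[[], datetime]) -> None:
        self.now = now
        self.fetch_time = datetime.min.replace(tzinfo=timezone.utc)
        self.calls = 0

    def fetch(self, conditional: bool) -> Iterator[bytes]:
        self.calls += 1
        self.fetch_time = self.now()
        yield b"payload"


class SketchTimeLimited:
    def __init__(self, r, minimal_time_between: timedelta, now: Callable[[], datetime]) -> None:
        self.r = r
        self.minimal_time_between = minimal_time_between
        self.now = now

    def fetch(self, conditional: bool) -> Iterator[bytes]:
        # Only conditional fetches are throttled; a too-early one never
        # reaches the wrapped resource.
        if conditional and self.now() - self.r.fetch_time < self.minimal_time_between:
            raise InputNotModified
        return self.r.fetch(conditional)


clock = [datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)]
now = lambda: clock[0]
upstream = CountingResource(now)
limited = SketchTimeLimited(upstream, timedelta(hours=1), now)

first = b"".join(limited.fetch(conditional=True))  # never fetched before
clock[0] += timedelta(minutes=10)
try:
    limited.fetch(conditional=True)  # only 10 minutes later
    throttled = False
except InputNotModified:
    throttled = True
clock[0] += timedelta(hours=1)
second = b"".join(limited.fetch(conditional=True))  # 70 minutes after the first
```

In this sketch the upstream resource is contacted twice for three conditional fetches, which is the kind of saving the wrapper is meant to provide.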
- class impuls.resource.WrappedResource(r: Resource)¶
Bases:
Resource
WrappedResource is a helper abstract class for implementing modifications to existing Resource instances using the decorator pattern.
WrappedResource proxies the last_modified and fetch_time properties to the wrapped resource, but leaves fetch() unimplemented. super().__init__() must be called by implementing classes in their __init__ methods.
- property fetch_time: datetime¶
fetch_time contains the timestamp of the last successful call to fetch. Only available after a call to fetch. Must be an aware datetime instance.
- property last_modified: datetime¶
last_modified contains the last update time of the resource. Only available after a call to fetch. Must be an aware datetime instance.
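The decorator pattern mentioned for WrappedResource can be sketched as follows. The classes are hypothetical stand-ins written for this example (`FixedResource`, `SketchWrapped`, `UppercaseResource`), not the impuls classes themselves, and the attribute name `r` simply follows the constructor parameter shown in the signature.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timezone
from typing import Iterator


class FixedResource:
    """Hypothetical inner resource with a fixed payload and timestamps."""

    def __init__(self, payload: bytes, last_modified: datetime) -> None:
        self.payload = payload
        self.last_modified = last_modified
        self.fetch_time = last_modified

    def fetch(self, conditional: bool) -> Iterator[bytes]:
        yield self.payload


class SketchWrapped(ABC):
    """Proxies the timestamps to the wrapped resource, leaves fetch abstract."""

    def __init__(self, r) -> None:
        self.r = r

    @property
    def last_modified(self) -> datetime:
        return self.r.last_modified

    @property
    def fetch_time(self) -> datetime:
        return self.r.fetch_time

    @abstractmethod
    def fetch(self, conditional: bool) -> Iterator[bytes]:
        """Implemented by concrete wrappers."""


class UppercaseResource(SketchWrapped):
    """Hypothetical wrapper that modifies the content chunk by chunk."""

    def __init__(self, r) -> None:
        super().__init__(r)  # stores the wrapped resource for the proxies

    def fetch(self, conditional: bool) -> Iterator[bytes]:
        for chunk in self.r.fetch(conditional):
            yield chunk.upper()


stamp = datetime(2024, 5, 1, tzinfo=timezone.utc)
wrapped = UppercaseResource(FixedResource(b"abc", stamp))
result = b"".join(wrapped.fetch(conditional=False))
```

Because the timestamps are read through to the inner resource, the wrapper needs no bookkeeping of its own and only has to transform the content.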
- class impuls.resource.ZippedResource(r: Resource, file_name_in_zip: str | None = None, save_zip_in_memory: bool = True)¶
Bases:
WrappedResource
ZippedResource wraps a Resource pointing to a zip archive, creating a Resource which reads the content of one file from that archive.
file_name_in_zip dictates which file to extract from the archive. It defaults to None, in which case ZippedResource first checks that the archive contains exactly one file and then extracts it.
save_zip_in_memory dictates whether the zip file can be saved in memory. It defaults to True, but if the archive itself is huge this option may be set to False, causing the zip file to be written to a temporary file.
- fetch(conditional: bool) → Iterator[bytes]¶
fetch returns the content of the resource, preferably in chunks of FETCH_CHUNK_SIZE length. The last_modified and fetch_time attributes of the resource should be updated right before the first chunk is returned.
If conditional is set, the Resource must raise InputNotModified if the resource was not modified since last_modified. In this case, last_modified and fetch_time must not be updated.
- fetch_zip(conditional: bool) → ContextManager[BinaryIO]¶
Fetches the bytes of the zip file, either to memory or to a temporary file, depending on the save_zip_in_memory setting.
- fetch_zip_to_memory(conditional: bool) → BytesIO¶
Fetches the zip file to a BytesIO and returns it.
- fetch_zip_to_temp_file(conditional: bool) → Generator[BinaryIO, None, None]¶
Fetches the zip file to a TemporaryFile and returns it.
- pick_file(in_: ZipFile) → ZipInfo¶
Picks the file to decompress.
- file_name_in_zip: str | None¶
- save_zip_in_memory: bool¶
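The file selection and in-memory extraction described for ZippedResource might look roughly like the sketch below, which uses only the standard zipfile module. The free functions `pick_file` and `read_member` are hypothetical simplifications, `CHUNK_SIZE` is an arbitrary value rather than FETCH_CHUNK_SIZE, and raising ValueError for an ambiguous archive is an assumption of this example.

```python
import io
import zipfile
from typing import Iterator, Optional

CHUNK_SIZE = 4096  # arbitrary for this sketch, not impuls' FETCH_CHUNK_SIZE


def pick_file(archive: zipfile.ZipFile, file_name_in_zip: Optional[str] = None) -> zipfile.ZipInfo:
    if file_name_in_zip is not None:
        return archive.getinfo(file_name_in_zip)  # KeyError if the name is absent
    files = [info for info in archive.infolist() if not info.is_dir()]
    if len(files) != 1:
        # Assumed behaviour: refuse to guess when the archive is ambiguous.
        raise ValueError(f"expected exactly one file in the archive, found {len(files)}")
    return files[0]


def read_member(zip_bytes: bytes, file_name_in_zip: Optional[str] = None) -> Iterator[bytes]:
    # The whole archive is held in memory, like save_zip_in_memory=True.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(pick_file(archive, file_name_in_zip)) as member:
            while chunk := member.read(CHUNK_SIZE):
                yield chunk


single_zip = io.BytesIO()
with zipfile.ZipFile(single_zip, "w") as archive:
    archive.writestr("stops.txt", "stop_id\n1\n")
single = b"".join(read_member(single_zip.getvalue()))

multi_zip = io.BytesIO()
with zipfile.ZipFile(multi_zip, "w") as archive:
    archive.writestr("a.txt", "A")
    archive.writestr("b.txt", "B")
named = b"".join(read_member(multi_zip.getvalue(), "b.txt"))
```

For very large archives, the same logic could read from a temporary file instead of a BytesIO, which is the trade-off that save_zip_in_memory exposes.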
- impuls.resource.DATETIME_MAX_UTC = datetime.datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=datetime.timezone.utc)¶
Helper constant with an aware version of datetime.max.
- impuls.resource.DATETIME_MIN_UTC = datetime.datetime(1, 1, 1, 0, 0, tzinfo=datetime.timezone.utc)¶
Helper constant with an aware version of datetime.min.
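The values shown for both constants equal datetime.min and datetime.max with UTC attached. The sketch below re-creates them with the standard library instead of importing them from impuls, and shows why aware sentinels are useful: naive and aware datetimes cannot be ordered against each other.

```python
from datetime import datetime, timezone

# Local re-creations of the two constants, for illustration only.
DATETIME_MIN_UTC = datetime.min.replace(tzinfo=timezone.utc)
DATETIME_MAX_UTC = datetime.max.replace(tzinfo=timezone.utc)

fetched_at = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)

# Aware sentinels compare cleanly with aware timestamps.
in_range = DATETIME_MIN_UTC < fetched_at < DATETIME_MAX_UTC

# Ordering a naive datetime against an aware one raises TypeError.
try:
    datetime.min < fetched_at
    naive_comparable = True
except TypeError:
    naive_comparable = False
```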