Data extraction is the process of collecting data from a variety of sources, which are often poorly organized or unstructured. It is the first step in both ETL (extract, transform, load) and ELT (extract, load, transform) pipelines.
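The extract/transform/load steps can be sketched in a few lines; this is a minimal illustration only, with made-up source data and table names:

```python
import csv
import io
import sqlite3

# Extract: parse a loosely formatted CSV source (hypothetical data).
raw = "name,price\nwidget, 9.99 \ngadget,12.50\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: strip stray whitespace and convert prices to numbers.
cleaned = [(r["name"].strip(), float(r["price"])) for r in rows]

# Load: write the cleaned rows into the destination data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?)", cleaned)

total = conn.execute("SELECT SUM(price) FROM products").fetchone()[0]
```

In an ELT pipeline the same pieces would run in a different order: the raw rows would be loaded into the store first and transformed there.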
A data pipeline is a series of processes that migrate data from a source to a destination data store (often a database).
Docker uses operating-system-level virtualization to isolate software in packages called containers.
Digital Rights Management (DRM) is a way to protect data from being copied. Posts focus on how to remove DRM in order to make a private copy of the content.
Kibana is a front-end for Elasticsearch.
Machine Learning posts focus on computer algorithms that improve automatically through experience.
Nextcloud provides a suite of tools for storing and sharing data privately. It is not tied to a public cloud, so users can spin up instances on their own hardware and keep their data under their own control.
Generic tag that I use for (short) blog entries in which I mainly document things for myself, but publish those notes as they might be of interest to others as well.
Pi-hole is a lightweight network-level ad- and tracker-blocking application that blocks content for all devices on the network. It acts as a DNS sinkhole and optionally as a DHCP server.
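The DNS-sinkhole idea behind Pi-hole can be sketched in a few lines. The blocklist entries and addresses below are made up for illustration; Pi-hole itself maintains real blocklists and answers actual DNS queries:

```python
# Hypothetical blocklist of ad/tracker domains.
BLOCKLIST = {"ads.example.com", "tracker.example.net"}

def resolve(domain, upstream):
    """Answer a DNS query: sinkhole blocked domains, forward the rest."""
    if domain in BLOCKLIST:
        # Sinkhole: the client receives a dead address, so the ad
        # or tracker request never leaves the network.
        return "0.0.0.0"
    return upstream(domain)

# Usage with a stand-in upstream resolver:
blocked = resolve("ads.example.com", lambda d: "93.184.216.34")
allowed = resolve("example.org", lambda d: "93.184.216.34")
```

Because every device on the network uses Pi-hole as its DNS server, this one lookup table blocks ads for all of them at once, without any per-device configuration.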
Generic tag that I use for posts covering tech topics (which is most of my posts).
A web crawler, also called a spider, spiderbot, or simply crawler, is an internet bot that systematically browses the web. Web crawlers are often operated by search engines for the purpose of web indexing.
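The systematic browsing can be sketched as a breadth-first traversal of the link graph. To keep the sketch self-contained, the "web" below is an in-memory dict of made-up pages; a real crawler would fetch URLs over HTTP instead:

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical mini-web: page path -> HTML body with links.
PAGES = {
    "/": '<a href="/a">a</a> <a href="/b">b</a>',
    "/a": '<a href="/b">b</a>',
    "/b": '<a href="/">home</a>',
}

class LinkParser(HTMLParser):
    """Collects the href targets of all anchor tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

def crawl(start):
    """Breadth-first crawl: visit every reachable page exactly once."""
    seen, queue, visited = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        visited.append(url)          # here a search engine would index the page
        parser = LinkParser()
        parser.feed(PAGES[url])
        for link in parser.links:
            if link not in seen:     # the 'seen' set prevents crawling loops
                seen.add(link)
                queue.append(link)
    return visited

order = crawl("/")
```

The `seen` set is the essential part: without it, the cycle between `/` and `/b` would keep the crawler running forever.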