2 points by tomkinstinch 2865 days ago | link | parent

Where I work, we've been using Snakemake[1] for data pipelines. It's like GNU Make but with a Pythonic syntax. It determines which processing steps need to be executed by building a directed acyclic graph from the end to the beginning: it figures out which operations are needed to produce a given output, then looks at those and figures out their inputs, and so on. It can even submit parallel processing jobs to a batch-queueing cluster.

1. https://bitbucket.org/snakemake/snakemake/wiki/Home
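
To make the "end to the beginning" idea concrete, here is a rough sketch in plain Python. It is not Snakemake's code or API. The rule table, the file names, and the plan() helper are all made up for illustration, and it only checks whether files exist, whereas real tools like Make also compare timestamps.

```python
# Hypothetical rule table: output file -> (input files, name of the step).
RULES = {
    "report.html": (["stats.csv"], "render_report"),
    "stats.csv": (["clean.csv"], "compute_stats"),
    "clean.csv": (["raw.csv"], "clean_data"),
}


def plan(target, rules, exists):
    """Walk backward from target and return the steps to run, in order.

    exists is a callable that says whether a file is already present
    (os.path.exists would do for real files).
    """
    order = []   # steps in the order they must run
    dirty = {}   # file name -> True if it has to be (re)built

    def visit(name):
        if name in dirty:
            return dirty[name]
        if name not in rules:
            # A source file: nothing produces it, so it must already exist.
            if not exists(name):
                raise FileNotFoundError(f"no rule produces {name!r}")
            dirty[name] = False
            return False
        inputs, step = rules[name]
        # Build the list first so every input is visited (no short-circuit).
        upstream_dirty = any([visit(dep) for dep in inputs])
        dirty[name] = upstream_dirty or not exists(name)
        if dirty[name]:
            order.append(step)  # appended after its inputs, so order is valid
        return dirty[name]

    visit(target)
    return order


# Pretend raw.csv and clean.csv are already on disk.
on_disk = {"raw.csv", "clean.csv"}
print(plan("report.html", RULES, on_disk.__contains__))
```

Since clean.csv is already there, only the two downstream steps get scheduled. Steps that don't depend on each other could then be handed to a cluster in parallel.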



2 points by nickhould 2864 days ago | link

Interesting. Does it handle the scheduling?

Airflow uses directed acyclic graphs (DAGs). You can visualize those graphs and build those sequences.

-----



