DataTau | SciKit-Learn Laboratory (SKLL)

	SciKit-Learn Laboratory (SKLL) (github.com)
	9 points by IllSc 3435 days ago | 3 comments

3 points by ematvey 3435 days ago | link

So they want to replace familiar and expressive Python boilerplate with slightly more compact boilerplate in a new DSL. And then when a user runs into something the authors haven't thought about, the user has to fall back on its underlying technology stack. I don't see the point.

-----

2 points by dan_blanchard 3434 days ago | link

Depending on your use case, SKLL might not save you that much time, but we do a fair bit of hand-holding here to make it much harder to screw things up than with scikit-learn on its own. For example, if you want to run an experiment on the same sparse data using RandomForestClassifier, SGDClassifier, and SVC, you would normally bump into the fact that RandomForestClassifier doesn't support sparse matrices in scikit-learn 0.15.2 (it will in the next release). SKLL automatically converts things to dense when necessary instead of just raising an exception. It will only raise an exception if it runs out of memory when doing that conversion, and even then it'll tell you why it needed to do that in the first place.
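
To illustrate the idea, here is a rough sketch of a dense fallback. It is not SKLL's actual code: `fit_with_dense_fallback`, `DenseOnlyLearner`, and `ToySparseMatrix` are hypothetical names, and it assumes the learner signals unsupported sparse input with a TypeError and that the data exposes a `toarray()` method, as SciPy sparse matrices do.

```python
class DenseOnlyLearner:
    """Toy stand-in for an estimator that rejects sparse input (hypothetical)."""

    def fit(self, X, y):
        if hasattr(X, "toarray"):
            raise TypeError("sparse input is not supported")
        self.n_rows_ = len(X)
        return self


class ToySparseMatrix:
    """Toy stand-in for a sparse matrix: stores only the non-zero cells."""

    def __init__(self, shape, cells):
        self.shape = shape
        self.cells = cells  # {(row, col): value}

    def toarray(self):
        rows, cols = self.shape
        dense = [[0.0] * cols for _ in range(rows)]
        for (r, c), value in self.cells.items():
            dense[r][c] = value
        return dense


def fit_with_dense_fallback(learner, X, y):
    """Try the data as given; densify and retry if the learner rejects it."""
    try:
        return learner.fit(X, y)
    except TypeError:
        # Assumption: TypeError means "sparse not supported". If the data
        # cannot be densified, the error is something else, so re-raise it.
        if not hasattr(X, "toarray"):
            raise
    try:
        X_dense = X.toarray()
    except MemoryError as exc:
        # Explain why the conversion was attempted in the first place.
        raise MemoryError(
            f"{type(learner).__name__} needs dense input, and converting "
            "the sparse data to dense ran out of memory"
        ) from exc
    return learner.fit(X_dense, y)


X = ToySparseMatrix((2, 3), {(0, 1): 1.0, (1, 2): 2.0})
model = fit_with_dense_fallback(DenseOnlyLearner(), X, [0, 1])
```

A learner that accepts sparse input never reaches the conversion, so the memory cost is only paid when it is unavoidable.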

SKLL also provides default parameter grids for all of the supported learners, so if you don't know much about the underlying algorithm but want a tuned model, you can do that by just setting "grid_search=True" in your config file.
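
Roughly, a default grid is just a per-learner table of values to try. The sketch below shows the mechanism only: the grid values are made up for illustration and are not SKLL's real defaults, and `DEFAULT_GRIDS`, `candidate_settings`, and `grid_search` are hypothetical names. A real search would score each setting by cross-validation rather than with the toy function used here.

```python
from itertools import product

# Illustrative grids only; these values are assumptions, not SKLL's defaults.
DEFAULT_GRIDS = {
    "SVC": {"C": [0.01, 0.1, 1.0, 10.0]},
    "RandomForestClassifier": {"max_depth": [1, 5, 10, None]},
}


def candidate_settings(learner_name, grids=DEFAULT_GRIDS):
    """Expand a learner's grid into every combination of parameter values."""
    grid = grids[learner_name]
    names = sorted(grid)
    combos = product(*(grid[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


def grid_search(learner_name, score, grids=DEFAULT_GRIDS):
    """Return the setting with the highest score (score is caller-supplied)."""
    return max(candidate_settings(learner_name, grids), key=score)


# Toy scoring function that happens to prefer C close to 1.0.
best = grid_search("SVC", score=lambda params: -abs(params["C"] - 1.0))
```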

We also support loading a variety of file formats that aren't supported by scikit-learn (e.g., ARFF, MegaM). That way if you want to experiment with different ML toolkits using the same data, you can use the same files.

Obviously, as the main dev, I'm pretty biased, but it's been extremely useful for our group at ETS. We needed a tool for people who don't know Python to be able to run ML experiments easily with the state-of-the-art estimators available in scikit-learn.

As for your comment about running into something we haven't thought about, please feel free to file an issue in such cases. :)

Oh, and thanks for the feedback. It made me realize that if people are linking to the GitHub page, I should really update the README to reflect the fact that we now have a tutorial for using run_experiment, and to emphasize more that we're mostly trying to develop command-line utilities for scikit-learn, and that our Python API is just a happy side effect of that.

-----

1 point by Chopsting 3433 days ago | link

I ran into an error trying to run the run_experiment --local evaluate.cfg command...

  File "C:\Users\roek0_000\Anaconda\lib\runpy.py", line 162, in _run_module_as_m in
    "__main__", fname, loader, pkg_name)
  File "C:\Users\roek0_000\Anaconda\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Users\roek0_000\Anaconda\Scripts\run_experiment.exe\__main__.py", lin 9, in <module>
  File "C:\Users\roek0_000\Anaconda\lib\site-packages\skll\utilities\run_experim nt.py", line 108, in main
    ablation=ablation, resume=args.resume)
  File "C:\Users\roek0_000\Anaconda\lib\site-packages\skll\experiments.py", line 1202, in run_configuration
    _classify_featureset(job_args)
  File "C:\Users\roek0_000\Anaconda\lib\site-packages\skll\experiments.py", line 651, in _classify_featureset
    print("Task:", task, file=log_file)
ypeError: must be unicode, not str
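
For what it's worth, "must be unicode, not str" looks like a byte string being written to a file opened in text mode under Python 2. That is a guess from the traceback, not a confirmed diagnosis. The sketch below reproduces the analogous mismatch in Python 3, where the roles are played by bytes and str; `write_text` is a hypothetical helper and the UTF-8 encoding is an assumption.

```python
import io


def write_text(stream, value):
    """Write value to a text stream, decoding bytes first (UTF-8 assumed)."""
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    stream.write(value)


log_file = io.StringIO()  # stands in for a log file opened in text mode

mismatch_error = None
try:
    # A text stream refuses raw bytes, much like the error reported above.
    log_file.write(b"Task: evaluate\n")
except TypeError as exc:
    mismatch_error = exc

# Decoding before writing avoids the mismatch.
write_text(log_file, b"Task: evaluate\n")
```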

-----
