p3ml.github.io

Impulses for the Panel: The Open Science Publishing Flood and Collaborative Authoring

Grey Literature as Result of the P3ML Project (Some Contribution to the Flood and Means to Navigate it)

The P3ML project, funded by the German Federal Ministry of Education and Research (BMBF) under grant number 01/S17064, offered student labs with a strong focus on the practical aspects of Machine Learning. It also produced a variety of teaching materials, which were published on different platforms:

Jupyter Notebooks

All our notebooks have been published on GitHub. For ease of access, we created a dedicated entry page at https://p3ml.github.io. While GitHub is great for storing, sharing and versioning notebook files, it does not display all of their content correctly. To see all elements as intended, you can use nbviewer. There are even free services such as Binder that let you work interactively with notebooks stored on GitHub.
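As a small illustration of how the three platforms relate, the following sketch builds the GitHub, nbviewer and Binder addresses for one notebook. The URL patterns reflect how these public services are commonly addressed and are an assumption here, not something taken from the project pages. The user, repository, branch and path names are hypothetical placeholders.

```python
from urllib.parse import quote


def notebook_links(user, repo, branch, path):
    """Return addresses for one notebook on GitHub, nbviewer and Binder.

    The URL patterns are assumptions based on the services' usual layout.
    """
    location = f"{user}/{repo}"
    return {
        # Static storage and versioning.
        "github": f"https://github.com/{location}/blob/{branch}/{path}",
        # Read-only rendering of all notebook elements.
        "nbviewer": f"https://nbviewer.org/github/{location}/blob/{branch}/{path}",
        # Interactive session started from the repository.
        "binder": f"https://mybinder.org/v2/gh/{location}/{branch}?filepath={quote(path)}",
    }


# Hypothetical repository and notebook names, for illustration only.
links = notebook_links("example-user", "example-repo", "main", "notebooks/demo.ipynb")
for platform, url in links.items():
    print(f"{platform}: {url}")
```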

Screenshots of the “Minimum Enclosing Ball” notebook as displayed by GitHub, nbviewer and Binder
Three platforms serving Jupyter Notebooks

A notebook consists of cells that may contain formatted text (Markdown, HTML, LaTeX), code (most often Python, possibly including data) and results from previous runs, including visualizations. When the user executes a code cell, the code is sent to a background process (“the kernel”) and executed, and the result is integrated into the notebook right below the code cell.
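To make the cell structure concrete, here is a minimal sketch that builds a two-cell notebook as a plain Python dictionary and serializes it to JSON, which is how .ipynb files are stored. The field names follow the version 4 notebook format as far as we are aware. The cell contents are invented for illustration and are not taken from any P3ML notebook.

```python
import json


def make_notebook():
    """Build a minimal notebook as a plain dict (nbformat 4 layout)."""
    markdown_cell = {
        "cell_type": "markdown",
        "metadata": {},
        # Formatted text; Markdown may embed LaTeX and HTML.
        "source": ["# Demo\n", "Pythagoras: $a^2 + b^2 = c^2$"],
    }
    code_cell = {
        "cell_type": "code",
        "execution_count": 1,
        "metadata": {},
        "source": ["1 + 1"],
        # The result of a previous run is stored next to the code.
        "outputs": [
            {
                "output_type": "execute_result",
                "execution_count": 1,
                "data": {"text/plain": ["2"]},
                "metadata": {},
            }
        ],
    }
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [markdown_cell, code_cell],
    }


notebook = make_notebook()
serialized = json.dumps(notebook, indent=1)  # content of a .ipynb file

# Reading the JSON back shows the cell types in document order.
cell_types = [cell["cell_type"] for cell in json.loads(serialized)["cells"]]
print(cell_types)
```

Because stored outputs are part of the file, a viewer can show results without running a kernel. Only re-executing a cell requires one.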

Elements of Explanation in Jupyter Notebooks

(*) Shown during the talk.

On Navigating the Flood

During the project we covered some technologies that could help to navigate the Open Science Publishing Flood.

One lab experimented with Latent Dirichlet Allocation for topic mining (Blei, D. M., Ng, A. Y., Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022. https://dl.acm.org/citation.cfm?id=944919.944937), which resulted in another set of notebooks. The actual lab worked on data that we cannot publish, so the published notebooks use a different source: answers given by members of the German Bundestag at abgeordnetenwatch.de. The last notebook of this section shows the mined topics as word clouds, which give a (surprisingly?) good impression of the topics.
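For readers who want to see the mechanics of topic mining, the following is a minimal sketch of Latent Dirichlet Allocation fitted with collapsed Gibbs sampling. Gibbs sampling is a common alternative to the variational inference used in the original paper and is not necessarily what the lab notebooks use. The toy documents, the hyperparameters alpha and beta, and all function names are assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns vocabulary and count tables."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]              # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                             # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        topics = []
        for word in doc:
            z = rng.randrange(num_topics)
            topics.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[word]] += 1
            topic_total[z] += 1
        assignments.append(topics)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                z = assignments[d][i]
                # Remove the token, then resample its topic given all others.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    return vocab, topic_word, doc_topic


def top_words(vocab, topic_word, n=3):
    """Return the n most frequent words of each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


# Invented toy corpus; real input would be tokenized answer texts.
docs = [
    "pension tax budget tax pension".split(),
    "budget tax pension budget".split(),
    "climate energy wind climate energy".split(),
    "wind energy climate wind".split(),
]
vocab, topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {', '.join(words)}")
```

The top words per topic are the kind of information a word cloud visualizes. On a corpus this small the result depends strongly on the random seed.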

Ready-made solutions for topic mining already exist, for example https://www.hypershelf.org from https://inphoproject.org. Another nice component is https://github.com/bmabey/pyLDAvis, and there are more. In the lab we had the impression that the evolution of topics over time is especially interesting (this analysis has not yet been transferred to the abgeordnetenwatch.de data set).

Two further labs experimented with the N-Ball approach, which combines state-of-the-art word embeddings in vector spaces (allowing calculations like “king” - “man” + “woman” = “queen”) with concept hierarchies as codified in WordNet. The students transferred the approach to different languages, often their mother tongue.
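The vector calculation mentioned above can be sketched with a handful of hand-made vectors. This shows only the word-embedding arithmetic, not the N-Ball construction or the WordNet hierarchy. The three-dimensional vectors are invented for illustration. Real embeddings are learned from text and have hundreds of dimensions.

```python
import math

# Invented toy embeddings; the axes loosely mean (royalty, gender, fruit).
EMBEDDINGS = {
    "king": (1.0, 1.0, 0.0),
    "queen": (1.0, -1.0, 0.0),
    "man": (0.0, 1.0, 0.0),
    "woman": (0.0, -1.0, 0.0),
    "apple": (0.0, 0.0, 1.0),
}


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def analogy(positive, negative, embeddings):
    """Return the word closest to sum(positive) - sum(negative).

    The query words themselves are excluded from the candidates,
    as is usual for this kind of analogy test.
    """
    dims = len(next(iter(embeddings.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + x for t, x in zip(target, embeddings[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, embeddings[word])]
    candidates = set(embeddings) - set(positive) - set(negative)
    return max(candidates, key=lambda word: cosine(target, embeddings[word]))


result = analogy(positive=["king", "woman"], negative=["man"], embeddings=EMBEDDINGS)
print(result)  # queen
```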

Literature about Jupyter Notebooks

Presentations