r/MachineLearning

A map of the latest 11 million papers split by semantic similarity and time slices [P]

Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.

I have been building alternative ways to explore scientific literature. The goal is to make the large number of papers published daily easier to keep up with by visualising macroscopic trends.

It is free to use at The Global Research Space for anyone interested in giving it a try!

How I built it

I sourced the latest 11M papers from OpenAlex and arXiv and encoded their titles and abstracts using SPECTER 2. I then projected the embeddings down to 2D using UMAP and created labels within Voronoi bounds around high-density peaks, repeating this at increasing depths.
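For anyone curious about the labelling step, here is a minimal sketch of the general idea, not the actual code behind the map. It assumes the 2D coordinates already exist, uses a plain grid histogram as a crude stand-in for a density estimate, and assigns each point to its nearest peak, which is the rule that defines a Voronoi cell. The function names, the grid size, and the toy points are all made up for illustration.

```python
from collections import Counter
from math import dist, floor


def density_peaks(points, cell=1.0, top_k=2):
    """Return the centres of the top_k most populated grid cells.

    A grid histogram is only a rough density estimate (an assumption
    made for this sketch); adjacent busy cells are not merged.
    """
    counts = Counter((floor(x / cell), floor(y / cell)) for x, y in points)
    # Sort by count (descending), then by cell index so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [((i + 0.5) * cell, (j + 0.5) * cell) for (i, j), _ in ranked[:top_k]]


def voronoi_assign(points, peaks):
    """Assign each point the index of its nearest peak (its Voronoi cell)."""
    return [min(range(len(peaks)), key=lambda k: dist(p, peaks[k])) for p in points]


# Toy 2D coordinates standing in for projected paper embeddings.
points = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.9), (5.1, 5.2), (5.4, 5.8), (2.6, 2.4)]
peaks = density_peaks(points, cell=1.0, top_k=2)
labels = voronoi_assign(points, peaks)
```

One way to get deeper levels would be to rerun the same two steps inside each cell with a finer grid, though that is a guess at how the hierarchy could be built rather than a description of the real pipeline.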

There is also support for both keyword and semantic queries, and there's an analytics layer for ranking institutions, authors, and topics.
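To make the difference between the two query types concrete, here is a small sketch under stated assumptions: keyword search is a case-insensitive substring match on titles, and semantic search ranks papers by cosine similarity between a query vector and each paper's embedding. The three-dimensional vectors and the paper records are invented toy data; real SPECTER 2 embeddings are much larger and the site's actual search may work differently.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_search(papers, term):
    """Case-insensitive substring match on the title."""
    term = term.lower()
    return [p for p in papers if term in p["title"].lower()]


def semantic_search(papers, query_vec, top_k=2):
    """Return the top_k papers most similar to the query embedding."""
    ranked = sorted(papers, key=lambda p: cosine(p["vec"], query_vec), reverse=True)
    return ranked[:top_k]


# Hypothetical records; "vec" stands in for a real embedding.
papers = [
    {"title": "Graph neural networks for molecules", "vec": [0.9, 0.1, 0.0]},
    {"title": "Protein folding with transformers", "vec": [0.7, 0.6, 0.1]},
    {"title": "Survey of coral reef ecology", "vec": [0.0, 0.1, 0.9]},
]
kw_hits = keyword_search(papers, "graph")
sem_hits = semantic_search(papers, [1.0, 0.2, 0.0])
```

The keyword query only finds the paper with "graph" in its title, while the semantic query also surfaces the protein folding paper because its vector points in a similar direction.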

More recently, I have also added the ability to slide back and forth in time, plus a daily auto-ingestion script to keep the map up to date.
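As a rough illustration of how a time slider and a daily ingest could fit together, here is a sketch that assumes each paper carries an id and a publication date. The record layout and function names are hypothetical and not taken from the actual project.

```python
from datetime import date


def time_slice(papers, start, end):
    """Return papers published in the half-open window [start, end)."""
    return [p for p in papers if start <= p["published"] < end]


def ingest(existing, new_batch):
    """Merge a daily batch into the collection, de-duplicating by id.

    A newer record with the same id replaces the older one.
    """
    by_id = {p["id"]: p for p in existing}
    for p in new_batch:
        by_id[p["id"]] = p
    return sorted(by_id.values(), key=lambda p: p["published"])


# Hypothetical records for illustration only.
papers = [
    {"id": "a", "published": date(2024, 1, 10)},
    {"id": "b", "published": date(2024, 6, 5)},
]
papers = ingest(papers, [{"id": "b", "published": date(2024, 6, 5)},
                         {"id": "c", "published": date(2025, 2, 1)}])
in_2024 = time_slice(papers, date(2024, 1, 1), date(2025, 1, 1))
```

Using a half-open window means consecutive slices never count the same paper twice when the slider moves.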

Feedback and suggestions are very welcome!

submitted by /u/icannotchangethename
