Day 5: Neo4j & Graphipedia
I'm taking a slight step backwards today. Last night, I remembered Neo4j, a database that stores data as a graph of nodes and relationships instead of the tables used by more common relational (SQL) databases.
It has a number of "shortest path" algorithms to find the shortest path between two nodes in the graph. We're going to load the entirety of Wikipedia into a Neo4j database, with articles as nodes and links to other articles as edges, so we can use these functions.
And again, someone has done all the hard work to make this possible: mirkonasato/graphipedia.
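To make the "articles as nodes, links as edges" idea concrete, here is a minimal sketch of a shortest-path search in plain Python. It is only an illustration using breadth-first search on a tiny, made-up set of links. It is not Neo4j's implementation, and the `shortest_path` function and the `links` data are hypothetical names invented for this example.

```python
from collections import deque


def shortest_path(links, start, goal):
    """Return the shortest chain of articles from start to goal, or None.

    `links` maps an article title to the titles it links to (directed edges).
    """
    if start == goal:
        return [start]
    previous = {start: None}  # article -> the article we arrived from
    queue = deque([start])
    while queue:
        article = queue.popleft()
        for neighbor in links.get(article, ()):
            if neighbor in previous:
                continue  # already reached by a path that is at least as short
            previous[neighbor] = article
            if neighbor == goal:
                # Walk backwards from the goal to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical toy data, standing in for real Wikipedia links.
links = {
    "Neo4j": ["Graph database", "Java"],
    "Graph database": ["Graph theory"],
    "Java": ["Programming language"],
    "Graph theory": ["Shortest path problem"],
}

print(shortest_path(links, "Neo4j", "Shortest path problem"))
# ['Neo4j', 'Graph database', 'Graph theory', 'Shortest path problem']
```

Breadth-first search works here because every link counts as one step. On the full Wikipedia graph, a dictionary like this would be impractical to build by hand, which is where the database comes in.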
How To
- Install Neo4j Desktop
- Clone noppaz/graphipedia (the original version has not been updated to Neo4j v4, but this fork has been)
- Install Maven with Homebrew:
brew install maven
- Get confused because java doesn't seem to exist even though it's installed:
java -version
- Symlink the OpenJDK that was installed along with Maven so the system can find it as java:
sudo ln -sfn /opt/homebrew/opt/openjdk/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk.jdk
- Check to make sure java exists now:
java -version
- cd into the cloned project and build it with:
mvn package
- Download pages-articles-multistream.xml.bz2 from https://dumps.wikimedia.your.org/enwiki/latest/
- Run the import script with:
./import.sh ~/Downloads/enwiki-latest-pages-articles.xml.bz2 ./output
- Wait...
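Since the import takes a long time, it can be worth checking that the download decompresses before kicking it off. The following is a small sketch using only Python's standard library. The `peek_dump` helper is a hypothetical name made up for this post, and the demo runs on a tiny stand-in file rather than the real dump.

```python
import bz2
import os
import tempfile
from itertools import islice


def peek_dump(path, line_count=5):
    """Return the first few decompressed lines of a .bz2 file.

    The file is streamed, so only the start of the archive is read.
    """
    with bz2.open(path, mode="rt", encoding="utf-8") as dump:
        return [line.rstrip("\n") for line in islice(dump, line_count)]


# Demo on a tiny stand-in file (made-up content, not the real dump).
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "sample.xml.bz2")
    with bz2.open(sample, mode="wt", encoding="utf-8") as out:
        out.write("<mediawiki>\n  <page>\n    <title>Neo4j</title>\n  </page>\n</mediawiki>\n")
    for line in peek_dump(sample, 3):
        print(line)
```

To try it on the actual download, point `peek_dump` at the same path that was passed to import.sh. If the first lines look like XML, the file is at least readable.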