Wikidata import 2024-10-28 Virtuoso
Latest revision as of 13:42, 7 November 2024
Storage Overview
./hd/gamma/virtuoso
├── docker-compose.yml
├── scripts
│   └── 10-bulkload.sql
└── wikidata
    └── data
        ├── latest-all.nt.bz2
        └── latest-all.nt.graph
Providing the Wikidata Dump
Downloading the Dump
$ wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.nt.bz2
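The storage overview lists latest-all.nt.graph next to the dump. Because the bulk load script passes NULL as the graph argument, the loader is expected to read the target graph IRI from that file. A minimal sketch of creating it follows; the directory default and the IRI http://www.wikidata.org are placeholders chosen for illustration, not values taken from this setup.

```shell
#!/usr/bin/env bash
# Sketch: write the graph IRI file that sits next to the dump.
# Both defaults below are assumptions; override them via the environment.
data_dir="${DATA_DIR:-./wikidata/data}"             # assumed dump directory
graph_iri="${GRAPH_IRI:-http://www.wikidata.org}"   # placeholder graph IRI

mkdir -p "$data_dir"

# The file contains nothing but the IRI on a single line.
printf '%s\n' "$graph_iri" > "$data_dir/latest-all.nt.graph"

echo "wrote $data_dir/latest-all.nt.graph:"
cat "$data_dir/latest-all.nt.graph"
```

The file name follows the one shown in the storage overview (dump name without the .bz2 suffix, plus .graph).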
Virtuoso Docker
Set DBA_PASSWORD before starting the Docker container for the first time.
version: "1.0"
services:
  virtuoso_db:
    container_name: wd_virtuoso
    image: openlink/virtuoso-opensource-7
    volumes:
      - ./data:/database/data
      - ./scripts:/opt/virtuoso-opensource/initdb.d
    environment:
      - DBA_PASSWORD=dba
      - VIRT_PARAMETERS_NUMBEROFBUFFERS=5450000
      - VIRT_PARAMETERS_MAXDIRTYBUFFERS=4000000
    ports:
      - "1111:1111"
      - "9700:8890"
NumberOfBuffers and MaxDirtyBuffers are set to the values recommended for 64 GB of memory (see https://vos.openlinksw.com/owiki/wiki/VOS/VirtRDFPerformanceTuning).
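As a rough plausibility check of these buffer settings, the memory taken by the buffer pool can be estimated with shell arithmetic. The sketch assumes about 8700 bytes of RAM per 8K buffer, which is an approximation rather than a measured value for this installation.

```shell
#!/usr/bin/env bash
# Sketch: estimate the RAM used by the configured buffer pool.
number_of_buffers=5450000    # VIRT_PARAMETERS_NUMBEROFBUFFERS
max_dirty_buffers=4000000    # VIRT_PARAMETERS_MAXDIRTYBUFFERS
bytes_per_buffer=8700        # assumption: approximate RAM per 8K buffer

pool_bytes=$(( number_of_buffers * bytes_per_buffer ))
pool_gib=$(( pool_bytes / 1024 / 1024 / 1024 ))   # integer GiB, rounded down

# Share of buffers that may be dirty before they are flushed, in percent.
dirty_pct=$(( max_dirty_buffers * 100 / number_of_buffers ))

echo "buffer pool: ${pool_bytes} bytes (about ${pool_gib} GiB)"
echo "max dirty buffers: ${dirty_pct}% of all buffers"
```

Under that assumption the pool takes roughly two thirds of 64 GB, and the dirty-buffer limit is close to three quarters of the pool.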
Bulk load script: 10-bulkload.sql
--
-- Copyright (C) 2022 OpenLink Software
--
--
-- Add all files that end in .nt
--
ld_dir_all ('data', '*.nt', NULL)
;
--
-- Add all files that end in .bz2, .gz, or .xz, to show that the Virtuoso bulk loader
-- can load compressed files without manual decompression
--
ld_dir_all ('data', '*.bz2', NULL)
;
ld_dir_all ('data', '*.gz', NULL)
;
ld_dir_all ('data', '*.xz', NULL)
;
--
-- Now load all of the files found above into the database
--
rdf_loader_run()
;
--
-- End of script
--
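While a load like this runs, its progress can be checked from a second terminal. The sketch below only prints two SQL statements against the loader's bookkeeping table DB.DBA.load_list; the function name load_status_sql is hypothetical, and the isql invocation in the comment assumes the container name and credentials from the compose file above.

```shell
#!/usr/bin/env bash
# Sketch: SQL for inspecting bulk-load progress.
# ll_state in DB.DBA.load_list: 0 = queued, 1 = loading, 2 = finished.
load_status_sql() {
  cat <<'SQL'
SELECT ll_state, COUNT(*) FROM DB.DBA.load_list GROUP BY ll_state;
SELECT ll_file, ll_error FROM DB.DBA.load_list WHERE ll_error IS NOT NULL;
SQL
}

# Print the statements. To run them, pipe the output into isql, e.g.
# (assumed invocation, adjust name and credentials to your setup):
#   load_status_sql | docker exec -i wd_virtuoso isql 1111 dba dba
load_status_sql
```

The first statement counts files per state; the second lists files for which the loader recorded an error.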
Starting Docker Container and Bulk-load
Starting a new screen session
$ screen -S wd_virtuoso
Starting the Docker container
$ docker compose up
Started the import at 2024-10-29T10:50:00
Load completed at 2024-11-01T12:13:43
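The total load time follows from the two timestamps. A small sketch of the arithmetic, assuming GNU date (for the -d option) and treating both timestamps as being in the same time zone:

```shell
#!/usr/bin/env bash
# Sketch: elapsed time between import start and load completion.
# Assumes GNU date; -u avoids any local daylight-saving effects.
start="2024-10-29T10:50:00"
end="2024-11-01T12:13:43"

# Replace the "T" separator with a space for broader date(1) compatibility.
start_s=$(date -u -d "${start/T/ }" +%s)
end_s=$(date -u -d "${end/T/ }" +%s)

elapsed=$(( end_s - start_s ))
days=$(( elapsed / 86400 ))
hours=$(( elapsed % 86400 / 3600 ))
minutes=$(( elapsed % 3600 / 60 ))
seconds=$(( elapsed % 60 ))

printf 'load took %dd %02dh %02dm %02ds (%d seconds)\n' \
  "$days" "$hours" "$minutes" "$seconds" "$elapsed"
```

This comes to a little over three days, or about 73.4 hours, for the full dump.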
Database Size after import
/hd/gamma/virtuoso/wikidata$ ls -lsh
total 907G
4.0K drwxrwxr-x 2 th th 4.0K Oct 29 09:28 data
12M -rw-r--r-- 1 root root 12M Nov 3 11:15 virtuoso-temp.db
907G -rw-r--r-- 1 root root 907G Nov 3 11:08 virtuoso.db
8.0K -rw-r----- 1 root root 7.1K Oct 29 10:48 virtuoso.ini
4.0K -rw-r--r-- 1 root root 11 Nov 3 11:08 virtuoso.lck
520K -rw-r--r-- 1 root root 515K Nov 3 11:08 virtuoso.log
0 -rw-r--r-- 1 root root 0 Oct 29 10:48 virtuoso.pxa
24K -rw-r--r-- 1 root root 17K Nov 3 11:18 virtuoso.trx