Strategies for large imports of CSV data
Paul Spencer
I have a large amount of data (> 90,000 rows) to import in order to add a property to existing relationships. Running this import pins the CPU and prevents the server from responding to other requests.
Other than abandoning the CSV loader and doing the merges a record at a time with throttled merges, is there another way to import the data so that it doesn't consume the entire server? Would upgrading to 2 CPUs, at least while running the import, solve this?
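For context, the statement I'm running looks roughly like this (the file name, labels, property names, and relationship type are stand-ins for my actual data):

```cypher
LOAD CSV WITH HEADERS FROM 'file:///scores.csv' AS row
// Match the two existing endpoints and their existing relationship,
// then write the new property from the CSV column.
MATCH (a:Person {personId: row.fromId})-[r:KNOWS]->(b:Person {personId: row.toId})
SET r.score = toFloat(row.score);
```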
Gregory King
Hi Paul,
Do you have indexes created in your database for the items you are matching on? If not, adding them should dramatically speed things up; 90,000 rows shouldn't take very long.
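For example, if your rows match people by an id property, I mean an index along these lines (the label and property names are just illustrative, and the syntax assumes a reasonably recent Neo4j version):

```cypher
// Index on the property used in the MATCH so lookups don't scan every node.
CREATE INDEX person_id IF NOT EXISTS
FOR (p:Person) ON (p.personId);
```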
Best,
Greg
Paul Spencer
Gregory King Thanks for the reply. I'm pretty new to GraphDB, so I'm fairly sure I have not created indexes. I'll do some reading!
Gregory King
Paul Spencer We all have to start somewhere :)
If you haven't tried it already, you might want to check out our import tool for CSVs. It can generate the corresponding Cypher for its loads, which might help you see how uniqueness constraints (which are backed by indexes) on the identifiers that link the relationships are used to keep loads performant.
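As a rough sketch of the pattern that generated Cypher tends to follow (labels, property names, file name, and relationship type are placeholders here, and the batching syntax assumes Neo4j 4.4 or later):

```cypher
// A uniqueness constraint on the linking identifier; it is backed by an index,
// so the MATCH in the load below stays fast.
CREATE CONSTRAINT person_id_unique IF NOT EXISTS
FOR (p:Person) REQUIRE p.personId IS UNIQUE;

// Load in batches so the whole import isn't one huge transaction.
// (Prefix with :auto if you run this in Neo4j Browser.)
LOAD CSV WITH HEADERS FROM 'file:///scores.csv' AS row
CALL {
  WITH row
  MATCH (a:Person {personId: row.fromId})-[r:KNOWS]->(b:Person {personId: row.toId})
  SET r.score = toFloat(row.score)
} IN TRANSACTIONS OF 1000 ROWS;
```

Batching also keeps memory and lock pressure down, which should help with the server becoming unresponsive during the import.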