Releases: neo4j/graph-data-science

2.4.5

24 Aug 15:20

neo4j-graph-data-science-2.4.5

Bug fixes

  • Fix a bug in the triangle-related procedures on graphs with multiple relationship types where triangles could be computed incorrectly. The following procedures are affected:
    • gds.triangleCount.[stream|mutate|write|stats]
    • gds.localClusteringCoefficient.[stream|mutate|write|stats]
    • gds.alpha.triangles
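
As a reminder of what the triangle procedures listed above compute, here is a minimal Python sketch of triangle counting on an undirected graph whose edges carry a relationship type. It is not the GDS implementation. Collapsing parallel edges of different types into one neighbour set, and the name count_triangles, are assumptions made for illustration only.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set, Tuple

TypedEdge = Tuple[int, int, str]  # (source, target, relationship type)


def count_triangles(typed_edges: Iterable[TypedEdge]) -> int:
    """Count distinct triangles, ignoring relationship types and self-loops."""
    neighbours: Dict[int, Set[int]] = {}
    for u, v, _rel_type in typed_edges:
        if u == v:
            continue  # a self-loop can never be part of a triangle
        # Assumption: parallel edges of different types collapse into one.
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    triangles = 0
    for u, nbrs in neighbours.items():
        # sorted() guarantees v < w; requiring u < v counts each triangle once.
        for v, w in combinations(sorted(nbrs), 2):
            if u < v and w in neighbours[v]:
                triangles += 1
    return triangles


edges = [
    (0, 1, "KNOWS"),
    (1, 2, "LIKES"),
    (0, 2, "KNOWS"),
    (0, 1, "LIKES"),  # parallel to the first edge, different type
    (2, 3, "KNOWS"),
]
print(count_triangles(edges))
```

The only triangle in the sample data spans two relationship types, which is the kind of input the fix above is concerned with.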

Graph Data Science 2.4.4

17 Aug 13:01

Bug fixes

  • Fixed a bug where Arrow processes that are automatically removed after being aborted were not properly cleaned up

Graph Data Science 2.4.3

27 Jul 12:30

Improvements

  • Added COSINE as an available similarityMetric for the gds.nodeSimilarity procedure
  • When exporting graphs to CSV or using backup and restore, label mapping now allows a wider range of node label names
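
For readers unfamiliar with the COSINE metric mentioned above, the following is a minimal Python sketch of the textbook cosine similarity between two sparse weight vectors. It is not taken from the GDS source. The function name, the dictionary representation, and the choice to return 0.0 for an empty vector are assumptions for illustration.

```python
import math
from typing import Dict


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors keyed by neighbour id."""
    # Only keys present in both vectors contribute to the dot product.
    dot = sum(weight * b[key] for key, weight in a.items() if key in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical relationship weights from two nodes to their neighbours.
alice = {"guitar": 1.0, "piano": 2.0}
bob = {"piano": 2.0, "drums": 1.0}
print(cosine_similarity(alice, bob))  # approximately 0.8
```

Unlike a pure set-overlap measure, this takes the relationship weights into account, so two nodes sharing one heavily weighted neighbour can score higher than two nodes sharing several lightly weighted ones.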

Bug fixes

  • Fixed a bug where array default values would not be serialized to or deserialized from CSV correctly
  • Fixed an issue where Speaker-Listener LabelPropagation and other Pregel procedures wouldn’t stream or mutate on graphs that are not persisted in a Neo4j database
  • Fixed a bug in graph restore on AuraDS, which was failing after shutdown when a node label name contained special characters or underscores

Graph Data Science 2.4.1

27 Jun 11:41

Bug fixes

  • Fix a bug in K-Core decomposition that can return invalid values if core values are not consecutive.
  • Fix a bug when using mutateProperty where using the same name as an existing node property could fail. Affected procedures include:
    • gds.alpha.knn.filtered.mutate
    • gds.alpha.nodeSimilarity.filtered.mutate
    • gds.beta.pipeline.linkPrediction.predict.mutate
    • gds.beta.steinerTree.mutate
    • gds.beta.spanningTree.mutate
    • gds.knn.mutate
    • gds.nodeSimilarity.mutate

Improvements

  • Improved error handling when negative node ids are used as input in the sourceNode, targetNode, sourceNodes, and targetNodes fields.
  • Improved performance when projecting larger in-memory graphs.

Graph Data Science 2.4.0

14 Jun 15:39

Breaking changes

  • Pass concurrency when training a pipeline to the node property steps. Previously, they were executed with the default concurrency of 4 if not overridden. This affects:
    • gds.beta.pipeline.linkPrediction.train
    • gds.beta.pipeline.nodeClassification.train
    • gds.alpha.pipeline.nodeClassification.train

New features

Major

  • Added Bellman-Ford algorithm
  • Added K-Core Decomposition algorithm
  • Added new Common Neighbour Aware Random Walk graph sampling algorithm
  • Add Random Forest and MLP classifier serialization support. This makes all node classification and link prediction models serializable

Minor

  • You can rename node properties when writing them back to the Neo4j database using gds.nodeProperties.write by placing them inside a map in the form nodeProperty: 'renamedProperty'.

  • Added minCommunitySize|minComponentSize parameter to more procedures to allow filtering the result. (Contributed by @airtyon)

  • Added new procedure gds.alpha.drop.cypherdb to drop created in-memory databases

  • Added upperDegreeCutoff parameter to the Node-Similarity and filtered Node-Similarity algorithms, which allows skipping nodes if their degree is higher than the provided value.

  • Added aggregation to gds.beta.toUndirected to allow the aggregation of the new undirected relationships.

  • Added new optional parameter storeModelToDisk that automatically saves serializable models after training for licensed users. This affects gds.beta.pipeline.[linkPrediction|nodeClassification].train and gds.beta.graphSage.train.

  • Added procedure gds.graph.relationshipProperties.write that allows writing relationships with multiple properties to Neo4j.

  • Cypher Aggregation has graduated, which comes with a new name and API changes:

    • The method of projection is now generally called "Cypher projection", possibly with an additional "new" or "v2" qualifier.
      • The existing 'Cypher projection' (gds.graph.project.cypher) is now called "Legacy Cypher projection"
    • The procedure name is losing the alpha qualifier and is now called gds.graph.project.
    • The old name gds.alpha.graph.project is deprecated and usages will forward to the new name while also adapting to the new API.
    • The 4th and 5th parameters nodeConfig and relationshipConfig have been merged into a single dataConfig parameter.
    • The properties configuration key in this merged dataConfig parameter has been renamed to relationshipProperties.
    • The overall projection configuration (e.g. readConcurrency) has moved from the 6th parameter to the 5th parameter.
  • Graph data retrieved via the GDS Arrow endpoint can now be partitioned via the FlightInfo endpoint.
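
The parameter changes for the graduated Cypher projection described above can be sketched as a small migration helper. This Python function is hypothetical and not part of any GDS client. It only mirrors the three changes stated in the notes: nodeConfig and relationshipConfig merge into one dataConfig, the properties key becomes relationshipProperties, and the overall configuration moves up one position. The sample keys and values are illustrative.

```python
from typing import Any, Dict, Tuple

Config = Dict[str, Any]


def migrate_projection_args(
    node_config: Config, relationship_config: Config, config: Config
) -> Tuple[Config, Config]:
    """Map the old 4th, 5th and 6th arguments to the new 4th and 5th ones."""
    data_config: Config = dict(node_config)
    for key, value in relationship_config.items():
        # The only key renamed by the release notes is `properties`.
        new_key = "relationshipProperties" if key == "properties" else key
        data_config[new_key] = value
    # The overall projection configuration is passed through unchanged.
    return data_config, dict(config)


# Illustrative legacy arguments (values are made up for this example).
old_node_config = {"sourceNodeLabels": ["Person"], "targetNodeLabels": ["Person"]}
old_relationship_config = {"relationshipType": "KNOWS", "properties": {"weight": 1.0}}
old_config = {"readConcurrency": 4}

data_config, new_config = migrate_projection_args(
    old_node_config, old_relationship_config, old_config
)
print(data_config)
print(new_config)
```

The helper copies its inputs instead of mutating them, so the legacy arguments stay usable while both call styles coexist.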

Bug fixes

  • Fixed: The Arrow server no longer allows projecting graphs with blank names
  • Fixed: Arrow validates dangling relationships when creating an in-memory graph
  • Fixed: if an arrow process is aborted, creating a new process with the same name is now possible
  • Fixed a bug where gds.graph.export could fail when exporting larger graphs
  • Fixed a bug where gds.alpha.kSpanningTree returned incorrect results when called with the nodeLabels parameter.
  • Fixed a bug where gds.triangleCount would throw an ArrayIndexOutOfBoundsException when called with the nodeLabels parameter.
  • Fixed a bug where link prediction mutate results could fail when predicted probability is extremely close to zero.

Improvements

Major

  • Improve parallel runtime of several algorithms due to improvements of our degree-based partitioning. Note that this is highly dataset dependent and may not be visible for all datasets. Affected algorithms are:
    • FastRP
    • HashGNN
    • Leiden
    • Approxmaxkcut
    • Conductance
    • LinkPrediction training
    • ToUndirected
  • Improved partitioning. This affects the parallel runtime of gds.alpha.hits, gds.beta.graph.project.subgraph and gds.beta.pipeline.linkPrediction.predict if sampleRate = 0

Minor

  • Improve progress tracking for gds.beta.graphSage.train. This will enable progress bars on the python client.
  • Improve error message for invalid nodeLabels and relationshipTypes for procedures supporting memory estimation.
  • Allow running gds.debug.sysInfo and gds.debug.arrow against the system database.
  • Improve automatic conversion of array property values during graph projection.
  • The Yens algorithm can now be run in parallel.
  • Node regression now verifies upfront that all targetProperty values provided are valid when calling gds.alpha.pipeline.nodeRegression.train.
  • The scale properties algorithm has been promoted:
    • Added new procedures gds.scaleProperties.[stream,mutate] which replace gds.alpha.scaleProperties.[stream,mutate] that are now deprecated
      • The scalers L1Norm and L2Norm are not supported in the new procedures.
    • Added new procedures gds.scaleProperties.[stats,write] to return statistics from a scale properties computation and write scaled properties back to a database respectively
    • Procedures gds.scaleProperties.[mutate,stats,stream,write] support progress tracking with volumes. This will enable progress bars on the python client
    • Procedures gds.scaleProperties.[mutate,stats,write] return statistics from the performed scale computation
    • Added new parameter offset to the log scaler. This also affects procedures:
      • gds.pageRank
      • gds.eigenvector
      • gds.articleRank
    • Added new procedures gds.scaleProperties.[mutate|stats|stream|write].estimate for estimating the memory requirements of running the scale properties algorithm
    • Nodes with missing properties (null or NaN) are now omitted in the scale computation. Their scale value is set to NaN in the output.
  • Reduce the memory footprint of the binary embeddings saved by gds.beta.hashgnn.mutate.
  • Promote random forest classifier to beta tier. Added gds.beta.pipeline.[nodeClassification,linkPrediction].addRandomForest which replace gds.alpha.pipeline.[nodeClassification,linkPrediction].addRandomForest that are now deprecated.
  • Reduced memory allocation for the Spanning Tree algorithm.
  • A more effective rerouting algorithm is applied for the minimum Directed Steiner-Tree algorithm when the inverted index is present.
  • Improve memory usage when projecting very large graphs with very high degree nodes.
  • Additional validation for Cypher projection configuration to guide migration and avoid common mistakes.
  • The import of nodes with negative id via arrow into a database is now forbidden.
  • Graph restore now attempts to use the same id map implementation that has been used for the original graph.
  • Setting the useBadCollector option to true for the arrow database import will now actually trigger errors if the collector encountered a problem.
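
The scale properties note above says that nodes with missing values (null or NaN) are omitted from the scale computation and receive NaN in the output. A minimal Python sketch of that behaviour for a min-max scaler follows. It is not the GDS implementation. The function name and the choice to return 0.0 when all present values are equal are assumptions for illustration.

```python
import math
from typing import List, Optional


def min_max_scale(values: List[Optional[float]]) -> List[float]:
    """Scale present values to [0, 1]; missing values (None/NaN) become NaN."""

    def is_missing(v: Optional[float]) -> bool:
        return v is None or math.isnan(v)

    # Statistics are computed over present values only.
    present = [v for v in values if not is_missing(v)]
    if not present:
        return [math.nan] * len(values)

    low, high = min(present), max(present)
    span = high - low
    scaled = []
    for v in values:
        if is_missing(v):
            scaled.append(math.nan)
        elif span == 0:
            scaled.append(0.0)  # assumption: constant input scales to 0.0
        else:
            scaled.append((v - low) / span)
    return scaled


result = min_max_scale([1.0, None, 3.0, math.nan, 2.0])
print(result)
```

Because the minimum and maximum ignore missing entries, a single NaN no longer distorts the scaled values of every other node.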

Graph Data Science 2.4.0 PREVIEW

02 Jun 13:00

Pre-release

Neo4j Graph Data Science version 2.4.0 is compatible with Neo4j version 4.4 and Neo4j versions 5.1 through 5.8.

Breaking changes

  • Pass concurrency when training a pipeline to the node property steps. Previously, they were executed with the default concurrency of 4 if not overridden. This affects:
    • gds.beta.pipeline.linkPrediction.train
    • gds.beta.pipeline.nodeClassification.train
    • gds.alpha.pipeline.nodeClassification.train

New features

  • You can rename node properties when writing them back to the Neo4j database using gds.nodeProperties.write by placing them inside a map in the form nodeProperty: 'renamedProperty'.
  • Added minCommunitySize|minComponentSize parameter to more procedures to allow filtering the result. (Contributed by @airtyon) This includes:
    • gds.wcc.stream
    • gds.louvain.stream
    • gds.labelPropagation.stream
    • gds.beta.k1coloring.[stream|write]
    • gds.beta.leiden.[stream|write]
    • gds.beta.modularityOptimization.[stream|write]
    • gds.alpha.maxkcut.stream
  • Added new procedure gds.alpha.drop.cypherdb to drop created in-memory databases
  • Added Bellman-Ford algorithm:
    • gds.bellmanFord.stream
    • gds.bellmanFord.stream.estimate
    • gds.bellmanFord.stats
    • gds.bellmanFord.stats.estimate
    • gds.bellmanFord.mutate
    • gds.bellmanFord.mutate.estimate
    • gds.bellmanFord.write
    • gds.bellmanFord.write.estimate
  • Add Random Forest and MLP classifier serialization support. This makes all node classification and link prediction models serializable. This affects gds.alpha.model.store and gds.alpha.model.load.
  • Added upperDegreeCutoff parameter to the Node-Similarity and filtered Node-Similarity algorithms, which allows skipping nodes if their degree is higher than the provided value.
  • Added aggregation to gds.beta.toUndirected to allow the aggregation of the new undirected relationships.
  • Added new optional parameter storeModelToDisk that automatically saves serializable models after training for licensed users. This affects gds.beta.pipeline.[linkPrediction|nodeClassification].train and gds.beta.graphSage.train.
  • Added K-Core Decomposition algorithm:
    • gds.kcore.stats
    • gds.kcore.stats.estimate
    • gds.kcore.stream
    • gds.kcore.stream.estimate
    • gds.kcore.mutate
    • gds.kcore.mutate.estimate
    • gds.kcore.write
    • gds.kcore.write.estimate
  • Added procedure gds.graph.relationshipProperties.write that allows writing relationships with multiple properties to Neo4j.
  • Added new Common Neighbour Aware Random Walk graph sampling algorithm gds.graph.sample.cnarw. Available under beta tier.
  • Cypher Aggregation has graduated, which comes with a new name and API changes:
    • The method of projection is now generally called "Cypher projection", possibly with an additional "new" or "v2" qualifier.
      • The existing 'Cypher projection' (gds.graph.project.cypher) is now called "Legacy Cypher projection"
    • The procedure name is losing the alpha qualifier and is now called gds.graph.project.
    • The old name gds.alpha.graph.project is deprecated and usages will forward to the new name while also adapting to the new API.
    • The 4th and 5th parameters nodeConfig and relationshipConfig have been merged into a single dataConfig parameter.
    • The properties configuration key in this merged dataConfig parameter has been renamed to relationshipProperties.
    • The overall projection configuration (e.g. readConcurrency) has moved from the 6th parameter to the 5th parameter.
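
To illustrate what the K-Core Decomposition procedures listed above return, here is a minimal Python sketch of the standard peeling approach: repeatedly remove a node of minimum remaining degree and record the largest such degree seen so far as its core value. It is a simple quadratic-time illustration, not the GDS implementation. The function name and the adjacency-dictionary input (assumed symmetric, without self-loops) are assumptions.

```python
from typing import Dict, Set


def core_numbers(adjacency: Dict[str, Set[str]]) -> Dict[str, int]:
    """Return the core value of every node in an undirected graph."""
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core: Dict[str, int] = {}
    k = 0
    while remaining:
        # Peel off a node with the smallest remaining degree.
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])  # core values never decrease while peeling
        core[node] = k
        remaining.remove(node)
        for nbr in adjacency[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


# A triangle (a, b, c) with a pendant node d and an isolated node e.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": set(),
}
print(core_numbers(graph))
```

In the sample graph the triangle forms the 2-core, the pendant node belongs only to the 1-core, and the isolated node has core value 0.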

Bug fixes

  • Fixed: The Arrow server no longer allows projecting graphs with blank names
  • Fixed: Arrow validates dangling relationships when creating an in-memory graph.

Improvements

  • Improve progress tracking for gds.beta.graphSage.train. This will enable progress bars on the python client.
  • Improve error message for invalid nodeLabels and relationshipTypes for procedures supporting memory estimation.
  • Allow running gds.debug.sysInfo and gds.debug.arrow against the system database.
  • Improve automatic conversion of array property values during graph projection.
  • The Yens algorithm can now be run in parallel.
  • Node regression now verifies upfront that all targetProperty values provided are valid when calling gds.alpha.pipeline.nodeRegression.train.
  • The scale properties algorithm has been promoted:
    • Added new procedures gds.scaleProperties.[stream,mutate] which replace gds.alpha.scaleProperties.[stream,mutate] that are now deprecated
      • The scalers L1Norm and L2Norm are not supported in the new procedures.
    • Added new procedures gds.scaleProperties.[stats,write] to return statistics from a scale properties computation and write scaled properties back to a database respectively
    • Procedures gds.scaleProperties.[mutate,stats,stream,write] support progress tracking with volumes. This will enable progress bars on the python client
    • Procedures gds.scaleProperties.[mutate,stats,write] return statistics from the performed scale computation
    • Added new parameter offset to the log scaler. This also affects procedures:
      • gds.pageRank
      • gds.eigenvector
      • gds.articleRank
    • Added new procedures gds.scaleProperties.[mutate|stats|stream|write].estimate for estimating the memory requirements of running the scale properties algorithm
    • Nodes with missing properties (null or NaN) are now omitted in the scale computation. Their scale value is set to NaN in the output.
  • Reduce the memory footprint of the binary embeddings saved by gds.beta.hashgnn.mutate.
  • Promote random forest classifier to beta tier. Added gds.beta.pipeline.[nodeClassification,linkPrediction].addRandomForest which replace gds.alpha.pipeline.[nodeClassification,linkPrediction].addRandomForest that are now deprecated.
  • Reduced memory allocation for the Spanning Tree algorithm.
  • A more effective rerouting algorithm is applied for the minimum Directed Steiner-Tree algorithm when the inverted index is present.
  • Improve runtime of gds.alpha.hits for concurrency > 1 due to a better partitioning.
  • Improve parallel runtime of several algorithms due to improvements of our degree-based partitioning. Note that this is highly dataset dependent and may not be visible for all datasets. Affected algorithms are:
    • FastRP
    • HashGNN
    • Leiden
    • Approxmaxkcut
    • Conductance
    • LinkPrediction training
    • ToUndirected
  • Improve parallel runtime of gds.beta.graph.project.subgraph when filtering relationships due to a better partitioning.
  • Improve parallel runtime of gds.beta.pipeline.linkPrediction.predict if sampleRate = 0 due to a better partitioning.
  • Improve memory usage when projecting very large graphs with very high degree nodes.
  • Additional validation for Cypher projection configuration to guide migration and avoid common mistakes.

2.3.4

27 Apr 10:34

Bug fixes

  • Relationships sampled by gds.beta.pipeline.linkPrediction.train now only contain valid node ids, which avoids an ArrayIndexOutOfBoundsException during training.

Graph Data Science 2.3.3

21 Apr 13:10

New features

Neo4j Database Compatibility

  • This release is compatible with all Neo4j 5.x database versions <= 5.7.0. Please see our compatibility matrix above.

  • Added includeGraphs parameter to gds.alpha.backup to allow backups without graphs.

Bug fixes

  • Multiclass node classification is now compatible with non-consecutive class ids
  • RandomWalk is now stable across multiple runs (user contribution by GitHub user hindog)

Improvements

  • Make gds.alpha.restore more failsafe
    • Continue to restore graphs and models also after the first failure for a user.
    • Improve logging around failures

Full Changelog: 2.3.2...2.3.3

Graph Data Science 2.3.2

11 Apr 10:18

GDS 2.3.2 is compatible with Neo4j 5 & 4.4 versions (≥ 4.4.9).

For GDS compatibility with previous releases, please see the GDS Compatibility Table.

New features

Neo4j Database Compatibility

  • This release is compatible with all Neo4j 5.x database versions <= 5.6.0. Please see our compatibility matrix above.

Bug fixes

  • Graphs imported via Arrow no longer cause invalid node mappings that produced ArrayIndexOutOfBoundsExceptions
  • Correct memory estimation of Leiden for very small graphs
  • KNN no longer results in an ArrayIndexOutOfBoundsException if the array node properties do not exist for some nodes
  • CELF no longer returns negative gains for some nodes
  • GraphSage will no longer return NaN values because of incorrect neighbor sampling

Improvements

  • More accurate memory estimation on Node Similarity and filtered Node Similarity algorithms for high topN or topK values.
  • The gds.alpha.modularity procedures for computing modularity no longer require each community to be smaller than the size of the graph.
  • Improve the accuracy of progress logging for gds.graph.project.cypher. In particular, this avoids underestimating progress when the relationship query is more complex.

Graph Data Science 2.3.1

16 Feb 15:42

GDS 2.3.1 is compatible with Neo4j 5, 4.4 (≥ 4.4.9), and 4.3 (≥ 4.3.15) databases.

For GDS compatibility with previous releases, please see the GDS Compatibility Table.

New features

Neo4j Database Compatibility

  • This release is compatible with all Neo4j 5.x database versions <= 5.5.0. Please see our compatibility matrix above.

Log Progress

  • New optional configuration parameter logProgress allows you to specify whether percentage logging is on or off for that procedure call.

Bug fixes

  • Louvain no longer reports an incorrect modularity
  • Leiden now reports communities correctly on weighted graphs
  • Persisted Models no longer cause false positive error logs when loaded into the Model Catalog
  • Yens no longer causes issues on graphs without parallel relationships

Improvements

  • Filtered Node Similarity progress logging has been improved