Releases · neo4j/graph-data-science

24 Aug 15:20

gminneci

2.4.5

183d62e

2.4.5

`neo4j-graph-data-science-2.4.5`

Bug fixes

Fixed a bug in the triangle-related procedures on graphs with multiple relationship types, where triangles could be computed incorrectly. The following procedures are affected:
- gds.triangleCount.[stream|mutate|write|stats]
- gds.localClusteringCoefficient.[stream|mutate|write|stats]
- gds.alpha.triangles


17 Aug 13:01

jjaderberg

2.4.4

42f5ffc

Graph Data Science 2.4.4

Bug fixes

Fixed a bug where Arrow processes that are automatically removed after being aborted were not properly cleaned up.


27 Jul 12:30

jjaderberg

2.4.3

fa1c2c9

Graph Data Science 2.4.3

Improvements

Added COSINE as an available similarityMetric for the gds.nodeSimilarity procedure
When exporting graphs to CSV or using backup and restore, label mapping now allows a wider range of node label names.
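As a rough illustration of what a cosine metric measures between two nodes, the sketch below compares their weighted neighbour vectors. It is plain Python, not GDS code: the function name `cosine_similarity` and the dict-based sparse vector representation are assumptions made for this example, and the actual procedure may differ in details.

```python
import math
from typing import Mapping


def cosine_similarity(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Cosine similarity of two sparse vectors keyed by neighbour id.

    Hypothetical helper for illustration only; not part of the GDS API.
    """
    # Only neighbours present in both vectors contribute to the dot product.
    dot = sum(weight * b[key] for key, weight in a.items() if key in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: an empty vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)
```

Unlike a set-overlap measure, this takes relationship weights into account, which is the main reason to pick a cosine metric for weighted graphs.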

Bug fixes

Fixed a bug where array default values would not be serialized to or deserialized from CSV correctly
Fixed an issue where Speaker-Listener LabelPropagation and other Pregel procedures wouldn’t stream or mutate on graphs that are not persisted in a Neo4j database
Fixed a bug in graph restore on AuraDS, which was failing after shutdown when a node label name contained special characters or underscores


27 Jun 11:41

gminneci

2.4.1

180a2ed

Graph Data Science 2.4.1

Bug fixes

Fixed a bug in K-Core decomposition that could return invalid values if core values were not consecutive.
Fix a bug when using mutateProperty where using the same name as an existing node property could fail. Affected procedures include:
- gds.alpha.knn.filtered.mutate
- gds.alpha.nodeSimilarity.filtered.mutate
- gds.beta.pipeline.linkPrediction.predict.mutate
- gds.beta.steinerTree.mutate
- gds.beta.spanningTree.mutate
- gds.knn.mutate
- gds.nodeSimilarity.mutate

Improvements

Improved error handling when negative node ids are used as input in the sourceNode, targetNode, sourceNodes, and targetNodes fields.
Improved performance when projecting larger in-memory graphs.


14 Jun 15:39

gminneci

2.4.0

272fce3

Graph Data Science 2.4.0

Breaking changes

The concurrency setting is now passed to the node property steps when training a pipeline. Previously, these steps were executed with the default concurrency of 4 unless overridden. This affects:
- gds.beta.pipeline.linkPrediction.train
- gds.beta.pipeline.nodeClassification.train
- gds.alpha.pipeline.nodeClassification.train

New features

Major

Added Bellman-Ford algorithm
Added K-Core Decomposition algorithm
Added new Common Neighbour Aware Random Walk graph sampling algorithm
Add Random Forest and MLP classifier serialization support. This makes all node classification and link prediction models serializable
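For readers unfamiliar with Bellman-Ford, here is a minimal textbook sketch in plain Python. It is not the GDS implementation; the function name `bellman_ford` and the edge-list input format are assumptions for this example. It shows the property that distinguishes the algorithm from Dijkstra: negative relationship weights are allowed, and a negative cycle reachable from the source is detected.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, float]  # (source, target, weight)


def bellman_ford(
    nodes: List[str], edges: List[Edge], source: str
) -> Optional[Dict[str, float]]:
    """Return shortest distances from `source`, or None on a negative cycle.

    Textbook single-threaded version, for illustration only.
    """
    dist = {node: float("inf") for node in nodes}
    dist[source] = 0.0

    # Relax every edge up to |V| - 1 times; stop early once nothing changes.
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break

    # If any edge can still be relaxed, a negative cycle is reachable.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            return None
    return dist
```

Unreachable nodes keep a distance of infinity in this sketch.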

Minor

You can rename node properties when writing them back to the Neo4j database using gds.nodeProperties.write by placing them inside a map in the form nodeProperty: 'renamedProperty'.
Added minCommunitySize|minComponentSize parameter to more procedures to allow filtering the result. (Contributed by @airtyon)
Added new procedure gds.alpha.drop.cypherdb to drop created in-memory databases
Added upperDegreeCutoff parameter to Node-Similarity and filtered Node-Similarity algorithm which allows skipping nodes if their degree is higher than the provided value.
Added aggregation to gds.beta.toUndirected to allow the aggregation of the new undirected relationships.
Added new optional parameter storeModelToDisk that automatically saves serializable models after training for licensed users. This affects gds.beta.pipeline.[linkPrediction|nodeClassification].train and gds.beta.graphSage.train.
Added procedure gds.graph.relationshipProperties.write that allows writing relationships with multiple properties to Neo4j.
Cypher Aggregation has graduated, which comes with a new name and API changes:
- The method of projection is now generally called "Cypher projection", possibly with an additional "new" or "v2" qualifier.
  - The existing 'Cypher projection' (gds.graph.project.cypher) is now called "Legacy Cypher projection"
- The procedure name is losing the alpha qualifier and is now called gds.graph.project.
- The old name gds.alpha.graph.project is deprecated and usages will forward to the new name while also adapting to the new API.
- The 4th and 5th parameters nodeConfig and relationshipConfig have been merged into a single dataConfig parameter.
- The properties configuration key in this merged dataConfig parameter has been renamed to relationshipProperties.
- The overall projection configuration (e.g. readConcurrency) has moved from the 6th parameter to the 5th parameter.
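The parameter changes listed above can be pictured as a mechanical rewrite of the positional argument list. The sketch below is a hypothetical helper in plain Python, not something shipped with GDS: `migrate_projection_args` is an invented name, the first three arguments are passed through untouched, and it assumes the `properties` key lived in the old relationshipConfig map.

```python
from typing import Any, Dict, List


def migrate_projection_args(old_args: List[Any]) -> List[Any]:
    """Map six legacy positional arguments to the five-argument form.

    Hypothetical illustration of the release note, not an official tool.
    """
    if len(old_args) != 6:
        raise ValueError("expected the six legacy positional arguments")
    first, second, third, node_config, rel_config, config = old_args

    # nodeConfig and relationshipConfig are merged into one dataConfig map.
    data_config: Dict[str, Any] = dict(node_config or {})
    for key, value in (rel_config or {}).items():
        # The `properties` key is renamed to `relationshipProperties`.
        new_key = "relationshipProperties" if key == "properties" else key
        data_config[new_key] = value

    # The overall configuration moves from the 6th to the 5th position.
    return [first, second, third, data_config, config or {}]
```

For example, a legacy call whose last three arguments were a node map, a relationship map containing `properties`, and `{readConcurrency: 2}` would end up with a single merged map followed by the unchanged configuration.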
Graph data retrieved via the GDS Arrow endpoint can now be partitioned via the FlightInfo endpoint.

Bug fixes

Fixed: the Arrow server no longer allows projecting graphs with blank names
Fixed: Arrow now validates dangling relationships when creating an in-memory graph
Fixed: if an Arrow process is aborted, creating a new process with the same name is now possible
Fixed a bug where gds.graph.export could fail when exporting larger graphs
Fixed a bug where gds.alpha.kSpanningTree returned incorrect results when called with the nodeLabels parameter.
Fixed a bug where gds.triangleCount would throw an ArrayIndexOutOfBoundsException when called with the nodeLabels parameter.
Fixed a bug where link prediction mutate results could fail when predicted probability is extremely close to zero.

Improvements

Major

Improved the parallel runtime of several algorithms through improvements to our degree-based partitioning. Note that this is highly dataset dependent and may not be visible for all datasets. Affected algorithms are:
- FastRP
- HashGNN
- Leiden
- Approxmaxkcut
- Conductance
- LinkPrediction training
- ToUndirected
Improved partitioning. This affects the parallel runtime of gds.alpha.hits, gds.beta.graph.project.subgraph and gds.beta.pipeline.linkPrediction.predict if sampleRate = 0

Minor

Improve progress tracking for gds.beta.graphSage.train. This will enable progress bars on the python client.
Improve error message for invalid nodeLabels and relationshipTypes for procedures supporting memory estimation.
Allow running gds.debug.sysInfo and gds.debug.arrow against the system database.
Improve automatic conversion of array property values during graph projection.
Yen's algorithm can now be run in parallel.
Node regression now verifies upfront that all provided targetProperty values are valid when calling gds.alpha.pipeline.nodeRegression.train.
The scale properties algorithm has been promoted:
- Added new procedures gds.scaleProperties.[stream,mutate] which replace gds.alpha.scaleProperties.[stream,mutate] that are now deprecated
  - The scalers L1Norm and L2Norm are not supported in the new procedures.
- Added new procedures gds.scaleProperties.[stats,write] to return statistics from a scale properties computation and write scaled properties back to a database respectively
- Procedures gds.scaleProperties.[mutate,stats,stream,write] support progress tracking with volumes. This will enable progress bars on the python client
- Procedures gds.scaleProperties.[mutate,stats,write] return statistics from the performed scale computation
- Added new parameter offset to the log scaler. This also affects procedures:
  - gds.pageRank
  - gds.eigenvector
  - gds.articleRank
- Added new procedures gds.scaleProperties.[mutate|stats|stream|write].estimate for estimating the memory requirements of running the scale properties algorithm
- Nodes with missing properties (null or NaN) are now omitted in the scale computation. Their scale value is set to NaN in the output.
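The handling of missing values and the new offset parameter described above can be sketched in plain Python. This is not GDS code: the function names are invented for the example, and the formulas (log of value plus offset, and a standard min-max rescaling) are assumptions about how such scalers are commonly defined.

```python
import math
from typing import List, Optional


def is_missing(value: Optional[float]) -> bool:
    """Treat both None (null) and NaN as missing."""
    return value is None or math.isnan(value)


def log_scale(values: List[Optional[float]], offset: float = 0.0) -> List[float]:
    """Assumed log scaler: log(value + offset); missing values map to NaN.

    math.log raises ValueError if value + offset is not positive.
    """
    return [math.nan if is_missing(v) else math.log(v + offset) for v in values]


def min_max_scale(values: List[Optional[float]]) -> List[float]:
    """Min-max scaling where missing values are left out of the statistics."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return [math.nan] * len(values)
    low, high = min(present), max(present)
    span = high - low
    return [
        math.nan if is_missing(v) else (0.0 if span == 0 else (v - low) / span)
        for v in values
    ]
```

An offset of 1 is a common choice when the property can be zero, since the logarithm of zero is undefined.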
Reduce the memory footprint of the binary embeddings saved by gds.beta.hashgnn.mutate.
Promote random forest classifier to beta tier. Added gds.beta.pipeline.[nodeClassification,linkPrediction].addRandomForest which replace gds.alpha.pipeline.[nodeClassification,linkPrediction].addRandomForest that are now deprecated.
Reduced memory allocation for the Spanning Tree algorithm.
A more effective rerouting algorithm is applied for the minimum Directed Steiner-Tree algorithm when the inverted index is present.
Improve memory usage when projecting very large graphs with very high degree nodes.
Additional validation for Cypher projection configuration to guide migration and avoid common mistakes.
The import of nodes with negative id via arrow into a database is now forbidden.
Graph restore now attempts to use the same id map implementation that has been used for the original graph.
Setting the useBadCollector option to true for the arrow database import will now actually trigger errors if the collector encountered a problem.

Contributors

airtyon


02 Jun 13:00

Mats-SX

2.4.0-alpha06

d6b2b95

Graph Data Science 2.4.0 PREVIEW

Pre-release

Neo4j Graph Data Science version 2.4.0 is compatible with Neo4j version 4.4 and Neo4j versions 5.1 through 5.8.

Breaking changes

The concurrency setting is now passed to the node property steps when training a pipeline. Previously, these steps were executed with the default concurrency of 4 unless overridden. This affects:
- gds.beta.pipeline.linkPrediction.train
- gds.beta.pipeline.nodeClassification.train
- gds.alpha.pipeline.nodeClassification.train

New features

You can rename node properties when writing them back to the Neo4j database using gds.nodeProperties.write by placing them inside a map in the form nodeProperty: 'renamedProperty'.
Added minCommunitySize|minComponentSize parameter to more procedures to allow filtering the result. (Contributed by @airtyon) This includes:
- gds.wcc.stream
- gds.louvain.stream
- gds.labelPropagation.stream
- gds.beta.k1coloring.[stream|write]
- gds.beta.leiden.[stream|write]
- gds.beta.modularityOptimization.[stream|write]
- gds.alpha.maxkcut.stream
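The effect of a minimum-size filter on streamed results can be sketched in plain Python. This is an illustration, not GDS code: the function name and the `(nodeId, communityId)` row format are assumptions, as is the choice to keep communities whose size is greater than or equal to the threshold.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[int, int]  # (nodeId, communityId)


def filter_by_min_community_size(rows: Iterable[Row], min_size: int) -> List[Row]:
    """Keep only rows whose community has at least `min_size` members."""
    rows = list(rows)
    # Count how many nodes were assigned to each community.
    sizes = Counter(community for _, community in rows)
    return [(node, community) for node, community in rows if sizes[community] >= min_size]
```

Filtering on the server side, as the new parameter does, avoids streaming rows for tiny communities only to discard them in the client.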
Added new procedure gds.alpha.drop.cypherdb to drop created in-memory databases
Added Bellman-Ford algorithm:
- gds.bellmanFord.stream
- gds.bellmanFord.stream.estimate
- gds.bellmanFord.stats
- gds.bellmanFord.stats.estimate
- gds.bellmanFord.mutate
- gds.bellmanFord.mutate.estimate
- gds.bellmanFord.write
- gds.bellmanFord.write.estimate
Add Random Forest and MLP classifier serialization support. This makes all node classification and link prediction models serializable. This affects gds.alpha.model.store and gds.alpha.model.load.
Added upperDegreeCutoff parameter to Node-Similarity and filtered Node-Similarity algorithm which allows skipping nodes if their degree is higher than the provided value.
Added aggregation to gds.beta.toUndirected to allow the aggregation of the new undirected relationships.
Added new optional parameter storeModelToDisk that automatically saves serializable models after training for licensed users. This affects gds.beta.pipeline.[linkPrediction|nodeClassification].train and gds.beta.graphSage.train.
Added K-Core Decomposition algorithm:
- gds.kcore.stats
- gds.kcore.stats.estimate
- gds.kcore.stream
- gds.kcore.stream.estimate
- gds.kcore.mutate
- gds.kcore.mutate.estimate
- gds.kcore.write
- gds.kcore.write.estimate
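For context, the core value of a node is the largest k such that the node belongs to a subgraph in which every node has degree at least k. A minimal peeling sketch in plain Python follows. It is not the GDS implementation; the function name `core_numbers` and the adjacency-dict input are assumptions, and the graph is assumed to be undirected with no self-loops.

```python
from typing import Dict, Set


def core_numbers(adj: Dict[str, Set[str]]) -> Dict[str, int]:
    """Compute core values by repeatedly removing a minimum-degree node.

    Simple quadratic-time version for illustration; `adj` must be symmetric.
    """
    degree = {node: len(neighbours) for node, neighbours in adj.items()}
    remaining = set(adj)
    core: Dict[str, int] = {}
    k = 0
    while remaining:
        # Peel off the node with the smallest remaining degree.
        node = min(remaining, key=lambda n: degree[n])
        # Core values never decrease as the peeling proceeds.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbour in adj[node]:
            if neighbour in remaining:
                degree[neighbour] -= 1
    return core
```

Note that the resulting core values need not be consecutive: a graph can contain nodes with core value 1 and 3 but none with core value 2.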
Added procedure gds.graph.relationshipProperties.write that allows writing relationships with multiple properties to Neo4j.
Added new Common Neighbour Aware Random Walk graph sampling algorithm gds.graph.sample.cnarw. Available under beta tier.
Cypher Aggregation has graduated, which comes with a new name and API changes:
- The method of projection is now generally called "Cypher projection", possibly with an additional "new" or "v2" qualifier.
  - The existing 'Cypher projection' (gds.graph.project.cypher) is now called "Legacy Cypher projection"
- The procedure name is losing the alpha qualifier and is now called gds.graph.project.
- The old name gds.alpha.graph.project is deprecated and usages will forward to the new name while also adapting to the new API.
- The 4th and 5th parameters nodeConfig and relationshipConfig have been merged into a single dataConfig parameter.
- The properties configuration key in this merged dataConfig parameter has been renamed to relationshipProperties.
- The overall projection configuration (e.g. readConcurrency) has moved from the 6th parameter to the 5th parameter.

Bug fixes

Fixed: the Arrow server no longer allows projecting graphs with blank names
Fixed: Arrow now validates dangling relationships when creating an in-memory graph

Improvements

Improve progress tracking for gds.beta.graphSage.train. This will enable progress bars on the python client.
Improve error message for invalid nodeLabels and relationshipTypes for procedures supporting memory estimation.
Allow running gds.debug.sysInfo and gds.debug.arrow against the system database.
Improve automatic conversion of array property values during graph projection.
Yen's algorithm can now be run in parallel.
Node regression now verifies upfront that all provided targetProperty values are valid when calling gds.alpha.pipeline.nodeRegression.train.
The scale properties algorithm has been promoted:
- Added new procedures gds.scaleProperties.[stream,mutate] which replace gds.alpha.scaleProperties.[stream,mutate] that are now deprecated
  - The scalers L1Norm and L2Norm are not supported in the new procedures.
- Added new procedures gds.scaleProperties.[stats,write] to return statistics from a scale properties computation and write scaled properties back to a database respectively
- Procedures gds.scaleProperties.[mutate,stats,stream,write] support progress tracking with volumes. This will enable progress bars on the python client
- Procedures gds.scaleProperties.[mutate,stats,write] return statistics from the performed scale computation
- Added new parameter offset to the log scaler. This also affects procedures:
  - gds.pageRank
  - gds.eigenvector
  - gds.articleRank
- Added new procedures gds.scaleProperties.[mutate|stats|stream|write].estimate for estimating the memory requirements of running the scale properties algorithm
- Nodes with missing properties (null or NaN) are now omitted in the scale computation. Their scale value is set to NaN in the output.
Reduce the memory footprint of the binary embeddings saved by gds.beta.hashgnn.mutate.
Promote random forest classifier to beta tier. Added gds.beta.pipeline.[nodeClassification,linkPrediction].addRandomForest which replace gds.alpha.pipeline.[nodeClassification,linkPrediction].addRandomForest that are now deprecated.
Reduced memory allocation for the Spanning Tree algorithm.
A more effective rerouting algorithm is applied for the minimum Directed Steiner-Tree algorithm when the inverted index is present.
Improve runtime of gds.alpha.hits for concurrency > 1 due to a better partitioning.
Improved the parallel runtime of several algorithms through improvements to our degree-based partitioning. Note that this is highly dataset dependent and may not be visible for all datasets. Affected algorithms are:
- FastRP
- HashGNN
- Leiden
- Approxmaxkcut
- Conductance
- LinkPrediction training
- ToUndirected
Improve parallel runtime of gds.beta.graph.project.subgraph when filtering relationships due to a better partitioning.
Improve parallel runtime of gds.beta.pipeline.linkPrediction.predict if sampleRate = 0 due to a better partitioning.
Improve memory usage when projecting very large graphs with very high degree nodes.
Additional validation for Cypher projection configuration to guide migration and avoid common mistakes.

Contributors

airtyon


27 Apr 10:34

gminneci

2.3.4

043e0b3

2.3.4

Bug fixes

Relationships sampled by gds.beta.pipeline.linkPrediction.train now only contain valid node ids, which avoids an ArrayIndexOutOfBoundsException during training.


21 Apr 13:10

gminneci

2.3.3

fe5d867

Graph Data Science 2.3.3

New features

Neo4j Database Compatibility

This release is compatible with all Neo4j 5.x database versions <= 5.7.0. Please see our compatibility matrix above.
Added includeGraphs parameter to gds.alpha.backup to allow backups without graphs.

Bug fixes

Multiclass node classification is now compatible with non-consecutive class ids
RandomWalk is now stable across multiple runs (contributed by GitHub user hindog)

Improvements

Make gds.alpha.restore more failsafe
- Continue to restore graphs and models also after the first failure for a user.
- Improve logging around failures

Full Changelog: 2.3.2...2.3.3


11 Apr 10:18

Mats-SX

2.3.2

ca061d5

Graph Data Science 2.3.2

GDS 2.3.2 is compatible with Neo4j 5 & 4.4 versions (≥ 4.4.9).

For GDS compatibility with previous releases, please see the GDS Compatibility Table.

New features

Neo4j Database Compatibility

This release is compatible with all Neo4j 5.x database versions <= 5.6.0. Please see our compatibility matrix above.

Bug fixes

Graphs imported via Arrow no longer cause invalid node mappings that produced ArrayIndexOutOfBoundsExceptions
Correct memory estimation of Leiden for very small graphs
KNN no longer results in an ArrayIndexOutOfBoundsException if array node properties do not exist for some nodes
CELF no longer returns negative gains for some nodes
GraphSage will no longer return NaN values because of incorrect neighbor sampling

Improvements

More accurate memory estimation on Node Similarity and filtered Node Similarity algorithms for high topN or topK values.
The gds.alpha.modularity procedures for computing modularity no longer require each community to be smaller than the size of the graph.
Improved the progress logging of gds.graph.project.cypher to be more accurate. In particular, this avoids underestimation when the relationship query is more complex.


16 Feb 15:42

laeg

2.3.1

412ed2a

Graph Data Science 2.3.1

GDS 2.3.1 is compatible with Neo4j 5, 4.4 (≥ 4.4.9), and 4.3 (≥ 4.3.15).

For GDS compatibility with previous releases, please see the GDS Compatibility Table.

New features

Neo4j Database Compatibility

This release is compatible with all Neo4j 5.x database versions <= 5.5.0. Please see our compatibility matrix above.

Log Progress

The new optional configuration parameter logProgress allows you to specify whether percentage logging is on or off for that procedure call.

Bug fixes

Louvain no longer reports an incorrect modularity
Leiden now reports communities correctly on weighted graphs
Persisted models no longer cause false positive error logs when loaded into the Model Catalog
Fixed an issue with Yen's algorithm on graphs without parallel relationships

Improvements

Filtered Node Similarity progress logging has been improved

