Releases: neo4j/graph-data-science
2.4.5
neo4j-graph-data-science-2.4.5
Bug fixes
- Fixed a bug in the triangle-related procedures on graphs with multiple relationship types, where triangles could be computed incorrectly. The following procedures are affected:
  - gds.triangleCount.[stream|mutate|write|stats]
  - gds.localClusteringCoefficient.[stream|mutate|write|stats]
  - gds.alpha.triangles
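For orientation, per-node triangle counting on a graph whose edges come from several relationship types can be sketched as follows. This is a generic illustration in Python, not the GDS implementation. The function name, the sample relationship types, and the choice to collapse parallel edges of different types into a single undirected edge are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def triangle_counts(typed_edges):
    """Per-node triangle counts for an undirected graph.

    typed_edges: iterable of (source, target, relationship_type) tuples.
    Relationship types are ignored, so parallel edges of different
    types between the same pair of nodes collapse into one edge
    (an assumption of this sketch).
    """
    neighbors = defaultdict(set)
    for source, target, _rel_type in typed_edges:
        if source == target:
            continue  # a self-loop cannot be part of a triangle
        neighbors[source].add(target)
        neighbors[target].add(source)

    counts = {node: 0 for node in neighbors}
    for node, adjacent in neighbors.items():
        # Every pair of neighbours that is itself connected closes a triangle.
        for a, b in combinations(adjacent, 2):
            if b in neighbors[a]:
                counts[node] += 1
    return counts


edges = [
    ("a", "b", "KNOWS"),
    ("b", "c", "KNOWS"),
    ("a", "c", "WORKS_WITH"),
    ("a", "b", "WORKS_WITH"),  # parallel to the KNOWS edge between a and b
    ("c", "d", "KNOWS"),
]
print(triangle_counts(edges))
```

Storing neighbours in sets is what keeps the parallel a-b edge from being counted twice.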
Graph Data Science 2.4.4
Bug fixes
- Fixed a bug where Arrow processes that are automatically removed after being aborted were not properly cleaned up
Graph Data Science 2.4.3
Improvements
- Added COSINE as an available similarityMetric for the gds.nodeSimilarity procedure
- When exporting graphs to CSV or using backup and restore, label mapping now allows a wider range of node label names
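As a reminder of what a cosine metric measures, the sketch below computes the cosine similarity of two nodes from their weighted neighbour sets. It is a generic Python illustration and makes no claim about how gds.nodeSimilarity implements COSINE internally. The function name, the dictionary representation, and the sample data are assumptions for the example.

```python
import math


def cosine_similarity(weights_a, weights_b):
    """Cosine similarity of two sparse vectors given as {neighbour: weight} dicts."""
    shared = weights_a.keys() & weights_b.keys()
    dot = sum(weights_a[key] * weights_b[key] for key in shared)
    norm_a = math.sqrt(sum(w * w for w in weights_a.values()))
    norm_b = math.sqrt(sum(w * w for w in weights_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch, not taken from GDS
    return dot / (norm_a * norm_b)


# Hypothetical data: each person points at instruments with a weight.
alice = {"guitar": 1.0, "piano": 1.0}
bob = {"guitar": 1.0, "drums": 1.0}
print(cosine_similarity(alice, bob))
```

Unlike a set-based metric such as Jaccard, the result depends on the weights and not only on which neighbours are shared.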
Bug fixes
- Fixed a bug where array default values would not be serialized to or deserialized from CSV correctly
- Fixed an issue where Speaker-Listener LabelPropagation and other Pregel procedures wouldn’t stream or mutate on graphs that are not persisted in a Neo4j database
- Fixed a bug in graph restore on AuraDS, which was failing after shutdown when a node label name contained special characters or underscores
Graph Data Science 2.4.1
Bug fixes
- Fixed a bug in K-Core decomposition that could return invalid values if core values are not consecutive.
- Fixed a bug where using mutateProperty with the same name as an existing node property could fail. Affected procedures include:
  - gds.alpha.knn.filtered.mutate
  - gds.alpha.nodeSimilarity.filtered.mutate
  - gds.beta.pipeline.linkPrediction.predict.mutate
  - gds.beta.steinerTree.mutate
  - gds.beta.spanningTree.mutate
  - gds.knn.mutate
  - gds.nodeSimilarity.mutate
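To make the term "core value" in the K-Core fix above concrete, here is a minimal Python sketch of the classic peeling approach to K-Core decomposition. It is a textbook illustration, not the GDS implementation; the function name and the adjacency-dictionary input are assumptions. The sample graph is chosen so that its core values are not consecutive (1 and 3, with no node at 2).

```python
def core_numbers(adjacency):
    """Core value of every node in an undirected graph.

    adjacency: {node: set of neighbours}; must be symmetric.
    Repeatedly removes a node of minimum remaining degree. The core
    value of a node is the largest minimum degree seen up to the
    moment it is removed.
    """
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adjacency[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


# A 4-clique (a, b, c, d) with one pendant node e attached to a.
graph = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "e": {"a"},
}
print(core_numbers(graph))
```

This version scans for the minimum on every step, which is quadratic in the number of nodes; bucket-based variants do the same work in linear time.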
Improvements
- Improved error handling when negative node ids are used as input in the sourceNode, targetNode, sourceNodes, and targetNodes fields.
- Improved performance when projecting larger in-memory graphs.
Graph Data Science 2.4.0
Breaking changes
- Pass concurrency to the node property steps when training a pipeline. Previously, these steps were executed with the default concurrency of 4 unless overridden. This affects:
  - gds.beta.pipeline.linkPrediction.train
  - gds.beta.pipeline.nodeClassification.train
  - gds.alpha.pipeline.nodeClassification.train
New features
Major
- Added Bellman-Ford algorithm
- Added K-Core Decomposition algorithm
- Added new Common Neighbour Aware Random Walk graph sampling algorithm
- Add Random Forest and MLP classifier serialization support. This makes all node classification and link prediction models serializable
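For readers unfamiliar with Bellman-Ford, listed among the major additions above, the following Python sketch shows the textbook algorithm: single-source shortest paths that tolerate negative weights, plus a final pass that detects a reachable negative cycle. It is not the GDS implementation, and the function signature and return shape are assumptions for the example.

```python
import math


def bellman_ford(num_nodes, edges, source):
    """Shortest distances from `source` in a directed, weighted graph.

    edges: list of (from_node, to_node, weight) with nodes in 0..num_nodes-1.
    Returns (distances, has_negative_cycle). Unreachable nodes keep
    a distance of math.inf.
    """
    dist = [math.inf] * num_nodes
    dist[source] = 0.0

    # Relax every edge up to num_nodes - 1 times; stop early once stable.
    for _ in range(num_nodes - 1):
        changed = False
        for u, v, weight in edges:
            if dist[u] + weight < dist[v]:
                dist[v] = dist[u] + weight
                changed = True
        if not changed:
            break

    # If any edge can still be relaxed, a negative cycle is reachable.
    has_negative_cycle = any(dist[u] + w < dist[v] for u, v, w in edges)
    return dist, has_negative_cycle


weighted_edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, -2.0), (1, 3, 3.0)]
print(bellman_ford(4, weighted_edges, source=0))
```

The negative edge from node 2 to node 1 is the case Dijkstra-style algorithms do not handle, which is the usual reason to reach for Bellman-Ford.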
Minor
- You can rename node properties when writing them back to the Neo4j database using gds.nodeProperties.write by placing them inside a map in the form nodeProperty: 'renamedProperty'.
- Added minCommunitySize|minComponentSize parameter to more procedures to allow filtering the result. (Contributed by @airtyon)
- Added new procedure gds.alpha.drop.cypherdb to drop created in-memory databases.
- Added upperDegreeCutoff parameter to the Node-Similarity and filtered Node-Similarity algorithms, which allows skipping nodes if their degree is higher than the provided value.
- Added aggregation to gds.beta.toUndirected to allow the aggregation of the new undirected relationships.
- Added new optional parameter storeModelToDisk that automatically saves serializable models after training for licensed users. This affects gds.beta.pipeline.[linkPrediction|nodeClassification].train and gds.beta.graphSage.train.
- Added procedure gds.graph.relationshipProperties.write that allows writing relationships with multiple properties to Neo4j.
- Cypher Aggregation has graduated, which comes with a new name and API changes:
  - The method of projection is now generally called "Cypher projection", possibly with an additional "new" or "v2" qualifier.
    - The existing 'Cypher projection' (gds.graph.project.cypher) is now called "Legacy Cypher projection".
  - The procedure name is losing the alpha qualifier and is now called gds.graph.project.
  - The old name gds.alpha.graph.project is deprecated and usages will forward to the new name while also adapting to the new API.
  - The 4th and 5th parameters nodeConfig and relationshipConfig have been merged into a single dataConfig parameter.
  - The properties configuration key in this merged dataConfig parameter has been renamed to relationshipProperties.
  - The overall projection configuration (e.g. readConcurrency) has moved from the 6th parameter to the 5th parameter.
- Graph data retrieved via the GDS Arrow endpoint can now be partitioned via the FlightInfo endpoint.
Bug fixes
- Fixed: The Arrow server no longer allows projecting graphs with blank names
- Fixed: Arrow validates dangling relationships when creating an in-memory graph
- Fixed: if an arrow process is aborted, creating a new process with the same name is now possible
- Fixed a bug where gds.graph.export could fail when exporting larger graphs
- Fixed a bug where gds.alpha.kSpanningTree returned incorrect results when called with the nodeLabels parameter.
- Fixed a bug where gds.triangleCount would throw an ArrayIndexOutOfBoundsException when called with the nodeLabels parameter.
- Fixed a bug where link prediction mutate results could fail when predicted probability is extremely close to zero.
Improvements
Major
- Improve parallel runtime of several algorithms due to improvements of our degree-based partitioning. Note that this is highly dataset dependent and will not be visible for all datasets. Affected algorithms are:
  - FastRP
  - HashGNN
  - Leiden
  - Approxmaxkcut
  - Conductance
  - LinkPrediction training
  - ToUndirected
- Improved partitioning. This affects the parallel runtime of gds.alpha.hits, gds.beta.graph.project.subgraph, and gds.beta.pipeline.linkPrediction.predict (if sampleRate = 0).
Minor
- Improve progress tracking for gds.beta.graphSage.train. This will enable progress bars on the python client.
- Improve error message for invalid nodeLabels and relationshipTypes for procedures supporting memory estimation.
- Allow running gds.debug.sysInfo and gds.debug.arrow against the system database.
- Improve automatic conversion of array property values during graph projection.
- The Yens algorithm can now be run in parallel.
- Node regression now verifies upfront that all targetProperty values provided are valid when calling gds.alpha.pipeline.nodeRegression.train.
- The scale properties algorithm has been promoted:
  - Added new procedures gds.scaleProperties.[stream,mutate] which replace gds.alpha.scaleProperties.[stream,mutate] that are now deprecated.
    - The scalers L1Norm and L2Norm are not supported in the new procedures.
  - Added new procedures gds.scaleProperties.[stats,write] to return statistics from a scale properties computation and write scaled properties back to a database, respectively.
  - Procedures gds.scaleProperties.[mutate,stats,stream,write] support progress tracking with volumes. This will enable progress bars on the python client.
  - Procedures gds.scaleProperties.[mutate,stats,write] return statistics from the performed scale computation.
  - Added new parameter offset to the log scaler. This also affects procedures:
    - gds.pageRank
    - gds.eigenvector
    - gds.articleRank
  - Added new procedures gds.scaleProperties.[mutate|stats|stream|write].estimate for estimating the memory requirements of running the scale properties algorithm.
  - Nodes with missing properties (null or NaN) are now omitted in the scale computation. Their scale value is set to NaN in the output.
- Reduce the memory footprint of the binary embeddings saved by gds.beta.hashgnn.mutate.
- Promote random forest classifier to beta tier. Added gds.beta.pipeline.[nodeClassification,linkPrediction].addRandomForest which replace gds.alpha.pipeline.[nodeClassification,linkPrediction].addRandomForest that are now deprecated.
- Reduced memory allocation for the Spanning Tree algorithm.
- A more effective rerouting algorithm is applied for the minimum Directed Steiner-Tree algorithm when the inverted index is present.
- Improve memory usage when projecting very large graphs with very high degree nodes.
- Additional validation for Cypher projection configuration to guide migration and avoid common mistakes.
- The import of nodes with negative id via arrow into a database is now forbidden.
- Graph restore now attempts to use the same id map implementation that has been used for the original graph.
- Setting the useBadCollector option to true for the arrow database import will now actually trigger errors if the collector encountered a problem.
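The scale properties notes above mention an offset for the log scaler and NaN output for nodes with missing properties. A rough Python sketch of that behaviour is given below. It assumes the scaler computes the natural logarithm of value + offset, which these release notes do not state, and the function name and list-based input are hypothetical.

```python
import math


def log_scale(values, offset=0.0):
    """Apply a log scaler to a list of property values.

    Assumption of this sketch: scaled = ln(value + offset).
    Missing values (None or NaN) are not scaled and are reported
    as NaN in the output, mirroring the note above.
    """
    scaled = []
    for value in values:
        if value is None or math.isnan(value):
            scaled.append(math.nan)
        else:
            scaled.append(math.log(value + offset))
    return scaled


# With offset=1.0 a property value of 0.0 becomes ln(1.0) instead of failing.
print(log_scale([0.0, None, 9.0], offset=1.0))
```

Under this assumption, an offset is mainly useful when property values can be zero, where a plain logarithm is undefined.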
Graph Data Science 2.4.0 PREVIEW
Neo4j Graph Data Science version 2.4.0 is compatible with Neo4j version 4.4 and Neo4j versions 5.1 through 5.8.
Breaking changes
- Pass concurrency to the node property steps when training a pipeline. Previously, these steps were executed with the default concurrency of 4 unless overridden. This affects:
  - gds.beta.pipeline.linkPrediction.train
  - gds.beta.pipeline.nodeClassification.train
  - gds.alpha.pipeline.nodeClassification.train
New features
- You can rename node properties when writing them back to the Neo4j database using gds.nodeProperties.write by placing them inside a map in the form nodeProperty: 'renamedProperty'.
- Added minCommunitySize|minComponentSize parameter to more procedures to allow filtering the result. (Contributed by @airtyon) This includes:
  - gds.wcc.stream
  - gds.louvain.stream
  - gds.labelPropagation.stream
  - gds.beta.k1coloring.[stream|write]
  - gds.beta.leiden.[stream|write]
  - gds.beta.modularityOptimization.[stream|write]
  - gds.alpha.maxkcut.stream
- Added new procedure gds.alpha.drop.cypherdb to drop created in-memory databases.
- Added Bellman-Ford algorithm:
  - gds.bellmanFord.stream
  - gds.bellmanFord.stream.estimate
  - gds.bellmanFord.stats
  - gds.bellmanFord.stats.estimate
  - gds.bellmanFord.mutate
  - gds.bellmanFord.mutate.estimate
  - gds.bellmanFord.write
  - gds.bellmanFord.write.estimate
- Add Random Forest and MLP classifier serialization support. This makes all node classification and link prediction models serializable. This affects gds.alpha.model.store and gds.alpha.model.load.
- Added upperDegreeCutoff parameter to the Node-Similarity and filtered Node-Similarity algorithms, which allows skipping nodes if their degree is higher than the provided value.
- Added aggregation to gds.beta.toUndirected to allow the aggregation of the new undirected relationships.
- Added new optional parameter storeModelToDisk that automatically saves serializable models after training for licensed users. This affects gds.beta.pipeline.[linkPrediction|nodeClassification].train and gds.beta.graphSage.train.
- Added K-Core Decomposition algorithm:
  - gds.kcore.stats
  - gds.kcore.stats.estimate
  - gds.kcore.stream
  - gds.kcore.stream.estimate
  - gds.kcore.mutate
  - gds.kcore.mutate.estimate
  - gds.kcore.write
  - gds.kcore.write.estimate
- Added procedure gds.graph.relationshipProperties.write that allows writing relationships with multiple properties to Neo4j.
- Added new Common Neighbour Aware Random Walk graph sampling algorithm gds.graph.sample.cnarw. Available under beta tier.
- Cypher Aggregation has graduated, which comes with a new name and API changes:
  - The method of projection is now generally called "Cypher projection", possibly with an additional "new" or "v2" qualifier.
    - The existing 'Cypher projection' (gds.graph.project.cypher) is now called "Legacy Cypher projection".
  - The procedure name is losing the alpha qualifier and is now called gds.graph.project.
  - The old name gds.alpha.graph.project is deprecated and usages will forward to the new name while also adapting to the new API.
  - The 4th and 5th parameters nodeConfig and relationshipConfig have been merged into a single dataConfig parameter.
  - The properties configuration key in this merged dataConfig parameter has been renamed to relationshipProperties.
  - The overall projection configuration (e.g. readConcurrency) has moved from the 6th parameter to the 5th parameter.
Bug fixes
- Fixed: The Arrow server no longer allows projecting graphs with blank names
- Fixed: Arrow validates dangling relationships when creating an in-memory graph.
Improvements
- Improve progress tracking for gds.beta.graphSage.train. This will enable progress bars on the python client.
- Improve error message for invalid nodeLabels and relationshipTypes for procedures supporting memory estimation.
- Allow running gds.debug.sysInfo and gds.debug.arrow against the system database.
- Improve automatic conversion of array property values during graph projection.
- The Yens algorithm can now be run in parallel.
- Node regression now verifies upfront that all targetProperty values provided are valid when calling gds.alpha.pipeline.nodeRegression.train.
- The scale properties algorithm has been promoted:
  - Added new procedures gds.scaleProperties.[stream,mutate] which replace gds.alpha.scaleProperties.[stream,mutate] that are now deprecated.
    - The scalers L1Norm and L2Norm are not supported in the new procedures.
  - Added new procedures gds.scaleProperties.[stats,write] to return statistics from a scale properties computation and write scaled properties back to a database, respectively.
  - Procedures gds.scaleProperties.[mutate,stats,stream,write] support progress tracking with volumes. This will enable progress bars on the python client.
  - Procedures gds.scaleProperties.[mutate,stats,write] return statistics from the performed scale computation.
  - Added new parameter offset to the log scaler. This also affects procedures:
    - gds.pageRank
    - gds.eigenvector
    - gds.articleRank
  - Added new procedures gds.scaleProperties.[mutate|stats|stream|write].estimate for estimating the memory requirements of running the scale properties algorithm.
  - Nodes with missing properties (null or NaN) are now omitted in the scale computation. Their scale value is set to NaN in the output.
- Reduce the memory footprint of the binary embeddings saved by gds.beta.hashgnn.mutate.
- Promote random forest classifier to beta tier. Added gds.beta.pipeline.[nodeClassification,linkPrediction].addRandomForest which replace gds.alpha.pipeline.[nodeClassification,linkPrediction].addRandomForest that are now deprecated.
- Reduced memory allocation for the Spanning Tree algorithm.
- A more effective rerouting algorithm is applied for the minimum Directed Steiner-Tree algorithm when the inverted index is present.
- Improve runtime of gds.alpha.hits for concurrency > 1 due to a better partitioning.
- Improve parallel runtime of several algorithms due to improvements of our degree-based partitioning. Note that this is highly dataset dependent and will not be visible for all datasets. Affected algorithms are:
  - FastRP
  - HashGNN
  - Leiden
  - Approxmaxkcut
  - Conductance
  - LinkPrediction training
  - ToUndirected
- Improve parallel runtime of gds.beta.graph.project.subgraph when filtering relationships due to a better partitioning.
- Improve parallel runtime of gds.beta.pipeline.linkPrediction.predict if sampleRate = 0 due to a better partitioning.
- Improve memory usage when projecting very large graphs with very high degree nodes.
- Additional validation for Cypher projection configuration to guide migration and avoid common mistakes.
2.3.4
Graph Data Science 2.3.3
New features
Neo4j Database Compatibility
- This release is compatible with all Neo4j 5.x database versions <= 5.7.0. Please see our compatibility matrix above.
- Added includeGraphs parameter to gds.alpha.backup to allow backups without graphs.
Bug fixes
- Multiclass node classification is now compatible with non-consecutive class ids
- RandomWalk is now stable across multiple runs (contributed by GitHub user hindog)
Improvements
- Make gds.alpha.restore more failsafe:
  - Continue to restore graphs and models also after the first failure for a user.
  - Improve logging around failures.
Full Changelog: 2.3.2...2.3.3
Graph Data Science 2.3.2
GDS 2.3.2 is compatible with Neo4j 5 & 4.4 versions (≥ 4.4.9).
For GDS compatibility with previous releases, please use the GDS Compatibility Table.
New features
Neo4j Database Compatibility
- This release is compatible with all Neo4j 5.x database versions <= 5.6.0. Please see our compatibility matrix above.
Bug fixes
- Graphs imported via Arrow no longer cause invalid node mappings that produced ArrayIndexOutOfBoundsExceptions
- Correct memory estimation of Leiden for very small graphs
- KNN no longer results in an ArrayIndexOutOfBoundsException if the array node properties do not exist for some nodes
- CELF no longer returns negative gains for some nodes
- GraphSage will no longer return NaN values because of incorrect neighbor sampling
Improvements
- More accurate memory estimation on Node Similarity and filtered Node Similarity algorithms for high topN or topK values.
- The gds.alpha.modularity procedures for computing modularity no longer require each community to be smaller than the size of the graph.
- Improve the progress logging of gds.graph.project.cypher to be more accurate. In particular, this avoids underestimation when the relationship query is more complex.
Graph Data Science 2.3.1
GDS 2.3.1 is compatible with Neo4j 5, 4.4 (≥ 4.4.9), and 4.3 (≥ 4.3.15) databases.
For GDS compatibility with previous releases, please use the GDS Compatibility Table.
New features
Neo4j Database Compatibility
- This release is compatible with all Neo4j 5.x database versions <= 5.5.0. Please see our compatibility matrix above.
Log Progress
- New optional configuration parameter logProgress allows you to specify whether percentage logging for that procedure call is on or off.
Bug fixes
- Louvain no longer reports an incorrect modularity
- Leiden now reports communities correctly on weighted graphs
- Persisted Models no longer cause false positive error logs when loaded into the Model Catalog
- Yens no longer causes issues on graphs without parallel relationships
Improvements
- Filtered Node Similarity progress logging has been improved