Description
What steps will reproduce the bug?
A stable NiFi cluster of 2 nodes (NiFi pods) runs fine on an EKS cluster.
Leader election and state management happen via Kubernetes (the same issue occurred with NiFiKop version 1.10.0 using ZooKeeper for state management and leader election).
The nifi-node-group-autoscalers and scaled-object are defined as recommended in the official documentation, with the Prometheus query and threshold set so that the trigger can be made true or false to test scale-up and scale-down respectively.
Whenever a scale-up happens, only the nodes (NiFi pods) that were scaled up go into a boot loop: each scaled-up NiFi pod stays stable for 7-10 minutes, then terminates itself, and a new NiFi pod with the same numeric ID comes up.
Example:
NAME                  READY  STATUS          RESTARTS  AGE
nifi-one-2-nodeshp5l  0/4    Terminated      0         12m
nifi-one-2-nodecdi6c  0/4    Container-init  0         2s
This makes the nifi cluster unstable.
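The replacement pattern in the listing above (two pod names sharing node ID 2) can be spotted mechanically. This is a minimal Python sketch, not part of NiFiKop. The pod naming pattern `<cluster>-<node id>-node<suffix>` is an assumption inferred only from the two pod names in the example, and `parse_pod_listing` and `recreated_node_ids` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Assumed naming pattern, inferred only from the two pod names in the example:
# <cluster>-<node id>-node<random suffix>
POD_NAME = re.compile(r"^(?P<cluster>.+)-(?P<node_id>\d+)-node(?P<suffix>[a-z0-9]+)$")


def parse_pod_listing(listing):
    """Return {node_id: [(pod_name, status), ...]} from `kubectl get pods`-style text."""
    pods = defaultdict(list)
    for line in listing.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "NAME":
            continue  # skip blank lines and the header row
        match = POD_NAME.match(fields[0])
        if match:
            pods[int(match.group("node_id"))].append((fields[0], fields[2]))
    return dict(pods)


def recreated_node_ids(pods):
    """Node IDs that appear under more than one pod name, i.e. the pod was replaced."""
    return sorted(node_id for node_id, entries in pods.items() if len(entries) > 1)


listing = """
NAME                  READY  STATUS          RESTARTS  AGE
nifi-one-2-nodeshp5l  0/4    Terminated      0         12m
nifi-one-2-nodecdi6c  0/4    Container-init  0         2s
"""
pods = parse_pod_listing(listing)
print(recreated_node_ids(pods))
```

Running this against periodic snapshots of the pod list would show which node IDs keep being recreated and roughly how often.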
Scaling down is also difficult, as it leaves a few orphaned pods.
Scale-down was tried in two ways:
- First, set the scaled object threshold so that the trigger becomes false
- Second, delete the scaled object as well as the nifi-node-group-autoscalers
Both methods lead to orphaned nodes in varying degrees; the second is much worse than the first, but even the first does not scale down cleanly.
Deleting the orphaned nodes is very difficult, and they leave their statuses behind on the NifiCluster CRD object, which is managed by NiFiKop.
The NiFi cluster is stable only when the NiFi nodes are defined directly in the NifiCluster and no autoscaling is enabled.
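To make the leftover statuses described above easier to report, the node IDs present in the NifiCluster status can be compared against those declared in its spec. This is a minimal Python sketch working on the JSON form of the object (for example from `kubectl get nificluster nifi-one -n nifi -o json`). The field paths `spec.nodes[].id` and `status.nodesState` are assumptions about the CRD layout and should be checked against the installed CRD; `stale_node_states` is a hypothetical helper name, and the sample data is illustrative, not taken from the affected cluster.

```python
def stale_node_states(nifi_cluster):
    """Return node IDs that have a status entry but no matching spec entry.

    Assumes spec.nodes is a list of objects with an integer "id" and
    status.nodesState is a map keyed by the node ID as a string.
    """
    spec_ids = {str(node["id"]) for node in nifi_cluster.get("spec", {}).get("nodes", [])}
    status_ids = set(nifi_cluster.get("status", {}).get("nodesState", {}))
    # Sort numerically so that "10" comes after "2".
    return sorted(status_ids - spec_ids, key=int)


# Illustrative object: nodes 0 and 1 are declared, node 2 only survives in status.
sample = {
    "spec": {"nodes": [{"id": 0}, {"id": 1}]},
    "status": {"nodesState": {"0": {}, "1": {}, "2": {}}},
}
print(stale_node_states(sample))
```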
The error in the nifikop logs:
{"level":"error","time":"2025-07-31T06:56:19.482Z","logger":"nifi_client","caller":"nificlient/system.go:52","msg":"Error during preparing the request","error":"The target node id doesn't exist in the cluster","errorVerbose":"The target node id doesn't exist in the cluster\ngithub.com/konpyutaika/nifikop/pkg/nificlient.init\n\t/workspace/pkg/nificlient/common.go:14\nruntime.doInit1\n\t/usr/local/go/src/runtime/proc.go:7353\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:7320\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:254\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1700","stacktrace":"github.com/konpyutaika/nifikop/pkg/nificlient.(*nifiClient).GetClusterNode\n\t/workspace/pkg/nificlient/system.go:52\ngithub.com/konpyutaika/nifikop/pkg/clientwrappers/scale.CheckIfNCActionStepFinished\n\t/workspace/pkg/clientwrappers/scale/scale.go:151\ngithub.com/konpyutaika/nifikop/internal/controller.(*NifiClusterTaskReconciler).checkNCActionStep\n\t/workspace/internal/controller/nificlustertask_controller.go:370\ngithub.com/konpyutaika/nifikop/internal/controller.(*NifiClusterTaskReconciler).handlePodRunningTask\n\t/workspace/internal/controller/nificlustertask_controller.go:312\ngithub.com/konpyutaika/nifikop/internal/controller.(*NifiClusterTaskReconciler).Reconcile\n\t/workspace/internal/controller/nificlustertask_controller.go:93\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:224"}
{"level":"error","time":"2025-07-31T06:56:19.541Z","logger":"nifi_client","caller":"nificlient/system.go:126","msg":"Error during preparing the request","error":"The target node id doesn't exist in the cluster","errorVerbose":"The target node id doesn't exist in the cluster\ngithub.com/konpyutaika/nifikop/pkg/nificlient.init\n\t/workspace/pkg/nificlient/common.go:14\nruntime.doInit1\n\t/usr/local/go/src/runtime/proc.go:7353\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:7320\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:254\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1700","stacktrace":"github.com/konpyutaika/nifikop/pkg/nificlient.(*nifiClient).setClusterNodeStatus\n\t/workspace/pkg/nificlient/system.go:126\ngithub.com/konpyutaika/nifikop/pkg/nificlient.(*nifiClient).ConnectClusterNode\n\t/workspace/pkg/nificlient/system.go:75\ngithub.com/konpyutaika/nifikop/pkg/clientwrappers/scale.ConnectClusterNode\n\t/workspace/pkg/clientwrappers/scale/scale.go:93\ngithub.com/konpyutaika/nifikop/internal/controller.(*NifiClusterTaskReconciler).checkNCActionStep\n\t/workspace/internal/controller/nificlustertask_controller.go:415\ngithub.com/konpyutaika/nifikop/internal/controller.(*NifiClusterTaskReconciler).handlePodRunningTask\n\t/workspace/internal/controller/nificlustertask_controller.go:312\ngithub.com/konpyutaika/nifikop/internal/controller.(*NifiClusterTaskReconciler).Reconcile\n\t/workspace/internal/controller/nificlustertask_controller.go:93\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:224"}
{"level":"error","time":"2025-07-31T06:56:19.542Z","logger":"nifi_client","caller":"nificlient/system.go:160","msg":"Connect node gracefully failed since Nifi node returned non 200 error since Nifi node returned non 200","error":"The target node id doesn't exist in the cluster","errorVerbose":"The target node id doesn't exist in the cluster\ngithub.com/konpyutaika/nifikop/pkg/nificlient.init\n\t/workspace/pkg/nificlient/common.go:14\nruntime.doInit1\n\t/usr/local/go/src/runtime/proc.go:7353\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:7320\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:254\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1700","stacktrace":"github.com/konpyutaika/nifikop/pkg/nificlient.setClusterNodeStatusReturn\n\t/workspace/pkg/nificlient/system.go:160\ngithub.com/konpyutaika/nifikop/pkg/nificlient.(*nifiClient).ConnectClusterNode\n\t/workspace/pkg/nificlient/system.go:77\ngithub.com/konpyutaika/nifikop/pkg/clientwrappers/scale.ConnectClusterNode\n\t/workspace/pkg/clientwrappers/scale/scale.go:93\ngithub.com/konpyutaika/nifikop/internal/controller.(*NifiClusterTaskReconciler).checkNCActionStep\n\t/workspace/internal/controller/nificlustertask_controller.go:415\ngithub.com/konpyutaika/nifikop/internal/controller.(*NifiClusterTaskReconciler).handlePodRunningTask\n\t/workspace/internal/controller/nificlustertask_controller.go:312\ngithub.com/konpyutaika/nifikop/internal/controller.(*NifiClusterTaskReconciler).Reconcile\n\t/workspace/internal/controller/nificlustertask_controller.go:93\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:224"}
{"level":"error","time":"2025-07-31T06:56:19.542Z","logger":"scale-method","caller":"clientwrappers/common.go:17","msg":"could not communicate with nifi node","action":"Connect node gracefully","error":"The target node id doesn't exist in the cluster","errorVerbose":"The target node id doesn't exist in the cluster\ngithub.com/konpyutaika/nifikop/pkg/nificlient.init\n\t/workspace/pkg/nificlient/common.go:14\nruntime.doInit1\n\t/usr/local/go/src/runtime/proc.go:7353\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:7320\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:254\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1700","stacktrace":"github.com/konpyutaika/nifikop/pkg/clientwrappers.ErrorUpdateOperation\n\t/workspace/pkg/clientwrappers/common.go:17\ngithub.com/konpyutaika/nifikop/pkg/clientwrappers/scale.ConnectClusterNode\n\t/workspace/pkg/clientwrappers/scale/scale.go:94\ngithub.com/konpyutaika/nifikop/internal/controller.(*NifiClusterTaskReconciler).checkNCActionStep\n\t/workspace/internal/controller/nificlustertask_controller.go:415\ngithub.com/konpyutaika/nifikop/internal/controller.(*NifiClusterTaskReconciler).handlePodRunningTask\n\t/workspace/internal/controller/nificlustertask_controller.go:312\ngithub.com/konpyutaika/nifikop/internal/controller.(*NifiClusterTaskReconciler).Reconcile\n\t/workspace/internal/controller/nificlustertask_controller.go:93\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:224"}
{"level":"info","time":"2025-07-31T06:56:19.542Z","logger":"controller.NifiClusterTask","caller":"controller/controller_common.go:35","msg":"nifi cluster communication error for cluster nifi-one: The target node id doesn't exist in the cluster"}
{"level":"error","time":"2025-07-31T06:56:19.542Z","caller":"controller/controller.go:316","msg":"Reconciler error","controller":"nificluster","controllerGroup":"nifi.konpyutaika.com","controllerKind":"NifiCluster","nifiCluster":{"name":"nifi-one","namespace":"nifi"},"namespace":"nifi","name":"nifi-one","reconcileID":"63a0e21f-26e1-4066-a161-6885cbebd5bb","error":"nifi cluster communication error for cluster nifi-one: The target node id doesn't exist in the cluster","errorVerbose":"The target node id doesn't exist in the cluster\ngithub.com/konpyutaika/nifikop/pkg/nificlient.init\n\t/workspace/pkg/nificlient/common.go:14\nruntime.doInit1\n\t/usr/local/go/src/runtime/proc.go:7353\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:7320\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:254\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1700\nnifi cluster communication error for cluster nifi-one","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:224"}
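Because every entry above is a single JSON object, the log can be condensed to see which call sites report the same underlying error. This is a minimal Python sketch using only the field names visible in the entries (`level`, `caller`, `msg`, `error`, `stacktrace`). `summarize_errors` and `innermost_frame` are hypothetical helper names, and the sample entries are abbreviated copies of two of the lines above, not full log output.

```python
import json
from collections import Counter


def summarize_errors(lines):
    """Count error-level entries by (caller, error), most frequent first."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip wrapped or truncated lines
        if entry.get("level") != "error":
            continue
        error = entry.get("error", entry.get("msg", "?"))
        counts[(entry.get("caller", "?"), error)] += 1
    return counts.most_common()


def innermost_frame(entry):
    """First function named in the 'stacktrace' field, i.e. where the entry was logged."""
    trace = entry.get("stacktrace", "")
    return trace.split("\n", 1)[0] if trace else None


# Abbreviated sample entries built from the fields shown in the log above.
sample = [
    json.dumps({
        "level": "error",
        "caller": "nificlient/system.go:52",
        "msg": "Error during preparing the request",
        "error": "The target node id doesn't exist in the cluster",
        "stacktrace": "github.com/konpyutaika/nifikop/pkg/nificlient.(*nifiClient).GetClusterNode\n\t/workspace/pkg/nificlient/system.go:52",
    }),
    json.dumps({
        "level": "info",
        "caller": "controller/controller_common.go:35",
        "msg": "nifi cluster communication error for cluster nifi-one: The target node id doesn't exist in the cluster",
    }),
]

for (caller, error), count in summarize_errors(sample):
    print(count, caller, error)
print(innermost_frame(json.loads(sample[0])))
```

In the entries above, every error-level line carries the same `error` value, so the summary mainly shows which callers (`GetClusterNode`, `ConnectClusterNode`, and the reconciler) surface it.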
What is the expected behavior?
The NiFi nodes should scale up and down smoothly, without a boot loop, according to the triggers.
Scale-down and eviction of autoscaled NiFi nodes should be smooth, without leaving any residual status in the NifiCluster CRD.
What do you see instead?
A boot loop of all the NiFi pods that were scaled up, which leads to an unstable NiFi cluster.
A bad scale-down that leaves orphaned NiFi pods.
Residual corrupted status of NiFi pods in the NifiCluster CRD object, even after force deletion of the orphaned NiFi pods.
Possible solution
No response
NiFiKop version
v1.14.1
Golang version
golang 1.24.4
Kubernetes version
Client Version: v1.32.2
Kustomize Version: v5.5.0
Server Version: v1.30.13-eks-5d4a308
NiFi version
2.0.0M4
Additional context
No response