Fixing the Flink 1.16 error: java.lang.ClassNotFoundException: org.apache.flink.metrics.prometheus.PrometheusReporter

When starting a job on Flink 1.16 (1.16.3 in my case), I found the following error in the job log:

java.lang.ClassNotFoundException: org.apache.flink.metrics.prometheus.PrometheusReporter
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_152]
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_152]
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338) ~[?:1.8.0_152]
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_152]
        at java.lang.Class.forName0(Native Method) ~[?:1.8.0_152]
        at java.lang.Class.forName(Class.java:264) ~[?:1.8.0_152]
        at org.apache.flink.runtime.metrics.ReporterSetup.loadViaReflection(ReporterSetup.java:457) ~[flink-dist-1.16.3.jar:1.16.3]
        at org.apache.flink.runtime.metrics.ReporterSetup.loadReporter(ReporterSetup.java:410) ~[flink-dist-1.16.3.jar:1.16.3]
        at org.apache.flink.runtime.metrics.ReporterSetup.setupReporters(ReporterSetup.java:328) ~[flink-dist-1.16.3.jar:1.16.3]
        at org.apache.flink.runtime.metrics.ReporterSetup.fromConfiguration(ReporterSetup.java:209) ~[flink-dist-1.16.3.jar:1.16.3]
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createMetricRegistry(ClusterEntrypoint.java:458) ~[flink-dist-1.16.3.jar:1.16.3]
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:397) ~[flink-dist-1.16.3.jar:1.16.3]
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:282) ~[flink-dist-1.16.3.jar:1.16.3]
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:232) ~[flink-dist-1.16.3.jar:1.16.3]
        at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) [flink-dist-1.16.3.jar:1.16.3]
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:229) [flink-dist-1.16.3.jar:1.16.3]
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:729) [flink-dist-1.16.3.jar:1.16.3]
        at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.main(KubernetesApplicationClusterEntrypoint.java:86) [flink-kubernetes_2.12-1.16.3.jar:1.16.3]

In addition, a curl request to local port 9250 failed:

curl localhost:9250
curl: (7) Failed to connect to ::1: Cannot assign requested address

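Running jar tf xxx.jar to inspect the contents of the jar showed that the class org.apache.flink.metrics.prometheus.PrometheusReporter was in fact present. After further troubleshooting, I found the following WARN, immediately followed by an ERROR line, right next to the stack trace in the log: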

WARN  org.apache.flink.runtime.metrics.ReporterSetup               [] - The reporter configuration of 'prom' configures the reporter class, which is a deprecated approach to configure reporters. The used reporter might not support this configuration, which would cause errors while loading the reporter. Please configure a factory class instead: 'metrics.reporter.prom.factory.class: <factoryClass>' to ensure that the configuration continues to work with future versions.
2024-12-16 15:28:34,419 ERROR org.apache.flink.runtime.metrics.ReporterSetup               [] - Could not instantiate metrics reporter prom. Metrics might not be exposed/reported.
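
As an aside, the jar tf check mentioned before this log excerpt can also be scripted. The following is only a minimal sketch: the function name jar_contains_class is made up for illustration, and it only tells you whether the class file is packaged in a given jar, not whether Flink can load it at runtime.

```python
import zipfile


def jar_contains_class(jar_file, class_name):
    """Return True if the jar has an entry for the fully qualified class name.

    jar_file may be a path or a binary file-like object; a jar is a zip archive.
    This helper is hypothetical and only mirrors what `jar tf` lets you check by eye.
    """
    # Class files are stored under a path derived from the package name.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_file) as jar:
        return entry in jar.namelist()
```

For example, jar_contains_class("xxx.jar", "org.apache.flink.metrics.prometheus.PrometheusReporter") would correspond to the manual check described above.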

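The warning tells us that configuring the reporter class directly for the 'prom' reporter is a deprecated approach. To keep working with future versions, the reporter should instead be configured through metrics.reporter.prom.factory.class: <factoryClass>.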

So I modified flink-conf.yaml as the warning suggests, commenting out the old class key and adding the factory key:

#metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
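
Before restarting, it can help to confirm that no active line in the configuration still uses the deprecated key. The sketch below is an assumption-laden helper rather than anything Flink ships: the function name and the regular expression are my own, and it only handles the flat "key: value" lines seen in this post.

```python
import re

# Matches active lines such as "metrics.reporter.prom.class: some.Class".
# Commented-out lines and "...prom.factory.class" keys do not match.
DEPRECATED_KEY = re.compile(r"^\s*metrics\.reporter\.([^.\s]+)\.class\s*:\s*(\S+)")


def find_deprecated_reporter_keys(conf_text):
    """Return (reporter_name, class_name) pairs set via the deprecated '.class' key."""
    found = []
    for line in conf_text.splitlines():
        match = DEPRECATED_KEY.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found
```

Passing the two configuration lines shown above returns an empty list, since the old key is commented out and only the factory key remains active.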

After rerunning the job, the error above was gone! In its place, the log showed the following output:

2024-12-25 11:52:41,317 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: metrics.reporter.prom.port, 9250-9260
2024-12-25 11:52:41,321 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: metrics.reporter.prom.factory.class, org.apache.flink.metrics.prometheus.PrometheusReporterFactory
2024-12-25 11:52:47,906 INFO  org.apache.flink.metrics.prometheus.PrometheusReporter       [] - Started PrometheusReporter HTTP server on port 9250.
2024-12-25 11:52:47,993 INFO  org.apache.flink.runtime.metrics.MetricRegistryImpl          [] - Reporting metrics for reporter prom of type org.apache.flink.metrics.prometheus.PrometheusReporter.
2024-12-25 11:52:55,893 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: metrics.reporter.prom.port, 9250-9260
2024-12-25 11:52:55,894 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: metrics.reporter.prom.factory.class, org.apache.flink.metrics.prometheus.PrometheusReporterFactory
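
The log shows the port configured as the range 9250-9260 and the reporter starting on 9250. When several processes share a host, the reporter may end up on a different port within that range, so the curl check from earlier may need to try more than one. A rough sketch of that probing follows; both function names are hypothetical, and the range syntax handled is only the "start-end" form seen in the log plus single ports and comma-separated lists.

```python
import socket


def expand_port_range(spec):
    """Expand a spec such as '9250-9260' or '9249' into a list of port numbers."""
    ports = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-", 1)
            ports.extend(range(int(start), int(end) + 1))
        else:
            ports.append(int(part))
    return ports


def first_listening_port(ports, host="localhost", timeout=0.5):
    """Return the first port that accepts a TCP connection, or None if none does."""
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            continue
    return None
```

Calling first_listening_port(expand_port_range("9250-9260")) on the host or inside the pod gives the port to pass to curl, assuming the reporter is the only listener in that range.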

With that, switching the reporter configuration in flink-conf.yaml from the class key to the factory class key resolved the error.

A mailing-list thread discussing the same problem:

https://lists.apache.org/thread/yjv0hof5qqnzq22xcjf3y2v61j48gqh4
