Spark applications are normally submitted via bin/spark-submit; that shell script ultimately invokes org.apache.spark.deploy.SparkSubmit. Reading through the code:
1. SparkSubmit: main()
|-- val appArgs = new SparkSubmitArguments(args)
|-- match on appArgs.action =>
    SparkSubmitAction.SUBMIT => submit(appArgs)
// SparkSubmitAction is an enumeration: SUBMIT, KILL, REQUEST_STATUS
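The dispatch in step 1 can be sketched in plain Java (the enum mirrors SparkSubmitAction; the handler names are illustrative placeholders, not Spark's actual method bodies):

```java
import java.util.Objects;

public class SubmitDispatch {
    // Mirrors SparkSubmitAction: the three actions parsed from the CLI arguments.
    enum SparkSubmitAction { SUBMIT, KILL, REQUEST_STATUS }

    // main() pattern-matches on the parsed action and calls the matching handler.
    static String dispatch(SparkSubmitAction action) {
        switch (Objects.requireNonNull(action)) {
            case SUBMIT:         return "submit";
            case KILL:           return "kill";
            case REQUEST_STATUS: return "requestStatus";
            default:             throw new IllegalStateException();
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatch(SparkSubmitAction.SUBMIT)); // prints "submit"
    }
}
```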
2. SparkSubmit: submit()
|-- doRunMain()
|-- check whether a proxy user was requested:
    2.1 If args.proxyUser is set, create it via UserGroupInformation.createProxyUser(args.proxyUser, UserGroupInformation.getCurrentUser) and call runMain() inside its doAs block
    2.2 Otherwise, call runMain() directly
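The branch in step 2 follows the standard JAAS doAs pattern. A minimal runnable sketch, using the JDK's Subject.doAs as a stand-in for Hadoop's UserGroupInformation (which is what Spark actually uses); runMain() here is an empty placeholder:

```java
import java.security.PrivilegedExceptionAction;
import javax.security.auth.Subject;

public class ProxyUserSketch {
    static boolean ranMain = false;

    // Placeholder for SparkSubmit.runMain().
    static void runMain() { ranMain = true; }

    // Mirrors doRunMain(): if a proxy user was requested, wrap runMain()
    // in a doAs so it executes under that identity; otherwise call it
    // directly. Subject stands in for UserGroupInformation here.
    static void doRunMain(String proxyUser) throws Exception {
        if (proxyUser != null) {
            Subject subject = new Subject(); // hypothetical stand-in identity
            Subject.doAs(subject, (PrivilegedExceptionAction<Void>) () -> {
                runMain();
                return null;
            });
        } else {
            runMain();
        }
    }

    public static void main(String[] args) throws Exception {
        doRunMain("alice");
        System.out.println(ranMain); // prints "true"
    }
}
```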
3. SparkSubmit: runMain()
|-- create the class loader: if spark.driver.userClassPathFirst is true -> ChildFirstURLClassLoader,
    otherwise -> MutableURLClassLoader
|-- install it as the thread's context loader: Thread.currentThread.setContextClassLoader(loader)
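The loader selection can be sketched as follows. The two inner classes are illustrative stand-ins for Spark's ChildFirstURLClassLoader and MutableURLClassLoader (a real child-first loader overrides loadClass() to try its own URLs before delegating to the parent; that override is omitted here):

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;

public class LoaderSelection {
    // Parent-first loader (stand-in for Spark's MutableURLClassLoader).
    static class ParentFirstLoader extends URLClassLoader {
        ParentFirstLoader(URL[] urls, ClassLoader parent) { super(urls, parent); }
    }
    // Child-first loader (stand-in for Spark's ChildFirstURLClassLoader).
    static class ChildFirstLoader extends URLClassLoader {
        ChildFirstLoader(URL[] urls, ClassLoader parent) { super(urls, parent); }
    }

    // Mirrors the selection in runMain(): userClassPathFirst=true picks the
    // child-first loader so user jars shadow Spark's own classes.
    static ClassLoader chooseLoader(Map<String, String> conf, URL[] userJars) {
        boolean userClassPathFirst =
            Boolean.parseBoolean(conf.getOrDefault("spark.driver.userClassPathFirst", "false"));
        ClassLoader parent = LoaderSelection.class.getClassLoader();
        ClassLoader loader = userClassPathFirst
            ? new ChildFirstLoader(userJars, parent)
            : new ParentFirstLoader(userJars, parent);
        // The chosen loader becomes the thread's context classloader, so the
        // later Class.forName(...) resolves the user's main class through it.
        Thread.currentThread().setContextClassLoader(loader);
        return loader;
    }

    public static void main(String[] args) {
        ClassLoader l = chooseLoader(Map.of("spark.driver.userClassPathFirst", "true"), new URL[0]);
        System.out.println(l instanceof ChildFirstLoader); // prints "true"
    }
}
```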
|--addJarToClasspath(jar,loader)
|--System.setProperty(key,value)
|-- mainClass = Class.forName(className, true, getContextOrSparkClassLoader)
// The loader resolves the entry class: e.g. if isSqlShell, mainClass = org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver;
// if isThriftServer, mainClass = org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
|-- mainMethod = mainClass.getMethod("main", new Array[String](0).getClass)
// the method handle is then invoked via reflection
4. This finally calls the main() method of the user's own application class.
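Steps 3 and 4 together are plain JVM reflection, which works the same way in Java as in Spark's Scala. A self-contained sketch (UserApp is a hypothetical stand-in for the class passed via --class):

```java
import java.lang.reflect.Method;

public class ReflectiveLaunch {
    // Stand-in user application; in real use, className comes from --class.
    public static class UserApp {
        static String received;
        public static void main(String[] args) { received = String.join(",", args); }
    }

    public static void launch(String className, String[] childArgs) throws Exception {
        // Step 3: resolve the class through the thread's context classloader,
        // as SparkSubmit does with Class.forName(className, true, loader).
        Class<?> mainClass =
            Class.forName(className, true, Thread.currentThread().getContextClassLoader());
        // Step 4: look up the static main(String[]) — new String[0].getClass()
        // mirrors the Scala `new Array[String](0).getClass` — and invoke it.
        Method mainMethod = mainClass.getMethod("main", new String[0].getClass());
        mainMethod.invoke(null, (Object) childArgs); // cast avoids varargs spreading
    }

    public static void main(String[] args) throws Exception {
        launch("ReflectiveLaunch$UserApp", new String[] {"a", "b"});
        System.out.println(UserApp.received); // prints "a,b"
    }
}
```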