Error when running qlogin or qsub; jobs cannot be submitted


I can't submit jobs because qlogin and qsub both fail with an error.
I haven't been able to work out the conditions, but in both cases the command occasionally succeeds.

When the qlogin command fails:

    [aiueo@hostname sim]$ qlogin
    Your job 865534 ("QLOGIN") has been submitted
    waiting for interactive job to be scheduled ...timeout (5 s) expired while waiting on socket fd 4

    Could not start interactive job.

qstat output:

    [aiueo@hostname sim]$ qstat -j 865565
    ==============================================================
    job_number: 865565
    jclass: NONE
    exec_file: job_scripts/865565
    submission_time: Thu Aug 17 11:52:26 2017
    owner: aiueo
    uid: 3021
    group: nca
    gid: 3000
    sge_o_home: /home/aiueo
    sge_o_log_name: aiueo
    sge_o_path: /opt/uge/bin/lx-amd64:/home/aiueo/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/sbin:/opt/dell/srvadmin/bin:/home/bin:
    sge_o_shell: /bin/bash
    sge_o_workdir: /home/aiueo/Documents/underconnectivity/WT/sim/0_002/1
    sge_o_host: Prince
    account: sge
    cwd: /home/aiueo/Documents/underconnectivity/WT/sim/0_002/1
    mail_list: [email protected]
    notify: FALSE
    job_name: v-0_002 | t-1
    jobshare: 0
    hard_queue_list: long.q
    shell_list: NONE:/bin/bash
    env_list:
    script_file: runCluster.sh
    binding: NONE
    mbind: NONE
    error reason 1: can't get password entry for user "aiueo". Either the user does not exist or NIS error!

Addendum

Environment information:

    $ cat /etc/redhat-release
    Red Hat Enterprise Linux Server release 6.2 (Santiago)
    $ ypcat passwd | grep aiueo
    aiueo:$1$omH7y8kL$DVBof4qaF94JmeEZaXcOxP0:3021:3000::/data03/home/aiueo:/bin/bash

Additional information
The command occasionally succeeds. The message on success is as follows:

    Your job 865535 ("QLOGIN") has been submitted
    waiting for interactive job to be scheduled ...
    Your interactive job 865535 has been successfully scheduled.
    Establishing /opt/uge/default/common/qlogin.sh session to host node27.local ...

    $ qacct -j 865565 | grep hostname
    hostname     node20.local

matlab

2022-09-30 17:17

1 Answer

The last line of the qstat output says that the password entry for the user "aiueo" could not be retrieved (either the user does not exist, or there was an NIS error).

If the host you submit the job from and the host the job runs on are different, check that the "aiueo" user also exists on the execution host.

Addendum
I think the reason it sometimes succeeds and sometimes fails is that the job can be dispatched to several different execution hosts, and some of those hosts cannot obtain the account information for aiueo.

· First, use the job information (qstat, qacct, etc.) from a failed run to narrow down which host has the problem.
· Once you have identified the host, check whether you can log in to it with the same account (aiueo) that you submit jobs from. For example, can you log in via ssh?
· If you cannot log in, try adding the account information or setting up NIS on that host, or contact your system administrator.
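
The first two steps above could be sketched in shell roughly as follows. This is only a sketch under assumptions: job_host and user_resolves are hypothetical helper names, and it assumes that the qacct -j output contains a "hostname" line like the one shown in the question. It uses id rather than ypcat because id looks the user up through the host's own name-service configuration, which may differ from what the NIS map itself contains.

```shell
#!/bin/sh
# Hypothetical helper: read `qacct -j <job>` output on stdin and print the
# execution host(s) recorded for that job (second field of "hostname" lines).
job_host() {
    awk '$1 == "hostname" { print $2 }' | sort -u
}

# Hypothetical helper: succeed if the local host can resolve the given user.
user_resolves() {
    id "$1" >/dev/null 2>&1
}

# Possible usage (job number and user taken from the question):
#   host=$(qacct -j 865565 | job_host)
#   ssh "$host" id aiueo      # does the failing host know the user?
```

If the ssh login itself is refused, or id reports that there is no such user on that host, that points to the account lookup on that particular host rather than to the job script.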

There should be at least three hosts involved in total: the host you submit from, an execution host where the job succeeds, and an execution host where it fails. First list them, then compare what ypcat passwd | grep aiueo shows on each host.
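
A per-host comparison like this could be scripted along the following lines. This is a minimal sketch, not a tested procedure for this cluster: check_hosts and the REMOTE_SHELL variable are hypothetical names introduced here, and it assumes you can normally reach the hosts with ssh. Note that a host is also reported as FAILED when the ssh connection itself fails, so a failure only tells you which host to look at more closely.

```shell
#!/bin/sh
# Assumption: hosts are reachable with ssh; override REMOTE_SHELL if not.
REMOTE_SHELL=${REMOTE_SHELL:-ssh}

# Hypothetical helper: for each host, ask that host to resolve the user
# with `id` and print one "host: OK" or "host: FAILED" line.
check_hosts() {
    user=$1
    shift
    for host in "$@"; do
        # stdin is redirected so the remote shell cannot swallow input
        if $REMOTE_SHELL "$host" id "$user" </dev/null >/dev/null 2>&1; then
            echo "$host: OK"
        else
            echo "$host: FAILED"
        fi
    done
}

# Possible usage (host names taken from the question):
#   check_hosts aiueo node20.local node27.local
```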


2022-09-30 17:17
