From 73b120fd6d0cd8215f3857c034cc7d4584c8ee05 Mon Sep 17 00:00:00 2001
From: Yaniv Bronhaim <ybronhei@redhat.com>
Date: Thu, 14 Feb 2013 13:46:06 +0200
Subject: [PATCH 30/30] vdsm panics after failing to connect to supervdsm
 more than 3 times

Due to a race between the old supervdsm instance and the new one after
prepareForShutdown, the socket is sometimes removed after the new
supervdsm has started listening on it.
The _pokeParent thread unlinks the socket when it detects that vdsm is
dead. This can take longer than vdsm takes to start up and launch the
new supervdsm instance. The unlink then removes the socket file, and
vdsm cannot communicate with supervdsm.
When the communication fails, vdsm now calls panic and restarts itself,
which starts supervdsm again as needed.

Change-Id: Iafe112893a76686edd2949d4f40b734646fd74df
Bug-Id: https://bugzilla.redhat.com/show_bug.cgi?id=910005
Signed-off-by: Yaniv Bronhaim <ybronhei@redhat.com>
Reviewed-on: http://gerrit.ovirt.org/11932
Reviewed-by: Saggi Mizrahi <smizrahi@redhat.com>
Reviewed-by: Dan Kenigsberg <danken@redhat.com>
Reviewed-on: http://gerrit.ovirt.org/12053
---
 vdsm/supervdsm.py | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
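
Note: the retry-then-panic flow described above can be sketched in
isolation as follows. This is a minimal illustration, not the vdsm
implementation: retry() is a hypothetical stand-in for utils.retry (the
real signature may differ), and panic is passed in as a callback instead
of calling misc.panic, so the sketch can run without killing the process.

```python
import time


def retry(func, expected_exception=Exception, tries=3, sleep=0.0):
    """Call func until it succeeds or `tries` attempts have failed.

    Hypothetical stand-in for vdsm's utils.retry; the real helper's
    signature and defaults may differ.
    """
    for attempt in range(1, tries + 1):
        try:
            return func()
        except expected_exception:
            if attempt == tries:
                raise  # out of attempts: let the caller decide what to do
            time.sleep(sleep)


def connect_or_panic(connect, panic, tries=3, sleep=0.0):
    """Try to connect; call panic(message) if every attempt fails.

    `panic` is an assumed callback standing in for misc.panic, which in
    vdsm terminates the process so that it gets restarted.
    """
    try:
        return retry(connect, Exception, tries=tries, sleep=sleep)
    except Exception:
        panic("Couldn't connect to supervdsm")
```

The sketch catches Exception rather than using a bare except, so that
KeyboardInterrupt and SystemExit are not turned into a panic; whether
that distinction matters for vdsm is a judgment call for the reviewers.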

diff --git a/vdsm/supervdsm.py b/vdsm/supervdsm.py
index 6a38076..1b6402d 100644
--- a/vdsm/supervdsm.py
+++ b/vdsm/supervdsm.py
@@ -194,7 +194,14 @@ class SuperVdsmProxy(object):
     def launch(self):
         self._firstLaunch = False
         self._start()
-        utils.retry(self._connect, Exception, timeout=60)
+        try:
+            # We retry 3 times to connect to avoid exceptions that are raised
+            # due to the process initializing. It might take time to create
+            # the communication socket, or other initialization methods may
+            # take more time than expected.
+            utils.retry(self._connect, Exception, timeout=60)
+        except:
+            misc.panic("Couldn't connect to supervdsm")
 
     def __getattr__(self, name):
         return ProxyCaller(self, name)
-- 
1.8.1.2