The CUCM and CUC publishers in my lab crashed due to a disk failure. Luckily my subscribers are on a different LUN, so at least I don't need to rebuild the whole cluster. In this post I want to share what I did and the hiccups I ran into during the rebuild.
UCM Publisher Rebuild
For UCM I am following this guide, which is well written. This is what I did based on it.
1. Gather Cluster Data on Subscriber
Two commands, show network cluster and show version active, give you the existing cluster info.
2. Stop DB Replication on all subscribers
This is important: you do not want the new publisher to sync its NEW database to the existing one on the subscribers. You want it the other way round, so stop database replication on every subscriber first ("utils dbreplication stop").
3. Install the new CUCM Publisher with the same hostname, IP address, domain name, security passphrase, exact UCM version and installed COP files
Install it from bootable media.
4. Update Processnode Values on the Publisher
I am running 10.5(2), so I need to run "utils disaster_recovery prepare restore pub_from_sub" on the new publisher CLI before adding nodes under System > Server.
Retrieve the node list from an existing subscriber: run sql select name,description,nodeid from processnode
Once you have the node list, go to the publisher's UCM admin page and add the nodes.
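If the cluster has more than a handful of nodes, copying them by hand from the CLI output gets error-prone. Here is a minimal Python sketch that turns pasted "run sql" output into a list of records. The sample output, its column widths and the host names are assumptions for illustration only, and so is treating EnterpriseWideData as a built-in row rather than a server. Check the layout against your own CLI output before relying on it.

```python
import re

# Illustrative output only: the layout and host names are assumptions,
# not captured from a real cluster.
SAMPLE_PROCESSNODE = """\
name               description nodeid
================== =========== ======
EnterpriseWideData             1
ucm1               Publisher   2
ucm2                           3
"""


def parse_processnode(output):
    """Parse fixed-width query output into a list of dicts (all values are strings)."""
    lines = [ln.rstrip() for ln in output.splitlines() if ln.strip()]
    # The separator row consists only of '=' runs and spaces; the header is just above it.
    sep_idx = next(
        (i for i, ln in enumerate(lines) if i > 0 and set(ln) <= {"=", " "}), None
    )
    if sep_idx is None:
        raise ValueError("no '=====' separator row found")
    header, sep = lines[sep_idx - 1], lines[sep_idx]
    # Each column starts where a run of '=' starts and ends where the next one begins.
    starts = [m.start() for m in re.finditer(r"=+", sep)]
    ends = starts[1:] + [None]
    names = [header[s:e].strip() for s, e in zip(starts, ends)]
    rows = []
    for ln in lines[sep_idx + 1:]:
        values = [ln[s:e].strip() for s, e in zip(starts, ends)]
        rows.append(dict(zip(names, values)))
    return rows


def servers_to_add(rows):
    """Drop the EnterpriseWideData row (assumed to be a system entry, not a server)."""
    return [r for r in rows if r["name"] != "EnterpriseWideData"]
```

Slicing by the separator row instead of splitting on whitespace keeps rows with an empty description from shifting the nodeid into the wrong column.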
5. Reboot Publisher
Use the command "utils system restart".
6. Verify Cluster Authentication
Do this on the publisher after it restarts, and make sure the cluster is in the "authenticated" state.
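With several nodes it is easy to miss one that is not authenticated. As a rough sketch, the check can be scripted by pasting the "show network cluster" output into a small Python helper. The sample lines, host names and the exact status wording are assumptions for illustration; real output may differ between versions.

```python
import re

IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

# Illustrative output only; host names and status wording are assumptions.
SAMPLE_CLUSTER = """\
10.0.0.11 ucm1.lab.local ucm1 Publisher callmanager DBPub authenticated
10.0.0.12 ucm2.lab.local ucm2 Subscriber callmanager DBSub authenticated using TCP since Mon Jan  5 10:00:00 2015
10.0.0.13 ucm3.lab.local ucm3 Subscriber callmanager DBSub not authenticated or updated on server
"""


def unauthenticated_nodes(output):
    """Return the host names of node lines that do not report 'authenticated'."""
    bad = []
    for line in output.splitlines():
        fields = line.split()
        # Only look at lines that start with an IPv4 address and carry status fields;
        # anything else (blank lines, extra tables) is skipped.
        if len(fields) < 4 or not IPV4.match(fields[0]):
            continue
        status = " ".join(fields[3:]).lower()
        if "not authenticated" in status or "authenticated" not in status:
            bad.append(fields[1])
    return bad
```

An empty list means every node line reported the authenticated state. Anything returned needs a closer look before moving on to the backup.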
7. Perform a new backup
Add a backup device. I am using a Linux machine to store the backup.
Start a manual backup
8. Publisher Restore from the Subscriber DB
I hit an issue during the restore, with the error message "Unable to send network request to master agent. This may be due to Master or Local Agent being down".
What I tried first
- Regenerating the ipsec cert and restarting the DRF Master and Local agents. It didn't help.
Solution
- Remove cup1 and cup2 from the server list on the publisher UCM admin page. After that the restore works. DRF requires every host in the server list to be up and running, and one of my CUP nodes was not responding (because of the disk LUN failure).
Check the publisher node check box (UCM1), choose the subscriber DB to restore from (UCM2 in my case), then click Restore.
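The dead-node problem in this step can be caught before starting the restore. Here is a minimal Python sketch that probes every host in the server list over TCP and reports the ones that do not answer. The host names and the port are assumptions: pick a port you know is open on every node in your lab (8443, the admin web port, is used here only as an example). An open TCP port does not prove the DRF agents are healthy, so treat it as a quick first filter.

```python
import socket


def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def unreachable_nodes(hosts, port, probe=tcp_probe):
    """Return the hosts from the server list that the probe could not reach."""
    return [host for host in hosts if not probe(host, port)]
```

For example, unreachable_nodes(["ucm1", "ucm2", "cup1", "cup2"], 8443) would list the nodes to bring up or remove from the server list before restoring. The probe is passed in as a parameter so it can be swapped for a ping or any other check.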
9. Restore Status
When the restoration reaches the CCMDB component, the status text shows "Restoring Publisher from Subscriber Backup"
10. Run a Sanity Check on the Publisher DB
These 2 SQL statements will give you a gut feeling for whether the DB restore worked.
11. Reboot the Cluster after restore
12. Verify Replication Setup
13. Post Restore
Activate services and install device packs
CUC Publisher Rebuild
The steps for the CUC publisher rebuild are similar.
1. Gather Cluster Data
2. Stop Replication on All Subscribers
3. Install the CUC Publisher
4. Update Processnode Values on the Publisher
5. Reboot the Publisher Node
6. Verify Cluster Authentication
7. Connect the Subscriber Server to the New Connection Cluster, and Replicate Data and Messages to the Publisher Server
This step is different: we are not using DRS to restore the DB. Run the command "utils cuc cluster renegotiate" on the subscriber.
The publisher server will automatically restart.
Run "show cuc cluster status" on the subscriber to verify that the new cluster has been configured correctly.
Good luck!