Welcome to BrianDesmond.com

One of my colleagues reported an issue at a customer this past weekend: every time he transferred FSMO roles, MOM would report that the MS DTC (Distributed Transaction Coordinator) service had terminated unexpectedly on every domain controller in the domain. At this particular customer that bought us about 350 emails from MOM, since the roles got transferred twice over the weekend in each domain. For reference, it's a highly distributed Windows Server 2003 SP1 environment with a mix of x86 and x64 installations.

A quick look at MOM & the event viewer on a suspect machine showed a standard event from the SCM, and an MSDTC event about a dc promotion/demotion:

Event Type:    Information
Event Source:    MSDTC
Event Category:    SVC
Event ID:    4145
Date:        8/5/2008
Time:        4:54:10 PM
User:        N/A
Computer:    SOMEDC

Description:
MS DTC has been notified that a DC Promotion/Demotion has happened. It is shutting down as a result.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

And from MOM:

Severity: Service Unavailable
Status: New
Source: Service Control Manager
Name: The service terminated unexpectedly.
Description: The Distributed Transaction Coordinator service terminated unexpectedly. It has done this 1 time(s). The following corrective action will be taken in 1000 milliseconds: Restart the service.
Domain: ASIA
Agent: SOMEDC
Time: 8/2/2008 13:07:58
Owner:

(Yes the times don't match - just grabbed the first samples I could find)

I decided to take a look at this today, and bounced the FSMOs around until I determined that moving the PDCe specifically was the trigger. I was able to kill the MSDTC service on every DC in the domain by moving the PDCe around. This ruled out some weird quirk from the patching activities that were going on this weekend when the symptom first presented.

My first troubleshooting step was to see if the service was actually crashing, so I set up Dr. Watson to collect full dumps, installed it as the default debugger on a problem machine, and moved the PDCe. The service terminated as expected, but nothing showed up in Dr. Watson. This was annoying but not entirely unexpected: if a service just exits like a normal process does, we'd get this event too.

I xcopy'ed the x64 debug package to this particular machine from a utility box that had the debug tools installed (note you can just copy and paste the tools - no need to run the MSI) and fired up windbg. If you're following along at home, do this:

  1. Press F6 and find msdtc.exe and select it
  2. The debugger will break in and likely complain about symbols
  3. Issue a .symfix C:\symbols
  4. Issue a .reload
  5. Issue a g

At this point you're ready to go and when something interesting happens the debugger should break in (you'll know when the textbox at the bottom of the screen becomes enabled). I wanted to collect a process dump for this and I had a suspicion I knew what was happening, so, I also did this:

  1. Issue a bp ntdll!NtTerminateProcess

This tells the debugger to break in on a call to NtTerminateProcess. As soon as I transferred the PDCe, my breakpoint got hit. I saved a dump (.dump /mf c:\msdtcissue.dmp) as this particular environment has Internet issues and I was having problems getting to the symbol server. When I opened it up on my workstation (press Ctrl+D and browse in windbg), I located this stack (issue a k):

11 Id: 1550.1590 Suspend: 1 Teb: 000007ff`fff9a000 Unfrozen
Child-SP RetAddr Call Site
00000000`0165fd78 00000000`77d5a316 ntdll!ZwTerminateProcess
00000000`0165fd80 000007ff`7fc4069b kernel32!ExitProcess+0x25
00000000`0165fed0 000007ff`7fc40863 msvcrt!_crtExitProcess+0x3b
00000000`0165ff00 000007ff`66fd25f5 msvcrt!cinit+0x143
00000000`0165ff40 00000000`77d6b69a msdtctm!DCPromoThreadFunction+0x124
00000000`0165ff80 00000000`00000000 kernel32!BaseThreadStart+0x3a

Note if you're wondering how to find the correct thread, you can issue a ~*k to dump the stack of every thread. To switch to the thread (11 here), you'd do a ~11s.

Searching Google for this got me zero hits, so I had a quick chat with a PSS friend who dug something up. It's a known bug in Windows Server 2003 and presently there's no QFE. MSDTC subscribes to dcpromo notifications (which I knew), but because of the manner in which it does this, it also catches PDCe changes. This behavior is fixed in Windows Server 2008, though. If you've got a good reason you can call and try to make the case for a QFE, but seeing as the service restarts straight away, I'm just going to tweak my monitoring so MOM ignores this.


I discovered this afternoon that you can insert simple formulas in Microsoft Word tables, at least in Word 2007. This is really pretty helpful when you just need to do something simple like sum a column or row and don't want to build the table in Excel, copy it into Word, and then format it.

I had a table something like this:

 

         Foo            Bar            Snafu          Total
         X      Y       X      Y       X      Y
Day 1    .5     1.5     0      1       1.25   .5
Day 2    .5     1.5     0      1       1.25   .5
Day 3    .5     1.5     0      1       1.25   .5
Total

I needed to add totals for rows and columns so it would look like this:

In order to do this, click in one of the total cells, and then click the Formula button on the Layout tab on the ribbon.

You'll get a dialog something like this - it defaults to the SUM formula, and automatically figures out whether you want "ABOVE" or perhaps "LEFT". If you need a different function (like to take the average or something), use the Paste function combobox.
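To make the two directions concrete, here's a small Python sketch (my own illustration, nothing Word itself runs) of what =SUM(LEFT) in the Total column and =SUM(ABOVE) in the Total row should work out to for the sample table. The variable names are just ones I picked for the example.

```python
# Sample table from the post: one row per day, six value columns
# (Foo X, Foo Y, Bar X, Bar Y, Snafu X, Snafu Y).
rows = {
    "Day 1": [0.5, 1.5, 0, 1, 1.25, 0.5],
    "Day 2": [0.5, 1.5, 0, 1, 1.25, 0.5],
    "Day 3": [0.5, 1.5, 0, 1, 1.25, 0.5],
}

# =SUM(LEFT) in a row's Total cell adds the numbers to its left.
row_totals = {day: sum(values) for day, values in rows.items()}

# =SUM(ABOVE) in a cell of the Total row adds the numbers above it.
column_totals = [sum(column) for column in zip(*rows.values())]

print(row_totals)
print(column_totals)
```

Each day comes to 4.75, which is a handy number to check the Word fields against after you update them.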

You'll note if you look at the screenshot of my resulting table that the sums are highlighted in gray. This is because I have Field Shading enabled in Word; when you print, they won't actually be highlighted. One difference from Excel here is that the fields won't automatically recompute if you update one of the cells. To refresh a total, right-click the field and select Update Field.

Cool feature - enjoy.


So I'm working on a project which involves some documentation around disaster recovery and I flagged a word in the text for the technical reviewers to suggest a better word (BCD40). The feedback I got really made me laugh:


I was complaining (more or less) several weeks ago about the amount of work it was going to take to upgrade my MCSE to the latest version. The good news is I went and did the Windows end of things and now I'm apparently an "MCITP: Enterprise Administrator". Now compared to MCSE, no HR person is going to know what that means. Most folks are trained to look for MCSE and it's going to be a long time before that mentality is changed I suspect. Personally if someone asks me if I have an MCSE or if I'm certified on Windows 2008, I may drop the MCITP acronym in, but "MCSE" and "2008" are definitely going in the response as that's what people look for.

I had to take three tests to convert my MCSE: Messaging 2003 to this MCITP: Enterprise Administrator:

  • 70-647 - "Pro: Windows Server 2008, Enterprise Administrator"
  • 70-649 - "TS: Upgrading Your MCSE on Windows Server 2003 to Windows Server 2008, Technology Specialist"
  • 70-620 - "TS: Configuring Microsoft Windows Vista Client"

I took 70-647 in beta in January, so I have no recollection of what it was like nor would it be a good representation of what it probably is now. 70-649 is really easy if you know what you're doing on 2003. 70-620 was easy enough, although it had simulation questions which I have never seen on a Microsoft exam before. I'm used to them from taking Cisco exams but never Microsoft. I struggled with these a bit simply because I have zero Vista experience and they were to do oddball things that you'd be stupid to ask me to do because I would just look at you funny. I own one Vista machine, a laptop, and I turn it on about four times a year, so that roughly translates to I own no Vista machines. I passed the exam with a pretty good score, so, I guess interpret that however you want.

Thanks to Russ who was standing in line when I was to take these things and unlike me he knew what tests I needed to take. Also thanks to the Prometric guy whose computer was particularly broken when I was trying to register for the wrong test and thus was able to rethink my plan with Russ' help.


Scott's post prompted this post as I've found that unlike Scott, my recent upgrade to a third screen is really helpful. I run with two PCs under my desk. One PC is a simple Dell tower with 4GB of RAM, a dual core chip in it and a couple of SATA spindles - it's a few years old at this point I think [1]. This is what I call my personal machine - I do email, IM, and pretty much everything else on it - all my files are stored here. My second machine is a fairly high end Dell tower with dual Xeon chips, 8GB of RAM, and about 1TB of storage online in a RAID. I run a 64-bit OS on here and run numerous VMWare instances. I also have my company issued laptop on my desk sometimes sitting on the docking station when I need it for one reason or another.

On my VMWare machine, I have all of my lab and test VMs, and I also have one VM for each of my customers. By having a virtual machine for each customer, I gain a few things:

  • I keep customer data separate
  • My customer machines are portable - when I travel I copy the VMs I need to my laptop and I have everything I need to do work with that customer
  • I can be VPN'ed into multiple customers at once
  • I can run multiple types of VPN clients (right now I have four different kinds across my VMs)

Up until a couple weeks ago, I ran with two screens on my desk - Dell 19" flat panels. Each of these panels has two inputs, so, when I needed to move between machines, I would change the active input and toggle my KVM. This worked, but, it was a complete context switch. The most inconvenient thing was the lack of clipboard synchronization.

About a year ago, my manager was kind enough to ship me a screen that he felt I should have for my desk, and I left it in the box as I didn't really have a use for it. I discovered this program called Multiplicity the other day which basically acts as a KVM over the network. Multiplicity gave me a reason to unbox my third screen and connect it. I now have my screens arranged horizontally in the following fashion:

A    B    C

Screens are connected in the following fashion:

Screen    Input    Machine
A         1        Personal
B         1        Personal
B         2        VMWare
C         1        VMWare

 

I have Multiplicity installed as a "server" on my personal machine, and as a "client" on my VMWare machine. Now, whenever I move my mouse to the edge of screen B, it jumps to screen C which is actually physically wired to the VMWare machine. The only time I use my KVM now is if I need to do something in text mode (e.g. BIOS change or something). I also have synchronized clipboards [2]. If I upgrade to the more expensive (like $20 more) version of Multiplicity, I could even add my laptop as a client, but, I don't really have a need for that right now.

Overall, I think this new setup is great - I can bounce between tasks in a much smoother fashion now and it's just generally more convenient. This is what it looks like:

[1] I actually just got a replacement personal machine as Dell had a refurb Quad Core w/ 4GB going for ~$600 the other day. I haven't put it into production yet but I am planning to run Windows Server 2008 x64 on it now.

[2] I run so many applications that play in this space that my clipboard chain is seemingly broken half the time so this has some limited value


If you've ever gotten the error below when you go to Windows Update on a machine or Windows Update is completely missing from the Tools menu in Internet Explorer, it's fortunately an easy registry change to temporarily back out.

Network policy settings prevent you from using this website to get updates for your computer.

If you believe you have received this message in error, please contact your system administrator. Read more about steps you can take to resolve this problem (error number 0x8DDD0003) yourself.

This is generally set as a group policy setting and thus will get refreshed, but, you can delete the registry value below and restart your browser and Windows Update will work again at least temporarily.

Key - HKCU\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer
Type - REG_DWORD
Value Name - NoWindowsUpdate

I'm of course not recommending that you circumvent your organization's policies, but, since I periodically need this setting and I usually forget it, I've put it here for the permanent record.
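If you'd rather not click through regedit each time, the same change can be made with reg.exe. Here's a minimal Python sketch that builds the reg delete command for the value above and only runs it when it finds itself on Windows. The function name is my own invention for the example, and the same caveat about your organization's policies applies.

```python
import subprocess
import sys

KEY = r"HKCU\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer"
VALUE_NAME = "NoWindowsUpdate"


def build_delete_command(key=KEY, value_name=VALUE_NAME):
    """Return the reg.exe argument list that deletes a single registry value."""
    # /v names the value to delete; /f skips the confirmation prompt.
    return ["reg", "delete", key, "/v", value_name, "/f"]


if __name__ == "__main__":
    command = build_delete_command()
    if sys.platform == "win32":
        # check=False: a missing value just means there is nothing to back out.
        result = subprocess.run(command, check=False)
        print("reg.exe exit code:", result.returncode)
    else:
        print("Not on Windows; the command would be:", " ".join(command))
```

Remember that the next group policy refresh will most likely put the value back, so this is only a temporary fix.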


I'll be down in Orlando at TechEd IT Pro next week. I'm working a booth of some fashion in the Technical Learning Center most of the week I'm told, so feel free to drop in and say hello if you're in town as well.


The Microsoft error code lookup tool (which no Windows admin should be without) got updated today: http://www.microsoft.com/downloads/details.aspx?familyid=be596899-7bb8-4208-b7fc-09e02a13696c&displaylang=en&tm.

While it says it's for Exchange, it really covers Exchange, Windows and a number of other Microsoft products. You can plug an error code in and this tool will give you whatever definitions it finds in the headers compiled into it. If you've ever seen an event that says "the error code is in the data", or you get a message that "unknown error 0x80045500" has occurred and you have no idea what to do, this is where to start. I keep the binary in the path on my workstations. Here's a sample for one of the most common codes you'll see:

C:\Documents and Settings\Administrator>err c0000005
# for hex 0xc0000005 / decimal -1073741819 :
  STATUS_ACCESS_VIOLATION                                       ntstatus.h
# The instruction at "0x%08lx" referenced memory at
# "0x%08lx". The memory could not be "%s".
  USBD_STATUS_DEV_NOT_RESPONDING                                usb.h
# 2 matches found for "c0000005"

Generally speaking the correct result is the first one for this example. When you get more than one result though you'll have to look at the names of the header files (e.g. usb.h) and see which one makes sense.
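Notice that the tool prints the code both as hex and as a negative decimal. Logs and scripts sometimes only give you the signed decimal form, so here's a small Python sketch (the helper names are mine) for flipping between the two 32-bit representations before you go looking a code up.

```python
def to_signed_32(code):
    """Interpret a 32-bit value as a signed (two's complement) integer."""
    code &= 0xFFFFFFFF
    return code - 0x100000000 if code & 0x80000000 else code


def to_unsigned_32(code):
    """Map a possibly negative 32-bit value back to its unsigned form."""
    return code & 0xFFFFFFFF


# The access violation code from the sample output above.
print(to_signed_32(0xC0000005))
print(hex(to_unsigned_32(-1073741819)))
```

The two values printed are the same pair the err output shows for c0000005.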


Getting the amount of memory installed in a machine with WMI is a bit confusing, particularly if you only read the docs partially. I was initially using Win32_ComputerSystem::TotalPhysicalMemory, but the documentation warns: "Be aware that, under some circumstances, this property may not return an accurate value for the physical memory. For example, it is not accurate if the BIOS is using some of the physical memory."

The suggested alternative is Win32_PhysicalMemory::Capacity. This was an easy switch in my script, but, I was getting numbers I knew were wrong for the machines I was querying. The part I didn't read was that each instance of Win32_PhysicalMemory represents a single stick of RAM, so, you need to loop through them all and take the sum to get the RAM installed. This snippet will get you the total memory in megabytes:

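Const wbemFlagReturnImmediately = &h10
Const wbemFlagForwardOnly = &h20
Set wmiSvc = GetObject("winmgmts:\\.\root\cimv2")
Set colItems = wmiSvc.ExecQuery("SELECT * FROM Win32_PhysicalMemory", "WQL", wbemFlagReturnImmediately + wbemFlagForwardOnly)
totalMemory = 0
For Each item In colItems
      ' Capacity is a 64-bit byte count; CDbl avoids the overflow CLng hits on sticks of 2GB or more
      totalMemory = totalMemory + CDbl(item.Capacity) / (1024^2)
Next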
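As a sanity check on the arithmetic, here's the same per-stick sum as a standalone Python sketch. It doesn't talk to WMI at all; the capacities are hypothetical values I made up for a machine with two 2GB sticks and one 512MB stick, passed as numeric strings on the assumption that the script receives the 64-bit Capacity value in that form.

```python
def total_memory_mb(capacities_in_bytes):
    """Sum per-module capacities (bytes, as ints or numeric strings) into megabytes."""
    total_bytes = sum(int(capacity) for capacity in capacities_in_bytes)
    return total_bytes // (1024 ** 2)


# Hypothetical machine: two 2GB sticks and one 512MB stick.
sample_capacities = ["2147483648", "2147483648", "536870912"]
print(total_memory_mb(sample_capacities))
```

Note that a single 2GB stick is 2147483648 bytes, one more than a signed 32-bit integer can hold, which is why the conversion used on Capacity matters.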

 


I fix other people's computers and IT problems all day, so the last thing I really feel like doing is fixing my own computer problems at home. My personal workstation decided to blue screen earlier in the evening which was nice. It was one of those blue screens you can't really do anything about without having driver verifier enabled and since I didn't there was nothing I could really do. I did take a SWAG based on the contents of the dump and decide to update my nVidia drivers for my apparently now practically ice age video card. When I rebooted from this driver update, my BIOS gave me some lovely message to the tune of "Primary SATA Drive 0 Not Found". Great, SATA drive 0 has left the building.

I don't really keep anything on my C drive as I have another spindle for data, but, I wasn't really planning to reload my OS and all my settings this week. I don't really do anything that complex on my home PC. I have Carbonite backing up my C drive so if there are odds and ends namely my profile which I needed to recover, I could. I powered down my PC and waited about five minutes and turned it back on and conveniently Primary SATA Drive 0 had returned.

I had been noticing (and ignoring) for the past couple of months probably that various clattering noises had been coming out of the case of this machine, and obviously I probably should have done something about it a while ago. I also ordered a couple too many drives for my other machine last summer, so I had a couple of 500GB spindles in inventory. The folks over at Acronis were kind enough to give me a copy of their Disk Director Server product to play with a while ago, and I've always been really happy with it using it to resize partitions and copy them when I've needed to upgrade the size of a drive.

Tonight I used the Acronis Rescue CD wizard to burn a CD with their toolset on it, and then I booted from that CD and copied the old drive onto the new. Their tool is so simple to use which is great, and it took all of 25 minutes to copy my 50GB of data over. You'd think I wouldn't care that much about simple being that I do this for a living, but like I said before - the last thing I want to do is be reading manuals and searching the web to make my home PC work.

I'd definitely recommend picking up a copy of this tool to have around, or the Disk Director product if you don't need to run it on a server OS (I run Windows 2003 at home). I keep one of the rescue CDs lying around just in case I do need it somewhere, as it works pretty much everywhere.

In other news, remember when Dell sold tool-less chassis for their consumer models? The PC in question is a Dell Dimension 4700 minitower which is perhaps 2 or 3 years old. In order to replace this hard drive, I had to remove a screw from the bottom of the case, remove a screw from a hard to reach place inside the case, and then figure out how to properly maneuver their stamped drive carrier to unlatch it from the other stamped metal carrier in the case. Between taking it apart, figuring out how to balance the new drive in there so the short cables reached, and installing the new drive in the carrier, this whole mechanical activity probably took me just as much time as imaging the new drive.


I came across these shortcuts today for navigating the group policy editor and thought they'd be worth sharing. They're holdovers from Windows Explorer that also work in the GPO editor.

  • If you press * while targeting a folder in the console, the folder and all of its children will be expanded
  • If you press + while targeting a folder in the console, the folder will be expanded one level
  • If you press - while targeting a folder in the console, the folder will be collapsed

When you double click on a policy setting, that dialog that comes up is non-modal. What this means is you can click in the GPO editor again and the setting dialog will go to the background. The settings dialogs are not shown in the taskbar, so you'll need to use Alt+Tab to access them.


I had to load QuickTime on my PC for one reason or another in the past few months. It's not generally a package I load, but apparently I needed it for some reason. I of course did the extra legwork to find the QuickTime installer that didn't include iTunes, as I also don't have any use for iTunes. In the past few weeks this really annoying dialog started popping up periodically, hawking not only some new QuickTime build, but also iTunes.

I have been clicking Quit for weeks now and it's been getting old. I hadn't really invested the time to figure out where this annoying application was launching itself from, but, tonight I stumbled upon the answer by accident. Check your scheduled tasks - apparently Apple took the liberty of installing a job that runs this update application of theirs. I deleted the job - hopefully it doesn't come back and hopefully I don't see this dialog anymore:

 


Thought I'd post an informational post for folks who are moving an AD forest to Windows 2003 forest functional level (aka FFL2) as I realized today this piece of information might not be quite as well known as I might have thought. As an FYI, this change adds a number of attributes to the partial attribute set (aka the PAS or global catalog):

  • Ms-DS-Trust-Forest-Trust-Info
  • Trust-Direction
  • Trust-Attributes
  • Trust-Type
  • Trust-Partner
  • Security Identifier
  • Ms-DS-Entry-Time-To-Die
  • MSMQ-Secured-Source
  • MSMQ-Multicast-Address
  • Print-Memory
  • Print-Rate
  • Print-Rate-Unit
  • MS-DRM-Identity-Certificate

This is done when you upgrade the forest functional level because at this point there are no Windows 2000 domain controllers in the forest and thus a change to the PAS will not force a GC resync. Recall that in Windows 2000, modifying the PAS caused every global catalog in the forest to replicate the global catalog from scratch. In a large environment this could be a major undertaking. Windows 2003 removes this and only replicates the changes. By waiting until Windows 2003 FFL, you mitigate this issue of adding these attributes to the PAS.

This should be a nonevent really, but if you've got any issues in the forest that might come out of the woodwork with a PAS modification then this could cause you some grief. Having made this change numerous times, I've only had an issue once and it was a replication block that worked itself out on its own.


So I have an MCSE: Messaging 2003. Took something like 7 or 8 tests to get that way back when and it's still good. Being the good consultant that I am I decided I'd figure out what I need to do to get whatever the new equivalents are on Windows 2008 and Exchange 2007:

The new Windows 2008 exam seems to be an "MCITP: Enterprise Administrator":

  • Windows 2003 MCSE upgrade Test
  • Windows Vista Test
  • Windows 2008 Enterprise Administrator Test

OK so, three tests total for an upgrade. That's a lot of test questions, but, seeing as my transcript says I took the Windows 2000 client test - they have a point. Unfortunately this means I'm going to have to make peace with Vista on some piece of hardware and actually use it. Not looking forward to that - 2003 runs so well on my machines.

The new Exchange 2007 tests I can run as a separate thread - seems that's now called an "MCITP: Enterprise Messaging Administrator":

  • Configuring Exchange 2007 Test
  • Designing Exchange 2007 Test
  • Deploying Exchange 2007 Test

Well, three more tests to upgrade. 3 + 3 = 6 tests total to upgrade. I only took 7 or 8 originally so might as well not even call this an upgrade - perhaps renumbering would be a better term.

So, I need to take six tests to change the alphabet soup in my signature line at work. Speaking of alphabet soup - what is up with these new certification names? I can fit "MCSE: Messaging" in my signature without any sort of space constraint. If I plug in there that I'm an "MCITP: Enterprise Administrator and MCITP: Enterprise Messaging Administrator" I'm going to practically have a buffer overrun at only 80 characters across the screen, not to mention I'd look like one of those folks that spells out their 12 useless certifications in their signature line and pastes the jpegs in that they send you when you pass.

Time to start the test taking and theorizing on how to summarize that whole jumble.


Sometimes one of the most useful resources at your disposal when troubleshooting a hang or other issues is the memory dump file Windows will write out during a blue screen. If a system is hung and you are not able to get to it locally, pressing Ctrl+ScrollLock, ScrollLock isn't going to be a feasible solution. If the server is an HP server with an iLO card (Integrated Lights Out), and you've set a registry key in Windows ahead of time, you can force the system to bluescreen, write the memory dump, and restart.

The key to doing this is generating what's called a nonmaskable interrupt, or NMI. Windows has a concept of interrupt request levels, or IRQLs. The highest pending IRQL is always serviced first, preempting any lower-level interrupts which are currently being serviced; holding off those lower-level interrupts is called masking. An NMI is a hardware interrupt that can't be masked, so it has to be serviced immediately. Generally you get an NMI when there's a major hardware fault that prevents the operating system from continuing, and that is exactly what Windows sees if we trigger one manually in the iLO.

The first step to getting this functionality working is setting a registry key outlined in KB 927069. Don't mind the part about this only applying to HP Blades or that it only applies to Windows 2000. This works on 2000 and 2003 and it works with hardware other than HP blades. Here's the registry key info:

Path: HKLM\System\CurrentControlSet\Control\CrashControl

Value: NMICrashDump

Data: 1

Type: REG_DWORD

You'll need to reboot for the change to take effect.
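If you need to stage this on a number of servers ahead of time, a .reg file is an easy way to carry the setting around. Here's a minimal Python sketch that renders one for the value above. The function name is hypothetical, and the layout is just the usual regedit export format with the hive name spelled out in full.

```python
def render_reg_file(key, value_name, dword):
    """Build the text of a .reg file that sets one REG_DWORD value."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        # .reg files store DWORD data as eight hex digits.
        f'"{value_name}"=dword:{dword:08x}',
        "",
    ]
    return "\n".join(lines)


reg_text = render_reg_file(
    r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl",
    "NMICrashDump",
    1,
)
print(reg_text)
```

Save the output to a file with a .reg extension and import it on each server; the reboot requirement still applies.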

If you have the Automated System Recovery (ASR) functionality enabled on the server and you need to get a full memory dump, you will need to turn it off as it can interfere with this process. This is a BIOS setting which I don't have the steps to change easily available. If there's demand (leave a comment), I can track them down.

To crash the box, these are the steps. I shot these screens on a DL360 G4 which is fairly recent hardware. I suspect the screens and locations of options may vary a bit by age (and especially on older legacy Compaq stuff), but the basic process is the same.

1. Log in to the iLO and then proceed to the "Server and iLO Diagnostics" link in the left-hand navigation:

    

2. Select the Virtual NMI Button option on the toolbar:

    

Warning! I can't guarantee that this button generates a warning when you click it on all versions of the iLO firmware. Generating an NMI will HALT your system. Don't click this button just to see what happens!

3. Generate the NMI. This button is towards the bottom of the page, so if your browser doesn't automatically scroll down to it, you'll have to scroll down yourself:

    

4. You will get a warning dialog to make sure you're really certain this is what you want to happen. Remember, doing this will HALT your system!

    

5. The iLO will write a status message to the status bar in IE:

    

6. At this point Windows will crash with a 0x80 bugcheck and reboot (assuming your machine is configured to automatically reboot after a bluescreen). You can hopefully use the memory dump to assist in troubleshooting the problem at hand.

Note that this capability is present in the Dell DRAC cards (at least certain versions). I'm trying to find out what happened to the option in the latest versions of the cards as it seems to have gone AWOL. I'll post the directions whenever I find out.
