This struck me as insane. Practically everyone I know has a large flat-screen TV and broadband. Webcams are $100 and PCs are $400. The latest webcams from Logitech claim "HD" quality, and I can tell from experience that their RightLight technology works amazingly well. A consumer-level device should easily be under $500, possibly half that with work.
I only found one consumer-level device at a reasonable price point, the D-Link i2eye DVC-1000. It was a piece of crap, partially because it's a little old at this point. It didn't support UPnP, so I had to manually open holes in my firewall... and then help my Dad do the same on his end. Then, the picture was crap. It desperately needed the RightLight-style auto-contrast fixing software that my Logitech webcam has. Even after that, the picture quality was poor and the frame rate was nearly useless. Instead, we used Skype video chat and just hooked PCs up to the TVs. That ended up not working so well either, though we've routinely used Skype for video chat other times without issue. Skype mostly just works, though its interface is too focused on audio... time to make the switch, Skype.
Going back to the conference room equipment, it has one major failing: the software sucks. They get the video and audio quality right, but two things are major fails: the address book and the layout choices. The default interface is to dial a number... right. Instead, try to use the address book. We have 40+ offices, thousands of conference rooms and people's desktop computers... and it presents it all as a very slow alphabetical list. No hierarchy. You can prefix search, sorta. You can bring up a search box which does substring search... except that some strings, seemingly at random, can't be searched for. It should take an engineer a week to fix this....
The other major issue is layout. You can have multiple locations called in, plus locations can project a separate screen (usually a computer). And one quirk of current VC: you really kind of need to see yourself, to make sure you're on camera, or that the group of you is on camera. With the equipment we have, you can keep hitting the layout button to shuffle all of these things on screen, but it never does what you want. I don't need to see myself twice (one local, one echo), and I certainly don't need my picture to be the largest. In some modes, it tries to make the currently talking location the largest, but often it fails to do that. It has no concept of room size or anything, so often a single-person location is as large as or larger than a location with 20+ people. Whiteboards really don't work, since either they're "off screen" to one side or the other, or they're at the far end... and you either zoom in on that and ignore all the people, or you see the people and can't see the whiteboard. And then someone taking minutes decides to project... and you lose half your screen real estate to something you don't care about, and you can't tell the VC equipment to minimize or hide anything.
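To make the layout complaint a bit more concrete, here is a rough sketch of what "having a concept of room size" could look like. Everything in it is hypothetical: the `Feed` record, the headcount field, the 5% self-view thumbnail and the 2x speaker boost are made-up illustrations, not how any actual VC system works. The idea is simply that the self-view appears once, small, and the remaining screen area is split in proportion to how many people are in each room, with a bump for whoever is talking.

```python
from dataclasses import dataclass


@dataclass
class Feed:
    name: str
    people: int = 1           # headcount in the room (hypothetical metadata)
    is_self: bool = False     # True for the local self-view
    is_speaking: bool = False


def screen_shares(feeds, self_share=0.05, speaker_boost=2.0):
    """Return {feed name: fraction of screen area}; fractions sum to 1."""
    shares = {}
    remote = [f for f in feeds if not f.is_self]
    local = [f for f in feeds if f.is_self]

    remaining = 1.0
    if local and remote:
        # Show the self-view exactly once, as a small fixed thumbnail.
        shares[local[0].name] = self_share
        remaining -= self_share
    elif local:
        # Nobody else has joined yet: the self-view gets the whole screen.
        shares[local[0].name] = 1.0
        return shares

    # Weight each remote feed by headcount, boosted if it is speaking.
    weights = {
        f.name: max(f.people, 1) * (speaker_boost if f.is_speaking else 1.0)
        for f in remote
    }
    total = sum(weights.values())
    for name, weight in weights.items():
        shares[name] = remaining * weight / total
    return shares
```

With a 20-person room and a single person dialed in, the big room ends up with roughly twenty times the area of the single caller, and your own picture never gets bigger than the thumbnail. A real system would also have to turn these fractions into actual rectangles, which is the harder part.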
All of these are fixable, though some are harder than others. The hardest is that everything should just work, as easily as the telephone, at least. Some things, like the whiteboard and large conference rooms, probably require multiple cameras, possibly even cameras which automatically focus/zoom in on the speaker. If you've ever seen a broadcast conference... or awards show, what you basically want is multiple camera angles and intelligent cameras, but all automatic, with no one working all of that. The AV crew for our larger "all hands" style multiple-location conferences is easily 5-10 people; what we need is software intelligent enough to give us a close approximation. And for the prices of this equipment, that's what I'd expect.
But personally, all I need is a box that has an HDMI output, a good wide-angle camera with intelligent assist for contrast, that can auto-scale picture/audio quality based on connection speed, and a good intelligent mic with echo cancellation, etc... for about $250-300. I even debated starting a company just to do it...
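For what it's worth, the "auto-scale quality based on connection speed" part is not conceptually hard. A minimal sketch, assuming the box can measure its usable bandwidth: keep a ladder of quality levels and pick the best one that fits with some headroom. The thresholds and labels below are placeholders invented for illustration, not recommendations for any real codec or product.

```python
# Illustrative ladder only: (minimum kbps required, quality label).
# These numbers are made up; a real device would tune them per codec.
QUALITY_LADDER = [
    (2500, "720p"),
    (1000, "480p"),
    (400, "240p"),
    (0, "audio-only"),
]


def pick_quality(measured_kbps, headroom=0.8):
    """Pick the best quality level that fits within a share of the link."""
    # Only budget a fraction of the measured speed, so a brief dip in
    # throughput does not immediately stall the call.
    budget = measured_kbps * headroom
    for min_kbps, label in QUALITY_LADDER:
        if budget >= min_kbps:
            return label
    # Nonsense (negative) measurements fall back to the lowest rung.
    return QUALITY_LADDER[-1][1]
```

Re-running `pick_quality` every few seconds against a fresh bandwidth estimate would give the "just works" degradation instead of a frozen picture.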