Cross Compile with make-kpkg

I got myself one of those fancy shmancy netbooks. Out of habit, and because of some hardware issues, I needed to compile a kernel. The problem is that it takes forever to build a kernel on one of these things. No sweat, I'll just build it on my desktop, where it only takes 5-10 minutes. But of course there is a catch: my desktop is 64bit and the new machine has an Atom CPU which only does 32bit.

Compiling a 32bit kernel on a 64bit machine is probably a lot easier if you don't do it the Debian way. But that is not something I want to do; I like installing kernels through the package manager, and doing this type of cross compile with make-kpkg is not trivial. There are plenty of forum and email threads where people recommend using a chroot or a virtual machine for this task, but those are such a chore to set up. So here is my recipe for cross compiling a 32bit kernel on a 64bit host without chroot / vm, the-debian-way.

  1. Install the 32bit tools (ia32-libs, lib32gcc1, lib32ncurses5, libc6-i386, util-linux, maybe some other ones)
  2. Download & unpack your kernel sources
  3. Run "linux32 make menuconfig" and configure your kernel for your new machine
  4. Clean your build dirs with "make-kpkg clean --cross-compile - --arch=i386" (only needed on subsequent compiles)
  5. Compile your kernel with "nice -n 19 fakeroot linux32 make-kpkg --cross-compile - --arch=i386 --revision=05test kernel_image". For faster compilation on multi-CPU machines, first run "export CONCURRENCY_LEVEL=$((`cat /proc/cpuinfo |grep "^processor"|wc -l`*2))"
  6. At this point you have a 32bit kernel inside a package labeled for the 64bit arch. To fix this, run "fakeroot deb-reversion -k bash ../linux-image-2.6.35.3_05test_amd64.deb", open the file DEBIAN/control with vim/emacs, change "Architecture: amd64" to "Architecture: i386", and exit the bash process with ctrl+d
  7. That's it, now just transfer the re-generated deb to the destination machine and install it.
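
For repeated builds, steps 4 to 6 can be wrapped in a few shell functions. This is only a sketch under assumptions: DRY_RUN, KERNEL_REVISION and the function names are made up for illustration, and by default the script merely prints the make-kpkg commands instead of running them. The niceness used here is 19, the lowest scheduling priority on Linux.

```shell
#!/bin/bash
# Sketch of steps 4-6. DRY_RUN, KERNEL_REVISION and the function names are
# illustrative assumptions, not part of make-kpkg.

KERNEL_REVISION="${KERNEL_REVISION:-05test}"
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = execute them

# Twice the number of logical CPUs; falls back to one CPU if none are found.
concurrency_level() {
    local cpus
    cpus=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null)
    case "$cpus" in
        ''|0) cpus=1 ;;
    esac
    echo $(( cpus * 2 ))
}

# Print a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 4 and 5: clean the tree, then build the i386 kernel package.
build_kernel() {
    export CONCURRENCY_LEVEL="$(concurrency_level)"
    run make-kpkg clean --cross-compile - --arch=i386
    run nice -n 19 fakeroot linux32 make-kpkg --cross-compile - --arch=i386 \
        --revision="$KERNEL_REVISION" kernel_image
}

# Step 6: the edit made inside the deb-reversion shell.
# $1 is the path to the unpacked DEBIAN/control file.
fix_arch() {
    sed -i 's/^Architecture: amd64$/Architecture: i386/' "$1"
}
```

Calling build_kernel with DRY_RUN=0 from the top of the kernel source tree would run the real commands; fix_arch DEBIAN/control would be run inside the shell that deb-reversion opens.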

Many if not all of the ideas for this process come from reading email threads; the comments made by Goswin Von Brederlow were particularly helpful. Thanks.

Debian LILUG linux software 2010-08-25 22:09:15
Versionless Distro

Every six months the internet lights up with stories that Canonical & Co have done the unthinkable: they have increased the number following the word Ubuntu. In other words, they have released a new version. This is a well understood way to differentiate releases of software. As the version increases, it is expected that new features are introduced and old bugs are removed (hopefully more are removed than added).

Versioning distributions and releasing the versions separately is a common practice, employed by most if not all distributions out there. Ubuntu has adopted a policy of releasing regularly and quite often. But there is a different approach, which revolves around a concept I call "Versionless": you do not have a hard release but instead let the changes trickle down. In the application world such releases are often called nightly builds. With distributions it is a little bit different.

First of all, it's worth noting that distributions are not like applications. Distributions are collections made up of applications and a kernel. The applications that are included are usually stable releases, so the biggest unpredictability comes from their combination and configuration. As a result, one of the important roles of distro developers is to ensure that the combination of the many applications does not lead to adverse side effects. This is done in several ways; the general method is to mix all the applications in a pot, the so called pre-release, and then test the combination. The testing is done by the whole community, as users often install these pre-releases to see if any quirks show up in their regular use. When the pre-release becomes stable enough it is pushed out the door as a public release.

In an ideal world, after this whole process all the major bugs and issues would have been resolved, and users would go on to re-install/update their distribution installations to the new release -- without any issues. The problem is that even if the tests passed with flying colors, it does not mean that the user will not experience problems. The more complicated a user's configuration, the more likely they are to notice bugs as part of an upgrade. This is particularly evident where there are multiple interacting systems. After a full upgrade of a distribution it is not always obvious which change in the update caused a given problem.

Versionless distributions are nothing new; they have been a staple of Debian for a while. In fact this is the process for testing package compatibility between releases, but it is also a lot more. There are two Debian releases that follow this paradigm, Debian Testing and Debian Unstable. As applications are packaged they are added to Debian Unstable, and after they fulfill certain criteria, i.e. they have spent some time in Unstable and have not had any critical bugs filed against them, they are passed along to Debian Testing. Users are able to balance their needs between new features and stability by selecting the corresponding repository. As soon as packages are added to the repositories they become available to users for install/upgrade.

What it really comes down to is that testing outside your environment is useful, but it cannot be relied upon alone. And when upgrades are performed it is important to know what has changed and how to undo it. Keeping track of the changes in 1000's of updates is nearly impossible. So update small and update often; use Debian. Good package managers are your best friend, second only to great package developers!

Debian LILUG linux software 2010-05-14 19:03:54
Monitor hot plugging. Linux & KDE

Apparently Linux does not have any monitor hotplugging support, which is quite a pain. Every time you want to attach a monitor to a laptop you have to reconfigure the display layout. This is a tad frustrating if you have to do it several times a day. And it doesn't help that the KDE subsystems are a bit flaky when it comes to changing display configuration. I've had plasma crash on me about one time in three while performing this operation.

Long story short, I got fed up with all of this and wrote the following short script to automate the process and partially alleviate this headache

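#!/bin/bash
# enable both the laptop panel and the external monitor
xrandr --output LVDS1 --auto --output VGA1 --auto
sleep 1
# stop plasma while the new layout settles
kquitapp plasma-desktop &> /dev/null
sleep 1
# restart kwin in the background with its output silenced
kwin --replace &> /dev/null &
sleep 1
kstart plasma-desktop &> /dev/null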

You probably need to adjust the xrandr line to make it behave the way you want, but auto everything works quite well for me. Check the man page for xrandr for details.

For further reading on monitor hot plugging I encourage you to read launchpad bug #306735. Fortunately there are solutions for this problem, however they are on the other side of the pond.
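
As one possible adjustment, the xrandr arguments can be chosen depending on whether the external monitor is actually plugged in. The helper below is a hypothetical sketch: layout_args is a made-up name, the output names LVDS1 and VGA1 are taken from the script above, and placing the monitor to the right of the panel is just one layout among many.

```shell
#!/bin/bash
# Choose xrandr arguments from the text printed by `xrandr --query`.
# layout_args is a hypothetical helper; adjust output names to your hardware.
layout_args() {
    if printf '%s\n' "$1" | grep -q '^VGA1 connected'; then
        # External monitor present: extend the desktop to the right of the panel.
        echo "--output LVDS1 --auto --output VGA1 --auto --right-of LVDS1"
    else
        # Nothing plugged in: make sure the VGA output is switched off.
        echo "--output LVDS1 --auto --output VGA1 --off"
    fi
}

# Usage (needs a running X session):
#   xrandr $(layout_args "$(xrandr --query)")
```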

Update: Added the kwin replace line to fix sporadic malfunction of kwin (disappearance of window decorations) during this whole operation.

code debian kde LILUG linux software 2010-04-10 16:58:58
Flash -- still crawling,

Flash is an absolute resource drain; you've probably noticed how it completely hogs resources when you watch a video. Ever wonder how much more power is consumed watching flash than watching a regular video file? Is it a significant number? For those too lazy to read the rest, the simple answer is yes. And now to the details.

Recently I was watching a hulu video on a 1080P monitor and noticed it was a little choppy. I decided to conduct an experiment and actually measure the difference in resource and power utilization between flash and h.264 (in mplayer). Not having the desire to make a video and encode it for both flash and h.264, I chose a trailer which was sufficiently long and widely available in multiple formats: Tron Legacy, conveniently available through youtube and The Pirate Bay in 1080P. Excellent.

In a more or less idle state my laptop draws around 1500mA of current (according to ACPI), CPU utilization is around 3%, and the clock averaged over both cores is somewhere around 1.5GHz (1GHz min, 2GHz max, 0.25GHz step size, using the on-demand CPU frequency governor). Firing up the video through youtube in windowed mode (which scales the video to around 800 pixels wide), CPU utilization jumps to around 85% and current draw to around 4400mA, with the clock kept at 2GHz on both cores. Setting the movie to full screen (1920 pixels wide) decreases CPU usage to 70% and current draw to 3500mA. This might sound counter intuitive, but it makes perfect sense: at 1920 wide the video is in its native resolution and does not need to be scaled (this actually demonstrates that Flash does not make good use of hardware scaling, AKA Xv). Viewing the same 1080p trailer in mplayer reduces both CPU load and current draw. The size of the video window does not matter much; scaling it to about 800 pixels or viewing it at the native 1920 pixels wide results in the same numbers, thanks to mplayer's Xv support. CPU utilization is around 40%, the CPU quite frequently clocks down to reduce power consumption, and current draw is around 3000mA.

So what does all of this mean? Assuming the voltage at the built-in ACPI ammeter equals the battery voltage (11.1V), the difference in power consumption between playing a video in flash and in mplayer with h.264 is about the same as a medium strength CFL light bulb (roughly 1.4A*11.1V≈15 watts, going by the windowed flash and mplayer readings). Now, this experiment is completely unscientific and has many flaws, chief among them perhaps that I use the 64 bit Linux flash player (10,0,42,34). The vast majority of flash users are obviously on Windows, and it's possible that flash runs better there, but I wouldn't bet money on it.
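
The conversion from current to power can be spelled out with a small helper. This is a sketch, not a measurement tool: power_diff_watts is a made-up name, and the 1400mA gap in the usage line is an assumption read off the readings (about 4400mA for windowed flash against about 3000mA for mplayer).

```shell
#!/bin/bash
# power_diff_watts CURRENT_GAP_MA VOLTS
# Converts a difference in current draw (in mA) at a given voltage into watts.
power_diff_watts() {
    awk -v ma="$1" -v v="$2" 'BEGIN { printf "%.1f\n", ma / 1000 * v }'
}

# Usage:
#   power_diff_watts 1400 11.1    # prints 15.5
```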

It makes me wonder: if google is supposedly so concerned about being green, maybe they should think about switching the default video format for youtube. We can do some interesting estimations. Let's assume that the average youtube user watches 10 minutes of content per week in the default flash format. That means they consume about 0.13 kilowatt hours per year more than they would with other formats (10 minutes * 15 watts / 60 minutes in an hour * 52 weeks in a year / 1000 watt hours in a kilowatt hour). This does not sound like all that much, but assuming that 5% of the world population fits into this category, it adds up to about 40 000 000 kilowatt hours of energy per year that could be saved. What does this number really mean? I invite you to go to the EPA Greenhouse calculator and plug it in. You'll see it's equivalent to the annual emissions of 5500 cars. Again, the numbers are completely unscientific, but even if they are off by a factor of 3 it is still a significant number. It would be nice for someone to conduct a more thorough investigation.

While conducting this experiment I noticed something interesting. Playing the 1080p video in youtube would work fine for the first 1.5 min but then it would get choppy. The trailer was fully downloaded, so it didn't make much sense. Firing up KDE System Monitor, I was able to figure out the problem quite quickly. As the video got choppy the CPU clock would drop while usage remained high; clearly the problem had to be with cooling. System monitor was reporting a CPU temperature of about 100C and current draw of almost 6000mA. It had been a while since I cleaned the inside of my laptop, so I stripped it apart and took out a nice chunk of dust that was between the radiator and the fan. After this the CPU temperature never went above 85C and current draw was at a much more reasonable 4400mA while playing the flash video. Hopefully this will resolve my choppy hulu problem.

The graphs of this experiment are available. For the flash graph, the scaled trailer was played first, followed by full screen. For the mplayer graph the order was reversed, first full screen then scaled, but it doesn't matter much for mplayer.
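
The back-of-the-envelope estimate can be reproduced with two small helpers. This is a sketch under stated assumptions: the function names are made up, 10 minutes of viewing per week and 15 extra watts come from the estimate above, and the world population of 6.8 billion is my own round figure for 2010, not a number from the post.

```shell
#!/bin/bash
# yearly_kwh MINUTES_PER_WEEK EXTRA_WATTS
# Extra energy per user and year, in kilowatt hours.
yearly_kwh() {
    awk -v m="$1" -v w="$2" 'BEGIN { printf "%.2f\n", m / 60 * w * 52 / 1000 }'
}

# total_million_kwh KWH_PER_USER SHARE_OF_POPULATION POPULATION
# Total extra energy per year, in millions of kilowatt hours.
total_million_kwh() {
    awk -v k="$1" -v s="$2" -v p="$3" 'BEGIN { printf "%.1f\n", k * s * p / 1000000 }'
}

# Usage:
#   yearly_kwh 10 15                        # prints 0.13
#   total_million_kwh 0.13 0.05 6800000000  # prints 44.2
```

With those inputs the total lands a little above the 40 million kilowatt hours quoted, which is well within the error bars of the estimate.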

LILUG software WWTS 2010-04-07 22:42:19
En guarde? La ou est le salut?

In reply to Josef "Jeff" Sipek's reply to my post, entitled SMTP -- time to chuck it, from a couple of years ago.

This is a (long overdue) reply to Ilya's post: SMTP -- Time to chuck it.

[...]

There are two apparent problems at the root of the SMTP protocol which allow for easy manipulation: lack of authentication and sender validation, and lack of user interaction. It would not be difficult to design a more flexible protocol which would allow us to enjoy the functionality that we are familiar with, while addressing some, if not all, of the problems within SMTP.

To allow for greater flexibility in the protocol, it would first be broken from a server-server model into a client-server model.

This is the first point I 100% disagree with...

That is, traditionally when one would send mail, it would be sent to a local SMTP server which would then relay the message onto the next server until the email reached its destination. This approach allowed for email caching and delayed-send (when a (receiving) mail server was off-line for hours (or even days) on end, messages could still trickle through as the sending server would try to periodically resend the messages.) Today's mail servers have very high up times and many are redundant so caching email for delayed delivery is not very important.

"Delayed delivery is not very important"?! What? What happened to the whole "better late than never" idiom?

It is not just about uptime of the server. There are other variables one must consider when thinking about the whole system of delivering email. Here's a short list; I'm sure I'm forgetting something:

  • server uptime
  • server reliability
  • network connection (all the routers between the server and the "source") uptime
  • network connection reliability

It does little to no good if the network connection is flakey. Ilya is arguing that that's rarely the case, and while I must agree that it isn't as bad as it used to be back in the 80's, I also know from experience that networks are very fragile and it doesn't take much to break them.

A couple of times over the past few years, I noticed that my ISP's routing tables got screwed up. Within two hours of such a screwup, things returned to normal, but that's 2 hours of "downtime."

Another instance of a network going haywire: one day, at Stony Brook University, the internet connection stopped working. Apparently, a compromised machine on the university campus caused a campus edge device to become overwhelmed. This eventually led to a complete failure of the device. It took almost a day until the compromised machine got disconnected, the failed device reset, and the backlog of all the traffic on both sides of the router settled down.

Failures happen. Network failures happen frequently. More frequently than I would like them to, more frequently than the network admins would like them to. Failures happen near the user, far away from the user. One can hope that dynamic routing tables keep the internet as a whole functioning, but even those can fail. Want an example? Sure. Not that long ago, the well known video repository YouTube disappeared off the face of the Earth...well, to some degree. As this RIPE NCC RIS case study shows, on February 24, 2008, Pakistan Telecom decided to announce BGP routes for YouTube's IP range. The result was that if you tried to access any of YouTube's servers on the 208.65.152.0/22 subnet, your packets were directed to Pakistan. For about an hour and twenty minutes that was the case. Then YouTube started announcing more granular subnets, diverting some of the traffic back to itself. Eleven minutes later, YouTube announced even more granular subnets, diverting the bulk of the traffic back to itself. A few dozen minutes later, PCCW Global (Pakistan Telecom's provider, responsible for forwarding the "offending" BGP announcements to the rest of the world) stopped forwarding the incorrect routing information.

So, networks are fragile, which is why having an email transfer protocol that allows for retransmission is a good idea.

Pas touche! I have not conducted extensive surveys of mail server configurations but, from personal experience, most mail servers give up on sending email a lot sooner than recommended. RFC 2821 calls for a 4-5 day period. This is a reflection of the times: email is expected to deliver messages almost instantaneously (Just ask Ted Stevens!).

As you are well aware, I am not implying that networks are anywhere near perfect; it just does not matter. If you send a message and it does not get delivered immediately, your mail client would be able to tell you so. This allows you to react: had the message been urgent, you can use other forms of communication to try to get it through (phone </gasp>). The client can also queue the message (assuming no CAPTCHA system, more on that later) and try to deliver it later. Granted, machines which run clients have significantly shorter uptimes than servers, but is it really that big of a deal, especially now that servers give up on delivery just a few hours after the first attempt?

I, for one, am looking forward to the day when I no longer have to ask my potential recipient whether or not they have received my message.

Instead, having direct communication between the sender-client and the receiver-server has many advantages: opens up the possibility for CAPTCHA systems, makes the send-portion of the protocol easier to upgrade, and allows for new functionality in the protocol.

Wow. So much to disagree with!

  1. CAPTCHA doesn't work
  2. What about mailing lists? How does the mailing list server answer the CAPTCHAs?
  3. How does eliminating server-to-server communication make the protocol easier to upgrade?
  4. New functionality is a nice thing in theory, but what do you want from your mail transfer protocol? I, personally, want it to transfer my email between where I send it from and where it is supposed to be delivered to.
  5. If anything eliminating the server-to-server communication would cause the MUAs to be "in charge" of the protocols. This means that at first there would be many competing protocols, until one takes over - not necessarily the better one (Betamax vs. VHS comes to mind).
  6. What happens in the case of overzealous firewall admins? What if I really want to send email to bob@example.com, but the firewall (for whatever reason) is blocking all traffic to example.com?

  1. Touche! I have to admit CAPTCHAs are a bit ridiculous in this application.
  2. See above
  3. By creating more work for admins. It allows users to more directly complain to the admins that the new protocol feature does not work. Yes I know admins want less work and fewer complaining users, but there are benefits. It really comes down to the fact that with more interactivity it is easier to react to changes; servers do not have brains but the people behind their clients do.
  4. Hopefully that will still happen.
  5. Well, the worse protocol is already winning: SMTP. dMTP (dot Mail Transfer Protocol) is so much better, even if it is quite vague. MUAs will not be in charge; if they do not play ball then mail will not be delivered.
  6. Now you are just getting ahead of yourself. Stop making up problems. The solution to overzealous admins is their removal.

[...]

And so this brings us to the next point, authentication: how do you know that the email actually did originate from the sender? This is one of the largest problems with SMTP, as it is so easy to fake one's outgoing email address. The white list has to rely on a verifiable and consistent flag in the email. A sample implementation of such a control could work similarly to the current hack to the email system, SPF, in which a special entry is made in the DNS record which says where the mail can originate from. While this approach is quite effective in a server-server architecture it would not work in a client-server architecture. Part of the protocol could require the sending client to send a cryptographic hash of the email to his own receiving mail server, so that the receiving party's mail server could verify the authenticity of the source of the email. In essence this creates a 3 way handshake between the sender's client, the sender's (receiving) mail server and the receiver's mail server.

I tend to stay away from making custom authentication protocols.

In this scheme, what guarantees you that the client and his "home server" aren't both trying to convince the receiving server that the email is really from whom they say it is? In Kerberos, you have a key for each system, and a password for each user. The Kerberos server knows it all, and this central authority is why things work. With SSL certificates, you rely on the strength of the crypto used, as well as blind faith in the certificate authority.

They might; the point is not so much to authenticate the user as to link him to a server. If the server he is linked to is dirty, well, you can blacklist it. Much of the spam today is sent from botnets, and in this scheme all the individual botnet senders would have to link themselves to a server. Obviously, a clever spammer would run a server on each of the zombie machines to auth for itself. The catch is that he would have to ensure that the firewalls/NATs are open and that there is a (sub-)domain pointing back at the server. This is all costly for the spammer, and for the good guy it'll be easy to trace down the dirty domains.
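
The hash handshake discussed above can be illustrated with a toy model. Everything here is hypothetical: the function names are made up, a local directory stands in for the sender's home server, and a real protocol would do the lookup over the network against the sender's domain. The choice of SHA-256 is also my assumption; the proposal only says "cryptographic hash".

```shell
#!/bin/bash
# Toy model of the three-way handshake; all names are hypothetical.
# A directory plays the role of the sender's own (receiving) mail server.

# Digest of the message read from stdin.
message_digest() {
    sha256sum | cut -d' ' -f1
}

# Sender's client: deposit the digest of the outgoing message with the home server.
# $1 is the store directory, the message comes from stdin.
register_message() {
    local store="$1" digest
    digest=$(message_digest)
    mkdir -p "$store"
    : > "$store/$digest"
}

# Receiver's server: accept the message only if the home server knows its digest.
# $1 is the store directory, the message comes from stdin.
verify_message() {
    local store="$1" digest
    digest=$(message_digest)
    [ -e "$store/$digest" ]
}

# Usage:
#   printf 'Hello Bob' | register_message /tmp/home-server
#   printf 'Hello Bob' | verify_message /tmp/home-server && echo accepted
```

A forged or altered message produces a different digest, so the lookup fails and the receiving server can refuse it.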

At first it might seem that this process uses up more bandwidth and increases the delay of sending mail but one has to remember that in usual configuration of sending email using IMAP or POP for mail storage one undergoes a similar process,

Umm...while possible, I believe that very very large majority of email is sent via SMTP (and I'm not even counting all the spam).

Carton jaune, I addressed that issue in my original posting which is just 2 sentences below this one. Excessive lobotomy is not appreciated.

first email is sent for storage (over IMAP or POP) to the senders mail server and then it is sent over SMTP to the senders email for redirection to the receivers mail server. It is even feasible to implement hooks in the IMAP and POP stacks to talk to the mail sending daemon directly eliminating an additional socket connection by the client.

Why would you want to stick with IMAP and POP? They do share certain ideas with SMTP.

Carton rouge, I said nothing about sticking to IMAP/POP. The point is that the system can be streamlined somewhat.

For legitimate mass mail this process would not encumber the sending procedure as for this case the sending server would be located on the same machine as the senders receiving mail server (which would store the hash for authentication), and they could even be streamlined into one monolithic process.

Not necessarily. There are entire businesses that specialize in mailing list maintenance. You pay them, and they give you an account with software that maintains your mailing list. Actually, it's amusing how similar it is to what spammers do. The major difference is that in the legitimate case, the customer supplies their own list of email addresses to mail. Anyway, my point is, in these cases (and they are more common than you think) the mailing sender is on a different computer than the "from" domain's MX record.

I do not think that increasing the burden on mass mailers even good ones is such a bad thing.

[...]

I really can't help but read that as "If we use this magical protocol that will make things better, things will get better!" Sorry, but unless I see some protocol which would be a good candidate, I will remain sceptical.

And I can not help but read this as "We should not think about improving protocols because it is impossible to do better." In any case I appreciate your mal-parè. The discussion is important, as letting protocols rot is not a good idea.

[...]
LILUG news software WWTS 2009-04-22 10:47:24
Eric S. Raymond speaks heresy.

Recently my local LUG (LILUG) invited Eric S. Raymond (ESR) to come and speak. For those of you who are not familiar with ESR, he is one of the three largest icons of the Open Source/Free Software movement. Needless to say, it was an honor to see him speak. For the most part his talk was quite tame, but one of the points he raised seemed quite controversial. According to him, the GPL and other viral licenses are no longer needed as they do more harm than good to the community. I don't want to put words into his mouth, so I've transcribed what he said during the talk. You can view the ESR Q/A talk in its entirety; this specific excerpt is about 45 minutes into the video.

What is the point of being famous and respected if you can't speak heresy about your own movement. What is the point?

One of my heretical opinions is that we worry way too much about licensing. And in particular; I don't think we really need reciprocal licensing. I don't think we need licenses like the GPL, that punish people for taking code closed-source. Let me explain what I think. And then I'll explain [why] the fact we don't actually need those [licenses] matters.

I don't think we need them because. There has been a fair amount of economic analysis done in the last 10 years, significant amount of it has been done by, well, me. Which seems to demonstrate that open source is what the economists call a more efficient mode of production use, superior mode of production. You get better investment, better return out of the resources you invested by doing open source development than closed source development. In particular, there have been a number of occasions on which people have taken open source products that were reasonably successful, and just taken them closed. Effectively putting them under proprietary control, proprietary licensing and then tried to make a business model out of that. They generally fail. And the reason they fail is pretty simple. That is because when you take a product closed, you are now limited to whatever small number of developers that your corporation can afford to hire. The open source community that you just turned your back on does not, they have more people than you. They can put out releases more frequently, getting more user feedback. So the suggestion is, simply because of the numerical dynamics of the process: taking open software closed is something that the market is going to punish. You are going to lose. The inefficiencies inherent in closed source development are eventually going to ambush you, going to [inaudible] you, and you are not going to have a business model or product anymore. We've seen this happen a number of times.

But now, let's look at the implications of taking this seriously. The question I found myself asking is: if the market punished people for taking open source closed, then why do our licenses need to punish people for taking open source closed? That is why I don't think you really need GPL or reciprocal licenses anymore. It is attempting to prevent the behavior that the market punishes anyway. That attempt has a downside, the downside is that people, especially lawyers, especially corporate bosses look at the GPL and experience fear. Fear that all of their corporate secrets, business knowledge, and special sauce will suddenly be everted to the outside world by some inadvertent slip by some internal code. I think that fear is now costing us more than the threat of [inaudible]. And that is why I don't think we need the GPL anymore.

-- Eric S. Raymond

Eric then went on to say that the BSD license is a good alternative to the GPL. This has sparked a heated discussion on the mailing list of the Free Software Round Table (FSRT) radio show. While one can admire the simplicity and clarity of the BSD license, it seems far fetched to say that it should replace the GPL. Yes, there are economic incentives for corporations to keep code Open Source, but the relative cost of closing the source depends largely on the size of the company. Some small companies will not be able to afford to keep a code base alive with internal/contracted developers, but for larger companies the costs are a lot easier to digest.

A prime example of such a large company is Apple. In 2001 Apple came out with a completely new version of its operating system, Mac OS X. Although a successor to Mac OS 9, it was very different. OS X borrowed a very large code base from the BSDs, and some of that code (pretty much everything but Darwin) was effectively closed. This has not prevented Apple or OS X from thriving.

At the other end of the spectrum are companies such as MySQL AB, which produce Free Software but also sell closed source licenses of the same code for a 'living.' There is a market for this; it exists because of those scared lawyers and corporate bosses. Killing the GPL would effectively kill this market, and as a result development on some of these projects would slow down significantly.

The Open Source/Free Software movement is thriving, but that does not mean it's a good time to kill the GPL. In fact I don't think there will ever be a time when killing the GPL will do more good than harm.

lilug news software 2009-03-23 11:30:14
Mplayer: Subtitles & black bars

If you have ever watched a wide-screen foreign film with subtitles you might have noticed that the subtitles are usually put inside the picture. I find this extremely annoying as it makes the subtitles harder to read. It doesn't make much sense, if you already have black bars from the aspect ratio adjustment why not use them for subtitles? Fortunately if you use mplayer you can. Just add the following to your personal mplayer config file ~/.mplayer/config or the global /etc/mplayer/mplayer.conf

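# render subtitles with the SSA/ASS renderer
ass=1
# scale the subtitle font up
ass-font-scale=1.5
# place subtitles in the black borders when they are available
ass-use-margins=1
# pad the picture with black bars up to the screen's aspect ratio
vf=expand=:::::8/5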

You need to adjust the last line to the aspect ratio of your screen. As a side effect all videos (even in windowed mode) will have black bars added to pad them out to that aspect ratio; it's a small price to pay.
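
If you are unsure what ratio to put in that last line, it can be derived from the screen resolution by reducing width over height. A minimal sketch; screen_aspect is a made-up helper name and the resolutions in the usage lines are only examples.

```shell
#!/bin/bash
# screen_aspect WIDTH HEIGHT
# Prints the reduced aspect ratio, e.g. 8/5, suitable for vf=expand=:::::RATIO
screen_aspect() {
    local a="$1" b="$2" t
    # Euclid's algorithm: afterwards a holds the greatest common divisor.
    while [ "$b" -ne 0 ]; do
        t=$(( a % b ))
        a="$b"
        b="$t"
    done
    echo "$(( $1 / a ))/$(( $2 / a ))"
}

# Usage:
#   screen_aspect 1280 800     # prints 8/5
#   screen_aspect 1920 1080    # prints 16/9
```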

LILUG Software 2009-03-13 00:36:17
Tricky Tricky
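while(c1){
	switch(c2){
		case 1:
			aOk();
			continue;	/* goes to the next while iteration, not just out of the switch */
		case 2:
			liveToSeeAnotherDay();
			continue;
		case 3:
			oopsyDaisy();
			break;		/* leaves only the switch */
	}
	break;			/* reached via case 3 or an unmatched c2; ends the loop */
}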
Code Lilug Software 2009-02-03 12:41:47
Photo on Linux

I've gotten into a photo mood lately: shooting, editing and organizing. With it I've discovered some new useful tools, as well as gotten to know the ones I've used before better.

First and foremost is digiKam, a photo manager. Its primary job is to maintain a database of all your photos. Photo managers are not something most people have used, so it might take some getting used to. The interface for digiKam is quite intuitive and easy to pick up, and for the average photo junkie it will have everything they need. But it certainly lacks some features which I think a self respecting photo manager must have. Here are some things I wish it had:

  • Way to flag a photo as a different version of another. (They should share all meta-data such as tags and description)
  • Split into a backend/frontend for centralized photo management. (KDE4 version supports multiple database roots so this can be used as a workaround)
  • Multi user support. If your whole family goes on a trip being able to collaborate on an album is essential
  • Export/import album with all meta-data (so one can share a whole album with someone else)
  • Save export options of raw images along with raw image.
  • HTML album generator needs to include meta-data (description, tags etc..)
  • Better gallery2 integration
    • better support for raw images (does not scale raw on upload).
    • Automatically fill out gallery title and description using local info.
    • Ability to preview pictures on select.
    • Better error messages.

Most of these issues are not major, especially since some of them will be solved with the multi-root support of the KDE4 release. I started with the negatives, but digiKam has a lot of cool features too. One of my favorites is the calendar view. Regardless of how your galleries are organized, it will use the EXIF date tag to arrange all your photos by date. It really helps when organizing photos. Tagging is also very useful: you can tag any photo and then view all photos with a particular tag, which makes it really easy to organize data. DigiKam also has a slew of semi-functional export features such as gallery2, flickr, and picasa. These are provided through the kipi framework; they are nice, but most require some more work to become completely feature-full and user friendly.

Almost forgot: digiKam is also an excellent tool for downloading photos from cameras. Most cameras are not plain UMS devices, so they need special software to fetch the pictures out of them. If you are on Windows you can usually use the manufacturer's software to do this, but on Linux this is a tad complicated. Unless of course you use digiKam -- which turns the process into a magic "detect [the camera type] and download" single-click operation.

To share my photos with the world I use a web-based photo manager as a front-end to my local database. It's called gallery. I have tried this tool in the past and it was just too cumbersome to use (I ended up writing my own PHP gallery system). But with the kipi export plug-in for digiKam and the remote plug-in for gallery, life just became easy.

The last few tools are only important for someone who is seriously into photography. The first is a gimp plug-in called ufraw; it's basically a frontend to dcraw. It allows you to perform advanced raw editing before you import your photo into gimp -- you can adjust almost any aspect of your raw file conversion (lightness, white balance, hue, saturation...). UFRaw is a bit daunting, but you don't always have to use all the features it provides; lightness is probably the only one you'll have to adjust on a regular basis. Another tool is called exiftool; it's used to read and manipulate EXIF information in pictures. There are times when you can lose the EXIF data while editing a photo (e.g. when saving to PNG in gimp), and using this tool you can quickly clone the EXIF info of one file onto another using the -TagsFromFile option. It even supports batch mode: for example, "exiftool -TagsFromFile IMG_%4.4f.CR2 *.png" will copy the EXIF information to each PNG from its parent file, using the file name as the mapping (sample file names: IMG_2565.png IMG_2573_1.png IMG_2565.CR2 IMG_2573.CR2).

So that's it for now, shoot away. And if you like, you can check out my public gallery.

LILUG News Software 2008-06-17 23:19:55
CenterIM; History format

My instant message client of choice is centerim (a fork of centericq). It does everything I need: it sends and receives messages in a very simple interface. Now this might sound like any ordinary client out there, but it's special in that it runs completely in the terminal (ncurses based) -- and it's good at it. I've tried some other terminal-based clients and they all feel very cumbersome.

One major inconvenience with ncurses applications is the lack of clearly defined text areas, so copying text out is not trivial; in fact it's nearly impossible. Usually, if I need to get text out of the application, I just look in its log files. Unfortunately centerim has a not-so-convenient history log format. It looks something like this:

IN
MSG
1212455295
1212455295
ping

OUT
MSG
1212455668
1212455668
pong

(each message entry is separated by "\f\n" not just "\n")

So using a little awk magic I wrote a simple converter which parses the history file into something more readable, something you can paste as a quote.

gawk -vto=anon -vfrom=me 'BEGIN {FS="\n";RS="\f\n";}{if (match($1,"IN")) a=to; else a=from; printf("%s %s:\t %s\n", strftime("%H:%M:%S", $4), a, $5);for (i=6; i<=NF;i++) printf("\t\t%s\n", $i);}' /PATH/TO/HISTORY/FILE

You need to modify the -vto and -vfrom values to correspond to your name and the name of the person you're talking to. You obviously need to also specify the path to the file. If you don't like the time stamp you can alter the string passed to strftime (man 3 strftime for format options).

The output for the sample above looks like this:

21:08:15 anon:   ping
21:14:28 me:     pong
LILUG News Software 2008-06-04 14:25:12
Little Color in Your Bash Prompt

I have accounts on many computer systems (around 10) which together add up to several hundred machines, and I often find myself having ssh sessions open to multiple machines, doing different things simultaneously. More than once I have executed something on the wrong machine. Most of the time it's not a problem, but every now and then I'll manage to reboot the wrong machine or do something else equally bad. It's really an easy mistake to make, especially when you have half a dozen shell tabs open and screen running in many of the tabs.

I had spent some time pondering a good solution to this problem. I already had bash configured to show the machine name as part of the prompt (e.g. dotCOMmie@laptop:~$), but it was not enough; it's easy to overlook the name or even the path. So one great day I got the idea to color my prompt differently on each of my machines using ANSI color escape codes. This worked quite well: with a single glance at the prompt I had an intuitive feel for what machine I was typing on -- even without paying attention to the hostname in the prompt. But this solution was not perfect, as I would have to manually pick a new color for each machine.

For the next iteration of the colored prompt I decided to write a simple program which would take a string (the hostname) as an argument, hash it down into a small number, and map it to a color. I called this little app t2cc (text to color code); you can download t2cc from the project page. The source doesn't need any external libraries, so you can just compile it with gcc or use my pre-compiled 32bit and 64bit binaries. Consider the code public domain.

To use t2cc just drop it into ~/.bash and edit your ~/.bashrc to set the prompt as follows:

PS1="\[\e[`~/.bash/t2cc $HOSTNAME`m\]\u@\h\[\e[0m\]:\[\e[`~/.bash/t2cc $HOSTNAME -2`m\]\w\[\e[0m\]\$ "

And if you use the same .bashrc for both 32 and 64 bit architectures, you can download t2cc_32 and t2cc_64 to your ~/.bash and put the following into your ~/.bashrc:

if [ "$(uname -m)" = "x86_64" ]; then
        t2cc=~/.bash/t2cc_64
else
        t2cc=~/.bash/t2cc_32
fi
PS1="\[\e[`$t2cc $HOSTNAME`m\]\u@\h\[\e[0m\]:\[\e[`$t2cc $HOSTNAME -2`m\]\w\[\e[0m\]\$ "

As you can see from the examples above, I actually use 2 hashes of the hostname: a forward hash for the hostname and a backward hash for the path (the -2 flag). This enables more possible color combinations. T2cc is designed to skip colors which don't show up well on dark backgrounds (or, with -b, on bright backgrounds); this ensures that the prompt is always readable.

Initially I wanted to write this all in bash, but I couldn't quite figure out how to convert ASCII characters to numbers. If you know how to do this in pure bash please let me know.
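For what it's worth, one approach that seems to work without any external program: the shell's printf builtin treats a numeric argument with a leading single quote as "the value of the character that follows". The sketch below uses that to hash a string into one of a handful of ANSI foreground color codes. The function names (ord, host_color), the 10-color palette, and the simple additive hash are all made up for illustration; this is not how t2cc itself picks colors.

```bash
#!/bin/bash

# ord CHAR: print the numeric (ASCII) value of a single character.
# A leading quote in a printf numeric argument means
# "use the value of the next character".
ord() {
    printf '%d' "'$1"
}

# host_color NAME: map a string to an ANSI foreground color code.
# Hypothetical additive hash and palette, for illustration only.
# The body runs in a subshell ( ... ) so its variables stay private.
host_color() (
    rest=$1
    sum=0
    while [ -n "$rest" ]; do
        first=${rest%"${rest#?}"}    # first character of what is left
        rest=${rest#?}               # drop that character
        sum=$(( sum + $(ord "$first") ))
    done
    # Load the palette into the positional parameters and pick one entry.
    set -- 31 32 33 35 36 91 92 93 95 96
    shift $(( sum % $# ))
    printf '%s' "$1"
)
```

With these two functions sourced from ~/.bashrc, a prompt along the lines of PS1="\[\e[$(host_color "$HOSTNAME")m\]\u@\h\[\e[0m\]:\w\$ " should give each host its own color. A real replacement would still need something like t2cc's filtering of colors that are unreadable on a given background.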

So you might be wondering what all of this looks like?
 dotCOMmie@laptop:~/.bash

LILUG News Software 2008-04-16 23:06:36
dnsmasq -- buy 1 get 2 free!

I mentioned earlier that we netboot (PXE) our cluster. Before NFS-root begins, some things have to take place: the kernel needs to be served, an IP assigned, DNS look-ups made to figure out where the servers are, and so on. Primarily 3 protocols are in the mix at this point: TFTP, DHCP, and DNS. We used to run 3 individual applications to handle all of this, and they're all quite fine applications in their own right: atftpd, Bind9, and DHCP (from ISC). But it just becomes too much to look after; you have a config file for each of the daemons as well as databases with node information. Our configuration used MySQL and PHP to generate all the databases for these daemons, so that you would only have to maintain one central configuration. That means you need to look after yet another daemon to make it all work. Add all of this together and it becomes one major headache.

Several months ago I installed openWRT onto a router at home. While configuring openWRT I came across something called dnsmasq. By default, on openWRT, dnsmasq handles DNS and DHCP. I thought it was spiffy to merge the 2 services; after all, they are so often run together (on internal networks). The name stuck in my head as something to pay a bit more attention to. Somewhere along the line I got some more experience with dnsmasq and discovered it also had TFTP support. Could it be that what we used 4 daemons for could be accomplished with just one?

So when the opportunity arose I dumped all node address information out of the MySQL database into a simple awk-parsable flat file. I wrote a short parsing script which takes the central database and spits out a file dnsmasq.hosts (with name/IP pairs) and another, dnsmasq.nodes (with MAC-address/name pairs). Finally I configured the master (static) dnsmasq.conf file to start all the services I needed (DNS, DHCP, TFTP) and to include the dnsmasq.hosts and dnsmasq.nodes files. Since dnsmasq.nodes includes a category flag, it is trivial to tell which group of nodes should use what TFTP images and what kind of DHCP leases they should be served.

Dnsmasq couldn't offer a simpler or more intuitive configuration. With half a day's work I was able to greatly improve upon our old system and make it a lot more manageable. There is only one gripe I have with dnsmasq: I wish it were possible to have just one configuration line per node, that is, to have the name, IP, and MAC address all on one line. If this were the case I wouldn't even need an awk script to make the config files (although it turned out to be handy because I also use the same file to generate a nodes list for torque). But it's understandable, since there are instances where you only want to run a DHCP server or just a DNS server, and so having DHCP and DNS information on one line wouldn't make much sense.

Scalability is something to consider with dnsmasq. Their website claims that it has been tested with installations of up to 1000 nodes, which might or might not be a problem depending on what type of configuration you're building. I kind of wonder what happens at the 1000s-of-machines level: how will its performance degrade, and how does that compare to, say, the stand-alone DNS/DHCP/TFTP servers (BIND9 is known to work quite well with a lot of data)?

Here are some configuration examples:

Master Flat file node database

#NODES file it needs to be processed by nodesFileGen
#nodeType nodeIndex nic# MACAddr

nfsServer 01 1
nfsServer 02 1

headNode 00 1 00:00:00:00:00:00

#Servers based on the supermicro p2400 hardware (white 1u supermicro box)
server_sm2400 miscServ 1 00:00:00:00:00:00
server_sm2400 miscServ 2 00:00:00:00:00:00

#dual 2.4ghz supermicro nodes
node2ghz 01 1 00:00:00:00:00:00
node2ghz 02 1 00:00:00:00:00:00
node2ghz 03 1 00:00:00:00:00:00
...[snip]...

#dual 3.4ghz dell nodes
node3ghz 01 1 00:00:00:00:00:00
node3ghz 02 1 00:00:00:00:00:00
node3ghz 03 1 00:00:00:00:00:00
...[snip]...

Flat File DB Parser script

#!/bin/bash

#input sample
#type number nic# mac addr
#nodeName 07 1 00:00:00:00:00:00

#output sample
#ip hostname
#10.0.103.10 nodeName10
awk '
	/^headNode.*/ {printf("10.0.0.3 %s\n", $1)};				\
	/^server_sm2400.*/ {printf("10.0.3.%d %s\n", $3, $2)};			\
	/^nfsServer.*/ {printf("10.0.1.%d %s%02d\n", $2, $1, $2)};		\
	/^node2ghz.*/ {printf("10.0.100.%d %s%02d\n", $2, $1, $2)};		\
	/^node3ghz.*/ {printf("10.0.101.%d %s%02d\n", $2, $1, $2)};		\
	'									\
	~/data/nodes.db > /etc/dnsmasq.hosts

#output sample
#mac,netType,hostname,hostname
#00:00:00:00:00:00,net:nodeName,nodeName10,nodeName10
awk '
	/^headNode.*/ {printf("%s,net:%s,%s,%s\n", $4, $1, $1, $1)};			\
	/^server_sm2400.*/ {printf("%s,net:%s,%s,%s\n", $4, $1, $2, $2)};		\
	/^node2ghz.*/ {printf("%s,net:%s,%s%02d,%s%02d\n", $4, $1, $1, $2, $1, $2)};	\
	/^node3ghz.*/ {printf("%s,net:%s,%s%02d,%s%02d\n", $4, $1, $1, $2, $1, $2)};	\
	'										\
	~/data/nodes.db > /etc/dnsmasq.nodes

#output sample
#hostname np=$CPUS type
#nodeName10 np=8 nodeName
awk '
	/^node2ghz.*/ {printf("%s%02d np=2 node2ghz\n", $1, $2)};		\
	/^node3ghz.*/ {printf("%s%02d np=2 node3ghz\n", $1, $2)};		\
	'									\
	~/data/nodes.db > /var/spool/torque/server_priv/nodes

#Let's reload dnsmasq now
killall -HUP dnsmasq

dnsmasq.conf

interface=eth0
dhcp-lease-max=500
domain=myCluster
enable-tftp
tftp-root=/srv/tftp
dhcp-option=3,10.0.0.1
addn-hosts=/etc/dnsmasq.hosts
dhcp-hostsfile=/etc/dnsmasq.nodes

dhcp-boot=net:misc,misc/pxelinux.0,nodeServer,10.0.0.2
dhcp-range=net:misc,10.0.200.0,10.0.200.255,12h

dhcp-boot=net:headNode,headNode/pxelinux.0,nodeServer,10.0.0.2
dhcp-range=net:headNode,10.0.0.3,10.0.0.3,12h

dhcp-boot=net:server_sm2400,server_sm2400/pxelinux.0,nodeServer,10.0.0.2
dhcp-range=net:server_sm2400,10.0.0.3,10.0.0.3,12h

dhcp-boot=net:node2ghz,node2ghz.cfg,nodeServer,10.0.0.2
dhcp-range=net:node2ghz,10.0.100.0,10.0.100.255,12h

dhcp-boot=net:node3ghz,node3ghz.cfg,nodeServer,10.0.0.2
dhcp-range=net:node3ghz,10.0.101.0,10.0.101.255,12h

Debian LILUG News Software Super Computers 2008-03-13 00:30:40
MOTD

You all probably know that the most important thing on any multi-user system is a pretty MOTD. Between some other things in the past couple of weeks I decided to refresh the MOTDs for the Galaxy and Seawulf clusters. I discovered 2 awesome applications while compiling the MOTDs.

First is jp2a; it takes a JPG and converts it to ASCII, and it even supports color. I used this to render the Milky Way as part of the Galaxy MOTD. While this tool is handy, it needs some assistance: you should clean up and simplify the JPGs before you try to convert them.

The second tool is a must for any form of ASCII-art editing. It's called aewan (ace editor without a name). It makes editing a lot easier: it supports coloring, multiple layers, cut/paste/move, and more. Unfortunately it uses a weird format and does not have an import feature, so it's a PITA to import an already existing ASCII snippet -- cut and paste does work but it loses some information, like color.

Aewan comes with a sister tool called aecat which 'cats' the native aewan format into either text (ANSI ASCII) or HTML. Below is some of my handiwork. Because getting browsers to render text is a PITA, I decided to post the artwork as images.
Galaxy MOTD: [image: galaxy motd]
Seawulf MOTD: [image: seawulf motd]
I also wrote a short cronjob which changes the MOTD every 5 min to reflect how many nodes are queued/free/down.
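The cronjob itself isn't shown here, so the following is only a rough sketch of how such a job could count nodes. It assumes Torque's "pbsnodes -a" prints one "state = ..." line per node, and it treats "job-exclusive" as busy; the function name summarize_nodes, the file names, and the cron schedule in the comments are all hypothetical.

```bash
#!/bin/bash

# summarize_nodes: read node listings on stdin and print a one-line summary.
# Assumes lines of the form "     state = free" (as in `pbsnodes -a` output).
# For combined states such as "down,offline" only the first one is counted.
summarize_nodes() {
    awk '
        $1 == "state" && $2 == "=" { split($3, s, ","); count[s[1]]++ }
        END {
            printf("Nodes: %d free, %d busy, %d down\n",
                   count["free"], count["job-exclusive"], count["down"])
        }'
}

# A script run from cron could then rebuild the MOTD from the static art
# plus the live summary, for example (hypothetical paths):
#   { cat /etc/motd.art; pbsnodes -a | summarize_nodes; } > /etc/motd
# with an /etc/cron.d entry such as:
#   */5 * * * * root /usr/local/sbin/motd-update
```

Keeping the ASCII art in a separate static file means the cronjob only ever appends the status line and never has to touch the artwork.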

One more resource I forgot to mention is the ascii generator. You give it a text string and it returns it as a fancy-looking logo.

Finally, when making any MOTD, try to stick to a max width of 80 and a height of 24. This way your artwork won't be chopped even on ridiculously small terminals.
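A quick way to check a finished MOTD against those limits might look like the sketch below. The function name motd_fits is made up; it strips ANSI color sequences first so that they don't count toward the width, and it assumes plain single-width characters.

```bash
#!/bin/bash

# motd_fits FILE [MAX_WIDTH] [MAX_HEIGHT]
# Succeeds if no line is wider than MAX_WIDTH (default 80) and the file
# has at most MAX_HEIGHT (default 24) lines.
motd_fits() {
    esc=$(printf '\033')
    # Remove color escape sequences such as ESC[31m before measuring.
    sed "s/${esc}\[[0-9;]*m//g" "$1" |
        awk -v w="${2:-80}" -v h="${3:-24}" '
            length($0) > w { bad = 1 }
            END { exit ((bad || NR > h) ? 1 : 0) }'
}
```

For example, "motd_fits /etc/motd || echo too big" would flag a MOTD that won't fit a standard 80x24 terminal.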

Debian LILUG News Software 2008-03-02 23:41:22
NFS-root

I haven't posted many clustering articles here but I've been doing a lot of work on them recently, building a cluster for SC07 Cluster Challenge as well as rebuilding 2 clusters (Seawulf & Galaxy) from the ground up at Stony Brook University. I'll try to post some more info about this experience as time goes on.

We have about 235 nodes in Seawulf and 150 in Galaxy. To boot all the nodes we use PXE (netboot); this allows for great flexibility and ease of administration -- really it's the only sane way to bootstrap a cluster. Our bootstrapping system used to have a configuration where the machine would do a plain PXE boot and then, using a linuxrc script, the kernel would download a compressed system image over TFTP, decompress it to a ram-disk and do a pivot root. This system works quite well but it does have some deficiencies. It relies on many custom scripts to maintain the boot images in working order, and many of these scripts are quite sloppily written, so that if anything doesn't work as expected you have to spend some time trying to coax it back up. Anything but the most trivial system upgrade requires a reboot of the whole cluster (which purges the job queue and annoys users). On almost every upgrade something would go wrong and I'd have to spend a long day figuring it out. Finally, with this configuration you always have to be careful not to install anything that would bloat the system image -- after all, it's all kept in RAM, so a larger image means more wasted RAM.

During a recent migration from a mixed 32/64bit cluster to a pure 64bit system, I decided to re-architect the whole configuration to use NFS-root instead of linuxrc/pivot-root. I had experience with this style of configuration from a machine we built for the SC07 Cluster Challenge; however, it was a small cluster (13 nodes, 100 cores), so I was worried whether NFS-root would be feasible in a cluster 20 times larger. After some pondering over the topic I decided to go for it. I figured that Linux does a good job of caching disk IO in RAM, so any applications which are used regularly on each node would be cached on the nodes themselves (and also on the NFS server). Furthermore, if the NFS server got overloaded, some other techniques could be applied to reduce the load (staggered boot, NFS tuning, server distribution, local caching for network file systems). And so I put together the whole system on a test cluster and installed the most important software: MPI, PBS (torque+Maui+gold), and all the bizarre configurations.

Finally, one particularly interesting day this whole configuration got put to the test. I installed the server machines, migrated over all my configurations and scripts, and halted all nodes. Then I started everything back up -- while monitoring the stress the NFS-root server was enduring as 235 nodes started to ask it for 100s of files each. The NFS-root server behaved quite well: using only 8 NFS-server threads, the system never went over 75% CPU utilization, although the cluster took a little longer to boot. I assume that with just 8 NFS threads the nodes spent most of the time standing in line waiting for their files to be served. Starting more NFS threads (64-128) should alleviate this issue, but it might put more stress on the NFS server, and since the same machine does a lot of other things I'm not sure it's a good idea. Really it's a non-issue, since the cluster rarely gets rebooted, especially now that most of the system can be upgraded live without a reboot.

There are a couple of things to consider if you want to NFS-root a whole cluster. You most likely want to export your NFS share as read-only to all machines but one; you don't want all the machines hammering each other's files. This does require some trickery. You have to address the following paths:

  • /var
    You cannot mount this to a local partition, as most package management systems will make changes to /var and you'll have to go far out of your way to keep them in sync. We utilize an init script which takes /varImage and copies it to a tmpfs /var (RAM file system) on boot.
  • /etc/mtab
    This is a pain in the ass; I don't know whose great idea it was to have this file. It maintains a list of all currently mounted file systems (information not unlike that of /proc/mounts). In fact the mount man page says that "It is possible to replace /etc/mtab by a symbolic link to /proc/mounts, and especially when you have very large numbers of mounts things will be much faster with that symlink, but some information is lost that way, and in particular working with the loop device will be less convenient, and using the 'user' option will fail." And that is exactly what we do. NOTE: autofs does not support the symlink hack; I have filed a bug in Debian.
  • /etc/network/run (this might be a debianism)
    We use a tmpfs for this also.
  • /tmp
    We mount this to a local disk partition.
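The /var step in the list above could be sketched roughly as follows. This is not our actual init script, just a minimal illustration: the function name populate_var and the mount options in the comments are assumptions, and the mount and symlink commands are left as comments because they need root on a real node.

```bash
#!/bin/bash

# populate_var IMAGE TARGET: copy the pristine /var image into the
# (already mounted) tmpfs, preserving ownership, permissions, and dotfiles.
populate_var() {
    cp -a "$1"/. "$2"/
}

# Early at boot, as root, a node would do something like:
#   mount -t tmpfs -o mode=0755 tmpfs /var
#   populate_var /varImage /var
#   mount -t tmpfs tmpfs /etc/network/run
#
# The mtab symlink only has to be created once, from the read-write node:
#   ln -sf /proc/mounts /etc/mtab
```

Copying from "IMAGE/." rather than "IMAGE/*" picks up hidden files as well, which matters if anything under /var keeps dotfiles.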

All in all the NFS-root system works quite well. I bet that with some tweaking and a slightly more powerful NFS-root server (we're using a dual-socket 3.4GHz Xeon with 2MB cache and 2GB of RAM) the NFS-root way of bootstrapping a cluster can be pushed to serve over 1000 nodes. More than that would probably require some distribution of the servers. By changing the exports on the NFS server, any one node can become a read-write node, and software can be installed/upgraded on it like on any regular machine; changes will propagate to all other nodes (minus daemon restarts). Later the node can be changed back to read-only -- all without a reboot.

Debian LILUG News Software Super Computers 2008-03-02 13:25:11
MythTV Lives On

Many of the LILUGgers will remember that I talked about uncertain times coming up in the future of MythTV. One of the companies (Zap2it) that provided all the channel information was pulling the plug on the service. Well I just heard great news.

A bit over a month ago I sent an email to Zap2it thanking them for their great service and support of the MythTV community.

From: dotCOMmie [mailto:####@#####.###]
Sent: Monday, June 25, 2007 9:44 PM
To: Zap2It-Labs
Subject: Thanks for the great service.

Hello

I'm relatively new to the mythTV community, and it is how I got to know zap2it. Non-the less I think you were a crucial part to the tremendous growth of this community, and I'm saddened by your recent decision to stop providing the channel listing service.

The reason I'm emailing you is to convey my deepest thanks to your company for providing such an excellent service to the hobbyist community. I often wish we had more companies like you.

Best Regards
--dotCOMmie

And here is the reply I received today:

From: "Roberge, Andy" <########@#######.###>
To: "dotCOMmie" <####@#######.####>
Date: Tue, 7 Aug 2007 21:17:44 -0500
Subject: RE: Thanks for the great service.

August 8, 2007

Zap2it Labs received many emails inquiring if television listings could be provided on a paid for basis once the current service is discontinued. Today we are pleased to announce an agreement that will allow for many of you to continue to have access to your personal television listings data.

In collaboration with Schedules Direct, a non-profit organization created by founding members of MythTV and XMLTV, an agreement has been reached that will continue to support the open source and "freeware" communities. As of September 1, 2007, there WILL BE an alternative television listings source for certain Zap2it Labs users who become members of Schedules Direct, which includes a membership fee.

While Schedules Direct will continue to support the open source community and the users of "freeware", it will not support users of "commercial" products, such as DVRs, that were purchased from either a retail outlet or a company that used Zap2it Labs as its television listings source.

For those of you who lived by the original spirit and intent of Zap2it Labs and wish to continue to have access to listings, we encourage you to visit Schedules Direct at http://www.schedulesdirect.org to set up your personal Schedules Direct membership today!

Labs Admin.

I honestly have to say I wasn't expecting a reply let alone such good news. Thank you once again, Zap2it.

Lilug MythTV News Software 2007-08-07 22:37:12