Shuffling the lines of a large file

There are many methods for sampling a dataset. A simple one is random sampling, where you shuffle the instances of a collection, represented here by the lines of a file. Depending on the size of your file, this can be done with the ‘shuf’ tool, provided by the GNU coreutils on Linux. A common usage example is given below:

shuf [input-file] > [output-file]

However, this tool requires the whole input to fit in memory. If your file is larger than the available memory, you can use the Linux ‘sort’ with the parameter ‘-R’. Note that ‘-R’ orders lines by a random hash, so identical lines end up next to each other. An example of its usage is as follows:

sort -R [input-file] > [output-file]

Keep in mind that in some environments sort does not work properly for case-sensitive text, because the ordering depends on the locale (as highlighted in the sort manual). If this is your case, you may execute the following command before sorting:

export "LC_ALL=C"
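If identical lines ending up next to each other is a problem, another option is to decorate each line with a random key, sort on that key, and then drop it. This is only a sketch: shuffle_lines is a name I made up for the example, and it assumes awk, sort and cut are available.

```shell
#!/bin/sh
# shuffle_lines is a hypothetical helper name used only in this sketch.
# 1. awk prefixes every line with a random integer key and a tab.
# 2. sort -n orders the lines by that key.
# 3. cut -f2- drops the key and keeps the original line (tabs included).
shuffle_lines() {
    awk 'BEGIN { srand() } { printf "%d\t%s\n", int(rand() * 1000000000), $0 }' "$1" |
        sort -n |
        cut -f2-
}

# Usage: shuffle_lines input-file > output-file
```

Since srand() is seeded from the clock, two runs within the same second may produce the same order.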

See ya!

Deploying With Git Using a Remote Repository

I'm deploying Ruby on Rails applications with Git. The code of my personal application was hosted on Beanstalk because it gives me one free private repo. The following steps give a basic deployment:

  • On the remote server on which your application is running:
  1. cd /var/www/your_application_directory
  2. git init
  3. git remote add production
  4. git pull production master (*)(**)
  • On the local machine on which you are changing your code:
  1. Change/Add some code
  2. git add .
  3. git commit -m "Changing some stuff"
  4. git push beanstalk master
  • On the remote server on which your application is running, in order to get the new modifications:
  1. git pull production master
  2. stop the server, in my case mongrel 
    mongrel_rails stop
  3. restart the server 
    mongrel_rails start -p PORT_NUMBER -d -e production -P log/
* If you get the error

Permission denied (publickey,keyboard-interactive). fatal: The remote end hung up unexpectedly

you have to set a new RSA key for the remote server. To set a new RSA public key in the remote server, you just need to run 


, answer some questions, and copy the content of ~/.ssh/

** You have to add your RSA public key to grant access from the server on which your application is running. On Beanstalk, accessing the you can do it.
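The server-side update steps above (pull, stop, restart) can be wrapped in a small function. This is only a sketch: deploy, run and DRY_RUN are names I made up for the example, and the application directory, port and PID file are passed as arguments because they depend on your setup.

```shell
#!/bin/sh
# run: hypothetical wrapper. With DRY_RUN=1 it prints the command
# instead of executing it, so the steps can be reviewed first.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# deploy APP_DIR PORT PID_FILE (all three arguments are assumptions)
deploy() {
    run cd "$1" || return 1
    # Get the new modifications from the "production" remote.
    run git pull production master || return 1
    # Stop and start mongrel again, as in the manual steps.
    run mongrel_rails stop
    run mongrel_rails start -p "$2" -d -e production -P "$3"
}
```

Calling it with DRY_RUN=1 first, for example DRY_RUN=1 deploy /var/www/your_application_directory PORT_NUMBER PID_FILE, only prints the four commands.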

Pidgin From Source Farsight And GstInterfaces

While using Jaunty and having problems with Pidgin 2.5.5 and its MSN protocol error “Connection error from Notification server: Unable to connect”, and also with the WLM error “nexus stream error”, I ended up installing 2.10.0 from source. During the installation, I ran into the following problems with the sound and video dependencies:

checking for GSTINTERFACES… no
checking for FARSIGHT… no

Ok, for GSTINTERFACES it was not easy to find which package to install. Looking into the “configure” file, it required a “gstreamer-interfaces-0.10” package, but this was not found in Jaunty. I figured out that the right package is libgstreamer-plugins-base0.10-dev. Therefore,

sudo apt-get install libgstreamer-plugins-base0.10-dev

For FARSIGHT things were more complicated. I needed to install libnice 0.0.9 and farsight2 0.0.10 (which required lots of dependencies), both from source.

The result was

checking for GSTINTERFACES… yes
checking for FARSIGHT… yes

and I could finally use Pidgin 2.10.0 with sound and video on Jaunty.

More resources of information regarding Pidgin installation from source can be found here, and here.

Using mirrors in SED

It took me a long time to discover mirrors in SED. That’s a feature that may help you a lot in some replacement problems. Mirrors (usually called back-references) are denoted by \1, \2, \3, .. \9 and are used to capture a part of the input string and reuse it in the replacement string, like a temporary variable. The examples here use extended regexps, flag -r, which let you write the capturing groups with plain parentheses. Let’s see some applications:

Changing a date:

$ echo 12/31/2004 | sed -r 's@([0-9][0-9])/([0-9]{2})/([0-9]{4})@\1.\2.\3@'
$ echo 12/31/2004 | sed -r 's@([0-9][0-9])/([0-9]{2})/([0-9]{4})@\2.\1.\3@'
$ echo 12/31/2004 | sed -r 's@([0-9][0-9])/([0-9]{2})/([0-9]{4})@\2-\1-\3@'
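One more date example, as a small sketch: moving the year to the front gives the year-month-day order. The variable name iso_date is just something I picked for the example.

```shell
# \1 = month, \2 = day, \3 = year; the replacement reorders them.
iso_date=$(echo 12/31/2004 | sed -r 's@([0-9]{2})/([0-9]{2})/([0-9]{4})@\3-\1-\2@')
echo "$iso_date"   # prints 2004-12-31
```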

Inverting field order:

$ echo abcdefg | sed -r 's/(a)(b)(c)/\3:\2:\1/'
$ echo abcdefg | sed -r 's/(a)(b)(c).*/\1:\2:\3/'
$ echo abcdefg | sed -r 's/(a)(b)(c).*/\3:\2:\1/'
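For comparison, here is a sketch of the last command without the -r flag. In basic regexps the groups have to be written as \( and \), but \1, \2 and \3 work the same way. The variable name bre_result is only for illustration.

```shell
# Same substitution with basic regexps: groups are escaped as \( ... \).
bre_result=$(echo abcdefg | sed 's/\(a\)\(b\)\(c\).*/\3:\2:\1/')
echo "$bre_result"   # prints c:b:a
```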

See ya!

Totem Without PulseAudio on Jaunty

Since I quit PulseAudio, I had a problem with Totem: it no longer had sound. To solve this, I uninstalled the GStreamer plugin for PulseAudio with the following command:

 sudo apt-get remove gstreamer0.10-pulseaudio

Restart, and sound will be working in Totem.

Installing Lexmark Printer 2600 Series on Linux

In this tutorial I will show how to install the Lexmark 2600 series printers. In my case I have a printer model x2695, and it is up and running on Ubuntu 9.04, including the scan feature. There seems to be a lack of information about Lexmark drivers; even in the OpenPrinting project the 2690 series is tagged as “Paperweight”.

First download the driver here, or in this backup link. Extract it and, on the terminal, type


Now you will be guided by screens to …

… the error that happened in the last screen, “The installer package is not supported by your system. Installer will exit.”. To find out where this message came from and understand what happened, I needed to check the source of the installer. To do that, I went to the terminal and typed:

./ --noexec --target lexmark

It created a folder called lexmark containing the source of the installer. There I found the run.lua file, which was the source of the message displayed in the error. After analysing the code in this script, I changed a piece of it:

vim lexmark/config/run.lua

on line 16 from

g_usetar = false

to

g_usetar = true
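The same change can also be made without opening an editor. This is a sketch that assumes the installer was extracted into a lexmark folder in the current directory; enable_usetar is a helper name I made up.

```shell
# enable_usetar: hypothetical helper that flips g_usetar from false to true
# in the given file, keeping a backup copy with a .bak suffix.
enable_usetar() {
    sed -i.bak 's/^\([[:space:]]*g_usetar[[:space:]]*=[[:space:]]*\)false/\1true/' "$1"
}

# Usage, from the directory where the installer was extracted:
#   enable_usetar lexmark/config/run.lua
```

It matches the assignment itself rather than the line number, so it should still work if the line is not exactly at 16.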

So, I needed to run the installer again but using the changed file:

cd lexmark

Now the installation finished successfully, and I could even use XSane to scan documents. You can find here the output from the successful installation, because it may give you some insights if your installation did not go well. Now, in my printers configuration screen I have two 2600 printers, one created by the installer, and another created when the system found the printer right after the driver was installed:

Well, I had not even one printer working, now I have two :¬)

Compiz Instead of Metacity When Starting Gnome

Suddenly Metacity was being loaded in lieu of Compiz when starting Gnome. I tried restarting Compiz with:

compiz --replace

but it did not work and Metacity kept itself up and running.

What worked for me was to open gconf-editor (Alt+F2 > “gconf-editor” > Enter), navigate through “desktop > gnome > session > required_components”, and set the windowmanager key to “compiz”. The next time you start Gnome, Compiz will be loaded.
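The same key can probably be set from the terminal too. This is a sketch that assumes gconftool-2 is installed and that the key path matches the gconf-editor navigation above. I only did it through gconf-editor myself.

```shell
# KEY is the GConf path that corresponds to the navigation above (assumption).
KEY=/desktop/gnome/session/required_components/windowmanager

if command -v gconftool-2 >/dev/null 2>&1; then
    # Set the window manager to compiz and print the stored value.
    gconftool-2 --type string --set "$KEY" compiz
    gconftool-2 --get "$KEY"
else
    echo "gconftool-2 not found; use gconf-editor instead" >&2
fi
```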