Using mirrors in SED

It took me a long time to discover mirrors in sed. They are a feature that can help a lot in some replacement problems. Mirrors (also known as backreferences) are denoted by \1, \2, \3, ... \9 and capture a part of the input string so it can be reused in the replacement string, like a temporary variable. To write the groups without escaping the parentheses, as in the examples below, you need extended regular expressions, i.e. the -r flag. Let's see some applications:

Changing a date format:

$ echo 12/31/2004 | sed -r 's@([0-9][0-9])/([0-9]{2})/([0-9]{4})@\1.\2.\3@'
$ echo 12/31/2004 | sed -r 's@([0-9][0-9])/([0-9]{2})/([0-9]{4})@\2.\1.\3@'
$ echo 12/31/2004 | sed -r 's@([0-9][0-9])/([0-9]{2})/([0-9]{4})@\2-\1-\3@'
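
For reference, the three commands above print, respectively:

12.31.2004
31.12.2004
31-12-2004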

Inverting field order:

$ echo abcdefg | sed -r 's/(a)(b)(c)/\3:\2:\1/'
$ echo abcdefg | sed -r 's/(a)(b)(c).*/\1:\2:\3/'
$ echo abcdefg | sed -r 's/(a)(b)(c).*/\3:\2:\1/'
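
And the outputs of these three commands, in the same order:

c:b:adefg
a:b:c
c:b:a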

Read more:
http://aurelio.net/curso/sucesu/sucesu-seder-prompt.html

See ya!


Profiling C/C++ code with Valgrind + KCachegrind

First, let's explain what profiling is. In general terms, profiling is a technique you can use when you experience performance problems in your application. Basically, it consists of measuring the time spent in each function of the code in order to identify the bottleneck, i.e., the lines of code that concentrate most of the execution time.

There are a lot of tools for this purpose. In Java we have JPerformance, which is very nice. For C/C++, a widely used one is GNU gprof, which is very simple to use (just remember to compile your code with the -pg flag). Recently I got to know KCachegrind, a profile visualizer that works on top of Valgrind's Callgrind tool and surprised me by the simplicity of its Qt interface. Valgrind, for those who do not know it, is best known not as a profiling tool but as a memory debugging tool that helps you find bugs such as memory leaks and invalid memory accesses. It is highly recommended to use Valgrind from the beginning of development. Let's look at KCachegrind in a bit more detail.
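
Just to illustrate the gprof workflow mentioned above (the program and file names here are only an example):

user@host:~$ gcc -pg -o myapp main.c       # compile with profiling instrumentation
user@host:~$ ./myapp                       # the run writes gmon.out in the current directory
user@host:~$ gprof myapp gmon.out > report.txt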

You can install both from the apt repositories:

user@host:~$ sudo apt-get install valgrind kcachegrind

For the demonstration, I use a database simulator written in C (by me and others) as the target application. The simulator repeats the database operations thousands of times and computes the average of some measured metrics. The normal execution takes hours, so I ran a simple experiment with Valgrind (which makes the execution much slower). Before calling kcachegrind, it is necessary to run valgrind with the callgrind tool to collect the profiling data.

user@host:~$ valgrind --tool=callgrind ./simulator --param param.txt --queries 10 --seed 65270
user@host:~$ kcachegrind callgrind.out.12208
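
Note that callgrind names its output file callgrind.out.<pid>, so the number changes on every run. If you only want a quick text summary instead of the GUI, the valgrind package also ships callgrind_annotate:

user@host:~$ callgrind_annotate callgrind.out.12208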

I took some screenshots to highlight some KCachegrind features:

The first feature provided by KCachegrind, shown in the first picture, is a table with the cumulative cost of each function. It considers that main() accounts for 100% of the cost, and we can follow how this cost is distributed among the functions called by main(). The same information can be viewed graphically, as depicted in the second figure. Yet another way to analyze the code is the call graph view, which starts at main(), shows the cumulative costs and walks through the call graph of the code. In this graph we can see that the highest cost is concentrated in the vprintf function, which is used to log the simulator execution for debugging purposes.

See ya!

Recursive makefile

I extended a Makefile developed by Dr. R. K. Owen for general purposes. Its usage is very simple: put this Makefile in the directory just above your source code. It assumes that all your code lives in a directory named 'src'. It searches recursively in the sub-directories of 'src', compiles each source file into an object file (.o) and links them all together. There are also some debug and optimization flags that can be added or removed according to the situation.

# ----------------------------------------------------------------------------
# Makefile
# 
# release: 0.1 (28-Ago-2010) create makefile
# 
# purpose: searches recursively in current directory for c/cpp files (using find),
#          compile each source file and link them in a executable.
# ----------------------------------------------------------------------------

APP     = simulator
CC      = gcc
RM      = rm
SRCDIR  = src
SRCEXT  = c
OBJDIR  = obj

SRCS    := $(shell find $(SRCDIR) -name '*.$(SRCEXT)')
SRCDIRS := $(shell find . -name '*.$(SRCEXT)' -exec dirname {} \; | uniq)
OBJS    := $(patsubst %.$(SRCEXT),$(OBJDIR)/%.o,$(SRCS))

DEBUG   = -pg
INCLUDE = -I./inc
CFLAGS  = -Wall -DDEBUG -c $(DEBUG) $(INCLUDE)
OFLAGS  = -lm -msse2 -ffast-math -ftree-vectorize

all:    $(APP)

$(APP): buildrepo $(OBJS)
        @echo "$(CC) $(OFLAGS) $(OBJS) -o $@"
        @$(CC) $(OBJS) $(OFLAGS) -o $@

$(OBJDIR)/%.o: %.$(SRCEXT)
        @echo "$(CC) $(CFLAGS) $< -o $@"
        @$(CC) $(CFLAGS) $< -o $@

clean:
        $(RM) -r $(OBJDIR)

buildrepo:
        $(call make-repo)

define make-repo
        for dir in $(SRCDIRS); \
        do \
                mkdir -p $(OBJDIR)/$$dir; \
        done
endef
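
Assuming the layout described above (Makefile at the top level, headers in ./inc and sources under ./src), the day-to-day usage is simply:

$ make          # creates the obj/ tree, compiles every .c file and links ./simulator
$ make clean    # removes the obj/ directory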

See ya!

How to generate a bash script with an embedded tar.gz (self-extracting)

Suppose you need to perform a routine on a remote server, where you have to decompress a tar.gz and execute a list of commands on that data. One alternative is to send the tar.gz file to the remote server through FTP or scp, then log in to the remote server and run a shell script or execute the commands manually. Recall the Java JRE setup: it uses a script.bin with an embedded tar.gz, which is self-extracted at the beginning of the script execution. To build the self-extracting script I followed a tutorial published by Stuart Wells, which consists of four steps:

1) Create/identify a tar.gz file that you wish to become self extracting.
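
For instance, the archive could be created like this (the directory name is just an example):

> tar -czf example.tar.gz mydata/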

2) Create the self extracting script. A sample script is shown below:

> cat extract.sh
#!/bin/bash
echo "Extracting file into `pwd`"
# find the line number where the script ends and the tar.gz payload starts
SKIP=`awk '/^__TARFILE_FOLLOWS__/ { print NR + 1; exit 0; }' $0`
#remember our file name
THIS=`pwd`/$0
# take the tarfile and pipe it into tar
tail -n +$SKIP $THIS | tar -xz
# Any script here will happen after the tar file extract.
echo "Finished"
exit 0
# NOTE: Don't place any newline characters after the last line below.
__TARFILE_FOLLOWS__

3) Concatenate the script and the tar file together.

> cat extract.sh example.tar.gz > example.sh
> chmod +x example.sh

4) Now test in another directory.

> cp example.sh /tmp
> cd /tmp
> ./example.sh
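
If everything went well, the run should look roughly like this (the extracted files depend on your archive, of course):

Extracting file into /tmp
Finished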

See ya!

How to transpose multiple table rows to a single row in MySQL

A few days ago I needed a multi-valued field in a table, in order to associate a list of tags with webpages, similar to the bookmark manager del.icio.us. The modeling alternative adopted was to create a relational table with two fields (webpage key and tag key) that maps the tags to the corresponding webpage. So each tag sits in its own row, and one webpage may have more than one tag. How can we transpose those tags, to show each webpage and its list of tags on a single line? In MySQL you can use GROUP_CONCAT:

DROP TABLE IF EXISTS page, tag, page_tag;

CREATE TABLE page(pid INT PRIMARY KEY, url VARCHAR(32) );

CREATE TABLE tag(tid INT PRIMARY KEY, tag VARCHAR(32) );

CREATE TABLE page_tag(pid INT, tid INT, PRIMARY KEY pk(pid, tid) );

INSERT INTO page VALUES 
   (1, 'https://lembra.wordpress.com'), 
   (2, 'http://grooveshark.com'), 
   (3, 'http://stackoverflow.com');

INSERT INTO tag VALUES 
   (1, 'blog'), (2, 'music'), (3, 'questions'), (4, 'social-network'), (5, 'rss');

INSERT INTO page_tag VALUES 
   (1, 1), (2, 2), (3, 3), (1, 4), (2, 4), (3, 4), (1, 5);

SELECT url, GROUP_CONCAT(tag) tags 
FROM page p, page_tag r, tag t 
WHERE p.pid = r.pid AND r.tid = t.tid 
GROUP BY url;

-- +------------------------------+--------------------------+
-- | url                          | tags                     |
-- +------------------------------+--------------------------+
-- | http://grooveshark.com       | music,social-network     |
-- | https://lembra.wordpress.com | blog,social-network,rss  |
-- | http://stackoverflow.com     | questions,social-network |
-- +------------------------------+--------------------------+
-- 3 rows in set (0.00 sec)
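
If you need the tag list in a predictable order or with a different separator, GROUP_CONCAT also accepts ORDER BY and SEPARATOR:

SELECT url, GROUP_CONCAT(tag ORDER BY tag SEPARATOR ', ') AS tags
FROM page p, page_tag r, tag t
WHERE p.pid = r.pid AND r.tid = t.tid
GROUP BY url;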

See ya!

[C/C++] Including headers with cross references

Hi fellows,

To instantiate a struct (declared in a header) in more than one source file, the first thing to do is to use an include guard (#ifndef). It makes gcc process the data types and function signatures only once per translation unit. However, if a struct in header h1 needs a pointer to a struct in header h2, and h2 in turn already needs a pointer to the struct from h1, a plain #include in each header results in an error, since gcc cannot resolve the circular reference: while processing h1 it finds the #include of h2, and when it tries to define the struct from h2 it does not yet know the type from h1, producing the error "expected specifier-qualifier-list before h1". The first thing to do is to rethink the design; if the circular dependency is really needed, one alternative is a forward struct declaration. An example follows:

h1.h

#ifndef H1_H
#define H1_H
struct h1;              /* forward declaration: h2.h can use "h1 *" before struct h1 is defined */
typedef struct h1 h1;
#include "h2.h"
struct h1 {
	h1 *v1;
	h2 *v2;
};
#endif

h2.h

#ifndef H2_H
#define H2_H
struct h2;
typedef struct h2 h2;
#include "h1.h"
struct h2 {
	h1 *v1;
	h2 *v2;
};
#endif

main.c

#include "h1.h"
#include "h2.h"
int main() {
	h1 v1;
	h2 v2;	
	return 0;
}
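
A quick way to check that the cross reference really resolves (using the three files exactly as listed above):

$ gcc -Wall main.c -o demo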

See ya!

Evolution: IMAP headers and contents for offline work

I've been trying to use Evolution as my default e-mail client. Since IMAP downloads only the message headers, I suppose it retrieves the message content (with caching) the first time you click on a message in a session, which causes some delay. I thought about going back to POP3, but I would miss the ability to delete messages on the server. Then I found an option in Evolution that allows downloading the message content of a given server folder locally. To enable it, right-click on the desired server folder and check both "Copy folder content locally for offline operation" and "Always check for new mail in this folder".

See ya!