Archive for the ‘misc’ Category

convert text to number in oocalc

Friday, January 27th, 2012

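when pasting text into the open office / libre office spreadsheet application, the pasted numbers are recognised as text. to convert them:

  1. select the cells
  2. right click > format cells > number
  3. menu “edit” > find & replace: search for .* and replace with &, tick “regular expressions” and “current selection only”, then click “replace all”. replacing each cell's content with itself makes calc read it again, this time as a number.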

found on http://mynthon.net/howto/-/OpenOffice%20-%20Calc%20-%20convert%20text%20to%20number.txt

look up bibliographical information from an arxiv id

Tuesday, October 18th, 2011

this python script takes one or more arxiv ids as command line arguments and prints a bibtex entry with the bibliographic information for each of them.

#!/usr/bin/env python3
# print a bibtex entry for every arxiv id given on the command line
import re
import sys
import urllib.request
from xml.dom import minidom

debug = False

for arg in sys.argv[1:]:
    # get the arxiv id: remove a leading "arxiv:" or "http://www.arxiv.org/abs/"
    # and a trailing version number such as "v2"
    # (str.strip would remove sets of characters, e.g. the "h" of "hep-th/...")
    xid = re.sub(r'^(arxiv:|(https?://)?(www\.)?arxiv\.org/abs/)', '', arg.strip(),
                 flags=re.IGNORECASE)
    xid = re.sub(r'v\d+$', '', xid)

    # download the xml
    with urllib.request.urlopen('http://export.arxiv.org/api/query?id_list=' + xid) as usock:
        xmldoc = minidom.parse(usock)

    if debug:
        print(xmldoc.toxml())
        print("")

    d = xmldoc.getElementsByTagName("entry")[0]

    # year of the latest version
    date = d.getElementsByTagName("updated")[0].firstChild.data
    text_year = date[:4]

    # long titles contain line breaks; collapse all whitespace
    title = d.getElementsByTagName("title")[0]
    text_title = ' '.join(title.firstChild.data.split())

    authorlist = []
    text_first_author_surname = ""
    for person_name in d.getElementsByTagName("author"):
        # split "given names surname" at the last space
        name = person_name.getElementsByTagName("name")[0]
        parts = name.firstChild.data.split()
        text_given_name = ' '.join(parts[:-1])
        text_surname = parts[-1]
        authorlist.append(text_surname + ", " + text_given_name)
        # first author?
        if not text_first_author_surname:
            text_first_author_surname = text_surname

    # output
    print("@MISC{" + text_first_author_surname + text_year[-2:] + ",")
    print("author = {" + " and ".join(authorlist) + "},")
    print("title = {" + text_title + "},")
    print("year = {" + text_year + "},")
    print("eprint = {" + xid + "},")
    print("URL = {http://www.arxiv.org/abs/" + xid + "},")
    print("}")
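a note on cleaning up the id: python's str.strip takes a set of characters rather than a prefix, so stripping "http://" also eats the leading "h" of an old-style id like hep-th/9901001, and splitting at "v" to drop the version truncates archive names such as solv-int. a minimal sketch of prefix-based clean-up, assuming the only forms you paste are arxiv:ID, (http(s)://)(www.)arxiv.org/abs/ID and an optional version suffix like v2 (normalise_arxiv_id is a hypothetical helper name):

```python
import re

# prefixes that may precede an arxiv id (an assumed list, extend as needed)
ARXIV_PREFIX = re.compile(r'^(arxiv:|(https?://)?(www\.)?arxiv\.org/abs/)', re.IGNORECASE)
# version suffix such as "v2", only at the very end of the id
VERSION_SUFFIX = re.compile(r'v\d+$')


def normalise_arxiv_id(arg):
    """Return the bare arxiv id without prefix and version suffix."""
    xid = ARXIV_PREFIX.sub('', arg.strip())
    return VERSION_SUFFIX.sub('', xid)


if __name__ == '__main__':
    # str.strip removes characters from a set: the "h" of "hep" goes too
    print('hep-th/9901001'.strip('http://'))
    for arg in ['arxiv:1101.1234v2',
                'http://www.arxiv.org/abs/solv-int/9901001',
                'hep-th/9901001']:
        print(normalise_arxiv_id(arg))
```

run as a script, it prints ep-th/9901001 for the naive strip, followed by 1101.1234, solv-int/9901001 and hep-th/9901001.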

count interesting posts in an rss feed with the feed reader canto

Wednesday, October 5th, 2011

the rss feed reader canto can easily be extended with your own python functions. here, the keys z and x get a new meaning: z opens the story in the browser and counts it as interesting, x marks it as read without opening it and counts it as uninteresting. the counts are kept per feed and saved across sessions, so that you can evaluate the statistics at the end of the year…

~/.canto/conf.py

from canto.extra import *
 
link_handler("firefox \"%u\"")
image_handler("eog \"%u\"", text=True, fetch=True)
filters=[show_unread, None]
never_discard("unread")
colors[1] = ("yellow", "black") # unread
colors[2] = ("blue", "black") # read
 
# load the counts from earlier sessions; they are pickled into
# interesting.py (despite the name not a python file) next to the config
def my_start_hook(gui):
    import os
    import pickle
    try:
        a = open(os.path.join(os.path.dirname(gui.cfg.path), 'interesting.py'), 'rb')
        gui.interest = pickle.load(a)
        a.close()
    except Exception: # no (readable) file yet: start counting from zero
        gui.interest = {}

start_hook = my_start_hook

# save the counts when canto quits
def my_end_hook(gui):
    import os
    import pickle
    a = open(os.path.join(os.path.dirname(gui.cfg.path), 'interesting.py'), 'wb')
    pickle.dump(gui.interest, a)
    a.close()

end_hook = my_end_hook

# z: count the story as interesting for its feed, open it in the browser,
# mark it as read and go to the next one
def interesting(gui):
    journal = gui.sel["tag"].tag.encode('ascii', 'ignore')
    counts = gui.interest.setdefault(journal, {"interesting": 0, "not interesting": 0})
    counts["interesting"] += 1
    gui.goto()
    gui.just_read()
    gui.next_item()

keys['z'] = interesting

# x: count the story as not interesting, mark it as read without opening it
def uninteresting(gui):
    journal = gui.sel["tag"].tag.encode('ascii', 'ignore')
    counts = gui.interest.setdefault(journal, {"interesting": 0, "not interesting": 0})
    counts["not interesting"] += 1
    gui.just_read()
    gui.next_item()

keys['x'] = uninteresting
 
add("http://feeds2.feedburner.com/DilbertDailyStrip?format=xml", tags=["Dilbert Daily Strip"])
add("http://www.arcamax.com/cgi-bin/news/page/1007/channelfeed", tags=["Garfield"])
add("http://www.phdcomics.com/gradfeed_justcomics.php", tags=["PHD Comics"])
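for the statistics at the end of the year, the pickled counts can be read back outside canto. a minimal python 3 sketch, assuming the file ended up as ~/.canto/interesting.py (next to conf.py) and was written by the python 2 canto, so that encoding="latin1" is needed to turn its byte-string keys into str; interest_shares and load_counts are hypothetical helper names:

```python
import os
import pickle


def interest_shares(interest):
    """Map each feed to (interesting, total, share of interesting stories)."""
    shares = {}
    for feed, counts in interest.items():
        good = counts.get("interesting", 0)
        total = good + counts.get("not interesting", 0)
        shares[feed] = (good, total, good / total if total else 0.0)
    return shares


def load_counts(path):
    # written by python 2; latin1 maps its byte strings to str
    with open(path, "rb") as f:
        return pickle.load(f, encoding="latin1")


if __name__ == "__main__":
    # assumed location of the counts file, see above
    path = os.path.expanduser("~/.canto/interesting.py")
    if os.path.exists(path):
        shares = interest_shares(load_counts(path))
        # feeds with the highest share of interesting stories first
        for feed, (good, total, share) in sorted(shares.items(), key=lambda kv: -kv[1][2]):
            print("%-25s %4d of %4d interesting (%.0f%%)" % (feed, good, total, 100 * share))
    else:
        print("no counts found in " + path)
```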

look up bibliographical information from a doi

Wednesday, October 5th, 2011

this python script takes one or more dois as command line arguments and prints a bibtex entry for each of them, filled with the metadata provided by crossref. you have to register there and enter the api key they give you as crossref_api_key near the top of the script.

#!/usr/bin/env python3
# print a bibtex entry for every doi given on the command line, using the
# metadata that crossref returns in its unixref format
import re
import sys
import urllib.parse
import urllib.request
from xml.dom import minidom

debug = False

crossref_api_key = 'your_crossref_api_key'


def text_of(parent, tag):
    """text of the first <tag> element below parent, "" if there is none"""
    elements = parent.getElementsByTagName(tag)
    if elements and elements[0].firstChild is not None:
        return elements[0].firstChild.data
    return ""


for arg in sys.argv[1:]:
    # get the doi: remove a leading "doi:" or "http://dx.doi.org/"
    # (str.strip would remove sets of characters from both ends instead)
    doi = re.sub(r'^(doi:\s*|(https?://)?(dx\.)?doi\.org/)', '', arg.strip(),
                 flags=re.IGNORECASE)

    # download the xml
    url = ('http://www.crossref.org/openurl/?id=doi:' + urllib.parse.quote(doi, safe='/')
           + '&noredirect=true&pid=' + crossref_api_key + '&format=unixref')
    with urllib.request.urlopen(url) as usock:
        xmldoc = minidom.parse(usock)

    if debug:
        print(xmldoc.toxml())
    print("")

    record = xmldoc.getElementsByTagName("doi_record")[0]
    d = record.getElementsByTagName("crossref")[0].getElementsByTagName("journal")[0]

    journal_meta = d.getElementsByTagName("journal_metadata")[0]
    text_journal_title = text_of(journal_meta, "full_title")

    # year, volume and issue; volume and issue may be missing
    journal_issue = d.getElementsByTagName("journal_issue")[0]
    date = journal_issue.getElementsByTagName("publication_date")[0]
    text_year = text_of(date, "year")
    text_volume = text_of(journal_issue, "volume")
    text_issue = text_of(journal_issue, "issue")

    journal_article = d.getElementsByTagName("journal_article")[0]
    titles = journal_article.getElementsByTagName("titles")[0]
    text_title = text_of(titles, "title")

    authorlist = []
    text_first_author_surname = ""
    contributors = journal_article.getElementsByTagName("contributors")[0]
    for person_name in contributors.getElementsByTagName("person_name"):
        text_given_name = text_of(person_name, "given_name")
        text_surname = text_of(person_name, "surname")
        authorlist.append(text_surname + ", " + text_given_name)
        # first author?
        if person_name.getAttribute("sequence") == 'first':
            text_first_author_surname = text_surname

    # pages; physical review gives an article number (item_number) instead
    pages = journal_article.getElementsByTagName("pages")
    if pages:
        text_first_page = text_of(pages[0], "first_page")
        text_last_page = text_of(pages[0], "last_page")
    else:
        text_first_page = text_of(journal_article, "item_number")
        text_last_page = ""

    # output
    print("@ARTICLE{" + text_first_author_surname + text_year[-2:] + ",")
    print("author = {" + " and ".join(authorlist) + "},")
    print("title = {" + text_title + "},")
    print("journal = {" + text_journal_title + "},")
    if text_volume:
        print("volume = {" + text_volume + "},")
    if text_issue:
        print("number = {" + text_issue + "},")
    print("year = {" + text_year + "},")
    if text_first_page and text_last_page:
        print("pages = {" + text_first_page + "-" + text_last_page + "},")
    elif text_first_page:
        print("pages = {" + text_first_page + "},")
    print("doi = {" + doi + "},")
    print("}")
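the page handling is the fiddliest part of the script: most records carry first_page and last_page inside pages, while physical review articles have an article number in publisher_item/item_number instead. a minimal sketch of that fallback, run on two reduced, made-up fragments shaped like the journal_article part of a unixref record (real records contain much more); first_text and bibtex_pages are hypothetical helper names:

```python
from xml.dom import minidom

# made-up, reduced fragments; real unixref records are much larger
WITH_PAGES = """<journal_article>
  <pages><first_page>123</first_page><last_page>130</last_page></pages>
</journal_article>"""
WITH_ITEM_NUMBER = """<journal_article>
  <publisher_item><item_number>045501</item_number></publisher_item>
</journal_article>"""


def first_text(parent, tag):
    """Text of the first <tag> below parent, or "" if it is missing."""
    nodes = parent.getElementsByTagName(tag)
    if nodes and nodes[0].firstChild is not None:
        return nodes[0].firstChild.data
    return ""


def bibtex_pages(article):
    """Value for the bibtex pages field: "first-last", the first page
    alone, or the article number used instead of page numbers."""
    first = first_text(article, "first_page")
    last = first_text(article, "last_page")
    if first and last:
        return first + "-" + last
    if first:
        return first
    return first_text(article, "item_number")


for xml in (WITH_PAGES, WITH_ITEM_NUMBER):
    article = minidom.parseString(xml).documentElement
    print(bibtex_pages(article))
```

it prints 123-130 and 045501. a single hyphen is enough in the pages field, since the standard bibtex styles turn it into --.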

find your publications in a bibtex file

Thursday, June 30th, 2011

starting from a bib file with many entries, some of which are publications you co-authored, you may want to extract only these and list them in a tex file and on an html webpage.

you could simply use bibtex2html, which does a very nice job but (as far as i can tell) doesn’t let you fine-tune the ordering of the results. the approach presented here therefore combines this tool with pybtex, which could probably do the whole job by itself.

this script produces both the tex and the html file. the bib file and the output names are set at the top and the name to look for is hard-coded in the comparisons, so adjust them for your own use:

#!/usr/bin/env python
# Does two tasks:
#
# 1) Generates an html list (htmout) of refereed pubs, proceedings and
#    non-refereed pubs, each part sorted reverse-chronologically
# 2) Generates LaTeX-formatted publications (texout), same format
 
from operator import itemgetter
from pybtex.database.input import bibtex
 
bibfile = "/home/buschi/cv/db.bib"
texout = "sbusch_publications.tex"
htmout = "sbusch_publications.html"
 
parser = bibtex.Parser()
bib_data = parser.parse_file(bibfile)
 
# take everything with my name
sbusch_all = {}
for key in bib_data.entries.keys():
    try:
        authors = bib_data.entries[key].persons['author']
        for author in authors:
            try:
                if ((author.first()[0] == u'Sebastian') and (author.last()[0] == u'Busch')):
                    sbusch_all[key] = bib_data.entries[key]
            except IndexError: # no first / last name
                pass
    except KeyError: # no author (e.g. a collection)
        pass
 
# categorise
sbusch_nonref = {}
sbusch_nonref_sort = []
sbusch_proc = {}
sbusch_proc_sort = []
sbusch_ref = {}
sbusch_ref_sort = []
for key in sbusch_all.keys():
    publ = sbusch_all[key]
    year = -int(publ.fields['year'].strip('-')) # strip for 2009--; - to get the ones with largest years first
    # position in the author list (0: first author); authors without
    # first or last name are skipped, as in the search above
    mypos = 0
    for i, author in enumerate(publ.persons['author']):
        try:
            if ((author.first()[0] == u'Sebastian') and (author.last()[0] == u'Busch')):
                mypos = i
                break
        except IndexError: # no first / last name
            pass
    if ((publ.type == "techreport") or (publ.type == "mastersthesis") or (publ.type == "phdthesis") or (publ.type == "misc") or ("nonrefereed" in publ.fields.keys())):
        sbusch_nonref[key] = sbusch_all[key]
        sbusch_nonref_sort.append((key, year, mypos))
    elif (publ.type == "inproceedings"):
        sbusch_proc[key] = sbusch_all[key]
        sbusch_proc_sort.append((key, year, mypos))
    elif (publ.fields.get('journal') != "in preparation"):
        try:
            vol = publ.fields['volume']
        except KeyError:
            vol = None
        if (vol != "submitted"):
            sbusch_ref[key] = sbusch_all[key]
            sbusch_ref_sort.append((key, year, mypos))
 
# sort
# the newest publications first
#     the ones where i'm first author first
sbusch_nonref_sorted = [i[0] for i in sorted(sbusch_nonref_sort, key=itemgetter(1,2))]
sbusch_proc_sorted = [i[0] for i in sorted(sbusch_proc_sort, key=itemgetter(1,2))]
sbusch_ref_sorted = [i[0] for i in sorted(sbusch_ref_sort, key=itemgetter(1,2))]
for i in [(sbusch_nonref_sorted, 'nonref.txt'), (sbusch_proc_sorted, 'proc.txt'), (sbusch_ref_sorted, 'ref.txt')]:
    f = open(i[1], 'w')
    for line in i[0]:
        f.write(str(line)+"\n")
    f.close()
 
from os import system, remove
from re import compile, DOTALL
 
pubs_html = ''
 
# iterate over refereed pubs, proceedings and non-refereed pubs
for o in [ ['Refereed Publications', '--no-footer', 'ref.txt'], ['Proceedings', '--no-footer', 'proc.txt'], ['Non-Refereed Publications', '', 'nonref.txt'] ]:
    # heading of this part in the html output
    pubs_html += '<h1>%s</h1>' % o[0]

    # add the pubs whose keys are listed in o[2] to pubs_html:
    # sort by reverse date (-d -r); don't show keys; format with the bibtex
    # style sbusch_web; only the last part keeps the footer
    # writes into auxfile.html
    system("bibtex2html -q -d -r -dl -nobibsource -nokeys -m macros.tex -citefile %s -s sbusch_web -nodoc %s -o auxfile %s" % (o[2], o[1], bibfile))
    try:
        sbusch_html = open('auxfile.html', 'r')
        pubs_html += sbusch_html.read()
        sbusch_html.close()
    except IOError:
        pass
 
# change "[ bib ]" into "[&nbsp;bib&nbsp;]"
biblinkRE = compile(r'\[ (<a href="[^"]+">bib</a>) ]')
pubs_html = biblinkRE.sub(r'[&nbsp;\1&nbsp;]', pubs_html)
# remove explicit line breaks
deletebrRE = compile(r'<br />')
pubs_html = deletebrRE.sub('', pubs_html)
 
# write into the output file (htmout)
pubs_html_file = open(htmout, 'w')
pubs_html_file.write(pubs_html)
pubs_html_file.close()
 
#########
 
# now we're going to generate a LaTeX version of my pubs, also sorted
 
# refs.tex will contain the LaTeX version of my pubs
refs_tex = open(texout, 'w')

refs_tex.write("\\section{Publications}\n")

# separately loop through refereed pubs, proceedings and non-refereed pubs;
# each part becomes its own bibunit (needs the bibunits package)
for o in [ ['ref.txt', 'Articles in Refereed Scientific Journals'], ['proc.txt', 'Articles in Conference Proceedings'], ['nonref.txt', 'Other'] ]:
    auxfile = open(o[0], 'r')
    sorted_keys = auxfile.read().split('\n')
    auxfile.close()

    refs_tex.write('%s\n\\renewcommand\\refname{%s}\n\\begin{bibunit}[unsrt]\n' % ("%", o[1]))
    for key in sorted_keys[:-1]:
        refs_tex.write('\\nocite{'+str(key)+'}\n')
    refs_tex.write('\\putbib[%s]\n\\end{bibunit}\n' % bibfile[:-4])
 
refs_tex.close()
 
# clean up temp files
for tmp in ["auxfile.html", "nonref.txt", "proc.txt", "ref.txt"]:
    try:
        remove(tmp)
    except OSError:
        pass

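the tex file can then be included in another document. since it uses bibunits, that document has to load the bibunits package, and bibtex has to be run on each of the bu*.aux files (bu1, bu2, bu3) that latex generates.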
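the ordering in the script comes from sorting on a tuple: the negated year puts the newest publications first, and within one year the position in the author list (0 for first author) decides. a minimal sketch of the same ordering on plain tuples, with made-up keys and without pybtex (order_publications is a hypothetical name):

```python
def order_publications(pubs):
    """pubs: iterable of (bibtex key, year, position in author list).
    returns the keys, newest first and, within a year, by author position."""
    return [key for key, year, pos in sorted(pubs, key=lambda p: (-p[1], p[2]))]


# made-up example entries
pubs = [("smith09", 2009, 2), ("me10a", 2010, 1), ("me10b", 2010, 0), ("me08", 2008, 0)]
print(order_publications(pubs))  # ['me10b', 'me10a', 'smith09', 'me08']
```

a key function with a negated number is the simplest way to mix a descending and an ascending criterion in one sort; the script does the same by storing the negated year in the tuple and sorting with itemgetter(1, 2).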

find non-ascii characters

Friday, June 10th, 2011

some non-ascii characters in the bibtex file, for example the minus sign − (unicode U+2212), result in an error message like


! Package inputenc Error: Unicode char \u8:− not set up for use with LaTeX.

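to find them, the following command deletes all ascii characters except the newline (octal 012) and then removes the empty lines, so that only the non-ascii characters remain, one line for each line of the file that contains any. they can then be searched for with a text editor: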


tr -d "\000-\011\013-\177" < file.bib | sed '/^$/d'

found on http://www.unix.com/302107579-post5.html
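if you also want to know where each character sits, a short python 3 script can print file, line, column and the unicode name of every non-ascii character. a minimal sketch, assuming the bib file is utf-8 encoded; non_ascii and the file name find_non_ascii.py are just example names:

```python
import sys
import unicodedata


def non_ascii(lines):
    """Yield (line number, column, character) for every non-ascii character."""
    for lineno, line in enumerate(lines, start=1):
        for col, char in enumerate(line, start=1):
            if ord(char) > 127:
                yield lineno, col, char


if __name__ == "__main__":
    # usage: python3 find_non_ascii.py file.bib
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for lineno, col, char in non_ascii(f):
                name = unicodedata.name(char, "unknown")
                print("%s:%d:%d: %s U+%04X %s" % (path, lineno, col, char, ord(char), name))
```

for the character from the error message above, it prints a line of the form file.bib:LINE:COLUMN: − U+2212 MINUS SIGN.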

backref with natbib’s compress option

Thursday, June 9th, 2011

in a latex document where citations are handled by natbib with the compress option, a citation of several consecutive references is compressed to e.g. [1-4]. with hyperref's backref option, each item in the bibliography shows the pages on which it is cited. for compressed citations this fails though: only references 1 and 4 show the page of such a citation, 2 and 3 don’t. this can be fixed by patching natbib.sty:

--- natbib.sty.old 2009-07-23 10:44:10.000000000 -0400
+++ natbib.sty  2009-11-01 17:07:53.309765500 -0500
@@ -408,6 +408,7 @@
         \@ifnum{\NAT@nm=\@tempcnta}{%
          \@ifnum{\NAT@merge>\@ne}{}{\NAT@last@yr@mbox}%
         }{%
+           \Hy@backout{\@citeb\@extra@b@citeb}%
           \advance\@tempcnta by\@ne
           \@ifnum{\NAT@nm=\@tempcnta}{%
             \ifx\NAT@last@yr\relax

found on http://tex.stackexchange.com/questions/13653/hyperref-with-the-backref-page-option

find latex symbols easily

Thursday, June 9th, 2011

latex symbols are easier to find with detexify, where you draw the symbol and get matching commands (http://detexify.kirelabs.org/classify.html), than in the comprehensive symbol list http://mirrors.ctan.org/info/symbols/comprehensive/symbols-a4.pdf

bash auto completion

Tuesday, May 31st, 2011

when tab-completing a symbolic link to a directory, bash does not add the final slash. to get it, either press tab twice or add

set mark-symlinked-directories on

to ~/.inputrc (the setting applies to newly started shells).

found on http://superuser.com/questions/271626/bash-autocomplete-on-symlink-to-directory-complete-to-whole-directory-including

silent thunderbird

Monday, May 23rd, 2011

thunderbird displays a “sending message” progress window which tends to get in the way. it can be hidden by setting the preference mailnews.show_send_progress to false in the config editor.

found on http://www.geoffblog.com/2006/06/hiding-thunderbird-sending-messages.html .