Archive for July, 2008

Revision control for LaTeX: in search of an answer

Wednesday, July 30th, 2008

diffing LaTeX

As a hopeless LaTeX addict, I hate to admit that there is nothing in the TeX universe comparable to the amazingly simple and intuitive revision tracking system that Microsoft implemented in Word. OpenOffice apparently has an equally powerful version control system built into its Writer.

Those of you who have ever ventured into the territories of TeX-based collaborative writing certainly know how painful it can be to keep track of changes among several authors in TeX. TeX sources are raw text, so if you need proper diffing or revision tracking you will probably have to resort to some revision control system (such as Subversion or Git). Revision tracking via a revision control system, however, can be a nightmare to set up and learn to use fluently if you’re not already familiar with some basic notions of software revision control.
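Because TeX sources are plain text, even a generic line-based diff already shows what a coauthor changed. Here is a minimal sketch using Python’s standard difflib module; the file names and the two draft snippets are made up for illustration, and this is only a stand-in for what a full revision control system does.

```python
import difflib

# Two hypothetical versions of the same LaTeX fragment, one line per list item.
old = ["\\section{Results}\n", "We found a small effect.\n"]
new = ["\\section{Results}\n", "We found a large, reliable effect.\n"]


def tex_diff(old_lines, new_lines):
    """Return a unified diff between two versions of a TeX source."""
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="draft_v1.tex", tofile="draft_v2.tex"
        )
    )


diff = tex_diff(old, new)
print("".join(diff))
```

Lines starting with a minus sign were removed and lines starting with a plus sign were added. A line-based diff like this flags a whole reflowed paragraph as changed, which is one reason diffing prose is harder than diffing code.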

After an awful lot of emails exchanged with coauthors to let each other know who was doing what with a manuscript, I decided to search the Web for an answer.

(more…)

If you enjoyed this post, make sure you sign up for our mailing list or subscribe to our RSS feed!

I have moved

Sunday, July 27th, 2008

I’m now at the Max Planck Institute for Human Development, Berlin.

I’ll be working on the Large Knowledge Collider (LarKC) project, an EU grant that aims to create a platform for real-time reasoning on the semantic web.

The semantic web has been around for years, but only now is it gaining enough momentum. This is an area where I think what we know about semantics in psychology may make an important contribution.

This is called Web 3.0 and is supposed to be the next big thing.

What’s interesting about Web 3.0
The basic idea is that the web is written for humans who read text, but it could be a lot more useful if it were also written for machines that can ‘reason’ on structured information. A machine can already do a lot of inference on propositions. The basic unit of the semantic web is the RDF triple (also called a triplet), which is basically a [SUBJ, VERB, OBJ] proposition.

Many existing databases are being converted to a format that is, basically, a propositional analysis. But most of the web is, of course, plain text and not amenable to straight conversion from some other structured format to RDF. There are ongoing efforts to extract triplets automatically from plain text and, as usually happens with ‘automatic propositional analysis’, the results are not very good. But the fact that some companies do have databases (which are structured information, and easily translated into RDF triplets) is very promising.

Imagine that a spider for a search engine could read on the Amazon web page:
[JoseQuesada, likes, Kintsch-12345]
[Kintsch-12345, hasTitle, Comprehension]

Or from the Wikipedia page:
[Kennedy, said, Prop-9876]
Prop-9876: [Ich, bin, ein-Berliner]

Then a machine could draw a lot of interesting inferences, and browsing the web would no longer be reduced to keyword matching: people could construct queries in a language that operates on RDF (SPARQL), or we could build search engines that translate natural language into SPARQL.
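To make the idea concrete, here is a toy sketch in plain Python of what querying such propositions might look like. It is not real RDF or SPARQL: the triples are the examples above stored as tuples, and the helper names (match, titles_liked_by) are invented for this illustration.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
TRIPLES = [
    ("JoseQuesada", "likes", "Kintsch-12345"),
    ("Kintsch-12345", "hasTitle", "Comprehension"),
    ("Kennedy", "said", "Prop-9876"),
]


def match(triples, subj=None, pred=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subj in (None, s) and pred in (None, p) and obj in (None, o)
    ]


def titles_liked_by(triples, person):
    """Join two patterns: person likes ?x, and ?x hasTitle ?title."""
    liked = [o for (_, _, o) in match(triples, subj=person, pred="likes")]
    return [
        title
        for item in liked
        for (_, _, title) in match(triples, subj=item, pred="hasTitle")
    ]


print(titles_liked_by(TRIPLES, "JoseQuesada"))  # prints ['Comprehension']
```

The second function chains two patterns through a shared variable, which is roughly the kind of join a SPARQL query expresses declaratively. The hard part, of course, is doing this over trillions of triples rather than three.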

If all this sounds to you like ‘making the web a humongous database’ or ‘the return of good old-fashioned AI’, you may be right.

The challenge now is to write something that can reason over trillions of propositions in something close to real time. The LarKC project brings together people from parallel computing, cognitive science, and hardcore reasoning to create a service like that. It has funding for 3 years.

I will be posting more on semantic web stuff here (when appropriate) but mostly on the LarKC blog.


More pre-PhD advice: give yourself homework

Sunday, July 27th, 2008

Jose posted an article last week about one person’s PhD experience, highlighting many of the common difficulties encountered when doing what’s largely a self-directed research project. There are loads of books about how to finish a PhD that expand on these questions – of supervisors, organizing your time and so on – but I’ve found that their advice can be frustratingly abstract. When I started my PhD I couldn’t help but wonder “yes but what should I do RIGHT NOW?”.

One useful trick I discovered was to set myself regular assignments. If you’re coming to a PhD from an undergrad or Masters level degree, chances are you’re more used to having teachers give you tasks rather than setting off into uncharted waters on your own. What’s more, you’ve got a big mountain of work sitting in front of you labelled ‘lit review’ and it can be hard to know where to start.

(more…)


Three tips to increase your chances of pleasing a journal editor

Sunday, July 27th, 2008

Recently I met with someone who is the editor of one of the top journals in my field. We discussed what would increase your chances of pleasing a journal editor. He gave me three clear pointers that I thought would be interesting to the readership here. But I also think I’m going to try a different method to get them to you: mail. There are about 2000 RSS subscribers, and only a few dozen email subscribers. I think those email subscribers show a lot of commitment to what I have to say here, and it seems only right that they get extra content. If you use RSS over email because you find it more convenient, then my apologies. You can always subscribe to get the content, then unsubscribe, although in the future I plan to decouple the two sources and prepare extra content that goes to the mail subscribers only.

As Jason Calacanis and Nova Spivack put it:

Why have I been doing so much more Twining than blogging and social networking? First of all, I’m not interested in having a conversation with the entire general public, or ever being an A-List blogger, or interacting with networks of random strangers. What I want is to efficiently participate in many different specific groups and communities around particular interests and relationships I have.

I still think that ap.com could be a great community where we share really effective tips (this one email is probably one of these). Just a quick reminder that posting is open to anyone who has anything to say (posts are reviewed). There is a post describing how to make a post. And of course, the comments are open.

EDIT: Since lots of RSS subscribers felt alienated, I have added the content here. I hope you understand why I thought it might be sensible to keep it to a reduced audience. The error in my logic was assuming that email subscribers show more commitment: in fact, RSS subscribers think the technology is superior, and that’s why they do not subscribe by email.

As promised in my blog post, here are three tips to increase your
chances of pleasing a journal editor (and getting your paper published).

(1) Don’t take no for an answer. This editor told me that in many cases the reviews were not completely damaging, but many authors assumed that the paper was beyond repair and never resubmitted. Sometimes, even though you didn’t get a ‘revise and resubmit’, you can write back to the editor and say that you do not agree with some of the reviewers’ points, and that you have fixed the paper. Note that you were not invited to resubmit, but you are doing it anyway. Sometimes the editor will agree with your point and keep the process going.

This sneaky little tactic can save you a lot of time waiting for another journal to start the process from scratch, not to mention the psychological wear and tear of taking rejections.

Note: my editor in question said he would be more than happy to reevaluate such cases, but he may be an exception.

(2) Write it clearly. In a world where everybody rushes papers for publication, a well-written paper feels like fresh air. How do you know your paper is well-written? Leave it alone for a week. If you come back to it and you cannot understand your own point at first read, rewrite. Use your layperson friends, or people from a different discipline, as testers.

Another trick that I’ve seen good writers do is to use very large fonts so they concentrate on one paragraph at a time (one screen full of large fonts). They move to the next screen only when they are totally satisfied with their writing. This often involves rewriting each sentence a few times, and shortening it.

(3) Don’t resubmit in a week. It shows disrespect for the entire review process. If the reviewers and editor took a few hours of their time to make your paper better, by all means do not disregard the changes they propose. You can rarely address all suggestions in just one week.

What happens when you take an extraordinarily long time to resubmit? I thought it’d be catastrophic, but this editor specifically said that this is not an issue. Sometimes life gets in the way. By all means resubmit even if you think your reviewers have forgotten about you. They probably have anyway, even if you resubmit in a snap :)

Hope this helps!

-Jose


Randy Pausch passed away, but left great advice on time management (on top of his motivational tips)

Friday, July 25th, 2008

Just learned that Randy Pausch has passed away. Most of you will have been exposed to his extremely moving talk on ‘how to get the things you want’ at CMU, his last lecture, which he gave after he was diagnosed with cancer and given only a few months to live. What I didn’t know is that he also gave a talk on time management.

What I liked the most is his first slide: The goal is FUN. This one Tim Ferriss got right, and he seems very successful at indoctrinating people with this idea. I really hope it works out, because I’d love to see more people around having fun.

However, I challenge you to find a single academic who can live the life that Tim is proposing! But that’s a different conversation.

Another crucial point is his understanding of the famous 2 x 2 table in Stephen R. Covey’s 7 Habits of Highly Effective People. Key: cells 2 and 3 are often swapped; people tend to do important, non-urgent tasks last (i.e. after non-important, urgent ones). So the best order for the four cells is:

                 Urgent    Not urgent
Important          1           2
Not important      3           4
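The ordering in the table can be written down as a tiny sorting rule. This is just an illustrative sketch: the task list is invented, and the quadrant function simply encodes the cell numbers above.

```python
def quadrant(important, urgent):
    """Map a task to its cell number in the 2 x 2 table above."""
    if important:
        return 1 if urgent else 2
    return 3 if urgent else 4


# Hypothetical tasks as (name, important, urgent).
tasks = [
    ("answer a ringing phone", False, True),
    ("write the grant due tomorrow", True, True),
    ("browse conference photos", False, False),
    ("plan next year's research", True, False),
]

# Sort by cell number so important, non-urgent work comes before urgent trivia.
ordered = sorted(tasks, key=lambda t: quadrant(t[1], t[2]))
for name, important, urgent in ordered:
    print(quadrant(important, urgent), name)
```

The point of the sort key is that importance dominates urgency: planning next year’s research (cell 2) lands ahead of the ringing phone (cell 3).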

Also: a recipe for having short meetings: stand up. That limits the time a person can talk to you (you both get tired).


100+ Places to Find Funding For Your Research | OEDb

Thursday, July 24th, 2008

The people at the Online Education Database have put together a list of resources for academics to get funding. While most of the usual suspects are there, you may find new options. It’s US-centric, so if you are outside the US you may not find it as appealing. It’d be nice if someone could put together a list like this for other locations.
100+ Places to Find Funding For Your Research | OEDb


A centralized repository for academic journals: it’d need to borrow credibility from those established in the field

Monday, July 21st, 2008

Today I found a comment (again on Hacker News!) that is very relevant to the discussion we are having here. I’m going to repost it, but credit goes to the author, mlinsey. We have discussed in the past how soft peer review could change the landscape in science. The original model of peer review worked well when there were few submissions to journals and people communicated by snail mail, but it’s getting unmanageable in our current environment. Dario proposed an alternative, and mlinsey presents a similar one, maybe even more radical. Enjoy.

The whole journal system itself is broken. My university’s Math & CS library dropped its subscriptions to several journals a few years ago because they cost too much. Even though they picked the least important/prestigious journals, at one of the top CS departments in the country, this should not happen. And this is to say nothing of a lone individual who wants to benefit from research and teach himself some of it. They can hardly pay to subscribe to any of these journals.

And think about what a journal provides: a forum for researchers to submit the results of their research and a mechanism for selecting which of the submissions are worthwhile for folks in the field to know about.

What I just described is essentially just a karma system, albeit you would have to find a way to take the credibility of the rater into serious consideration. Assuming you solved the chicken-and-egg problem of getting enough credible people from academia to be raters and to submit their best work to your site (quite a tough problem considering many large universities are much more like big companies, or worse government bureaucracies, than startups), you could totally replace the entire system of academic journals.

Think of all the other free extras you would get by having a web app host all journal articles: at minimum, the process of citing references and looking at the background of a paper could be improved: you could visually trace the findings of the paper you’re looking at all the way back to the founding of the field by what each of its references used as references. Search would be a lot better, as would recommendation engines (lots of professors have grad students waste time simply scanning journals for articles that are relevant for their work). If you’re into NLP then you would have a much better dataset and a clear application for doing summarization. And think about the possibilities of social networking or productivity-app type features enabling all sorts of new possibilities for collaboration among people at different universities!

But the real big play is that once you do all this, you’re well on your way to replacing universities themselves, which any undergraduate can tell you are bloated enterprises which spend large amounts of money and pass the costs onto their customers, who accept it because the university system has a monopoly on giving out credentials for people going into the working world.

One of universities’ main products is research, and in many fields (biology, physics) you need the big backing of university (and government) dollars to support research. In many other fields (math, Computer Science, philosophy) you don’t. Researchers in these fields usually need to somehow pay their living expenses, and the actual equipment expenses are minimal. They mainly need: -a place to find like-minded collaborators -credibility for their work (ie, ability to publish in journals). You could give them both of those things. Now people in these fields wouldn’t even need to choose the career path of grad school and then professorship (in other words, staying their entire life in the university monopoly) in order to contribute their research to humanity’s body of knowledge.

So in other words, what you need is to build a HN/Reddit style voting/peer review system that weights the credibility of the voter heavily. Then you need to find some early adopters who are credible enough to lend your own site credibility. Then you could be well on your way to reinventing the academy in a way that is much more democratic and makes its results much more widely available and usable by the public.

Anyone want to build this? My email address is in my profile. Or just go ahead and use this idea yourself - I just really want to be able to use this service somehow, though probably more as a consumer than a producer of research. Maybe someone who actually went to grad school and had lots of papers published themselves would be in a better position to build this idea.
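The core of mlinsey’s proposal, a vote that counts more when the voter is more credible, can be sketched in a few lines. Everything here is an assumption made for illustration: the credibility numbers, the default weight for unknown voters, and the choice of a weighted average are mine, not part of the original comment.

```python
def weighted_score(votes, credibility, default_weight=0.1):
    """Credibility-weighted average of votes (+1 for up, -1 for down).

    Voters missing from the credibility table get a small default weight.
    """
    total = 0.0
    weight_sum = 0.0
    for voter, vote in votes.items():
        weight = credibility.get(voter, default_weight)
        total += weight * vote
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Hypothetical raters and their credibility weights.
credibility = {"established_prof": 5.0, "grad_student": 1.0}

# One credible upvote against two low-credibility downvotes.
votes = {"established_prof": 1, "grad_student": -1, "anonymous": -1}

score = weighted_score(votes, credibility)
plain_mean = sum(votes.values()) / len(votes)
print(score > 0, plain_mean < 0)  # prints True True
```

With a plain head count the paper is voted down, but once credibility is taken into account the single expert upvote carries the result. How the credibility weights get assigned in the first place is exactly the chicken-and-egg problem the comment mentions.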


Learning the COLEMAK keyboard layout

Sunday, July 20th, 2008

I have decided to learn the COLEMAK keyboard layout. The point? It’s not about typing speed. The layout requires far less wrist motion than QWERTY, and it feels very comfortable. You can see that your hands are not moving much (the author claims that your hands travel 2.2 times more on QWERTY).

If you spend most of your time using your keyboard (and if you are an academic, chances are you do), this one-time investment of your time might be worth it. We only have one set of hands for life, and if you imagine all the papers you will write in a lifetime stacked up, you’ll feel the immediate urge to protect your hands :).

I’m not a touch typist on QWERTY, and wanted to learn touch typing, so I decided to go with COLEMAK instead.

It seems that you can only learn for about an hour and a half a day, and thus it will take a month before you can do any work at all. Some people have tried to go cold-turkey, but I have to get actual work done. If you peruse the forums, there are people posting detailed reports on their experiences.

There are lessons available on the website; one is supposed to go through them until reaching 96% accuracy or more. They recommend against relabeling or rearranging the keys. Instead, the way to go seems to be to tape a copy of the layout to your monitor.

BTW, if you suffer from back pain, I have friends who swear by John Sarno’s book.

I’ll post more on how things go for me on the new layout. The good thing is that it’s not an all-or-none change: I can still do QWERTY when I have to get something done under a deadline.


What you should read before starting your PhD

Thursday, July 17th, 2008

Here is one of the best summaries of how things can go wrong when one chooses to follow the academic path. I got this from Hacker News. The author of this well-written piece came from industry, and compares the world he knows with what he encountered in academia.

Things he finds:

  1. Doing a PhD is lonely
  2. Picking the right advisor will determine your happiness level more than anything else
  3. The way you code within the academic world has nothing to do with the way people code in industry

But maybe we already know this.

What I’d like to see is someone writing a similar piece on life after your PhD. I had this silly idea that things would be easier and I’d have more time after my PhD thesis for… you know, hobbies and other stuff normal people do. Nothing could be further from reality.


Science in the 21st Century

Monday, July 14th, 2008

(Conference announcement via Gerry McKiernan)

Science in the 21st Century: Science, Society, and Information Technology

Waterloo, Ontario, Sep 8-12, 2008.

Times are changing. In the earlier days, we used to go to the library, today we search and archive our papers online. We have collaborations per email, hold telephone seminars, organize virtual networks, write blogs, and make our seminars available on the internet. Without any doubt, these technological developments influence the way science is done, and they also redefine our relation to the society we live in. Information exchange and management, the scientific community, and the society as a whole can be thought of as a triangle of relationships, the mutual interactions in which are becoming increasingly important.

(more…)


Who needs theories when one has lots of data?

Tuesday, July 8th, 2008

This article poses an interesting question. Sometimes one has enough data to make accurate predictions without having an understanding of what causes the phenomenon (a model). Nowadays, it’s getting easier and easier to get huge datasets, which are often sufficient to do this.

For example… Google uses massive amounts of misspellings to give ‘on the fly’ corrections. It also uses massive corpora of bilingual texts: for instance, it built its French/English translation engine by feeding it Canadian documents, which are often released in both English and French versions. But they don’t have any theory of language doing smart stuff in the background.
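A toy version of the ‘no theory, just counts’ approach to spelling correction might look like the sketch below. This is not Google’s actual system: the log of (typed, retyped) query pairs is invented, and the rule is simply to suggest whichever correction was observed most often.

```python
from collections import Counter, defaultdict

# Hypothetical log of (what the user typed, what they retyped right after).
LOG = [
    ("recieve", "receive"),
    ("recieve", "receive"),
    ("recieve", "relieve"),
    ("teh", "the"),
]


def build_table(log):
    """Count how often each typed word was followed by each correction."""
    table = defaultdict(Counter)
    for typed, fixed in log:
        table[typed][fixed] += 1
    return table


def suggest(table, word):
    """Suggest the most frequently observed correction, or the word itself."""
    if word not in table:
        return word
    return table[word].most_common(1)[0][0]


table = build_table(LOG)
print(suggest(table, "recieve"))  # prints receive
```

Nothing in this code knows anything about English spelling rules; with enough logged behaviour, the counts do the work.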

So are theories redundant, or obsolete, in a world where one can do proper predictions without them?

Wired’s own Chris Anderson explores the idea:

Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.

Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show.

The point here is that statistics can find patterns in basically any area, so maybe we don’t need a specific science to take care of those problems.

There are issues with this line of thinking. Of course, correlation doesn’t imply causation, so doing just this we’d be blind to cause-effect relationships:

Google’s founding philosophy is that we don’t know why this page is better than that one: If the statistics of incoming links say it is, that’s good enough. No semantic or causal analysis is required.

Comments by Deepak:

We all know that more data means new approaches to science, especially since this has happened so quickly.

We’ve always worked with partial understanding, or in the case of medicine, less than partial understanding, but that’s precisely why medicine is beginning to fail. Not knowing mechanisms, etc is what results in a VIOXX. Not knowing why is what creates the next disaster.

Trying to solve the exact same problems as Google, we have a camp that does think that knowing ‘why’ is important: the semantic web proponents. Under this paradigm, the web would become a huge ontology, and machines would operate with propositions (RDF triplets) to deduce new knowledge. In this case, you do know how the machine reached a certain conclusion. They do face the same huge datasets (i.e., they try to operate with ‘the entire web’ at some point; not now, since only a small fraction of sites use RDF at all), but instead of using the raw content that is prepared for human consumption, they will use machine-ready content.

If, after plowing through petabytes of data, a semantic search engine reaches an interesting conclusion, at least it can show us the logical path it used. The promise for pharmaceutical companies is that they could find new drugs and interactions by just letting the algorithms traverse a corpus of, say, proteins. But, again, in this case, there is no ‘human’ postulating a theory either.

Probably, what all this means is that we scientists will need to adapt our methods to collaborate with these smart machines. There are things, like deep search, that are better left to them, whereas some others, like tagging images, are really hard for machines but trivial for humans.


And the academic most likely to accidentally eradicate human life is…

Sunday, July 6th, 2008

Ok, this is just a quick, relaxing post.
In your view, who is the academic most likely to accidentally eradicate human life?


How to read a book

Wednesday, July 2nd, 2008

My typical day can be divided roughly into thirds: part administration, part analysis/thinking, and part reading. And while I like the new ideas that come with reading, it can be awfully tedious at times. Who wants to spend all day plowing through a big stack of papers and books, especially now that it’s summer? So while this title may be a bit jokey, I’m going to share some serious tips for how to speed up your reading of the most time-consuming materials: books.

(more…)
