Getting Text from LaTeX without line breaks

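There are quite a few situations where you want to extract plain text from a LaTeX document. (For me it is mainly when I send a manuscript out for English proofreading, but there are also ○○ journals out there that only accept Word documents, so it is needed when pouring the text into Word as well...) This is a memo on how to do that with as little manual work as possible.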

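Using pdftotext

Use pdftotext, which ships with xpdf. On a Mac,

$ port install xpdf-japanese

is all it takes.

To use it, run

$ pdftotext submit.pdf

and the text is saved to submit.txt.
Apart from a few leftover page breaks (^L) and garbled bullet characters from itemize, the conversion quality is very high, and the text can be used almost as is.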
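
The leftover page breaks (^L) can be stripped in the same step. The following is a minimal sketch, not part of the original workflow: strip_page_breaks is a hypothetical helper name, and it assumes a pdftotext that writes to standard output when the output file name is given as "-". The garbled itemize bullets are not handled here, since which character they turn into depends on the PDF.

```shell
#!/bin/sh
# Hypothetical helper: delete the form feed characters (^L) that
# pdftotext leaves at page boundaries. Reads stdin, writes stdout.
strip_page_breaks() {
  tr -d '\f'
}

# Convert only when pdftotext and the input file are both available.
if command -v pdftotext >/dev/null 2>&1 && [ -f submit.pdf ]; then
  # "-" as the output name sends the extracted text to standard output.
  pdftotext submit.pdf - | strip_page_breaks > submit.txt
fi
```

Because the helper is an ordinary filter, it can also be applied to an already converted file, for example with strip_page_breaks < submit.txt > submit-clean.txt (submit-clean.txt being an arbitrary output name).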

Addendum

xpdf no longer ships with pdftotext.
Install poppler, a project derived from xpdf, instead.

$ port install poppler

That's all.

2 months

After two months, my daily life has finally become stable.
Taiji is going to kindergarten and learning English step by step.
He can remember by ear what native speakers say and pronounce it just as he hears it. Very good job…

I have thus started research work with SAIL.
We are now discussing three themes on which we can collaborate.
Students at KIT are also involved in this collaboration, and perhaps I will invite some of them to Canada during my stay.

Dispatch to Canada

Osamu Mizuno applied to the dispatch program for young researchers at KIT.
Today, I was notified that my application has been accepted. I am going to stay at Queen’s University in Canada from March 2012 to January 2013.

For this reason, the Software Engineering Laboratory will postpone the recruitment of new B4 students in 2012. From 2013, however, we will resume recruiting new M1 and B4 students.