helps you use the command line to work through common challenges that come up when working with digital primary sources.
. . . click on a button and it will show you a basic command, broken down to show what each piece does.
. . . for more descriptive breakdowns, explore with explainshell.
. . . before executing commands, review and install dependencies.
. . . consider contributing.
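One way to review dependencies before running anything is to ask the shell which tools it can find. The sketch below is not part of the original page: the function name `missing_tools` is hypothetical, and the tool list is simply taken from the commands shown here.

```shell
# Print the name of each listed tool that is not on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tools used by the commands on this page (textutil ships with macOS only).
missing=$(missing_tools pdftotext tesseract textutil pandoc mogrify ffmpeg wget)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Not installed:" $missing
else
  echo "All tools found."
fi
```

Anything reported as not installed needs to be installed before the matching command will run.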
Sometimes you want to take text data out of searchable pdfs, so you can begin to use text analysis methods and tools.
This will convert all text-searchable pdfs in a folder to text files.
for file in *.pdf; do pdftotext "$file" "$file.txt"; done
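The command above names each output after the full pdf name, so report.pdf becomes report.pdf.txt. If you would rather get report.txt, a variation along these lines should work. It is a sketch, and the function name `pdfs_to_txt` is made up for illustration.

```shell
# Convert every pdf in the current folder, naming each output report.txt
# rather than report.pdf.txt.
pdfs_to_txt() {
  for file in *.pdf; do
    [ -e "$file" ] || continue   # skip the literal pattern when no pdfs match
    # ${file%.pdf} is the file name with the trailing .pdf removed.
    pdftotext "$file" "${file%.pdf}.txt"
  done
}

# Usage: cd into the folder of pdfs, then run: pdfs_to_txt
```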
Sometimes you want to generate text data from images of pages, so you can begin to use text analysis methods and tools.
This will convert all tiff images of pages in a folder to text files.
for i in *.tiff; do tesseract "$i" "yourfoldername_$i"; done
Sometimes you want to remove HTML markup from webpages you save, so you can begin to use text analysis methods and tools.
This will remove markup from HTML files and convert them to txt files (textutil is a macOS utility).
textutil -convert txt *.html
If you take notes in markdown, sometimes you want to publish them as html.
This will convert a md file to an html file.
pandoc foobar.md -f markdown -t html -s -o foobar.html
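If you have a whole folder of markdown notes, the same pandoc options can be wrapped in a loop. This is a sketch under the assumption that your notes end in .md; the function name `md_to_html` is hypothetical.

```shell
# Convert every .md file in the current folder to a standalone html file
# with the same base name (notes.md becomes notes.html).
md_to_html() {
  for file in *.md; do
    [ -e "$file" ] || continue   # skip the literal pattern when nothing matches
    pandoc "$file" -f markdown -t html -s -o "${file%.md}.html"
  done
}

# Usage: cd into the folder of notes, then run: md_to_html
```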
Sometimes you want to generate a simple slideshow from a text file that uses markdown syntax.
This will convert a txt file to an html slideshow.
pandoc -s --webtex -i -t slidy input_filename -o slideshow_name.html
If you have a bunch of PDFs of images, studying them as images computationally is easier if you change them to an image format.
This will split multipage PDFs and convert them to individual PNG files.
find ./ -name "*.pdf" -exec mogrify -format png {} \;
Sometimes you want to convert a segment of a video to an animated gif for the web, or for a bit of presentation fun.
This will convert the first 13 seconds of an mp4 video file into an animated gif.
ffmpeg -ss 0 -t 13 -i inputfile.mp4 outputfile.gif
Sometimes you have a large video and want to resize it to make it more accessible.
This will resize a video to 640 by 480 pixels.
ffmpeg -i inputfile.mp4 -vf scale=640:480 outputfile.mp4
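A fixed 640:480 will stretch any video that is not already 4:3. ffmpeg's scale filter accepts -2 for one dimension, which keeps the aspect ratio and rounds that dimension to an even number. A sketch, with a hypothetical function name `resize_keep_aspect`:

```shell
# Resize a video to a given width and let ffmpeg work out the height.
# Arguments: input file, output file, target width in pixels.
resize_keep_aspect() {
  ffmpeg -i "$1" -vf "scale=$3:-2" "$2"
}

# Usage: resize_keep_aspect inputfile.mp4 outputfile.mp4 640
```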
Sometimes you have a video file and want to extract the audio.
This will extract the audio from a video file and save it as a 192 kbps mp3.
ffmpeg -i inputfile.mp4 -b:a 192K -vn outputfile.mp3
Sometimes you want to save work on the command line, so you can remember what steps you took with your data.
This will save command line history to a txt file.
history > history.txt
Sometimes you need to edit many file names, so that you can make them more consistent.
This will remove the first occurrence of 'foobar' from the name of every txt file in a folder; swap 'foobar' for whatever text you want to strip.
for file in *.txt; do mv "$file" "${file/foobar/}"; done
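Renaming in bulk is hard to undo, so it can be worth previewing the result first. The sketch below prints what would change unless you pass `run`, and refuses to overwrite an existing file. The function name `strip_foobar` is hypothetical, and the substitution is written with POSIX parameter expansion (rather than `${file/foobar/}`) so that it also works in shells other than bash.

```shell
# Preview, or perform, the removal of the first 'foobar' from .txt file names.
# Call with no argument to only print what would happen; call with "run" to rename.
strip_foobar() {
  mode="${1:-dry}"
  for file in *.txt; do
    case "$file" in
      *foobar*) ;;        # name contains the text to strip
      *) continue ;;      # nothing to change (also skips an unmatched *.txt)
    esac
    # Text before the first 'foobar' plus the text after it.
    new="${file%%foobar*}${file#*foobar}"
    if [ -e "$new" ]; then
      echo "skip: $file (target $new already exists)"
    elif [ "$mode" = "run" ]; then
      mv -- "$file" "$new"
    else
      echo "would rename: $file -> $new"
    fi
  done
}

# Usage: strip_foobar        (preview only)
#        strip_foobar run    (actually rename)
```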
Sometimes you want to download many documents or images that are linked from a webpage.
This will use a list of item URLs to automate download of items.
wget -w 10 --limit-rate=20k -i item_urllist.txt
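URL lists pasted together from spreadsheets or webpages often pick up blank lines, Windows line endings and duplicates. Before handing the list to wget you could tidy it with something like the following sketch; the function name `clean_urllist` and the output file name are assumptions for illustration.

```shell
# Write a cleaned copy of a URL list: strip carriage returns, drop blank lines,
# and remove duplicates while keeping the original order.
# Arguments: input list, output list.
clean_urllist() {
  tr -d '\r' < "$1" | awk 'NF && !seen[$0]++' > "$2"
}

# Usage: clean_urllist item_urllist.txt item_urllist_clean.txt
#        wget -w 10 --limit-rate=20k -i item_urllist_clean.txt
```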
Sometimes you want to traverse multiple levels of a website to download certain file formats like images.
This will save all the .jpg and .jpeg files found within three levels of links from 'http://www.foo.bar'.
wget -w 1 -r -l 3 -A jpeg,jpg --limit-rate=100k http://www.foo.bar
⚒ by Thomas Padilla and James Baker, adapted from ffmprovisr and Script Ahoy