I’ve been using my Kindle (the vanilla one) for a while now. What a beautiful device.
Because of my line of work/study, I spend inordinate amounts of time staring at a stupid computer screen. It gets a bit old, especially if I’m trying to read a long article. Backlit screens wear my eyes out. There are things you can do to reduce eye strain, but in the end it’s more comfortable to stare at pieces of paper than at a monitor.
This is where my Kindle shines; the display is basically just paper. Now, rather than being glued to my computer, I can take news, articles, and books with me as if I were carrying around a bunch of paperbacks. I’m sure you’ve already heard this sales pitch before, so I’ll spare you.
Really, I just want to talk about a few things I’ve used to get even more mileage out of my Kindle.
Tools you should totally look into if you have a Kindle
Readability is an awesome service. Basically, it extracts articles from websites. Say, for example, you’re trying to read something on one of those stupid web-3.0 zomg tech news websites and it’s trying to bring down your browser and distract you. You’re sick of the bullshit, so you click the Readability browser extension and ask to just have the article. Now you’ve got a nice, clean copy of the stuff you care about. Awesome.
In addition to the article extraction stuff, Readability is a “bookmarking” service of sorts; you can save articles for later and read them on other devices (iOS, Kindle, etc.). This is the part that lets your Kindle really shine! The browser extension lets you send articles directly to your Kindle.
Kindlefeeder is a handy service that takes RSS feeds from different websites and compiles them down into a beautiful eBook that it sends directly to your device. I’ve just started using it, and I’ve been blown away.
This works best for blogs; I’ve found that tech news sites like Hacker News and Slashdot don’t work well, since they’re really just links to other news sites.
The other day I threw this together for fun. It’s a bit of a hack, but there’s enough going on that it’s worth writing a little about!
The premise is simple: I’m writing a lab report in LaTeX, but most of the data I took was in Excel (what can I say, it’s not bad for calculations…don’t judge me). I want to throw a nice LaTeX table into my report but don’t want to do everything by hand or save it as a CSV file and use a conversion tool.
That’s where Excel => LaTeX comes into play! With the fancy new HTML5 file stuffs it’s possible to parse XLSX files in the browser without having to run any code server-side! All you have to do is drag an XLSX file into the browser and it spews out a LaTeX table. Howcoolisthat?!
XLSX and Office Open XML
Before we dive into the implementation, let’s talk a bit about the XLSX / Office Open XML formats.
Microsoft is totally down with the open standards thing, yo. They’re nice enough to go as far as creating an ECMA standard format (ECMA-376) for all of their Office apps. Granted, the real formats expand quite a bit on OOXML, but even then they publish how those formats work.
For poking around purposes, I created a simple Excel file:
First things first: XLSX files are zip archives, so we need to unzip the file. Here are the contents:
What’s going on here? Lots of things.
Confession: I didn’t read the spec. I figured all of this out by poking around the XML files…so I only understand a very small subset of the format :(
The only directory we care about is xl/, where the data lives.
Here are the important files:
xl/worksheets/sheet1.xml - the data
xl/sharedStrings.xml - table of strings
Let’s take a look at sheet1.xml:
It’s fairly obvious what’s going on here: <row> denotes a row (with the r attribute being the row number), <c> is a cell (one column within that row), and <v> is the cell’s value.
But wait! Where are our precious letters and strings?! They certainly aren’t in any of the value tags!
Calm down. They’re safe and sound in sharedStrings.xml.
Cells have a “t” attribute which denotes the data type. In the case of strings, we see t="s". This means the number in the <v> tag is an index into the string table, and that’s where we’ll find the actual string.
So what’s in sharedStrings.xml?
No explanation is really needed here.
Now, to the actual conversion code!
HTML5, files, shenanigans
The main feature of this is being able to drag files into the browser for conversion. HTML5 provides a super kewl API for this sort of thing.
Everything is as simple as the dragover and drop events. Let’s take a look at a sample:
The dragover event fires repeatedly while the user is dragging the file over the drop target. Most websites use it to display some kind of clue to the user that things are happening.
The real magic happens on the drop event. This gets fired when the file is actually dropped. As you can see, we supplied an event handler to kick off the processing.
Also note that each handler calls stopPropagation() and preventDefault(). We want to make sure that the only things that happen on these events are the ones we specified; otherwise the browser would try to open the dropped file itself.
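Here’s a minimal sketch of that wiring. The attachDropZone helper and its onFiles callback are names I made up for illustration; the events, stopPropagation(), preventDefault(), and dataTransfer.files are the standard bits.

```javascript
// Hypothetical helper: wires a drop zone to a callback that receives the dropped files.
// `zone` is anything with addEventListener (normally a DOM element).
function attachDropZone(zone, onFiles) {
  // Keep the event from bubbling and cancel the browser's default action
  // (for a dropped file, that is usually opening the file).
  function claim(event) {
    event.stopPropagation();
    event.preventDefault();
  }

  zone.addEventListener('dragover', function (event) {
    claim(event);
    // Hint that dropping here will copy the file rather than move it.
    if (event.dataTransfer) {
      event.dataTransfer.dropEffect = 'copy';
    }
  });

  zone.addEventListener('drop', function (event) {
    claim(event);
    // dataTransfer.files is an array-like FileList; copy it into a real array.
    onFiles(Array.prototype.slice.call(event.dataTransfer.files));
  });
}
```

In a page you would call something like attachDropZone(document.getElementById('drop'), handleFiles), where both the element id and handleFiles are placeholders.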
Now let’s take a peek at the event handler:
We can grab the File objects passed in with the dataTransfer.files thingamajig. You can find more info on the File object in the W3C spec.
Thankfully, though, someone was one step ahead of me and wrote zip.js, which takes care of all of this magic for me. It’s a beautiful library.
Passing in the binary blob we got in the previous step, it spits out an array of File objects for us to use.
This is all straight out of the zip.js documentation for the most part. Nothing crazy: we just give a callback for each file that gets read in.
Note that we filter out all files that aren’t the worksheets or the string table.
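The filter itself can be tiny. This is a sketch, not the original code: isWantedEntry and pickWantedEntries are hypothetical names, and I’m assuming each archive entry exposes its path as a filename property.

```javascript
// Worksheets live directly under xl/worksheets/; the string table has a fixed path.
var WORKSHEET_PATH = /^xl\/worksheets\/[^\/]+\.xml$/;
var SHARED_STRINGS_PATH = 'xl/sharedStrings.xml';

// True for the only two kinds of files the converter needs.
function isWantedEntry(filename) {
  return filename === SHARED_STRINGS_PATH || WORKSHEET_PATH.test(filename);
}

// Assumes each entry carries its archive path in `filename`.
function pickWantedEntries(entries) {
  return entries.filter(function (entry) {
    return isWantedEntry(entry.filename);
  });
}
```

Anchoring the pattern at both ends keeps out things like the .rels files that sit in subdirectories of xl/worksheets/.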
Another thing to keep in mind: all of this file IO is asynchronous…you can’t read the file and move on to other code that requires data from it. Structure things in such a way that all of the file handling is separated into functions that call each other rather than procedural code which expects the file reading to block.
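One way to structure that, sketched with plain callbacks: count how many reads are still pending and only move on when the last one reports back. readAllEntries and the readText(entry, callback) reader are hypothetical stand-ins for whatever the zip library gives you.

```javascript
// Hypothetical helper: reads every entry with an asynchronous reader and calls
// `done` exactly once, after the last result has arrived.
// `readText(entry, callback)` is an assumed callback-style reader.
function readAllEntries(entries, readText, done) {
  var texts = {};
  var pending = entries.length;

  // Nothing to wait for.
  if (pending === 0) {
    done(texts);
    return;
  }

  entries.forEach(function (entry) {
    readText(entry, function (text) {
      texts[entry.filename] = text;
      pending -= 1;
      // Only the final callback hands control to the next stage.
      if (pending === 0) {
        done(texts);
      }
    });
  });
}
```

The parsing code then lives inside done, so it never runs before the data it needs exists, no matter what order the reads finish in.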
Extracting the data
So now that we have a buncha XML in the form of strings, we can start the parsing process!
This is pretty simple: we use jQuery’s parsing abilities to create a new DOM tree as a jQuery object. From here we can use our favorite jQuery selectors to grab the data.
Here’s an example where we build the string table (we do this first so we can substitute the string values when reading in the worksheet itself):
Pretty simple, eh? The string table is just an array; when values refer to strings in the worksheet they simply give us the index into the table.
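To make that concrete, here’s a sketch that builds the table from the XML string. It deliberately swaps jQuery for a couple of regular expressions so it runs outside the browser too, which means it’s not a real XML parser; buildStringTable and decodeXmlEntities are names I picked for the example.

```javascript
// Turn the five predefined XML entities back into plain characters.
// &amp; goes last so "&amp;lt;" is not decoded twice.
function decodeXmlEntities(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&');
}

// Each <si> element is one string; its position in the file is its index.
function buildStringTable(xml) {
  var table = [];
  var items = xml.match(/<si>[\s\S]*?<\/si>/g) || [];

  items.forEach(function (item) {
    // A string may be split over several <t> runs (rich text), so join them all.
    var textTag = /<t(?:\s[^>]*)?>([\s\S]*?)<\/t>/g;
    var text = '';
    var match;
    while ((match = textTag.exec(item)) !== null) {
      text += match[1];
    }
    table.push(decodeXmlEntities(text));
  });

  return table;
}
```

A cell whose value is 1 and whose type is "s" then simply means table[1].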
Now for the main event: reading in the table itself!
There’s not much going on there; we read in each row, grab the cells, get the data. If it’s a string, we look it up in the string table and call latexEscape(), which escapes characters that have special meaning in LaTeX so there are no conflicts in the final output.
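A rough sketch of both pieces, again with regular expressions standing in for jQuery so it runs anywhere. latexEscape here is my own guess at what such a function needs to cover, and readWorksheet is a hypothetical name. It ignores the r attribute on cells, so gaps in a sparse row would collapse.

```javascript
// LaTeX special characters and their escaped forms.
var LATEX_SPECIALS = {
  '\\': '\\textbackslash{}',
  '&': '\\&', '%': '\\%', '$': '\\$', '#': '\\#', '_': '\\_',
  '{': '\\{', '}': '\\}',
  '~': '\\textasciitilde{}', '^': '\\textasciicircum{}'
};

// Single pass, so the backslashes we add are never escaped again.
function latexEscape(text) {
  return text.replace(/[\\&%$#_{}~^]/g, function (ch) {
    return LATEX_SPECIALS[ch];
  });
}

// Returns a 2D array of cell texts; `strings` is the shared string table.
function readWorksheet(xml, strings) {
  var rows = xml.match(/<row\b[\s\S]*?<\/row>/g) || [];
  return rows.map(function (row) {
    // Match self-closing (empty) cells first so they don't swallow the next cell.
    var cells = row.match(/<c\b[^>]*\/>|<c\b[^>]*>[\s\S]*?<\/c>/g) || [];
    return cells.map(function (cell) {
      var isString = /^<c\b[^>]*\bt="s"/.test(cell);
      var value = /<v>([\s\S]*?)<\/v>/.exec(cell);
      if (!value) {
        return '';
      }
      if (!isString) {
        return value[1];
      }
      // For strings, <v> holds an index into the string table.
      var text = strings[Number(value[1])];
      return latexEscape(text === undefined ? '' : text);
    });
  });
}
```

Numbers pass through untouched as text, which is all the table generator needs.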
We’ve now got the data built up in a 2D array! We can now generate the LaTeX table!
This part was a little messy. I’d prefer not to go into the code, since I’d like to rewrite it in the near future. It’s a huge mess and I feel like there are a few bugs in it. But here it is anyways:
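For the general shape of it, here’s a much smaller sketch (not the original implementation): it takes the 2D array of already-escaped cell texts and emits a plain tabular environment. toLatexTable is a hypothetical name, and centering every column is just an assumption to keep things short.

```javascript
// rows: 2D array of cell texts that have already been LaTeX-escaped.
function toLatexTable(rows) {
  // The widest row decides how many columns the table has.
  var columnCount = rows.reduce(function (max, row) {
    return Math.max(max, row.length);
  }, 0);

  // One centered column per cell, e.g. {ccc} for three columns.
  var lines = [
    '\\begin{tabular}{' + new Array(columnCount + 1).join('c') + '}',
    '\\hline'
  ];

  rows.forEach(function (row) {
    // Pad short rows so every line has the same number of cells.
    var cells = row.slice();
    while (cells.length < columnCount) {
      cells.push('');
    }
    lines.push(cells.join(' & ') + ' \\\\');
  });

  lines.push('\\hline', '\\end{tabular}');
  return lines.join('\n');
}
```

Cells are joined with & and each row ends in \\, which is why the escaping step has to happen before this one.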
This was a fun hack to throw together. It’s far from perfect, but it does exactly what I need it to for basic numerical and string data.
There are a few things I’d like to add in the future:
* Handle formatting, such as the number of decimal places for numbers
* Text formatting, such as bold and italics
* Border styles