The Newseum displays over 600 daily newspaper front pages from around the world in their original, unedited form (they also have a map version). Pretty cool.
Dave Bednarski, a Signal vs. Noise reader, sent over a really cool Mac OS X Automator script (download the script) to pull the front pages from the papers you like and combine them into a single PDF. It’s a great way to build your own front-page-only headline newspaper!
Download the Automator Script (original updated by Karim)
Dave Bednarski
on 16 Apr 08One additional note… I’ve noticed most US front pages are updated for download around 6:30AM EST… if you’re thinking about setting it up to run automatically each day.
Paul M. Watson
on 16 Apr 08Nice script, works well and is useful. Thanks guys.
Dan Glegg
on 16 Apr 08That’s awesome. Automator is sorely underused for all the potential it has.
Mauricio Gomes
on 16 Apr 08I made some changes to the workflow so it doesn’t create so many PDFs on your desktop. I create a tmp folder on the desktop and then concatenate all the headline PDFs into a single one called “Todays Headlines.pdf” on the desktop and then open it. I then move the tmp folder to the trash. Very cool idea though!
http://edge14.com/headlines.zip
JF
on 16 Apr 08Mauricio, can I download that script and host/link it on our server?
Randy Kopplin
on 16 Apr 08Any suggestions on how to run a similar script in a Windows environment?
Mauricio Gomes
on 16 Apr 08JF, absolutely.
Dave Bednarski
on 16 Apr 08By turning them into images you lose the ability to zoom in, etc…
JF
on 16 Apr 08Good point, Dave. Any ideas on how to take what Mauricio did and use PDFs instead?
Mauricio Gomes
on 16 Apr 08That’s true Dave. That is a cool feature to have. I changed it to be more like yours, as in it doesn’t generate a new PDF, but rather just displays it in Preview.
Although, have you guys had it move the temp folder it creates to the trash? I notice in my workflow it has my particular username in there, not sure if it replaces it with ~/Desktop in the workflow.
Re-download from my original link.
Tim Lundin
on 16 Apr 08Great concept / idea, and having looked at both Bednarski’s version and Gomes’s version I have a couple of comments.
While both give the reader one document to read in Preview, I prefer the high-resolution result of Bednarski’s version over the one-size-fits-all “New PDF from Images” output of Gomes’s version. Bednarski’s provides the same thing at a better quality that stands out and says “read me”.
On that same note, I like the idea of setting this up in iCal to run every morning at 6:45am, so when I go to my computer my newspapers are open and waiting for me and I have no choice but to look at them or close them. Bednarski’s version allows for this; Gomes’s saves it to the desktop and I have to remember to open it. If you have ever seen my desktop, you will know that it won’t happen.
With that said, I like the disposal function that Gomes uses to rid the excess files.
Lastly, one might consider adding the “mail” function to this script, so your newspapers can be sent to you or others when you’re out of town.
Karim
on 16 Apr 08Jason, I’ve combined the best of both into a new version that uses the /tmp folder for storage and still generates a single PDF named ‘Todays Headlines.pdf’ and emailed it to you.
Owen van Dijk
on 16 Apr 08Is this Leopard only? Can’t get it to work on 10.4
JF
on 16 Apr 08Thanks Karim. I’ve replaced the link with the updated file.
Jason
on 16 Apr 08I just visited the Newseum and spent 6 hours there. Anyone visiting DC should check it out.
Mauricio Gomes
on 16 Apr 08Karim,
The only problem with your version seems to be that it doesn’t empty out the PDFs dropped into /tmp. Every time the script runs it just appends -2, -3, etc to each of the PDFs. After the first run for me, it also seems to not work.
Check out the updated one I mentioned above. It opens up the combined PDF in preview as well as clean up after itself.
Also as a sidenote, I added the app to my applications folder and now launching it with Quicksilver is really awesome.
Martin
on 16 Apr 08Randy: I have never seen an app like Automator for Windows … I’m sure someone knows a way to do it tho. Automator is an amazing little program.
Tomahawk
on 16 Apr 08@Jason I would visit it, if it wasn’t so expensive. $20 a person is ridiculous, considering everything around it is free (I’m not saying it should be free, but $10 would be more reasonable). If I were to go there with my wife and child, it would easily cost me over $100 after paying for admission and getting a light meal.
Josh Walsh
on 16 Apr 08Very cool idea. Thanks for sharing this.
Karim
on 16 Apr 08@Mauricio: I don’t mind it creating files in the /tmp directory, that’s what it’s for and it gets deleted when you restart your computer anyway. There does seem to be a bug with it, due to it trying to rename the file and it can’t rename it if there’s already one there. So I just took that part out from the workflow and don’t really care about the filename being something random since it’s in tmp anyway. If I really want to give it a good filename and save it, I can do that myself in Preview.
Aaron Kassover
on 16 Apr 08Cool! Feels like we’re watching the rapid evolution of a miniature open source project right here in the comments. Dave B., thanks for sharing your idea with us. Mauricio and Karim, thanks for taking the idea and running with it.
dimboo
on 17 Apr 08Hmm, I’m not getting this to work on 10.4.10. The PDF URLs that “Get Link URLs from” returns all look like this (other URLs look fine):
http://media/dfp/pdf17/WSJ.pdf
Any clues?
Josh
on 17 Apr 08This is great. Not only will the workflow application be incredibly useful, informative and fun to use, but I’ve just learned quite a bit about Automator, and can’t wait to start using it for other applications!
I have checked out both the versions, Dave/Karim’s, and Mauricio’s. It was confusing at first because the workflow files of both don’t match up with their companion “Application” files.
I was thrown for a loop when I first downloaded and ran Dave/Karim’s, I didn’t understand why I was only getting one paper. Turns out, in the Application file, only LA Times is enabled. The original workflow though (the one in this article’s screenshot) includes four papers that are all enabled.
In Mauricio’s, the workflow includes the same four (that are all enabled.) But in the accompanying Applications file, three more MI-based papers are included (and enabled.) Also, his workflow creates a “New PDF from Images” but the Application instead “Combines [the] PDF Pages.” As mentioned above, the “New PDF from Images” step isn’t ideal for this purpose. Not only does it remove the ability to zoom in as cleanly, it also renders the final output’s text to be unsearchable. With “Combine PDF Pages” instead, not only is the final document fully zoomable with beautiful results, but the text itself on the pages is searchable, which is super-cool to see in action!
Now, I like Mauricio’s strategy of creating a folder on the desktop for the files to get downloaded into, and then having that folder be moved to the trash. Since I’m on a laptop that’s not always on, I plan to run this Application manually from time to time, so having items moved to the trash for me to empty later isn’t that big of a deal. The alternative, of having the downloaded - not so tiny - files clog up the /tmp folder, doesn’t sound as appealing to me… especially since I rarely restart this laptop. (So they’d be sitting there for quite a while.) Every time I ran the Application I’d be thinking of that /tmp space that wouldn’t be reclaimed until I restarted the laptop, whenever that would be. Mauricio’s method also means I can go in and save/print/email any of the documents that day if I choose to.
I’ll keep tinkering and combining my favorite aspects of both… adding a few other papers’ headlines I’d like to see, and seeing what other Automator goodness I can get going.
One other question, if you’re still reading. I notice the original download is called “Headlines – random” and didn’t understand why. But that inspired me to think… would there be any way to incorporate randomness into the workflow? That is to say, have it also download a random Newseum PDF from somewhere in the US or world? I’ve got mine set to download the same 6 papers whenever I run it, and I imagine that could get stale/repetitive rather quickly. It might be nice to be surprised/enlightened by the inclusion of some other, random headlines from new papers as well.
I did some googling and found this Newseum project from someone with a similar idea: http://roger.carbol.com/newseumnow/index.html ... but his “Random Page” appears to be dead. If there’s a link on Newseum’s page that includes a random daily PDF we could use that, but I wasn’t able to find one. Anyone else?
Thanks again for sharing such a cool idea. I’m eager to use it, and pass it along to others.
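For the "random paper" idea, one option is to pick the surprise papers yourself rather than rely on a random link from Newseum. The following is only a rough Python sketch of that selection step, outside Automator. The paper codes are the ones visible in the PDF URLs quoted in this thread, and the `XX_` codes are made-up placeholders standing in for whatever other papers you would add to the pool.

```python
import random

def pick_papers(favorites, pool, extra=1, rng=random):
    """Return the favorites in order, followed by `extra` random picks from the pool."""
    # Never pick a paper that is already among the favorites.
    candidates = [code for code in pool if code not in favorites]
    extra = min(extra, len(candidates))
    return list(favorites) + rng.sample(candidates, extra)

# Codes taken from the PDF URLs quoted in the comments.
favorites = ["CA_LAT", "NY_NYT", "DC_WP", "WSJ"]
# Placeholder codes (hypothetical): replace with real papers you want in the pool.
pool = favorites + ["XX_ONE", "XX_TWO", "XX_THREE"]

todays = pick_papers(favorites, pool, extra=2)
print(todays)
```

The favorites always come first and in the same order, so only the tail of the list changes from day to day.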
Josh
on 17 Apr 08Speaking of “random,” it seems like the resulting “Combined PDF” shows the headlines in no particular order, and it changes each time I run the workflow. It certainly doesn’t match the order I set up in the “Get Specific URL” stage.
I’d like my local paper to always be the first page, the first displayed. Any way to accomplish that?
jc
on 17 Apr 08Cool! I’ve never used Automator before, finally something useful for it to do. Now how do I attach the resulting document to a Mail message to send out? Also how can this script automatically run every day at a certain time? Thanks in advance.
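On the mail question, here is a minimal sketch of the idea in Python rather than Automator: it builds a message with the combined PDF attached and leaves the sending step to you. The function name, subject line, and addresses are placeholders, and the commented-out sending line assumes a mail server you would have to supply.

```python
from email.message import EmailMessage
from pathlib import Path

def build_headlines_email(pdf_path, sender, recipient):
    """Create an email with the combined PDF attached (this does not send it)."""
    pdf_path = Path(pdf_path)
    msg = EmailMessage()
    msg["Subject"] = "Today's headlines"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Front pages attached.")
    # Attach the PDF bytes under the file's own name.
    msg.add_attachment(
        pdf_path.read_bytes(),
        maintype="application",
        subtype="pdf",
        filename=pdf_path.name,
    )
    return msg

# Sending is left out on purpose; with a reachable SMTP server it could look like:
#   import smtplib
#   with smtplib.SMTP("smtp.example.com") as server:   # hypothetical host
#       server.send_message(msg)
```

Keeping the "build" and "send" steps apart makes it easy to check the attachment before anything actually goes out.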
stuart
on 17 Apr 08@jc – save it as an application, and in iCal, set an event for whatever time of day you like. In the alarm field, set the alarm type to ‘application’ and find your Automator app in the file select window. Set the alarm to go off whenever you want to invoke the download process.
I have this set up and working, and to alleviate the problem of so many PDF files lying about, I’ve just created another Automator app, and scheduled it to run 15 minutes before the original one each day, to wipe all of the PDF files from my ‘papers’ folder. This is such an awesome idea. I love the thought of having the front pages waiting for me when I sit down to my computer in the morning. Wish I had the back pages too :(
winkyeah
on 17 Apr 08Very interesting! I am also having a problem with 10.4. The “Filter URLs” block strips out the host names so we get URLs like this:
http://media/dfp/pdf17/CA_LAT.pdf, http://media/dfp/pdf17/NY_NYT.pdf, http://media/dfp/pdf17/DC_WP.pdf, http://media/dfp/pdf17/WSJ.pdf
Is there any way to insert the original host names?
Rahul Sinha
on 17 Apr 08I second Josh’s request; I modified Dave/Karim’s to spit out the file to my desktop (and write over the previous day’s) rather than opening Preview automatically, from a temp file, but have no idea how to space the image downloads such that they arrive in the order the download requests go out…
Thanks! -RS
Josh
on 17 Apr 08@stuart, re: “Wish I had the back pages too :(”
There are quite a few services out there that deliver full PDF-style files of newspapers, with the same formatting and full articles as their dead-tree counterparts. (Sort of like Zinio, but for daily newspapers, instead of magazines.)
Of course, they cost some money.
Here are some, and I’m pretty sure I’ve heard of more: http://select.nytimes.com/gst/timesreader.html http://www.newsstand.com/ http://www.newspaperdirect.com/ http://www.pressdisplay.com/pressdisplay/helpandsupport.aspx?subpage=PressdisplayOverview
Sure, it’d be great if this Automator workflow included complete articles in the PDF, or clickable links to the online stories, but it’s free, and great for a quick glance at the headlines in the morning. If I want to read more I guess I can just go to the parent website of that newspaper and read it there. This workflow is great for my casual interest in the news, not to mention my limited attention span. :)
jc
on 17 Apr 08@stuart, thanks! I have it working nicely.
When this script saves the PDF to “/tmp”, where is this folder? What would need to be done to save it to another folder and only keep the current day’s copy (or the last two or three)?
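The "keep only the last few days" part can be sketched in a few lines of Python, assuming the combined PDFs are being saved into a folder of your choosing instead of /tmp. The function name and the default of three copies are illustrative, not part of any of the workflows posted here.

```python
from pathlib import Path

def prune_old_pdfs(folder, keep=3):
    """Delete all but the `keep` most recently modified PDFs in `folder`.

    Returns the names of the files that were removed.
    """
    pdfs = sorted(
        Path(folder).glob("*.pdf"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    removed = []
    for old in pdfs[keep:]:
        old.unlink()
        removed.append(old.name)
    return removed
```

Calling it with your own folder path and `keep=1` would leave only the current day's copy. Only files ending in `.pdf` are touched, and note that they are deleted outright rather than moved to the trash.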
Mustafa
on 17 Apr 08There is a bug in the Tiger version of the Automator. It will not resolve relative URLs correctly. You can see the relative URLs in the page HTML.
I believe it is fixed on Leopard.
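To illustrate what "resolving a relative URL" means here: a link in the page is combined with the address of the page it came from, so the host name is carried over. Below is a small Python sketch of that step. The base page address and the exact relative form of the PDF links are assumptions for the sake of the example, not copied from the Newseum page source.

```python
from urllib.parse import urljoin

# Assumed address of the page listing the front pages (hypothetical).
BASE_PAGE = "http://www.newseum.org/todaysfrontpages/"

def resolve_links(base_url, hrefs):
    """Turn relative hrefs into absolute URLs; absolute ones pass through unchanged."""
    return [urljoin(base_url, href) for href in hrefs]

links = resolve_links(BASE_PAGE, [
    "/media/dfp/pdf17/WSJ.pdf",      # root-relative link, the form assumed here
    "http://example.com/other.pdf",  # already absolute
])
for link in links:
    print(link)
```

A result like http://media/dfp/pdf17/WSJ.pdf is what you would get if the path were attached to the scheme without the host, which matches the symptom reported on Tiger.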
Josh
on 17 Apr 08One weird thing I am noticing about Automator. If I edit an Application file I have previously made, and then try to save it, the changes don’t stay.
I have to go back to the original workflow file, make changes there, and then “Save As” an Application to make my changes stick.
I am on Leopard.
Andy
on 17 Apr 08I’m running Tiger and had the same problem mentioned above. I entered the pdf url directly (if you click on the “Readable PDF” link at the top right) and then I removed the “Get Link” and “Filter URL” steps in the workflow. Works great.
Thanks for all the ground work though.
Josh
on 18 Apr 08So I created an iCal schedule to launch this, every morning at 8 AM.
I have a MacBook Pro so I just figured it wouldn’t run if the lid was closed or was otherwise sleeping. Not so!
I was happily surprised to see it runs anytime after 8 AM, just as soon as you wake your laptop up.
Not sure why I expected otherwise, but I did, and if anyone else out there did too, then this note’s for you.
Rob
on 18 Apr 08I’m not quite sure I follow you Josh.
So you’re saying this will NOT run while the Mac sleeps???
Or it still will, while the lid is still closed???
Or it won’t run UNTIL you wake your Mac??
I’m very confused…
Josh
on 18 Apr 08Nothing happens at all when the MacBook’s lid is closed.
(I’m not sure what would happen on a desktop that was sleeping. I suspect it would run fine, waking up the Mac as necessary?)
All I’m saying is, with an 8 AM scheduled kickoff for this task, when I open and use the MacBook instead at, say, 8:17 AM, the Automator application launches right away. For whatever reason, I’d expected “missed tasks” to be skipped over, and was pleasantly surprised to see iCal actions perform more intelligently in this regard.
Make sense now?
Duncan
on 18 Apr 08Thanks for putting this together. I made a Vancouver edition of this automator script. If anyone is interested they can grab it here:
zip file | blog post
Lazarus
on 19 Apr 08Fantastic idea for a script! I was already aware of Newseum but I never even thought of setting up an Automator script to grab them on a daily basis.
I have made a couple of changes to my own version though. I would suggest adding two “Rename Finder Items” actions so that you don’t end up with an arbitrary filename. I have the first rename set to ‘Name Single Item’ -> ‘Basename Only’ to “Daily News”. The second is set to ‘Add Date or Time’ so that the file is dated and I know when it was created. The formatting is up to personal preference but I currently use ‘Created,’ ‘After Name,’ ‘Space,’ ‘Year Month Day,’ ‘Dash,’ ‘Use Leading Zeros’ which produces a format like:
Daily News 2008-04-19.pdf
Which I find works quite well.
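For anyone scripting this outside Automator, the same naming scheme is a one-liner. This is just a sketch in Python; the function name is made up for illustration.

```python
from datetime import date

def dated_name(base, day, ext=".pdf"):
    """Build a name like 'Daily News 2008-04-19.pdf' from a base name and a date."""
    return f"{base} {day.isoformat()}{ext}"

print(dated_name("Daily News", date(2008, 4, 19)))  # Daily News 2008-04-19.pdf
```

A year-month-day date with leading zeros has the side benefit that sorting the files by name also sorts them by date.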
(NOTE: If you are renaming the file, you need two more actions before the rename. The first, “Get Selected Finder Items” to grab the relevant file, but this will NOT WORK on its own because that also grabs the script. You need to add a “Filter Finder Items” configured to ‘File type,’ ‘is,’ ‘PDF File’ which will make sure only the news file is grabbed. With that said, it was originally doing it when I was first working with a newly saved file, and it was finding the old filename as part of that action. Now that I have closed and reopened the new file clean, the action is only returning the PDF. Still, it may be an issue some people run into, and it certainly doesn’t hurt to just double check that the correct file is being handled.)
So in summary: 1. Get Selected Finder Items 2. Filter Finder Items -> File type is PDF 3. Rename Finder Items -> Name Single Item, “Daily News” (or some variant) 4. Rename Finder Items -> Add Date or Time (as desired)
Once this is done, I also added a “Move Finder Items” to move the file to the Desktop, but of course you could put the file somewhere else into a folder so that the dated file format would allow you to build up a collection of whatever newspapers you read.
Timestamping the file isn’t going to be the way to go for everyone, particularly people who just want to be able to open the exact same file and get the day’s headlines without a pile of them building up, but it’s a good way to make sure that your script ran properly if you schedule it to run, it lets you keep an eye on when the file was created, and if some interesting story comes up that you want to keep but forget about it, you don’t want a script coming along and wiping the file. Even without the timestamping though, it’s still a good idea to rename the PDF and drop it in a more relevant folder.
In terms of putting the files into another temporary folder as Mauricio suggested, I think that’s a really good idea because like Josh, I don’t fully reboot the machine too often, and I don’t want a lot of temp files clogging the system up until then. The way I did it (I haven’t looked at Mauricio’s variation as yet, it may be the same) was to create a “New Folder” action at the very top of the workflow (so it doesn’t interfere with the I/O of anything else) and set it to create a folder called “News” on the Desktop. At the bottom, I did a “Get Specified Finder Items”, defined the ‘News’ folder, then a “Filter Finder Items” to exclude a file with “Daily News” (the name of my file) in the name, because otherwise it seemed to dump my new file into the trash as well. Finally, I added a “Move Finder Items to Trash” action which dumped the ‘News’ folder into the trash.
In summary: 1. New Folder -> News (as first action) 2. Get Specified Finder Items -> ‘News’ folder on Desktop 3. Filter Finder Items -> File Name does not contain ‘Daily News’ (or personal variation) 4. Move Finder Items to Trash
It also means that if you do find a particular page you want to keep you can go and retrieve it a little more easily from the trash than from the tmp folder.
I was hoping there would be some way to preserve the order of the URLs without using an external script. I originally sorted the URLs into the order of preference, because I added quite a few newspapers. The closest I got to something like that was to “Rename Finder Items” with a timestamp at the beginning of the filename, on “Seconds after 12M”, which accurately renames them in the order they were created and would lend itself well to being passed into the combine PDF action. The problem with it is that there is a fair bit of variation in the size of the files, from around 250 KB to 1.8 MB in the particular ones I have been working on, so inevitably certain files download before others and that completely skews the timestamping process, since they are timestamped from when the download ends, not when it begins.
I also added a “Set PDF Metadata” action near the end. Completely unnecessary, but I thought for the sake of internal aesthetics I’d add a few details to the file just to indicate its title and description.
Josh
on 19 Apr 08Thanks Lazarus. I’m incorporating your tip to rename the resulting combined PDF from the seemingly random “SVyfcp” or “YMfhya” type of names it was getting initially, into “Daily Headlines 4/19/2008.pdf” instead.
Much nicer and helps with troubleshooting too.
I have the script put the “Daily Headlines Temp” folder as well as the combination PDF into the trash for me automatically because my usual workflow is to just glance at the headlines in Preview, and not want to keep the files around. I’ll just scroll down and quit Preview. But I can still go into the trash and extract whatever PDF I might want to save. Most likely I will just empty the trash later in the day.
It’d be nice if Newseum offered full archives for browsing/saving, but it appears that due to copyright they only offer a few “notable” dates of important events.
Howard Weaver
on 20 Apr 08I’m a news executive and would like to have a way to automatically download front page images from our papers that I can turn into a Mac screensaver. This won’t work with pdfs, will it? Could somebody help me figure how to accomplish that?
Thanks,
\-\/\/ (email response appreciated)
Matt
on 20 Apr 08I’d also be interested in a Windows way to do this -
John D
on 21 Apr 08Love an XP way to do this also
benwikler
on 21 Apr 08This is great! It would rock to make a web service that would do this and automatically stick the generated document into scribd, so that a user could access it from any computer she wanted. Also, put up a list of available papers (searchable by name, region, etc) so that she could simply tick the boxes of the desired papers.
And: this could be-should be-a stand-alone app. With settings to make it happen every day and, say, automatically print, or automatically email the results to a list (a staff list, for example).
Woo-hoo!
Mike
on 21 Apr 08Yeh, I’m an avid watcher of the Newseum site. This is a great trick; however, I’m still running Tiger and it’s not working for me. I read through the board and have not found a solution that’s working for me. Any clues for Tiger?
Matthew
on 21 Apr 08This is fantastic—Since the iPhone supports viewing PDFs I’m adding some additional actions to upload the final PDF to a webserver. While walking to the train in the morning I can hit the bookmarked URL and have all of the front pages load up and be ready for reading.
Thanks for the tip!
benwikler
on 21 Apr 08@Josh, @Rahul, @Lazarus—I think the only way to get everything in a particular order is to download each newspaper separately and rename the file. This is a pain to set up; anyone have other ideas?
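The "download separately and rename" approach boils down to giving each file a name that starts with its position in your list, so the combined PDF can be built in name order no matter which download finishes first. Here is a rough Python sketch of just the naming step. The host name in the example URLs is an assumption; the paths follow the ones quoted earlier in the thread.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

def ordered_names(urls):
    """Map each URL to a file name prefixed with its position in the list."""
    # Pad the index so that, e.g., 10 sorts after 09 rather than after 1.
    width = max(2, len(str(len(urls))))
    names = []
    for index, url in enumerate(urls, start=1):
        original = PurePosixPath(urlparse(url).path).name
        names.append(f"{index:0{width}d}_{original}")
    return names

urls = [
    "http://www.newseum.org/media/dfp/pdf17/DC_WP.pdf",   # host is an assumption
    "http://www.newseum.org/media/dfp/pdf17/CA_LAT.pdf",
    "http://www.newseum.org/media/dfp/pdf17/WSJ.pdf",
]
names = ordered_names(urls)
print(names)
```

Because the prefix comes from the position in the list and not from a timestamp, file size and download speed no longer affect the final page order.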
Rob
on 22 Apr 08I am new to using Automator. How do you set it up to run automatically every day?
greet
This discussion is closed.