You might recall that I’d mentioned the 1940 census indexing project, run in part by familysearch.org. Last week familysearch.org announced on Twitter that the indexing work had been completed.
The timing of this announcement is a big deal because the original estimated time frame for completing the indexing was 6 months (or at least by the end of 2012). Instead it took only 124 days. Very impressive!
What I want to focus on, though, is the link in that tweet: it goes to http://eepurl.com/oeasH
My first thought was that they have a fairly active blog, so this link must go to a new post at https://familysearch.org/blog/. Not so. Instead it goes to a MailChimp page at http://us2.campaign-archive2.com/?u=b0de542dc933cfcb848d187ea&id=c6e095aa92
Great announcement, but the link should have pointed to a post at https://familysearch.org/blog/ instead. Why? First and foremost, it is about owning your content. This is a big announcement that many people will want to link to and share, so why would you drive them all to a third-party site? Worse yet, what if something happens to MailChimp and that URL stops working? Next, you decrease the likelihood that someone reading the announcement will go on to read other pages on familysearch.org. There are only two links on that page that go back to familysearch.org, and they are part of the content, near the very bottom. None of the navigation or other surrounding page elements link back to familysearch.org.
FamilySearch, this is one of the reasons you have a blog in the first place: a part of your site where you can make big announcements like this. Since you have comments enabled, it is also a place for your users and fans to talk about this achievement. Posting this announcement on your blog would have been better for familysearch.org, and it would have been better for your readers and users.
With the 1940 census images now available to the public, the folks over at familysearch.org have added those images to their volunteer indexing service. They now have a map showing the progress of indexing the census data, state by state. Here is what it looks like today:
[Image: familysearch.org 1940 census indexing progress map]
Delaware was the first state to have 100% of the indexing complete. Colorado, Kansas, and Oregon are all over 80% complete. The indexing effort is very impressive. During the first three days over 2.5 million records were indexed.
The folks at familysearch.org have been developing a new site built around telling family history through stories: kinfolio.com.
I had the opportunity to read through a preview of some of the ideas being developed for kinfolio. One of the goals is clearly to make recording family history information as simple and direct as possible.
Development is still ongoing, so if you go to kinfolio.com right now the only thing you can do is sign up for email news and updates.
There are plenty of questions left about how exactly things will work. For instance, if you already have family tree data at familysearch.org, will you be able to import that information? Will familysearch.org be able to import information from kinfolio? Right now familysearch.org has no export feature; will kinfolio have one? Hopefully more information will be made available before the site goes live.
Either way the preview looked interesting enough that I’ll be giving the site a try once it goes live.
I’ve been using familysearch.org for genealogy research, and it has been wonderful to have so much information available for free. And they continue to add more and more information to their search index. It is that second point that I’ve been thinking about lately.
A search only shows me the matches that exist today. If their system doesn’t have what I’m looking for today, it might later on. So I go back and run the same search in hopes that new records have been added. I don’t know exactly how often they add new data to their search database, which makes my results very hit and miss (mostly miss). Clearly there should be a better way.
One option would be to use their person search API (which requires an account with their devnet service) to see if new matches show up. Writing a script to do that once every 45 days wouldn’t be terribly difficult. But that would be horribly inefficient; in most cases I’d be making queries that don’t turn up any new data. (After trying the person search API, I discovered that it only searches people you have already entered as part of your family tree, not the familysearch.org search index.)
A better way would be to have a system that allows me to register a search that is automatically run against copies of new data as it is added to the system. Any matches could then be emailed to me, with links to the full records. Something roughly along the lines of Google Alerts.
I like this idea so much that I’d be willing to build it if they had an API that provided the data updates.
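To make the idea a little more concrete, here is a minimal Python sketch of the matching side of such an alert system. It assumes a hypothetical feed that delivers each batch of newly added records (the kind of data-update API that doesn't exist today); the `Record` and `SavedSearch` fields, `run_alerts`, and the sample data are all made up for illustration, and sending the email is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical record shape; no real FamilySearch field names are assumed.
@dataclass
class Record:
    record_id: str
    given_name: str
    surname: str
    birth_year: Optional[int]
    place: str


@dataclass
class SavedSearch:
    """A registered search; a field left as None is not filtered on."""
    email: str
    surname: str
    given_name: Optional[str] = None
    birth_year_from: Optional[int] = None
    birth_year_to: Optional[int] = None
    place: Optional[str] = None

    def matches(self, rec: Record) -> bool:
        # Case-insensitive exact match on names.
        if rec.surname.lower() != self.surname.lower():
            return False
        if self.given_name and rec.given_name.lower() != self.given_name.lower():
            return False
        # A year range only matches records that actually have a birth year.
        if self.birth_year_from is not None or self.birth_year_to is not None:
            if rec.birth_year is None:
                return False
            if self.birth_year_from is not None and rec.birth_year < self.birth_year_from:
                return False
            if self.birth_year_to is not None and rec.birth_year > self.birth_year_to:
                return False
        # Place is a simple substring check, e.g. "Utah" matches "Ogden, Utah".
        if self.place and self.place.lower() not in rec.place.lower():
            return False
        return True


def run_alerts(searches: List[SavedSearch], new_records: List[Record]) -> Dict[str, List[str]]:
    """Check only the newly added records against every saved search.

    Returns email address -> unique matching record ids, ready to be
    turned into a digest email with links to the full records.
    """
    alerts: Dict[str, List[str]] = {}
    for rec in new_records:
        for search in searches:
            if search.matches(rec):
                ids = alerts.setdefault(search.email, [])
                if rec.record_id not in ids:  # two searches may hit the same record
                    ids.append(rec.record_id)
    return alerts


if __name__ == "__main__":
    searches = [SavedSearch("me@example.com", surname="Smith",
                            birth_year_from=1900, birth_year_to=1920, place="Utah")]
    batch = [
        Record("r1", "John", "Smith", 1910, "Salt Lake City, Utah"),
        Record("r2", "Mary", "Jones", 1912, "Ogden, Utah"),
        Record("r3", "Ann", "Smith", 1935, "Provo, Utah"),
    ]
    print(run_alerts(searches, batch))  # {'me@example.com': ['r1']}
```

The point of this design is that each saved search only runs against the new batch of records, not the whole index, so nobody has to keep re-running the same searches by hand to find out whether anything changed.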
UPDATE (24 Feb 2012)
Here is a perfect example of why this feature is needed. Familysearch.org just added more than one million records to the search index. The only way to find out whether any of them match what I’ve already searched for is to re-run all of my previous searches.
I recently came across the1940census.com, which is, in their own words:
… a joint initiative between Archives.com, FamilySearch, findmypast.com, and other leading genealogy organizations, will coordinate efforts to provide quick access to these digital images and immediately start indexing these records to make them searchable online with free and open access.
Infrastructure is already in place to leverage volunteer time indexing images of the census records.
Like many others, I’m excited to see this information available online and searchable (for free!). I’ve got a few relatives that I’m hoping to find out more about, and they definitely fall into the 1940 timeline.
Family Tech | Technology tips for genealogists and family historians.
Powered by WordPress, nice! Now if they could just ditch the captcha requirement for comments.
FamilySearch.org posted some interesting information recently.
First up, videos about the Granite Mountain Records Vault.
The vault is in Little Cottonwood Canyon, about 15 minutes from my house. Too bad they don’t offer tours; I’d love to see it in person.
One hundred years, initially, to scan all of those images. Ouch! Bringing that number down to ten is huge. Even at ten years, though, that is a tremendous amount of work. Talk about your large data sets. I wonder if they produce graphs of the data size over time. No matter how you look at it, that is an amazing challenge.
And if catching up with the already collected data isn’t enough, more is coming in every day. The indexing project added 100 million new records during the first half of 2010 and expects to hit 200 million by the end of the year.
It’s neat to see all this data come online. Some things I can’t figure out, though, like why http://pilot.familysearch.org/ is built entirely in Flash. They appear to have recreated HTML using Flash, with worse usability. Yuck! It’s 2010, I know you can do better than that!
And what’s up with those crazy URLs, like http://blog.fsbeta.familysearch.org/node/861? Friendly URLs are good for people and search engines. There’s no reason for a blog not to have friendly URLs; the first post on blog.fsbeta.familysearch.org is from January 2009, well after pretty much everyone else had figured this out.
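For contrast with /node/861, here is a minimal Python sketch of how a friendly URL slug is commonly derived from a post title. The `slugify` rules below (fold accents to ASCII, lowercase, hyphens between words) are an illustrative assumption rather than what any particular blog engine does internally, though WordPress’s “Post name” permalink setting produces URLs of this general shape.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a post title into a readable URL slug (illustrative rules only)."""
    # Fold accented characters to their closest ASCII form and drop anything else.
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    text = text.lower()
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    text = re.sub(r"[^a-z0-9]+", "-", text)
    # Trim hyphens left over from leading/trailing punctuation.
    return text.strip("-")


if __name__ == "__main__":
    print(slugify("Granite Mountain Records Vault"))  # granite-mountain-records-vault
```

A URL ending in granite-mountain-records-vault tells both a reader and a search engine what the page is about before it even loads; node/861 tells them nothing.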