Nominations for the 2008 Jolt Product Excellence Awards are now open. The deadline for nominations is November 12, 2008. Eligible books and products must have been published or had a significant version release in the calendar year 2008. Please do not nominate books in press, beta versions or unreleased products.
You may submit a 5-minute Flash video, PowerPoint or webinar consisting of 5-10 examples of your product’s most outstanding features and functionality. (The examples may be illustrated with static screenshots, real-time animation or an application demo.) White papers, customer testimonials and other information may also be made available. Although the supporting product information is not a requirement for submission, it does help us see what your product does and aids in the selection of Finalists. We highly recommend that you submit any supporting information that is available, as judges only fully examine and evaluate products in the Finalist phase.
The fee is $300.00 per nomination per category. You may select up to three sub-categories for each nomination. For open source, non-profit companies, the fee is $40.00 per category. Products supported and funded by a non-open-source parent company are not eligible for the reduced entry fee. Eligibility for the open source, non-profit rate is determined on an individual basis.
Early Bird: Nominate your product on or before October 24, 2008, and receive a 20% discount off the normal registration price.
Prizes and Logos
Jolt Award Winners receive a can of hyper-caffeinated Jolt encased in Lucite, Productivity Award Winners receive a plaque, and the Hall of Fame inductee receives a crystal champagne bucket along with an ice-cold can of Jolt beverage. Award winners may also display the Jolt Award logo on their winning products, books or web sites.
Posted by admin, in Uncategorized on Oct 06 08
As of 11/15, all indexes are back on a regular update schedule. News is updated daily, Podcasts and Blogs are updated weekly, and the full crawl of 24,000+ sites is now being incrementally updated continuously.
Emphasis in the search index continues to be on deep crawls of sites that are useful to software developers.
Queued for the next round of updates are better support for podcast and blog search and performance improvements to the results-clustering server.
The search site can be accessed directly, and a separate entry point bypasses the Carrot2 Clustering Engine.
Posted by admin, in Uncategorized on Nov 19 07
As I was building FindITAnswers, three software tools were critical to managing my spider indexes. While spider exclusion rules act as a first line of defense for maintaining the quality of the index, a few simple back-end utilities are also immensely valuable:
Merge Utility: Merges multiple indexes into one. This was an invaluable utility since FIA’s spider crawl was divided into 125 index segments. The indexes were organized around key platform vendor sites, and sites with similarly structured content. Using multiple indexes has lots of obvious advantages, including:
I developed this Lucene utility myself, and while it is inelegantly coded owing to my superficial knowledge of Java, it works as intended.
Kelvin Tan developed two small utilities for me that are also critical to increasing the relevance of search results. When you don’t have a team of astrophysicists building your search algorithms, tools that improve the quality of your indexes can help your search engine a lot:
De-duplication Utility: An unavoidable byproduct of using multiple spider crawls and index segments is the duplication of some pages. Rather than checking for and suppressing dupes at search time, this simple utility scans a merged index and deletes any duplicated pages.
Ad-Hoc Deletion Utility: This tool deletes index records based on keywords, terms, wildcards and regular expressions, and can search specific index fields. It’s great for scrubbing pages that pollute search results and for catching anything that got through the initial spider exclusion filters.
Combining the simple utilities above with a good database of URLs to crawl and well-planned spider exclusions can vastly improve the results your search engine delivers, by feeding it higher-quality indexes.
In Part 4 of this series, I’ll discuss clustering search results and my experience integrating the Lingo3G Document Clustering Engine with Lucene.
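The post doesn’t show the merge utility’s code, and in Lucene itself a merge like this would normally go through IndexWriter’s addIndexes methods. As a rough illustration of what any index merge has to do, here is a minimal Python sketch over a toy in-memory inverted index. The Segment class and merge_segments function are hypothetical, not Lucene’s API: each segment numbers its documents from 0, so the merge shifts every segment’s document IDs by the number of documents contributed by the segments before it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Segment:
    # Hypothetical toy segment: doc_count documents with local IDs 0..doc_count-1,
    # and postings mapping each term to a sorted list of those local IDs.
    doc_count: int
    postings: Dict[str, List[int]] = field(default_factory=dict)


def merge_segments(segments: List[Segment]) -> Segment:
    """Merge segments into one, renumbering document IDs so they don't collide."""
    merged: Dict[str, List[int]] = {}
    offset = 0  # documents contributed by earlier segments
    for seg in segments:
        for term, doc_ids in seg.postings.items():
            merged.setdefault(term, []).extend(d + offset for d in doc_ids)
        offset += seg.doc_count
    # Offsets only grow, so each merged postings list stays sorted.
    return Segment(doc_count=offset, postings=merged)


a = Segment(2, {"lucene": [0, 1], "java": [1]})
b = Segment(3, {"lucene": [2], "nutch": [0]})
m = merge_segments([a, b])
print(m.doc_count, m.postings["lucene"])  # 5 [0, 1, 4]
```

Renumbering by a running offset is what lets many small segments (125 in FIA’s case) be queried as a single index without any document ID pointing at the wrong page.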
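How the de-duplication utility decides that two pages are duplicates isn’t described, so the following is only a minimal sketch under one assumption: pages count as duplicates when their body text is identical after collapsing whitespace and lowercasing. It keeps the first copy of each fingerprint and reports the URLs it would delete. The plain dict records with "url" and "body" keys are a hypothetical stand-in for index documents; near-duplicates that differ in boilerplate would need a fuzzier technique such as shingling.

```python
import hashlib
import re
from typing import Dict, List, Tuple

Doc = Dict[str, str]  # hypothetical record: {"url": ..., "body": ...}


def content_fingerprint(body: str) -> str:
    # Collapse whitespace and case so trivially different copies hash the same.
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def dedupe(docs: List[Doc]) -> Tuple[List[Doc], List[str]]:
    """Keep the first doc per fingerprint; return (kept docs, URLs of deleted dupes)."""
    seen = set()
    kept: List[Doc] = []
    deleted: List[str] = []
    for doc in docs:
        fp = content_fingerprint(doc["body"])
        if fp in seen:
            deleted.append(doc["url"])
        else:
            seen.add(fp)
            kept.append(doc)
    return kept, deleted


docs = [
    {"url": "http://a.example/x", "body": "Hello  World"},
    {"url": "http://b.example/x", "body": "hello world"},
    {"url": "http://c.example/y", "body": "Other page"},
]
kept, deleted = dedupe(docs)
print(deleted)  # ['http://b.example/x']
```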
Posted by admin, in Search on Oct 14 07
There’s a lot to like in Office 2007, but the learning curve for the new UI is steep. If you’ve been a casual user of the apps, you can probably find the few features you’re accustomed to using fairly quickly, but if you live in the apps (like I do), prepare yourself for days of hunting for the new locations of your most-used features.
Over time, the new ribbon bars do become handy time savers, but in the meantime prepare yourself for a big hit to your productivity. After a few months, I still find myself periodically going into brain lock trying to remember the location of a simple menu item or button.
There is hope. I recently installed Classic Menus for Office from www.addintools.com. Essentially this $19.95 utility gives you back your Office 2003 menu system.
I’ve found this tool to be a handy timesaver, and certainly worth the price. The utility adds a new Office menu called "Menus."
Selecting "Menus" will give you a ribbon bar containing Office 2003 style menus and menu bars. If you prefer to work with the Office 2007 ribbons, they are still available to you in their standard locations.
There are only three drawbacks worth mentioning: 1) load time for your Office apps will increase, by 2-3 seconds on my machine; 2) while this app will certainly ease the hit to your productivity during the first few weeks after upgrading to Office 2007, in the long run it might keep you from discovering some of the suite’s nifty new features if you never explore the new UI; 3) I can’t pin this entirely on Classic Menus for Office, since I downloaded Microsoft’s Vista patches this week, but after installing the patches I have noticed some strange UI behaviors that seem to be related to patch/menu interactions.
If time is money to you, this is a pretty inexpensive solution to managing your migration to the new Office 2007 user experience.
We recently moved from our Bay Area ranch house, where setting up both wired and wireless networks was pretty easy, into a bigger house with lots of hard-to-wire rooms.
On a whim I bought a few Netgear XE102 Wall-Plugged Ethernet Bridges, which are Powerline devices. In the past it seemed unnatural to plug an Ethernet cable into such a small device attached directly to an AC outlet, but I finally broke down and started testing them in our house. So far, so good.
Powerline devices use your home’s electrical wiring to transmit data.
I have to admit I half expected my laptop to go up in smoke the first time I plugged it directly into one of these devices.
Setup was a snap. One XE102 is plugged into an AC outlet and connected to an Ethernet switch in my office; the others are plugged in throughout the house, with a mix of PCs and wireless access points connected to them.
So far the devices have performed flawlessly and have allowed me to forgo adding wired Ethernet drops throughout our house.
Some reviews have noted that data transfer rates are low on these devices, but their throughput is still higher than what I’m getting from our cable internet provider.
Posted by admin, in Uncategorized on Jul 14 07
In addition to software-based factors that influence search, like the quality of the indexing and retrieval algorithms, vertical search has an advantage over broad-based search engines: as the administrator, you can constrain the content you crawl, and thus use human QA to make up for the deficiencies of purely algorithmic search. Weeding out irrelevant content goes a long way toward improving the quality of results. Some of the factors you can use to influence the quality of your index:
Sites you crawl
Paths you include and exclude
Pages you include and exclude
Utilities to arbitrarily delete documents from the index based on pattern matching
Let’s look at each of these.
Sites You Crawl
I developed my own seed database, starting with a few thousand sites related to software development, IT and the environment. There are a number of spiders you can use to build your crawl database. I ran my spider, letting it follow links up to 5 hops from each initial URL. After the crawl was done, I took the crawl logs for the URLs in the first 5 hops and used them as the basis for a second crawl, also 5 hops deep.
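A hop-limited crawl is essentially a breadth-first traversal that stops following links once a page is a fixed number of hops from its seed. The sketch below is a minimal illustration, not the spider actually used for FIA: the links dictionary is a hypothetical stand-in for fetching a page and extracting its outgoing links, and the returned URL-to-hop mapping plays the role of the crawl log. It places no restriction on hosts, so links to external sites are followed too.

```python
from collections import deque
from typing import Dict, Iterable, List


def crawl(seeds: Iterable[str], links: Dict[str, List[str]], max_hops: int = 5) -> Dict[str, int]:
    """Breadth-first crawl; returns the hop count at which each URL was first reached."""
    hops: Dict[str, int] = {}
    queue = deque()
    for url in seeds:
        if url not in hops:
            hops[url] = 0
            queue.append(url)
    while queue:
        url = queue.popleft()
        if hops[url] >= max_hops:
            continue  # don't follow links beyond the hop limit
        # links.get stands in for fetching the page and parsing its links.
        for target in links.get(url, []):
            if target not in hops:  # first visit wins, so cycles are harmless
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops


graph = {"s": ["a"], "a": ["s", "b"], "b": ["c"]}
print(crawl(["s"], graph, max_hops=2))  # {'s': 0, 'a': 1, 'b': 2}
```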
The second crawl became the basis for the FIA database that has subsequently been enhanced with other sites added manually.
Every two or three months, the spider database is updated by using the crawl logs to add URLs from hops 2-5 as new root URLs.
In practice each iteration of the crawl produces deeper results since the crawls are starting at progressively deeper root URLs. In each of these crawls the spider is allowed to harvest pages on sites external to the root URLs.
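The reseeding step can be sketched as a filter over the crawl log: URLs first reached at hops 2 through 5 are added to the existing roots. This is a minimal illustration that assumes the crawl log is available as a mapping from URL to hop count; the function name next_seeds and that log format are hypothetical, not part of the FIA tooling.

```python
from typing import Dict, Iterable, List


def next_seeds(crawl_log: Dict[str, int], current_seeds: Iterable[str],
               min_hop: int = 2, max_hop: int = 5) -> List[str]:
    """Add URLs first reached at hops min_hop..max_hop to the existing root URLs."""
    promoted = {url for url, hop in crawl_log.items() if min_hop <= hop <= max_hop}
    # Union rather than replace: the original roots stay in the database.
    return sorted(set(current_seeds) | promoted)


log = {
    "http://s.example/": 0,
    "http://s.example/a": 1,
    "http://s.example/b": 2,
    "http://other.example/c": 5,
    "http://other.example/d": 6,
}
print(next_seeds(log, {"http://s.example/"}))
```

Because each new round starts from roots that were themselves several hops deep, repeating this cycle pushes the crawl progressively further out, which matches the behavior described above.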
Paths and Pages to Include and Exclude
As you analyze the results of your spider crawl it will become obvious fairly quickly which sites, paths and documents you’ll want to exclude from your crawl database and your indexes.
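Exclusion rules like the ones listed in the next paragraph can be expressed as a list of excluded hosts plus glob patterns. The sketch below is one minimal way to do it, under stated assumptions: the domain names are illustrative guesses for the sites mentioned, and matching the page patterns against the last path segment plus the query string is an assumed convention, since each search platform defines its own rule syntax. It uses only Python’s standard urllib.parse and fnmatch modules.

```python
from fnmatch import fnmatchcase
from typing import Iterable
from urllib.parse import urlsplit

# Illustrative rule lists (assumed domains and patterns, not FIA's actual rules).
EXCLUDED_HOSTS = ("digg.com", "technorati.com", "nytimes.com", "yahoo.com")
EXCLUDED_PAGES = ("login*", "privacy*", "aboutus*", "*print=*")


def is_excluded(url: str,
                hosts: Iterable[str] = EXCLUDED_HOSTS,
                page_patterns: Iterable[str] = EXCLUDED_PAGES) -> bool:
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Exclude listed domains and any of their subdomains.
    if any(host == h or host.endswith("." + h) for h in hosts):
        return True
    # Match page patterns against the last path segment plus any query string.
    page = parts.path.rsplit("/", 1)[-1]
    if parts.query:
        page += "?" + parts.query
    page = page.lower()
    return any(fnmatchcase(page, p) for p in page_patterns)


print(is_excluded("http://example.com/article?id=3&print=1"))  # True
```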
Virtually every search platform can create rules excluding certain sites or documents. You’ll want to exclude commonly linked-to sites like Digg, Technorati, NY Times, Yahoo, etc. You’ll also want to add rules telling your search engine’s spider to ignore prevalent documents like login*, privacy*, aboutus*, *print=*, etc.; you get the idea. In practice this becomes a long list, and it is one of the keys to increasing the quality of the results you return for queries. You’ll also want filter rules that exclude gambling, porn, hotels, travel and other common search engine spam.
Utilities to Arbitrarily Delete Indexed Documents
You’ll find that despite a good database of crawling rules, you still get undesirable results in your index. I had a tool developed that allows SQL-style select queries against a Lucene index and deletions based on pattern matching and regular expressions. This is a handy way to delete docs that slip through the spider filters, or sites that use overly aggressive SEO. You’ll probably also want to filter sites with poorly constructed pages that, for example, use the same title on every page (alternatively, you could use document heading tags rather than the meta title as the basis for your index; more on that in a future post).
In the next post, I’ll give an overview of how to give users access to the index and how to present results.
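The deletion tool described above works directly against a Lucene index, and its code isn’t shown. The sketch below only illustrates the two filtering ideas from this paragraph on a hypothetical list of dict records: deleting documents whose chosen field matches a regular expression, and flagging hosts where one title covers most of their pages. The function names, the min_pages and threshold values, and the record layout are all assumptions for illustration.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set
from urllib.parse import urlsplit

Doc = Dict[str, str]  # hypothetical record: {"url": ..., "title": ..., "body": ...}


def delete_matching(docs: List[Doc], field: str, pattern: str) -> List[Doc]:
    """Return only the docs whose `field` does NOT match the regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [d for d in docs if not rx.search(d.get(field, ""))]


def same_title_hosts(docs: List[Doc], min_pages: int = 3, threshold: float = 0.8) -> Set[str]:
    """Flag hosts where a single title covers at least `threshold` of their pages."""
    titles_by_host: Dict[str, List[str]] = defaultdict(list)
    for d in docs:
        titles_by_host[urlsplit(d["url"]).hostname or ""].append(d.get("title", ""))
    flagged: Set[str] = set()
    for host, titles in titles_by_host.items():
        if len(titles) >= min_pages:
            _, top_count = Counter(titles).most_common(1)[0]
            if top_count / len(titles) >= threshold:
                flagged.add(host)
    return flagged
```

For example, delete_matching(docs, "title", r"hotel|casino") drops spam-titled pages, while same_title_hosts(docs) surfaces sites whose pages all share one meta title.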
Posted by admin, in Uncategorized on Jul 14 07
For example, when you go to Google, you pretty much assume they’ve crawled every site on the web. This influences the way you form your search queries and your expectations about the results you’ll get back. I wanted to test the extent to which calling a site a “search engine for developers and IT Pros” would influence how users formed their queries and what they expected from the results. In other words, could I provide a “good enough” search platform and rely on users to enhance their own search experience by structuring context and expectation, creating a superior user experience? Initial results were promising, and hopefully I’ll have a chance someday to take this to the next level. In the meantime, here’s what I learned.
There were a couple of initial ground-rules that were set for the site that eventually became FindITAnswers.com:
I evaluated a range of search solutions, both commercial and open source, and fairly quickly settled on Lucene as the search engine of choice. Autonomy would have been an excellent commercial solution for the type of app I was building, and Thunderstone’s Webinator was a close second, but both were expensive, and Thunderstone’s license forbids its use as a search portal. I’ll talk about Autonomy in more detail in a future post.
I almost immediately ruled out developing a Lucene-based platform myself, since there were several promising projects available, including Nutch and Solr. (Solr was open-sourced after the project kicked off, so Nutch was the only platform evaluated.) In the fall of 2006, Nutch lacked some of the features and functionality FIA required “out of the box.” That said, it’s a very powerful platform, and given more developer resources it would have been a good choice. A port to either Nutch or Solr is on the FIA roadmap.
Searchblox turned out to be the best choice for my proof of concept. It’s built on Lucene, and many of the tools, utilities and extensions I would have built are part of the product. There are some limitations to Searchblox’s scalability as a search portal, but it’s really not intended to be used this way (I’ll discuss this in a future post). As an intranet or enterprise search solution, it’s well worth considering.
For hardware, I had Central Computer in San Francisco build an AMD64 X2-based server with 8 GB of RAM and a pair of RAID drives. The initial setup was on Windows 2003 Server running Apache and Tomcat. I did have a Linux-based server running for evaluation, but didn’t see any performance edge, and the Windows server was easier to administer. As of June 2007, the site is running on a Unix system, because I found an ISP, eapps, that offered attractive Tomcat hosting plans. The original server is now a mirror.
In my next post, I’ll talk a bit about how FIA was architected.
Posted by peter, in Search on Jul 05 07
Hazy Hot and Humid is an outgrowth of the work I did on www.finditanswers.com, a search engine for software developers and IT Pros. FIA crawled more than 25,000 IT-related sites. I’ve scaled back FIA due to time constraints, but will continue to update and manage this subset of the search crawl, focusing on software development, IT and the environment.
There is some functionality that I’ve stopped maintaining. In addition to crawling sites, FIA indexed thousands of blogs, podcasts, whitepapers and research reports. I’m continuing to index a subset of the blogs, but the other content won’t be maintained. All of these sources are still in the spider database, so it’s easy enough to restart. I’m dropping them because, despite lots of work to filter, re-encode and normalize the feeds, invalid data from somewhere invariably caused problems.
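The post doesn’t say how the feeds were filtered and re-encoded, so the following is only a minimal Python sketch of one common approach, under assumptions of my own: try UTF-8, fall back to Windows-1252 and then Latin-1 (which can decode any byte sequence), normalize to Unicode NFC, and strip control characters that XML 1.0 forbids. The fallback order is an arbitrary choice, and a sketch like this cannot repair feeds whose XML structure itself is malformed.

```python
import re
import unicodedata

# Characters XML 1.0 forbids: C0 controls other than tab/LF/CR, plus U+FFFE and U+FFFF.
_INVALID_XML = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def normalize_feed_text(raw: bytes) -> str:
    """Decode feed bytes with fallbacks, normalize to NFC, strip XML-illegal characters."""
    for encoding in ("utf-8", "cp1252"):
        try:
            text = raw.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:
        text = raw.decode("latin-1")  # never fails: every byte maps to a code point
    text = unicodedata.normalize("NFC", text)
    return _INVALID_XML.sub("", text)


print(normalize_feed_text(b"caf\xe9"))  # café (decoded via the cp1252 fallback)
```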
Posted by peter, in Search on Jun 30 07