What is Searchviews?

Searchviews is the company blog of Reprise Media. We impart daily insights on Search Marketing, Social Media and SEO.

Contact Us

Send us a message at searchviews@reprisemedia.com



Search: Innovations


Yahoo Holds it Together with Glue

Written By Drupad Sil | May 8, 2008

Yahoo! India

Despite all the questions swirling around Yahoo after its rejection of Microsoft’s buyout offer, the company has managed to implement an interesting new way of presenting search results called Glue. Barry Schwartz at SearchEngineLand expands:

 

“The Glue Pages combine classic search results on the left hand column with more visual information related to your query in the middle and right section of the page. The results contain images, videos, articles and more.”

I have to admit, searching on Glue Pages is pretty cool. The modules of information are split between traditional Yahoo directories (finance, maps, news, images) and well-known third party sites (Wikipedia, YouTube, Quick Facts, MonsterTrak) and change depending on the specific query. For example, a search for “Microsoft” pulled up the company’s stock charts from Y! Finance, job postings in India from MonsterIndia.com, and Yahoo! News postings. A search for “Mumbai” pulls up the city’s Wikipedia entry, train information, restaurant locations and phone numbers, and a gallery of Flickr images with the Mumbai tag. Finally, a search for “Ferrari” pulls in a Yahoo! Images gallery, HowStuffWorks.com, relevant eBay items priced in Indian rupees, and a set of YouTube videos tagged with Ferrari. Third party sites can sign up to be placed in a module by emailing the Glue Pages team.

Classic search results are still on the left-hand side of the page, and sponsored links modules also exist. Something no one is mentioning, however, is that the Glue idea is very similar to how Ask.com presents its results. Below are screenshots from Yahoo! Glue Pages and Ask.com searches for “mango”.

[Screenshots: Yahoo! Glue Pages and Ask.com results for “mango”]

The most noticeable difference is the emphasis given to classic search results, with Ask.com placing them prominently in the center and Yahoo shunting them off to the side. Glue Pages definitely puts more weight on the module content, and consequently each module has more information than the corresponding one in Ask. Furthermore, Yahoo draws on more third-party sources of module content than Ask, with an emphasis on images and video. Overall, I’d say that while Ask.com had the right idea, Yahoo’s Glue Pages has gotten the execution right in this beta.

If nothing else, Glue Pages lets users grasp a topic at a glance while making it likely that they will find relevant, specific information on the main page, all in an aesthetically pleasing layout. For now, Glue Pages is in beta only for India, which houses one of Yahoo’s key research and development centers in Bangalore, but between this and Google launching YouTube India just a little earlier, it’s definitely a good time to be a user in that large and rapidly growing market.


The Next Generation of Image Search

Written By Drupad Sil | April 28, 2008

Bad Result

A new innovation in image search may soon prevent this picture from showing up for a query of “mcdonalds”. This is a story that’s gotten quite a bit of coverage today, starting with the New York Times. From the Times:

“On Thursday at the International World Wide Web Conference in Beijing, two Google scientists presented a paper describing what the researchers call VisualRank, an algorithm for blending image-recognition software methods with techniques for weighting and ranking images that look most similar.”

How is this different from what is currently done? Danny Sullivan at SearchEngineLand:

“Image search at the major search engines today relies largely on looking at words that are used around images – on the pages that host them, in image file names and in ALT text associated with them. No real image recognition is done by any of the majors. Search for “apples”, and they haven’t actually somehow scanned the images themselves to “see” if they contain pictures of apples.”

In their paper, Yushi Jing and Shumeet Baluja introduce algorithms that can actually “look” at the image itself rather than the associated text, find similarities, and rank the pages in order of similarity to an original image deemed to be the correct result for the query. In the words of VentureBeat’s Anthony Ha:

“The new system proposed in the Google paper ranks images based not on text, but on the common ‘visual themes’ found in each search result. In the McDonald’s example, the VisualRank system would see that the company’s famous golden arches are a common visual theme, and prioritize pictures that feature the arches prominently. Tests of this new system returned 83 percent fewer irrelevant search results than Google Image Search, according to the VisualRank paper.”

This is definitely a cutting-edge development, if it can be implemented successfully. It will help cut back on duplicate images, but more importantly, will reduce image spam, where photos are tagged inappropriately and show up for unrelated searches.
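To make the “visual themes” idea concrete, here is a minimal sketch of a PageRank-style ranking over image similarity, in the spirit of what the quotes above describe. It is not Google’s implementation: the function name `visual_rank`, the damping value, and the similarity scores are all assumptions made up for illustration, and a real system would first have to compute those scores from the images themselves.

```python
def visual_rank(similarity, damping=0.85, iterations=50):
    """Rank images by how strongly other images "vote" for them.

    similarity[i][j] is a hypothetical visual-similarity score between
    image i and image j (0 means nothing in common).
    """
    n = len(similarity)
    # Total similarity in each column, used to normalize each image's votes.
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Each image j passes its current rank to image i in proportion
        # to how similar the two images are.
        rank = [
            (1 - damping) / n
            + damping * sum(
                similarity[i][j] / col_sums[j] * rank[j]
                for j in range(n)
                if col_sums[j] > 0
            )
            for i in range(n)
        ]
    return rank


# Made-up scores: images 0-2 share a common theme (say, golden arches),
# while image 3 is an unrelated picture that merely carries the same tag.
similarity = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.1],
    [0.8, 0.7, 0.0, 0.2],
    [0.1, 0.1, 0.2, 0.0],
]
scores = visual_rank(similarity)
best_first = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

With these made-up numbers, the unrelated image ends up at the bottom of `best_first`, which is the effect on mis-tagged photos described above.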


Google Expands Web Coverage

Written By Drupad Sil | April 18, 2008

GoogleBot

Earlier this week, Google announced its practice of crawling forms on high-quality sites in order to expand its web coverage. From Google’s webmaster blog itself:

“Specifically, when we encounter a <FORM> element on a high-quality site, we might choose to do a small number of queries using the form. For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made. If we ascertain that the web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page.”
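The process Google describes can be sketched roughly as follows. This is only an illustration under stated assumptions, not Googlebot’s actual logic: the function name `candidate_form_urls`, the example URL, and the field values are hypothetical, and the sketch assumes a form submitted with GET, so that each combination of values maps to a crawlable URL.

```python
from itertools import islice, product
from urllib.parse import urlencode


def candidate_form_urls(action_url, fields, max_queries=5):
    """Build a small number of query URLs from candidate form values.

    `fields` maps each input name to the values a crawler might try:
    words found on the site for text boxes, and the listed options for
    select menus, check boxes, and radio buttons.
    """
    names = list(fields)
    # Every combination of one value per field, in a stable order.
    combos = product(*(fields[name] for name in names))
    # Keep only the first few, mirroring "a small number of queries".
    return [
        f"{action_url}?{urlencode(dict(zip(names, combo)))}"
        for combo in islice(combos, max_queries)
    ]


# Hypothetical form: one text box ("q") and one select menu ("category").
urls = candidate_form_urls(
    "https://example.com/search",
    {"q": ["mango", "recipes"], "category": ["fruit", "drinks"]},
    max_queries=3,
)
for url in urls:
    print(url)
```

Each generated URL would then be fetched and judged like any other page; deciding whether the result is “valid, interesting, and includes content not in our index” is outside the scope of this sketch.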

Google was also quick to assure webmasters that their crawl agent, Googlebot, could be prevented from crawling specific areas of sites, and would not crawl forms that require password inputs or use terms associated with personal information. The response from the online search community has been generally positive, as the change will allow Google to map slightly more of the deep web, the section of the Internet that is currently unsearchable. From Josh Catone at ReadWriteWeb:

“Last year [Google] ate through 100 exabytes of data, but there’s still a lot that it can’t get access to. Known as the deep web (or hidden web, or invisible web, etc.), it is estimated that the majority of online data is hidden safely from Google’s prying eyes — private intranets, unlinked pages, some non-textual content, and until today dynamic content returned via form input was all inaccessible to the search engine… it is estimated that the deep web is several orders of magnitude larger than the regular, public world wide web. While there is some content that Google will never — and should never — get its hands on, by crawling form results Google is now peering just a little bit deeper into the Internet.”

There are certainly SEO implications of this change. To get a better picture, we turned to our very own Mark Pilatowski, SEO Manager. Mark outlines some main points to be aware of:

“One of the benefits is that Google will be able to crawl and index additional pages that they may not have been able to find before. Since these pages require user input they will be more targeted for a variety of long tail keywords. This has the potential to provide a more personalized user experience for the user who lands on these pages.

It reinforces the need to write relevant and keyword rich content so Google can find important pages using keywords on the site. One of the most intriguing aspects is that Google is grabbing keywords they find on the site and placing them in text boxes to find more content. If they find unique and relevant content using those keywords they will be able to index those pages and theoretically rank those pages higher for those keywords in their own index.”

Of course, the ironic part of this whole change is that while Google is using bots to fill out forms on websites, they frown heavily on anyone who hits their servers with a bot.

From Google’s own Webmaster Help Center:

“Google’s Terms of Service do not allow the sending of automated queries of any sort to our system without express permission in advance from Google. Sending automated queries absorbs resources and includes using any software (such as WebPosition Gold™) to send automated queries to Google to determine how a website or webpage ranks in Google search results for various queries.”

They even hate it when people exhibit bot-like behavior on their site.

While it seems like a case of “do as I say, not as I do”, let’s wait and see how much more of the web this allows Google to process and how it affects the page rankings of high-traffic sites. It’s certainly one small step for mankind into the unknowns of the Internet; the question is, in what direction?


Social Networking: The End of Web Search?

Written By Drupad Sil | April 17, 2008

Faceboogle

An article by Glenn Derene at Popular Mechanics made waves today with its first sentence: Search is dead. The thesis, formulated during a discussion with an anonymous venture capitalist, is that search engines of the future will not be used for pure information gathering and consumption, but rather will index users’ online footprint data to tailor customized results. These results will go beyond just utilizing browsing history to deliver results, as Google and Yahoo already attempt to do, but will also incorporate information from Web 2.0 applications, like a Facebook friend’s movie recommendation, for example. In Glenn’s words:

“But what may turn out to be the strongest signal of all is the footprint you make with your online identity. Consider how much information you voluntarily provide on your Facebook profile. Now imagine if you could combine that with your Netflix renting and Amazon buying habits. Then throw in the suggestions of your friends and the pages you visit the most often. All those various sources of information about you are currently stored in different locations—on your computer’s browser history, on your Facebook page, on the servers for Netflix and Amazon—but just imagine how accurate a search could be if every time you had a query, the mass of data about you that exists on the Internet could inform the results.”

While the idea is not entirely original, it predictably has spawned a great deal of debate. One counterargument was posed by Greg Sterling at SearchEngineLand:

“And while it’s very true that word-of-mouth has moved online and people care very much about what their friends and other contacts think about things, those “recommendations” are not a substitute for search. Indeed, I recently spoke the other day to one of the founders of Socialight, an internet and mobile-social network. One of the interesting things the company has discovered through experience is that people don’t just care about their networks’ recommendations. It turns out — and this is common sense — that expert and top-down editorial content matter equally and in some cases more than what their friends may think.”

Alexander van Elsas points out a couple more issues on his blog:

“Why would such an attempt fail half of the times (or something in that order)? Because it doesn’t take human behavior into account. There are at least two barriers that can hardly be overcome by any computer algorithm or data hog system. First of all, on-line I’m not who I really am off-line. On-line people can have multiple identities, lie about themselves, provide us with profiles that look better than real life… Secondly, a computer algorithm can hardly interpret my mood of the day. Depending on how I feel, what I have experienced earlier, what I’m about to do in the future, the coffee I had for breakfast, etc, etc, I might be looking for different things when I type “I am looking for a car” in the search bar. Chances are that by taking into account my profile information, social graph, interactions on Facebook or any other social network, the “social search” algorithm will be way off.”

While most people will agree that search giants Google, Microsoft, and Yahoo are here to stay for a while longer, there’s no denying that a search engine that could seamlessly combine information and expert recommendations from the web with the preferences of your circle of friends would be an attractive proposition. Perhaps Udi Manber, Google’s vice president of search, phrases it best:

“Search has always been about people. It’s about getting people what they need. The art of ranking is one of taking lots of signals and putting them together. Signals from your friends are better, stronger signals.”

 


Taking Ask to Task: Privacy Groups vs. AskEraser

Written By Sepideh Saremi | January 24, 2008

Last month, we reported that Ask.com’s AskEraser expanded privacy options, allowing users to opt out of having their search data tracked. Now privacy groups, including the Electronic Privacy Information Center, are taking issue with AskEraser, calling it “unfair and deceptive” and lodging a complaint with the Federal Trade Commission.

The groups allege that AskEraser isn’t as pro-privacy as it claims, for three reasons (paraphrased): it requires cookie blockers to be turned off in the browser so that the AskEraser cookie, which remembers not to track that user, can be installed; said cookie can itself identify a user because of its time stamps; and Ask can disable AskEraser without notice. Ask.com says it tried unsuccessfully to speak to EPIC before the group filed with the FTC, and that EPIC’s document is inaccurate and outdated. From Wired, which quotes Ask.com spokesman Nicholas Graham:

EPIC’s filing is flawed in the sense that the document they filed is factually inaccurate, and simply shows a fundamental misunderstanding of the functionality of our product. In addition, many of the issues they raise are outdated, while others are completely misguided from the outset, and others deal with changes that Ask.com already made to AskEraser weeks ago, and were subsequently posted publicly on our website.

The changes “made to AskEraser weeks ago” removed the time stamp from the cookie settings, so at least part of EPIC’s complaint rests on outdated information.

More interesting, though, is Search Engine Land’s question of why these groups didn’t lead with the fact that Ask.com actually does collect some data for its partners, the most famous of which is probably Google. From Search Engine Land:

That’s a far bigger issue, and I’m surprised EPIC didn’t lead with that, rather than the three other points that are easy to take apart. Someone engaging AskEraser probably does not understand or expect that their query and IP address, along with perhaps a unique cookie ID, is flowing over to Google so that Ask can retrieve ads. And they are not reasonably expecting they have to go to Google or another partner to try and delete information there (if they can — they probably can’t).

That’s the big flaw with AskEraser. The complaint also notes that those using the Ask toolbar won’t get AskEraser protection, even if enabled. On that point, I think the FAQ is clear enough.

Ask.com is fairly thorough and forthcoming in its AskEraser FAQ, and AskEraser is definitely way ahead of the privacy policies of other engines. What do you think: Are the privacy groups’ claims that AskEraser is “unfair and deceptive” justified?

Further reading: See The Iconoclast for an in-depth explanation of the time-stamp issue, and Techdirt for an interesting take on the privacy groups.


Searchviews: Week in Review

Written By Sepideh Saremi | January 18, 2008

In this edition of the Week in Review, the Facebook news keeps coming, MySpace tries to wrap its arms around the whole child online safety thing, and it might start costing more to watch video online than just to buy it in a store.


Searchviews: Week in Review

Written By Sepideh Saremi | January 11, 2008

This week, we’re launching Searchviews: Week in Review, a digest of sorts in which we’ll highlight the past week’s Searchviews posts, along with noting other top stories in search, social media, and Internet news. Look for it every Friday. Happy weekend-reading:


Wikia Search Launches in Alpha, Slammed with Bad Reviews

Written By Sepideh Saremi | January 7, 2008

The long-anticipated search engine from the people behind Wikipedia, Wikia Search, launched today in alpha to mostly poor reviews. Mathew Ingram sums up the blogosphere’s reactions at The Globe and Mail:

Mike Arrington — the editor of TechCrunch, a technology blog that can help to make or break companies with a favourable review — called the service a “letdown,” while the Centernetworks blog described it as “not ready” for prime time. Stan Schroeder, who writes for a popular tech blog called Mashable, said point-blank that Wikia Search “sucks.” Others were even less complimentary.

After a year of hype and $14 million poured into this project, the resulting search engine interface is pretty (and they win cutest search engine logo, for sure) but the actual search results are indeed disappointing: Mashable’s Stan Schroeder notes that the first result for “Wikipedia” is a German listing, and Search Engine Roundtable points to abysmal results for “George Bush.” Andy Beal at Marketing Pilgrim also writes that Wikia Search looks very susceptible to SEO black-hat tricks, which may kill the project outright.

But it’s important to remember that Wikia Search’s human-powered, social-search approach means that the search results pages will be thin until people start using and contributing to the engine (or rather, if they do so). This is not unlike challenges faced by another social search engine, Mahalo, which is faring a lot better than people had expected. In TechCrunch comments, Wikipedia founder Jimmy Wales himself admits/defends the engine’s shortcomings (as does this caveat on the Wikia Search about page), noting that Wikipedia had a similar paltry start, and he says Wikia Search is a project to build a search engine, not the completed search engine itself:

So the comparison to Google on day one is just mistaken. Google didn’t launch a project to build a human-powered search engine, they launched an algorithmic search engine with a clever new idea. So they didn’t have to wait for the humans to come in and start building it.

We aren’t even running with a real index yet, just a placeholder index. Yeah, the search sucks today. But that’s not the point. The point is that we are building something different.

Wikia Search relies on user-written Mini Articles (here’s one about Google), but it’s strange that they don’t utilize the already user-written content from Wikipedia to help fill these out - why reinvent their own wheel rather than take advantage of their massive content base? It would be a mistake to write off Wikia Search outright, so we’re filing this one under sites to check back on.


Google Wants to Read Images, Video

Written By Sepideh Saremi | January 4, 2008

Google patent applications published this week reveal the company’s ambition to “read” images and video - i.e., to recognize and understand text in them. This has obvious implications for video and image search, and significant implications for SEO and web accessibility, as search engines currently rely on oft-insufficient alt text, on-page keyword tags and other surrounding text to make sense (or not) of an image on the web.

Because Google has been taking pictures for the Google Maps Street View feature, an image-text reading capability means it gets closer to its goal of indexing the entire world, which would be a boon for its local search capabilities. Google explains in its application:

Digital images can include a wide variety of content. For example, digital images can illustrate landscapes, people, urban scenes, and other objects. Digital images often include text. Digital images can be captured, for example, using cameras or digital video recorders… Image text (i.e., text in an image) typically includes text of varying size, orientation, and typeface. Text in a digital image derived, for example, from an urban scene (e.g., a city street scene) often provides information about the displayed scene or location. A typical street scene includes, for example, text as part of street signs, building names, address numbers, and window signs.

The patents also specifically mention indexing images taken in stores and museums (with robots, natch), which again would have a huge impact on local business and also on education. And of course, video search would get infinitely more sophisticated if Google learns to understand text spoken in videos.

One caveat: Information Week notes that Street View privacy issues will get even more complicated. It’s definitely something to be concerned about; though online privacy is really a thing of the past at this point, violations (perceived or real) of offline privacy will really get people up in arms. But it’s a good bet that’s something that will get ironed out if this innovation comes to pass soon, because it would really change search in a huge way.


New Search Engine Blekko: Future Google Challenger?

Written By Sepideh Saremi | January 3, 2008

Look out, Google: Rich Skrenta, co-founder of news site Topix and the Open Directory Project, and the author of Elk Cloner, one of the first computer viruses to spread in the wild, has announced he will be taking on the giant with his new search startup, Blekko. Skrenta wrote that he’ll be taking on Google because the web has changed and grown remarkably since Google’s founding a decade ago, and yet Google still has very little significant competition:

The web is big. Really, really big. It’s literally billions and billions of pages. It’s Carl Sagan big. And it’s doubling in size every year or two.

So the idea that what you can see in positions 1-3 above the fold on Google are the sum of what the web has to say about every possible query is crazy.

And yet they have 85%+ market share, and little effective competition. At the same time there is such a fabulous business in search. It’s the highest monetization service on the web, by far. Why does this Coke have no Pepsi?

Skrenta is not revealing much about his strategy for taking down Google, but he has written that he doesn’t like PageRank. Blekko thus far consists of six people and $2 million in funding, and TechCrunch reports that there may not be a public prototype until 2009. Though it’s hard to imagine Google bested in search, it was once a little startup too, so it’s not impossible that another company could eventually claim part of its significant market share. And as Arrington at TechCrunch points out, Blekko may not need significant market share to be a success, as even 1% of the search market is said to be worth $1 billion.

