Tuesday, June 10, 2008

Powerset Factz

Although I've now left Inxight to join Mu-Gahat (a leading manufacturer of RFID and gaming technologies), I came across an email today mentioning Powerset Factz, which are aggregated from across Wikipedia articles to summarize a topic. When you click a word in Factz, Powerset will show you the sentence it was derived from.

So I tried a few Mu-Gahat-related queries.

For example, the query on Powerset "who invented RFID?" resulted in no Factz, but did inform me that "In 1946 Léon Theremin invented an espionage tool for the Soviet Union which retransmitted incident radio waves with audio information." So not entirely bad.

A query on "playing card" resulted in these Factz:
beat: record.
flip: card.
show: suits.
depict: figures.

Hmm. Not very useful.

However, I then tried a search on everyone's favorite -- Steve Jobs. Jackpot! Over 123 Factz generated including:
  • introduced: iPod, service, Mac OS X, AirPort, ibook, QuickTime TV, device, iMac, powerbook, suffix, Apple II, generation, mini, computer, duo, iPhone, MacBook Air, application, GarageBand, Macs, original, and Macintosh.
  • founded: Apple, company, NeXT, Apple Computer, Pixar, markets, and Apple Inc.
  • announced: music, iPhone, Safari, iPhone, change, widgets, ski, Fi, App Store, charge, device, Developer Transition Kits, plans, partnership, Halo, deal, models, and computer.

So I think this technology shows some promise, but still has a long way to go (introduced suffix? Is that a new OS?)

Oh, and just for fun, I tried one more:

Who is the mole?

Alas, Powerset has not yet developed Powerset Psychicz. Maybe next year.

Thursday, September 27, 2007

Welcome Back!

Welcome back from your summer vacation and welcome back to my blog.

Today's thought is short:
  • Yahoo!'s mission is to connect people to their passions, their communities and the world's knowledge.
  • Google's mission is to organize the world's information and make it universally accessible and useful.
  • Business Objects' mission is to transform the way the world works through intelligent information.
Business Objects is the leader in organizing structured information (and, now, with the acquisition of Inxight, becoming capable of organizing text as well). Their new On Demand service? A nice complement to either Google or Yahoo's offerings.

Ergo, Yahoo or Google should merge with Business Objects?

Discuss!

(The views expressed in this blog are my own and not those of my employer.)

Thursday, June 14, 2007

Searching for A Living?

I've been doing a lot of searches lately, looking for information and new companies involved in a wide variety of topics (social networking, entity extraction, content policing, business intelligence, etc.).

Maybe I should go work for Mahalo.

Monday, June 11, 2007

Random Thoughts Again

1. First off, I was wrong about salesforce.com and google. I guess they were being coy because the announcement they were about to make was so boring.

2. Marc Andreessen makes an excellent point about Microsoft hiring for logic puzzle ability:

For example, a classic Microsoft interview question was: "Why is a manhole cover round?"
The right answer, of course, is, "Who cares? Are we in the manhole business?"

I had a similar experience interviewing at Google, where the question was something about fitting a 2-mile long runway into a city where you'd been told you could only have one mile to put it in. My first response was to figure out why the city would only give me that much, whether there were alternate spots in the city to explore, etc. Then, once they got frustrated with that line, I started thinking up silly ideas like putting the whole runway underground or building an elaborate 'in the sky' structure.

I'm not sure if they didn't hire me because of that non-engineering answer, the fact that the Palm Pilot ("does one thing well") at the time was my favorite product (It was only in an interview several months later that Marissa revealed the correct answer was "swiss army knife"), or the fact that I didn't have a PhD in engineering.

3. My final random thought: The "Attention Crash" gets the attention of Valleywag, Marc Andreessen, and Steve Rubel. "We are reaching a point where the number of inputs we have as individuals is beginning to exceed what we are capable as humans of managing." Oh, so true.

Thursday, May 31, 2007

Blogging Peter Norvig's Talk

I've never done a live blog report before, so I thought I'd try my hand at it. However, I'm not going to do one of those transcript blogs where I try to write down everything that's said; I'll just try to capture what I think is interesting.

I'd first like to point out that the Google conference music is far cooler than the music at the salesforce.com conference.

Poor Peter ended up violating one of Guy Kawasaki's first rules of speaking - try to speak in a tiny room so that the crowd feels more intense. People have started trickling home already, so the crowd is lighter than one would expect.

"Why do you want to go to Google?"
"That's where the data is"

The rise of probability models
Percentage of ACL Papers with statistical/probabilistic concepts in the title:
1979: 0%
1989: 6%
2006: 55%

Size of training corpus is far more important than the algorithm applied. Duh.

The LDC corpus contains about 100 gigabytes of speech -- but the internet contains about 100 trillion words (10^14). Google's LDC N-gram corpus contains 1,024,908,267,229 tokens and 95,119,665,584 sentences. It's for sale in a lovely 7 CD set. As I've always said, if anyone is going to make machine learning work for extraction, it'll be Google.
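For anyone wondering what an N-gram corpus actually holds: it is essentially a giant table of word sequences and how often each one was seen. Here is a toy sketch of that counting in Python. The sample sentence and the function name are mine, purely for illustration; this is not how Google built theirs, and the real thing obviously needs far more than a dictionary in memory.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Made-up sample text, split naively on whitespace.
text = "the data is where the data is"
tokens = text.split()

# Count how often each two-word sequence occurs.
bigram_counts = Counter(ngrams(tokens, 2))

for gram, count in bigram_counts.most_common():
    print(" ".join(gram), count)
```

Scale the same idea up to a trillion tokens and you can see why the size of the corpus, rather than the cleverness of the counting, is the interesting part.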

I wonder if one could identify potential terrorists using Google sets...Hmm. A set that starts out with only Osama bin Laden as the root identifies "US Presidential Candidates", "Bill Clinton", "George W. Bush", "Taliban Islamic Movement", "Ayman Al Zawahiri", "Terrorism", "John Gotti", "War", and... "Bob Hope". (When given two names, say, Osama bin Laden and Ayman Al Zawahiri, it did a much better job)

Interesting comparison of most commonly occurring unique terms in particular categories (such as "drugs") vs. most common queries in that category. There is a huge mismatch.

Many things he has spoken about before -- statistical machine translation, human NLP techniques vs machine learning, etc. On machine translation, Chinese is one of the hardest, which is borne out by the ACE test results I've seen. He gave an example showing how incredibly hard translating Chinese is, especially because you need to take into account complex word sequences.

He discussed all kinds of technical nuances and tricks in terms of bit representations, lexical co-occurrence representations, etc. They did a series of experiments that showed that truncating words at 7-8 characters is almost as good as true stemming. Truncating at 4 characters is actually better than true stemming at capturing meaning. (?!?) That's going to bother me for a while.
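To make the truncation idea concrete, here is a small Python sketch of what "stemming by chopping" looks like. The word list and function names are my own invention, and this says nothing about how Google ran its experiments or measured "capturing meaning"; it only shows how a shorter cutoff lumps more words together.

```python
def truncate_stem(word, n):
    """Crude 'stem': lowercase the word and keep its first n characters."""
    return word.lower()[:n]


def group_by_prefix(words, n):
    """Group words that share the same n-character truncation."""
    groups = {}
    for word in words:
        groups.setdefault(truncate_stem(word, n), []).append(word)
    return groups


# Hypothetical sample words, chosen only to show the effect of the cutoff.
words = ["translate", "translated", "translation", "translator",
         "transit", "transmit"]

print(group_by_prefix(words, 7))  # the translat* family stays separate from transit/transmit
print(group_by_prefix(words, 4))  # everything collapses into one "tran" bucket
```

At 7 characters the "translate" family stays together and apart from the other words; at 4 characters all six land in the same bucket, which is exactly why the result above surprised me.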

He also discussed that better models would be based both on the writer and the searcher and the interaction between the two. (Of course, Google and the other search engines have the advantage of seeing both.) But he didn't go into this in any more depth than that.

Lots of interesting questions. One of the most interesting for me personally was around whether Google was investigating predictive analytics in terms of, say, "reading" financial information and being able to predict future stock performance. The answer Peter gave was "no", which either means he's being secretive or that Google has missed a really interesting and cool application of technology. This seems to point up that in "organizing the world's information" Google is still not really organizing all of the world's information.

They also were asked about their current focus on organizing and analyzing textual information. Peter indicated that one of their future forays will be in image analysis, both in photo and video, now that they have a huge library of those to work from in doing machine-generated image analysis.

Someone asked if there were plans to produce an open Google API that webmasters could use to automatically stop comment spam. Peter said no, but said he found the idea highly intriguing. So do I; sounds like an interesting viral marketing method "Comments protected by an anti-spam filter powered by XYZ".

Another question was around Google's efforts to measure true user satisfaction: while they can tell whether result #3 got more clickthroughs than result #1, they have no real way of knowing whether I actually liked result #1 once I clicked through to it. He said they tried having a Google toolbar that would let you rate the results, but people in general only give ratings when they don't like a result, so that wasn't very useful.

Everything Old is New Again

Mahalo launched its alpha this week.

"Mahalo is the world's first human-powered search engine."

Uh, isn't this how Yahoo started?

I'm beginning to feel old.

Google Developer Day

I'm at Google Developer Day and have a few random thoughts thus far:

I never knew there were so many people that were so passionate about maps and mapping technologies.

Between today and last week's salesforce.com developer's conference, I've learned that all the "hard" problems to solve of yesteryear should no longer be hard problems -- between the two vendors, good tools now exist for creating offline access, creating and testing mashups online, building AJAX user interfaces, creating hosted applications, and so forth.

Google's sense of design and branding still rocks. Is it a coincidence that both Google and Apple believe in the power of a clean white background?

Sergey Brin is still weird. His talk was mostly about how the internet is making babies that now are creating the internet. "Mosaic started in '93, the first dating sites 2 years later. So the first babies from connections made on those dating sites are now 12 -- old enough to create mashups." Uh, yeah.

Is "Wonder Boy" more or less authoritative than "Sr. Director of Product Management"? And in what contexts?

No matter where I go, I can't avoid running into at least one Inxight partner.

And, the most poignant thing I've taken away thus far?

  • "Mapplets" is a really fun word to say.

I'm looking forward to Peter Norvig's talk later today.

Tuesday, May 22, 2007

A Retraction...

After a few weeks of consideration, I've decided that Business Objects has the best marketing -- ever!

And this has nothing to do with the fact that they've just announced their intent to acquire the company at which I work (Inxight).

Seriously, the marriage should be worthwhile for both companies. I've been touting for a while the notion that in order to truly organize the world's information, you need both structured data and unstructured data. In order to truly have business intelligence, you can't just look at numbers; otherwise, you'd just have a lot of computers sitting in offices as opposed to highly paid executives. (Hmm...)

If we posit that text analytics (Inxight) organizes the world's unstructured information, and that traditional business intelligence (BOBJ) organizes the world's structured information, what challenge does that open?

Making it accessible and useful.

I'll be thinking about this for a while. I probably won't be blogging about it for a while, to avoid revealing information I'm not going to be able to share.

Bring it on!

Monday, May 21, 2007

Salesforce.com and Google?

Everyone's heard the rumours by now about salesforce.com and google. Although some have surmised some sort of deeper "partnership", Marc Benioff appeared positively giddy at the salesforce.com developer's conference today.

All signals to me point to an out-and-out merger. Combine the largest infrastructure in the world (google) with a world-class applications development environment (Salesforce) with a large quantity of "the world's information" (salesforce) with aspirations to create a wholly scalable hosted database (google and Salesforce) and you open up a whole host of possibilities for both companies that neither could do quite as well on their own.

The question is - will salesforce's search functionality still suck?

A Day In the Information Life

I thought it might be fun to document a typical day in my information life. Some details have been changed to protect the innocent.

Original goals for today:

  • Dress appropriately for the weather. Accomplished.
  • Find a mid-length, A-line, floral cotton lawn skirt. Not accomplished.
  • Figure out what the heck is going on today with strategic initiative X. Accomplished, I think.
  • Reorganize files to make it easier to find things – both for myself and others. Accomplished, I hope.
  • Follow up on two big contracts-in-progress. Accomplished.
  • Attend BOBJ webex presentation to gather info. Half-accomplished. I got interrupted and have to go visit the site to view an archived version tomorrow, when it’s been posted.
  • Book hotel and airfare for SLA trip to Denver. Accomplished.

I walked in to overhear our lawyer talking about some problem with distribution that was causing an issue with a contract. I spoke to him and followed up in person with two people, got some information, and walked around and updated two more people. Along the way, I found out more information about the status of the contracts.

Since the weather has been turning warmer for the last two days, I’ve been looking for a mid-length, A-line, floral cotton lawn skirt. Whenever an email comes in to my Yahoo mail account advertising some sale on clothing, I’ve been checking out the site looking for one. As a note, I have never purchased beachwear, formal wear, sleepwear or casual shoes online, so it drives me nuts to see ads on these things. There’s no good way to shop across sites for clothing items I need – Froogle, eBates, nothing works for this. It’s very frustrating. Measure of success: Find whatever clothing item is of interest to me (summer skirts in the spring, sweaters/wool skirts/boots in the fall) and present it to me so I can order and receive!

Interested in keeping tabs on product announcements, M&A activity, customer wins, and executive job changes involving our competitors and the company that is rumored to be buying us. Also interested in learning about new companies entering our space (unstructured data management). Today, I get this information through Google News and Blog alerts – keywords with names of our competitors and also the phrases “entity extraction” and “unstructured data management”. Sometimes if information is new and exciting, I email it to our “market watch” and/or “exec staff” mailing lists through Outlook. Today I visited the google blog, google enterprise blog, and google tech blog. Nothing of interest there.

I visited the ClearForest site yesterday to see if I could learn more about the Reuters acquisition. (I have a standing Google alert for ClearForest). I also did a google Blogs search, since I don’t have the blogs set to search (I should fix that).

I also comb through sites of our competitors sometimes to see if they have new product data sheets, webinars, white papers, etc. This is tedious, and I’m lucky if I remember to do it once every few weeks, when I have spare time. Some I have on a “page watch” from SD Awareness Server. If I find a new one, I download it, read it, and file it on pubshare for others to see.

Sometimes I annotate them using Adobe Acrobat Pro before I save them. Same for analyst reports of interest (which I generally get forwarded to me from our PR person, who has the analyst login information)

Want to know what accounts our sales team is working on most frenetically, pipeline status, etc. I check this through salesforce.com. Sometimes I cross check salesforce when I get a news alert on an “unstructured data” company – to see if we’re already calling on them or not. When I find an overlap, I email the appropriate salesperson.

In response to a late-night email sent by our CEO, I had to do a call with three execs today. One of them was in the office, the other two weren’t. I could find one guy’s cell but not his home number in Outlook, so I called his cell and then he gave me his home number. The other guy had to give me an alternate land line number, which I wrote on a sticky note. I took notes on that call in my paper notebook, then emailed a summary to them. Later, I had a physical meeting with two other people about it, took more notes in my notebook, and emailed the others about the discussion later. I also worked on revising a presentation related to this area and emailed that presentation to the execs. In a related development, I then visited the Basis, Teragram, Attensity, and ClearForest sites to see if they have offerings for this initiative. I copied and pasted the relevant information into an email and sent it out.

As another followup, I needed to know what verticals/applications each of their stage 2-5 accounts were in. Some of them were already in salesforce, but for a lot, I had to copy and paste the name of the company into Google, find the company website, and read what they were involved in. I put this information into an excel spreadsheet I was putting together (I had exported the original info from salesforce), but I didn’t actually enter it into the right fields in salesforce; would have taken too long to do that. Wouldn’t it be nice if somehow that field would auto-populate based on the “about” information on their website or something?
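Just to show the sort of thing I mean by auto-populating, here is a naive Python sketch that guesses a vertical from "about" text by counting keyword hits. Everything in it is hypothetical: the vertical names, the keyword lists, and the function are made up for illustration, and nothing here touches salesforce or any real website. A real version would want proper entity extraction rather than substring matching.

```python
# Hypothetical verticals and keywords, invented for this example only.
VERTICAL_KEYWORDS = {
    "Financial services": ["bank", "insurance", "trading"],
    "Life sciences": ["pharmaceutical", "clinical", "biotech"],
    "Government": ["agency", "defense", "federal"],
}


def guess_vertical(about_text):
    """Return the vertical with the most keyword hits, or None if nothing matches."""
    text = about_text.lower()
    scores = {
        vertical: sum(text.count(keyword) for keyword in keywords)
        for vertical, keywords in VERTICAL_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Made-up "about" blurb standing in for text copied from a company site.
about = "We are a federal agency supplier serving defense customers."
print(guess_vertical(about))
```

Returning None when nothing matches is deliberate: I'd rather leave the field blank for a human to fill in than have it populated with a confident wrong guess.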

I visited expedia to check on airfares to Denver and then booked my flight directly through United.com (after checking out iflyswa.com; I thought maybe Southwest would be cheaper, but no). I then went back to expedia to look up hotels, found a few, went to Trip Advisor to read reviews on them, and then booked a hotel directly with Fairfield Inns. Back to expedia again to check rental car rates, ended up booking directly on Avis.com. This sort of pattern happens every time I plan a trip, usually once or twice a month.

There are a bunch of mails flying into my Outlook today with people complaining about a listserv that I am on being newly “membership restricted”. I should really set an auto-delete on these today, since I keep deleting them without reading them, but I’m too lazy to set that up.

I also spent a lot of time today reorganizing my files in an attempt to make them easier to find. All Inxight-related artwork in one folder, all data sheet source files in another, all presentations in a third. I started out organizing things by topic, but I find more and more that I tend to think of things in terms of filetype first, then topic. Too bad no search engine I have does a good job of finding things. Outlook search is hopeless, and for some reason my Google Desktop search won’t find emails newer than last November. I played around with Koral desktop for a while; I thought it might be cool, but now that they’ve been bought by salesforce, I’m less enthusiastic. Plus, the free version didn’t have auto-tagging; I had to manually tag. I also reorganized the same fileset on pubshare, making sure to duplicate a lot of stuff on my hard drive that no one else has access to.

I listen to Pandora every day, which recommends new music to me on-the-fly based on what I thumbs-up or down.

I evaluated the UI of an application we’re developing (got a notice in my outlook of its location). Then wrote down my comments in outlook and emailed them to the lead engineer and lead PM.

I also read on Guy Kawasaki’s blog about a new book by Seth Godin about “The Dip”, so I had to go to Amazon.com to buy that. Guy also wrote about a presentation contest on Slideshare.net. I had to go check out the winners, and after seeing the techniques they used, went into powerpoint and reworked my whole presentation to be more “cool” looking.

Oh, and our DirecTivo went belly up last night, so I helped the spousal unit look up error codes. He ended up buying a refurb unit from weaknees. He was going to buy from eBay (where I have standing searches for Disneyland memorabilia, James Foley poetry, and other silly things), but it would have taken too long for the auction to close and then get delivery.

Every day, I go to mercurynews.com to read three comics: Dilbert, For Better or For Worse, and Luann. I visit uexpress.com to read Dear Abby most days, and “Focus on the Family” once a week. I also stumbled on an ethics column there I think I will read every week as well.

I visited MSNBC.com to check up on the latest news. This augments my Google news headlines gadget that I have on my iGoogle page. Although I also have a weather gadget, since my home machine doesn’t auto-login, I don’t use it at night to check the weather. I go to weather.com every night if the weather seems unsettled (i.e., not in the middle of summer, but most other times), so I can pick out what clothes I want to wear the next day.

Interesting factoid – I almost never use bookmarks. I either type from memory or use Google to find something again.

I needed to find out how long it would take my mother-in-law to drive from her house to mine, since she was supposed to be at my house at 6 and when I called her at 5:30 she hadn’t left yet. Google maps told me I had nothing to worry about.

My husband IM’d me using Yahoo about car repairs he had done today.

I got a Yahoo IM from a former colleague today asking me if I was free for lunch; I checked my Outlook calendar and I wasn’t, so we scheduled a follow-up phone call instead. It reminded me that I hadn’t checked out his blog in a while, so I checked it out. Then that reminded me of another blog I hadn’t looked at in a while (Mark Logic CEO blog), so I went and looked at that. Then I read Valleywag, learned about yet another search engine company, and went to go look at their beta site as well as looking at Techcrunch, which was also mentioned in the article.

Later that day, the colleague called me and we chatted for about 40 minutes about a book he’d given me to read and about future opportunities. I took notes in my paper notebook.

I got a Yahoo mail instant alert that an email had come in (I get these alerts every time an email comes in). I glanced at it, saw that it was from my older son’s school, and clicked on it to read it. The email was about participating in a parade that Saturday, so I IM’d my husband to remind me to ask them if they wanted to participate.

Every Thursday, I get a Yahoo email from the public library telling me what library books are due on Saturday. They also alert me when a book I have on hold (which I have to reserve directly thru their website) comes in.

Whew!