Thursday, February 18, 2016

The N questions you always need to answer for any research project, all the time

Especially when it's a new project, people will always ask you a lot of questions:

- What's your Research Question? (similarly, what's your Hypothesis?)
- What's your Contribution?
- Why is it Research?
- Who will it help? (or, who cares?)
- What are you doing?
- What problem are you addressing?
- Is that a real/important problem?
- Why will this solve that problem?
- How is your work different than (any one of 1000 tangentially-related things)?
- How is it done today, and what's wrong with that?
- How will you do it?
- How will you evaluate it?

It really helps if you can answer them all, all the time. You will get instant cred and people will let you do your thing. Unfortunately, it's kind of like air bubbles in a plastic sheet: when you squeeze some of them out, others reappear. Like, if you nail down "how will you do it?" then people will ask "well if you know how to do it, then why is it research?" And if you narrow "Who cares?" down to a small subset, then people will ask "Is that an important problem?"

I'm going to order them from most to least important, in my mind: (note! I am not a grant funder.)

1. What are you doing? (Please, be concise. You get one sentence. Now try it in three sentences.)
2. What problem are you addressing? -- only if it's not obvious.
3. How is your work different than (100 closely-related things)? This is an important question if someone is actually bringing up something that they think is the same thing. This is not an important question if someone is just trying to sound clever.
4. Is this problem a real problem? -- downgraded because in HCI we solve lots of non-real-problems. And it's hard to tell what's a "real problem." If you mean "is it malaria?" then no, we're not solving malaria. You can always play problem-one-upmanship, and it's usually not a fun game to play.
5. Why will your work solve this problem? -- only worth asking if it's not obvious.
6. How is it done today, and what's wrong with that? -- downgraded because usually the answer is "it's not done today."
7. How will you evaluate it? -- again, sometimes it's hard to know until you do it.

These are sometimes not worth asking but people will anyway:
... 10. What's your Research Question or Hypothesis? -- This is valid for some kinds of research, like psychology. This is less valid in the more inventor-ish types of research. People will still ask it anyway.
11. How is your work different than (900 not-really-related things)? -- Sometimes people will ask this to try to sound smart.
12. How will you do it? -- if I knew, it wouldn't be research, would it? Still, people will ask this, and it helps to be able to wave your arms.

These are often not worth asking but people will anyway:
... 100. Why is it Research? -- Ugh. Academics love to ask this. Basically, why aren't you starting a company and doing this? And "because it's goddamn hard to start a company" or "this should be done but nobody wants to pay for it" don't count.
101. What's your Contribution? -- This is a thinly veiled version of "Why Is It Research?"

But yeah, I guess if you want to be good at research, answer all of them all the time.

Monday, February 8, 2016

Welcome to Domo

EDIT: the Domo server is now shut down. If you want access to any of this data, ask me to put you in touch with Shuguan Yang and Sean Qian who are running a server that has this data now. Or, ask me about the S3 bucket that it's all stored in.

I've told this to a lot of people so I've decided to store it all in one place. This guide will range from super-basic to kinda-complicated, so apologies if it's obvious in parts, and apologies if you get lost in parts. ALSO, if you're reading this and you're not new to our group and/or server, then you may have some advice for me, and I'd appreciate it!

Domo is our Amazon Web Services server. It's named after this guy:

On Domo, we have coordinate-geotagged tweets from several cities (all stored in the PostgreSQL database "tweet"):
Pittsburgh: since Jan 22, 2014. (table tweet_pgh)
SF, NY: since June 13, 2014. (tweet_sf, tweet_ny)
Houston, Cleveland, Seattle, Miami, Detroit, Chicago, London: since November 7, 2014
Minneapolis: since March 18, 2015
San Antonio, Austin, and Dallas: since June 15, 2015
(everything after SF and NY is stored in tweet_(cityname), where cityname is lowercase, all one word)
We also have tweets in Pittsburgh beyond just coordinate-geotagged tweets, in table tweet_pgh_all.
We also have Instagrams in Pittsburgh from fall 2014 to May 2016 (when Instagram shut off access to public geotagged Instagrams.) - table instagram_pgh
And some flickr photos and other misc data sets. (not in PostgreSQL; in /data/datasets/)
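
If you'd rather not remember the naming rule, here's a rough sketch of it in Python. The function name tweet_table and the long-form aliases ("San Francisco", "New York") are my own inventions for illustration, and the rule for multi-word cities is just my reading of "lowercase, all one word"; check the real table names with \d before trusting it.

```python
# Hypothetical helper: maps a city name to the table it probably lives in.
# The three short names below are the ones stated in the post; the
# long-form aliases are an assumption added for convenience.
SPECIAL_TABLES = {
    "pittsburgh": "tweet_pgh",
    "sf": "tweet_sf",
    "san francisco": "tweet_sf",
    "ny": "tweet_ny",
    "new york": "tweet_ny",
}


def tweet_table(city):
    """Return the likely table name for a city's coordinate-geotagged tweets."""
    key = " ".join(city.lower().split())  # normalize case and whitespace
    if key in SPECIAL_TABLES:
        return SPECIAL_TABLES[key]
    # General rule: "tweet_" + city name, lowercase, all one word.
    squashed = "".join(ch for ch in key if ch.isalpha())
    if not squashed:
        raise ValueError("city name has no letters: %r" % city)
    return "tweet_" + squashed


if __name__ == "__main__":
    for name in ["Pittsburgh", "San Antonio", "London"]:
        print(name, "->", tweet_table(name))
```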

We really only interact with Domo via terminal windows, so if that's not your forte, you may have some difficulty. To log in, use "ssh (your username on Domo)@(Domo's hostname)"
If you want to make it easier, you can open ~/.ssh/config and add an entry:

Host domo
Hostname (Domo's hostname)
User (your username on Domo)

We store the tweets in PostgreSQL. If you've used other SQLs, it's pretty similar, but not the same. Things to know about Postgres and our DB in particular:
  • psql tweet to connect to our database (which is called "tweet").
  • \d to list all relations (aka tables, kinda)
  • \d tablename to get more info about a certain table.
  • The tweets go in basically direct from the Twitter 1% public feed (using this script). They're all stored as text and integers except for some things that are "hstores" - basically key-value sets - and the "coordinates", which are stored using PostGIS as Points.
  • To access those Points, use some of the PostGIS functions. For example, SELECT ST_AsText(coordinates) to get it in a semi-readable format. ST_AsGeoJSON has been the most useful for me.
  • To query all tweets within an area: SELECT * FROM tweet_pgh WHERE ST_MakeEnvelope(-79.9, 40.44, -79.899, 40.441, 4326) ~ coordinates; 
    • (that "4326" is, for current purposes, a magic number. It means EPSG 4326/WGS-84 which is pretty much a standard for everything I do. So I always just leave it as 4326, and if you don't know better, I suggest you do too.)
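
If you want to run that kind of bounding-box query from Python instead of psql, something like the sketch below should work. It assumes you have the third-party psycopg2 package installed and that you run it on Domo as a user who can read the "tweet" database; the function names bbox_query and fetch_points are made up for this example, not part of any script we have.

```python
import re


def bbox_query(table, min_lon, min_lat, max_lon, max_lat, limit=10):
    """Build (sql, params) for tweets whose point falls inside a bounding box."""
    # Table names can't be passed as query parameters, so only allow
    # plain lowercase identifiers like tweet_pgh.
    if not re.fullmatch(r"[a-z_][a-z0-9_]*", table):
        raise ValueError("suspicious table name: %r" % table)
    sql = (
        "SELECT ST_AsGeoJSON(coordinates) FROM " + table + " "
        "WHERE ST_MakeEnvelope(%s, %s, %s, %s, 4326) ~ coordinates "
        "LIMIT %s;"
    )
    return sql, (min_lon, min_lat, max_lon, max_lat, limit)


def fetch_points(table, min_lon, min_lat, max_lon, max_lat, limit=10):
    """Run the query and return a list of GeoJSON strings."""
    import psycopg2  # assumption: installed; imported here so the builder works without it

    sql, params = bbox_query(table, min_lon, min_lat, max_lon, max_lat, limit)
    conn = psycopg2.connect(dbname="tweet")
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()


if __name__ == "__main__":
    # Same small box in Pittsburgh as the SQL example above.
    sql, params = bbox_query("tweet_pgh", -79.9, 40.44, -79.899, 40.441)
    print(sql)
    print(params)
```

The numbers go in as parameters rather than being pasted into the string, which is a good habit even on a server where everyone is trusted.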
Things to know about Domo:
  • Change your password right away. Do this by typing "passwd" after you SSH in.
  • Don't store things in your homedir! Our whole homedir partition only has about 8 GB. Obviously, that fills up fast. Store anything you can in /data - that has 1 TB. I might bug you sometimes to clean up your homedir if you end up using a lot of space.
When I add you to Domo, I'll tell you:
  • your username on Domo
  • your temporary password (change this as soon as possible)
  • Domo's hostname (not shown here so we get attacked as little as possible)
You should tell me:
  • if you want a username that's different than your email address, tell me ASAP and I'll create that and delete your old one.
  • your github username so I can add you to our github organization.
Dan's note to himself:
  • give the new person an account with sudo adduser username
  • give them a postgresql account (CREATE USER username;) and give them permission to read all the tables (GRANT SELECT ON ALL TABLES IN SCHEMA public TO username)
  • get their github username and give them access to the CMUChimpsLab organization too.
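
A small sketch that prints the account-creation commands from the note above for a given username, so they can be copied and pasted. The function name onboarding_commands is hypothetical, and wrapping the SQL in "psql -d tweet -c" assumes you run it as a Postgres user who is allowed to create users; it only prints the commands, it doesn't run anything.

```python
import re


def onboarding_commands(username):
    """Return the shell commands for creating a new person's accounts."""
    # Keep usernames to simple lowercase identifiers so they are safe to
    # paste into both a shell command and a SQL statement.
    if not re.fullmatch(r"[a-z][a-z0-9_]*", username):
        raise ValueError("unexpected username: %r" % username)
    return [
        "sudo adduser " + username,
        'psql -d tweet -c "CREATE USER ' + username + ';"',
        'psql -d tweet -c "GRANT SELECT ON ALL TABLES IN SCHEMA public TO '
        + username + ';"',
    ]


if __name__ == "__main__":
    for command in onboarding_commands("newperson"):
        print(command)
```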