A serious sin in science, because it slows the production of new knowledge and technology, is to be overly polite in pretending that all theories or hypotheses are equally good, or any good at all. Science curricula from the undergraduate stage onward should teach students to distinguish good theories and hypotheses from bad ones, because theories and hypotheses anticipate new knowledge, and science is about gaining that knowledge.
The following short list of criteria builds primarily upon the work of Lakatos (1970) but is tailored to current practice in biological oceanography in particular and ecology in general. I share with Lakatos the view that science is a three-cornered fight among observation and rival theories -- that in practice theories or hypotheses are not rejected until better ones come along. Another way to say it is that theories or hypotheses are best evaluated ordinally, by comparison with competitors. Nevertheless, some explicit criteria make the comparison easier.
Because the term "excess" is usually used pejoratively, I should explain at the outset that "excess" empirical content is a desirable characteristic. The first determinant of value in any hypothesis, new or old, is whether it predicts phenomena that have not been observed heretofore. The criterion can easily be relaxed and extended to phenomena that have been observed before but ignored or swept aside for lack of explanation. Classic examples of the latter in oceanography are anomalous values in sparse offshore sections, often attributed at the time to measurement errors but later explained by the existence of mesoscale eddies. If the hypothesis does not predict new phenomena that can be observed, it may still have minor heuristic value (like a mnemonic that helps one remember how to cluster previously learned "facts" or observations, making them easier to teach or learn), but it has little prospect of yielding new knowledge. Lakatos (1970) calls hypotheses without excess empirical content "ad hoc 1."
The second determinant of value is whether tests reveal the excess empirical content to be true or false. Clearly lacking in any utility is the theory or hypothesis all of whose new predictions are falsified, and clearly best is the converse. Those that fail completely are called "ad hoc 2" by Lakatos (1970). Problematic but still valuable are those theories or hypotheses that make some insightful (corroborated) predictions but fail elsewhere. They merit close scrutiny to determine whether successes are coincidental or whether some minor or major change of the hypothesis might yield greater success.
The third criterion is perhaps most difficult to explain and also most frequently and severely violated in ecology, so please pardon the protracted explanation. Hypotheses or theories that fail this criterion are called "ad hoc 3" by Lakatos (1970); more commonly they are called empirical, semi-empirical, formal, arbitrary, or simply ad hoc.
One access to the problem is through the etymology of "hypothesis." To have a hypothesis (literally, a smaller idea drawn or deduced from a larger one), one must have a thesis. A hypothesis drawn from the air, without connection to a bigger idea, is of less value than one drawn from a bigger idea, because more predictions can be drawn from the bigger idea and because testing the hypothesis is a partial test of that bigger idea, usually at a more feasible scale. A second access to this criterion comes directly from Lakatos' (1970) explanation of a requirement for continuous growth of knowledge from a successful research program. A research program that fails this criterion uses "patched-up, unimaginative series of pedestrian 'empirical' adjustments which are so frequent, for instance, in modern social psychology [and biological oceanography or ecology]. Such adjustments may, with the help of so-called 'statistical techniques,' make some 'novel' predictions and may even conjure up some irrelevant gains of truth in them. But this theorizing has no unifying idea, no heuristic power, no continuity. They do not add up to a genuine research programme and are, on the whole, worthless."
There is strong potential for self-deception about achieved understanding when dealing with a nonlinear, feedback-rich, statistically nonstationary system such as any ecosystem. It may be helpful to step outside of ecology and consider another such complicated, explicit system. Take the stock market, which has some characteristics in common with ecosystems. One can come up with "explanations" after the fact of a market adjustment, crash or rise. One can make limited "predictions" from past behavior, based either on explicit variables or on neural-network learning of that past behavior, but can one make "predictions" grounded in understanding of mechanism? Would you bet your fortune on them?
One frequent problem is the confusion of good statistical null hypotheses with good ecological hypotheses. There is no mapping between the two, just as statistical significance is not equivalent to ecological importance; it is very easy to have one without the other. Statistical hypotheses are not evaluated on criteria 1-3 above, whereas scientific hypotheses are. A prime criterion for a statistical null hypothesis is that it can be rejected with data that have been or can be collected and with the statistical tests in hand (that is, with sufficient statistical power). It is easy to confuse the ability to reject statistical null hypotheses with falsifiability of scientific hypotheses, which in "naive" methodological falsificationism (Lakatos' somewhat pejorative term) supersedes criteria 1-3 above. Naive falsificationism is the outdated philosophy of science that associated the most rapid scientific progress with the most rapid rejection of hypotheses or theories, whether or not any hypotheses or theories remained after such rejection. Lakatos (1970) demonstrates convincingly, through historical evaluation of many examples, that in a progressing science theories or hypotheses are not rejected until better ones come along.
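The gap between statistical significance and ecological importance can be made concrete with a minimal Python sketch of a two-sided, two-sample z-test. It assumes, purely for illustration, a known common standard deviation, equal group sizes, and normally distributed sample means; the function name `two_sample_z` and every number below are hypothetical, not data from any study. A trivially small standardized difference becomes "significant" with a very large sample, while a large difference goes undetected with a small one.

```python
import math


def two_sample_z(mean_a, mean_b, sd, n):
    """Two-sided z-test for a difference in means (hypothetical helper).

    Assumes a known common standard deviation `sd` and equal group size `n`.
    Returns (z statistic, two-sided p-value, standardized effect size d).
    """
    se = sd * math.sqrt(2.0 / n)          # standard error of the difference
    z = (mean_a - mean_b) / se
    # Two-sided p = 2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    d = (mean_a - mean_b) / sd            # effect size in SD units (Cohen's d)
    return z, p, d


# Illustrative numbers only: a tiny effect measured on huge samples...
z_big, p_big, d_big = two_sample_z(10.02, 10.00, sd=1.0, n=100_000)
# ...and a large effect measured on tiny samples.
z_small, p_small, d_small = two_sample_z(10.8, 10.0, sd=1.0, n=5)

print(f"tiny effect, large n:  d = {d_big:.2f}, p = {p_big:.2g}")
print(f"large effect, small n: d = {d_small:.2f}, p = {p_small:.2g}")
```

Under these assumptions the first comparison rejects the null hypothesis decisively even though the effect is about 2% of a standard deviation, and the second fails to reject it despite an effect of 0.8 standard deviations. Rejecting or failing to reject the statistical null says nothing, by itself, about whether an ecological hypothesis has excess empirical content.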
Proposals in biological oceanography and ecology often comprise a swarm of loosely connected "gnats" or "strawperson" hypotheses that the proposal is designed to swat or do away with. Some are resurrected routinely, even within the same proposal, only to be killed again. Take, for example, the (null) hypothesis that organisms behave as passive particles. It clearly has some merit in the form of predictive ability and underlies the definition of plankton. It has been tested and rejected countless times for specific combinations of organisms and environments, only to be resurrected again, and it still appears routinely in proposals. After so many rejections, it would be more productive (and might constitute a genuine and more continuous advance) to propose explicit theories and hypotheses that predict particular departures from passive behavior in particular kinds of species and environments (e.g., based on environmental and sensory constraints).
The good news associated with this evaluation of good versus bad hypotheses is that, for the practitioner of science acting as peer reviewer, the most effective criticism of a hypothesis is the construction of a better one: one that is compatible with extant observations and also predicts additional, interesting phenomena. Peer criticism of hypotheses thus ideally takes the form of such an explicit alternative hypothesis. Although any hypothesis can be criticized on the basis of the criteria presented above, the fact that hypotheses and theories are not rejected but replaced makes constructive criticism by far the most effective and honest means of peer review.